Conda (CPU)
===========

.. toctree::
   :maxdepth: 1
   :caption: Contents:

.. contents:: Table of Contents
   :depth: 3

.. role:: underline
   :class: underline

.. warning::

   ALL conda operations MUST be performed on the compute nodes via SLURM batch
   jobs or ``srun`` interactive runs. NEVER run ``conda create``,
   ``conda install``, ``conda update``, or any other conda commands directly on
   login nodes. Login nodes are shared resources. Running conda-related
   I/O-intensive operations on login nodes:

   - Violates resource accounting policies
   - Degrades performance for all users
   - Can result in account restrictions or termination

Overview
--------

This documentation describes how to install packages using conda in the
project environment, in a way that satisfies the version and dependency
requirements of the provisioned project.

Conda environments on Discoverer are created using the ``--prefix`` option to
specify a custom location (typically in your project directory). This approach
avoids using ``conda activate`` and instead relies on environment variable
configuration to ensure proper version and dependency management.

.. important::

   ALL conda operations MUST be performed on the compute nodes via SLURM batch
   jobs or ``srun`` interactive runs.

.. warning::

   Do NOT use ``conda init`` or ``conda activate``.

   Avoid using ``conda init`` and ``conda activate`` for the following reasons:

   - Interference with other virtual environments: ``conda activate`` can
     interfere with other virtual environment managers (e.g., ``venv``,
     ``virtualenv``, ``pipenv``) and cause conflicts
   - Pollution of ``.bashrc``: ``conda init`` modifies your ``.bashrc`` file,
     which can cause issues in shared environments and HPC systems
   - Less explicit control: the activation mechanism is less transparent than
     explicit environment variable exports
   - Potential conflicts: automatic conda initialization can interfere with
     system modules and other environment setups

Instead, use explicit environment variable exports as shown in this
documentation:

.. code-block:: bash

   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

This approach is:

- Clear and explicit: you can see exactly which environment is being used
- Non-intrusive: does not modify system configuration files
- Compatible: works well with other virtual environment tools
- HPC-friendly: ideal for shared computing environments

.. tip::

   Important: Do NOT install Miniconda or Anaconda in your home directory.

   Installing Miniconda or Anaconda in your home folder consumes significant
   disk space unnecessarily. The system provides conda through the
   ``anaconda3`` environment module, which is already available and can be
   used to install any package that is installable with conda.

   Always use the system-provided conda by loading the module:

   .. code-block:: bash

      module load anaconda3

   This module is automatically loaded in all SLURM batch scripts shown in
   this documentation. There is no need to install your own conda
   distribution.

Conda Configuration: Fixing "No space left on device" Error
--------------------------------------------------------------

.. warning::

   **Important:** Before creating conda environments, ensure your conda
   configuration uses directories with sufficient disk space. By default,
   conda may try to use ``/tmp``, which has limited space (typically 320M and
   95% full), causing "No space left on device" errors even when your target
   directory has plenty of space.

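
If you just want a quick, read-only look at where conda currently points
before working through the diagnosis below, ``conda config --show`` can print
the two settings involved:

.. code-block:: bash

   module load anaconda3

   # Print where conda will cache packages and create named environments
   conda config --show pkgs_dirs
   conda config --show envs_dirs
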

Problem
~~~~~~~

Conda fails with ``NoSpaceLeftError: No space left on devices`` even when the
target directory has plenty of space. This happens because conda's
configuration file (``~/.condarc``) has ``pkgs_dirs`` and ``envs_dirs``
pointing to ``/tmp``, which is on a different filesystem with limited space.

Diagnosis
~~~~~~~~~

Check filesystem space usage:

.. code-block:: bash

   # Check space on /tmp and your target directory
   df -h /tmp /valhalla/projects/<project_name>/

   # Check which filesystem /tmp is on
   df -h /tmp

**Expected output shows:**

- ``/tmp`` is on ``/dev/mapper/live-rw`` with only 320M free (95% full)
- ``/valhalla`` has 50T available

Check conda's configuration:

.. code-block:: bash

   module load anaconda3
   conda info

Look for:

- ``package cache : /tmp/...`` - this is where conda stores downloaded packages
- ``envs directories : /tmp/...`` - this is where conda stores environments

Check your conda config file:

.. code-block:: bash

   module load anaconda3
   cat ~/.condarc

**The problem:** Your ``~/.condarc`` likely contains:

.. code-block:: yaml

   pkgs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.pkgs
   envs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.envs

These directories are on ``/tmp``, which is 95% full. Conda needs space in
``pkgs_dirs`` to:

- Download packages
- Extract packages
- Build packages from source
- Cache package metadata

These operations can require significant space (often several GB for Python
3.12 with dependencies).

Solution
~~~~~~~~

**Option 1: Update conda configuration (Recommended - Permanent Fix)**

Modify your ``~/.condarc`` file to use ``/valhalla`` for the package cache and
environments:

.. code-block:: bash

   module load anaconda3

   # Create directories on /valhalla
   mkdir -p /valhalla/projects/<project_name>/conda/pkgs
   mkdir -p /valhalla/projects/<project_name>/conda/envs

   # Update conda configuration (use --add for list parameters)
   conda config --add pkgs_dirs /valhalla/projects/<project_name>/conda/pkgs
   conda config --add envs_dirs /valhalla/projects/<project_name>/conda/envs

   # Verify the change
   conda info | grep -E "(package cache|envs directories)"

**Note:** Use ``--add`` (not ``--set``) because ``pkgs_dirs`` and
``envs_dirs`` are list parameters. The ``--add`` command adds the directory to
the beginning of the list, and conda uses the first writable location.

**Option 2: Edit ~/.condarc manually**

Alternatively, you can edit ``~/.condarc`` directly:

.. code-block:: bash

   # Back up your current config
   cp ~/.condarc ~/.condarc.backup

   # Edit the file
   nano ~/.condarc   # or use your preferred editor

Change:

.. code-block:: yaml

   pkgs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.pkgs
   envs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.envs

To:

.. code-block:: yaml

   pkgs_dirs:
     - /valhalla/projects/<project_name>/conda/pkgs
   envs_dirs:
     - /valhalla/projects/<project_name>/conda/envs

**Option 3: Use an environment variable for a single command**

For a one-time fix without changing your config:

.. code-block:: bash

   module load anaconda3

   # Create the package cache directory on /valhalla
   mkdir -p /valhalla/projects/<project_name>/conda/pkgs

   # Override pkgs_dirs for this command only
   CONDA_PKGS_DIRS=/valhalla/projects/<project_name>/conda/pkgs conda create --prefix /valhalla/projects/<project_name>/venv/llvmlite python=3.12 -y

**Option 4: Set TMPDIR (for additional temp operations)**

Even after fixing ``pkgs_dirs``, you may also want to set ``TMPDIR`` for other
temporary operations:

.. code-block:: bash

   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

Verification
~~~~~~~~~~~~

After updating the conda configuration, verify that it uses the correct
directories:

.. code-block:: bash

   module load anaconda3
   conda info | grep -E "(package cache|envs directories)"

You should see:

.. code-block:: text

   package cache : /valhalla/projects/<project_name>/conda/pkgs
   envs directories : /valhalla/projects/<project_name>/conda/envs
                      /opt/software/anaconda3/envs
                      /home/<username>/.conda/envs

Also verify your config file:

.. code-block:: bash

   cat ~/.condarc | grep -A 2 -E "(pkgs_dirs|envs_dirs)"

Initial setup
-------------

.. warning::

   ALL conda operations MUST be performed on the compute nodes via SLURM batch
   jobs or ``srun`` interactive runs.

The initial setup process consists of creating a SLURM batch script file (on
the login node, using a text editor) and then submitting it to run on compute
nodes. All conda operations execute on compute nodes.

Step 1: Create the SLURM installation script file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

Create a text file named ``install_conda_env.sh`` using your preferred text
editor (``nano``, ``vim``, ``emacs``, etc.). This is the ONLY step that
involves working on the login node - you are just creating a text file.

The file should contain the following SLURM batch script (change the package
list to the one that fits your goals):

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=install_conda_env
   #SBATCH --output=install_env_%j.out
   #SBATCH --error=install_env_%j.err
   #SBATCH --time=02:00:00
   #SBATCH --ntasks-per-node=2
   #SBATCH --cpus-per-task=1
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=16G

   # Load Anaconda module
   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   # Set virtual environment path (replace <project_name> with your project name)
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Create conda environment
   echo "Creating conda environment at ${VIRTUAL_ENV}"
   conda create --prefix ${VIRTUAL_ENV} python=3.12.2 -y

   # Set up environment variables
   export PATH=${VIRTUAL_ENV}/bin:${PATH}
   export LD_LIBRARY_PATH=${VIRTUAL_ENV}/lib:${LD_LIBRARY_PATH}

   # Install packages from conda-forge
   echo "Installing packages from conda-forge"
   conda install --prefix ${VIRTUAL_ENV} -c conda-forge scikit-learn scipy numpy matplotlib mdanalysis pandas -y

   # Verify installation
   echo "Verifying installation..."
   which python
   python --version
   conda list --prefix ${VIRTUAL_ENV}

   echo "Installation complete!"

**Configuration parameters:**

- Replace ``<project_name>`` with your actual project name
- Adjust ``--time`` based on the number of packages (installation can take 30
  minutes to 2 hours)
- Increase ``--mem`` if installing large packages (16G is usually sufficient)
- Modify the package list as needed (see the version-pinning example after
  this list)
- The ``-y`` flag automatically confirms the installation
- Specify the Python version that meets your requirements
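
If your project requires particular package versions, conda accepts version
specifiers directly in the install command. A minimal sketch - the packages
and version numbers below are purely illustrative, so substitute the ones your
project actually needs:

.. code-block:: bash

   # Pin versions with conda match specifications; quote them so the shell
   # does not interpret the operators
   conda install --prefix ${VIRTUAL_ENV} -c conda-forge "numpy>=1.26" "scipy=1.13" "pandas=2.2.*" -y
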

Alternative: Interactive installation using ``srun``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of submitting a batch job, you can run the installation interactively
on a compute node using ``srun``. This is useful if you want to see the output
in real time or interact with the installation process.

**On the login node (login.discoverer.bg):**

Run the following command to start an interactive session on a compute node:

.. code-block:: bash

   srun --job-name=install_conda_env \
        --time=02:00:00 \
        --ntasks-per-node=2 \
        --cpus-per-task=1 \
        --nodes=1 \
        --partition=cn \
        --account=your_project \
        --mem=16G \
        --pty bash

Once the interactive session starts, execute the installation commands:

.. code-block:: bash

   # Load Anaconda module
   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   # Set virtual environment path (replace <project_name> with your project name)
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Create conda environment
   echo "Creating conda environment at ${VIRTUAL_ENV}"
   conda create --prefix ${VIRTUAL_ENV} python=3.12.2 -y

   # Set up environment variables
   export PATH=${VIRTUAL_ENV}/bin:${PATH}
   export LD_LIBRARY_PATH=${VIRTUAL_ENV}/lib:${LD_LIBRARY_PATH}

   # Install packages from conda-forge
   echo "Installing packages from conda-forge"
   conda install --prefix ${VIRTUAL_ENV} -c conda-forge scikit-learn scipy numpy matplotlib mdanalysis pandas -y

   # Verify installation
   echo "Verifying installation..."
   which python
   python --version
   conda list --prefix ${VIRTUAL_ENV}

   echo "Installation complete!"

**Note:** The ``--pty bash`` flag allocates a pseudo-terminal, allowing you to
interact with the session. When you're done, type ``exit`` to end the
interactive session.

Step 2: Submit the SLURM job
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

Submit the batch script to the SLURM scheduler. The job will execute on
compute nodes, where all conda operations will run:

.. code-block:: bash

   sbatch install_conda_env.sh

After submission, SLURM will display a job ID (e.g., ``Submitted batch job
12345``). This job ID is used in the output and error filenames.

**On the login node:**

Verify that your job is queued (this is a read-only operation, safe on the
login node):

.. code-block:: bash

   squeue --me

This command shows all jobs submitted by your user account. You can monitor
the job status and wait for it to complete.

Step 3: Check the results
~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

After the job completes, check the output and error files:

.. code-block:: bash

   # View the output file (replace 12345 with your job ID)
   cat install_env_12345.out

   # Check for errors
   cat install_env_12345.err

If the installation was successful, you should see messages indicating that
packages were installed and the environment was created.

When to create a new environment vs. reuse existing
------------------------------------------------------

**Scenario 1: Adding compatible packages**

- **Action**: Reuse the existing environment
- **Reason**: New packages (e.g., ``pandas``, ``matplotlib``) are compatible
  with the existing packages

**Scenario 2: Need Python 3.11 instead of 3.12**

- **Action**: Create a new environment (``venv-python311``)
- **Reason**: The Python version requirement differs

**Scenario 3: Package conflict detected**

- **Action**: Create a new environment for the conflicting package set
- **Reason**: Conda reports dependency conflicts that cannot be resolved

**Scenario 4: Starting a new project**

- **Action**: Evaluate whether an existing environment meets the requirements

  - If yes: reuse it
  - If no, or there are conflicts: create a new environment with a
    project-specific name

Using the virtual environment
-----------------------------

Step 1: Set the environment variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before running your Python scripts, configure the environment variables:

.. code-block:: bash

   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1
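
Because these exports have to be repeated in every new shell and batch script,
it can be convenient to keep them in a small helper script inside your project
directory and source it when needed. A minimal sketch, assuming the
environment lives at the path used above (the filename ``env-venv-1.sh`` is
just an example):

.. code-block:: bash

   # Create the helper script once (adjust the paths to your environment)
   cat > /valhalla/projects/<project_name>/env-venv-1.sh << 'EOF'
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1
   export PATH=${VIRTUAL_ENV}/bin:${PATH}
   export LD_LIBRARY_PATH=${VIRTUAL_ENV}/lib:${LD_LIBRARY_PATH}
   EOF

   # Source it in interactive shells or SLURM batch scripts
   source /valhalla/projects/<project_name>/env-venv-1.sh
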

Step 2: Run your Python script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Execute your Python script normally:

.. code-block:: bash

   python script.py

All packages installed in the conda environment will be available to your
script.

SLURM batch script example
--------------------------

Here is how to set up the environment in a SLURM batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=my_job
   #SBATCH --output=job_%j.out
   #SBATCH --error=job_%j.err
   #SBATCH --time=01:00:00
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=4
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=8G

   # Load Anaconda module
   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   # Set up conda environment
   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Run your Python script
   python script.py

Implementation details
----------------------

1. **No** ``conda activate`` **or** ``conda init``: this method uses explicit
   environment variable exports instead of ``conda activate`` or ``conda
   init``. This avoids interference with other virtual environments, prevents
   pollution of ``.bashrc``, and provides clearer control over the environment
   setup.
2. **Custom paths**: environments are created in project directories using
   ``--prefix``.
3. **Persistent setup**: environment variables must be set each time you want
   to use the environment.
4. **SLURM compatibility**: this approach works well in SLURM batch scripts.
5. **Login node restrictions**: all conda create, install, and update
   operations must be performed through SLURM batch jobs on compute nodes, not
   on login nodes.
6. **Disk space management**: ensure conda is configured to use ``/valhalla``
   for the package cache and environments, to avoid "No space left on device"
   errors.

Troubleshooting
---------------

Verify environment setup
~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

Lightweight, read-only verification commands can be run on login nodes after
setting the environment variables (these do not execute conda commands):

.. code-block:: bash

   # Set environment variables (read-only operation, safe on login node)
   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Verify the Python path (read-only, safe on login node)
   which python
   # Should show: /valhalla/projects/<project_name>/venv-1/bin/python

   python -c "import sys; print(sys.executable)"
   # Should show the path to your environment's Python

**Note:** Any verification that requires conda commands must be performed in a
SLURM batch job, not on the login node.
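
You can also confirm that the installed packages import from the environment's
interpreter. The package names below match the example installation earlier on
this page - adjust them to whatever you actually installed, and run anything
heavier than a simple import inside a SLURM job:

.. code-block:: bash

   # Quick import check against the environment's Python
   python -c "import numpy, scipy, sklearn, pandas; print('imports OK')"
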

Check the installed packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

   The ``conda list`` command MUST be run on compute nodes via SLURM batch
   jobs or ``srun`` interactive runs, NOT on the login node.

To check the installed packages, create and submit a SLURM batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=list_packages
   #SBATCH --output=list_packages_%j.out
   #SBATCH --error=list_packages_%j.err
   #SBATCH --time=00:05:00
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=1
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=2G

   module load anaconda3

   conda list --prefix /valhalla/projects/<project_name>/venv-1

Alternative: Interactive check using ``srun``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of submitting a batch job, you can check the installed packages
interactively on a compute node using ``srun``.

**On the login node (login.discoverer.bg):**

Run the following command to start an interactive session on a compute node:

.. code-block:: bash

   srun --job-name=list_packages \
        --time=00:05:00 \
        --ntasks-per-node=1 \
        --cpus-per-task=1 \
        --nodes=1 \
        --partition=cn \
        --account=your_project \
        --mem=2G \
        --pty bash

Once the interactive session starts, execute the command:

.. code-block:: bash

   module load anaconda3
   conda list --prefix /valhalla/projects/<project_name>/venv-1

**Note:** The ``--pty bash`` flag allocates a pseudo-terminal, allowing you to
interact with the session. When you're done, type ``exit`` to end the
interactive session.

Update packages
~~~~~~~~~~~~~~~

.. warning::

   Package updates MUST be performed on compute nodes via SLURM batch jobs or
   ``srun`` interactive runs, NOT on login nodes.

Update the packages in the environment using a SLURM batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=update_packages
   #SBATCH --output=update_packages_%j.out
   #SBATCH --error=update_packages_%j.err
   #SBATCH --time=01:00:00
   #SBATCH --ntasks-per-node=2
   #SBATCH --cpus-per-task=1
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=16G

   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   conda update --prefix ${VIRTUAL_ENV} --all -y

Alternative: Interactive update using ``srun``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of submitting a batch job, you can update packages interactively on a
compute node using ``srun``.

**On the login node (login.discoverer.bg):**

Run the following command to start an interactive session on a compute node:

.. code-block:: bash

   srun --job-name=update_packages \
        --time=01:00:00 \
        --ntasks-per-node=2 \
        --cpus-per-task=1 \
        --nodes=1 \
        --partition=cn \
        --account=your_project \
        --mem=16G \
        --pty bash

Once the interactive session starts, execute the update commands:

.. code-block:: bash

   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   conda update --prefix ${VIRTUAL_ENV} --all -y

**Note:** The ``--pty bash`` flag allocates a pseudo-terminal, allowing you to
interact with the session. When you're done, type ``exit`` to end the
interactive session.

Remove environment
~~~~~~~~~~~~~~~~~~

To remove an environment:

.. code-block:: bash

   rm -rf /valhalla/projects/<project_name>/venv-1

Environment management guidelines
---------------------------------

1. **Use project directories**: store environments in your project directory
   (e.g., ``/valhalla/projects/<project_name>/``)
2. **Name environments clearly**: use descriptive names like ``venv-1``,
   ``venv-numba``, ``venv-scikit``
3. **Document dependencies**: keep track of which packages you install, for
   reproducibility (see the export sketch after this list)
4. **Version control**: consider documenting your environment setup in your
   project documentation
5. **Test in SLURM**: always test your environment setup in a SLURM batch
   script before running large jobs
6. **Configure conda properly**: ensure conda is configured to use
   ``/valhalla`` for the package cache and environments, to avoid disk space
   issues
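
One simple way to document the installed packages is to export the
environment's package list to files kept alongside your project. A minimal
sketch - run it on a compute node like any other conda operation; the output
filenames are just examples:

.. code-block:: bash

   module load anaconda3

   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Human-readable specification of the environment
   conda env export --prefix ${VIRTUAL_ENV} > ${VIRTUAL_ENV}-environment.yml

   # Exact package list, useful for recreating the environment later
   conda list --explicit --prefix ${VIRTUAL_ENV} > ${VIRTUAL_ENV}-spec.txt
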

Additional Notes
----------------

1. **Cleanup**: The temp directory on ``/valhalla`` will accumulate files over
   time. You can clean it periodically:

   .. code-block:: bash

      rm -rf /valhalla/projects/<project_name>/tmp/*

2. **Disk space monitoring**: To monitor which filesystem conda is trying to
   use, you can check conda's verbose output:

   .. code-block:: bash

      conda create --prefix /valhalla/projects/<project_name>/venv/llvmlite python=3.12 -y -v

3. **Alternative**: If you cannot modify ``TMPDIR``, you can also use
   ``mktemp`` to create a temporary directory on ``/valhalla``:

   .. code-block:: bash

      TMPDIR=$(mktemp -d -p /valhalla/projects/<project_name>) conda create --prefix /valhalla/projects/<project_name>/venv/llvmlite python=3.12 -y
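
4. **Package cache cleanup**: If the conda package cache under ``/valhalla``
   grows large over time, conda's own cleanup command can reclaim the space
   taken by cached tarballs and unused packages. This is an optional extra
   step, not required by the setup above; run it on a compute node like any
   other conda operation:

   .. code-block:: bash

      module load anaconda3

      # Remove cached package tarballs and unused packages from pkgs_dirs
      conda clean --all --yes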