Conda (CPU)
===========

.. toctree::
   :maxdepth: 1
   :caption: Contents:

.. contents:: Table of Contents
   :depth: 3

.. role:: underline
   :class: underline

.. warning::

   ALL conda operations MUST be performed on the compute nodes via SLURM batch
   jobs or ``srun`` interactive runs. NEVER run ``conda create``,
   ``conda install``, ``conda update``, or any other conda commands directly on
   login nodes. Login nodes are shared resources. Running conda-related
   I/O-intensive operations on login nodes:

   - Violates resource accounting policies
   - Degrades performance for all users
   - Can result in account restrictions or termination

Overview
--------

This documentation describes how to install packages using conda in the
project environment, in a way that satisfies the version and dependency
requirements of the provisioned project.

Conda environments on Discoverer are created using the ``--prefix`` option to
specify a custom location (typically in your project directory). This approach
avoids using ``conda activate`` and instead relies on environment variable
configuration to ensure proper version and dependency management.

.. important::

   ALL conda operations MUST be performed on the compute nodes via SLURM batch
   jobs or ``srun`` interactive runs.

.. warning::

   Do NOT use ``conda init`` or ``conda activate``.

   Avoid using ``conda init`` and ``conda activate`` for the following reasons:

   - Interference with other virtual environments: ``conda activate`` can
     interfere with other virtual environment managers (e.g., ``venv``,
     ``virtualenv``, ``pipenv``) and cause conflicts
   - Pollution of ``.bashrc``: ``conda init`` modifies your ``.bashrc`` file,
     which can cause issues in shared environments and HPC systems
   - Less explicit control: the activation mechanism is less transparent than
     explicit environment variable exports
   - Potential conflicts: automatic conda initialization can interfere with
     system modules and other environment setups

Instead, use explicit environment variable exports as shown in this
documentation:

.. code-block:: bash

   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

This approach is:

- Clear and explicit: you can see exactly which environment is being used
- Non-intrusive: does not modify system configuration files
- Compatible: works well with other virtual environment tools
- HPC-friendly: ideal for shared computing environments

.. tip::

   Important: Do NOT install Miniconda or Anaconda in your home directory.

   Installing Miniconda or Anaconda in your home folder consumes significant
   disk space unnecessarily. The system provides conda through the
   ``anaconda3`` environment module, which is already available and can be
   used to install any package that is installable with conda.

   Always use the system-provided conda by loading the module:

   .. code-block:: bash

      module load anaconda3

   This module is automatically loaded in all SLURM batch scripts shown in
   this documentation. There is no need to install your own conda
   distribution.

Conda Configuration: Fixing "No space left on device" Error
--------------------------------------------------------------

.. warning::

   **Important:** Before creating conda environments, ensure your conda
   configuration uses directories with sufficient disk space. By default,
   conda may try to use ``/tmp``, which has limited space (typically 320M and
   95% full), causing "No space left on device" errors even when your target
   directory has plenty of space.

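
If you just want a quick, read-only look at where conda currently points
before working through the diagnosis below, ``conda config --show`` can print
the two settings involved:

.. code-block:: bash

   module load anaconda3

   # Print where conda will cache packages and create named environments
   conda config --show pkgs_dirs
   conda config --show envs_dirs
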

Problem
~~~~~~~

Conda fails with ``NoSpaceLeftError: No space left on devices`` even when the
target directory has plenty of space. This happens because conda's
configuration file (``~/.condarc``) has ``pkgs_dirs`` and ``envs_dirs``
pointing to ``/tmp``, which is on a different filesystem with limited space.

Diagnosis
~~~~~~~~~

Check filesystem space usage:

.. code-block:: bash

   # Check space on /tmp and your target directory
   df -h /tmp /valhalla/projects/<project_name>/

   # Check which filesystem /tmp is on
   df -h /tmp

**Expected output shows:**

- ``/tmp`` is on ``/dev/mapper/live-rw`` with only 320M free (95% full)
- ``/valhalla`` has 50T available

Check conda's configuration:

.. code-block:: bash

   module load anaconda3
   conda info

Look for:

- ``package cache : /tmp/...`` - this is where conda stores downloaded packages
- ``envs directories : /tmp/...`` - this is where conda stores environments

Check your conda config file:

.. code-block:: bash

   module load anaconda3
   cat ~/.condarc

**The problem:** Your ``~/.condarc`` likely contains:

.. code-block:: yaml

   pkgs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.pkgs
   envs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.envs

These directories are on ``/tmp``, which is 95% full. Conda needs space in
``pkgs_dirs`` to:

- Download packages
- Extract packages
- Build packages from source
- Cache package metadata

These operations can require significant space (often several GB for Python
3.12 with dependencies).

Solution
~~~~~~~~

**Option 1: Update conda configuration (Recommended - Permanent Fix)**

Modify your ``~/.condarc`` file to use ``/valhalla`` for the package cache and
environments:

.. code-block:: bash

   module load anaconda3

   # Create directories on /valhalla
   mkdir -p /valhalla/projects/<project_name>/conda/pkgs
   mkdir -p /valhalla/projects/<project_name>/conda/envs

   # Update conda configuration (use --add for list parameters)
   conda config --add pkgs_dirs /valhalla/projects/<project_name>/conda/pkgs
   conda config --add envs_dirs /valhalla/projects/<project_name>/conda/envs

   # Verify the change
   conda info | grep -E "(package cache|envs directories)"

**Note:** Use ``--add`` (not ``--set``) because ``pkgs_dirs`` and
``envs_dirs`` are list parameters. The ``--add`` command adds the directory to
the beginning of the list, and conda uses the first writable location.

**Option 2: Edit ~/.condarc manually**

Alternatively, you can edit ``~/.condarc`` directly:

.. code-block:: bash

   # Back up your current config
   cp ~/.condarc ~/.condarc.backup

   # Edit the file
   nano ~/.condarc   # or use your preferred editor

Change:

.. code-block:: yaml

   pkgs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.pkgs
   envs_dirs:
     - /tmp/moose.ntCRtFtw/_env/.envs

To:

.. code-block:: yaml

   pkgs_dirs:
     - /valhalla/projects/<project_name>/conda/pkgs
   envs_dirs:
     - /valhalla/projects/<project_name>/conda/envs

**Option 3: Use an environment variable for a single command**

For a one-time fix without changing your config:

.. code-block:: bash

   module load anaconda3

   # Create the package cache directory on /valhalla
   mkdir -p /valhalla/projects/<project_name>/conda/pkgs

   # Override pkgs_dirs for this command only
   CONDA_PKGS_DIRS=/valhalla/projects/<project_name>/conda/pkgs conda create --prefix /valhalla/projects/<project_name>/venv/llvmlite python=3.12 -y

**Option 4: Set TMPDIR (for additional temp operations)**

Even after fixing ``pkgs_dirs``, you may also want to set ``TMPDIR`` for other
temporary operations:

.. code-block:: bash

   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

Verification
~~~~~~~~~~~~

After updating the conda configuration, verify that it uses the correct
directories:

.. code-block:: bash

   module load anaconda3
   conda info | grep -E "(package cache|envs directories)"

You should see:

.. code-block:: text

   package cache : /valhalla/projects/<project_name>/conda/pkgs
   envs directories : /valhalla/projects/<project_name>/conda/envs
                      /opt/software/anaconda3/envs
                      /home/<username>/.conda/envs

Also verify your config file:

.. code-block:: bash

   cat ~/.condarc | grep -A 2 -E "(pkgs_dirs|envs_dirs)"

Initial setup
-------------

.. warning::

   ALL conda operations MUST be performed on the compute nodes via SLURM batch
   jobs or ``srun`` interactive runs.

The initial setup process consists of creating a SLURM batch script file (on
the login node, using a text editor) and then submitting it to run on compute
nodes. All conda operations execute on compute nodes.

Step 1: Create the SLURM installation script file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

Create a text file named ``install_conda_env.sh`` using your preferred text
editor (``nano``, ``vim``, ``emacs``, etc.). This is the ONLY step that
involves working on the login node - you are just creating a text file.

The file should contain the following SLURM batch script (change the package
list to the one that fits your goals):

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=install_conda_env
   #SBATCH --output=install_env_%j.out
   #SBATCH --error=install_env_%j.err
   #SBATCH --time=02:00:00
   #SBATCH --ntasks-per-node=2
   #SBATCH --cpus-per-task=1
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=16G

   # Load Anaconda module
   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   # Set virtual environment path (replace <project_name> with your project name)
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Create conda environment
   echo "Creating conda environment at ${VIRTUAL_ENV}"
   conda create --prefix ${VIRTUAL_ENV} python=3.12.2 -y

   # Set up environment variables
   export PATH=${VIRTUAL_ENV}/bin:${PATH}
   export LD_LIBRARY_PATH=${VIRTUAL_ENV}/lib:${LD_LIBRARY_PATH}

   # Install packages from conda-forge
   echo "Installing packages from conda-forge"
   conda install --prefix ${VIRTUAL_ENV} -c conda-forge scikit-learn scipy numpy matplotlib mdanalysis pandas -y

   # Verify installation
   echo "Verifying installation..."
   which python
   python --version
   conda list --prefix ${VIRTUAL_ENV}

   echo "Installation complete!"

**Configuration parameters:**

- Replace ``<project_name>`` with your actual project name
- Adjust ``--time`` based on the number of packages (installation can take 30
  minutes to 2 hours)
- Increase ``--mem`` if installing large packages (16G is usually sufficient)
- Modify the package list as needed (see the version-pinning example after
  this list)
- The ``-y`` flag automatically confirms the installation
- Specify the Python version that meets your requirements
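
If your project requires particular package versions, conda accepts version
specifiers directly in the install command. A minimal sketch - the packages
and version numbers below are purely illustrative, so substitute the ones your
project actually needs:

.. code-block:: bash

   # Pin versions with conda match specifications; quote them so the shell
   # does not interpret the operators
   conda install --prefix ${VIRTUAL_ENV} -c conda-forge "numpy>=1.26" "scipy=1.13" "pandas=2.2.*" -y
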

Alternative: Interactive installation using ``srun``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of submitting a batch job, you can run the installation interactively
on a compute node using ``srun``. This is useful if you want to see the output
in real time or interact with the installation process.

**On the login node (login.discoverer.bg):**

Run the following command to start an interactive session on a compute node:

.. code-block:: bash

   srun --job-name=install_conda_env \
        --time=02:00:00 \
        --ntasks-per-node=2 \
        --cpus-per-task=1 \
        --nodes=1 \
        --partition=cn \
        --account=your_project \
        --mem=16G \
        --pty bash

Once the interactive session starts, execute the installation commands:

.. code-block:: bash

   # Load Anaconda module
   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   # Set virtual environment path (replace <project_name> with your project name)
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Create conda environment
   echo "Creating conda environment at ${VIRTUAL_ENV}"
   conda create --prefix ${VIRTUAL_ENV} python=3.12.2 -y

   # Set up environment variables
   export PATH=${VIRTUAL_ENV}/bin:${PATH}
   export LD_LIBRARY_PATH=${VIRTUAL_ENV}/lib:${LD_LIBRARY_PATH}

   # Install packages from conda-forge
   echo "Installing packages from conda-forge"
   conda install --prefix ${VIRTUAL_ENV} -c conda-forge scikit-learn scipy numpy matplotlib mdanalysis pandas -y

   # Verify installation
   echo "Verifying installation..."
   which python
   python --version
   conda list --prefix ${VIRTUAL_ENV}

   echo "Installation complete!"

**Note:** The ``--pty bash`` flag allocates a pseudo-terminal, allowing you to
interact with the session. When you're done, type ``exit`` to end the
interactive session.

Step 2: Submit the SLURM job
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

Submit the batch script to the SLURM scheduler. The job will execute on
compute nodes, where all conda operations will run:

.. code-block:: bash

   sbatch install_conda_env.sh

After submission, SLURM will display a job ID (e.g., ``Submitted batch job
12345``). This job ID is used in the output and error filenames.

**On the login node:**

Verify that your job is queued (this is a read-only operation, safe on the
login node):

.. code-block:: bash

   squeue --me

This command shows all jobs submitted by your user account. You can monitor
the job status and wait for it to complete.

Step 3: Check the results
~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

After the job completes, check the output and error files:

.. code-block:: bash

   # View the output file (replace 12345 with your job ID)
   cat install_env_12345.out

   # Check for errors
   cat install_env_12345.err

If the installation was successful, you should see messages indicating that
packages were installed and the environment was created.

When to create a new environment vs. reuse existing
------------------------------------------------------

**Scenario 1: Adding compatible packages**

- **Action**: Reuse the existing environment
- **Reason**: New packages (e.g., ``pandas``, ``matplotlib``) are compatible
  with the existing packages

**Scenario 2: Need Python 3.11 instead of 3.12**

- **Action**: Create a new environment (``venv-python311``)
- **Reason**: The Python version requirement differs

**Scenario 3: Package conflict detected**

- **Action**: Create a new environment for the conflicting package set
- **Reason**: Conda reports dependency conflicts that cannot be resolved

**Scenario 4: Starting a new project**

- **Action**: Evaluate whether an existing environment meets the requirements

  - If yes: reuse it
  - If no, or there are conflicts: create a new environment with a
    project-specific name

Using the virtual environment
-----------------------------

Step 1: Set the environment variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before running your Python scripts, configure the environment variables:

.. code-block:: bash

   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1
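
Because these exports have to be repeated in every new shell and batch script,
it can be convenient to keep them in a small helper script inside your project
directory and source it when needed. A minimal sketch, assuming the
environment lives at the path used above (the filename ``env-venv-1.sh`` is
just an example):

.. code-block:: bash

   # Create the helper script once (adjust the paths to your environment)
   cat > /valhalla/projects/<project_name>/env-venv-1.sh << 'EOF'
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1
   export PATH=${VIRTUAL_ENV}/bin:${PATH}
   export LD_LIBRARY_PATH=${VIRTUAL_ENV}/lib:${LD_LIBRARY_PATH}
   EOF

   # Source it in interactive shells or SLURM batch scripts
   source /valhalla/projects/<project_name>/env-venv-1.sh
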

Step 2: Run your Python script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Execute your Python script normally:

.. code-block:: bash

   python script.py

All packages installed in the conda environment will be available to your
script.

SLURM batch script example
--------------------------

Here is how to set up the environment in a SLURM batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=my_job
   #SBATCH --output=job_%j.out
   #SBATCH --error=job_%j.err
   #SBATCH --time=01:00:00
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=4
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=8G

   # Load Anaconda module
   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   # Set up conda environment
   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Run your Python script
   python script.py

Implementation details
----------------------

1. **No** ``conda activate`` **or** ``conda init``: this method uses explicit
   environment variable exports instead of ``conda activate`` or ``conda
   init``. This avoids interference with other virtual environments, prevents
   pollution of ``.bashrc``, and provides clearer control over the environment
   setup.
2. **Custom paths**: environments are created in project directories using
   ``--prefix``.
3. **Persistent setup**: environment variables must be set each time you want
   to use the environment.
4. **SLURM compatibility**: this approach works well in SLURM batch scripts.
5. **Login node restrictions**: all conda create, install, and update
   operations must be performed through SLURM batch jobs on compute nodes, not
   on login nodes.
6. **Disk space management**: ensure conda is configured to use ``/valhalla``
   for the package cache and environments, to avoid "No space left on device"
   errors.

Troubleshooting
---------------

Verify environment setup
~~~~~~~~~~~~~~~~~~~~~~~~

**On the login node:**

Lightweight, read-only verification commands can be run on login nodes after
setting the environment variables (these do not execute conda commands):

.. code-block:: bash

   # Set environment variables (read-only operation, safe on login node)
   export PATH=/valhalla/projects/<project_name>/venv-1/bin:${PATH}
   export LD_LIBRARY_PATH=/valhalla/projects/<project_name>/venv-1/lib:${LD_LIBRARY_PATH}
   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Verify the Python path (read-only, safe on login node)
   which python
   # Should show: /valhalla/projects/<project_name>/venv-1/bin/python

   python -c "import sys; print(sys.executable)"
   # Should show the path to your environment's Python

**Note:** Any verification that requires conda commands must be performed in a
SLURM batch job, not on the login node.
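
You can also confirm that the installed packages import from the environment's
interpreter. The package names below match the example installation earlier on
this page - adjust them to whatever you actually installed, and run anything
heavier than a simple import inside a SLURM job:

.. code-block:: bash

   # Quick import check against the environment's Python
   python -c "import numpy, scipy, sklearn, pandas; print('imports OK')"
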

Check the installed packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

   The ``conda list`` command MUST be run on compute nodes via SLURM batch
   jobs or ``srun`` interactive runs, NOT on the login node.

To check the installed packages, create and submit a SLURM batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=list_packages
   #SBATCH --output=list_packages_%j.out
   #SBATCH --error=list_packages_%j.err
   #SBATCH --time=00:05:00
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=1
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=2G

   module load anaconda3

   conda list --prefix /valhalla/projects/<project_name>/venv-1

Alternative: Interactive check using ``srun``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of submitting a batch job, you can check the installed packages
interactively on a compute node using ``srun``.

**On the login node (login.discoverer.bg):**

Run the following command to start an interactive session on a compute node:

.. code-block:: bash

   srun --job-name=list_packages \
        --time=00:05:00 \
        --ntasks-per-node=1 \
        --cpus-per-task=1 \
        --nodes=1 \
        --partition=cn \
        --account=your_project \
        --mem=2G \
        --pty bash

Once the interactive session starts, execute the command:

.. code-block:: bash

   module load anaconda3
   conda list --prefix /valhalla/projects/<project_name>/venv-1

**Note:** The ``--pty bash`` flag allocates a pseudo-terminal, allowing you to
interact with the session. When you're done, type ``exit`` to end the
interactive session.

Update packages
~~~~~~~~~~~~~~~

.. warning::

   Package updates MUST be performed on compute nodes via SLURM batch jobs or
   ``srun`` interactive runs, NOT on login nodes.

Update the packages in the environment using a SLURM batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=update_packages
   #SBATCH --output=update_packages_%j.out
   #SBATCH --error=update_packages_%j.err
   #SBATCH --time=01:00:00
   #SBATCH --ntasks-per-node=2
   #SBATCH --cpus-per-task=1
   #SBATCH --nodes=1
   #SBATCH --partition=cn
   #SBATCH --account=your_project
   #SBATCH --mem=16G

   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   conda update --prefix ${VIRTUAL_ENV} --all -y

Alternative: Interactive update using ``srun``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of submitting a batch job, you can update packages interactively on a
compute node using ``srun``.

**On the login node (login.discoverer.bg):**

Run the following command to start an interactive session on a compute node:

.. code-block:: bash

   srun --job-name=update_packages \
        --time=01:00:00 \
        --ntasks-per-node=2 \
        --cpus-per-task=1 \
        --nodes=1 \
        --partition=cn \
        --account=your_project \
        --mem=16G \
        --pty bash

Once the interactive session starts, execute the update commands:

.. code-block:: bash

   module load anaconda3

   # Set TMPDIR to project directory to avoid using /tmp
   export TMPDIR=/valhalla/projects/<project_name>/tmp
   mkdir -p $TMPDIR

   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   conda update --prefix ${VIRTUAL_ENV} --all -y

**Note:** The ``--pty bash`` flag allocates a pseudo-terminal, allowing you to
interact with the session. When you're done, type ``exit`` to end the
interactive session.

Remove environment
~~~~~~~~~~~~~~~~~~

To remove an environment:

.. code-block:: bash

   rm -rf /valhalla/projects/<project_name>/venv-1

Environment management guidelines
---------------------------------

1. **Use project directories**: store environments in your project directory
   (e.g., ``/valhalla/projects/<project_name>/``)
2. **Name environments clearly**: use descriptive names like ``venv-1``,
   ``venv-numba``, ``venv-scikit``
3. **Document dependencies**: keep track of which packages you install, for
   reproducibility (see the export sketch after this list)
4. **Version control**: consider documenting your environment setup in your
   project documentation
5. **Test in SLURM**: always test your environment setup in a SLURM batch
   script before running large jobs
6. **Configure conda properly**: ensure conda is configured to use
   ``/valhalla`` for the package cache and environments, to avoid disk space
   issues
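
One simple way to document the installed packages is to export the
environment's package list to files kept alongside your project. A minimal
sketch - run it on a compute node like any other conda operation; the output
filenames are just examples:

.. code-block:: bash

   module load anaconda3

   export VIRTUAL_ENV=/valhalla/projects/<project_name>/venv-1

   # Human-readable specification of the environment
   conda env export --prefix ${VIRTUAL_ENV} > ${VIRTUAL_ENV}-environment.yml

   # Exact package list, useful for recreating the environment later
   conda list --explicit --prefix ${VIRTUAL_ENV} > ${VIRTUAL_ENV}-spec.txt
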

Additional Notes
----------------

1. **Cleanup**: The temp directory on ``/valhalla`` will accumulate files over
   time. You can clean it periodically:

   .. code-block:: bash

      rm -rf /valhalla/projects/<project_name>/tmp/*

2. **Disk space monitoring**: To monitor which filesystem conda is trying to
   use, you can check conda's verbose output:

   .. code-block:: bash

      conda create --prefix /valhalla/projects/<project_name>/venv/llvmlite python=3.12 -y -v

3. **Alternative**: If you cannot modify ``TMPDIR``, you can also use
   ``mktemp`` to create a temporary directory on ``/valhalla``:

   .. code-block:: bash

      TMPDIR=$(mktemp -d -p /valhalla/projects/<project_name>) conda create --prefix /valhalla/projects/<project_name>/venv/llvmlite python=3.12 -y
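
4. **Package cache cleanup**: If the conda package cache under ``/valhalla``
   grows large over time, conda's own cleanup command can reclaim the space
   taken by cached tarballs and unused packages. This is an optional extra
   step, not required by the setup above; run it on a compute node like any
   other conda operation:

   .. code-block:: bash

      module load anaconda3

      # Remove cached package tarballs and unused packages from pkgs_dirs
      conda clean --all --yes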