Python virtual environments (GPU)

About

This document explains why Python virtual environments are the preferred method to install and use software packages on a per-user or per-project basis on the Discoverer+ GPU cluster.

Important

Understanding this approach is essential for running your GPU-accelerated computing tasks effectively on Discoverer+.

Although we provide a set of packages delivered through environment modules, those modules mainly help to boost productivity at the CPU level, whenever that is critical. The most important packages, however, whose performance depends on GPU acceleration through the CUDA libraries, need to be installed in separate Python virtual environments.

Benefit of Python virtual environments

Note

Python virtual environments address situations where different tasks, or classes of tasks, require different packages or different versions of the same packages. When multiple tasks with unique dependency requirements are handled within the same shared global Python installation, package management quickly becomes problematic. In that case, creating and using a separate Python virtual environment for each topic is usually the best approach.

By hosting separate Python virtual environments, we ensure that conflicts between installed packages cannot occur. This isolation prevents version incompatibilities that could break existing projects when new packages are installed or updated. It is even possible to create several virtual environments for the same topic, each containing different versions of the same packages.

To summarise, each Python virtual environment maintains its own independent set of Python packages, allowing users to:

  • Install packages without affecting other projects or users
  • Use specific package versions required by each project
  • Maintain reproducibility across different tasks and timeframes
  • Avoid dependency conflicts that can cause runtime errors
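
For illustration, below is a minimal sketch of how two isolated environments might be created with Conda on Discoverer+. The environment names, Python versions, and package versions are purely illustrative, not a prescribed setup:

module load anaconda3

# Environment for one project, pinned to an older NumPy release
conda create --name project_a python=3.10 numpy=1.24

# Environment for another project, using a newer NumPy release
conda create --name project_b python=3.11 numpy=1.26

# Activate whichever environment the current task requires
conda activate project_a

Packages installed into project_a remain invisible to project_b, and vice versa, so the two projects cannot break each other's dependencies.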

Utilising Conda on Discoverer+ for managing Python virtual environments

Important

On Discoverer+, the Conda tool and the basic channel of locally installable packages come with the centralised Anaconda installation, which becomes accessible after loading the corresponding environment module:

module load anaconda3
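
Once the module is loaded, the conda command becomes available in the current shell session. As a quick sanity check (the exact output depends on the installed Anaconda release):

conda --version
conda info --envs

The first command prints the Conda version; the second lists the environments already known to Conda, including those created under your own account.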

Warning

We urge users not to install Anaconda or Miniconda themselves in their home or project folders on Discoverer+, because that leads to overutilisation of the storage space. In practice, users invoke Conda rather seldom: once or twice per week on average, or sometimes only a few times during the entire project lifecycle. From that perspective, installing separate Anaconda or Miniconda distributions in the home or project folders is highly inefficient.

To understand how to create and manage Python virtual environments with Conda on Discoverer+, you can browse some of the most important use cases:

Important

The productivity of the CUDA-linked packages included in the Conda channels (and later installed in the corresponding Python virtual environments) depends mainly on the quality and optimisation of the CUDA libraries provided by NVIDIA, rather than on CPU-level performance. These CUDA libraries form the foundation for GPU-accelerated computations and are the primary determinant of performance in this environment.

Therefore, even if the packages in the Conda channels we use do not deliver the highest possible productivity at the CPU level, we can live with that downside. The workloads we expect to process rely mainly on CUDA to accelerate code on the GPU, rather than on running massive CPU-bound tasks on the host.

To summarise, the performance characteristics of packages at the CPU level are less critical in this context, as:

  • The primary computation workload is offloaded to GPU accelerators
  • NVIDIA H200 GPUs provide the computational power for intensive tasks
  • The host CPU primarily manages data transfer, job scheduling, and coordination
  • Package productivity is measured by GPU acceleration capabilities, not CPU performance

This approach to productivity allows us to prioritise compatibility and stability of CUDA-linked packages over maximum CPU-level performance, ensuring that the packages installed in the Python virtual environments support our GPU-focused computing paradigm effectively.
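
As an illustration only (the package names, channels, and versions below are assumptions, not a recommendation specific to Discoverer+), installing a CUDA-linked package such as PyTorch into a dedicated environment and checking that it sees the GPUs could look like this:

module load anaconda3
conda create --name torch-gpu python=3.11
conda activate torch-gpu

# Illustrative install of a CUDA-enabled PyTorch build from the pytorch and nvidia channels
conda install pytorch pytorch-cuda=12.4 -c pytorch -c nvidia

# Quick check that the CUDA-linked package actually sees the GPUs
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

Note that the verification step would normally be executed on a GPU node (for instance through the batch system), since login nodes may not expose the GPUs.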