PyTorch
=======

.. toctree::
   :maxdepth: 1
   :caption: Contents:

.. important::

   Discoverer HPC provides public access to the version of `PyTorch`_
   included in `Intel Distribution for Python`_. See :doc:`python` for
   more details about the benefits of running `Intel Distribution for
   Python`_.

.. warning::

   No GPU accelerators are currently available on Discoverer HPC.
   Torchvision cannot be run without a working CUDA DNN.

Versions supported
------------------

The version of PyTorch supported on Discoverer HPC is the one shipped
as part of the Intel Distribution for Python, publicly available in the
public software repository. Running that version does not require
setting up a virtual environment with Conda or pip.

.. note::

   PyTorch comes with the Torchvision module installed.

Running PyTorch
---------------

.. warning::

   Do not run PyTorch directly on the login node. Always run it as a
   Slurm batch job.

To load the PyTorch environment, load the module
``intel.universe.pytorch`` from within your Slurm batch script:

.. code-block:: bash

   module load intel.universe.pytorch

Once loaded, that module provides access to the correct Python
interpreter and the PyTorch 2 module. In case you need to combine
PyTorch 2 with specific modules that are not included by default in
the distribution, you can create a virtual environment based on the
same Python interpreter.

Checking the version
....................

The easiest way to check the versions of PyTorch and Torchvision
available in the software repository is to execute the following Slurm
batch script:
.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --partition=cn        # Partition name (ask the support team to clarify it)
   #SBATCH --job-name=torch_version
   #SBATCH --time=00:01:00       # WallTime - one minute is more than enough here
   #SBATCH --nodes 1             # May vary
   #SBATCH --ntasks-per-node 1   # Must be 1
   #SBATCH --cpus-per-task 1     # Must be 1
   #SBATCH -o slurm.check_pytorch_version.out   # STDOUT
   #SBATCH -e slurm.check_pytorch_version.err   # STDERR

   module purge
   module load intel.universe.pytorch

   cd $SLURM_SUBMIT_DIR

   python -c "import torch; print('Torch:', torch.__version__)"
   python -c "import torchvision; print('Torchvision:', torchvision.__version__)"

To achieve that, store the script content into a file, for example
``/discofs/${USER}/check_pytorch_version.sbatch``, and submit it as a
job to the queue:

.. code-block:: bash

   cd /discofs/${USER}
   sbatch check_pytorch_version.sbatch

Then check the content of the file ``slurm.check_pytorch_version.out``
to find out which versions of PyTorch and Torchvision are reported
there.

Thread control
..............

The version of PyTorch that comes with Intel Distribution for Python
adopts the `TBB`_ threading model. To better understand how thread
control can be imposed on PyTorch, read the following document, keeping
in mind that the PyTorch installed on Discoverer HPC is built against
MKL-DNN:

`CPU threading and TorchScript inference`_

Slurm batch script (example)
............................

Given below is an example of a Slurm batch script that runs Python code
invoking PyTorch:

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --partition=cn        # Partition name (ask the support team to clarify it)
   #SBATCH --job-name=torch_run
   #SBATCH --time=512:00:00      # WallTime - set it accordingly
   #SBATCH --nodes 1             # May vary
   #SBATCH --ntasks-per-node 1   # Must be 1 for non-MPI processes
   #SBATCH --cpus-per-task 16    # See "Thread control" above to understand what number
                                 # to supply here instead of 16 (16 is an example).
                                 # You may run a series of benchmarks varying that
                                 # number until you reach an optimal speed.
   #SBATCH -o slurm.%j.out      # STDOUT
   #SBATCH -e slurm.%j.err      # STDERR

   module purge
   module load intel.universe.pytorch

   cd $SLURM_SUBMIT_DIR

   python my_torch_based_code.py

where ``my_torch_based_code.py`` is your PyTorch-based Python code.

Specify the parameters and resources required for successfully running
and completing the job:

- Slurm partition of compute nodes, based on your project resource
  reservation (``--partition``)
- job name, under which the job will be seen in the queue
  (``--job-name``)
- wall time for running the job (``--time``)
- number of threads to use (``--cpus-per-task``) - see "Thread control"
  above

Save the complete Slurm job description as a file, for example
``/discofs/$USER/run_torch/torch.batch``, and submit it to the queue:

.. code-block:: bash

   cd /discofs/$USER/run_torch
   sbatch torch.batch

Follow the information stored by the running job in ``slurm.%j.out``
and ``slurm.%j.err``, where ``%j`` stands for the actual ID number
assigned to the job in the queue (you will get that number upon
submission).

Getting help
------------

See :doc:`help`

.. _`Intel Distribution for Python`: https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html
.. _`PyTorch`: https://pytorch.org/
.. _`CPU threading and TorchScript inference`: https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html
.. _`TBB`: https://www.intel.com/content/www/us/en/develop/documentation/onetbb-documentation/top.html
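As a hedged complement to the "Thread control" notes above, the lines
below sketch one way a batch script might pin the thread count to the
Slurm allocation before launching Python. The fallback value of ``1``
and the echoed message are illustrative assumptions, not site policy;
whether ``OMP_NUM_THREADS`` is honoured depends on the threading
backend the build uses, while ``torch.set_num_threads()`` is the
portable in-Python control:

.. code-block:: bash

   # Sketch: place after 'module load intel.universe.pytorch' in the batch script.
   # Match the intra-op thread pool to the CPUs Slurm granted (fallback: 1).
   export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
   export MKL_NUM_THREADS="${OMP_NUM_THREADS}"   # keep MKL in sync with OpenMP
   echo "Running with ${OMP_NUM_THREADS} thread(s)"

   # The same limit can also be set from inside the Python code:
   #   import os, torch
   #   torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))

Benchmark a few values of ``--cpus-per-task`` (as suggested in the
example script) rather than assuming that more threads are always
faster.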