Hugging Face Transformers (GPU)¶
About¶
This document shows how to install and use Hugging Face Transformers in a Python virtual environment on the Discoverer+ GPU cluster. Note that the method used does not lock the shell environment into the virtual environment.
The guide covers the complete workflow, from installing the library into an existing conda environment to running Hugging Face Transformers jobs, so that users can overcome common Slurm configuration challenges and successfully utilize the GPU resources available on the cluster.
Prerequisites¶
Important
Hugging Face Transformers has a hard dependency on PyTorch (or TensorFlow). This guide assumes you have already installed PyTorch with CUDA support following the PyTorch installation guide. The Hugging Face Transformers library will be installed in the same conda environment as PyTorch.
Hugging Face Transformers is essentially a high-level wrapper around PyTorch that provides easy access to pre-trained transformer models. All GPU operations, model loading, and inference are handled through PyTorch’s CUDA interface.
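To illustrate this relationship, here is a minimal sketch (the model name bert-base-uncased is just an example, downloaded from the Hugging Face Hub on first use) showing that a loaded Transformers model is an ordinary torch.nn.Module and is placed on the GPU through PyTorch:
import torch
from transformers import AutoModel

# Download/load an example model from the Hugging Face Hub
model = AutoModel.from_pretrained("bert-base-uncased")

# A Transformers model is a regular PyTorch module ...
print(isinstance(model, torch.nn.Module))  # True

# ... so moving it to a GPU goes through PyTorch's CUDA interface
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)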
Use Conda to install Hugging Face Transformers with NVIDIA CUDA support on Discoverer+ GPU cluster¶
Note that we need to use a Python version that is appropriate for the latest stable PyTorch release. In our case, that is Python 3.11. While Python 3.13 and 3.14 are available, PyTorch does not yet fully support these newer versions, and we should not rely on bleeding-edge technology for running production jobs on HPC systems.
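If you want to confirm which Python version the existing PyTorch environment provides, you can ask its interpreter directly (the path below is the environment location assumed throughout this guide):
/valhalla/projects/your_slurm_project_account_name/pytorch_env/bin/python --version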
Here we use a Slurm interactive session bound to the project's Slurm account, but on a CPU-only basis. This way no GPU resources from the account will be spent. This is supported by the QoS named "2cpu-single-host".
Start an interactive Bash session on one of your compute nodes (which implies invoking the srun tool). The example below creates an interactive Bash session that will last 30 minutes:
srun -N 1 -n 2 --partition=common \
--account=your_slurm_project_account_name \
--qos 2cpu-single-host --time=00:30:00 --pty /bin/bash
Wait for the session to start. Only then follow the instructions given below.
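Once the prompt appears, a quick way to confirm that you are inside the allocation (and not on the login node) is to print the host name and the Slurm job ID:
hostname
echo $SLURM_JOB_ID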
We will use the running session to install Hugging Face Transformers in the existing PyTorch environment. That means all commands provided below are related to that same Bash interactive session. Do not execute those commands directly on the login node.
module load anaconda3
module load nvidia/cuda/12/12.8
conda install \
--prefix /valhalla/projects/your_slurm_project_account_name/pytorch_env/ \
conda-forge::transformers -c pytorch -c nvidia
Type "y" whenever Conda asks for confirmation to install the packages.
In case of success (no errors displayed), you will have Hugging Face Transformers installed in your existing Python 3.11 virtual environment with PyTorch and CUDA support. The environment is located in the following folder:
/valhalla/projects/your_slurm_project_account_name/pytorch_env/
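If you want to double-check what is present in that environment, you can list the relevant packages with a standard conda list invocation (adjust the prefix if your environment is located elsewhere):
module load anaconda3
conda list --prefix /valhalla/projects/your_slurm_project_account_name/pytorch_env/ | grep -iE 'torch|transformers'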
You can test the integrity of the installation in that same interactive Bash session (or another interactive session):
/valhalla/projects/your_slurm_project_account_name/pytorch_env/bin/python \
-c "import transformers; print('Transformers version:', transformers.__version__)"
/valhalla/projects/your_slurm_project_account_name/pytorch_env/bin/python \
-c "import torch; print('PyTorch version:', torch.__version__)"
/valhalla/projects/your_slurm_project_account_name/pytorch_env/bin/python \
-c "import torch; print('CUDA available:', torch.cuda.is_available())"
You should get results like these:
Transformers version: 4.44.0
PyTorch version: 2.5.1
CUDA available: True
Now you can press Ctrl-D to terminate the interactive Bash session controlled by Slurm. Alternatively, you may leave that session open, but Slurm will terminate it once it has run for more than 30 minutes.
Running Hugging Face Transformers on Discoverer+¶
Once the installation is performed successfully as explained above, Hugging Face Transformers can be utilized through a Slurm batch job, or run interactively by using srun. In this case, the Slurm job must use the default QoS assigned to the Slurm account, which here is the QoS named "your_slurm_project_account_name". Otherwise, Transformers will not be able to access the GPU devices on the compute nodes.
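Once inside such a GPU-enabled session or job, you can quickly verify that the requested devices were actually granted (assuming nvidia-smi is available on the GPU compute nodes):
echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"
nvidia-smi --query-gpu=index,name,memory.total --format=csv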
Running Hugging Face Transformers interactively¶
This is not a recommended way of running Hugging Face Transformers. Use this example for checks only!
For testing purposes, we need a Python helper script, which can be downloaded from:
https://gitlab.discoverer.bg/vkolev/snippets/-/raw/main/checks/huggingface_gpu_detection.py
To download the script:
cd /valhalla/projects/your_slurm_project_account_name/
wget https://gitlab.discoverer.bg/vkolev/snippets/-/raw/main/checks/huggingface_gpu_detection.py
In the example below we request the utilization of 2 GPUs (--gres=gpu:2):
srun -N 1 -n 2 --gres=gpu:2 \
--partition=common \
--account=your_slurm_project_account_name \
--qos your_slurm_project_account_name \
--time=00:30:00 --pty /bin/bash
Once the interactive session has started, we need to load the CUDA module, point the shell to the virtual environment, and run the test Python script that calls Hugging Face Transformers:
module load nvidia/cuda/12/12.8
export PATH="/valhalla/projects/your_slurm_project_account_name/pytorch_env/bin:$PATH"
export VIRTUAL_ENV="/valhalla/projects/your_slurm_project_account_name/pytorch_env"
python /valhalla/projects/your_slurm_project_account_name/huggingface_gpu_detection.py
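Before running heavier workloads, a simple one-liner (using the python interpreter from the environment exported above) can confirm that PyTorch sees the requested GPUs:
python -c "import torch; print('GPUs visible to PyTorch:', torch.cuda.device_count())"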
In case of successful execution, a result similar to the following will be displayed:
============================================================
Hugging Face Transformers GPU Detection Script
============================================================
Library Import Check
--------------------
✓ Transformers version: 4.57.1
✓ PyTorch version: 2.5.1
✓ Tokenizers version: 0.22.1
CUDA and GPU Information
------------------------
CUDA available: True
CUDA version: 12.1
cuDNN version: 90100
Number of GPUs: 1
GPU Details
-----------
GPU 0:
Name: NVIDIA H200
Memory Total: 139.83 GB
Memory Allocated: 0.00 GB
Memory Cached: 0.00 GB
Compute Capability: 9.0
Multiprocessors: 132
Warp Size: 32
Tokenizer Test
--------------
tokenizer_config.json: 100%|██████████████████████| 48.0/48.0 [00:00<00:00, 569kB/s]
config.json: 100%|██████████████████████| 570/570 [00:00<00:00, 8.45MB/s]
vocab.txt: 100%|██████████████████████| 232k/232k [00:00<00:00, 1.96MB/s]
tokenizer.json: 100%|██████████████████████| 466k/466k [00:00<00:00, 3.87MB/s]
Text: "Hello, world!"
Tokens: ['hello', ',', 'world', '!']
Token IDs length: 6
Text: "This is a longer sentence with multiple words."
Tokens: ['this', 'is', 'a', 'longer', 'sentence', 'with', 'multiple', 'words', '.']
Token IDs length: 11
Text: "Special characters: @#$%^&*()_+-=[]{}|;':",./<>?"
Tokens: ['special', 'characters', ':', '@', '#', '$', '%', '^', '&', '*']...
Token IDs length: 33
[SUCCESS] Tokenizer test completed successfully!
Model Loading Test
------------------
Testing bert-base-uncased...
model.safetensors: 100%|██████████████████████| 440M/440M [00:02<00:00, 211MB/s]
✓ bert-base-uncased loaded successfully
Testing distilbert-base-uncased...
tokenizer_config.json: 100%|██████████████████████| 48.0/48.0 [00:00<00:00, 597kB/s]
config.json: 100%|██████████████████████| 483/483 [00:00<00:00, 7.85MB/s]
vocab.txt: 100%|██████████████████████| 232k/232k [00:00<00:00, 991kB/s]
tokenizer.json: 100%|██████████████████████| 466k/466k [00:00<00:00, 1.99MB/s]
model.safetensors: 100%|██████████████████████| 268M/268M [00:01<00:00, 263MB/s]
✓ distilbert-base-uncased loaded successfully
Testing roberta-base...
tokenizer_config.json: 100%|██████████████████████| 25.0/25.0 [00:00<00:00, 387kB/s]
config.json: 100%|██████████████████████| 481/481 [00:00<00:00, 7.31MB/s]
vocab.json: 100%|██████████████████████| 899k/899k [00:00<00:00, 3.84MB/s]
merges.txt: 100%|██████████████████████| 456k/456k [00:00<00:00, 2.02MB/s]
tokenizer.json: 100%|██████████████████████| 1.36M/1.36M [00:00<00:00, 2.89MB/s]
model.safetensors: 100%|██████████████████████| 499M/499M [00:01<00:00, 321MB/s]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
✓ roberta-base loaded successfully
[SUCCESS] Model loading test completed!
Transformers GPU Test
---------------------
Loading BERT model...
Model loaded successfully on cuda:0
Input: "Hello, this is a test sentence for Hugging Face Transformers."
Running inference test...
Output shape: torch.Size([1, 14, 768])
Inference time: 0.6935 seconds
Device: cuda:0
GPU memory after inference: 0.44 GB
Memory cleaned up successfully
[SUCCESS] Hugging Face Transformers GPU test completed successfully!
Environment Information
-----------------------
Python version: 3.11.13 (main, Jun 5 2025, 13:12:00) [GCC 11.2.0]
Platform: linux
Current working directory: /home/tfraunholz
CUDA_HOME: /usr/local/cuda-12.8
CUDA_PATH: /usr/local/cuda-12.8
LD_LIBRARY_PATH: /opt/software/anaconda3/lib:/usr/local/cuda-12.8/lib64
VIRTUAL_ENV: /valhalla/projects/ehpc-aif-2025pg01-214/pytorch_env
============================================================
Test Summary
============================================================
Tests passed: 3/3
[SUCCESS] All tests passed! Hugging Face Transformers is working correctly.
Running Hugging Face Transformers within a Slurm batch script¶
Create the following Slurm batch script:
#!/bin/bash
#SBATCH --partition=common
#SBATCH --job-name=test_huggingface
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2
#SBATCH --account=your_slurm_project_account_name
#SBATCH --qos your_slurm_project_account_name
#SBATCH -o test_huggingface.%j.out
#SBATCH -e test_huggingface.%j.err
export PATH="/valhalla/projects/your_slurm_project_account_name/pytorch_env/bin:$PATH"
export VIRTUAL_ENV="/valhalla/projects/your_slurm_project_account_name/pytorch_env"
module load nvidia/cuda/12/12.8
cd $SLURM_SUBMIT_DIR
python /valhalla/projects/your_slurm_project_account_name/huggingface_gpu_detection.py
and save it as /valhalla/projects/your_slurm_project_account_name/test_huggingface.sbatch.
If you don’t find huggingface_gpu_detection.py, download it from here:
https://gitlab.discoverer.bg/vkolev/snippets/-/raw/main/checks/huggingface_gpu_detection.py
To submit the job to the queue:
sbatch /valhalla/projects/your_slurm_project_account_name/test_huggingface.sbatch
Once successfully submitted, you can check if the job is running by executing:
squeue --me
If the job is running at the moment, information about its execution will be presented as:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1980 common test_hug username R 0:06 1 dgx1
The execution of the job will create two files in the current directory: one capturing the standard output and another collecting the standard error messages:
test_huggingface.1980.err
test_huggingface.1980.out
Here 1980 is the job ID; in your case that number will be different.
The file test_huggingface.1980.out will contain the results (they should be the same as those reported for the interactive execution above).
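If Slurm accounting is available on the cluster, you can also inspect the finished job afterwards; the job ID 1980 below is just the example from above:
sacct -j 1980 --format=JobID,JobName,State,Elapsed,AllocTRES%40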
Additional Hugging Face Libraries¶
You may also want to install additional Hugging Face libraries depending on your use case:
# For datasets
conda install \
--prefix /valhalla/projects/your_slurm_project_account_name/pytorch_env/ \
conda-forge::datasets -c pytorch -c nvidia
# For tokenizers
conda install \
--prefix /valhalla/projects/your_slurm_project_account_name/pytorch_env/ \
conda-forge::tokenizers -c pytorch -c nvidia
# For accelerate (for distributed training)
conda install \
--prefix /valhalla/projects/your_slurm_project_account_name/pytorch_env/ \
conda-forge::accelerate -c pytorch -c nvidia
# For evaluate (for model evaluation)
conda install \
--prefix /valhalla/projects/your_slurm_project_account_name/pytorch_env/ \
conda-forge::evaluate -c pytorch -c nvidia
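As a brief sketch of how these extra libraries fit together (the dataset and model names below are only examples and will be downloaded from the Hugging Face Hub on first use):
from datasets import load_dataset
from transformers import AutoTokenizer
import evaluate

# Load a tiny slice of a public dataset and tokenize it
dataset = load_dataset("imdb", split="test[:8]")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)
print(encoded)

# Compute a metric with the evaluate library (dummy predictions for illustration)
accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))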
Example Usage¶
Here’s a simple example of how to use Hugging Face Transformers in your Python scripts:
from transformers import AutoTokenizer, AutoModel
import torch
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# Move model to GPU if available
if torch.cuda.is_available():
    model = model.cuda()
    device = "cuda"
else:
    device = "cpu"
# Tokenize input text
text = "Hello, this is a test sentence."
inputs = tokenizer(text, return_tensors="pt")
# Move inputs to the same device as model
inputs = {k: v.to(device) for k, v in inputs.items()}
# Run inference
with torch.no_grad():
    outputs = model(**inputs)
print(f"Output shape: {outputs.last_hidden_state.shape}")
print(f"Device: {device}")