GROMACS¶
Versions available¶
Supported versions¶
Note
The versions of GROMACS installed in the software repository are built and supported by the Discoverer HPC team. The MPI builds should be employed for running the actual simulations (mdrun) and deriving trajectories, while the non-MPI ones should be regarded mostly as a tool set for trajectory post-processing.
To check which GROMACS versions are currently supported on Discoverer, execute on the login node:
module avail gromacs
The following environment module naming convention is applied for the modules servicing the access to the software repository:
gromacs/MAJOR_N/MAJOR_N.MINOR_N-comp-num_lib-gpuavail-mpi_lib
where:
MAJOR_N
- the major number of the GROMACS version (example: 2022)
MINOR_N
- the minor number of the GROMACS version (example: 1, which stands for 2022.1)
comp
- the compiler collection employed for compiling the source code (example: intel)
num_lib
- the numerical methods library providing BLAS and FFTW that libgromacs is linked against (example: openblas)
gpuavail
- shows whether the build supports GPU acceleration (example: nogpu, which means no GPU support)
mpi_lib
- the MPI library libgromacs is linked against (example: openmpi, which means the Open MPI library)
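As an illustration, the module name used later in this document can be decoded as follows (a sketch only; always run module avail gromacs to see the names actually installed):

# gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi
#   2022      -> MAJOR_N (GROMACS 2022 series)
#   2022.4    -> MAJOR_N.MINOR_N (GROMACS 2022.4)
#   intel     -> comp (Intel compiler collection)
#   fftw3     -> FFTW provider
#   openblas  -> num_lib (OpenBLAS provides BLAS)
#   nogpu     -> gpuavail (no GPU support)
#   openmpi   -> mpi_lib (Open MPI)
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi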
The installed versions are compiled based on the following recipes:
https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/gromacs
Available single and double precision executables¶
All installations of GROMACS available in the software repository contain both single and double precision executables.
The single precision executable names are:
gmx_mpi
(for the MPI versions of GROMACS)
gmx
(for the non-MPI and ThreadMPI versions of GROMACS)
The double precision executable names are:
gmx_mpi_d
(for the MPI versions of GROMACS)
gmx_d
(for the non-MPI and ThreadMPI versions of GROMACS)
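To quickly check which executables a given module provides, you may load it and query the shell (a minimal illustration; the module name below is only one of those listed by module avail gromacs):

module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi
# list the GROMACS executables that became visible in PATH
which gmx_mpi gmx_mpi_d 2>/dev/null
# for non-MPI/thread-MPI builds, query gmx and gmx_d instead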
User-supported versions¶
Users are welcome to bring or compile their own builds of GROMACS and use them, but those builds will not be supported by the Discoverer HPC team.
Running simulations (mdrun)¶
Running a simulation means invoking mdrun to generate a trajectory based on a given TPR file.
Warning
You MUST NOT execute simulations directly on the login node (login.discoverer.bg). Run your simulations as Slurm jobs only.
Warning
Write your trajectories and analysis results only inside your Personal scratch and storage folder (/discofs/username) and DO NOT, under any circumstances, use your Home folder (/home/username) for that purpose!
Multi-node (MPI)¶
To organize your GROMACS multi-node MPI simulations, use the following Slurm batch template:
#!/bin/bash
#
#SBATCH --partition=cn ### Partition (you may need to change this)
#SBATCH --job-name=gromacs_on_8_nodes
#SBATCH --time=512:00:00 ### WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes 8 # May vary
#SBATCH --ntasks-per-core 1 # Bind one MPI task to one CPU core
#SBATCH --ntasks-per-node 128 # Must be less/equal to the number of CPU cores
#SBATCH --cpus-per-task 2 # Must be 2, unless you have a better guess
#SBATCH -o slurm.%j.out # STDOUT
#SBATCH -e slurm.%j.err # STDERR
module purge
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
export UCX_NET_DEVICES=mlx5_0:1
cd $SLURM_SUBMIT_DIR
mpirun gmx_mpi mdrun -v -s prefix.tpr -deffnm prefix # replace `prefix` with the
                                                     # prefix of your TPR file
Note
If you need to run simulations using the double precision MPI-based version of GROMACS, replace gmx_mpi with gmx_mpi_d.
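For example, the mdrun line in the template above would then become (a sketch keeping the same placeholder prefix):

mpirun gmx_mpi_d mdrun -v -s prefix.tpr -deffnm prefix   # double precision MPI run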
Specify the parameters and resources required for successfully running and completing the job:
- Slurm partition of compute nodes, based on your project resource reservation (--partition)
- job name, under which the job will be seen in the queue (--job-name)
- wall time for running the job (--time)
- number of occupied compute nodes (--nodes)
- number of MPI processes per node (--ntasks-per-node)
- number of threads (OpenMP threads) per MPI process (--cpus-per-task)
- version of GROMACS to run after module load (see Supported versions)
Note
The total number of MPI processes (across all nodes) should not exceed the number of domains assumed by the domain decomposition. With this template, one may achieve maximum thread affinity on the AMD Zen2 CPUs.
You may download a copy of that template: GROMACS 2021 multi-node | GROMACS 2022 multi-node.
Save the complete Slurm job description as a file, for example /discofs/$USER/run_gromacs/gromacs_multinode_mpi.batch
, and submit it to the queue:
cd /discofs/$USER/run_gromacs/
sbatch gromacs_multinode_mpi.batch
Upon successful submission, the standard output will be directed by Slurm into the file /discofs/$USER/run_gromacs/slurm.%j.out
(where %j
stands for the Slurm job ID), while the standard error output will be stored in /discofs/$USER/run_gromacs/slurm.%j.err
.
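After submitting, you can check the job status and follow the output from the login node, for example (123456 is a placeholder for the job ID printed by sbatch):

squeue -u $USER                                      # is the job pending or running?
tail -f /discofs/$USER/run_gromacs/slurm.123456.out  # follow the standard output live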
Single-node (MPI)¶
To run an MPI simulation on a single compute node, use the template proposed in Multi-node (MPI) and set #SBATCH --nodes 1 (see the sketch below).
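For clarity, the resource reservation lines of the single-node variant would then read (a sketch based on the multi-node template above; keep the remaining lines unchanged):

#SBATCH --nodes 1              # single compute node
#SBATCH --ntasks-per-core 1    # bind one MPI task to one CPU core
#SBATCH --ntasks-per-node 128  # must not exceed the number of CPU cores per node
#SBATCH --cpus-per-task 2      # as in the multi-node template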
Single-node (tMPI)¶
Thread-MPI is GROMACS' internal (built-in) MPI implementation. Unlike the classic MPI implementations (Intel MPI, MPICH, Open MPI), it performs MPI processing with threads and cannot operate across multiple compute nodes.
Running a thread-MPI mdrun simulation is a bit “tricky” when it comes to specifying the number of cores and logical CPUs in the resource reservation section of the Slurm batch file.
If mdrun runs as a thread-MPI process with N MPI “tasks” and M OpenMP threads per MPI task, and each MPI task is implemented as a thread, then Slurm has to handle a single task running N x M threads. Since that contradicts the plain OpenMP model, the OMP_NUM_THREADS and UCX_NET_DEVICES exports from the multi-node template should not appear in the Slurm batch script:
#!/bin/bash
#
#SBATCH --partition=cn ### Partition (you may need to change this)
#SBATCH --job-name=gromacs_on_1_node
#SBATCH --time=512:00:00 ### WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes 1 # MUST BE 1
#SBATCH --ntasks-per-node 1 # MUST BE 1
#SBATCH --cpus-per-task 256 # N MPI threads x M OpenMP threads (128 * 2 for AMD EPYC 7H12)
# which is ntomp x ntmpi (see gmx mdrun line below)
#SBATCH -o slurm.%j.out # STDOUT
#SBATCH -e slurm.%j.err # STDERR
module purge
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-threadmpi
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
cd $SLURM_SUBMIT_DIR
gmx mdrun -ntomp 2 -ntmpi 128 -v -s prefix.tpr -deffnm prefix # replace `prefix` with the
                                                              # prefix of your TPR file
Note
If you need to run simulations using the double precision ThreadMPI version of GROMACS, replace gmx with gmx_d.
Modify the parameters to specify the required resources for running the job:
- Slurm partition of compute nodes, based on your project resource reservation (--partition)
- job name, under which the job will be seen in the queue (--job-name)
- wall time for running the job (--time)
- number of threads (OpenMP threads) per MPI process (--cpus-per-task), if it should not be 256
- version of GROMACS to run after module load (see Supported versions)
- change the -ntomp and -ntmpi values, if necessary (ntomp x ntmpi = cpus-per-task); see the example after this list
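For instance, on a node with 256 logical CPUs the product of the two values must remain 256 (an illustration of equivalent combinations; which one performs best depends on your system and should be benchmarked):

gmx mdrun -ntomp 2 -ntmpi 128 -v -s prefix.tpr -deffnm prefix   # 128 tMPI ranks x 2 OpenMP threads = 256
gmx mdrun -ntomp 4 -ntmpi 64  -v -s prefix.tpr -deffnm prefix   # 64 tMPI ranks x 4 OpenMP threads = 256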
You may download a copy of that template: GROMACS 2021 thread-MPI | GROMACS 2022 thread-MPI.
Save the complete Slurm job description as a file, for example /discofs/$USER/run_gromacs/gromacs_thread_mpi.batch, and submit it to the queue:
cd /discofs/$USER/run_gromacs/
sbatch gromacs_thread_mpi.batch
Upon successful submission, the standard output will be directed by Slurm into the file /discofs/$USER/run_gromacs/slurm.%j.out
(where %j
stands for the Slurm job ID), while the standard error output will be stored in /discofs/$USER/run_gromacs/slurm.%j.err
. Note that the following content will appear (among the other lines) inside the file used for capturing the standard error (/discofs/$USER/run_gromacs/slurm.%j.err):
Using 128 MPI threads
Using 2 OpenMP threads per tMPI thread
which is an indication that mdrun is running as a thread-MPI process.
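A quick way to verify this once the job is running (123456 being a placeholder for your actual Slurm job ID):

grep -E "MPI threads|OpenMP threads" /discofs/$USER/run_gromacs/slurm.123456.err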
Running GROMACS tools¶
GROMACS tools should be used to compose the TPR file (before running mdrun) or to post-process the trajectories after the completion of the simulation.
To run any of the tools, modify the following Slurm batch template:
#!/bin/bash
#
#SBATCH --partition=cn ### Partition (you may need to change this)
#SBATCH --job-name=gromacs_on_1_node
#SBATCH --time=512:00:00 ### WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes 1 # MUST BE 1
#SBATCH --ntasks-per-node 1 # MUST BE 1
#SBATCH --cpus-per-task 256 # Number of logical CPUs available to the tool
                            # (128 cores x 2 hardware threads for AMD EPYC 7H12)
#SBATCH -o slurm.%j.out # STDOUT
#SBATCH -e slurm.%j.err # STDERR
module purge
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-nompi
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
cd $SLURM_SUBMIT_DIR
gmx tool_name options ..
Note
If you need to run the tools using the double precision non-MPI version of GROMACS, replace gmx with gmx_d.
Modify the parameters to specify the required resources for running the job:
- Slurm partition of compute nodes, based on your project resource reservation (--partition)
- job name, under which the job will be seen in the queue (--job-name)
- wall time for running the job (--time)
- version of GROMACS to run after module load (see Supported versions)
- tool_name should be replaced with the name of the GROMACS tool
- options .. should be replaced with the particular set of options to pass to the running tool (see the example after this list)
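As an illustration, a grompp invocation that composes a TPR file might look like this (md.mdp, conf.gro, topol.top, and prefix.tpr are placeholders for your own files):

gmx grompp -f md.mdp -c conf.gro -p topol.top -o prefix.tpr   # build the TPR from MDP, GRO, and TOP inputs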
Save the complete Slurm job description as a file, for example /discofs/$USER/run_gromacs/gromacs_tool_name.batch
, and submit it to the queue:
cd /discofs/$USER/run_gromacs/
sbatch gromacs_tool_name.batch
[Optional] Suitable compilers, tools, and libraries for compiling GROMACS code¶
Use the most recent version of CMake.
Unless your goal is to build your own FFTW library and avoid linking libgromacs against Intel oneMKL, you do not need to employ the Intel oneAPI compiler collection.
A recent GNU Compiler Collection can link libgromacs against Intel oneAPI MKL and MPI, and the produced mdrun binary will show excellent performance.
Note
Since version 2022, the GROMACS source code should be compiled with LLVM-based compilers (including the Intel oneAPI LLVM C/C++ compilers icx and icpx), while some classic compilers are still supported (legacy).
If you want to compile the GROMACS code by yourself, it might be helpful to check the recipes followed by the Discoverer HPC team:
https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/gromacs
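For orientation only, a minimal CMake configuration for an MPI-enabled, CPU-only build might look like the sketch below; the compiler and FFTW choices here are assumptions, and the recipes linked above document the exact flags used for the repository builds:

# assumes the GROMACS sources are unpacked in ./gromacs-2022.4 and icx/icpx are in PATH
cmake -S gromacs-2022.4 -B build \
      -DCMAKE_C_COMPILER=icx \
      -DCMAKE_CXX_COMPILER=icpx \
      -DGMX_MPI=ON \
      -DGMX_GPU=OFF \
      -DGMX_FFT_LIBRARY=fftw3 \
      -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2022.4
cmake --build build -j 32 && cmake --install build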
Scalability test (CPU vs CPU+GPU)¶
Important
At present, Discoverer's compute nodes are not equipped with GPUs. Nevertheless, GROMACS simulations run with high performance there.
To roughly check whether GROMACS MD simulations run reasonably fast on Discoverer, one can compare the completion time of a model simulation executed on the CPU-only partition of Discoverer to that measured for the same simulation on a modern GPU-accelerated cluster whose nodes are equipped with NVIDIA V100 Tensor Core GPUs.
Below, we provide data for a model simulation whose completion time depends on very intensive computation of non-bonded interactions between the participating atoms. One might expect that adding powerful GPU accelerators to a multicore CPU system would shorten the completion time significantly. While that may be the case for molecular dynamics simulations of certain complex molecular systems, the CPU-only MD runs on Discoverer are still very competitive.
Content of the simulation box and MD integration settings¶
The simulation box accommodates 347814 atoms, distributed among 44685 water molecules (TIP3P water model), 7938 propylene glycol molecules, and 1701 1-monoolein molecules. The GRO-formatted file, containing the coordinates of all atoms along with the components of the box vectors, is available for download at:
The TPR file, containing the adopted force field parameters and MD integration settings, is also available for download:
Completion times¶
Discoverer's CPU-only nodes: 26.48 ± 0.83 hours (9 compute nodes, 1152 CPU cores (see Resource Overview), 2304 CPU threads, 200 Gbps IB / 18% utilization)
External Hybrid CPU&GPU nodes: 52.15 ± 0.74 hours (4 compute nodes, 96 CPU cores on Intel Xeon Gold 6226, 192 CPU threads, 12 x NVIDIA V100 tensor core GPU, 56 Gbps IB / 77% utilization)
- The published completion times are averages based on 6 simulations performed on each platform (one at a time); the GROMACS 2023 code was compiled using Intel oneAPI LLVM compilers (against Open MPI 4.1.4 and CUDA 12.0); UCX was employed as the IB communication middleware
Getting help¶
See Getting help