GROMACS

Versions available

Supported versions

Note

The versions of GROMACS installed in the software repository are built and supported by the Discoverer HPC team. The MPI builds should be employed for running the actual simulations (mdrun) and deriving trajectories, while the non-MPI ones should be regarded mostly as a tool set for trajectory post-processing.

To check which GROMACS versions are currently supported on Discoverer, execute on the login node:

module avail gromacs

The following environment module naming convention is applied for the modules servicing the access to the software repository:

gromacs/MAJOR_N/MAJOR_N.MINOR_N-comp-num_lib-gpuavail-mpi_lib

where:

  • MAJOR_N - the major number of the GROMACS version (example: 2022)
  • MINOR_N - the minor number of the GROMACS version (example: 1, which stands for 2022.1)
  • comp - the compiler collection employed for compiling the source code (example: intel)
  • num_lib - the numerical library providing the BLAS and FFTW implementations against which libgromacs is linked (example: openblas)
  • gpuavail - shows if the build supports GPU acceleration (example: nogpu, which means no GPU support)
  • mpi_lib - the MPI library against which libgromacs is linked (example: openmpi, which means the Open MPI library)
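
For example, the module name gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi (used in the templates below) denotes GROMACS 2022.4 built with the Intel compilers, linked against FFTW3 and OpenBLAS, without GPU support, and linked against the Open MPI library. It can be loaded as:

module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi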

The installed versions are compiled based on the following recipes:

https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/gromacs

Available single and double precision executables

All installations of GROMACS available in the software repository contain both single and double precision executables.

  1. The single precision executable names are:

    • gmx_mpi (for the MPI-versions of GROMACS)
    • gmx (for the non-MPI and ThreadMPI versions of GROMACS)
  2. The double precision executable names are:

    • gmx_mpi_d (for the MPI-versions of GROMACS)
    • gmx_d (for the non-MPI and ThreadMPI versions of GROMACS)
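
After loading a module, one can quickly check which of those executables it provides, for example:

module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi
which gmx_mpi gmx_mpi_d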

User-supported versions

Users are welcome to bring or compile their own builds of GROMACS and use them, but those builds will not be supported by the Discoverer HPC team.

Running simulations (mdrun)

Running simulations means invoking mdrun for generating trajectories based on a given TPR file.

Warning

You MUST NOT execute simulations directly on the login node (login.discoverer.bg). You have to run your simulations as Slurm jobs only.

Warning

Write your trajectories and analysis results only inside your Personal scratch and storage folder (/discofs/username) and DO NOT use your Home folder (/home/username) for that purpose under any circumstances!
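
For example, a dedicated working directory on the scratch file system (the directory name used throughout this page) can be created once with:

mkdir -p /discofs/$USER/run_gromacs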

Multi-node (MPI)

To organize your GROMACS multi-node MPI simulations, use the following Slurm batch template:

#!/bin/bash
#
#SBATCH --partition=cn         ### Partition (you may need to change this)
#SBATCH --job-name=gromacs_on_8_nodes
#SBATCH --time=512:00:00       ### WallTime - set it accordingly

#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>

#SBATCH --nodes           8    # May vary
#SBATCH --ntasks-per-core 1    # Bind one MPI task to one CPU core
#SBATCH --ntasks-per-node 128  # Must be less/equal to the number of CPU cores
#SBATCH --cpus-per-task   2    # Must be 2, unless you have a better guess

#SBATCH -o slurm.%j.out        # STDOUT
#SBATCH -e slurm.%j.err        # STDERR

module purge
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-openmpi

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
export UCX_NET_DEVICES=mlx5_0:1

cd $SLURM_SUBMIT_DIR

mpirun gmx_mpi mdrun -v -s prefix.tpr -deffnm prefix # replace `prefix` with the
                                                     # prefix of your TPR file

Note

If you need to run simulations using the double precision MPI-based version of GROMACS, replace gmx_mpi with gmx_mpi_d.
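
For example, the mdrun line from the template above then becomes:

mpirun gmx_mpi_d mdrun -v -s prefix.tpr -deffnm prefix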

Specify the parameters and resources required for successfully running and completing the job:

  • Slurm partition of compute nodes, based on your project resource reservation (--partition)
  • job name, under which the job will be seen in the queue (--job-name)
  • wall time for running the job (--time)
  • number of occupied compute nodes (--nodes)
  • number of MPI processes per node (--ntasks-per-node)
  • number of threads (OpenMP threads) per MPI process (--cpus-per-task)
  • version of GROMACS to run after module load (see Supported versions)

Note

The total number of MPI processes (across all nodes) should not exceed the number assumed by the domain decomposition. Using this template, one may achieve maximum thread affinity on AMD Zen2 CPUs.
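
With the values from the template above, the totals are:

8 nodes x 128 MPI tasks per node = 1024 MPI processes
1024 MPI processes x 2 OpenMP threads each = 2048 threads in total (8 nodes x 256 hardware threads on AMD EPYC 7H12)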

You may download a copy of that template: GROMACS 2021 multi-node | GROMACS 2022 multi-node.

Save the complete Slurm job description as a file, for example /discofs/$USER/run_gromacs/gromacs_multinode_mpi.batch, and submit it to the queue:

cd /discofs/$USER/run_gromacs/
sbatch gromacs_multinode_mpi.batch

Upon successful submission, the standard output will be directed by Slurm into the file /discofs/$USER/run_gromacs/slurm.%j.out (where %j stands for the Slurm job ID), while the standard error output will be stored in /discofs/$USER/run_gromacs/slurm.%j.err.
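
The status of the job can be followed with the standard Slurm tools, for example (the job ID below is hypothetical):

squeue -u $USER
tail -f /discofs/$USER/run_gromacs/slurm.1234567.out   # replace 1234567 with the actual job ID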

Single-node (MPI)

In the template proposed in Multi-node (MPI), set #SBATCH --nodes 1 if the run does not need to span multiple compute nodes (see the sketch of the changed resource lines below).
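
A minimal sketch of the resource request, keeping the task and thread counts from the multi-node template:

#SBATCH --nodes           1    # Single compute node
#SBATCH --ntasks-per-node 128  # MPI tasks, as in the multi-node template
#SBATCH --cpus-per-task   2    # OpenMP threads per MPI task, as in the multi-node template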

Single-node (tMPI)

Thread-MPI is GROMACS' built-in thread-based MPI implementation. Unlike the classic MPI implementations (Intel MPI, MPICH, Open MPI), it runs the MPI ranks as threads within a single process and cannot operate across multiple compute nodes.

Running a thread-MPI mdrun simulation is a bit “tricky” when it comes to describing the number of cores and logical CPUs in the resource reservation section of the Slurm batch file.

If mdrun runs as a thread-MPI process with N MPI “tasks” and M OpenMP threads per task, and each MPI task is itself a thread, then Slurm has to manage a single task running N x M threads. Because this does not match the usual one-process-per-task OpenMP model, the export OMP_NUM_THREADS line must not remain in the Slurm batch script (and the UCX_NET_DEVICES export from the template above is dropped as well):

#!/bin/bash
#
#SBATCH --partition=cn         ### Partition (you may need to change this)
#SBATCH --job-name=gromacs_on_1_node
#SBATCH --time=512:00:00       ### WallTime - set it accordingly

#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>

#SBATCH --nodes           1    # MUST BE 1
#SBATCH --ntasks-per-node 1    # MUST BE 1
#SBATCH --cpus-per-task 256    # N MPI threads x M OpenMP threads (128 * 2 for AMD EPYC 7H12)
                               # which is ntomp x ntmpi (see gmx mdrun line below)

#SBATCH -o slurm.%j.out        # STDOUT
#SBATCH -e slurm.%j.err        # STDERR

module purge
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-threadmpi

export OMP_PLACES=cores
export OMP_PROC_BIND=spread

cd $SLURM_SUBMIT_DIR

gmx mdrun -ntomp 2 -ntmpi 128 -v -s prefix.tpr -deffnm prefix # replace `prefix` with the
                                                              # prefix of your TPR file

Note

If you need to run simulations using the double precision ThreadMPI version of GROMACS, replace gmx with gmx_d.

Modify the parameters to specify the required resources for running the job:

  • Slurm partition of compute nodes, based on your project resource reservation (--partition)
  • job name, under which the job will be seen in the queue (--job-name)
  • wall time for running the job (--time)
  • number of threads (OpenMP threads) per MPI process (--cpus-per-task), if it should not be 256
  • specify the version of GROMACS to run after module load (see Supported versions)
  • change -ntomp and -ntmpi values, if necessary (ntomp x ntmpi = cpus-per-task; see the example below)
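
For example, assuming cpus-per-task remains 256, an alternative split of ranks and threads (a sketch, not a tuned recommendation) would be:

gmx mdrun -ntomp 4 -ntmpi 64 -v -s prefix.tpr -deffnm prefix   # 64 x 4 = 256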

You may download a copy of that template: GROMACS 2021 thread-MPI | GROMACS 2022 thread-MPI.

Save the complete Slurm job description as a file, for example /discofs/$USER/run_gromacs/gromacs_thread_mpi.batch, and submit it to the queue:

cd /discofs/$USER/run_gromacs/
sbatch gromacs_thread_mpi.batch

Upon successful submission, the standard output will be directed by Slurm into the file /discofs/$USER/run_gromacs/slurm.%j.out (where %j stands for the Slurm job ID), while the standard error output will be stored in /discofs/$USER/run_gromacs/slurm.%j.err. Note that the following content will appear (among other lines) inside the file used for capturing the standard error (/discofs/$USER/run_gromacs/slurm.%j.err):

Using 128 MPI threads
Using 2 OpenMP threads per tMPI thread

which is an indication that mdrun is running as a thread-MPI process.
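
A quick way to check for those lines (the job ID below is hypothetical):

grep -E "MPI threads|OpenMP threads" /discofs/$USER/run_gromacs/slurm.1234567.err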

Running GROMACS tools

GROMACS tools should be used to compose the TPR file (before running mdrun) or to post-process the trajectories after the simulation completes.

To run any of the tools, modify the following Slurm batch template:

#!/bin/bash
#
#SBATCH --partition=cn         ### Partition (you may need to change this)
#SBATCH --job-name=gromacs_on_1_node
#SBATCH --time=512:00:00       ### WallTime - set it accordingly

#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>

#SBATCH --nodes           1    # MUST BE 1
#SBATCH --ntasks-per-node 1    # MUST BE 1
#SBATCH --cpus-per-task 256    # Number of CPU threads available to the tool
                               # (128 cores x 2 hardware threads for AMD EPYC 7H12)

#SBATCH -o slurm.%j.out        # STDOUT
#SBATCH -e slurm.%j.err        # STDERR

module purge
module load gromacs/2022/2022.4-intel-fftw3-openblas-nogpu-nompi

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=spread

cd $SLURM_SUBMIT_DIR

gmx tool_name options ..

Note

If you need to run the tools using the double precision non-MPI version of GROMACS, replace gmx with gmx_d.

Modify the parameters to specify the required resources for running the job:

  • Slurm partition of compute nodes, based on your project resource reservation (--partition)
  • job name, under which the job will be seen in the queue (--job-name)
  • wall time for running the job (--time)
  • version of GROMACS to run after module load (see Supported versions)
  • tool_name should be replaced with the name of the GROMACS tool to run (see the example below)
  • options .. should be replaced with the particular set of options to pass to the tool
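
For example, the last line of the template could invoke gmx grompp to compose a TPR file (the input file names below are hypothetical):

gmx grompp -f md.mdp -c conf.gro -p topol.top -o prefix.tpr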

Save the complete Slurm job description as a file, for example /discofs/$USER/run_gromacs/gromacs_tool_name.batch, and submit it to the queue:

cd /discofs/$USER/run_gromacs/
sbatch gromacs_tool_name.batch

[Optional] Suitable compilers, tools, and libraries for compiling GROMACS code

Use the most recent version of CMake.

Unless your goal is to build your own FFTW library and avoid linking libgromacs against Intel oneMKL, you do not need to employ the Intel oneAPI Compiler Collection.

The recent GNU Compiler Collection can link libgromacs against Intel oneAPI MKL and MPI, and the resulting mdrun binary shows excellent performance.

Note

Since version 2022, the GROMACS source code should be compiled with LLVM-based compilers (including the Intel oneAPI LLVM C/C++ compilers icx and icpx), while some classic compilers are still supported as legacy.

If you want to compile the GROMACS code by yourself, it might be helpful to check the recipes followed by the Discoverer HPC team:

https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/gromacs
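
For orientation only, a minimal CMake configuration for a single precision MPI-enabled build might look like the sketch below. The authoritative settings are in the recipes linked above; the compiler names, paths, and SIMD level here are assumptions, and the required compiler, MPI, and numerical library modules are expected to be loaded beforehand:

cd gromacs-2022.4 && mkdir build && cd build   # inside the unpacked source tree (name is an example)
cmake .. \
  -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
  -DGMX_MPI=ON \
  -DGMX_DOUBLE=OFF \
  -DGMX_SIMD=AVX2_256 \
  -DGMX_FFT_LIBRARY=fftw3 \
  -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2022.4   # installation path is only an example
make -j 16 && make install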

Scalability test (CPU vs CPU+GPU)

Important

At the present time, Discoverer’s compute nodes are not equipped with GPU accelerators. Nevertheless, GROMACS simulations achieve high performance there.

To roughly check whether GROMACS MD simulations run reasonably fast on Discoverer, one can compare the completion time of a model simulation executed on the CPU-only partition of Discoverer to the time measured for the same simulation on a modern GPU-accelerated cluster whose nodes are equipped with NVIDIA V100 Tensor Core GPUs.

Below, we provide data for a model simulation whose completion time depends on very intensive computation of the non-bonded interactions between the participating atoms. One might expect that adding powerful GPU accelerators to a multicore CPU system would shorten the completion time significantly. While that may be the case for certain complex molecular systems, the CPU-only MD runs on Discoverer are still very competitive.

Content of the simulation box and MD integration settings

The simulation box accommodates 347814 atoms, comprising 44685 water molecules (based on the TIP3P water model), 7938 propylene glycol molecules, and 1701 1-monoolein molecules. The GRO formatted file, containing the coordinates of all atoms along with the components of the box vectors, is available for download at:

https://gitlab.discoverer.bg/vkolev/snippets/-/raw/main/GROMACS/benchmarking/sponge_GMO_PGL_5_5-equil_20.gro

The TPR file, containing the adopted force field parameters and MD integration settings, is also available for download:

https://gitlab.discoverer.bg/vkolev/snippets/-/raw/main/GROMACS/benchmarking/sponge_GMO_PGL_5_5-equil_20.tpr

Completion times

Discoverer’s CPU-only nodes: 26.48 ± 0.83 hours (9 compute nodes, 1152 CPU cores (see Resource Overview), 2304 CPU threads, 200 Gbps IB / 18% utilization)

External Hybrid CPU&GPU nodes: 52.15 ± 0.74 hours (4 compute nodes, 96 CPU cores on Intel Xeon Gold 6226, 192 CPU threads, 12 x NVIDIA V100 tensor core GPU, 56 Gbps IB / 77% utilization)
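
In other words, for this particular model system the CPU-only run on Discoverer completes roughly 52.15 / 26.48 ≈ 2 times faster than the reference GPU-accelerated run.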

Getting help

See Getting help