GROMACS (CPU and GPU)
=====================

.. toctree::
   :maxdepth: 1
   :caption: Contents:

.. contents:: Table of Contents
   :depth: 3

Versions and build types available
----------------------------------

.. warning::

   This document describes running GROMACS on the Discoverer CPU and GPU clusters.

.. note::

   The versions of GROMACS installed in the software repository are built and
   supported by the Discoverer HPC team. The MPI builds should be employed for
   running the actual simulations (``mdrun``) and deriving trajectories, while
   the thread-MPI ones should be regarded mostly as a tool set for trajectory
   post-processing.

To check which GROMACS versions are currently supported on the Discoverer CPU
and GPU clusters, execute on the login node:

.. code-block:: bash

   module avail gromacs

The following environment module naming convention is applied to the modules
servicing the access to the software repository on both CPU and GPU clusters:

.. code-block:: bash

   gromacs/MAJOR_N/MAJOR_N.MINOR_N-comp-num_lib-gpuavail-mpi_lib

where:

- ``MAJOR_N`` - the major number of the GROMACS version (example: 2022)
- ``MINOR_N`` - the minor number of the GROMACS version (example: 1, which stands for 2022.1)
- ``comp`` - the compiler collection employed for compiling the source code (example: intel)
- ``num_lib`` - the numerical library providing the BLAS and FFTW implementations libgromacs is linked against (example: openblas)
- ``gpuavail`` - shows whether the build supports GPU acceleration (example: nogpu, which means no GPU support)
- ``mpi_lib`` - the MPI library the GROMACS code is linked against (example: openmpi, which implies the use of the Open MPI library)

The installed versions are compiled based on the following recipes:

https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/gromacs

CPU versions (Discoverer CPU cluster)
.....................................

The following are CPU-only GROMACS builds for the Discoverer (CPU) cluster
(AMD EPYC nodes). The module name includes ``nogpu`` in the ``gpuavail``
field. Two builds are available; they are for the Discoverer CPU cluster only
and must not be run on Discoverer+.

Discoverer provides two different GROMACS installations optimized for
different use cases (see also `Choosing the right build`_):

1. Thread-MPI build (single-node optimized)

.. warning::

   This build does not support PLUMED.

.. code:: bash

   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi

Use this build when:

- Running simulations on a single compute node
- Seeking maximum performance for single-node simulations
- Running analysis tools (with ``-ntmpi 1``)
- Working with AMD EPYC processors
- Running CPU-only simulations

Features:

- Optimized for single-node performance
- Can run analysis tools by setting ``-ntmpi 1``
- Excellent NUMA optimization
- Lower memory overhead
- Faster startup times

Executable name: ``gmx``

Example usage inside a SLURM job script:

.. code:: bash

   # Single-node simulation
   gmx mdrun -ntomp 2 -ntmpi 128 -pin auto -s prefix.tpr -deffnm prefix

   # Analysis tool (single thread-MPI rank)
   gmx grompp -f npt.mdp -c nvt.gro -p topol.top -o npt.tpr
   gmx mdrun -ntmpi 1 -s npt.tpr -deffnm npt

For more details see `Single-Node Thread-MPI Script`_.
2. External MPI build (multi-CPU-core and multi-node capable)

.. note::

   This build supports interaction with PLUMED.

.. code:: bash

   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-openmpi

Use this build when:

- Running simulations across multiple compute nodes
- Multi-node parallelization is needed
- Using Open MPI for distributed computing
- Running large-scale simulations requiring multiple nodes

Features:

- Supports multi-node simulations
- Uses Open MPI for inter-node communication
- Compatible with SLURM multi-node job submission
- Can handle larger systems across multiple nodes

Executable name: ``gmx_mpi``

Example usage inside a SLURM job script:

.. code:: bash

   # Multi-node simulation (on 2 nodes with 128 CPU cores per node)
   mpirun -np 256 gmx_mpi mdrun -ntomp 2 -pin auto -s prefix.tpr -deffnm prefix

For more details see `Multi-Node External MPI Script`_.

GPU versions (Discoverer+)
..........................

.. warning::

   No CPU-only GROMACS build is available or may be run on Discoverer+ under
   any circumstances. If you need to run CPU-only GROMACS, you must request an
   account on the Discoverer CPU cluster and run your CPU-only jobs there. No
   exceptions.

.. warning::

   One GPU is sufficient for most GROMACS GPU runs. Request more than one GPU
   (e.g. ``--gres=gpu:2``) only if you have confirmed that your system and
   workload require it. Over-requesting GPUs wastes resources and can delay
   your job and others.

The following GROMACS modules provide CUDA-accelerated builds for the
Discoverer+ (GPU+CPU) cluster. They run on DGX H200 nodes (NVIDIA H200 GPUs,
Hopper architecture). For cluster and partition details, see the resource
overview documentation.

The modules on the GPU cluster load a CUDA-aware MPI stack (Open MPI built
with CUDA support). Jobs that use the MPI build (``gmx_mpi``) may run on
multiple nodes; the thread-MPI build (``gmx``) runs on a single node only.

Two GPU builds are available. Run ``module avail gromacs`` on Discoverer+ to
see the exact module names (e.g. ``gromacs/2026/-cuda-mpi`` and
``gromacs/2026/-cuda-threadmpi``).

1. Thread-MPI build (single-node, GPU-accelerated)

.. code-block:: bash

   module load gromacs/2026/-cuda-threadmpi

Use this build when:

- Running GPU-accelerated simulations on a single compute node
- Running GROMACS analysis tools (with ``-ntmpi 1``)
- One node is enough and you prefer lower overhead than the MPI build

Features:

- Single process (thread-MPI); one node only
- CUDA used for the non-bonded and PME workload
- Lower memory and startup overhead than the MPI build on one node

Executable name: ``gmx``

Example usage inside a SLURM job script:

.. code-block:: bash

   gmx mdrun -s prefix.tpr -deffnm prefix
   gmx mdrun -ntmpi 1 -s prefix.tpr -deffnm prefix  # analysis tool

Single-node GPU thread-MPI job (Discoverer+)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Example SLURM script for the CUDA thread-MPI build on one node: one process
(thread-MPI), multiple thread-MPI ranks, one GPU. Use ``-ntmpi N`` for the
number of thread-MPI ranks and ``-npme 1`` so that one rank handles PME on the
GPU (required with multiple ranks when PME is on the GPU). Use ``-update cpu``
if your system has three or more consecutively coupled constraints. The
variables ``GMX_FORCE_GPU_AWARE_MPI`` and ``GMX_GPU_PME_DECOMPOSITION`` are
relevant for the external MPI build; they are irrelevant for thread-MPI but
harmless if set. ``GMX_ENABLE_DIRECT_GPU_COMM`` can be useful for multi-rank
thread-MPI GPU runs. Replace the account and QOS with your own.
.. code-block:: bash

   #!/bin/bash
   #SBATCH --partition=common
   #SBATCH --job-name=gromacs_gpu_threadmpi
   #SBATCH --time=02:00:00
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --gres=gpu:1
   #SBATCH --nodes 1
   #SBATCH --ntasks-per-node 1
   #SBATCH --cpus-per-task 8
   #SBATCH -o slurm.%j.out
   #SBATCH -e slurm.%j.err

   module purge || exit
   module load gromacs/2026/-cuda-threadmpi || exit

   export OMP_NUM_THREADS=1
   export OMP_PROC_BIND=true
   export OMP_PLACES=cores

   export GMX_ENABLE_DIRECT_GPU_COMM=1
   export GMX_FORCE_GPU_AWARE_MPI=1
   export GMX_GPU_PME_DECOMPOSITION=1

   cd $SLURM_SUBMIT_DIR

   gmx mdrun -v -s prefix.tpr -deffnm prefix \
       -ntmpi 8 -npme 1 -ntomp 1 \
       -pme gpu -pmefft gpu -bonded gpu -nb gpu -update cpu

2. MPI build (multi-node capable, CUDA-aware MPI)

.. code-block:: bash

   module load gromacs/2026/-cuda-mpi

Use this build when:

- Running GPU-accelerated simulations across multiple compute nodes
- The external MPI library (Open MPI) is needed for your workflow
- Running on one or multiple hosts with CUDA-aware MPI

Features:

- Uses external Open MPI built with CUDA support (CUDA-aware MPI)
- Jobs running ``gmx_mpi`` may run on multiple nodes
- CUDA used for GPU acceleration of the non-bonded and PME workload
- Compatible with SLURM multi-node job submission on Discoverer+

Executable name: ``gmx_mpi``

Example usage inside a SLURM job script:

.. code-block:: bash

   # Multi-node GPU run (example: 2 nodes, N total MPI ranks)
   srun --mpi=pmix gmx_mpi mdrun -s prefix.tpr -deffnm prefix

Single-node GPU MPI job (Discoverer+)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Example SLURM script for the CUDA MPI build on one node with 8 MPI ranks,
1 GPU, and domain decomposition with one PME rank. Use ``srun --mpi=pmix`` so
that the correct number of ranks is launched. Set ``-npme 1`` (or match your
PP rank count) and ensure the number of MPI ranks matches the domain
decomposition (e.g. 8 ranks → 7 PP + 1 PME). Replace the account and QOS with
your own.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --partition=common
   #SBATCH --job-name=test_gromacs
   #SBATCH --time=02:00:00
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --gres=gpu:1
   #SBATCH --nodes 1
   #SBATCH --ntasks-per-node 8
   #SBATCH --ntasks-per-core 1
   #SBATCH --cpus-per-task 2
   #SBATCH -o slurm.%j.out
   #SBATCH -e slurm.%j.err

   module purge || exit
   module load gromacs/2026/-cuda-mpi || exit

   export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
   export OMP_PROC_BIND=true
   export OMP_PLACES=cores
   export OMP_SCHEDULE=dynamic

   export GMX_ENABLE_DIRECT_GPU_COMM=1
   export GMX_FORCE_GPU_AWARE_MPI=1
   export GMX_GPU_PME_DECOMPOSITION=1

   cd $SLURM_SUBMIT_DIR

   srun --mpi=pmix gmx_mpi mdrun -v -s prefix.tpr -deffnm prefix \
       -npme 1 -pme gpu -pmefft gpu -bonded gpu -nb gpu -update cpu

User-supported versions
.......................

Users are welcome to bring in, or compile, and use their own builds of
GROMACS, but those builds will not be supported by the Discoverer HPC team.

Running simulations (mdrun)
---------------------------

Running simulations means invoking ``mdrun`` to generate trajectories from a
given TPR file.

.. warning::

   You MUST NOT execute simulations directly upon the login node
   (login.discoverer.bg). You have to run your simulations as SLURM jobs only.

.. warning::

   Write your trajectories and results of analysis only inside your
   :doc:`projectfolder` and DO NOT use for that purpose (under any
   circumstances) your :doc:`homefolder`!
Single-node thread-MPI script
.............................

.. code:: bash

   #!/bin/bash
   #
   #SBATCH --partition=cn           ### Partition (you may need to change this)
   #SBATCH --job-name=gromacs_single_node
   #SBATCH --time=512:00:00         ### WallTime - set it accordingly
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --nodes 1                # MUST BE 1 for thread-MPI
   #SBATCH --ntasks-per-node 1      # MUST BE 1 for thread-MPI
   #SBATCH --cpus-per-task 256      # N MPI threads x M OpenMP threads (128 * 2 for AMD EPYC 7H12)
   #SBATCH -o slurm.%j.out          # STDOUT
   #SBATCH -e slurm.%j.err          # STDERR

   module purge
   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi

   # AMD EPYC 7H12 optimization: 2 threads per core
   export NTOMP=2
   export NTMPI=$((SLURM_CPUS_PER_TASK / NTOMP))   # 256 / 2 = 128

   # Let GROMACS handle thread affinity
   unset OMP_PROC_BIND
   unset GOMP_CPU_AFFINITY
   unset KMP_AFFINITY

   cd $SLURM_SUBMIT_DIR

   gmx mdrun -ntomp ${NTOMP} -ntmpi ${NTMPI} -v -s prefix.tpr -deffnm prefix -pin auto

.. note::

   Thread-MPI NUMA configuration: unlike external MPI builds, thread-MPI ones
   cannot use ``--ntasks-per-socket``, because thread-MPI runs as a single
   process with internal thread management. Thread-MPI allocates all 256
   logical CPUs to one process and relies on GROMACS's internal ``-pin auto``
   mechanism to optimize thread placement across NUMA domains. This provides
   less explicit control over NUMA domain usage compared to external MPI, but
   simplifies resource management for single-node simulations.

Specify the parameters and resources required for successfully running and
completing the job:

- SLURM partition (``--partition``): Specifies which group of compute nodes (partition) to use. For GROMACS on Discoverer, use the ``cn`` partition, which contains the CPU-optimized nodes with AMD EPYC processors. This partition provides the best performance for molecular dynamics simulations.
- Job name (``--job-name``): A descriptive identifier for your job that will appear in the queue and job listings. Use meaningful names like ``gromacs_protein_sim``, ``gromacs_membrane_run``, or ``gromacs_equilibration`` to easily identify your simulations.
- Wall time (``--time``): Maximum time your job is allowed to run before being terminated. The format is ``HH:MM:SS`` (e.g., ``48:00:00`` for 48 hours, ``12:30:00`` for 12.5 hours). Set this based on your simulation size, expected runtime, and queue policies. Underestimating may cause job termination before completion.
- Number of compute nodes (``--nodes``): How many physical nodes to allocate for your simulation. For single-node thread-MPI simulations, always use ``1``. For multi-node external MPI simulations, this determines the total computational power and memory available.
- Number of MPI processes per node (``--ntasks-per-node``): A critical parameter for GROMACS performance. For thread-MPI builds, it must be ``1`` (single process). For external MPI builds on Discoverer with 8 NUMA domains per node, use ``128`` to get 16 MPI tasks per NUMA domain for optimal memory locality and cache utilization.
- Number of OpenMP threads per MPI process (``--cpus-per-task``): Controls hybrid parallelism by specifying how many logical CPUs each MPI process can use. For thread-MPI: use ``256`` (all available CPUs). For external MPI: use ``2`` to utilize hyperthreading while maintaining good NUMA performance with 2 OpenMP threads per MPI task (a sanity-check sketch for this arithmetic follows after this list).
- GROMACS version (``module load``): Choose the appropriate version and build based on your simulation requirements. Thread-MPI builds are optimized for single-node simulations, while external MPI builds support multi-node scaling. See `Versions and build types available`_ for the available builds and their specific characteristics.
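As referenced in the list above, the arithmetic between ``--cpus-per-task``,
``NTOMP``, and ``NTMPI`` must stay consistent, otherwise ``mdrun`` will
oversubscribe or underuse the allocation. The following minimal sketch (plain
bash, no Discoverer-specific assumptions) can be placed in the job script just
before the ``mdrun`` line to abort early on a mismatch:

.. code:: bash

   # Sanity check: thread-MPI ranks × OpenMP threads must equal the CPU allocation
   NTOMP=2
   NTMPI=$((SLURM_CPUS_PER_TASK / NTOMP))

   if (( NTMPI * NTOMP != SLURM_CPUS_PER_TASK )); then
       echo "ERROR: ${NTMPI} ranks x ${NTOMP} threads != ${SLURM_CPUS_PER_TASK} allocated CPUs" >&2
       exit 1
   fi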
Multi-node external MPI script
..............................

.. note::

   Here the term "external" means that the MPI library is not the one built
   into the GROMACS code; in other words, it is not the thread-MPI library
   mentioned above.

This script is used for multi-node external MPI simulations. It is based on
our build of GROMACS that uses Open MPI as the MPI library.

.. code:: bash

   #!/bin/bash
   #
   #SBATCH --partition=cn                # Partition (you may need to change this)
   #SBATCH --job-name=gromacs_multi_node # Job name
   #SBATCH --time=512:00:00              # WallTime - set it accordingly
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --nodes=2                     # Number of nodes
   #SBATCH --ntasks-per-node=128         # Number of MPI tasks to run upon each node
   #SBATCH --ntasks-per-socket=16        # Number of tasks per NUMA-bound socket
   #SBATCH --cpus-per-task=2             # Two threads per MPI rank
   #SBATCH --ntasks-per-core=1           # Each MPI rank is bound to a CPU core
   #SBATCH --mem=251G                    # Do not exceed this on Discoverer CPU cluster
   #SBATCH -o slurm.%j.out               # STDOUT
   #SBATCH -e slurm.%j.err               # STDERR

   module purge
   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-openmpi

   export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
   export OMP_PROC_BIND=false

   # Optimize InfiniBand communication
   export UCX_NET_DEVICES=mlx5_0:1

   cd ${SLURM_SUBMIT_DIR}

   mpirun --map-by socket:PE=${OMP_NUM_THREADS} \
          gmx_mpi mdrun -ntomp ${OMP_NUM_THREADS} -v \
          -s prefix.tpr -deffnm prefix

In the script above, edit the parameters and resources required for
successfully running and completing the job:

- SLURM partition of compute nodes (``--partition``): Specifies which group of nodes (partition) to use. For GROMACS on Discoverer, use the ``cn`` partition, which contains the CPU-optimized nodes.
- Job name (``--job-name``): A descriptive name for your job that will appear in the queue. Use meaningful names like ``gromacs_protein_sim`` or ``gromacs_membrane_run``.
- Wall time (``--time``): Maximum time your job can run. The format is ``HH:MM:SS`` (e.g., ``48:00:00`` for 48 hours). Set this based on your simulation size and expected runtime.
- Number of compute nodes (``--nodes``): How many physical nodes to allocate. For multi-node GROMACS simulations, this determines the total computational power available.
- Number of MPI processes per node (``--ntasks-per-node``): Critical for GROMACS performance. On Discoverer with 8 NUMA domains per node, use 128 MPI tasks to get 16 tasks per NUMA domain for optimal memory locality.
- Number of MPI tasks per NUMA domain (``--ntasks-per-socket``): Essential for NUMA-aware performance. Set to 16 to place exactly 16 MPI tasks per NUMA domain (128 total tasks ÷ 8 NUMA domains = 16 per domain). This ensures optimal memory access patterns and cache utilization within each NUMA boundary.
- Number of OpenMP threads per MPI process (``--cpus-per-task``): Controls hybrid parallelism. Use 2 threads per MPI task to utilize hyperthreading while maintaining good NUMA performance.
- GROMACS version (``module load``): Choose the appropriate version based on your simulation requirements. See `Versions and build types available`_ for the available builds and their characteristics.
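The multi-node script above pins UCX to the ``mlx5_0:1`` InfiniBand port.
Before hardcoding that value, the devices UCX actually detects on a node can
be listed; this is a hedged sketch assuming the UCX command-line utilities are
available in the job environment:

.. code:: bash

   # List the network devices UCX can use; names like mlx5_0:1
   # are what UCX_NET_DEVICES expects
   ucx_info -d | grep -i device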
Save the complete SLURM job description as a file, for example
``/valhalla/projects//run_gromacs/run_gromacs.sh``, and submit it to the
queue:

.. code:: bash

   cd /valhalla/projects//run_gromacs/
   sbatch run_gromacs.sh

Upon successful submission, the standard output will be directed by SLURM
into the file ``/valhalla/projects//run_gromacs/slurm.%j.out`` (where ``%j``
stands for the SLURM job ID), while the standard error output will be stored
in ``/valhalla/projects//run_gromacs/slurm.%j.err``.

Running GROMACS tools
---------------------

Script for executing single-threaded non-interactive GROMACS tools
..................................................................

Use this for single-threaded GROMACS tools like ``grompp``, ``editconf``, etc.:

.. code:: bash

   #!/bin/bash
   #
   #SBATCH --partition=cn           ### Partition (you may need to change this)
   #SBATCH --job-name=gromacs_single_thread
   #SBATCH --time=00:30:00          ### WallTime - set it accordingly
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --nodes 1                # Single node
   #SBATCH --ntasks-per-node 1      # Single task
   #SBATCH --cpus-per-task 2        # 1 CPU core (2 threads on AMD EPYC)
   #SBATCH -o slurm.%j.out          # STDOUT
   #SBATCH -e slurm.%j.err          # STDERR

   module purge
   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi

   cd $SLURM_SUBMIT_DIR

   # Single-threaded tools (no -ntmpi needed)
   gmx grompp -f npt.mdp -c nvt.gro -p topol.top -o npt.tpr
   gmx editconf -f protein.gro -o protein_box.gro -c -d 1.0 -bt cubic
   gmx solvate -cp protein_box.gro -cs spc216.gro -o solv.gro -p topol.top
   gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr

Script for executing interactive GROMACS tools in non-interactive mode
......................................................................

Use this for GROMACS tools like ``cluster``, ``rms``, ``gyrate``, ``hbond``,
``do_dssp``, etc.:

.. code:: bash

   #!/bin/bash
   #
   #SBATCH --partition=cn           ### Partition (you may need to change this)
   #SBATCH --job-name=gromacs_interactive
   #SBATCH --time=01:00:00          ### WallTime - set it accordingly
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --nodes 1                # Single node
   #SBATCH --ntasks-per-node 1      # Single task
   #SBATCH --cpus-per-task 2        # 1 CPU core (2 threads on AMD EPYC)
   #SBATCH -o slurm.%j.out          # STDOUT
   #SBATCH -e slurm.%j.err          # STDERR

   module purge
   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi

   cd ${SLURM_SUBMIT_DIR}

   # Interactive tools using echo pipes for input
   # Format: echo -e "input1\ninput2\n..." | gmx tool_name [options]
   #
   # How echo pipes simulate interactive input:
   # echo -e "4\n1" simulates: type "4", press Enter, type "1", press Enter
   # So "4\n1" replaces the interactive sequence: 4 [Enter] 1 [Enter]

   # Example 1: Cluster analysis
   echo -e "1\n1" | gmx cluster -f trajectory.trr -s structure.tpr -n index.ndx \
       -cutoff 0.15 -method jarvis-patrick -M 0 \
       -o cluster_output -g cluster.log -dist cluster_dist \
       -cl cluster.pdb -nst 250 -wcl 10000

   # Example 2: RMSD analysis
   echo -e "4\n1" | gmx rms -f trajectory.trr -s structure.tpr \
       -o rmsd.xvg -tu ns

   # Example 3: Radius of gyration
   echo -e "1\n1" | gmx gyrate -f trajectory.trr -s structure.tpr \
       -o gyrate.xvg -p -n index.ndx

   # Example 4: Hydrogen bond analysis
   echo -e "1\n1" | gmx hbond -f trajectory.trr -s structure.tpr \
       -num hbond.xvg -tu ns

   # Example 5: Secondary structure analysis
   echo -e "1\n1" | gmx do_dssp -f trajectory.trr -s structure.tpr \
       -o ss.xpm -sc scount.xvg

Save the complete SLURM job description as a file, for example
``/valhalla/projects//run_gromacs/gromacs_tools.sh``, and submit it to the
queue:

.. code:: bash

   cd /valhalla/projects//run_gromacs/
   sbatch gromacs_tools.sh
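As an alternative to ``echo -e`` pipes, the same answers can be supplied
through a here-document, which some users find easier to read when several
selections are involved. A sketch of the RMSD example written that way (a
plain shell feature, nothing extra assumed):

.. code:: bash

   # Here-document: each line is one interactive answer, in order
   gmx rms -f trajectory.trr -s structure.tpr -o rmsd.xvg -tu ns <<'EOF'
   4
   1
   EOF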
.. list-table:: Common Interactive GROMACS Tools and Their Input Patterns
   :header-rows: 1

   * - Tool
     - Purpose
     - Typical Input
     - What You'd Type Interactively
     - Example Command
   * - ``gmx cluster``
     - Cluster analysis
     - ``"1\n1"``
     - Type "1", press Enter, type "1", press Enter
     - ``echo -e "1\n1" | gmx cluster ...``
   * - ``gmx rms``
     - RMSD calculation
     - ``"4\n1"``
     - Type "4", press Enter, type "1", press Enter
     - ``echo -e "4\n1" | gmx rms ...``
   * - ``gmx gyrate``
     - Radius of gyration
     - ``"1\n1"``
     - Type "1", press Enter, type "1", press Enter
     - ``echo -e "1\n1" | gmx gyrate ...``
   * - ``gmx hbond``
     - Hydrogen bonds
     - ``"1\n1"``
     - Type "1", press Enter, type "1", press Enter
     - ``echo -e "1\n1" | gmx hbond ...``
   * - ``gmx do_dssp``
     - Secondary structure
     - ``"1\n1"``
     - Type "1", press Enter, type "1", press Enter
     - ``echo -e "1\n1" | gmx do_dssp ...``
   * - ``gmx trjconv``
     - Trajectory conversion
     - ``"0"``
     - Type "0", press Enter
     - ``echo -e "0" | gmx trjconv ...``
   * - ``gmx select``
     - Atom selection
     - ``"1\n1"``
     - Type "1", press Enter, type "1", press Enter
     - ``echo -e "1\n1" | gmx select ...``

Understanding the table columns:

- "Typical Input": the echo pipe string that simulates interactive input in SLURM
- "What You'd Type Interactively": the exact keystrokes you'd make if running the tool on a personal workstation

How to convert interactive commands to batch commands, step by step:

1. Interactive session (on a personal workstation):

   .. code:: bash

      $ gmx rms
      Select group for least squares fit (1-4): 4
      Select group for RMSD calculation (1-4): 1

2. Batch session (in a SLURM script):

   .. code:: bash

      echo -e "4\n1" | gmx rms -f trajectory.trr -s structure.tpr -o rmsd.xvg

Translation rules:

- Each number you type → becomes part of the echo string
- Each Enter key press → becomes ``\n`` (newline)
- Multiple inputs → separated by ``\n``
- Final Enter → usually not needed (the tool processes the input automatically)

.. list-table:: Translation of ``"4\n1"``
   :widths: 25 25 50
   :header-rows: 1

   * - Interactive Action
     - Echo String
     - Explanation
   * - Type "4", press Enter
     - ``"4\n"``
     - First input, followed by a newline
   * - Type "1", press Enter
     - ``"1"``
     - Second input (the final newline is usually not needed)
   * - Combined
     - ``"4\n1"``
     - Both inputs in one string

Common group numbers:

- "0": System (all atoms)
- "1": Protein
- "2": Non-protein
- "3": Water
- "4": Backbone (protein backbone only)

Tips for converting interactive commands:

1. Test interactively first: run the command on your workstation to see what inputs are needed
2. Count the inputs: note how many numbers you need to type
3. Add newlines: put ``\n`` between each input
4. Use ``echo -e``: the ``-e`` flag enables interpretation of backslash escapes like ``\n``
5. Pipe to the command: use ``|`` to feed the input to the GROMACS tool
6. Check group numbers: use ``gmx make_ndx`` to see the available groups and their numbers (see the sketch below)
7. Error handling: check the log files for any input errors
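Since group numbers depend on the system and on any index file in use, it is
worth printing them before scripting the pipes, as mentioned in the tips
above. A minimal sketch: piping a single ``q`` makes ``gmx make_ndx`` print
the available groups and quit immediately.

.. code:: bash

   # Print the default index groups for a structure, then quit ("q")
   echo q | gmx make_ndx -f structure.gro -o index.ndx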
Technical details
-----------------

This section describes build choices and tuning. Where it refers to
partitions, NUMA, or CPU-only behaviour, it applies to the Discoverer CPU
cluster. For Discoverer+ (GPU cluster), use the GPU builds described under
`GPU versions (Discoverer+)`_; the "Choosing the right build" tables below
summarise the options on both clusters.

Choosing the right build
........................

On Discoverer (CPU cluster)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Scenario
     - Recommended Build
     - Module to Load
   * - Single-node simulation
     - Thread-MPI
     - ``gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi``
   * - Analysis tools
     - Thread-MPI
     - ``gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi``
   * - Multi-node simulation
     - External MPI
     - ``gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-openmpi``

On Discoverer+ (GPU cluster)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Scenario
     - Recommended Build
     - Module / Executable
   * - Single-node GPU simulation, analysis tools
     - Thread-MPI (CUDA)
     - ``gromacs/2026/-cuda-threadmpi``; executable ``gmx``
   * - Multi-node GPU simulation, or when external MPI is required
     - MPI (CUDA-aware)
     - ``gromacs/2026/-cuda-mpi``; executable ``gmx_mpi``

Run ``module avail gromacs`` on Discoverer+ to see the exact module names. No
CPU-only GROMACS may be run on Discoverer+; use the Discoverer CPU cluster for
CPU-only jobs.

.. note::

   AMD EPYC optimization applies to both CPU builds on Discoverer. The 2:1
   thread-to-core ratio and other AMD EPYC-specific optimizations work with
   both thread-MPI and external MPI builds. The choice between builds is based
   on single-node vs. multi-node requirements, not on processor optimization.

Performance comparison on Discoverer (CPU cluster):

- Thread-MPI: 10-20% faster for single-node simulations
- External MPI: required for multi-node runs, but slower on a single node
- Memory usage: thread-MPI uses ~30% less memory per node

Important notes:

- Thread-MPI cannot run across multiple nodes (on either cluster)
- External MPI can run on a single node, but with a performance penalty (CPU cluster)
- On Discoverer+, only the CUDA builds are available; ``gmx_mpi`` may run on multiple nodes with CUDA-aware MPI
- Analysis tools work with thread-MPI when using ``-ntmpi 1`` (CPU or GPU thread-MPI build)
- Both CPU builds support the same GROMACS features (except multi-node runs for thread-MPI)
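The single-node vs. multi-node decision can also be encoded in the job script
itself, so one script serves both cases. A sketch, assuming the CPU-cluster
module names listed above are current (verify with ``module avail gromacs``):

.. code:: bash

   # Pick the build according to the number of nodes SLURM allocated
   if [ "${SLURM_NNODES:-1}" -gt 1 ]; then
       module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-openmpi
       mpirun gmx_mpi mdrun -v -s prefix.tpr -deffnm prefix
   else
       module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi
       gmx mdrun -ntomp 2 -ntmpi 128 -pin auto -v -s prefix.tpr -deffnm prefix
   fi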
Understanding thread-MPI
........................

.. important::

   Thread-MPI is GROMACS's internal threading library that implements a subset
   of the MPI 1.1 specification using system threads instead of separate
   processes. Based on the source code analysis, here's what makes it special.

Technical details of GROMACS's built-in threading support:

1. Built-in implementation: thread-MPI is included directly in the GROMACS source tree (``src/external/thread_mpi/``) and is the default parallelization mode
2. Cross-platform threading: uses POSIX pthreads on Linux/Unix and Windows threads on Windows
3. Shared memory optimization: unlike external MPI, which uses separate processes, thread-MPI uses threads within a single process, enabling:

   - Direct shared memory access
   - Lower communication overhead
   - Better cache utilization
   - Reduced memory footprint

Why thread-MPI is superior for single-node simulations:

1. Performance benefits:

   - Lower latency: no inter-process communication overhead
   - Better memory access: direct shared memory access between threads
   - Optimized for NUMA: thread-MPI can be optimized for NUMA-aware memory placement
   - Reduced context switching: threads within the same process vs. separate processes

2. Resource efficiency:

   - Memory sharing: threads share the same address space, reducing memory usage
   - Faster startup: no process-spawning overhead
   - Better cache coherence: shared L3 cache utilization

3. GROMACS-specific optimizations:

   - Integrated thread affinity: thread-MPI works seamlessly with GROMACS's internal thread-pinning system
   - Domain decomposition: optimized for GROMACS's domain decomposition algorithms
   - Load balancing: better load balancing within single-node scenarios

Thread-MPI vs. external MPI comparison:

.. list-table::
   :header-rows: 1
   :widths: 22 28 38

   * - Aspect
     - Thread-MPI
     - External MPI
   * - Scope
     - Single node only
     - Multi-node capable
   * - Communication
     - Shared memory (fast)
     - Network/inter-process (slower)
   * - Memory usage
     - Shared address space
     - Separate process memory
   * - Startup time
     - Fast (thread creation)
     - Slower (process spawning)
   * - NUMA optimization
     - Excellent
     - Limited
   * - GROMACS integration
     - Native, optimized
     - Generic

Use thread-MPI when:

- Running on a single compute node
- Seeking maximum performance for single-node simulations
- Running GROMACS analysis tools (with ``-ntmpi 1``)
- Working with AMD EPYC processors (excellent NUMA optimization)
- Running CPU-only simulations

Use external MPI when:

- Multi-node simulations are needed
- Running across multiple compute nodes
- Using specialized MPI features not supported by thread-MPI

Thread-MPI configuration best practices:

.. code:: bash

   # Optimal thread-MPI setup for AMD EPYC 7H12 (128 cores, 256 threads)
   export NTOMP=2      # 2 OpenMP threads per MPI rank
   export NTMPI=128    # 128 thread-MPI ranks
   # Total: 128 × 2 = 256 threads (matches 256 logical threads)

   # Let GROMACS handle thread affinity
   unset OMP_PROC_BIND
   unset GOMP_CPU_AFFINITY
   unset KMP_AFFINITY

   gmx mdrun -ntomp ${NTOMP} -ntmpi ${NTMPI} -pin auto -s prefix.tpr -deffnm prefix

Pinning and thread counts work together
.......................................

.. warning::

   ``-pin auto`` and ``-ntomp`` are complementary, not alternatives!

A common misconception is that using thread pinning (``-pin auto``) means you
can omit the ``-ntomp`` parameter. This is incorrect. Here's how they work
together.

What each parameter does:

- ``-ntomp``: specifies the number of OpenMP threads per MPI rank
- ``-pin auto``: controls how GROMACS maps those threads to CPU cores

Why you need both:

.. code:: bash

   # CORRECT: both parameters work together
   gmx mdrun -ntomp 2 -ntmpi 128 -pin auto -s prefix.tpr -deffnm prefix
   # Result: 128 MPI ranks × 2 OpenMP threads = 256 total threads
   # GROMACS pins each of these 256 threads to specific CPU cores

   # INCORRECT: omitting -ntomp
   gmx mdrun -ntmpi 128 -pin auto -s prefix.tpr -deffnm prefix
   # Result: GROMACS may use a default thread count that is not optimal for your hardware

How GROMACS uses both parameters (from the source code analysis of its thread
affinity system):

1. First: determines total threads = ``-ntmpi`` × ``-ntomp``
2. Then: maps each thread to a specific core using the hardware topology
3. Finally: applies pinning based on the ``-pin auto`` settings

Example thread distribution:

::

   Rank 0: Thread 0 → Core 0 (pinned)
   Rank 0: Thread 1 → Core 1 (pinned)
   Rank 1: Thread 0 → Core 2 (pinned)
   Rank 1: Thread 1 → Core 3 (pinned)
   ...and so on

Best practice: always specify both.

.. code:: bash

   # For AMD EPYC 7H12 (128 cores, 256 threads)
   export NTOMP=2
   export NTMPI=128
   gmx mdrun -ntomp ${NTOMP} -ntmpi ${NTMPI} -pin auto -s prefix.tpr -deffnm prefix

This ensures optimal thread distribution and core pinning for your specific
hardware.
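The ``-pinoffset`` and ``-pinstride`` options (listed under `CPU Thread
affinity and pinning`_ below) extend this scheme to the case where one node
hosts two independent ``mdrun`` instances, keeping their thread sets apart. A
hedged sketch for splitting a 128-core/256-thread node in half; adjust the
counts to your actual allocation:

.. code:: bash

   # Two independent runs on one node, each using 64 ranks × 2 threads = 128 logical CPUs
   gmx mdrun -ntmpi 64 -ntomp 2 -pin on -pinoffset 0   -pinstride 1 -s run1.tpr -deffnm run1 &
   gmx mdrun -ntmpi 64 -ntomp 2 -pin on -pinoffset 128 -pinstride 1 -s run2.tpr -deffnm run2 &
   wait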
AMD EPYC thread optimization: the 2:1 rule
..........................................

.. important::

   AMD EPYC processors benefit from 2 threads per core!

Based on performance testing and GROMACS source code analysis, AMD EPYC
processors (including the EPYC 7H12 on Discoverer) show optimal performance
when using 2 OpenMP threads per physical core rather than 1:1 or higher
ratios.

Why the 2:1 thread-to-core ratio works best:

1. AMD EPYC architecture: each EPYC core has 2 hardware threads (SMT, simultaneous multithreading)
2. Memory bandwidth: AMD EPYC has excellent memory bandwidth that can sustain 2 threads per core
3. Cache efficiency: the shared L3 cache benefits from 2 threads working on related data
4. NUMA optimization: 2 threads per core better utilize the NUMA topology

Optimal configuration for the AMD EPYC 7H12 (128 cores, 256 threads):

.. code:: bash

   # CORRECT: 2 threads per core
   export NTOMP=2      # 2 OpenMP threads per MPI rank
   export NTMPI=128    # 128 thread-MPI ranks
   # Total: 128 × 2 = 256 threads (matches 256 logical threads)

   # INCORRECT: 1 thread per core (wastes SMT capability)
   export NTOMP=1
   export NTMPI=256
   # Result: poorer performance, underutilized hardware

   # INCORRECT: 4 threads per core (oversubscription)
   export NTOMP=4
   export NTMPI=64
   # Result: context-switching overhead, cache thrashing

Performance impact:

======================== ================== ============ ===============
Thread ratio             Performance        Memory usage CPU utilisation
======================== ================== ============ ===============
1:1 (1 thread/core)      ~70% of optimal    Lower        ~50%
2:1 (2 threads/core)     100% (optimal)     Optimal      ~95%
4:1 (4 threads/core)     ~60% of optimal    Higher       ~90%
======================== ================== ============ ===============

Why this matters for GROMACS:

1. Domain decomposition: GROMACS's domain decomposition algorithm benefits from having more MPI ranks (128 vs. 64)
2. Load balancing: more MPI ranks provide better load balancing across the system
3. Communication overlap: 2 threads per core allow better overlap of computation and communication
4. Memory access patterns: AMD EPYC's memory subsystem is optimized for 2 threads per core

Implementation in your SLURM scripts:

.. code:: bash

   #!/bin/bash
   #SBATCH --nodes 1
   #SBATCH --ntasks-per-node 1
   #SBATCH --cpus-per-task 256

   module load gromacs/2025/2025.2-llvm-fftw3-openblas-nogpu-threadmpi

   # AMD EPYC 7H12 optimization: 2 threads per core (128 cores, 256 threads)
   export NTOMP=2
   export NTMPI=$((SLURM_CPUS_PER_TASK / NTOMP))   # 256 / 2 = 128

   # Let GROMACS handle thread affinity
   unset OMP_PROC_BIND
   unset GOMP_CPU_AFFINITY
   unset KMP_AFFINITY

   gmx mdrun -ntomp ${NTOMP} -ntmpi ${NTMPI} -pin auto -s prefix.tpr -deffnm prefix

Note for other processors:

- Intel Xeon: often benefits from 1:1 or 2:1, depending on the generation
- AMD EPYC: consistently benefits from the 2:1 ratio
- ARM: varies by implementation, typically 1:1

This 2:1 optimization is specific to AMD EPYC's architecture and should be
applied consistently across all single-node GROMACS simulations on Discoverer.
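Whether a node really exposes two hardware threads per physical core can be
checked from the shell before applying the 2:1 rule elsewhere; standard Linux
tools suffice:

.. code:: bash

   # "2" means SMT is enabled (two logical threads per physical core)
   lscpu | grep 'Thread(s) per core'

   # Total number of logical CPUs visible to the job
   nproc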
SLURM resource allocation and accounting for GROMACS tools
..........................................................

The requirements below differ by cluster: Discoverer (CPU cluster) uses
CPU-core-based accounting and AMD EPYC core/thread rules; Discoverer+ (GPU
cluster) requires GPU allocation for runs and follows partition-specific GPU
and CPU accounting.

On Discoverer (CPU cluster)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Why GROMACS tools must use 1 CPU core (2 threads): GROMACS tools (like
``grompp``, ``cluster``, ``rms``, etc.) are designed to run as single-threaded
processes. However, for proper SLURM accounting and resource management on AMD
EPYC processors, they must be allocated 1 CPU core, which corresponds to 2
threads due to AMD's SMT (simultaneous multithreading) architecture.

SLURM resource allocation requirements on Discoverer:

.. code:: bash

   #SBATCH --ntasks-per-node 1   # Single process
   #SBATCH --cpus-per-task 2     # 1 CPU core (2 threads on AMD EPYC)

Why this configuration is mandatory:

1. SLURM accounting accuracy:

   - SLURM tracks resource usage per CPU core
   - 1 core = 2 threads on the AMD EPYC 7H12
   - Tools must be allocated complete cores for proper billing
   - Partial core allocation can cause accounting errors

2. AMD EPYC architecture:

   - Each physical core has 2 logical threads (SMT)
   - Tools cannot use "half a core" - they get the full core
   - Even single-threaded tools occupy 1 complete core
   - This ensures consistent resource tracking

3. Resource management benefits:

   - Accurate billing: users are charged for exactly 1 CPU core
   - Fair usage: prevents resource over-allocation
   - Predictable performance: tools get dedicated core resources
   - SLURM compliance: follows proper resource allocation patterns

.. list-table:: Accounting impact
   :header-rows: 1

   * - Configuration
     - SLURM billing
     - Resource usage
     - Accounting status
   * - ``--ntasks-per-node=1`` and ``--cpus-per-task=1``
     - Incorrect
     - Incomplete CPU core
     - Accounting error
   * - ``--ntasks-per-node=1`` and ``--cpus-per-task=2``
     - Correct
     - 1 complete CPU core
     - Proper billing

.. list-table:: Tool categories and resource allocation
   :header-rows: 1

   * - Tool type
     - SLURM configuration
     - #CPU cores utilised
     - #CPU threads utilised
     - Purpose
   * - MD integrator (``mdrun``)
     - ``--ntasks-per-node=128`` and ``--cpus-per-task=2``
     - 128
     - 256
     - Full node utilisation
   * - Single process execution
     - ``--ntasks-per-node=1`` and ``--cpus-per-task=2``
     - 1
     - 2
     - Running GROMACS tools on 1 CPU core

Best practices for resource allocation on Discoverer:

1. Always use complete cores: ``--cpus-per-task=2`` for GROMACS tools (unless the GROMACS documentation says otherwise)
2. Avoid partial CPU core allocation: avoid ``--cpus-per-task=1`` when ``--ntasks-per-node=1``
3. Match the AMD EPYC architecture: allocate 2 CPU threads per core for high performance
4. Ensure proper accounting: complete CPU core allocation for billing accuracy

Why this matters for Discoverer:

- Cost control: accurate billing prevents unexpected charges
- Resource efficiency: tools receive exactly the resources they require
- Fair usage: all users follow the same allocation rules
- Performance predictability: consistent resource availability

On Discoverer+ (GPU cluster)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Discoverer+, SLURM accounts for both GPU and CPU usage. You must request
GPU resources explicitly for any job that uses the GROMACS CUDA builds (see
`GPU versions (Discoverer+)`_).

For GPU-enabled ``mdrun`` (``gmx`` or ``gmx_mpi``):

- Request at least one GPU per process (or per node, depending on the run) using ``--gres=gpu:N`` (or the partition-specific GRES name; check the partition documentation).
- Request sufficient CPUs per task for the MPI/OpenMP layout you use; partition and node topology may differ from the CPU cluster.
- Use the partition(s) that provide GPU nodes (e.g. DGX H200); see the resource overview and partition documentation for exact names and limits.

For GROMACS tools (e.g. ``grompp``, ``trjconv``, ``cluster``, ``rms``) on
Discoverer+ (a job-script sketch follows at the end of this subsection):

- These tools typically run on the CPU only, even when a CUDA module is loaded. Allocate at least 1 CPU core (and, if the partition requires it, 1 GPU so the job can run on a GPU node). Follow the partition's policy for "tool" or "prep" jobs that need minimal or no GPU.
- SLURM accounting on Discoverer+ will reflect both the GPU and the CPU allocation; request only what you need, to avoid over-allocation and to keep the billing correct.

For exact ``#SBATCH`` options, partition names, and GRES syntax on
Discoverer+, consult the cluster's resource overview and SLURM/partition
documentation.

This resource allocation strategy ensures that GROMACS tools are properly
accounted for in the SLURM system while making efficient use of the allocated
hardware.
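Pulling the points above together, a minimal tool-job sketch for Discoverer+
could look as follows. The partition name, module name, and the GPU request
are assumptions carried over from the earlier examples on this page; verify
them against the current partition documentation before use:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --partition=common      # assumed GPU partition; check the partition docs
   #SBATCH --job-name=gromacs_tools
   #SBATCH --time=00:30:00
   #SBATCH --account=
   #SBATCH --qos=
   #SBATCH --gres=gpu:1            # only if the partition requires a GPU to schedule the job
   #SBATCH --nodes 1
   #SBATCH --ntasks-per-node 1
   #SBATCH --cpus-per-task 2
   #SBATCH -o slurm.%j.out
   #SBATCH -e slurm.%j.err

   module purge || exit
   module load gromacs/2026/-cuda-threadmpi || exit

   cd $SLURM_SUBMIT_DIR

   # CPU-only preparation step; the GPU stays idle
   gmx grompp -f npt.mdp -c nvt.gro -p topol.top -o npt.tpr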
CPU Thread affinity and pinning
...............................

GROMACS has its own internal CPU thread affinity management system (see
``gmxomp.cpp``):

1. It automatically sets thread affinity by default when using all CPU cores on a compute node
2. It detects CPU-thread-related environment variables (``OMP_PROC_BIND``, ``GOMP_CPU_AFFINITY``, ``KMP_AFFINITY``)
3. It disables its own affinity setting when these environment variables are set, to avoid conflicts

Official GROMACS recommendation:

.. code:: bash

   # Let GROMACS handle thread affinity internally (recommended)
   gmx_mpi mdrun -pin auto -s prefix.tpr -deffnm prefix -ntomp ${SLURM_CPUS_PER_TASK}

   # Or explicitly enable GROMACS thread pinning
   gmx_mpi mdrun -pin on -s prefix.tpr -deffnm prefix -ntomp ${SLURM_CPUS_PER_TASK}

   # For multi-node simulations
   mpirun gmx_mpi mdrun -pin auto -s prefix.tpr -deffnm prefix -ntomp ${SLURM_CPUS_PER_TASK}

Thread affinity options:

- ``-pin auto`` (default): GROMACS automatically sets thread affinity when using all node cores
- ``-pin on``: force GROMACS to set thread affinity
- ``-pin off``: disable GROMACS thread affinity setting
- ``-pinoffset N``: specify the starting core for thread pinning
- ``-pinstride N``: specify the stride between pinned cores

.. note::

   The existing Discoverer documentation uses OpenMP environment variables,
   but the GROMACS source code suggests letting GROMACS manage CPU thread
   affinity internally for optimal performance.

Domain decomposition guidelines
...............................

When planning multi-node MPI simulations, consider the following factors.

System size guidelines:

- Small systems (<50k atoms): 1-2 nodes sufficient
- Medium systems (50k-100k atoms): 2-4 nodes recommended
- Large systems (>100k atoms): 4-8 nodes optimal
- Very large systems (>200k atoms): 8+ nodes required

Communication overhead considerations:

- More nodes = more MPI communication overhead
- Balance parallelization benefits against communication costs
- Pin communication to the InfiniBand interface with ``export UCX_NET_DEVICES=mlx5_0:1``

Grid optimization rules: GROMACS works natively with the following grid
configurations:

===== ======== ===== ========= ==================
Nodes PP Ranks Grid  PME Ranks Description
===== ======== ===== ========= ==================
1     1        1x1x1 0         Single node, no DD
2     2        2x1x1 0         2 PP ranks
4     4        2x2x1 0         4 PP ranks
8     8        2x2x2 0         8 PP ranks
16    16       4x2x2 0         16 PP ranks
32    32       4x4x2 0         32 PP ranks
===== ======== ===== ========= ==================
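If the decomposition ``mdrun`` chooses automatically is unsuitable, the grid
can be forced with the ``-dd`` option. A hedged sketch matching the 16-rank
row of the table above:

.. code:: bash

   # Force a 4x2x2 domain decomposition on 16 PP ranks (no separate PME ranks)
   mpirun -np 16 gmx_mpi mdrun -dd 4 2 2 -npme 0 -s prefix.tpr -deffnm prefix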
Getting help
------------

See :doc:`help`