AmberTools (CPU)¶
About¶
According to the AmberTools website, AmberTools is a comprehensive suite of biomolecular simulation tools that works alongside the AMBER molecular dynamics package. It provides a collection of programs for setting up, running, and analysing molecular dynamics simulations, with a focus on biomolecular systems such as proteins, nucleic acids, and small molecules.
AmberTools is freely available and open-source, providing extensive functionality for preparing simulations, analysing trajectories, and performing computational chemistry calculations. Unlike AMBER’s pmemd.MPI, AmberTools has no licensing restrictions and can be used by both academic and commercial users on Discoverer CPU cluster.
This document describes running AmberTools on Discoverer CPU cluster.
Documentation about how to use AmberTools is available here: https://ambermd.org/Manuals.php
Note
If you want to run pmemd.MPI on Discoverer CPU cluster, see pmemd.MPI (CPU).
Versions available¶
Currently we support the following versions of AmberTools:
- 24
- 25
To check which AmberTools versions are currently supported on Discoverer, execute on the login node:
module avail ambertools
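The output lists the available builds. To make a particular build available in your shell session, load its module and check that the executables are on your PATH. A brief example follows (the module name matches the one used in the job scripts later in this document; adjust it to one of the listed versions):

module avail ambertools          # list the available AmberTools builds
module load ambertools/24/24.0   # load one of the listed builds
which sander cpptraj tleap       # verify that the executables are now in PATH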
AmberTools programs¶
AmberTools includes a wide variety of tools beyond the sander molecular dynamics engines (described in Running the tools below). Some of the most important ones are:
Structure preparation¶
- tleap (tLEaP): Text-based LEaP for molecular structure preparation, topology generation, and system setup. Build and modify molecular structures, assign force field parameters, solvate systems (water, ions), and create topology and coordinate files from the command line (see the example below).
- xleap (xLEaP): Graphical LEaP (X11-based) for molecular structure preparation. Same functionality as tleap with a graphical user interface; requires an X11 display and is useful for interactive structure building.
- parmed (ParmEd): Parameter file editor and molecular structure manipulation. Edit topology files, add/remove atoms, bonds, angles, and dihedrals, modify force field parameters, and combine multiple structures.
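As an illustration of a typical tleap workflow, the sketch below builds a solvated system from a prepared PDB file. The file names (protein.pdb, complex.prmtop, complex.inpcrd), the chosen force fields, and the 10 Å water buffer are placeholders; adapt them to your own system:

# Write a minimal tLEaP input file and run it (a sketch; file names are placeholders)
cat > tleap.in << 'EOF'
source leaprc.protein.ff14SB
source leaprc.water.tip3p
mol = loadpdb protein.pdb
solvatebox mol TIP3PBOX 10.0
addions mol Na+ 0
saveamberparm mol complex.prmtop complex.inpcrd
quit
EOF
tleap -f tleap.in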
Simulation analysis¶
- cpptraj: Powerful trajectory analysis tool (formerly ptraj, serial version). Analyses trajectories from multiple MD engines (AMBER, GROMACS, CHARMM, NAMD), calculates geometric properties (distances, angles, RMSD), and provides hydrogen bond analysis, secondary structure analysis, clustering, principal component analysis, and extensive scripting capabilities (see the example below).
- cpptraj.MPI: Parallel MPI version of cpptraj for multi-node trajectory analysis. Distributes analysis across multiple compute nodes; suitable for large trajectories or computationally intensive analyses.
- cpptraj.OMP: OpenMP parallel version of cpptraj for shared-memory trajectory analysis. Uses threading for parallelisation within a single node; suitable for multi-core workstations.
- process_mdout.perl: Extract and analyse energy data from AMBER MD output files.
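For example, a minimal serial cpptraj run computing a CA RMSD and a hydrogen bond analysis could look like the sketch below (topology, trajectory, and output file names are placeholders):

# Minimal cpptraj input (a sketch; adjust file names and atom masks to your system)
cat > analysis.in << 'EOF'
parm complex.prmtop
trajin trajectory.nc
rms first @CA out rmsd_ca.dat
hbond out hbond.dat avgout hbond_avg.dat
run
EOF
cpptraj -i analysis.in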
Binding free energy calculations¶
- MMPBSA.py: MM-PBSA and MM-GBSA binding free energy calculations (serial version). Calculates binding free energies using implicit solvent models, decomposes binding energies by residue, performs per-residue and per-atom energy decompositions, and supports multiple MD engines (see the example input file below).
- MMPBSA.py.MPI: Parallel MPI version of MMPBSA.py. Distributes MM-PBSA/MM-GBSA calculations across multiple compute nodes; suitable for large systems or multiple-trajectory analysis.
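The MMPBSA.py input file referenced by the job scripts later in this document (mmpbsa.in) is a small namelist-style file. A minimal sketch requesting both a GB and a PB calculation could look as follows (frame range and salt concentration are placeholder values):

cat > mmpbsa.in << 'EOF'
&general
  startframe=1, endframe=100, interval=1, verbose=1,
/
&gb
  igb=5, saltcon=0.150,
/
&pb
  istrng=0.150,
/
EOF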
Quantum mechanics / molecular mechanics¶
- sqm: Semi-empirical quantum mechanics program (AM1, PM3, AM1-D, PM3-D methods, serial version). Geometry optimisations, energy and force calculations.
- sqm.MPI: Parallel MPI version of sqm. Distributes QM calculations across multiple compute nodes; suitable for larger QM regions or multiple QM calculations.
- mdgx: Molecular dynamics geometry and topology exchange tool (serial version). Generates geometry and topology files and converts between different formats.
- mdgx.MPI: Parallel MPI version of mdgx. Distributes processing across multiple compute nodes.
- mdgx.OMP: OpenMP parallel version of mdgx. Uses threading for parallelisation within a single node.
Utility programs¶
- antechamber: Automatic atom type assignment and parameter generation for small molecules (serial version). Generates GAFF parameters for organic molecules, creates parameter files for new compounds, and interfaces with quantum chemistry programs (see the example workflow below).
- parmchk2: Check and generate Amber parameter files for molecules processed by antechamber. Validates GAFF parameters and generates missing parameters.
- reduce: Add missing hydrogens to PDB structures. Places hydrogens at optimal positions and handles protonation states.
- pdb4amber: Prepare PDB files for Amber simulations. Removes non-standard residues, fixes common PDB format issues, and prepares structures for LEaP.
- packmol: Pack molecules into defined regions (solvation, membrane insertion). Solvates systems with water, inserts molecules into membranes, and generates mixed-solvent systems.
- packmol-memgen: Generate membrane configurations using packmol.
- ambpdb: Convert Amber topology/coordinate files to PDB format. Extracts coordinates from trajectory files and converts topology files to PDB.
- ambmask: Manipulate Amber mask expressions. Tests and validates mask syntax; useful for advanced Amber scripting.
- quick: Quantum chemistry (HF/DFT) calculations with QM/MM support (serial version). QM/MM calculations and geometry optimisations.
- quick.MPI: Parallel MPI version of quick for QM/MM calculations.
- gem.pmemd: Generalized Ensemble Methods (GEM) for enhanced sampling (serial version). Temperature replica exchange and Hamiltonian replica exchange.
- gem.pmemd.MPI: Parallel MPI version of gem.pmemd for multi-node GEM simulations.
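A common small-molecule parameterisation workflow combines antechamber and parmchk2, for example as in this sketch (ligand.mol2 and the net charge given with -nc are placeholders for your own ligand):

# Assign GAFF2 atom types and AM1-BCC charges, then generate any missing parameters
antechamber -i ligand.mol2 -fi mol2 -o ligand_gaff.mol2 -fo mol2 -c bcc -nc 0 -at gaff2
parmchk2 -i ligand_gaff.mol2 -f mol2 -o ligand.frcmod -s gaff2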
Additional analysis tools¶
- pbsa: Poisson-Boltzmann surface area calculations. Solvation free energies and electrostatic calculations.
- gbnsr6: Generalized Born (GB) calculations using the GB-Neck2 model. Implicit solvent calculations and solvation free energies.
- simplepbsa: Simplified PB calculations (serial version). Fast PB approximations and binding energy calculations.
- simplepbsa.MPI: Parallel MPI version of simplepbsa.
- rism1d: One-dimensional reference interaction site model. Solvation structure analysis and thermodynamic properties.
- rism3d.snglpnt: Three-dimensional RISM (serial version). 3D solvation structure and site-site correlation functions.
- rism3d.snglpnt.MPI: Parallel MPI version of rism3d.snglpnt.
- saxs_md: Small-angle X-ray scattering analysis from MD trajectories (serial version). Calculates SAXS profiles and compares them with experimental data.
- saxs_md.OMP: OpenMP parallel version of saxs_md.
- saxs_rism: SAXS from RISM calculations (serial version). Combines RISM and SAXS analysis.
- saxs_rism.OMP: OpenMP parallel version of saxs_rism.
- nmode: Normal mode analysis. Vibrational frequencies and entropy calculations.
- mmpbsa_py_energy: Extract energy components from MMPBSA calculations.
- mmpbsa_py_nabnmode: NAB-based normal mode calculations for MMPBSA.
Enhanced sampling and free energy methods¶
- ndfes: Neural network-based free energy surfaces (serial version). Enhanced sampling analysis and free energy calculations.
- ndfes.OMP: OpenMP parallel version of ndfes.
- ndfes-path: Path-based analysis for ndfes calculations.
- ndfes-path.OMP: OpenMP parallel version of ndfes-path.
- ndfes-AvgFESs.py: Average free energy surfaces from multiple simulations.
- ndfes-CheckEquil.py: Check equilibrium in enhanced sampling simulations.
- ndfes-CombineMetafiles.py: Combine metadynamics files.
- ndfes-PrepareAmberData.py: Prepare Amber data for ndfes analysis.
- ndfes-PrintFES.py: Print free energy surfaces.
- ndfes-path-analyzesims.py: Analyse path simulations.
- ndfes-path-prepguess.py: Prepare initial guesses for path calculations.
- edgembar: Energy decomposition group method BAR (serial version). Free energy decomposition and binding energy analysis.
- edgembar.OMP: OpenMP parallel version of edgembar.
- edgembar-WriteGraphHtml.py: Generate HTML graphs for edgembar results.
- edgembar-amber2dats.py: Convert Amber data for edgembar.
- edgembar-bookend2dats.py: Convert bookend data for edgembar.
Parameter fitting and optimisation¶
- paramfit: Parameter fitting for force field development (serial version). Optimises force field parameters and fits them to quantum chemistry data.
- paramfit.OMP: OpenMP parallel version of paramfit.
- resp: Restrained Electrostatic Potential (RESP) fitting. Generates atomic charges from quantum chemistry via ESP fitting.
- respgen: Generate RESP input files.
- parmcal: Parameter calculation utilities.
Python analysis and utility tools¶
- MCPB.py: Metal Center Parameter Builder. Generates parameters for metal-containing systems and fits metal-ligand interactions.
- CartHess2FC.py: Convert Cartesian Hessian to force constants.
- IPMach.py: Ion parameterisation machine learning.
- OptC4.py: Optimise C4 parameters.
- PdbSearcher.py: Search PDB structures.
- ProScrs.py: Protein scoring utilities.
- bar_pbsa.py: BAR method for PBSA calculations.
- py_resp.py: Python interface to RESP calculations.
- pype-resp.py: Enhanced Python RESP interface.
- pyresp_gen.py: Generate RESP input files.
- ceinutil.py, cpinutil.py, cpeinutil.py: Constant pH utilities for constant pH MD setup and pH-dependent calculations.
- cestats, cphstats: Constant pH statistics.
- finddgref.py: Find reference free energy values.
- fitpkaeo.py: Fit pKa values.
- genremdinputs.py: Generate replica exchange MD input files.
- mdout_analyzer.py: Analyse MD output files.
- mdout2pymbar.pl: Convert MD output to PyMBAR format.
- metalpdb2mol2.py: Convert metal-containing PDB to MOL2 format.
- mol2rtf.py: Convert MOL2 to RTF format.
- charmmlipid2amber.py: Convert CHARMM lipid parameters to Amber format.
- amb2chm_par.py, amb2chm_psf_crd.py: Convert Amber to CHARMM formats.
- amb2gro_top_gro.py: Convert Amber to GROMACS formats.
- car_to_files.py: Convert Cartesian coordinate files.
Specialised utilities¶
- AddToBox, ChBox: Manipulate simulation boxes.
- PropPDB: PDB property calculations.
- UnitCell: Unit cell manipulation.
- XrayPrep: Prepare structures for X-ray refinement.
- add_pdb, add_xray: Add structures from PDB or X-ray data.
- process_minout.perl: Process minimisation output.
- process_mdout.perl: Process MD output (already mentioned above).
- teLeap: Terminal-based LEaP (alternative interface).
- xaLeap: X11-based LEaP (alternative interface to xleap).
- ucpp: Utility for processing Amber files.
- test-api, test-api.MPI: API testing tools.
Note: This is not an exhaustive list. AmberTools includes many more specialised tools and utilities. For a complete list of available tools, see the AmberTools documentation or check the bin directory of your installation.
Our AmberTools builds use Open MPI as the MPI library.
Features:
- Supports multi-node simulations
- Uses Open MPI for inter-node communication
- Compatible with SLURM multi-node job submission
- Can handle larger systems across multiple nodes
- Integrated with PLUMED2 for enhanced sampling methods
- Uses LLVM.org OpenMP runtime for optimal threading performance
- GUI support enabled (leap graphics, etc.)
Executable names: sander.MPI, sander, cpptraj, tleap, parmed, antechamber, sqm, MMPBSA.py, and many others.
For more details see Multi-node run using MPI.
Important
Users are welcome to bring or compile their own builds of AmberTools and use them, but those builds will not be supported by the Discoverer HPC team.
Build recipes, build logs, and build documentation for the AmberTools builds provided on Discoverer are available at the AmberTools build repository.
Running the tools¶
Running data analysis or simulations means invoking AmberTools executables (such as sander.MPI, sander, or cpptraj) to prepare systems, run simulations, or analyse trajectories.
Warning
You MUST NOT execute simulations or data analysis tools directly on the login node (login.discoverer.bg). You have to run your simulations and analyses as SLURM jobs only.
Warning
Write your trajectories, data files, and analysis results only inside your Per-project scratch and storage folder, and DO NOT use your Home folder (/home/username) for that purpose under any circumstances!
Common AmberTools executables:
Sander molecular dynamics engines:
- sander: Serial version of sander for small systems or testing (single-threaded)
- sander.MPI: Parallel MPI version of sander for multi-node molecular dynamics simulations across distributed-memory systems
- sander.OMP: OpenMP parallel version of sander for shared-memory parallelisation using threading (single-node, multi-core)
- sander.LES: Locally Enhanced Sampling (LES) version of sander. LES is an enhanced sampling method that allows selected atoms (e.g., side chains or ligands) to be represented by multiple copies, enabling more efficient conformational sampling. This version is serial (single-threaded)
- sander.LES.MPI: LES version with MPI parallelisation, combining Locally Enhanced Sampling with multi-node distributed-memory parallelisation
Other tools:
- cpptraj: Trajectory analysis tool (serial version)
- cpptraj.MPI: Parallel MPI version of cpptraj for multi-node trajectory analysis
- cpptraj.OMP: OpenMP parallel version of cpptraj for shared-memory trajectory analysis
- tleap: Text-based LEaP for structure preparation and topology generation
- xleap: Graphical LEaP (X11-based) for structure preparation and topology generation
- parmed: Parameter file editor
- antechamber: Automatic atom type assignment for small molecules
- sqm: Semi-empirical quantum mechanics program (serial version)
- sqm.MPI: Parallel MPI version of sqm
- mdgx: Molecular dynamics geometry and topology exchange tool (serial version)
- mdgx.MPI: Parallel MPI version of mdgx
- mdgx.OMP: OpenMP parallel version of mdgx
Python tools:
- Serial Python tools: MMPBSA.py - MM-PBSA and MM-GBSA binding free energy calculations (serial version, single-threaded)
- MPI-parallel Python tools: MMPBSA.py.MPI - Parallel MPI version of MMPBSA.py for multi-node binding free energy calculations
Note
Most Python tools in AmberTools (e.g., ante-MMPBSA.py, MCPB.py, etc.) are serial and run on a single CPU core. Only MMPBSA.py.MPI supports MPI parallelisation.
Multi-node run using MPI¶
Note
The SLURM script displayed below applies only to those AmberTools tools that support MPI parallelisation:
- Fortran/C++ MPI tools: sander.MPI, sander.LES.MPI, cpptraj.MPI, sqm.MPI, mdgx.MPI
- Python MPI tools: MMPBSA.py.MPI (see MPI-parallel Python tools)
For detailed guidelines on optimal resource allocation (number of nodes, tasks per node, memory requirements, etc.) based on system size, see Resource allocation guidelines.
This script is used for multi-node MPI runs, but you can use it on a single node as well (by setting --nodes=1 and --ntasks-per-node=N where N is the number of MPI ranks):
#!/bin/bash
#
#SBATCH --partition=cn           # Partition (you may need to change this)
#SBATCH --job-name=sander_mpi    # Job name
#SBATCH --time=512:00:00         # WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes=2                # Number of nodes
#SBATCH --ntasks-per-node=64     # Number of MPI tasks to run upon each node
#SBATCH --ntasks-per-socket=32   # Number of tasks per NUMA-bound socket
#SBATCH --cpus-per-task=1        # Number of OpenMP threads per MPI rank (recommended: 1 for pure MPI)
#SBATCH --ntasks-per-core=1      # Each MPI rank is bound to a CPU core
#SBATCH --mem=251G               # Do not exceed this on Discoverer CPU cluster
#SBATCH -o slurm.%j.out          # STDOUT
#SBATCH -e slurm.%j.err          # STDERR

# Load required modules
module purge || exit
module load ambertools/24/24.0 || exit

# Set OpenMP environment variables (if needed)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=close       # Bind threads close to parent MPI process
export OMP_PLACES=cores          # Place threads on cores

# Optimise InfiniBand communication (if available)
export UCX_NET_DEVICES=mlx5_0:1

# Change to submission directory
cd ${SLURM_SUBMIT_DIR}

# Run MPI-parallel AmberTools tool
# Examples:
#   - For Fortran/C++ tools: sander.MPI, cpptraj.MPI, sqm.MPI, mdgx.MPI
#   - For Python tools: MMPBSA.py.MPI
#
# OpenMPI options:
#   --map-by socket:PE=${OMP_NUM_THREADS} binds MPI processes to sockets
#     with PE (Processing Element) threads per MPI rank
#   --bind-to core binds each MPI rank to a CPU core
#   --report-bindings shows CPU binding (useful for debugging)

# Example for Fortran/C++ MPI tool (sander.MPI):
mpirun --map-by socket:PE=${OMP_NUM_THREADS} \
       --bind-to core \
       --report-bindings \
       sander.MPI -O -i mdin -p prmtop.0 -o out.0 -c inpcrd.0 -r restrt.0

# Example for Python MPI tool (MMPBSA.py.MPI):
# mpirun --map-by socket:PE=${OMP_NUM_THREADS} \
#        --bind-to core \
#        --report-bindings \
#        MMPBSA.py.MPI -O -i mmpbsa.in -o mmpbsa.dat -sp complex.prmtop -cp complex.prmtop -lp ligand.prmtop -rp receptor.prmtop -y trajectory.nc
In the script above, edit the parameters and resources required for successfully running and completing the job. For detailed guidelines on optimal resource allocation, see Resource allocation guidelines.
- SLURM partition of compute nodes (--partition): Specifies which group of nodes (partition) to use. For AmberTools on Discoverer, use the cn partition, which contains the CPU-optimised nodes.
- Job name (--job-name): A descriptive name for your job that will appear in the queue. Use meaningful names like sander_protein_sim or sander_membrane_run.
- Wall time (--time): Maximum time your job can run. The format is HH:MM:SS (e.g., 48:00:00 for 48 hours). Set this based on your simulation size and expected runtime.
- Number of compute nodes (--nodes): How many physical nodes to allocate. For multi-node AmberTools simulations, this determines the total computational power available. See Resource allocation guidelines for recommendations based on system size.
- Number of MPI processes per node (--ntasks-per-node): Critical for AmberTools performance. On Discoverer, with 8 NUMA domains per node, using 64 MPI tasks per node gives 8 tasks per NUMA domain for optimal memory locality. See Resource allocation guidelines for recommended values based on system size.
- Number of MPI tasks per socket (--ntasks-per-socket): Essential for NUMA-aware performance. Set it to 32 to place exactly 32 MPI tasks on each socket (64 tasks per node ÷ 2 sockets per node = 32 per socket). This ensures optimal memory access patterns and cache utilisation within each NUMA boundary.
- Number of OpenMP threads per MPI process (--cpus-per-task): Controls hybrid parallelism. The recommended value is 1 (pure MPI mode), since OpenMP usage in sander.MPI is limited. For information on how this affects CPU thread affinity and pinning, see CPU thread affinity and pinning.
- AmberTools version (module load): Choose the appropriate version based on your simulation requirements. See Versions available for available builds and their characteristics.
Save the complete SLURM job description as a file, for example /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/sander_mpi.sh, and submit it to the queue:
cd /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/
sbatch sander_mpi.sh
Upon successful submission, the standard output will be directed by SLURM into the file /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.out (where %j stands for the SLURM job ID), while the standard error output will be stored in /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.err.
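After submission you can monitor the job with the standard SLURM client tools, for example:

squeue -u $USER                                       # list your queued and running jobs
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS  # accounting info after the job finishes
tail -f slurm.<jobid>.out                             # follow the standard output of a running job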
Single-threaded execution¶
Note
The SLURM script displayed below applies only to those AmberTools tools that run on a single CPU core using one thread:
- sander: Molecular dynamics simulations
- cpptraj: Trajectory analysis
- sqm: Semi-empirical quantum mechanics program
- tleap: Structure preparation and topology generation
- parmed: Parameter/topology file editing and structure manipulation
- antechamber: Automatic atom type assignment for small molecules
- MMPBSA.py: MM-PBSA and MM-GBSA binding free energy calculations
- mdgx: Molecular dynamics geometry and topology exchange tool
#!/bin/bash
#
#SBATCH --partition=cn # Partition of compute nodes
#SBATCH --job-name=sander_single_threaded # Job name
#SBATCH --time=01:00:00 # WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes=1 # Single node
#SBATCH --ntasks=1 # Single task
#SBATCH --cpus-per-task=1 # One CPU per task
#SBATCH --mem=32G # Memory per task (increase if needed, typically 16-64G for small to medium systems)
#SBATCH -o slurm.%j.out # STDOUT
#SBATCH -e slurm.%j.err # STDERR
# Load required modules
module purge || exit
module load ambertools/24/24.0 || exit
# Change to submission directory
cd ${SLURM_SUBMIT_DIR}
# Run serial sander
sander -O -i mdin -p prmtop.0 -o out.0 -c inpcrd.0 -r restrt.0
In the script above, edit the parameters and resources required for successfully running and completing the job.
- For serial tools:
  - Use --ntasks=1 and --cpus-per-task=1 for single-threaded tools
  - Adjust --mem based on system size (typically 16-64G for small to medium systems)
  - These tools run on a single CPU core and are suitable for small systems or testing
Save the complete SLURM job description as a file, for example /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/sander_serial.sh, and submit it to the queue:
cd /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/
sbatch sander_serial.sh
Upon successful submission, the standard output will be directed by SLURM into the file /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.out (where %j stands for the SLURM job ID), while the standard error output will be stored in /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.err.
Single-node run using OpenMP¶
This script is used for OpenMP-parallel tools that use threading on a single node:
- sander.OMP: OpenMP molecular dynamics simulations
- cpptraj.OMP: OpenMP trajectory analysis
- mdgx.OMP: OpenMP geometry/topology processing
Warning
OpenMP scaling is not guaranteed! The optimal number of OpenMP threads depends on many factors including algorithm efficiency, memory bandwidth, cache usage, and problem size. Always test different thread counts to find the optimal configuration for your specific system and workload. See OpenMP scaling considerations for more details.
#!/bin/bash
#
#SBATCH --partition=cn # Partition of compute nodes
#SBATCH --job-name=sander_omp # Job name
#SBATCH --time=01:00:00 # WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes=1 # Single node
#SBATCH --ntasks=1 # Single task
#SBATCH --cpus-per-task=64 # Number of OpenMP threads (START WITH FEWER AND TEST SCALING!)
#SBATCH --mem=251G # Memory per task (do not exceed this on Discoverer CPU cluster)
#SBATCH -o slurm.%j.out # STDOUT
#SBATCH -e slurm.%j.err # STDERR
# Load required modules
module purge || exit
module load ambertools/24/24.0 || exit
# Set OpenMP environment variables
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=close # Bind threads close to parent process
export OMP_PLACES=cores # Place threads on cores
# Optional: Enable OpenMP verbose output for debugging
# export OMP_DISPLAY_ENV=VERBOSE
# Change to submission directory
cd ${SLURM_SUBMIT_DIR}
# Run OpenMP sander
sander.OMP -O -i mdin -p prmtop.0 -o out.0 -c inpcrd.0 -r restrt.0
In the script above, edit the parameters and resources required for successfully running and completing the job.
- For OpenMP tools:
  - Use --ntasks=1 and --cpus-per-task=N, where N is the number of OpenMP threads (test scaling first!)
  - Set OMP_NUM_THREADS equal to --cpus-per-task
  - Start with fewer threads (8-16) and test scaling before using higher thread counts
  - Adjust --mem based on system size and number of threads
  - These tools use shared-memory threading and are suitable for single-node multi-core simulations
  - Always test scaling: run with different thread counts to find optimal performance
  - Monitor wall-clock time and CPU utilisation to identify the optimal thread count
  - See OpenMP scaling considerations for detailed guidelines on testing and optimising OpenMP performance
Save the complete SLURM job description as a file, for example /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/sander_omp.sh, and submit it to the queue:
cd /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/
sbatch sander_omp.sh
Upon successful submission, the standard output will be directed by SLURM into the file /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.out (where %j stands for the SLURM job ID), while the standard error output will be stored in /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.err.
MPI-parallel Python tools¶
For MPI-parallel Python tools such as MMPBSA.py.MPI, use the multi-node MPI script. This script is identical to the one shown in the Multi-node run using MPI section, but with the tool replaced by the MPI-parallel Python tool.
For detailed guidelines on optimal resource allocation (number of nodes, tasks per node, memory requirements, etc.) based on system size, see Resource allocation guidelines.
#!/bin/bash
#
#SBATCH --partition=cn           # Partition (you may need to change this)
#SBATCH --job-name=mmpbsa_mpi    # Job name
#SBATCH --time=512:00:00         # WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes=2                # Number of nodes
#SBATCH --ntasks-per-node=64     # Number of MPI tasks to run upon each node
#SBATCH --ntasks-per-socket=32   # Number of tasks per NUMA-bound socket
#SBATCH --cpus-per-task=1        # Number of OpenMP threads per MPI rank (recommended: 1 for pure MPI)
#SBATCH --ntasks-per-core=1      # Each MPI rank is bound to a CPU core
#SBATCH --mem=251G               # Do not exceed this on Discoverer CPU cluster
#SBATCH -o slurm.%j.out          # STDOUT
#SBATCH -e slurm.%j.err          # STDERR

# Load required modules
module purge || exit
module load ambertools/24/24.0 || exit

# Set OpenMP environment variables (if needed)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=close       # Bind threads close to parent MPI process
export OMP_PLACES=cores          # Place threads on cores

# Optimise InfiniBand communication (if available)
export UCX_NET_DEVICES=mlx5_0:1

# Change to submission directory
cd ${SLURM_SUBMIT_DIR}

# Run MMPBSA.py.MPI with OpenMPI
#   --map-by socket:PE=${OMP_NUM_THREADS} binds MPI processes to sockets
#     with PE (Processing Element) threads per MPI rank
#   --bind-to core binds each MPI rank to a CPU core
#   --report-bindings shows CPU binding (useful for debugging)
mpirun --map-by socket:PE=${OMP_NUM_THREADS} \
       --bind-to core \
       --report-bindings \
       MMPBSA.py.MPI -O -i mmpbsa.in -o mmpbsa.dat -sp complex.prmtop -cp complex.prmtop -lp ligand.prmtop -rp receptor.prmtop -y trajectory.nc
In the script above, edit the parameters and resources required for successfully running and completing the job. For detailed guidelines on optimal resource allocation, see Resource allocation guidelines.
- SLURM partition of compute nodes (--partition): Specifies which group of nodes (partition) to use. For AmberTools on Discoverer, use the cn partition, which contains the CPU-optimised nodes.
- Job name (--job-name): A descriptive name for your job that will appear in the queue. Use meaningful names like mmpbsa_protein_binding or mmpbsa_multitrajectory.
- Wall time (--time): Maximum time your job can run. The format is HH:MM:SS (e.g., 48:00:00 for 48 hours). Set this based on your calculation size and expected runtime.
- Number of compute nodes (--nodes): How many physical nodes to allocate. For multi-node MMPBSA.py.MPI runs, this determines the total computational power available. See Resource allocation guidelines for recommendations based on system size.
- Number of MPI processes per node (--ntasks-per-node): Critical for MMPBSA.py.MPI performance. On Discoverer, with 8 NUMA domains per node, using 64 MPI tasks per node gives 8 tasks per NUMA domain for optimal memory locality. See Resource allocation guidelines for recommended values based on system size.
- Number of MPI tasks per socket (--ntasks-per-socket): Essential for NUMA-aware performance. Set it to 32 to place exactly 32 MPI tasks on each socket (64 tasks per node ÷ 2 sockets per node = 32 per socket). This ensures optimal memory access patterns and cache utilisation within each NUMA boundary.
- Number of OpenMP threads per MPI process (--cpus-per-task): Controls hybrid parallelism. The recommended value is 1 (pure MPI mode) for MMPBSA.py.MPI, since OpenMP usage in Python MPI tools is typically minimal. For information on how this affects CPU thread affinity and pinning, see CPU thread affinity and pinning.
- AmberTools version (module load): Choose the appropriate version based on your calculation requirements. See Versions available for available builds and their characteristics.
The OpenMP environment variables (OMP_NUM_THREADS, OMP_PROC_BIND, OMP_PLACES) are set in the script but are typically not used since MMPBSA.py.MPI runs in pure MPI mode with --cpus-per-task=1. These variables are included for consistency with other MPI tools and in case any Python libraries use OpenMP internally.
Note
MMPBSA.py.MPI uses MPI for parallelisation across multiple compute nodes, similar to sander.MPI. For optimal performance with large systems or multiple trajectories, use MMPBSA.py.MPI instead of serial MMPBSA.py.
Save the complete SLURM job description as a file, for example /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/mmpbsa_mpi.sh, and submit it to the queue:
cd /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/
sbatch mmpbsa_mpi.sh
Upon successful submission, the standard output will be directed by SLURM into the file /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.out (where %j stands for the SLURM job ID), while the standard error output will be stored in /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.err.
Python single-threaded tools¶
Note
Our AmberTools builds include some Python-based tools that run on a single CPU core using one thread, such as MMPBSA.py and ante-MMPBSA.py. They are compatible with Python 3.12.
When loading the environment module ambertools, Python 3.12 is loaded automatically. It is provided by the anaconda3 module (loaded as a dependency of ambertools).
This script is used for Python-based tools that may run serially or with limited parallelism:
- MMPBSA.py: Serial MM-PBSA/MM-GBSA binding free energy calculations
- ante-MMPBSA.py: Pre-processing for MMPBSA
- Other Python tools in AmberTools
#!/bin/bash
#
#SBATCH --partition=cn                       # Partition of compute nodes
#SBATCH --job-name=mmpbsa_single_threaded    # Job name
#SBATCH --time=00:30:00                      # WallTime - set it accordingly
#SBATCH --account=<specify_your_slurm_account_name_here>
#SBATCH --qos=<specify_the_qos_name_here_if_it_is_not_the_default_one_for_the_account>
#SBATCH --nodes=1                            # One node
#SBATCH --ntasks=1                           # One task per node
#SBATCH --cpus-per-task=1                    # One CPU per task
#SBATCH --mem=2G                             # Memory per task (increase if needed, typically 2-16G for small to medium systems)
#SBATCH -o slurm.%j.out                      # STDOUT
#SBATCH -e slurm.%j.err                      # STDERR

# Load required modules
module purge || exit
module load ambertools/24/24.0 || exit

# Change to submission directory
cd ${SLURM_SUBMIT_DIR}

# Run MMPBSA.py
MMPBSA.py -O -i mmpbsa.in -o mmpbsa.dat -sp complex.prmtop -cp complex.prmtop -lp ligand.prmtop -rp receptor.prmtop -y trajectory.nc
In the script above, edit the parameters and resources required for successfully running and completing the job.
- For Python tools:
  - Use --ntasks=1 and --cpus-per-task=1 for the serial, single-threaded tools such as MMPBSA.py
  - Some Python tools may use internal parallelisation; in that case set --cpus-per-task accordingly
  - Adjust --mem based on system size and tool requirements
Save the complete SLURM job description as a file, for example /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/mmpbsa.sh, and submit it to the queue:
cd /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/
sbatch mmpbsa.sh
Upon successful submission, the standard output will be directed by SLURM into the file /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.out (where %j stands for the SLURM job ID), while the standard error output will be stored in /valhalla/projects/<your_slurm_project_account_name>/run_ambertools/slurm.%j.err.
Scaling and performance considerations¶
Here we provide some guidelines and considerations for scaling and performance of the tools. Always consider the specific system and workload when determining the optimal number of threads and resources. If you are not sure, please contact the Discoverer HPC team (see Getting help).
OpenMP scaling considerations¶
OpenMP parallelisation efficiency depends on several factors:
- Algorithm parallelisation: Some algorithms parallelise better than others. Not all code sections may benefit from threading.
- Memory bandwidth: As the number of threads increases, memory bandwidth may become a bottleneck, limiting scaling.
- Cache coherence: False sharing and cache line conflicts can degrade performance with too many threads.
- Problem size: Small problems may not benefit from many threads due to overhead. Larger problems typically scale better.
- NUMA topology: Thread placement across NUMA domains affects performance. Use OMP_PROC_BIND and OMP_PLACES to control placement.
Testing OpenMP scaling¶
To determine the optimal number of OpenMP threads for your workload:
- Start with fewer threads: Begin testing with 8-16 threads, then gradually increase.
- Test multiple configurations: Run the same workload with different thread counts (e.g., 8, 16, 32, 64) and compare wall-clock times.
- Monitor performance: Check the output logs for wall-clock time (total execution time), CPU utilisation (are all threads being used?), and memory bandwidth utilisation.
- Calculate speedup: Speedup = Time(serial) / Time(threads). Efficiency = Speedup / Number_of_threads. Aim for efficiency > 50%.
- Watch for diminishing returns: If doubling the thread count does not reduce the wall-clock time by a factor of at least 1.5, you have likely hit diminishing returns.
Example scaling test script:
#!/bin/bash
# Test OpenMP scaling by running the same workload with different thread counts.
# Request the largest thread count once in the job script (e.g. --cpus-per-task=64);
# SBATCH directives cannot be changed inside the loop at run time.
for threads in 8 16 32 64; do
    echo "Testing with ${threads} threads"
    export OMP_NUM_THREADS=${threads}
    time sander.OMP -O -i mdin -p prmtop.0 -o out.${threads} -c inpcrd.0 -r restrt.${threads}
    echo "Completed ${threads} threads test"
done
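From the measured wall-clock times you can then compute the speedup and efficiency defined above, for example with a small helper such as this sketch (the timings are placeholder values taken from your own runs):

# Speedup = Time(serial) / Time(threads); Efficiency = Speedup / Number_of_threads
t_serial=5400      # wall-clock time of the baseline run, in seconds (placeholder)
t_threads=900      # wall-clock time of the multi-threaded run, in seconds (placeholder)
threads=64
speedup=$(echo "${t_serial} / ${t_threads}" | bc -l)
efficiency=$(echo "${speedup} / ${threads}" | bc -l)
printf "speedup = %.2f, efficiency = %.2f\n" "${speedup}" "${efficiency}"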
Recommended starting points:
- Small systems (<50k atoms): Start with 8-16 threads
- Medium systems (50k-100k atoms): Start with 16-32 threads
- Large systems (>100k atoms): Start with 32-64 threads
Note
Do not over-subscribe CPU cores! Set --cpus-per-task to no more than the number of physical CPU cores available on the compute node. Over-subscription (using more threads than cores) typically degrades performance due to context switching overhead.
CPU thread affinity and pinning¶
Our build of AmberTools uses Open MPI’s thread affinity management:
- Open MPI binding: Use --map-by socket:PE=${OMP_NUM_THREADS} to bind MPI processes to sockets with PE (Processing Element) threads per MPI rank
- Core binding: Use --bind-to core to bind each MPI rank to a CPU core
- Thread affinity: OpenMP environment variables (OMP_PROC_BIND, OMP_PLACES) control OpenMP thread affinity within each MPI rank
Note
For OpenMP-only tools (e.g., sander.OMP, cpptraj.OMP, mdgx.OMP), thread affinity settings are particularly important for performance. See OpenMP scaling considerations for detailed guidelines on testing and optimising OpenMP thread affinity and scaling.
Recommended OpenMPI settings:
mpirun --map-by socket:PE=${OMP_NUM_THREADS} \
       --bind-to core \
       --report-bindings \
       sander.MPI -O -i mdin -p prmtop.0 -o out.0 -c inpcrd.0 -r restrt.0
Open MPI binding options:
- --map-by socket:PE=${OMP_NUM_THREADS}: Maps MPI processes to sockets with PE threads per rank
- --bind-to core: Binds each MPI rank to a CPU core
- --report-bindings: Shows CPU binding (useful for debugging)
OpenMP thread affinity settings (when combined with Open MPI settings):
- OMP_PROC_BIND=close: Binds threads close to the parent MPI process
- OMP_PLACES=cores: Places threads on cores
Resource allocation guidelines¶
These resource allocation guidelines are specifically for MPI-parallel AmberTools executables that run across multiple compute nodes:
- sander.MPI: Multi-node molecular dynamics simulations
- sander.LES.MPI: Multi-node LES simulations
- cpptraj.MPI: Multi-node trajectory analysis
- MMPBSA.py.MPI: Multi-node binding free energy calculations
- sqm.MPI: Multi-node QM calculations
- mdgx.MPI: Multi-node geometry/topology processing
To achieve optimal performance when running these MPI-parallel AmberTools executables on Discoverer CPU cluster, follow the guidelines below. For details on CPU thread affinity and process pinning, see CPU thread affinity and pinning.
Note
Serial and OpenMP tools (sander, sander.OMP, cpptraj, cpptraj.OMP, etc.) typically run on single nodes or workstations and do not require the multi-node resource allocation strategies described here. For single-node OpenMP tools, set --cpus-per-task to the number of OpenMP threads you want, up to the number of physical cores available on a single node (test scaling first, see OpenMP scaling considerations).
| Scenario | Nodes | Tasks/Node | Tasks/Socket | CPUs/Task | Total Cores | Use Case |
|---|---|---|---|---|---|---|
| Small system | 1 | 32 | 16 | 1 | 32 | <50k atoms |
| Medium system | 2 | 64 | 32 | 1 | 128 | 50k-100k atoms |
| Large system | 4 | 64 | 32 | 1 | 256 | 100k-200k atoms |
| Very large system | 8+ | 64 | 32 | 1 | 512+ | >200k atoms |
Guidelines:
- Number of nodes: Start with 1-2 nodes for small systems, scale up for larger systems
- Tasks per node: Use 32-64 MPI tasks per node depending on system size
- Tasks per socket: Set to distribute tasks evenly across NUMA domains (32 tasks per socket for 64 tasks/node)
- CPUs per task: Always use 1 (pure MPI mode) since OpenMP usage is typically minimal
- Memory: Do not exceed 251G per node on Discoverer CPU cluster
Total resource allocation calculations:
- Total MPI ranks = nodes × tasks-per-node
- Total CPU cores = nodes × tasks-per-node × cpus-per-task
- Example: 2 nodes × 64 tasks/node × 1 cpu/task = 128 cores
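As an illustration, the "Large system" row of the table above corresponds to the following resource-related SBATCH directives (combine them with the full multi-node script shown earlier in this document):

#SBATCH --nodes=4                # 4 nodes
#SBATCH --ntasks-per-node=64     # 64 MPI ranks per node
#SBATCH --ntasks-per-socket=32   # 32 MPI ranks per socket
#SBATCH --cpus-per-task=1        # pure MPI
# Total: 4 nodes x 64 tasks/node x 1 cpu/task = 256 cores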
Build information¶
Build recipes, build logs, and build documentation for the AmberTools builds provided on Discoverer are available at the AmberTools build repository.