WRF

Versions available

Supported versions

Note

The versions of WRF installed in the software repository are built and supported by the Discoverer HPC team.

To check which WRF versions are currently supported on Discoverer, execute on the login node:

module avail

and grep the output for “wrf”.
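
The listing and the search can be combined into a single command. The following is a minimal sketch, assuming that `module avail` prints its listing to standard error (which is why `2>&1` is used). The helper name `filter_wrf` is hypothetical and serves only for illustration:

```shell
# Hypothetical helper: keep only the lines that mention "wrf"
# (case-insensitive), reading from standard input.
filter_wrf() {
    grep -i 'wrf'
}

# On the login node one would run:
#   module avail 2>&1 | filter_wrf
```

Each line that survives the filter contains a module name that can be passed to `module load`.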

Important

The WRF builds available in the Discoverer HPC software repository employ hybrid “dm+sm” parallelism (Distributed Memory + Shared Memory). With such builds, the number of OpenMP threads defined per MPI process determines the number of tiles used during the WRF execution. Refer to the WRF documentation for more details on that topic.

User-supported versions

Users are welcome to bring, or compile, and use their own builds of WRF, but those builds will not be supported by the Discoverer HPC team.

Running WRF

Warning

You MUST NOT execute simulations directly on the login node (login.discoverer.bg). Run your simulations as Slurm jobs only.

Warning

Write the results only inside your Personal scratch and storage folder (/discofs/username). DO NOT use your Home folder (/home/username) for that purpose under any circumstances!
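
One way to follow this rule is to create a dedicated run directory under the scratch folder and work only there. The following is a minimal sketch. The function name `make_run_dir` and the directory naming scheme are illustrative assumptions, and the usage line assumes that your scratch folder is /discofs/${USER}:

```shell
# Hypothetical helper: create a time-stamped run directory under the
# given base folder and print its path.
make_run_dir() {
    base="$1"
    run_dir="${base}/wrf_$(date +%Y-%m-%d-%H-%M-%S)"
    mkdir -p "${run_dir}" && printf '%s\n' "${run_dir}"
}

# Inside a job script one would use the scratch folder as the base:
#   cd "$(make_run_dir "/discofs/${USER}")" || exit 1
```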

Slurm batch template

To run WRF as a Slurm batch job, you may use the following template:

#!/bin/bash
#
#SBATCH --partition=cn         # Name of the partition of nodes (ask the support team)
#SBATCH --job-name=wrf_1
#SBATCH --time=00:50:00        # Wall time limit (the job completes in ~6 min)

#SBATCH --nodes           2    # Two nodes will be used
#SBATCH --ntasks-per-node 128  # Use all 128 CPU cores on each node
#SBATCH --ntasks-per-core 1    # Run only one MPI process per CPU core
#SBATCH --cpus-per-task   2    # Number of OpenMP threads per MPI process

#SBATCH -o slurm.%j.out        # STDOUT
#SBATCH -e slurm.%j.err        # STDERR

ulimit -Hs unlimited
ulimit -Ss unlimited

module purge
module load wrf/4/4.4-nvidia-openmpi

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=false
export OMP_SCHEDULE='STATIC'
export OMP_WAIT_POLICY='ACTIVE'
export UCX_NET_DEVICES=mlx5_0:1

mpirun -np 4 real.exe # Process the input NC data sets, here you do not
                      # need more than 4 MPI processes

mpirun wrf.exe        # Run the actual simulation

Specify the parameters and resources required for successfully running and completing the job:

  • Slurm partition of compute nodes, based on your project resource reservation (--partition)
  • job name, under which the job will be seen in the queue (--job-name)
  • wall time for running the job (--time)
  • number of occupied compute nodes (--nodes)
  • number of MPI processes per node (--ntasks-per-node)
  • number of threads (OpenMP threads) per MPI process (--cpus-per-task)
  • version of WRF to load with module load (see Supported versions)

Note

The requested number of MPI processes per node should not be greater than 128 (128 is the number of CPU cores per compute node, see Resource Overview).
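
The limit can be checked before the job is submitted. The following is a minimal sketch with a hypothetical helper, `check_layout`, which takes the values given to --nodes, --ntasks-per-node, and --cpus-per-task and reports the resulting totals. The limit of 128 is taken from the note above:

```shell
# Hypothetical helper: validate a requested layout and print the totals.
# Arguments: <nodes> <ntasks-per-node> <cpus-per-task>
check_layout() {
    nodes="$1"
    tasks_per_node="$2"
    threads_per_task="$3"
    max_tasks_per_node=128   # CPU cores per compute node (see the note above)

    if [ "${tasks_per_node}" -gt "${max_tasks_per_node}" ]; then
        echo "error: ${tasks_per_node} MPI processes per node exceeds ${max_tasks_per_node}" >&2
        return 1
    fi

    mpi_total=$(( nodes * tasks_per_node ))
    echo "MPI processes: ${mpi_total}, OpenMP threads per process: ${threads_per_task}, threads in total: $(( mpi_total * threads_per_task ))"
}

# The values used in the template above
check_layout 2 128 2
```

For the values used in the template, the helper reports 256 MPI processes with 2 OpenMP threads each, which is 512 threads in total.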

You need to submit the Slurm batch job script to the queue from within the folder where the input NC and namelist.input files reside. Check the provided working example (see below) to find more details about how to create a complete Slurm batch job script for running WRF.
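
Before submitting, it may help to verify that the current folder holds the expected input. The following is a minimal sketch. The helper name `check_wrf_inputs` is hypothetical, and the `met_em.d*` file name pattern is an assumption based on the usual naming of metgrid output, so adjust it to match your input NC files:

```shell
# Hypothetical helper: check that a folder contains namelist.input and
# at least one input file matching the assumed met_em.d* pattern.
check_wrf_inputs() {
    dir="${1:-.}"

    if [ ! -f "${dir}/namelist.input" ]; then
        echo "missing: ${dir}/namelist.input" >&2
        return 1
    fi

    # Succeed as soon as one existing file matches the pattern.
    for f in "${dir}"/met_em.d*; do
        if [ -e "${f}" ]; then
            return 0
        fi
    done

    echo "no met_em.d* input files found in ${dir}" >&2
    return 1
}

# Typical use, from within the input folder (wrf.batch is a hypothetical
# name for your job script):
#   check_wrf_inputs . && sbatch wrf.batch
```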

Working example

The goal of this working example is to show one possible way to run WRF on Discoverer HPC by means of a Slurm batch job. Running the example is simple - just execute on the login node (login.discoverer.bg):

sbatch /opt/software/WRF/4/4.4-nvidia-openmpi/examples/1.batch

Once started successfully by Slurm, that job will first create a directory under your Personal scratch and storage folder (/discofs/username). The name of the directory will be similar to this one: wrf_2022-06-21-22-21-06-52.1655836192 (the numbers will be different in your case). You may check the progress of the simulation by entering that directory and executing there:

tail -f em_real/rsl.error.0000
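
Besides following the log, you can test whether the run has finished. The following is a minimal sketch, assuming that wrf.exe writes a line containing "SUCCESS COMPLETE WRF" to the rsl files upon successful completion. The helper name `wrf_finished` is hypothetical:

```shell
# Hypothetical helper: return success if the given rsl file contains
# the completion message (assumed to be "SUCCESS COMPLETE WRF").
wrf_finished() {
    grep -q 'SUCCESS COMPLETE WRF' "$1"
}

# Typical use, from within the run directory:
#   wrf_finished em_real/rsl.error.0000 && echo "WRF run completed"
```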

Getting help

See Getting help