WRF¶
Versions available¶
Supported versions¶
Note
The versions of WRF installed in the software repository are built and supported by the Discoverer HPC team.
To check which WRF versions are currently supported on Discoverer, execute on the login node:
module avail
and grep the output for “wrf”.
Important
The WRF builds available in the Discoverer HPC software repository employ dual “dm+sm” parallelism (Distributed Memory + Shared Memory). In this mode, the number of OpenMP threads per MPI process determines the number of tiles used during the WRF execution. Refer to the WRF documentation for more details on this topic.
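As an illustration of the dm+sm decomposition, the following sketch uses the resource numbers from the Slurm template on this page (the numbers are illustrative; the variable names are not part of any WRF or Slurm API):

```python
# Sketch of the dm+sm decomposition arithmetic (illustrative numbers
# matching the Slurm template on this page; not a WRF API).
nodes = 2                # --nodes
ntasks_per_node = 128    # --ntasks-per-node (MPI processes per node)
cpus_per_task = 2        # --cpus-per-task (OpenMP threads per MPI process)

mpi_ranks = nodes * ntasks_per_node   # distributed-memory (dm) processes
omp_threads = cpus_per_task           # shared-memory (sm) threads per rank
tiles_per_rank = omp_threads          # in dm+sm mode, one tile per OpenMP thread

print(mpi_ranks, omp_threads, tiles_per_rank)  # 256 2 2
```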
User-supported versions¶
Users are welcome to bring or compile their own builds of WRF, but those builds will not be supported by the Discoverer HPC team.
Running WRF¶
Warning
You MUST NOT run simulations directly on the login node (login.discoverer.bg). Run your simulations as Slurm jobs only.
Warning
Write the results only to your Personal scratch and storage folder (/discofs/username). DO NOT, under any circumstances, use your Home folder (/home/username) for that purpose!
Slurm batch template¶
To run WRF as a Slurm batch job, you may use the following template:
#!/bin/bash
#
#SBATCH --partition=cn # Partition of compute nodes (as instructed by the support team)
#SBATCH --job-name=wrf_1
#SBATCH --time=00:50:00 # Wall time limit; this example job completes in ~6 min
#SBATCH --nodes 2 # Two nodes will be used
#SBATCH --ntasks-per-node 128 # Use all 128 CPU cores on each node
#SBATCH --ntasks-per-core 1 # Run only one MPI process per CPU core
#SBATCH --cpus-per-task 2 # Number of OpenMP threads per MPI process
#SBATCH -o slurm.%j.out # STDOUT
#SBATCH -e slurm.%j.err # STDERR
ulimit -Hs unlimited
ulimit -Ss unlimited
module purge
module load wrf/4/4.4-nvidia-openmpi
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=false
export OMP_SCHEDULE='STATIC'
export OMP_WAIT_POLICY='ACTIVE'
export UCX_NET_DEVICES=mlx5_0:1
mpirun -np 4 real.exe # Process the input NC data sets; no more than
                      # 4 MPI processes are needed for this step
mpirun wrf.exe # Run the actual simulation
Specify the parameters and resources required to successfully run and complete the job:
- Slurm partition of compute nodes, based on your project resource reservation (--partition)
- job name, under which the job will appear in the queue (--job-name)
- wall time for running the job (--time)
- number of occupied compute nodes (--nodes)
- number of MPI processes per node (--ntasks-per-node)
- number of OpenMP threads per MPI process (--cpus-per-task)
- version of WRF to load with module load (see Supported versions)
Note
The requested number of MPI processes per node should not be greater than 128 (128 is the number of CPU cores per compute node, see Resource Overview).
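That limit can be checked before submitting the job. The sketch below is a hypothetical helper (not part of Slurm) that validates the requested number of MPI processes per node against the 128 CPU cores available on a Discoverer compute node:

```python
def check_layout(ntasks_per_node: int, cores_per_node: int = 128) -> bool:
    """Return True if the requested MPI processes fit on one compute node.

    cores_per_node defaults to 128, the number of CPU cores per
    Discoverer compute node (see Resource Overview).
    """
    return ntasks_per_node <= cores_per_node

print(check_layout(128))  # True: uses every CPU core on the node
print(check_layout(256))  # False: oversubscribes the 128 physical cores
```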
You need to submit the Slurm batch job script to the queue from within the folder where the input NC files and namelist.input
reside. Check the provided working example (see below) for more details on how to create a complete Slurm batch job script for running WRF.
Working example¶
The goal of this working example is to show one possible way to run WRF on Discoverer HPC as a Slurm batch job. Running the example is simple: just execute on the login node (login.discoverer.bg):
sbatch /opt/software/WRF/4/4.4-nvidia-openmpi/examples/1.batch
Once started successfully by Slurm, the job will first create a directory under your Personal scratch and storage folder (/discofs/username). The name of the directory will be similar to wrf_2022-06-21-22-21-06-52.1655836192
(the numbers will be different in your case). You may check the progress of the simulation by entering that directory and executing:
tail -f em_real/rsl.error.0000
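WRF writes per-step timing lines to the rsl files in the form `Timing for main: time ... elapsed seconds`. A minimal sketch (assuming that line format; the sample lines below are made up for illustration) that extracts the elapsed seconds per step could be:

```python
import re

# Hypothetical sample of the timing lines WRF writes to rsl.error.0000
sample = """\
Timing for main: time 2022-06-21_00:00:30 on domain 1: 1.23456 elapsed seconds
Timing for main: time 2022-06-21_00:01:00 on domain 1: 0.98765 elapsed seconds
"""

# Capture the model time stamp, domain number, and elapsed wall seconds
pattern = re.compile(
    r"Timing for main: time (\S+) on domain (\d+): ([\d.]+) elapsed seconds"
)

timings = [(m.group(1), float(m.group(3))) for m in pattern.finditer(sample)]
print(timings[0])  # ('2022-06-21_00:00:30', 1.23456)
```

Reading the real file instead of the sample string (e.g. `open("em_real/rsl.error.0000")`) gives a quick view of how the step time evolves over the run.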
Getting help¶
See Getting help