NVIDIA HPC SDK (formerly PGI Compilers)¶
Supported versions¶
To check which NVIDIA HPC SDK versions are currently installed on Discoverer, execute on the login node:
module load nvidia
module avail nvhpc
Loading¶
To obtain access to the latest NVIDIA HPC SDK, load the environment module nvhpc/latest after loading nvidia:
module load nvidia
module load nvhpc/latest
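After loading the modules, it can be useful to confirm that the compiler drivers are actually visible in the current shell. The following is a minimal sketch, not an official tool: check_nvhpc_compilers is a hypothetical helper name, and it only inspects PATH for the three drivers listed in this document.

```shell
# Hypothetical helper: report which NVIDIA HPC SDK compiler drivers are on PATH.
# Run it after "module load nvidia" and "module load nvhpc/latest".
check_nvhpc_compilers() {
    missing=0
    for tool in nvc nvc++ nvfortran; do
        if tool_path=$(command -v "$tool" 2>/dev/null); then
            printf '%-10s %s\n' "$tool" "$tool_path"
        else
            printf '%-10s not found\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if all three drivers were found.
    [ "$missing" -eq 0 ]
}

check_nvhpc_compilers || echo "At least one compiler is missing; load the nvhpc module first."
```

The function prints one line per driver and returns a non-zero status if any of them is missing, so it can also be used as a guard at the top of a build script.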
LLVM compilers¶
Warning
NVIDIA HPC SDK comes with LLVM-based compilers only! Even though they are compatible with the older PGI compilers in terms of names and legacy compiler flags, they may not be able to compile old code that was once fully compatible with the original PGI compilers.
nvc
nvc++
nvfortran
Note
Unless the developers of the source code explicitly require it, do not use the nvcc compiler. Use nvc instead.
NVIDIA optimized BLAS and LAPACK (CPU-version)¶
Along with the GPU-optimized BLAS and LAPACK, the NVIDIA HPC SDK installation also comes with optimized CPU-only versions of BLAS and LAPACK.
To gain access to their installation tree, load the corresponding module:
module load nvidia
module load nvhpc/latest
Those libraries come in both static and dynamic versions:
libblas_ilp64.a
libblas_ilp64.so
libblas_lp64.a
libblas_lp64.so
liblapack_ilp64.a
liblapack_ilp64.so
liblapack_lp64.a
liblapack_lp64.so
The names of those files contain a suffix that denotes the size of the arrays (number of elements) that the libraries can handle. The suffix ilp64 denotes the capability of the library to index large arrays comprising more than 2^31-1 elements, whereas the lp64 versions of the libraries can index arrays whose number of elements is less than or equal to the upper limit of the 32-bit signed integer type (up to 2^31-1 elements).
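To make the 2^31-1 limit concrete, the sketch below computes it with shell arithmetic and picks a library suffix for a given element count. The helper name blas_variant_for and the variable LP64_MAX are hypothetical and exist only for illustration.

```shell
# Largest element count a 32-bit signed integer can index: 2^31 - 1 = 2147483647.
LP64_MAX=$(( (1 << 31) - 1 ))

# Hypothetical helper: print "lp64" or "ilp64" for a given number of array elements.
blas_variant_for() {
    if [ "$1" -le "$LP64_MAX" ]; then
        echo lp64
    else
        echo ilp64
    fi
}

echo "lp64 limit: ${LP64_MAX}"
blas_variant_for 1000000        # prints lp64
blas_variant_for 3000000000     # prints ilp64
```

Keep in mind that choosing the ilp64 variant generally also requires the calling code to pass 64-bit integer arguments to the BLAS and LAPACK routines; how that is done depends on the language and compiler, so check the compiler documentation.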
By default the symlinks:
libblas.a
libblas.so
liblapack.a
liblapack.so
point to the lp64 version of the libraries (indexing up to 2^31-1 elements). That means that if you pass the following flags to the linker:
-lblas -llapack
the produced binary code will be linked against libblas_lp64 and liblapack_lp64 (statically or dynamically, depending on the selected type of linking).
If the code needs to be linked against the ilp64 versions of the libraries (to handle arrays with more than 2^31-1 elements), pass the following options to the linker:
-lblas_ilp64 -llapack_ilp64
If you want to do that through the environment, add the flags to the value of LDFLAGS:
export LDFLAGS+=" -lblas_ilp64 -llapack_ilp64"
If your linker is part of the NVIDIA HPC SDK compiler collection, you do not have to append the -L/path flag to the LDFLAGS environment variable. But if you need to perform a cross-compilation, setting the path to the library location is necessary. One way to find that path is to inspect the value of LD_LIBRARY_PATH after the corresponding environment modules have been loaded, for example:
export LDFLAGS+=" -L/opt/software/nvidia/hpc_sdk/Linux_x86_64/24.5/compilers/lib -lblas_ilp64 -llapack_ilp64"
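Instead of copying the path by hand, the lookup in LD_LIBRARY_PATH can be scripted. The following is a minimal sketch under the assumption that the nvhpc module puts the directory containing libblas_ilp64.so on LD_LIBRARY_PATH; find_lib_dir is a hypothetical helper, not part of the NVIDIA HPC SDK.

```shell
# Hypothetical helper: print the first directory in LD_LIBRARY_PATH that
# contains the given file. The body runs in a subshell, so changing IFS
# here does not affect the calling shell.
find_lib_dir() (
    lib_name=$1
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$lib_name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
)

if lib_dir=$(find_lib_dir libblas_ilp64.so); then
    # Append to the existing LDFLAGS value, if any.
    export LDFLAGS="${LDFLAGS:-} -L${lib_dir} -lblas_ilp64 -llapack_ilp64"
    echo "LDFLAGS=${LDFLAGS}"
else
    echo "libblas_ilp64.so not found in LD_LIBRARY_PATH; load the nvhpc module first." >&2
fi
```

Deriving the directory at run time avoids hard-coding the SDK version into build scripts, which may matter when nvhpc/latest is later moved to a newer release.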
If you want to use CMake to compile code using the LDFLAGS environment variable, be sure to go over this document:
Compiler optimization flags for AMD Zen2 CPU microarchitecture¶
Note
The compute nodes of Discoverer HPC are equipped with AMD EPYC 7H12 64-Core processors, which are based on the AMD Zen2 CPU microarchitecture.
The following compiler flags can be useful during the compile-time optimization of your binary code on AMD Zen2:
-tp zen2
For example (directly invoking the CPU microarchitecture optimization):
nvc -tp zen2 ...
nvc++ -tp zen2 ...
nvfortran -tp zen2 ...
Recent versions of the NVIDIA HPC SDK compilers also support the -march and -mtune flags, which is the “classic” way of passing the CPU microarchitecture flags to the compiler:
nvc -march=znver2 -mtune=znver2 ...
nvc++ -march=znver2 -mtune=znver2 ...
nvfortran -march=znver2 -mtune=znver2 ...
To pass the microarchitecture flags to the compilers through the environment, append the following to the values of the CFLAGS, CXXFLAGS, or FCFLAGS environment variables:
CFLAGS+=" -tp zen2"
CXXFLAGS+=" -tp zen2"
FCFLAGS+=" -tp zen2"
or
CFLAGS+=" -march=znver2 -mtune=znver2"
CXXFLAGS+=" -march=znver2 -mtune=znver2"
FCFLAGS+=" -march=znver2 -mtune=znver2"
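The environment-based approach can be collected into a small setup fragment, sketched below. ARCH_FLAGS is a hypothetical helper variable, and exporting CC, CXX, and FC is an assumption about the build system being used (many, but not all, build systems read these conventional variables).

```shell
# Pick ONE style of microarchitecture flags; both target AMD Zen2.
ARCH_FLAGS="-tp zen2"
# ARCH_FLAGS="-march=znver2 -mtune=znver2"

# Append to any existing value, avoiding a leading space when the variable is empty.
export CFLAGS="${CFLAGS:+${CFLAGS} }${ARCH_FLAGS}"
export CXXFLAGS="${CXXFLAGS:+${CXXFLAGS} }${ARCH_FLAGS}"
export FCFLAGS="${FCFLAGS:+${FCFLAGS} }${ARCH_FLAGS}"

# Assumption: the build system honours these conventional compiler variables.
export CC=nvc CXX=nvc++ FC=nvfortran

printf 'CFLAGS=%s\nCXXFLAGS=%s\nFCFLAGS=%s\n' "$CFLAGS" "$CXXFLAGS" "$FCFLAGS"
```

The variables need to be exported so that child processes such as make or a configure script can see them; a plain assignment only affects the current shell.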
More on the supported compiler flags: NVIDIA HPC SDK Documentation
Interaction with CMake¶
It is recommended to specify the compiler executables when invoking the cmake tool:
-DCMAKE_C_COMPILER=nvc
-DCMAKE_CXX_COMPILER=nvc++
-DCMAKE_Fortran_COMPILER=nvfortran
The corresponding optimization compiler flags can be passed to cmake as well:
-DCMAKE_C_FLAGS="-tp zen2 ${CFLAGS}"
-DCMAKE_CXX_FLAGS="-tp zen2 ${CXXFLAGS}"
-DCMAKE_Fortran_FLAGS="-tp zen2 ${FCFLAGS}"
or
-DCMAKE_C_FLAGS="-march=znver2 -mtune=znver2 ${CFLAGS}"
-DCMAKE_CXX_FLAGS="-march=znver2 -mtune=znver2 ${CXXFLAGS}"
-DCMAKE_Fortran_FLAGS="-march=znver2 -mtune=znver2 ${FCFLAGS}"
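Put together, the compiler and flag options above might be combined into a single configure command, as in the sketch below. The wrapper name nvhpc_cmake and the DRY_RUN switch are hypothetical, and the -S and -B options assume CMake 3.13 or newer.

```shell
# Hypothetical wrapper: configure a CMake project with the NVIDIA HPC SDK compilers.
# Usage: nvhpc_cmake <source-dir> <build-dir>
# Set DRY_RUN=1 to print the command (one argument per line) instead of running it.
nvhpc_cmake() {
    src_dir=$1
    build_dir=$2
    arch_flags="-tp zen2"
    # Build the command as positional parameters so every option stays one word.
    set -- cmake -S "$src_dir" -B "$build_dir" \
        -DCMAKE_C_COMPILER=nvc \
        -DCMAKE_CXX_COMPILER=nvc++ \
        -DCMAKE_Fortran_COMPILER=nvfortran \
        -DCMAKE_C_FLAGS="${arch_flags} ${CFLAGS:-}" \
        -DCMAKE_CXX_FLAGS="${arch_flags} ${CXXFLAGS:-}" \
        -DCMAKE_Fortran_FLAGS="${arch_flags} ${FCFLAGS:-}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}

# Print the command that would be run for a project in the current directory.
DRY_RUN=1 nvhpc_cmake . build
```

Printing the command first makes it easy to check the quoting of the flag values, since options such as -tp zen2 contain a space and have to reach cmake as part of a single argument.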
Getting help¶
See Getting help