NVIDIA HPC SDK (formerly PGI Compilers)

Supported versions

To check which NVIDIA HPC SDK versions are currently installed on Discoverer, execute on the login node:

module load nvidia
module avail nvhpc

Loading

To access the latest NVIDIA HPC SDK, load the environment module nvhpc/latest after loading nvidia:

module load nvidia
module load nvhpc/latest
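
To confirm that the modules took effect, one option is to check where the compiler drivers resolve on PATH. The following is a minimal sketch; report_compilers is a hypothetical helper name, not something the SDK provides.

```shell
# Report where each NVIDIA HPC SDK compiler driver resolves on PATH.
# Intended to be run after the two "module load" commands above.
# report_compilers is a hypothetical helper, not part of the SDK.
report_compilers() {
    for tool in nvc nvc++ nvfortran; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $path"
        else
            echo "$tool: not found (are the modules loaded?)"
        fi
    done
}

report_compilers
```

If any line reports "not found", the modules are most likely not loaded in the current shell.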

LLVM compilers

Warning

NVIDIA HPC SDK comes with LLVM-based compilers only! Although they remain compatible with the older PGI compilers in name and in the list of legacy compiler flags, they may fail to compile old code that once built without problems with the original PGI compilers.

nvc
nvc++
nvfortran

Note

Do not use the nvcc compiler unless the developers of the source code you are compiling explicitly require it. Use nvc instead.

NVIDIA optimized BLAS and LAPACK (CPU-version)

Along with the GPU-optimized BLAS and LAPACK, the NVIDIA HPC SDK installation also comes with optimized CPU-only versions of BLAS and LAPACK.

To gain access to their installation tree, load the corresponding module:

module load nvidia
module load nvhpc/latest

These libraries come in both static and dynamic versions:

libblas_ilp64.a
libblas_ilp64.so
libblas_lp64.a
libblas_lp64.so
liblapack_ilp64.a
liblapack_ilp64.so
liblapack_lp64.a
liblapack_lp64.so

The file names carry a suffix that denotes the integer width used for array indices and sizes, and therefore the number of elements the libraries can handle. The ilp64 libraries use 64-bit integers and can index large arrays of more than 2^31-1 elements, whereas the lp64 libraries use 32-bit integers and can index arrays of at most 2^31-1 elements (the upper limit of the signed 32-bit integer type).
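
The limit can be made concrete with a little shell arithmetic. This is only an illustrative sketch: blas_variant_for is a hypothetical helper that compares an element count against 2^31-1, and it does not inspect the libraries themselves.

```shell
# Largest element count addressable with a signed 32-bit integer (lp64).
LP64_MAX=$(( (1 << 31) - 1 ))   # 2147483647

# Print which library variant a given element count calls for.
# blas_variant_for is a hypothetical helper used for illustration only.
blas_variant_for() {
    if [ "$1" -gt "$LP64_MAX" ]; then
        echo ilp64
    else
        echo lp64
    fi
}

blas_variant_for 1000000        # prints: lp64
blas_variant_for 3000000000     # prints: ilp64
```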

By default the symlinks:

libblas.a
libblas.so
liblapack.a
liblapack.so

point to the lp64 version of the libraries (indexing up to 2^31-1 elements). That means that if you pass the following flags to the linker:

-lblas -llapack

the produced binary code will be linked against libblas_lp64 and liblapack_lp64 (statically or dynamically, depending on the selected type of linking).
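
To check which variant the default symlinks resolve to on a given installation, something along these lines may help. It is a sketch under assumptions: show_default_variants is a hypothetical helper, and the library directory must be supplied by the caller.

```shell
# Print the real file behind each default BLAS/LAPACK symlink.
# $1 is the directory holding libblas.so (supplied by the caller; the
# function name is hypothetical and not part of the SDK).
show_default_variants() {
    for lib in libblas.so liblapack.so; do
        if [ -e "$1/$lib" ]; then
            echo "$lib -> $(basename "$(readlink -f "$1/$lib")")"
        else
            echo "$lib: not found in $1"
        fi
    done
}
```

Call it with the library directory of the loaded SDK version as its only argument, for instance a directory taken from LD_LIBRARY_PATH.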

If you need to link the code against the ilp64 versions of the libraries (to handle arrays with more than 2^31-1 elements), pass the following options to the linker:

-lblas_ilp64 -llapack_ilp64

If you want to do that through the environment, add the flags to the value of LDFLAGS:

export LDFLAGS+=" -lblas_ilp64 -llapack_ilp64"

If your linker is part of the NVIDIA HPC SDK compiler collection, you do not have to append a -L/path flag to the LDFLAGS environment variable. If you need to perform a cross-compilation, however, setting the path to the library location is necessary. One way to find that path is to inspect the value of LD_LIBRARY_PATH after the corresponding environment modules have been loaded. For example:

export LDFLAGS+=" -L/opt/software/nvidia/hpc_sdk/Linux_x86_64/24.5/compilers/lib -lblas_ilp64 -llapack_ilp64"
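
The LD_LIBRARY_PATH lookup described above can be sketched as a small shell helper instead of hard-coding a versioned path. find_lib_dir is a hypothetical name, and the sketch assumes that the loaded modules have put the SDK library directory on LD_LIBRARY_PATH.

```shell
# Print the first directory on LD_LIBRARY_PATH that contains the given file.
# Returns non-zero if no directory matches.
# find_lib_dir is a hypothetical helper, not part of the SDK.
find_lib_dir() {
    old_ifs=$IFS
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$1" ]; then
            IFS=$old_ifs
            echo "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Extend LDFLAGS only if the ilp64 BLAS was actually found.
if libdir=$(find_lib_dir libblas_ilp64.so); then
    export LDFLAGS="${LDFLAGS:-} -L$libdir -lblas_ilp64 -llapack_ilp64"
    echo "LDFLAGS=$LDFLAGS"
else
    echo "libblas_ilp64.so not found on LD_LIBRARY_PATH" >&2
fi
```

Searching for the file rather than copying a fixed path keeps the setting valid when a different SDK version is loaded.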

If you want CMake to pick up the LDFLAGS environment variable when compiling your code, note that CMake reads it only during the first configuration of a build tree. See the CMake documentation for details:

https://cmake.org/cmake/help/latest/envvar/LDFLAGS.html

Compiler optimization flags for AMD Zen2 CPU microarchitecture

Note

The compute nodes of Discoverer HPC are equipped with AMD EPYC 7H12 64-Core processors, which are based on the AMD Zen2 CPU microarchitecture.

The following compiler flags can be useful during the compile-time optimization of your binary code on AMD Zen2:

-tp zen2

For example (directly invoking the CPU microarchitecture optimization):

nvc -tp zen2 ...
nvc++ -tp zen2 ...
nvfortran -tp zen2 ...

Recent versions of the NVIDIA HPC SDK compilers also support the -march and -mtune flags, the “classic” way of passing the CPU microarchitecture to the compiler:

nvc -march=znver2 -mtune=znver2 ...
nvc++ -march=znver2 -mtune=znver2 ...
nvfortran -march=znver2 -mtune=znver2 ...

To pass the microarchitecture flags to the compilers through the environment, append one of the following sets to the values of the CFLAGS, CXXFLAGS, and FCFLAGS environment variables:

CFLAGS+=" -tp zen2"
CXXFLAGS+=" -tp zen2"
FCFLAGS+=" -tp zen2"

or

CFLAGS+=" -march=znver2 -mtune=znver2"
CXXFLAGS+=" -march=znver2 -mtune=znver2"
FCFLAGS+=" -march=znver2 -mtune=znver2"

More on the supported compiler flags: NVIDIA HPC SDK Documentation

Interaction with CMake

It is recommended to specify the compiler executables explicitly when invoking the cmake tool:

-DCMAKE_C_COMPILER=nvc
-DCMAKE_CXX_COMPILER=nvc++
-DCMAKE_Fortran_COMPILER=nvfortran

The corresponding optimization compiler flags can be passed to cmake as well:

-DCMAKE_C_FLAGS="-tp zen2 ${CFLAGS}"
-DCMAKE_CXX_FLAGS="-tp zen2 ${CXXFLAGS}"
-DCMAKE_Fortran_FLAGS="-tp zen2 ${FCFLAGS}"

or

-DCMAKE_C_FLAGS="-march=znver2 -mtune=znver2 ${CFLAGS}"
-DCMAKE_CXX_FLAGS="-march=znver2 -mtune=znver2 ${CXXFLAGS}"
-DCMAKE_Fortran_FLAGS="-march=znver2 -mtune=znver2 ${FCFLAGS}"
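
Put together, the options above might be combined into a single configure step as in the following sketch. configure_with_nvhpc is a hypothetical wrapper name, the source and build directories are supplied by the caller, and the -S and -B options assume CMake 3.13 or newer.

```shell
# Configure a CMake project with the NVIDIA HPC SDK compilers and Zen2 flags.
# $1 = source directory, $2 = build directory (both chosen by the caller).
# configure_with_nvhpc is a hypothetical wrapper, not part of CMake or the SDK.
configure_with_nvhpc() {
    arch_flags="-tp zen2"
    cmake -S "$1" -B "$2" \
        -DCMAKE_C_COMPILER=nvc \
        -DCMAKE_CXX_COMPILER=nvc++ \
        -DCMAKE_Fortran_COMPILER=nvfortran \
        -DCMAKE_C_FLAGS="$arch_flags ${CFLAGS:-}" \
        -DCMAKE_CXX_FLAGS="$arch_flags ${CXXFLAGS:-}" \
        -DCMAKE_Fortran_FLAGS="$arch_flags ${FCFLAGS:-}"
}
```

Calling configure_with_nvhpc with a source directory and a build directory runs the configure step only; the build itself is started separately, for example with cmake --build on the same build directory.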

Getting help

See Getting help