NVIDIA HPC SDK (formerly PGI Compilers)
=======================================

.. toctree::
   :maxdepth: 1
   :caption: Contents:

Supported versions
------------------

To check which `NVIDIA HPC SDK`_ versions are currently installed on Discoverer, execute on the login node:

.. code-block:: bash

   module load nvidia
   module avail nvhpc

Loading
-------

To obtain access to the latest NVIDIA HPC SDK, load the environment module ``nvhpc/latest`` after loading ``nvidia``:

.. code:: bash

   module load nvidia
   module load nvhpc/latest

LLVM compilers
--------------

.. warning::

   **NVIDIA HPC SDK comes with LLVM compilers only!** Even though they are compatible with the older PGI compilers in name and in the list of legacy compiler flags, they may fail to compile legacy code that was once fully compatible with the original PGI compilers.

.. code:: bash

   nvc
   nvc++
   nvfortran

.. note::

   Do not use the ``nvcc`` compiler unless the developers of the source code you are compiling explicitly require it. Use ``nvc`` instead.

NVIDIA optimized BLAS and LAPACK (CPU-version)
----------------------------------------------

Along with the GPU-optimized BLAS and LAPACK, the NVIDIA HPC SDK installation also comes with optimized CPU-only versions of BLAS and LAPACK. To gain access to their installation tree, load the corresponding module:

.. code:: bash

   module load nvidia
   module load nvhpc/latest

Those libraries come in static and dynamic versions:

.. code:: bash

   libblas_ilp64.a
   libblas_ilp64.so
   libblas_lp64.a
   libblas_lp64.so
   liblapack_ilp64.a
   liblapack_ilp64.so
   liblapack_lp64.a
   liblapack_lp64.so

The names of those files contain a suffix that denotes the sizes of the arrays (number of elements) that can be handled by those libraries.
The suffix ``ilp64`` denotes the capability of the library to index large arrays comprising more than 2\ :sup:`31`-1 elements, whereas the ``lp64`` versions of the libraries can index arrays with a number of elements less than or equal to the upper limit of the 32-bit integer type (up to 2\ :sup:`31`-1 elements). By default the symlinks:

.. code:: bash

   libblas.a
   libblas.so
   liblapack.a
   liblapack.so

point to the ``lp64`` versions of the libraries (indexing up to 2\ :sup:`31`-1 elements). That means that if you pass the following flags to the linker:

.. code:: bash

   -lblas -llapack

the produced binary code will be linked against ``libblas_lp64`` and ``liblapack_lp64`` (statically or dynamically, depending on the selected type of linking). If you need to link the code against the ``ilp64`` versions of the libraries (to handle arrays with more than 2\ :sup:`31`-1 elements), pass the following options to the linker:

.. code:: bash

   -lblas_ilp64 -llapack_ilp64

If you want to do that through the environment, add the flags to the value of ``LDFLAGS``:

.. code:: bash

   export LDFLAGS+=" -lblas_ilp64 -llapack_ilp64"

If your linker is part of the NVIDIA HPC SDK compiler collection, you do not have to append a ``-L/path`` flag to the ``LDFLAGS`` environment variable. If you need to perform a cross-compilation, however, setting the path to the library location is necessary. One way to find that path is to inspect the value of ``LD_LIBRARY_PATH`` after the corresponding environment module has been loaded, for example:

.. code:: bash

   export LDFLAGS+=" -L/opt/software/nvidia/hpc_sdk/Linux_x86_64/24.5/compilers/lib -lblas_ilp64 -llapack_ilp64"

If you want to use CMake to compile code using the ``LDFLAGS`` environment variable, be sure to go over this document:

https://cmake.org/cmake/help/latest/envvar/LDFLAGS.html

Compiler optimization flags for AMD Zen2 CPU microarchitecture
--------------------------------------------------------------

.. note::

   The compute nodes of Discoverer HPC are equipped with AMD EPYC 7H12 64-Core processors, which are based on the AMD Zen2 CPU microarchitecture.

The following compiler flag can be useful for compile-time optimization of your binary code on AMD Zen2:

.. code:: bash

   -tp zen2

For example (directly invoking the CPU microarchitecture optimization):

.. code:: bash

   nvc -tp zen2 ...
   nvc++ -tp zen2 ...
   nvfortran -tp zen2 ...

Recent versions of the NVIDIA HPC SDK compilers also support the ``-march`` and ``-mtune`` flags, which follow the "classic" way of passing the CPU microarchitecture flags to the compiler:

.. code:: bash

   nvc -march=znver2 -mtune=znver2 ...
   nvc++ -march=znver2 -mtune=znver2 ...
   nvfortran -march=znver2 -mtune=znver2 ...

To pass the microarchitecture flags to the compilers through the environment, append the following to the values of the ``CFLAGS``, ``CXXFLAGS``, or ``FCFLAGS`` environment variables:

.. code:: bash

   CFLAGS+=" -tp zen2"
   CXXFLAGS+=" -tp zen2"
   FCFLAGS+=" -tp zen2"

or

.. code:: bash

   CFLAGS+=" -march=znver2 -mtune=znver2"
   CXXFLAGS+=" -march=znver2 -mtune=znver2"
   FCFLAGS+=" -march=znver2 -mtune=znver2"

More on the supported compiler flags: `NVIDIA HPC SDK Documentation`_

Interaction with CMake
----------------------

It is recommended to specify the compiler executables when invoking the ``cmake`` tool:

.. code:: bash

   -DCMAKE_C_COMPILER=nvc
   -DCMAKE_CXX_COMPILER=nvc++
   -DCMAKE_Fortran_COMPILER=nvfortran

The corresponding optimization compiler flags can be passed to ``cmake`` as well:

.. code:: bash

   -DCMAKE_C_FLAGS="-tp zen2 ${CFLAGS}"
   -DCMAKE_CXX_FLAGS="-tp zen2 ${CXXFLAGS}"
   -DCMAKE_Fortran_FLAGS="-tp zen2 ${FCFLAGS}"

or

.. code:: bash

   -DCMAKE_C_FLAGS="-march=znver2 -mtune=znver2 ${CFLAGS}"
   -DCMAKE_CXX_FLAGS="-march=znver2 -mtune=znver2 ${CXXFLAGS}"
   -DCMAKE_Fortran_FLAGS="-march=znver2 -mtune=znver2 ${FCFLAGS}"

Getting help
------------

See :doc:`help`

.. _`NVIDIA HPC SDK`: https://developer.nvidia.com/hpc-sdk

.. _`NVIDIA HPC SDK Documentation`: https://docs.nvidia.com/hpc-sdk/index.html
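
Example: selecting lp64 or ilp64 linker flags
----------------------------------------------

The ``lp64``/``ilp64`` rule from the BLAS and LAPACK section can be sketched as a small shell helper. This is only an illustration: the function name ``blas_link_flags`` and the element count used below are hypothetical and not part of the NVIDIA HPC SDK. The helper prints the ``ilp64`` linker flags when the number of array elements exceeds 2\ :sup:`31`-1, and the default (``lp64``) flags otherwise.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not part of the NVIDIA HPC SDK): print the BLAS and
   # LAPACK linker flags suitable for arrays with the given number of elements.
   blas_link_flags() {
       local n_elements="$1"
       local max_lp64=$(( (1 << 31) - 1 ))   # 2^31-1, the largest lp64 index

       if (( n_elements > max_lp64 )); then
           # More elements than a 32-bit integer can index: use ilp64.
           echo "-lblas_ilp64 -llapack_ilp64"
       else
           # The default symlinks point to the lp64 libraries.
           echo "-lblas -llapack"
       fi
   }

   # Usage: append the selected flags to LDFLAGS (the element count is an
   # arbitrary example value larger than 2^31-1).
   export LDFLAGS+=" $(blas_link_flags 3000000000)"

Keep in mind that linking against the ``ilp64`` libraries is only one half of the job: the calling code must also pass 64-bit integers to the BLAS and LAPACK routines.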