PAPI
====

.. contents::
   :depth: 3
   :local:

About PAPI
----------

PAPI (Performance Application Programming Interface) provides a portable way to measure hardware performance at the lowest level. PAPI is an open-source project available from the PAPI Project.

Build recipes and compilation instructions for this installation are available at:

https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/papi

What PAPI can do for you
~~~~~~~~~~~~~~~~~~~~~~~~

PAPI enables you to:

1. Measure Hardware Performance Counters

   - Access CPU cycles, instruction counts, and execution metrics
   - Monitor cache behaviour (hits, misses, accesses)
   - Track branch prediction accuracy
   - Measure floating-point operation rates
   - Analyse memory bandwidth and access patterns

2. Profile Application Performance

   - Identify performance bottlenecks at the hardware level
   - Understand why code runs slowly (cache misses, branch mispredictions, etc.)
   - Compare performance across different algorithms or implementations
   - Measure the impact of compiler optimizations

3. Optimize Code Based on Data

   - Use hardware metrics to guide optimization efforts
   - Detect cache-unfriendly memory access patterns
   - Identify branch prediction issues
   - Measure floating-point efficiency

4. Cross-Platform Performance Analysis

   - Write performance measurement code once, run on multiple architectures
   - Compare performance across different hardware platforms
   - Conduct reproducible performance studies

Basic Workflow
~~~~~~~~~~~~~~

The typical workflow for using PAPI is:

1. **Initialize PAPI** - Set up the library
2. **Create event set and add events** - Choose what to measure
3. **Start counters** - Begin measurement
4. **Execute code** - Run the code you want to profile
5. **Stop counters and read values** - Get the performance data
6. **Cleanup** - Release resources

With these steps, you can measure and analyse the performance characteristics of your applications at the hardware level, which gives insights that simple timing measurements cannot provide.

Prerequisites
-------------

1. Load required modules:

   .. code:: bash

      # Load LLVM module (required for clang and clang++)
      module load llvm

      # Load PAPI module
      module load papi/7/7.2.0

2. Verify PAPI is available:

   .. code:: bash

      papi_version
      papi_avail

3. Verify compiler availability:

   .. code:: bash

      # Check GCC (usually available by default)
      gcc --version
      g++ --version

      # Check clang and clang++ versions and availability (requires llvm module to be loaded)
      clang --version
      clang++ --version

Compiling PAPI applications
---------------------------

Basic compilation
~~~~~~~~~~~~~~~~~

With ``gcc``:

.. code:: bash

   module load papi/7
   gcc -o my_program my_program.c -lpapi

With ``clang``:

.. code:: bash

   module load papi/7
   module load llvm
   clang -o my_program my_program.c -lpapi

With optimization
~~~~~~~~~~~~~~~~~

For production code, use optimization flags:

With ``gcc``:

.. code:: bash

   module load papi/7
   gcc -O3 -o my_program my_program.c -lpapi

With ``clang``:

.. code:: bash

   module load papi/7
   module load llvm
   clang -O3 -o my_program my_program.c -lpapi

Using module environment variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PAPI module sets up compiler flags automatically:

With ``gcc``:

.. code:: bash

   module load papi/7
   gcc $CFLAGS -o my_program my_program.c $LDFLAGS -lpapi

With ``clang``:

.. code:: bash

   module load papi/7
   module load llvm
   clang $CFLAGS -o my_program my_program.c $LDFLAGS -lpapi

Or explicitly specify paths:

.. code:: bash

   # GCC (usually available by default)
   gcc -o my_program my_program.c -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -lpapi

   # Clang (requires module load llvm)
   module load llvm
   clang -o my_program my_program.c -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -lpapi

Basic PAPI usage pattern
------------------------

1. Initialize PAPI
~~~~~~~~~~~~~~~~~~

.. code:: c

   #include <papi.h>

   int retval = PAPI_library_init(PAPI_VER_CURRENT);
   if (retval != PAPI_VER_CURRENT) {
       fprintf(stderr, "PAPI init error: %s\n", PAPI_strerror(retval));
       exit(1);
   }

2. Create an Event Set
~~~~~~~~~~~~~~~~~~~~~~

.. code:: c

   int EventSet = PAPI_NULL;
   retval = PAPI_create_eventset(&EventSet);
   if (retval != PAPI_OK) {
       fprintf(stderr, "Error creating eventset: %s\n", PAPI_strerror(retval));
       exit(1);
   }

3. Add Events to Measure
~~~~~~~~~~~~~~~~~~~~~~~~

Common events:

- ``PAPI_TOT_CYC`` - Total CPU cycles
- ``PAPI_TOT_INS`` - Total instructions
- ``PAPI_L1_DCM`` - L1 data cache misses
- ``PAPI_L2_DCM`` - L2 data cache misses
- ``PAPI_BR_MSP`` - Branch mispredictions
- ``PAPI_FP_OPS`` - Floating point operations

.. code-block:: c

   retval = PAPI_add_event(EventSet, PAPI_TOT_CYC);
   if (retval != PAPI_OK) {
       fprintf(stderr, "Error adding event: %s\n", PAPI_strerror(retval));
       exit(1);
   }

4. Start Measurement
~~~~~~~~~~~~~~~~~~~~

.. code:: c

   retval = PAPI_start(EventSet);
   if (retval != PAPI_OK) {
       fprintf(stderr, "Error starting counters: %s\n", PAPI_strerror(retval));
       exit(1);
   }

5. Execute Code to Measure
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: c

   // Your code here
   for (int i = 0; i < iterations; i++) {
       // computation
   }

6. Stop and Read Values
~~~~~~~~~~~~~~~~~~~~~~~

.. code:: c

   long long values[1];
   retval = PAPI_stop(EventSet, values);
   if (retval != PAPI_OK) {
       fprintf(stderr, "Error stopping counters: %s\n", PAPI_strerror(retval));
       exit(1);
   }
   printf("Total cycles: %lld\n", values[0]);

7. Cleanup
~~~~~~~~~~

.. code:: c

   PAPI_cleanup_eventset(EventSet);
   PAPI_destroy_eventset(&EventSet);
   PAPI_shutdown();

Complete code examples
----------------------

C code example
~~~~~~~~~~~~~~

Here is a complete working example in C that demonstrates all the steps together:

.. code-block:: c

   /* Created by Veselin Kolev on 31 December 2025
    *
    * Complete PAPI Example
    *
    * This program demonstrates basic usage of PAPI to measure hardware
    * performance counters. It performs a simple computation and measures
    * CPU cycles, instructions, and cache misses.
    *
    */

   #include <stdio.h>
   #include <stdlib.h>
   #include <papi.h>

   #define NUM_EVENTS 3
   #define ERROR_RETURN(retval) { \
       fprintf(stderr, "Error %d %s:line %d: \n", retval, __FILE__, __LINE__); \
       exit(retval); \
   }

   int main(int argc, char **argv) {
       int retval, i;
       int EventSet = PAPI_NULL;
       long long values[NUM_EVENTS];
       int events[NUM_EVENTS] = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
       char event_names[NUM_EVENTS][PAPI_MAX_STR_LEN];
       int event_indices[NUM_EVENTS];  /* Track which events were successfully added */
       int num_added = 0;

       /* Initialize PAPI library */
       retval = PAPI_library_init(PAPI_VER_CURRENT);
       if (retval != PAPI_VER_CURRENT) {
           fprintf(stderr, "PAPI library init error: %s\n", PAPI_strerror(retval));
           ERROR_RETURN(retval);
       }

       printf("PAPI initialized successfully\n");
       printf("PAPI Version: %d.%d.%d\n",
              PAPI_VERSION_MAJOR(PAPI_VERSION),
              PAPI_VERSION_MINOR(PAPI_VERSION),
              PAPI_VERSION_REVISION(PAPI_VERSION));

       /* Create EventSet */
       retval = PAPI_create_eventset(&EventSet);
       if (retval != PAPI_OK) {
           fprintf(stderr, "PAPI create eventset error: %s\n", PAPI_strerror(retval));
           ERROR_RETURN(retval);
       }

       /* Add events to EventSet */
       for (i = 0; i < NUM_EVENTS; i++) {
           /* Check if event is available before adding */
           retval = PAPI_query_event(events[i]);
           if (retval != PAPI_OK) {
               retval = PAPI_event_code_to_name(events[i], event_names[i]);
               if (retval == PAPI_OK) {
                   fprintf(stderr, "Warning: Event %s is not available on this platform\n", event_names[i]);
               } else {
                   fprintf(stderr, "Warning: Event %d is not available on this platform\n", events[i]);
               }
               continue;
           }

           retval = PAPI_add_event(EventSet, events[i]);
           if (retval != PAPI_OK) {
               fprintf(stderr, "PAPI add event error: %s\n", PAPI_strerror(retval));
               retval = PAPI_event_code_to_name(events[i],
                                                event_names[i]);
               if (retval == PAPI_OK) {
                   fprintf(stderr, "Event %s may not be available on this platform\n", event_names[i]);
               } else {
                   fprintf(stderr, "Event %d may not be available on this platform\n", events[i]);
               }
               continue;
           }

           /* Get event name for display */
           retval = PAPI_event_code_to_name(events[i], event_names[i]);
           if (retval != PAPI_OK) {
               snprintf(event_names[i], PAPI_MAX_STR_LEN, "Event_%d", events[i]);
           }

           /* Track successfully added events */
           event_indices[num_added] = i;
           num_added++;
       }

       if (num_added == 0) {
           fprintf(stderr, "Error: No events could be added to the eventset\n");
           ERROR_RETURN(1);
       }

       printf("Successfully added %d event(s) to eventset\n", num_added);

       /* Start counting */
       retval = PAPI_start(EventSet);
       if (retval != PAPI_OK) {
           fprintf(stderr, "PAPI start error: %s\n", PAPI_strerror(retval));
           ERROR_RETURN(retval);
       }

       printf("\nStarting measurement...\n");

       /* Perform some computation */
       volatile double sum = 0.0;
       int iterations = 1000000;
       for (i = 0; i < iterations; i++) {
           sum += i * 1.5;
       }

       printf("Computation completed (sum = %f)\n", sum);

       /* Stop counting and read values */
       retval = PAPI_stop(EventSet, values);
       if (retval != PAPI_OK) {
           fprintf(stderr, "PAPI stop error: %s\n", PAPI_strerror(retval));
           ERROR_RETURN(retval);
       }

       /* Display results */
       printf("\n=== Performance Counter Results ===\n");
       for (i = 0; i < num_added; i++) {
           int idx = event_indices[i];
           printf("%-30s: %lld\n", event_names[idx], values[i]);
       }

       /* Calculate derived metrics */
       /* Find which events were successfully added and their positions */
       int cycles_pos = -1, ins_pos = -1, cache_miss_pos = -1;
       for (i = 0; i < num_added; i++) {
           int idx = event_indices[i];
           if (events[idx] == PAPI_TOT_CYC) cycles_pos = i;
           if (events[idx] == PAPI_TOT_INS) ins_pos = i;
           if (events[idx] == PAPI_L1_DCM) cache_miss_pos = i;
       }

       if (cycles_pos >= 0 && ins_pos >= 0 && values[ins_pos] > 0) {
           double cpi = (double)values[cycles_pos] / (double)values[ins_pos];
           printf("\nCycles per Instruction (CPI): %.4f\n", cpi);
       }

       if (cache_miss_pos >= 0 && values[cache_miss_pos] > 0) {
           if (ins_pos >= 0 && values[ins_pos] > 0) {
               /* Calculate miss rate per instruction */
               double miss_rate = (double)values[cache_miss_pos] / (double)values[ins_pos] * 100.0;
               printf("L1 Data Cache Miss Rate: %.4f%% (misses per instruction)\n", miss_rate);
           } else {
               /* Just show the raw number if we don't have instruction count */
               printf("\nL1 Data Cache Misses: %lld\n", values[cache_miss_pos]);
           }
       }

       /* Cleanup */
       retval = PAPI_cleanup_eventset(EventSet);
       if (retval != PAPI_OK) {
           fprintf(stderr, "PAPI cleanup eventset error: %s\n", PAPI_strerror(retval));
       }

       retval = PAPI_destroy_eventset(&EventSet);
       if (retval != PAPI_OK) {
           fprintf(stderr, "PAPI destroy eventset error: %s\n", PAPI_strerror(retval));
       }

       PAPI_shutdown();

       printf("\nPAPI test completed successfully\n");
       return 0;
   }

Save this code to a file (e.g., ``example.c``) and compile it:

With ``gcc``:

.. code:: bash

   module load papi/7
   gcc $CFLAGS -o example example.c $LDFLAGS -lpapi

With ``clang``:

.. code:: bash

   module load llvm
   module load papi/7
   clang $CFLAGS -o example example.c $LDFLAGS -lpapi

Or explicitly specify paths:

.. code:: bash

   module load papi/7
   gcc -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.c -lpapi

   # Clang (requires module load llvm first)
   module load papi/7
   module load llvm
   clang -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.c -lpapi

C++ code example
~~~~~~~~~~~~~~~~

Here is the same example written in C++ using modern C++ features:

.. code-block:: cpp

   /* Created by Veselin Kolev on 31 December 2025
    *
    * Complete PAPI Example (C++)
    *
    * This program demonstrates basic usage of PAPI to measure hardware
    * performance counters. It performs a simple computation and measures
    * CPU cycles, instructions, and cache misses.
    *
    */

   #include <cstdlib>
   #include <iomanip>
   #include <iostream>
   #include <string>
   #include <vector>

   extern "C" {
   #include <papi.h>
   }

   const int NUM_EVENTS = 3;

   void error_exit(int retval, const char* file, int line) {
       std::cerr << "Error " << retval << " " << file << ":line " << line << std::endl;
       std::exit(retval);
   }

   int main(int argc, char **argv) {
       int retval;
       int EventSet = PAPI_NULL;
       std::vector<long long> values(NUM_EVENTS);
       std::vector<int> events = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
       std::vector<std::string> event_names(NUM_EVENTS);
       std::vector<int> event_indices;  // Track which events were successfully added
       int num_added = 0;

       /* Initialize PAPI library */
       retval = PAPI_library_init(PAPI_VER_CURRENT);
       if (retval != PAPI_VER_CURRENT) {
           std::cerr << "PAPI library init error: " << PAPI_strerror(retval) << std::endl;
           error_exit(retval, __FILE__, __LINE__);
       }

       std::cout << "PAPI initialized successfully" << std::endl;
       std::cout << "PAPI Version: "
                 << PAPI_VERSION_MAJOR(PAPI_VERSION) << "."
                 << PAPI_VERSION_MINOR(PAPI_VERSION) << "."
                 << PAPI_VERSION_REVISION(PAPI_VERSION) << std::endl;

       /* Create EventSet */
       retval = PAPI_create_eventset(&EventSet);
       if (retval != PAPI_OK) {
           std::cerr << "PAPI create eventset error: " << PAPI_strerror(retval) << std::endl;
           error_exit(retval, __FILE__, __LINE__);
       }

       /* Add events to EventSet */
       for (size_t i = 0; i < events.size(); i++) {
           /* Check if event is available before adding */
           retval = PAPI_query_event(events[i]);
           if (retval != PAPI_OK) {
               char name[PAPI_MAX_STR_LEN];
               retval = PAPI_event_code_to_name(events[i], name);
               if (retval == PAPI_OK) {
                   std::cerr << "Warning: Event " << name
                             << " is not available on this platform" << std::endl;
               } else {
                   std::cerr << "Warning: Event " << events[i]
                             << " is not available on this platform" << std::endl;
               }
               continue;
           }

           retval = PAPI_add_event(EventSet, events[i]);
           if (retval != PAPI_OK) {
               std::cerr << "PAPI add event error: " << PAPI_strerror(retval) << std::endl;
               char name[PAPI_MAX_STR_LEN];
               retval = PAPI_event_code_to_name(events[i], name);
               if (retval == PAPI_OK) {
                   std::cerr << "Event " << name
                             << " may not be available on this platform" << std::endl;
               } else {
                   std::cerr << "Event " << events[i]
                             << " may not be available on this platform" << std::endl;
               }
               continue;
           }

           /* Get event name for display */
           char name[PAPI_MAX_STR_LEN];
           retval = PAPI_event_code_to_name(events[i], name);
           if (retval == PAPI_OK) {
               event_names[i] = std::string(name);
           } else {
               event_names[i] = "Event_" + std::to_string(events[i]);
           }

           /* Track successfully added events */
           event_indices.push_back(static_cast<int>(i));
           num_added++;
       }

       if (num_added == 0) {
           std::cerr << "Error: No events could be added to the eventset" << std::endl;
           error_exit(1, __FILE__, __LINE__);
       }

       std::cout << "Successfully added " << num_added << " event(s) to eventset" << std::endl;

       /* Start counting */
       retval = PAPI_start(EventSet);
       if (retval != PAPI_OK) {
           std::cerr << "PAPI start error: " << PAPI_strerror(retval) << std::endl;
           error_exit(retval, __FILE__, __LINE__);
       }

       std::cout << "\nStarting measurement..." << std::endl;

       /* Perform some computation */
       volatile double sum = 0.0;
       const int iterations = 1000000;
       for (int i = 0; i < iterations; i++) {
           sum += i * 1.5;
       }

       std::cout << "Computation completed (sum = " << std::fixed
                 << std::setprecision(6) << sum << ")" << std::endl;

       /* Stop counting and read values */
       retval = PAPI_stop(EventSet, values.data());
       if (retval != PAPI_OK) {
           std::cerr << "PAPI stop error: " << PAPI_strerror(retval) << std::endl;
           error_exit(retval, __FILE__, __LINE__);
       }

       /* Display results */
       std::cout << "\n=== Performance Counter Results ===" << std::endl;
       for (int i = 0; i < num_added; i++) {
           int idx = event_indices[i];
           std::cout << std::left << std::setw(30) << event_names[idx]
                     << ": " << values[i] << std::endl;
       }

       /* Calculate derived metrics */
       /* Find which events were successfully added and their positions */
       int cycles_pos = -1, ins_pos = -1, cache_miss_pos = -1;
       for (int i = 0; i < num_added; i++) {
           int idx = event_indices[i];
           if (events[idx] == PAPI_TOT_CYC) cycles_pos = i;
           if (events[idx] == PAPI_TOT_INS) ins_pos = i;
           if (events[idx] == PAPI_L1_DCM) cache_miss_pos = i;
       }

       if (cycles_pos >= 0 && ins_pos >= 0 && values[ins_pos] > 0) {
           double cpi = static_cast<double>(values[cycles_pos]) /
                        static_cast<double>(values[ins_pos]);
           std::cout << "\nCycles per Instruction (CPI): " << std::fixed
                     << std::setprecision(4) << cpi << std::endl;
       }

       if (cache_miss_pos >= 0 && values[cache_miss_pos] > 0) {
           if (ins_pos >= 0 && values[ins_pos] > 0) {
               /* Calculate miss rate per instruction */
               double miss_rate = static_cast<double>(values[cache_miss_pos]) /
                                  static_cast<double>(values[ins_pos]) * 100.0;
               std::cout << "L1 Data Cache Miss Rate: " << std::fixed
                         << std::setprecision(4) << miss_rate
                         << "% (misses per instruction)" << std::endl;
           } else {
               /* Just show the raw number if we don't have instruction count */
               std::cout << "\nL1 Data Cache Misses: " << values[cache_miss_pos] << std::endl;
           }
       }

       /* Cleanup */
       retval = PAPI_cleanup_eventset(EventSet);
       if (retval != PAPI_OK) {
           std::cerr << "PAPI cleanup eventset error: " << PAPI_strerror(retval) << std::endl;
       }

       retval = PAPI_destroy_eventset(&EventSet);
       if (retval != PAPI_OK) {
           std::cerr << "PAPI destroy eventset error: " << PAPI_strerror(retval) << std::endl;
       }

       PAPI_shutdown();

       std::cout << "\nPAPI test completed successfully" << std::endl;
       return 0;
   }

Save this code to a file (e.g., ``example.cpp``) and compile it:

With ``g++``:

.. code:: bash

   module load papi/7
   g++ $CXXFLAGS -o example example.cpp $LDFLAGS -lpapi

With ``clang++``:

.. code:: bash

   module load llvm
   module load papi/7
   clang++ -stdlib=libc++ $CXXFLAGS -o example example.cpp $LDFLAGS -lpapi

Or explicitly specify paths:

.. code:: bash

   module load papi/7
   g++ -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.cpp -lpapi

   # Clang++ (requires module load llvm first)
   module load papi/7
   module load llvm
   clang++ -stdlib=libc++ -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.cpp -lpapi

Common performance events
-------------------------

CPU metrics
~~~~~~~~~~~

- ``PAPI_TOT_CYC`` - Total CPU cycles
- ``PAPI_TOT_INS`` - Total instructions executed
- ``PAPI_REF_CYC`` - Reference cycles

Cache metrics
~~~~~~~~~~~~~

- ``PAPI_L1_DCM`` - L1 data cache misses
- ``PAPI_L1_DCA`` - L1 data cache accesses
- ``PAPI_L2_DCM`` - L2 data cache misses
- ``PAPI_L2_DCA`` - L2 data cache accesses
- ``PAPI_L3_TCM`` - L3 total cache misses

Branch metrics
~~~~~~~~~~~~~~

- ``PAPI_BR_CN`` - Conditional branches
- ``PAPI_BR_MSP`` - Branch mispredictions
- ``PAPI_BR_PRC`` - Conditional branches correctly predicted

Floating point
~~~~~~~~~~~~~~

- ``PAPI_FP_OPS`` - Floating point operations
- ``PAPI_SP_OPS`` - Single precision operations
- ``PAPI_DP_OPS`` - Double precision operations

Memory
~~~~~~

- ``PAPI_LD_INS`` - Load instructions
- ``PAPI_SR_INS`` - Store instructions

Discovering available events
----------------------------

List all available events
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   papi_avail

List native events
~~~~~~~~~~~~~~~~~~

.. code:: bash

   papi_native_avail

Check specific event
~~~~~~~~~~~~~~~~~~~~

``papi_event_chooser`` expects the event type (``PRESET`` or ``NATIVE``) before the event names. To print the details of a single event, ``papi_avail -e PAPI_TOT_CYC`` can be used instead.

.. code:: bash

   # Lists the events that can be counted together with PAPI_TOT_CYC
   papi_event_chooser PRESET PAPI_TOT_CYC

Error handling
--------------

Always check return values from PAPI functions:

.. code:: c

   int retval = PAPI_function(...);
   if (retval != PAPI_OK) {
       fprintf(stderr, "Error: %s\n", PAPI_strerror(retval));
       // Handle error appropriately
   }

Common error codes
~~~~~~~~~~~~~~~~~~

- ``PAPI_OK`` - Success
- ``PAPI_EINVAL`` - Invalid argument
- ``PAPI_ENOMEM`` - Out of memory
- ``PAPI_ECNFLCT`` - Event conflict
- ``PAPI_ENOEVNT`` - Event not available

Advanced usage
--------------

Multiple events
~~~~~~~~~~~~~~~

.. code:: c

   int events[] = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
   long long values[3];

   // Add all events
   for (int i = 0; i < 3; i++) {
       PAPI_add_event(EventSet, events[i]);
   }

   PAPI_start(EventSet);
   // ... code ...
   PAPI_stop(EventSet, values);

   // values[0] = cycles, values[1] = instructions, values[2] = cache misses

High-level API
~~~~~~~~~~~~~~

PAPI also provides simplified rate calls for common metrics. In PAPI 6 and later, ``PAPI_flops_rate()`` replaces the older ``PAPI_flops()`` and takes the event to count as its first argument:

.. code:: c

   float real_time, proc_time, mflops;
   long long flpops;

   /* The first call starts the counters; later calls report values since the previous call */
   PAPI_flops_rate(PAPI_DP_OPS, &real_time, &proc_time, &flpops, &mflops);
   /* ... code to measure ... */
   PAPI_flops_rate(PAPI_DP_OPS, &real_time, &proc_time, &flpops, &mflops);
   printf("MFLOPS: %f\n", mflops);

Thread safety
~~~~~~~~~~~~~

PAPI can be used in multithreaded programs. After ``PAPI_library_init()``, call ``PAPI_thread_init()`` once to register a function that returns the current thread ID. Each thread should then:

1. Create its own EventSet
2. Start and stop its counters independently
3. Clean up its own resources

Performance considerations
--------------------------

1. **Overhead**: PAPI has minimal overhead, but frequent start/stop operations can add up
2. **Counter Limits**: Hardware has a limited number of simultaneous counters (typically 2-8)
3. **Multiplexing**: Use PAPI multiplexing to measure more events than available counters
4. **Sampling**: For long-running codes, consider sampling instead of continuous measurement

Troubleshooting
---------------

“Event not available”
~~~~~~~~~~~~~~~~~~~~~

Some events may not be available on all platforms:

- Check with ``papi_avail``
- Use ``PAPI_query_event()`` to check availability before adding
- Have fallback events ready

Permission issues
~~~~~~~~~~~~~~~~~

Some counters require special permissions:

- May need to run as root
- Or adjust ``/proc/sys/kernel/perf_event_paranoid`` to allow user access

Linking issues
~~~~~~~~~~~~~~

If you get undefined references:

- Ensure ``-lpapi`` is at the end of the link command
- Check that ``LD_LIBRARY_PATH`` includes the PAPI library directory
- Verify the module is loaded: ``module list``

Additional resources
--------------------

- PAPI User's Guide: ``man papi``
- PAPI API Reference: ``man PAPI_start``
- Example programs in the PAPI distribution
- Online PAPI documentation

Example makefile
----------------

Recipe lines (the commands under each target) must begin with a tab character.

Using ``gcc``
~~~~~~~~~~~~~

.. code:: makefile

   CC = gcc
   CFLAGS = -O3 -Wall
   LDFLAGS = -lpapi

   example: example.c
   	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

   clean:
   	rm -f example

Using ``clang``
~~~~~~~~~~~~~~~

.. note::

   Requires ``module load llvm`` before running ``make``.

.. code:: makefile

   CC = clang
   CFLAGS = -O3 -Wall
   LDFLAGS = -lpapi

   example: example.c
   	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

   clean:
   	rm -f example

Compiler-Agnostic Makefile
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: makefile

   # Default to gcc (more widely available), but can be overridden
   # For clang: make CC=clang (requires module load llvm)
   CC ?= gcc
   CFLAGS = -O3 -Wall
   LDFLAGS = -lpapi

   example: example.c
   	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

   clean:
   	rm -f example

Usage:

.. code:: bash

   # GCC (usually available by default)
   module load papi/7
   make CC=gcc        # Explicitly use gcc

   # Clang (requires module load llvm first)
   module load papi/7
   module load llvm
   make CC=clang      # Use clang

C++ Makefile Examples
---------------------

Using ``g++``
~~~~~~~~~~~~~

.. code:: makefile

   CXX = g++
   CXXFLAGS = -O3 -Wall
   LDFLAGS = -lpapi

   example: example.cpp
   	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

   clean:
   	rm -f example

Using ``clang++``
~~~~~~~~~~~~~~~~~

.. note::

   Requires ``module load llvm`` before running ``make``.

.. code:: makefile

   CXX = clang++
   CXXFLAGS = -O3 -Wall -stdlib=libc++
   LDFLAGS = -lpapi

   example: example.cpp
   	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

   clean:
   	rm -f example

Compiler-Agnostic Makefile for C++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: makefile

   # Default to g++ (more widely available), but can be overridden
   # For clang++: make CXX=clang++ (requires module load llvm)
   CXX ?= g++
   CXXFLAGS = -O3 -Wall

   # Add -stdlib=libc++ if using clang++
   ifeq ($(CXX),clang++)
   CXXFLAGS += -stdlib=libc++
   endif

   LDFLAGS = -lpapi

   example: example.cpp
   	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

   clean:
   	rm -f example

Usage:

.. code:: bash

   # G++ (usually available by default)
   module load papi/7
   make CXX=g++       # Explicitly use g++

   # Clang++ (requires module load llvm first)
   module load papi/7
   module load llvm
   make CXX=clang++   # Use clang++