PAPI

About PAPI

PAPI (Performance Application Programming Interface) provides a powerful and portable way to measure hardware performance at the lowest level. PAPI is an open-source project available at PAPI Project.

Build recipes and compilation instructions for this installation are available at:

https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/papi

What PAPI can do for you

PAPI enables you to:

  1. Measure Hardware Performance Counters
    • Access CPU cycles, instruction counts, and execution metrics
    • Monitor cache behaviour (hits, misses, accesses)
    • Track branch prediction accuracy
    • Measure floating-point operation rates
    • Analyse memory bandwidth and access patterns
  2. Profile Application Performance
    • Identify performance bottlenecks at the hardware level
    • Understand why code runs slowly (cache misses, branch mispredictions, etc.)
    • Compare performance across different algorithms or implementations
    • Measure the impact of compiler optimizations
  3. Optimize Code Based on Data
    • Use hardware metrics to guide optimization efforts
    • Detect cache-unfriendly memory access patterns
    • Identify branch prediction issues
    • Measure floating-point efficiency
  4. Cross-Platform Performance Analysis
    • Write performance measurement code once, run on multiple architectures
    • Compare performance across different hardware platforms
    • Conduct reproducible performance studies

Basic Workflow

The typical workflow for using PAPI is straightforward:

  1. Initialize PAPI - Set up the library
  2. Create event set and add events - Choose what to measure
  3. Start counters - Begin measurement
  4. Execute code - Run the code you want to profile
  5. Stop counters and read values - Get the performance data
  6. Cleanup - Release resources

With these steps, you can measure and analyse the performance characteristics of your applications at the hardware level, providing insights that are impossible to obtain from simple timing measurements alone.

Prerequisites

  1. Load required modules:

    # Load LLVM module (required for clang and clang++)
    module load llvm
    
    # Load PAPI module
    module load papi/7/7.2.0
    
  2. Verify PAPI is available:

    papi_version
    papi_avail
    
  3. Verify compiler availability:

    # Check GCC (usually available by default)
    gcc --version
    g++ --version
    
    # Check clang and clang++ versions and availability (requires llvm module to be loaded)
    clang --version
    clang++ --version
    

Compiling PAPI applications

Basic compilation

With gcc:

module load papi/7
gcc -o my_program my_program.c -lpapi

With clang:

module load papi/7
module load llvm
clang -o my_program my_program.c -lpapi

With optimization

For production code, use optimization flags:

With gcc:

module load papi/7
gcc -O3 -o my_program my_program.c -lpapi

With clang:

module load papi/7
module load llvm
clang -O3 -o my_program my_program.c -lpapi

Using module environment variables

The PAPI module sets up compiler flags automatically:

With gcc:

module load papi/7
gcc $CFLAGS -o my_program my_program.c $LDFLAGS -lpapi

With clang:

module load papi/7
module load llvm
clang $CFLAGS -o my_program my_program.c $LDFLAGS -lpapi

Or explicitly specify paths:

# GCC (usually available by default)
gcc -o my_program my_program.c -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -lpapi

# Clang (requires module load llvm)
module load llvm
clang -o my_program my_program.c -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -lpapi

Basic PAPI usage pattern

1. Initialize PAPI

#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
    fprintf(stderr, "PAPI init error: %s\n", PAPI_strerror(retval));
    exit(1);
}

2. Create an Event Set

int EventSet = PAPI_NULL;
retval = PAPI_create_eventset(&EventSet);
if (retval != PAPI_OK) {
    fprintf(stderr, "Error creating eventset: %s\n", PAPI_strerror(retval));
    exit(1);
}

3. Add Events to Measure

Common events:

  • PAPI_TOT_CYC - Total CPU cycles
  • PAPI_TOT_INS - Total instructions
  • PAPI_L1_DCM - L1 data cache misses
  • PAPI_L2_DCM - L2 data cache misses
  • PAPI_BR_MSP - Branch mispredictions
  • PAPI_FP_OPS - Floating point operations

retval = PAPI_add_event(EventSet, PAPI_TOT_CYC);
if (retval != PAPI_OK) {
    fprintf(stderr, "Error adding event: %s\n", PAPI_strerror(retval));
    exit(1);
}

4. Start Measurement

retval = PAPI_start(EventSet);
if (retval != PAPI_OK) {
    fprintf(stderr, "Error starting counters: %s\n", PAPI_strerror(retval));
    exit(1);
}

5. Execute Code to Measure

// Your code here
for (int i = 0; i < iterations; i++) {
    // computation
}

6. Stop and Read Values

long long values[1];
retval = PAPI_stop(EventSet, values);
if (retval != PAPI_OK) {
    fprintf(stderr, "Error stopping counters: %s\n", PAPI_strerror(retval));
    exit(1);
}

printf("Total cycles: %lld\n", values[0]);

7. Cleanup

PAPI_cleanup_eventset(EventSet);
PAPI_destroy_eventset(&EventSet);
PAPI_shutdown();

Complete code examples

C code example

Here is a complete working example in C that demonstrates all the steps together:

/* Created by Veselin Kolev <v.kolev@discoverer.bg> on 31 December 2025
 *
 * Complete PAPI Example
 *
 * This program demonstrates basic usage of PAPI to measure hardware
 * performance counters. It performs a simple computation and measures
 * CPU cycles, instructions, and cache misses.
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define NUM_EVENTS 3
#define ERROR_RETURN(retval) { \
    fprintf(stderr, "Error %d %s:line %d: \n", retval, __FILE__, __LINE__); \
    exit(retval); \
}

int main(int argc, char **argv)
{
    int retval, i;
    int EventSet = PAPI_NULL;
    long long values[NUM_EVENTS];
    int events[NUM_EVENTS] = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
    char event_names[NUM_EVENTS][PAPI_MAX_STR_LEN];
    int event_indices[NUM_EVENTS];  /* Track which events were successfully added */
    int num_added = 0;

    /* Initialize PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error: %s\n", PAPI_strerror(retval));
        ERROR_RETURN(retval);
    }

    printf("PAPI initialized successfully\n");
    printf("PAPI Version: %d.%d.%d\n",
           PAPI_VERSION_MAJOR(PAPI_VERSION),
           PAPI_VERSION_MINOR(PAPI_VERSION),
           PAPI_VERSION_REVISION(PAPI_VERSION));

    /* Create EventSet */
    retval = PAPI_create_eventset(&EventSet);
    if (retval != PAPI_OK) {
        fprintf(stderr, "PAPI create eventset error: %s\n", PAPI_strerror(retval));
        ERROR_RETURN(retval);
    }

    /* Add events to EventSet */
    for (i = 0; i < NUM_EVENTS; i++) {
        /* Check if event is available before adding */
        retval = PAPI_query_event(events[i]);
        if (retval != PAPI_OK) {
            retval = PAPI_event_code_to_name(events[i], event_names[i]);
            if (retval == PAPI_OK) {
                fprintf(stderr, "Warning: Event %s is not available on this platform\n", event_names[i]);
            } else {
                fprintf(stderr, "Warning: Event %d is not available on this platform\n", events[i]);
            }
            continue;
        }

        retval = PAPI_add_event(EventSet, events[i]);
        if (retval != PAPI_OK) {
            fprintf(stderr, "PAPI add event error: %s\n", PAPI_strerror(retval));
            retval = PAPI_event_code_to_name(events[i], event_names[i]);
            if (retval == PAPI_OK) {
                fprintf(stderr, "Event %s may not be available on this platform\n", event_names[i]);
            } else {
                fprintf(stderr, "Event %d may not be available on this platform\n", events[i]);
            }
            continue;
        }

        /* Get event name for display */
        retval = PAPI_event_code_to_name(events[i], event_names[i]);
        if (retval != PAPI_OK) {
            sprintf(event_names[i], "Event_%d", events[i]);
        }

        /* Track successfully added events */
        event_indices[num_added] = i;
        num_added++;
    }

    if (num_added == 0) {
        fprintf(stderr, "Error: No events could be added to the eventset\n");
        ERROR_RETURN(1);
    }

    printf("Successfully added %d event(s) to eventset\n", num_added);

    /* Start counting */
    retval = PAPI_start(EventSet);
    if (retval != PAPI_OK) {
        fprintf(stderr, "PAPI start error: %s\n", PAPI_strerror(retval));
        ERROR_RETURN(retval);
    }

    printf("\nStarting measurement...\n");

    /* Perform some computation */
    volatile double sum = 0.0;
    int iterations = 1000000;

    for (i = 0; i < iterations; i++) {
        sum += i * 1.5;
    }

    printf("Computation completed (sum = %f)\n", sum);

    /* Stop counting and read values */
    retval = PAPI_stop(EventSet, values);
    if (retval != PAPI_OK) {
        fprintf(stderr, "PAPI stop error: %s\n", PAPI_strerror(retval));
        ERROR_RETURN(retval);
    }

    /* Display results */
    printf("\n=== Performance Counter Results ===\n");
    for (i = 0; i < num_added; i++) {
        int idx = event_indices[i];
        printf("%-30s: %lld\n", event_names[idx], values[i]);
    }

    /* Calculate derived metrics */
    /* Find which events were successfully added and their positions */
    int cycles_pos = -1, ins_pos = -1, cache_miss_pos = -1;
    for (i = 0; i < num_added; i++) {
        int idx = event_indices[i];
        if (events[idx] == PAPI_TOT_CYC) cycles_pos = i;
        if (events[idx] == PAPI_TOT_INS) ins_pos = i;
        if (events[idx] == PAPI_L1_DCM) cache_miss_pos = i;
    }

    if (cycles_pos >= 0 && ins_pos >= 0 && values[ins_pos] > 0) {
        double cpi = (double)values[cycles_pos] / (double)values[ins_pos];
        printf("\nCycles per Instruction (CPI): %.4f\n", cpi);
    }

    if (cache_miss_pos >= 0 && values[cache_miss_pos] > 0) {
        if (ins_pos >= 0 && values[ins_pos] > 0) {
            /* Calculate miss rate per instruction */
            double miss_rate = (double)values[cache_miss_pos] / (double)values[ins_pos] * 100.0;
            printf("L1 Data Cache Miss Rate: %.4f%% (misses per instruction)\n", miss_rate);
        } else {
            /* Just show the raw number if we don't have instruction count */
            printf("\nL1 Data Cache Misses: %lld\n", values[cache_miss_pos]);
        }
    }

    /* Cleanup */
    retval = PAPI_cleanup_eventset(EventSet);
    if (retval != PAPI_OK) {
        fprintf(stderr, "PAPI cleanup eventset error: %s\n", PAPI_strerror(retval));
    }

    retval = PAPI_destroy_eventset(&EventSet);
    if (retval != PAPI_OK) {
        fprintf(stderr, "PAPI destroy eventset error: %s\n", PAPI_strerror(retval));
    }

    PAPI_shutdown();

    printf("\nPAPI test completed successfully\n");
    return 0;
}

Save this code to a file (e.g., example.c) and compile it:

With gcc:

module load papi/7
gcc $CFLAGS -o example example.c $LDFLAGS -lpapi

With clang:

module load llvm
module load papi/7
clang $CFLAGS -o example example.c $LDFLAGS -lpapi

Or explicitly specify paths:

module load papi/7
gcc -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.c -lpapi

# Clang (requires module load llvm first)
module load papi/7
module load llvm
clang -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.c -lpapi

C++ code example

Here is the same example written in C++ using modern C++ features:

/* Created by Veselin Kolev <v.kolev@discoverer.bg> on 31 December 2025
 *
 * Complete PAPI Example (C++)
 *
 * This program demonstrates basic usage of PAPI to measure hardware
 * performance counters. It performs a simple computation and measures
 * CPU cycles, instructions, and cache misses.
 *
 */

#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <cstdio>
#include <papi.h>

#include <cstdlib>

const int NUM_EVENTS = 3;

void error_exit(int retval, const char* file, int line) {
    std::cerr << "Error " << retval << " " << file << ":line " << line << std::endl;
    std::exit(retval);
}

int main(int argc, char **argv)
{
    int retval;
    int EventSet = PAPI_NULL;
    std::vector<long long> values(NUM_EVENTS);
    std::vector<int> events = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
    std::vector<std::string> event_names(NUM_EVENTS);
    std::vector<int> event_indices;  // Track which events were successfully added
    int num_added = 0;

    /* Initialize PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        std::cerr << "PAPI library init error: " << PAPI_strerror(retval) << std::endl;
        error_exit(retval, __FILE__, __LINE__);
    }

    std::cout << "PAPI initialized successfully" << std::endl;
    std::cout << "PAPI Version: "
              << PAPI_VERSION_MAJOR(PAPI_VERSION) << "."
              << PAPI_VERSION_MINOR(PAPI_VERSION) << "."
              << PAPI_VERSION_REVISION(PAPI_VERSION) << std::endl;

    /* Create EventSet */
    retval = PAPI_create_eventset(&EventSet);
    if (retval != PAPI_OK) {
        std::cerr << "PAPI create eventset error: " << PAPI_strerror(retval) << std::endl;
        error_exit(retval, __FILE__, __LINE__);
    }

    /* Add events to EventSet */
    for (size_t i = 0; i < events.size(); i++) {
        /* Check if event is available before adding */
        retval = PAPI_query_event(events[i]);
        if (retval != PAPI_OK) {
            char name[PAPI_MAX_STR_LEN];
            retval = PAPI_event_code_to_name(events[i], name);
            if (retval == PAPI_OK) {
                std::cerr << "Warning: Event " << name
                          << " is not available on this platform" << std::endl;
            } else {
                std::cerr << "Warning: Event " << events[i]
                          << " is not available on this platform" << std::endl;
            }
            continue;
        }

        retval = PAPI_add_event(EventSet, events[i]);
        if (retval != PAPI_OK) {
            std::cerr << "PAPI add event error: " << PAPI_strerror(retval) << std::endl;
            char name[PAPI_MAX_STR_LEN];
            retval = PAPI_event_code_to_name(events[i], name);
            if (retval == PAPI_OK) {
                std::cerr << "Event " << name
                          << " may not be available on this platform" << std::endl;
            } else {
                std::cerr << "Event " << events[i]
                          << " may not be available on this platform" << std::endl;
            }
            continue;
        }

        /* Get event name for display */
        char name[PAPI_MAX_STR_LEN];
        retval = PAPI_event_code_to_name(events[i], name);
        if (retval == PAPI_OK) {
            event_names[i] = std::string(name);
        } else {
            event_names[i] = "Event_" + std::to_string(events[i]);
        }

        /* Track successfully added events */
        event_indices.push_back(i);
        num_added++;
    }

    if (num_added == 0) {
        std::cerr << "Error: No events could be added to the eventset" << std::endl;
        error_exit(1, __FILE__, __LINE__);
    }

    std::cout << "Successfully added " << num_added << " event(s) to eventset" << std::endl;

    /* Start counting */
    retval = PAPI_start(EventSet);
    if (retval != PAPI_OK) {
        std::cerr << "PAPI start error: " << PAPI_strerror(retval) << std::endl;
        error_exit(retval, __FILE__, __LINE__);
    }

    std::cout << "\nStarting measurement..." << std::endl;

    /* Perform some computation */
    volatile double sum = 0.0;
    const int iterations = 1000000;

    for (int i = 0; i < iterations; i++) {
        sum += i * 1.5;
    }

    std::cout << "Computation completed (sum = " << std::fixed
              << std::setprecision(6) << sum << ")" << std::endl;

    /* Stop counting and read values */
    retval = PAPI_stop(EventSet, values.data());
    if (retval != PAPI_OK) {
        std::cerr << "PAPI stop error: " << PAPI_strerror(retval) << std::endl;
        error_exit(retval, __FILE__, __LINE__);
    }

    /* Display results */
    std::cout << "\n=== Performance Counter Results ===" << std::endl;
    for (int i = 0; i < num_added; i++) {
        int idx = event_indices[i];
        std::cout << std::left << std::setw(30) << event_names[idx]
                  << ": " << values[i] << std::endl;
    }

    /* Calculate derived metrics */
    /* Find which events were successfully added and their positions */
    int cycles_pos = -1, ins_pos = -1, cache_miss_pos = -1;
    for (int i = 0; i < num_added; i++) {
        int idx = event_indices[i];
        if (events[idx] == PAPI_TOT_CYC) cycles_pos = i;
        if (events[idx] == PAPI_TOT_INS) ins_pos = i;
        if (events[idx] == PAPI_L1_DCM) cache_miss_pos = i;
    }

    if (cycles_pos >= 0 && ins_pos >= 0 && values[ins_pos] > 0) {
        double cpi = static_cast<double>(values[cycles_pos]) / static_cast<double>(values[ins_pos]);
        std::cout << "\nCycles per Instruction (CPI): "
                  << std::fixed << std::setprecision(4) << cpi << std::endl;
    }

    if (cache_miss_pos >= 0 && values[cache_miss_pos] > 0) {
        if (ins_pos >= 0 && values[ins_pos] > 0) {
            /* Calculate miss rate per instruction */
            double miss_rate = static_cast<double>(values[cache_miss_pos])
                             / static_cast<double>(values[ins_pos]) * 100.0;
            std::cout << "L1 Data Cache Miss Rate: "
                      << std::fixed << std::setprecision(4) << miss_rate
                      << "% (misses per instruction)" << std::endl;
        } else {
            /* Just show the raw number if we don't have instruction count */
            std::cout << "\nL1 Data Cache Misses: "
                      << values[cache_miss_pos] << std::endl;
        }
    }

    /* Cleanup */
    retval = PAPI_cleanup_eventset(EventSet);
    if (retval != PAPI_OK) {
        std::cerr << "PAPI cleanup eventset error: " << PAPI_strerror(retval) << std::endl;
    }

    retval = PAPI_destroy_eventset(&EventSet);
    if (retval != PAPI_OK) {
        std::cerr << "PAPI destroy eventset error: " << PAPI_strerror(retval) << std::endl;
    }

    PAPI_shutdown();

    std::cout << "\nPAPI test completed successfully" << std::endl;
    return 0;
}

Save this code to a file (e.g., example.cpp) and compile it:

With g++:

module load papi/7
g++ $CXXFLAGS -o example example.cpp $LDFLAGS -lpapi

With clang++:

module load llvm
module load papi/7
clang++ -stdlib=libc++ $CXXFLAGS -o example example.cpp $LDFLAGS -lpapi

Or explicitly specify paths:

module load papi/7
g++ -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.cpp -lpapi

# Clang++ (requires module load llvm first)
module load papi/7
module load llvm
clang++ -stdlib=libc++ -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.cpp -lpapi

Common performance events

CPU metrics

  • PAPI_TOT_CYC - Total CPU cycles
  • PAPI_TOT_INS - Total instructions executed
  • PAPI_REF_CYC - Reference cycles

Cache metrics

  • PAPI_L1_DCM - L1 data cache misses
  • PAPI_L1_DCA - L1 data cache accesses
  • PAPI_L2_DCM - L2 data cache misses
  • PAPI_L2_DCA - L2 data cache accesses
  • PAPI_L3_TCM - L3 total cache misses

Branch metrics

  • PAPI_BR_CN - Conditional branches
  • PAPI_BR_MSP - Branch mispredictions
  • PAPI_BR_PRC - Conditional branches correctly predicted

Floating point

  • PAPI_FP_OPS - Floating point operations
  • PAPI_SP_OPS - Single precision operations
  • PAPI_DP_OPS - Double precision operations

Memory

  • PAPI_LD_INS - Load instructions
  • PAPI_SR_INS - Store instructions
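
Raw counts from the events listed above are usually combined into ratios before they are interpreted. The following is a minimal sketch, not part of PAPI itself: the helper names counter_ratio, mflops_rate and print_derived_metrics are hypothetical, and the inputs are assumed to be values already returned by PAPI_stop().

```c
#include <stdio.h>

/* Hypothetical helper: divide two counter values, returning 0.0 when the
 * denominator is not positive (for example, an event that was not counted). */
double counter_ratio(long long numerator, long long denominator)
{
    if (denominator <= 0) {
        return 0.0;
    }
    return (double)numerator / (double)denominator;
}

/* Hypothetical helper: millions of floating-point operations per second,
 * from a PAPI_FP_OPS-style count and an elapsed time in seconds. */
double mflops_rate(long long fp_ops, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return (double)fp_ops / seconds / 1.0e6;
}

/* Print three common derived metrics from six raw counter values. */
void print_derived_metrics(long long cycles, long long instructions,
                           long long l1_misses, long long l1_accesses,
                           long long br_mispredicted, long long br_conditional)
{
    /* PAPI_TOT_INS / PAPI_TOT_CYC */
    printf("Instructions per cycle:     %.3f\n",
           counter_ratio(instructions, cycles));
    /* PAPI_L1_DCM / PAPI_L1_DCA */
    printf("L1 data cache miss ratio:   %.3f\n",
           counter_ratio(l1_misses, l1_accesses));
    /* PAPI_BR_MSP / PAPI_BR_CN */
    printf("Branch misprediction ratio: %.3f\n",
           counter_ratio(br_mispredicted, br_conditional));
}
```

Both events in a ratio should come from the same measured region, ideally from the same event set, so that the numerator and denominator cover the same code.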

Discovering available events

List native events

papi_native_avail

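Check specific event

papi_avail -e PAPI_TOT_CYC

To list the preset events that can be counted together with a given event:

papi_event_chooser PRESET PAPI_TOT_CYC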

Error handling

Always check return values from PAPI functions:

int retval = PAPI_function(...);
if (retval != PAPI_OK) {
    fprintf(stderr, "Error: %s\n", PAPI_strerror(retval));
    // Handle error appropriately
}

Common error codes

  • PAPI_OK - Success
  • PAPI_EINVAL - Invalid argument
  • PAPI_ENOMEM - Out of memory
  • PAPI_ECNFLCT - Event conflict
  • PAPI_ENOEVNT - Event not available

Advanced usage

Multiple events

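int events[] = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
long long values[3];

// Add all events, checking each return value
for (int i = 0; i < 3; i++) {
    retval = PAPI_add_event(EventSet, events[i]);
    if (retval != PAPI_OK) {
        fprintf(stderr, "Error adding event: %s\n", PAPI_strerror(retval));
        exit(1);
    }
}

PAPI_start(EventSet);          // check the return value as in step 4
// ... code ...
PAPI_stop(EventSet, values);   // check the return value as in step 6

// Values are stored in the order the events were added:
// values[0] = cycles, values[1] = instructions, values[2] = cache misses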

High-level API

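PAPI also provides simplified rate calls for common metrics. The PAPI_flops() call from older releases is no longer available in PAPI 7; its replacement, PAPI_flops_rate(), takes the floating-point event to count as its first argument:

float real_time, proc_time, mflops;
long long flpops;

// First call starts the counters
PAPI_flops_rate(PAPI_FP_OPS, &real_time, &proc_time, &flpops, &mflops);
// ... code to measure ...
// A later call reports the values accumulated since the first call
PAPI_flops_rate(PAPI_FP_OPS, &real_time, &proc_time, &flpops, &mflops);
printf("MFLOPS: %f\n", mflops);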

Thread safety

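PAPI supports multithreaded programs once thread support has been enabled: call PAPI_thread_init() in the main thread after PAPI_library_init() and before creating worker threads. Each thread should then:

  1. Create its own EventSet
  2. Start and stop its own counters
  3. Clean up its own resources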

Performance considerations

  1. Overhead: PAPI has minimal overhead, but frequent start/stop operations can add up
  2. Counter Limits: Hardware has a limited number of simultaneous counters (typically 2-8)
  3. Multiplexing: Use PAPI multiplexing to measure more events than available counters
  4. Sampling: For long-running codes, consider sampling instead of continuous measurement

Troubleshooting

“Event not available”

Some events may not be available on all platforms:

  • Check with papi_avail
  • Use PAPI_query_event() to check availability before adding
  • Have fallback events ready
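
One way to keep fallback events ready is to list candidates in order of preference and keep only those that the platform reports as available. The following is a minimal sketch: select_available_events is a hypothetical helper, and query_status[i] is assumed to hold the return value of PAPI_query_event(candidates[i]), where 0 is the value of PAPI_OK.

```c
#include <assert.h>
#include <stddef.h>

/* Copy up to max_selected candidates whose query status is 0 (PAPI_OK)
 * into selected[], preserving the preference order of candidates[].
 * Returns the number of event codes written to selected[]. */
size_t select_available_events(const int *candidates, const int *query_status,
                               size_t count, int *selected, size_t max_selected)
{
    size_t n = 0;

    assert(count == 0 || (candidates != NULL && query_status != NULL));
    assert(max_selected == 0 || selected != NULL);

    for (size_t i = 0; i < count && n < max_selected; i++) {
        if (query_status[i] == 0) {   /* 0 == PAPI_OK: event can be counted */
            selected[n++] = candidates[i];
        }
    }
    return n;
}
```

For example, with candidates {PAPI_L1_DCM, PAPI_L2_DCM, PAPI_L3_TCM} and max_selected set to 1, the helper returns the first cache-miss event that the platform supports, which can then be passed to PAPI_add_event().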

Permission issues

Some counters require special permissions:

  • May need to run as root
  • Or adjust /proc/sys/kernel/perf_event_paranoid to allow user access
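
The current setting can also be checked from within a program before PAPI is initialized. The following is a minimal sketch: read_paranoid_level is a hypothetical helper, and the path is passed in by the caller (on Linux it would be /proc/sys/kernel/perf_event_paranoid).

```c
#include <stdio.h>

/* Read the integer stored in a perf_event_paranoid-style file.
 * Returns 0 on success and stores the value in *level;
 * returns -1 if the file cannot be opened or does not start with an integer. */
int read_paranoid_level(const char *path, int *level)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }

    int parsed = (fscanf(fp, "%d", level) == 1);
    fclose(fp);
    return parsed ? 0 : -1;
}
```

Lower values are more permissive. If counters cannot be started and the reported level is high, that is a hint to ask the system administrators whether the setting can be relaxed rather than to run as root.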

Linking issues

If you get undefined references:

  • Ensure -lpapi is at the end of the link command
  • Check that LD_LIBRARY_PATH includes the PAPI library directory
  • Verify the module is loaded: module list

Additional resources

  • PAPI User’s Guide: man papi
  • PAPI API Reference: man PAPI_start
  • Example programs in PAPI distribution
  • Online PAPI documentation

Example makefile

Using gcc

CC = gcc
CFLAGS = -O3 -Wall
LDFLAGS = -lpapi

example: example.c
    $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

clean:
    rm -f example

Using clang

Note

Requires module load llvm before running make.

CC = clang
CFLAGS = -O3 -Wall
LDFLAGS = -lpapi

example: example.c
    $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

clean:
    rm -f example

Compiler-Agnostic Makefile

# Default to gcc (more widely available), but can be overridden
# For clang: make CC=clang (requires module load llvm)
# make predefines CC as "cc", so "CC ?= gcc" would have no effect
ifeq ($(origin CC),default)
CC = gcc
endif
CFLAGS = -O3 -Wall
LDFLAGS = -lpapi

example: example.c
    $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

clean:
    rm -f example

Usage:

# GCC (usually available by default)
module load papi/7
make CC=gcc       # Explicitly use gcc

# Clang (requires module load llvm first)
module load papi/7
module load llvm
make CC=clang     # Use clang

C++ Makefile Examples

Using g++

CXX = g++
CXXFLAGS = -O3 -Wall
LDFLAGS = -lpapi

example: example.cpp
    $(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

clean:
    rm -f example

Using clang++

Note

Requires module load llvm before running make.

CXX = clang++
CXXFLAGS = -O3 -Wall -stdlib=libc++
LDFLAGS = -lpapi

example: example.cpp
    $(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

clean:
    rm -f example

Compiler-Agnostic Makefile for C++

# Default to g++ (more widely available), but can be overridden
# For clang++: make CXX=clang++ (requires module load llvm)
CXX ?= g++
CXXFLAGS = -O3 -Wall
# Add -stdlib=libc++ if using clang++
ifeq ($(CXX),clang++)
    CXXFLAGS += -stdlib=libc++
endif
LDFLAGS = -lpapi

example: example.cpp
    $(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

clean:
    rm -f example

Usage:

# G++ (usually available by default)
module load papi/7
make CXX=g++      # Explicitly use g++

# Clang++ (requires module load llvm first)
module load papi/7
module load llvm
make CXX=clang++  # Use clang++