PAPI¶
- About PAPI
- Prerequisites
- Compiling PAPI applications
- Basic PAPI usage pattern
- Complete code examples
- C++ code example
- Common performance events
- Discovering available events
- Error handling
- Advanced usage
- Performance considerations
- Troubleshooting
- Additional resources
- Example makefile
- C++ Makefile Examples
About PAPI¶
PAPI (Performance Application Programming Interface) provides a portable interface to the hardware performance counters built into modern processors. PAPI is an open-source project; see the PAPI Project website.
Build recipes and compilation instructions for this installation are available at:
https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/papi
What PAPI can do for you¶
PAPI enables you to:
- Measure Hardware Performance Counters
  - Access CPU cycles, instruction counts, and execution metrics
  - Monitor cache behaviour (hits, misses, accesses)
  - Track branch prediction accuracy
  - Measure floating-point operation rates
  - Analyse memory bandwidth and access patterns
- Profile Application Performance
  - Identify performance bottlenecks at the hardware level
  - Understand why code runs slowly (cache misses, branch mispredictions, etc.)
  - Compare performance across different algorithms or implementations
  - Measure the impact of compiler optimizations
- Optimize Code Based on Data
  - Use hardware metrics to guide optimization efforts
  - Detect cache-unfriendly memory access patterns
  - Identify branch prediction issues
  - Measure floating-point efficiency
- Cross-Platform Performance Analysis
  - Write performance measurement code once, run on multiple architectures
  - Compare performance across different hardware platforms
  - Conduct reproducible performance studies
Basic Workflow¶
The typical workflow for using PAPI is straightforward:
- Initialize PAPI - Set up the library
- Create event set and add events - Choose what to measure
- Start counters - Begin measurement
- Execute code - Run the code you want to profile
- Stop counters and read values - Get the performance data
- Cleanup - Release resources
With these steps, you can measure and analyse the performance characteristics of your applications at the hardware level, providing insights that are impossible to obtain from simple timing measurements alone.
Prerequisites¶
Load required modules:
# Load LLVM module (required for clang and clang++)
module load llvm
# Load PAPI module
module load papi/7/7.2.0
Verify PAPI is available:
papi_version
papi_avail
Verify compiler availability:
# Check GCC (usually available by default)
gcc --version
g++ --version
# Check clang and clang++ versions and availability (requires llvm module to be loaded)
clang --version
clang++ --version
Compiling PAPI applications¶
Basic compilation¶
With gcc:
module load papi/7
gcc -o my_program my_program.c -lpapi
With clang:
module load papi/7
module load llvm
clang -o my_program my_program.c -lpapi
With optimization¶
For production code, use optimization flags:
With gcc:
module load papi/7
gcc -O3 -o my_program my_program.c -lpapi
With clang:
module load papi/7
module load llvm
clang -O3 -o my_program my_program.c -lpapi
Using module environment variables¶
The PAPI module sets up compiler flags automatically:
With gcc:
module load papi/7
gcc $CFLAGS -o my_program my_program.c $LDFLAGS -lpapi
With clang:
module load papi/7
module load llvm
clang $CFLAGS -o my_program my_program.c $LDFLAGS -lpapi
Or explicitly specify paths:
# GCC (usually available by default)
gcc -o my_program my_program.c -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -lpapi
# Clang (requires module load llvm)
module load llvm
clang -o my_program my_program.c -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -lpapi
Basic PAPI usage pattern¶
1. Initialize PAPI¶
#include <papi.h>
int retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
fprintf(stderr, "PAPI init error: %s\n", PAPI_strerror(retval));
exit(1);
}
2. Create an Event Set¶
int EventSet = PAPI_NULL;
retval = PAPI_create_eventset(&EventSet);
if (retval != PAPI_OK) {
fprintf(stderr, "Error creating eventset: %s\n", PAPI_strerror(retval));
exit(1);
}
3. Add Events to Measure¶
Common events:
- PAPI_TOT_CYC - Total CPU cycles
- PAPI_TOT_INS - Total instructions
- PAPI_L1_DCM - L1 data cache misses
- PAPI_L2_DCM - L2 data cache misses
- PAPI_BR_MSP - Branch mispredictions
- PAPI_FP_OPS - Floating point operations
retval = PAPI_add_event(EventSet, PAPI_TOT_CYC);
if (retval != PAPI_OK) {
fprintf(stderr, "Error adding event: %s\n", PAPI_strerror(retval));
exit(1);
}
4. Start Measurement¶
retval = PAPI_start(EventSet);
if (retval != PAPI_OK) {
fprintf(stderr, "Error starting counters: %s\n", PAPI_strerror(retval));
exit(1);
}
5. Execute Code to Measure¶
// Your code here
for (int i = 0; i < iterations; i++) {
// computation
}
6. Stop and Read Values¶
long long values[1];
retval = PAPI_stop(EventSet, values);
if (retval != PAPI_OK) {
fprintf(stderr, "Error stopping counters: %s\n", PAPI_strerror(retval));
exit(1);
}
printf("Total cycles: %lld\n", values[0]);
7. Cleanup¶
PAPI_cleanup_eventset(EventSet);
PAPI_destroy_eventset(&EventSet);
PAPI_shutdown();
Complete code examples¶
C code example¶
Here is a complete working example in C that demonstrates all the steps together:
/* Created by Veselin Kolev <v.kolev@discoverer.bg> on 31 December 2025
*
* Complete PAPI Example
*
* This program demonstrates basic usage of PAPI to measure hardware
* performance counters. It performs a simple computation and measures
* CPU cycles, instructions, and cache misses.
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>
#define NUM_EVENTS 3
#define ERROR_RETURN(retval) { \
fprintf(stderr, "Error %d %s:line %d: \n", retval, __FILE__, __LINE__); \
exit(retval); \
}
int main(int argc, char **argv)
{
int retval, i;
int EventSet = PAPI_NULL;
long long values[NUM_EVENTS];
int events[NUM_EVENTS] = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
char event_names[NUM_EVENTS][PAPI_MAX_STR_LEN];
int event_indices[NUM_EVENTS]; /* Track which events were successfully added */
int num_added = 0;
/* Initialize PAPI library */
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
fprintf(stderr, "PAPI library init error: %s\n", PAPI_strerror(retval));
ERROR_RETURN(retval);
}
printf("PAPI initialized successfully\n");
printf("PAPI Version: %d.%d.%d\n",
PAPI_VERSION_MAJOR(PAPI_VERSION),
PAPI_VERSION_MINOR(PAPI_VERSION),
PAPI_VERSION_REVISION(PAPI_VERSION));
/* Create EventSet */
retval = PAPI_create_eventset(&EventSet);
if (retval != PAPI_OK) {
fprintf(stderr, "PAPI create eventset error: %s\n", PAPI_strerror(retval));
ERROR_RETURN(retval);
}
/* Add events to EventSet */
for (i = 0; i < NUM_EVENTS; i++) {
/* Check if event is available before adding */
retval = PAPI_query_event(events[i]);
if (retval != PAPI_OK) {
retval = PAPI_event_code_to_name(events[i], event_names[i]);
if (retval == PAPI_OK) {
fprintf(stderr, "Warning: Event %s is not available on this platform\n", event_names[i]);
} else {
fprintf(stderr, "Warning: Event %d is not available on this platform\n", events[i]);
}
continue;
}
retval = PAPI_add_event(EventSet, events[i]);
if (retval != PAPI_OK) {
fprintf(stderr, "PAPI add event error: %s\n", PAPI_strerror(retval));
retval = PAPI_event_code_to_name(events[i], event_names[i]);
if (retval == PAPI_OK) {
fprintf(stderr, "Event %s may not be available on this platform\n", event_names[i]);
} else {
fprintf(stderr, "Event %d may not be available on this platform\n", events[i]);
}
continue;
}
/* Get event name for display */
retval = PAPI_event_code_to_name(events[i], event_names[i]);
if (retval != PAPI_OK) {
sprintf(event_names[i], "Event_%d", events[i]);
}
/* Track successfully added events */
event_indices[num_added] = i;
num_added++;
}
if (num_added == 0) {
fprintf(stderr, "Error: No events could be added to the eventset\n");
ERROR_RETURN(1);
}
printf("Successfully added %d event(s) to eventset\n", num_added);
printf("\nStarting measurement...\n");
/* Start counting; I/O is kept outside the measured region so it does not distort the counts */
retval = PAPI_start(EventSet);
if (retval != PAPI_OK) {
fprintf(stderr, "PAPI start error: %s\n", PAPI_strerror(retval));
ERROR_RETURN(retval);
}
/* Perform some computation */
volatile double sum = 0.0;
int iterations = 1000000;
for (i = 0; i < iterations; i++) {
sum += i * 1.5;
}
/* Stop counting and read values */
retval = PAPI_stop(EventSet, values);
if (retval != PAPI_OK) {
fprintf(stderr, "PAPI stop error: %s\n", PAPI_strerror(retval));
ERROR_RETURN(retval);
}
printf("Computation completed (sum = %f)\n", sum);
/* Display results */
printf("\n=== Performance Counter Results ===\n");
for (i = 0; i < num_added; i++) {
int idx = event_indices[i];
printf("%-30s: %lld\n", event_names[idx], values[i]);
}
/* Calculate derived metrics */
/* Find which events were successfully added and their positions */
int cycles_pos = -1, ins_pos = -1, cache_miss_pos = -1;
for (i = 0; i < num_added; i++) {
int idx = event_indices[i];
if (events[idx] == PAPI_TOT_CYC) cycles_pos = i;
if (events[idx] == PAPI_TOT_INS) ins_pos = i;
if (events[idx] == PAPI_L1_DCM) cache_miss_pos = i;
}
if (cycles_pos >= 0 && ins_pos >= 0 && values[ins_pos] > 0) {
double cpi = (double)values[cycles_pos] / (double)values[ins_pos];
printf("\nCycles per Instruction (CPI): %.4f\n", cpi);
}
if (cache_miss_pos >= 0 && values[cache_miss_pos] > 0) {
if (ins_pos >= 0 && values[ins_pos] > 0) {
/* Calculate miss rate per instruction */
double miss_rate = (double)values[cache_miss_pos] / (double)values[ins_pos] * 100.0;
printf("L1 Data Cache Miss Rate: %.4f%% (misses per instruction)\n", miss_rate);
} else {
/* Just show the raw number if we don't have instruction count */
printf("\nL1 Data Cache Misses: %lld\n", values[cache_miss_pos]);
}
}
/* Cleanup */
retval = PAPI_cleanup_eventset(EventSet);
if (retval != PAPI_OK) {
fprintf(stderr, "PAPI cleanup eventset error: %s\n", PAPI_strerror(retval));
}
retval = PAPI_destroy_eventset(&EventSet);
if (retval != PAPI_OK) {
fprintf(stderr, "PAPI destroy eventset error: %s\n", PAPI_strerror(retval));
}
PAPI_shutdown();
printf("\nPAPI test completed successfully\n");
return 0;
}
Save this code to a file (e.g., example.c) and compile it:
With gcc:
module load papi/7
gcc $CFLAGS -o example example.c $LDFLAGS -lpapi
With clang:
module load llvm
module load papi/7
clang $CFLAGS -o example example.c $LDFLAGS -lpapi
Or explicitly specify paths:
module load papi/7
gcc -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.c -lpapi
# Clang (requires module load llvm first)
module load papi/7
module load llvm
clang -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.c -lpapi
C++ code example¶
Here is the same example written in C++ using modern C++ features:
/* Created by Veselin Kolev <v.kolev@discoverer.bg> on 31 December 2025
*
* Complete PAPI Example (C++)
*
* This program demonstrates basic usage of PAPI to measure hardware
* performance counters. It performs a simple computation and measures
* CPU cycles, instructions, and cache misses.
*
*/
#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <cstdio>
#include <papi.h>
#include <cstdlib>
const int NUM_EVENTS = 3;
void error_exit(int retval, const char* file, int line) {
std::cerr << "Error " << retval << " " << file << ":line " << line << std::endl;
std::exit(retval);
}
int main(int argc, char **argv)
{
int retval;
int EventSet = PAPI_NULL;
std::vector<long long> values(NUM_EVENTS);
std::vector<int> events = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
std::vector<std::string> event_names(NUM_EVENTS);
std::vector<int> event_indices; // Track which events were successfully added
int num_added = 0;
/* Initialize PAPI library */
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT) {
std::cerr << "PAPI library init error: " << PAPI_strerror(retval) << std::endl;
error_exit(retval, __FILE__, __LINE__);
}
std::cout << "PAPI initialized successfully" << std::endl;
std::cout << "PAPI Version: "
<< PAPI_VERSION_MAJOR(PAPI_VERSION) << "."
<< PAPI_VERSION_MINOR(PAPI_VERSION) << "."
<< PAPI_VERSION_REVISION(PAPI_VERSION) << std::endl;
/* Create EventSet */
retval = PAPI_create_eventset(&EventSet);
if (retval != PAPI_OK) {
std::cerr << "PAPI create eventset error: " << PAPI_strerror(retval) << std::endl;
error_exit(retval, __FILE__, __LINE__);
}
/* Add events to EventSet */
for (size_t i = 0; i < events.size(); i++) {
/* Check if event is available before adding */
retval = PAPI_query_event(events[i]);
if (retval != PAPI_OK) {
char name[PAPI_MAX_STR_LEN];
retval = PAPI_event_code_to_name(events[i], name);
if (retval == PAPI_OK) {
std::cerr << "Warning: Event " << name
<< " is not available on this platform" << std::endl;
} else {
std::cerr << "Warning: Event " << events[i]
<< " is not available on this platform" << std::endl;
}
continue;
}
retval = PAPI_add_event(EventSet, events[i]);
if (retval != PAPI_OK) {
std::cerr << "PAPI add event error: " << PAPI_strerror(retval) << std::endl;
char name[PAPI_MAX_STR_LEN];
retval = PAPI_event_code_to_name(events[i], name);
if (retval == PAPI_OK) {
std::cerr << "Event " << name
<< " may not be available on this platform" << std::endl;
} else {
std::cerr << "Event " << events[i]
<< " may not be available on this platform" << std::endl;
}
continue;
}
/* Get event name for display */
char name[PAPI_MAX_STR_LEN];
retval = PAPI_event_code_to_name(events[i], name);
if (retval == PAPI_OK) {
event_names[i] = std::string(name);
} else {
event_names[i] = "Event_" + std::to_string(events[i]);
}
/* Track successfully added events */
event_indices.push_back(i);
num_added++;
}
if (num_added == 0) {
std::cerr << "Error: No events could be added to the eventset" << std::endl;
error_exit(1, __FILE__, __LINE__);
}
std::cout << "Successfully added " << num_added << " event(s) to eventset" << std::endl;
std::cout << "\nStarting measurement..." << std::endl;
/* Start counting; I/O is kept outside the measured region so it does not distort the counts */
retval = PAPI_start(EventSet);
if (retval != PAPI_OK) {
std::cerr << "PAPI start error: " << PAPI_strerror(retval) << std::endl;
error_exit(retval, __FILE__, __LINE__);
}
/* Perform some computation */
volatile double sum = 0.0;
const int iterations = 1000000;
for (int i = 0; i < iterations; i++) {
sum += i * 1.5;
}
/* Stop counting and read values */
retval = PAPI_stop(EventSet, values.data());
if (retval != PAPI_OK) {
std::cerr << "PAPI stop error: " << PAPI_strerror(retval) << std::endl;
error_exit(retval, __FILE__, __LINE__);
}
std::cout << "Computation completed (sum = " << std::fixed
<< std::setprecision(6) << sum << ")" << std::endl;
/* Display results */
std::cout << "\n=== Performance Counter Results ===" << std::endl;
for (int i = 0; i < num_added; i++) {
int idx = event_indices[i];
std::cout << std::left << std::setw(30) << event_names[idx]
<< ": " << values[i] << std::endl;
}
/* Calculate derived metrics */
/* Find which events were successfully added and their positions */
int cycles_pos = -1, ins_pos = -1, cache_miss_pos = -1;
for (int i = 0; i < num_added; i++) {
int idx = event_indices[i];
if (events[idx] == PAPI_TOT_CYC) cycles_pos = i;
if (events[idx] == PAPI_TOT_INS) ins_pos = i;
if (events[idx] == PAPI_L1_DCM) cache_miss_pos = i;
}
if (cycles_pos >= 0 && ins_pos >= 0 && values[ins_pos] > 0) {
double cpi = static_cast<double>(values[cycles_pos]) / static_cast<double>(values[ins_pos]);
std::cout << "\nCycles per Instruction (CPI): "
<< std::fixed << std::setprecision(4) << cpi << std::endl;
}
if (cache_miss_pos >= 0 && values[cache_miss_pos] > 0) {
if (ins_pos >= 0 && values[ins_pos] > 0) {
/* Calculate miss rate per instruction */
double miss_rate = static_cast<double>(values[cache_miss_pos])
/ static_cast<double>(values[ins_pos]) * 100.0;
std::cout << "L1 Data Cache Miss Rate: "
<< std::fixed << std::setprecision(4) << miss_rate
<< "% (misses per instruction)" << std::endl;
} else {
/* Just show the raw number if we don't have instruction count */
std::cout << "\nL1 Data Cache Misses: "
<< values[cache_miss_pos] << std::endl;
}
}
/* Cleanup */
retval = PAPI_cleanup_eventset(EventSet);
if (retval != PAPI_OK) {
std::cerr << "PAPI cleanup eventset error: " << PAPI_strerror(retval) << std::endl;
}
retval = PAPI_destroy_eventset(&EventSet);
if (retval != PAPI_OK) {
std::cerr << "PAPI destroy eventset error: " << PAPI_strerror(retval) << std::endl;
}
PAPI_shutdown();
std::cout << "\nPAPI test completed successfully" << std::endl;
return 0;
}
Save this code to a file (e.g., example.cpp) and compile it:
With g++:
module load papi/7
g++ $CXXFLAGS -o example example.cpp $LDFLAGS -lpapi
With clang++:
module load llvm
module load papi/7
clang++ -stdlib=libc++ $CXXFLAGS -o example example.cpp $LDFLAGS -lpapi
Or explicitly specify paths:
module load papi/7
g++ -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.cpp -lpapi
# Clang++ (requires module load llvm first)
module load papi/7
module load llvm
clang++ -stdlib=libc++ -I$PAPI_ROOT/include -L$PAPI_ROOT/lib -o example example.cpp -lpapi
Common performance events¶
CPU metrics¶
- PAPI_TOT_CYC - Total CPU cycles
- PAPI_TOT_INS - Total instructions executed
- PAPI_REF_CYC - Reference cycles
Cache metrics¶
- PAPI_L1_DCM - L1 data cache misses
- PAPI_L1_DCA - L1 data cache accesses
- PAPI_L2_DCM - L2 data cache misses
- PAPI_L2_DCA - L2 data cache accesses
- PAPI_L3_TCM - L3 total cache misses
Branch metrics¶
- PAPI_BR_CN - Conditional branches
- PAPI_BR_MSP - Branch mispredictions
- PAPI_BR_PRC - Conditional branches correctly predicted
Floating point¶
- PAPI_FP_OPS - Floating point operations
- PAPI_SP_OPS - Single precision operations
- PAPI_DP_OPS - Double precision operations
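Raw counts from the events listed in this section are usually easier to interpret as ratios. The following is a minimal sketch of such derived metrics, not part of PAPI itself: the function names are hypothetical, and the arguments are assumed to be values already read with PAPI_stop(). A result of -1.0 is used here as an illustrative "not available" marker.

```c
/* Divide two counter readings. Returns -1.0 when the denominator is not
 * positive or the numerator is negative (treated as "not measured"). */
static double counter_ratio(long long numerator, long long denominator)
{
    if (numerator < 0 || denominator <= 0) {
        return -1.0;
    }
    return (double)numerator / (double)denominator;
}

/* Instructions per cycle: PAPI_TOT_INS / PAPI_TOT_CYC. */
double instructions_per_cycle(long long tot_ins, long long tot_cyc)
{
    return counter_ratio(tot_ins, tot_cyc);
}

/* L1 data cache miss ratio: PAPI_L1_DCM / PAPI_L1_DCA. */
double l1d_miss_ratio(long long l1_dcm, long long l1_dca)
{
    return counter_ratio(l1_dcm, l1_dca);
}

/* Conditional branch misprediction ratio: PAPI_BR_MSP / PAPI_BR_CN. */
double branch_mispredict_ratio(long long br_msp, long long br_cn)
{
    return counter_ratio(br_msp, br_cn);
}
```

For example, `l1d_miss_ratio(values[2], values[3])` would give the fraction of L1 data accesses that missed, assuming PAPI_L1_DCM and PAPI_L1_DCA were added to the event set in those positions and both are available on the platform.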
Discovering available events¶
List all available events¶
papi_avail
List native events¶
papi_native_avail
Check specific event¶
papi_event_chooser PAPI_TOT_CYC
Error handling¶
Always check return values from PAPI functions:
int retval = PAPI_function(...);
if (retval != PAPI_OK) {
fprintf(stderr, "Error: %s\n", PAPI_strerror(retval));
// Handle error appropriately
}
Common error codes
- PAPI_OK - Success
- PAPI_EINVAL - Invalid argument
- PAPI_ENOMEM - Out of memory
- PAPI_ECNFLCT - Event conflict
- PAPI_ENOEVNT - Event not available
Advanced usage¶
Multiple events¶
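int events[] = {PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM};
long long values[3];
// Add all events, checking each return value
for (int i = 0; i < 3; i++) {
    retval = PAPI_add_event(EventSet, events[i]);
    if (retval != PAPI_OK) {
        fprintf(stderr, "Error adding event %d: %s\n", i, PAPI_strerror(retval));
        exit(1);
    }
}
retval = PAPI_start(EventSet);
// ... check retval, then run the code to measure ...
retval = PAPI_stop(EventSet, values);
// values[] follows the order in which the events were added:
// values[0] = cycles, values[1] = instructions, values[2] = cache misses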
High-level API¶
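PAPI also provides rate functions for common metrics. In PAPI 6 and later, PAPI_flops_rate() takes over from the older PAPI_flops() call; the first call starts the counters and each later call reports values since the previous one:
float real_time, proc_time, mflops;
long long flpops;
// First call: set up and start counting (PAPI_SP_OPS or PAPI_DP_OPS may be used instead)
retval = PAPI_flops_rate(PAPI_FP_OPS, &real_time, &proc_time, &flpops, &mflops);
// ... code to measure ...
retval = PAPI_flops_rate(PAPI_FP_OPS, &real_time, &proc_time, &flpops, &mflops);
if (retval == PAPI_OK) printf("MFLOPS: %f\n", mflops);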
Thread safety¶
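PAPI supports multithreaded programs, but thread support must be enabled by calling PAPI_thread_init() once after PAPI_library_init(). Each thread should then: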
- Create its own EventSet
- Initialize counters independently
- Clean up its own resources
Performance considerations¶
- Overhead: PAPI has minimal overhead, but frequent start/stop operations can add up
- Counter Limits: Hardware has a limited number of simultaneous counters (typically 2-8)
- Multiplexing: Use PAPI multiplexing to measure more events than available counters
- Sampling: For long-running codes, consider sampling instead of continuous measurement
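When multiplexing is not wanted, one alternative is to split a long event list into groups that fit the counter limit and measure each group in a separate run. The sketch below shows only the index arithmetic for that approach. The function names are hypothetical, and the counter limit is assumed to come from elsewhere (for instance from PAPI_num_hwctrs()). Events within a group can still conflict with each other on real hardware, so PAPI_add_event() return values need checking regardless.

```c
#include <stddef.h>

/* Number of separate measurement runs needed to cover num_events events
 * when at most max_counters events can be counted at once.
 * Returns 0 if max_counters is 0 or there are no events. */
size_t runs_needed(size_t num_events, size_t max_counters)
{
    if (max_counters == 0) {
        return 0;
    }
    return (num_events + max_counters - 1) / max_counters;
}

/* Compute the half-open index range [*first, *last) of the events that
 * belong to the given run (0-based). Returns 0 on success, -1 if the
 * arguments are invalid or the run index is out of range. */
int batch_range(size_t num_events, size_t max_counters, size_t run,
                size_t *first, size_t *last)
{
    if (first == NULL || last == NULL) {
        return -1;
    }
    if (run >= runs_needed(num_events, max_counters)) {
        return -1;
    }
    *first = run * max_counters;
    *last = *first + max_counters;
    if (*last > num_events) {
        *last = num_events;   /* the final run may be shorter */
    }
    return 0;
}
```

With five events and two usable counters, `runs_needed(5, 2)` gives three runs, and the third run covers only the last event. Results from separate runs are only comparable when the measured code behaves the same each time.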
Troubleshooting¶
“Event not available”¶
Some events may not be available on all platforms:
- Check with papi_avail
- Use PAPI_query_event() to check availability before adding
- Have fallback events ready
Permission issues¶
Some counters require special permissions:
- May need to run as root
- Or adjust /proc/sys/kernel/perf_event_paranoid to allow user access
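Before changing anything, it can help to check the current setting from within a program. The following is a minimal sketch that reads the integer stored in that file. The function name is hypothetical and the path is passed in by the caller. How each level is interpreted depends on the kernel and distribution; in general, lower values are more permissive.

```c
#include <stdio.h>

/* Read the integer stored in a perf_event_paranoid-style file.
 * Returns 0 and stores the value in *level on success, or -1 if the
 * file cannot be opened or does not start with an integer. */
int read_paranoid_level(const char *path, int *level)
{
    FILE *fp;
    int parsed;

    if (path == NULL || level == NULL) {
        return -1;
    }
    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    parsed = (fscanf(fp, "%d", level) == 1);
    fclose(fp);
    return parsed ? 0 : -1;
}
```

A caller would typically pass "/proc/sys/kernel/perf_event_paranoid" as the path and print a hint about permissions when counters fail to start and the returned level is high.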
Linking issues¶
If you get undefined references:
- Ensure -lpapi is at the end of the link command
- Check that LD_LIBRARY_PATH includes the PAPI library directory
- Verify module is loaded: module list
Additional resources¶
- PAPI User's Guide: man papi
- PAPI API Reference: man PAPI_start
- Example programs in PAPI distribution
- Online PAPI documentation
Example makefile¶
Using gcc¶
CC = gcc
CFLAGS = -O3 -Wall
LDFLAGS = -lpapi
example: example.c
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
clean:
	rm -f example
Using clang¶
Note
Requires module load llvm before running make.
CC = clang
CFLAGS = -O3 -Wall
LDFLAGS = -lpapi
example: example.c
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
clean:
	rm -f example
Compiler-Agnostic Makefile¶
# Default to gcc (more widely available), but can be overridden
# For clang: make CC=clang (requires module load llvm)
CC ?= gcc
CFLAGS = -O3 -Wall
LDFLAGS = -lpapi
example: example.c
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
clean:
	rm -f example
Usage:
# GCC (usually available by default)
module load papi/7
make CC=gcc # Explicitly use gcc
# Clang (requires module load llvm first)
module load papi/7
module load llvm
make CC=clang # Use clang
C++ Makefile Examples¶
Using g++¶
CXX = g++
CXXFLAGS = -O3 -Wall
LDFLAGS = -lpapi
example: example.cpp
	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)
clean:
	rm -f example
Using clang++¶
Note
Requires module load llvm before running make.
CXX = clang++
CXXFLAGS = -O3 -Wall -stdlib=libc++
LDFLAGS = -lpapi
example: example.cpp
	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)
clean:
	rm -f example
Compiler-Agnostic Makefile for C++¶
# Default to g++ (more widely available), but can be overridden
# For clang++: make CXX=clang++ (requires module load llvm)
CXX ?= g++
CXXFLAGS = -O3 -Wall
# Add -stdlib=libc++ if using clang++
ifeq ($(CXX),clang++)
CXXFLAGS += -stdlib=libc++
endif
LDFLAGS = -lpapi
example: example.cpp
	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)
clean:
	rm -f example
Usage:
# G++ (usually available by default)
module load papi/7
make CXX=g++ # Explicitly use g++
# Clang++ (requires module load llvm first)
module load papi/7
module load llvm
make CXX=clang++ # Use clang++