C-blosc2
High-performance compression library optimized for binary data
Overview
C-Blosc2 is a high-performance compression library specifically optimized for binary data, such as numerical arrays, tensors, and scientific datasets. It provides a flexible framework of codecs and filters that allow developers to balance compression speed and ratio according to their specific use cases.
C-Blosc2 is the successor to the original C-Blosc library (C-Blosc1) and is backward compatible with both the C-Blosc1 API and its in-memory format, so C-Blosc2 can read buffers produced by C-Blosc1. The reverse does not hold: buffers generated with C-Blosc2 are not format-compatible with C-Blosc1. For full API compatibility with C-Blosc1, define the BLOSC1_COMPAT symbol before including blosc2.h.
The library supports multiple compression codecs, including BloscLZ, LZ4, LZ4HC, Zlib, Zstd, and Zlib-NG, and includes shuffle and bitshuffle pre-filters that improve compression ratios on numerical data. It has built-in support for multi-threaded compression and decompression and is designed to work efficiently with modern CPU cache architectures.
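To illustrate why the shuffle pre-filter helps on numerical data, here is a minimal, standalone sketch of a byte shuffle in C. It is not the C-Blosc2 implementation (the library ships hardware-accelerated versions); the function names `shuffle_bytes` and `unshuffle_bytes` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte shuffle: gather byte 0 of every element, then byte 1, and so on.
 * src and dst must not overlap and must each hold nelem * typesize bytes. */
void shuffle_bytes(const uint8_t *src, uint8_t *dst,
                   size_t nelem, size_t typesize)
{
    assert(typesize > 0);
    for (size_t i = 0; i < nelem; i++) {
        for (size_t b = 0; b < typesize; b++) {
            dst[b * nelem + i] = src[i * typesize + b];
        }
    }
}

/* Inverse transform: restores the original element-by-element layout. */
void unshuffle_bytes(const uint8_t *src, uint8_t *dst,
                     size_t nelem, size_t typesize)
{
    assert(typesize > 0);
    for (size_t i = 0; i < nelem; i++) {
        for (size_t b = 0; b < typesize; b++) {
            dst[i * typesize + b] = src[b * nelem + i];
        }
    }
}
```

For two 4-byte values stored as the bytes 01 00 00 00 02 00 00 00, the shuffled buffer is 01 02 00 00 00 00 00 00. The rarely changing high-order bytes end up next to each other, which gives the codec longer runs to compress.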
HDF5 integration:
C-Blosc2 acts as an HDF5 compression filter and introduces a second partitioning layer that allows for larger HDF5 chunks, ideally fitting into modern CPU caches, while Blosc2’s internal blocks serve as the minimum data units for read and decompress operations. This dual-layer partitioning approach bypasses the standard HDF5 pipeline during both writing and reading, resulting in substantial performance improvements for scientific computing workloads. The integration provides reduced I/O operation times, optimized data layout for CPU cache efficiency, flexible codec and compression level selection, and transparent integration with existing HDF5 applications through the HDF5 filter mechanism.
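The two partitioning layers can be pictured with a small index calculation. The sketch below assumes a one-dimensional dataset and hypothetical chunk and block sizes given in elements; `partition_pos` and `locate_element` are illustrative names, not part of the C-Blosc2 or HDF5 APIs, and real chunk and block sizes are chosen by the application and the library.

```c
#include <stddef.h>

/* Position of one element in a hypothetical chunk -> block layout. */
typedef struct {
    size_t chunk;   /* index of the chunk that holds the element */
    size_t block;   /* index of the block inside that chunk */
    size_t offset;  /* element offset inside that block */
} partition_pos;

/* Map a flat element index to its chunk, block, and in-block offset.
 * chunk_elems and block_elems must be non-zero. */
partition_pos locate_element(size_t index, size_t chunk_elems,
                             size_t block_elems)
{
    partition_pos pos;
    size_t in_chunk = index % chunk_elems;  /* position within the chunk */

    pos.chunk  = index / chunk_elems;
    pos.block  = in_chunk / block_elems;
    pos.offset = in_chunk % block_elems;
    return pos;
}
```

With 262144 elements per chunk and 8192 elements per block, element 1000000 falls in chunk 3, block 26, at offset 576. Because the block is the minimum unit for decompression, reading that element only requires decompressing one block rather than the whole chunk.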
Available versions
To view available c-blosc2 versions:
module avail c-blosc2
Build recipes and configuration details are maintained in our GitLab repository:
Build optimizations
Our C-Blosc2 installations are optimized for maximum performance on Discoverer’s hardware. We build the library with recent compilers from the LLVM Compiler Infrastructure, which are the default compilers on the Discoverer Petascale Supercomputer.
Compiler optimizations:
- Link Time Optimization (LTO): Full LTO (-flto=full) is enabled for both compilation and linking, allowing cross-module optimizations that significantly improve performance.
- CPU-Specific Optimizations:
  - -march=native: Optimizes for the native CPU architecture, enabling all available instruction sets
  - -mtune=native: Tunes the generated code specifically for the target CPU
  - -mfma: Enables FMA (Fused Multiply-Add) instructions for improved floating-point performance
- Position Independent Code: -fPIC is used to enable shared library support.
Linker optimizations:
- LLD Linker: We use LLVM’s LLD linker for faster linking and better optimization support.
- LTO at Link Time: Full link-time optimization enables whole-program optimizations.
Build configuration:
- Release Build: All optimizations are enabled for production use.
- Multi-threading: Built-in support for multi-threaded compression and decompression.
- Full Feature Set: All codecs and filters are enabled, providing maximum flexibility.
- Dual Library Builds: Both shared (.so) and static (.a) libraries are built and installed, providing flexibility for different use cases.
These optimizations ensure that our C-Blosc2 installation provides the fastest possible compression and decompression performance for CPU-based applications on Discoverer, while maintaining full compatibility with the standard C-Blosc2 API.
Available libraries
C-Blosc2 provides the libblosc2 shared library that is installed by default:
libblosc2.so - Blosc2 compression library
This library implements the Blosc2 compression framework, providing high-speed compression and decompression for binary data, particularly numerical arrays and tensors.
- Header file: blosc2.h
- Link flag: -lblosc2
- pkg-config: blosc2
Note
The library uses optimized implementations and can be used in both C and C++ applications. It is particularly effective for scientific computing workloads involving numerical data.
Library variants
The libblosc2 library is available as both static (.a) and shared (.so) libraries. The Environment Modules automatically configure the appropriate paths for dynamic linking, which is the recommended approach for HPC environments.
- Shared libraries (recommended):
  - libblosc2.so is used by default
  - Automatically configured when loading the module
  - Recommended for HPC environments
- Static libraries:
  - libblosc2.a is also available
  - Use only if your application specifically requires static linking
  - Requires explicit -static flag during linking
Linking your application
After loading the c-blosc2 module, the environment variables are automatically configured. You can link your application using one of the following methods:
Method 1: Using environment variables (recommended)
# Load the module first
module load c-blosc2/2/<version>
# Link against libblosc2 - C code
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -lblosc2
clang -o myapp myapp.c $CFLAGS $LDFLAGS -lblosc2
# Link against libblosc2 - C++ code
g++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -lblosc2
clang++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -lblosc2
Method 2: Using pkg-config
# Load the module first
module load c-blosc2/2/<version>
# Link against libblosc2 - C code
gcc -o myapp myapp.c $(pkg-config --cflags --libs blosc2)
clang -o myapp myapp.c $(pkg-config --cflags --libs blosc2)
# Link against libblosc2 - C++ code
g++ -o myapp myapp.cpp $(pkg-config --cflags --libs blosc2)
clang++ -o myapp myapp.cpp $(pkg-config --cflags --libs blosc2)
Method 3: Manual linking
# Load the module first
module load c-blosc2/2/<version>
# Link against libblosc2 - C code
gcc -o myapp myapp.c -I$BLOSC2_ROOT/include -L$BLOSC2_ROOT/lib64 -lblosc2
clang -o myapp myapp.c -I$BLOSC2_ROOT/include -L$BLOSC2_ROOT/lib64 -lblosc2
# Link against libblosc2 - C++ code
g++ -o myapp myapp.cpp -I$BLOSC2_ROOT/include -L$BLOSC2_ROOT/lib64 -lblosc2
clang++ -o myapp myapp.cpp -I$BLOSC2_ROOT/include -L$BLOSC2_ROOT/lib64 -lblosc2
Static linking (if required):
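If your application specifically requires static linking, add the -static flag. A fully static link also needs static versions of every library that libblosc2.a itself depends on, so additional -l flags may be required if the linker reports undefined references: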
# C code
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -lblosc2 -static
clang -o myapp myapp.c $CFLAGS $LDFLAGS -lblosc2 -static
# C++ code
g++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -lblosc2 -static
clang++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -lblosc2 -static
Note
The Environment Modules automatically set CFLAGS, CXXFLAGS, and LDFLAGS when you load the module. Using these variables is the recommended approach as they remain correct even if the module path changes.
Using with HDF5
C-Blosc2 integrates seamlessly with HDF5 through the HDF5 filter mechanism. To use C-Blosc2 compression with HDF5:
- Load both modules:
module load c-blosc2/2/<version>
module load hdf5/<version>
- Link your application:
# Link against both HDF5 and C-Blosc2
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -lhdf5 -lblosc2
clang -o myapp myapp.c $CFLAGS $LDFLAGS -lhdf5 -lblosc2
- Use in your code:
When creating HDF5 datasets, you can enable C-Blosc2 compression by setting the appropriate compression filter in your HDF5 property list. The C-Blosc2 filter ID is registered with HDF5 and can be used transparently.
Note
The C-Blosc2 HDF5 filter provides significant performance improvements for I/O operations, especially when working with large numerical datasets. The dual-layer partitioning approach optimizes data access patterns for modern CPU cache architectures.