XZ
High compression ratio data compression library and tools
Overview
XZ is a free, general-purpose data compression library and set of tools that provide a high compression ratio.
Warning
The XZ version affected by the embedded backdoor code, as outlined in CERT.EU 2024-032, was never installed on Discoverer and will never be installed in our software repository.
Note
We provide an XZ installation that is faster and more reliable than the system-wide one, and we recommend using it instead (see below).
Available versions
To view available xz versions:
$ module avail xz
Build recipes and configuration details are maintained in our GitLab repository:
Build optimizations
Our XZ installations are optimized for maximum performance on Discoverer’s hardware. We build the XZ library code with recent LLVM Compiler Infrastructure compilers, which are the default compilers on the Discoverer Petascale Supercomputer.
Compiler optimizations:
- Link Time Optimization (LTO): Full LTO (-flto=full) is enabled for both compilation and linking, allowing cross-module optimizations that significantly improve performance.
- CPU-Specific Optimizations:
  - -march=native: Optimizes for the native CPU architecture, enabling all available instruction sets
  - -mtune=native: Tunes the generated code specifically for the target CPU
  - -mfma: Enables FMA (Fused Multiply-Add) instructions for improved floating-point performance
- Position Independent Code: -fPIC is used to enable shared library support.
Linker optimizations:
- LLD Linker: We use LLVM’s LLD linker (CMAKE_LINKER_TYPE=LLD) for faster linking and better optimization support.
- LTO at Link Time: -flto=full -Wl,--lto-O3 enables full link-time optimization with optimization level 3, allowing the linker to perform whole-program optimizations.
Build configuration:
- Release Build: CMAKE_BUILD_TYPE=Release ensures all optimizations are enabled.
- Hardware-Accelerated CRC: XZ_CLMUL_CRC=ON enables CLMUL (Carry-less Multiplication) hardware acceleration for CRC32 and CRC64 checksums, providing significant performance improvements on modern CPUs.
- Multi-threading: XZ_THREADS=yes enables multi-threaded compression and decompression support.
- Match Finders: Multiple match finder algorithms are enabled (hc3;hc4;bt2;bt3;bt4) to provide the best compression ratio and speed trade-offs.
- Checksum Support: All checksum types are enabled (crc32;crc64;sha256) for data integrity verification.
- Memory Optimization: XZ_ASSUME_RAM="512" sets the amount of RAM (512 MiB) that XZ assumes when it cannot determine the actual amount available on the system.
- Full Feature Set: XZ_SMALL=OFF ensures all features are enabled, prioritizing performance over binary size.
Build system:
- Build Tool: Ninja build system is used for fast parallel builds.
- Parallel Compilation: Builds use 4 parallel compilation jobs for efficient resource utilization.
- Testing: All builds are tested using the comprehensive test suite (ctest) before installation, ensuring correctness and reliability.
- Dual Library Builds: Both shared (.so) and static (.a) libraries are built and installed, providing flexibility for different use cases.
These optimizations ensure that our XZ installation provides the fastest possible compression and decompression performance for CPU-based applications on Discoverer, while maintaining full compatibility with the standard XZ API.
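As a rough illustration, the options listed above could be combined into a single CMake configure call along the following lines. This is a sketch and not the actual build recipe: the directory names xz-src and build are placeholders, the compiler name clang and the option names XZ_MATCH_FINDERS and XZ_CHECKS are assumptions based on upstream XZ's CMake naming, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: collect the documented options as positional parameters.
# "xz-src" and "build" are placeholder directories (assumptions).
# XZ_MATCH_FINDERS and XZ_CHECKS are assumed option names.
set -- \
  -S xz-src -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_LINKER_TYPE=LLD \
  "-DCMAKE_C_FLAGS=-flto=full -march=native -mtune=native -mfma -fPIC" \
  "-DCMAKE_SHARED_LINKER_FLAGS=-flto=full -Wl,--lto-O3" \
  -DXZ_CLMUL_CRC=ON \
  -DXZ_THREADS=yes \
  -DXZ_SMALL=OFF \
  -DXZ_ASSUME_RAM=512 \
  "-DXZ_MATCH_FINDERS=hc3;hc4;bt2;bt3;bt4" \
  "-DXZ_CHECKS=crc32;crc64;sha256"

# Dry run: print the commands for review instead of executing them.
configure_cmd="cmake $*"
echo "$configure_cmd"
echo "cmake --build build --parallel 4   # 4 parallel jobs, as stated above"
echo "ctest --test-dir build             # run the test suite before installing"
```

The authoritative flags and values are those in the build recipes; this dry run is only meant to show how the pieces fit together.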
Compiler support
Warning
From now on, we will support only LLVM builds of XZ. No other builds will be officially supported.
Supported builds
Production builds:
module avail xz
Legacy builds (retiring soon, deprecated, do not use):
module avail xz/*/*llvm # LLVM build (this is not default but we do not use the compiler name in the module name)
module avail xz/*/*gcc # GCC build (deprecated, will be retired soon)
module avail xz/*/*intel # Intel oneAPI build (deprecated, will be retired soon)
module avail xz/*/*aocc # AMD AOCC build (deprecated, will be retired soon)
Available libraries
XZ provides the liblzma shared library that is installed by default:
liblzma.so - LZMA compression library
This library implements the LZMA (Lempel-Ziv-Markov chain Algorithm) compression algorithm, providing high compression ratios with good performance.
- Header file: lzma.h
- Link flag: -llzma
- pkg-config: liblzma
Note
The library uses optimized implementations and can be used in both C and C++ applications.
Library variants
The liblzma library is available as both static (.a) and shared (.so) libraries. The Environment Modules automatically configure the appropriate paths for dynamic linking, which is the recommended approach for HPC environments.
- Shared libraries (recommended): liblzma.so is used by default
  - Automatically configured when loading the module
  - Recommended for HPC environments
- Static libraries: liblzma.a is also available
  - Use only if your application specifically requires static linking
  - Requires explicit -static flag during linking
Linking your application
After loading the xz module, the environment variables are automatically configured. You can link your application using one of the following methods:
Method 1: Using environment variables (recommended)
# Load the module first
module load xz/<version>
# Link against liblzma - C code
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -llzma
clang -o myapp myapp.c $CFLAGS $LDFLAGS -llzma
# Link against liblzma - C++ code
g++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma
clang++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma
Method 2: Using pkg-config
# Load the module first
module load xz/<version>
# Link against liblzma - C code
gcc -o myapp myapp.c $(pkg-config --cflags --libs liblzma)
clang -o myapp myapp.c $(pkg-config --cflags --libs liblzma)
# Link against liblzma - C++ code
g++ -o myapp myapp.cpp $(pkg-config --cflags --libs liblzma)
clang++ -o myapp myapp.cpp $(pkg-config --cflags --libs liblzma)
Method 3: Manual linking
# Load the module first
module load xz/<version>
# Link against liblzma - C code
gcc -o myapp myapp.c -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
clang -o myapp myapp.c -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
# Link against liblzma - C++ code
g++ -o myapp myapp.cpp -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
clang++ -o myapp myapp.cpp -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
Static linking (if required):
If your application specifically requires static linking:
# C code
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -llzma -static
clang -o myapp myapp.c $CFLAGS $LDFLAGS -llzma -static
# C++ code
g++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma -static
clang++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma -static
Note
The Environment Modules automatically set CFLAGS, CXXFLAGS, and LDFLAGS when you load the module. Using these variables is the recommended approach as they remain correct even if the module path changes.
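To check that the include and library paths resolve as expected, a small test program can help. The sketch below assumes the xz module is already loaded and that a C compiler and pkg-config are on the PATH. The file myapp.c is the hypothetical source file from the commands above, and the compiler is taken from CC with cc as a fallback. LZMA_VERSION_STRING and lzma_version_string() are liblzma's compile-time and run-time version identifiers.

```shell
#!/bin/sh
# Assumes `module load xz/<version>` was run beforehand.
workdir=$(mktemp -d)

# Write a minimal program that reports the liblzma version.
cat > "$workdir/myapp.c" <<'EOF'
#include <stdio.h>
#include <lzma.h>

int main(void)
{
    /* Header version (compile time) vs. library version (run time). */
    printf("compiled against liblzma %s\n", LZMA_VERSION_STRING);
    printf("running with liblzma %s\n", lzma_version_string());
    return 0;
}
EOF

if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists liblzma; then
    # Method 2 from above: let pkg-config supply the flags.
    ${CC:-cc} -o "$workdir/myapp" "$workdir/myapp.c" \
        $(pkg-config --cflags --libs liblzma) && "$workdir/myapp"
else
    echo "liblzma not found via pkg-config; is the xz module loaded?"
fi
```

If the two reported versions differ, the program was most likely compiled against one installation and is running against another.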
Replacing the system-wide xz installation
To use the liblzma.so library from our installation instead of relying on the system-wide installation:
module load xz/<version>
./your_program # will automatically use liblzma.so from xz installation
This way your executable will use the xz library from our installation instead of the system-wide one.
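One way to confirm which liblzma.so the dynamic loader picks up is to inspect a binary with ldd. The sketch below checks the xz binary found on the PATH; substituting the path of your own program (./your_program above) works the same way. It assumes the module is loaded and that ldd is available, as it is on typical Linux systems.

```shell
#!/bin/sh
# Assumes `module load xz/<version>` was run beforehand.
xz_bin=$(command -v xz)
echo "xz found at: $xz_bin"

# Show the shared liblzma the loader would use for this binary.
ldd "$xz_bin" | grep liblzma \
    || echo "no shared liblzma dependency reported for $xz_bin"
```

With the module loaded, both paths are expected to point into the module's installation directory rather than a system library directory.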
Command-line utilities
XZ provides a comprehensive set of command-line utilities for compression, decompression, and working with compressed files. After loading the xz module, these utilities are available in your PATH.
Main compression/decompression tools:
- xz - Main compression and decompression tool
  The primary utility for compressing and decompressing files in the .xz format. It can also handle .lzma files when invoked under different names (see polymorphism below).
  - Compresses files to .xz format
  - Decompresses .xz and .lzma files
  - Supports various compression levels (0-9)
  - Supports multi-threading for faster compression
- xzdec - Simple decompressor
  A lightweight decompression-only tool for .xz files. It is smaller than xz and does not support compression, making it useful for embedded systems or when only decompression is needed.
- lzmadec - Simple LZMA decompressor
  A lightweight decompression-only tool specifically for .lzma files. Similar to xzdec but for the legacy LZMA format.
- lzmainfo - LZMA file information
  Displays information about .lzma compressed files, including compression method, uncompressed size, and other metadata.
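The compression level and multi-threading options of xz mentioned above can be combined on one command line. The following is a minimal sketch, assuming the xz module is loaded; sample.txt is a throwaway file created only for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create some compressible sample data.
seq 1 100000 > sample.txt

# -6: compression preset (0-9, 6 is the default)
# -T0: use as many threads as there are CPU cores
# -k: keep the input file instead of deleting it
xz -6 -T0 -k sample.txt

# -l lists the compressed and uncompressed sizes and the ratio.
xz -l sample.txt.xz

# -t tests integrity; -dc decompresses to stdout without touching the file.
xz -t sample.txt.xz && xz -dc sample.txt.xz | cmp - sample.txt \
    && echo "round trip OK"
```

Higher presets generally trade more CPU time and memory for a smaller output, so it is worth testing a few levels on representative data.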
Convenience tools (symlinks to xz):
Several tools are implemented as symlinks to the main xz binary. The xz program detects which name it was invoked under (using argv[0]) and adjusts its behavior accordingly. This polymorphism allows one binary to provide multiple interfaces:
- xzcat -> xz
  Decompresses .xz files to standard output (equivalent to xz -dc). Useful for piping decompressed data to other commands.
- lzcat -> xz
  Decompresses .lzma files to standard output. Provides compatibility with legacy LZMA format.
- lzma -> xz
  Compresses files to .lzma format (legacy format). When invoked as lzma, the tool uses the older LZMA format instead of the newer .xz format.
- unxz -> xz
  Decompresses .xz files (equivalent to xz -d). Provides an intuitive name for decompression operations.
- unlzma -> xz
  Decompresses .lzma files. Provides compatibility with legacy LZMA format.
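Because xzcat writes to standard output, it fits naturally into pipelines. The sketch below is illustrative only and assumes the xz module is loaded; words.txt.xz is a hypothetical file created just for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Compress three lines read from standard input into a .xz file.
printf 'alpha\nbeta\ngamma\n' | xz > "$workdir/words.txt.xz"

# Stream the decompressed data into another command;
# the file on disk stays compressed.
line_count=$(xzcat "$workdir/words.txt.xz" | wc -l | tr -d ' ')
echo "lines: $line_count"

# The spelled-out equivalent, xz -dc, can be piped the same way.
xz -dc "$workdir/words.txt.xz" | sort -r | head -n 1
```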
Comparison tools:
- xzdiff - Compare compressed files
  Compares two compressed files by decompressing them and running diff. Useful for comparing versions of files stored in compressed format.
- xzcmp -> xzdiff
  Compares compressed files using cmp instead of diff. Useful for binary file comparisons.
- lzdiff -> xzdiff
  Compares .lzma compressed files using diff.
- lzcmp -> xzdiff
  Compares .lzma compressed files using cmp.
Search tools:
- xzgrep - Search compressed files
  Searches compressed files for patterns using grep. Decompresses files on-the-fly and searches the content without requiring manual decompression.
- xzegrep -> xzgrep
  Searches compressed files using egrep (extended regular expressions).
- xzfgrep -> xzgrep
  Searches compressed files using fgrep (fixed strings).
- lzgrep -> xzgrep
  Searches .lzma compressed files using grep.
- lzegrep -> xzgrep
  Searches .lzma compressed files using egrep.
- lzfgrep -> xzgrep
  Searches .lzma compressed files using fgrep.
Viewing tools:
- xzless - View compressed files with less
  Views compressed files using the less pager. Decompresses files on-the-fly for viewing.
- xzmore - View compressed files with more
  Views compressed files using the more pager. Decompresses files on-the-fly for viewing.
- lzless -> xzless
  Views .lzma compressed files using less.
- lzmore -> xzmore
  Views .lzma compressed files using more.
How polymorphism works:
The XZ utilities use a common Unix pattern called “name-based polymorphism” or “argv[0] polymorphism”. When a program is invoked, the operating system passes the program name as the first argument (argv[0]). The xz binary checks this name to determine its behavior:
- If invoked as xz, it compresses/decompresses .xz files
- If invoked as lzma, it compresses/decompresses .lzma files
- If invoked as xzcat or lzcat, it decompresses to stdout
- If invoked as unxz or unlzma, it forces decompression mode
This design allows:
- Space efficiency: One binary provides multiple tools
- Consistency: All tools share the same core implementation and behavior
- Compatibility: Legacy tool names (like lzma) continue to work
- Flexibility: Users can choose the most intuitive name for their task
All symlinks point to the same xz binary, which adapts its behavior based on how it was invoked. This is why you can use xzcat, lzcat, unxz, or unlzma and they all work correctly despite being the same underlying program.
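The same name-based dispatch can be imitated with a few lines of shell. The sketch below is only an analogy and is not how xz itself is implemented: greet and shout are hypothetical names, and the script inspects $0, the shell counterpart of argv[0].

```shell
#!/bin/sh
# Hypothetical tool "greet" with a symlink "shout" pointing at it.
workdir=$(mktemp -d)

cat > "$workdir/greet" <<'EOF'
#!/bin/sh
# Pick the behavior from the name the script was invoked under.
case "$(basename "$0")" in
    shout) echo "HELLO" ;;   # invoked through the symlink
    *)     echo "hello" ;;   # invoked under the original name
esac
EOF
chmod +x "$workdir/greet"
ln -s greet "$workdir/shout"

# One file on disk, two behaviors depending on the invoked name.
out_greet=$("$workdir/greet")
out_shout=$("$workdir/shout")
echo "greet -> $out_greet"
echo "shout -> $out_shout"
```

Running `ls -l` on the directory containing the xz utilities shows the corresponding symlinks for the real tools.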
Example usage:
# Load the module
module load xz/<version>
# Compress a file
xz myfile.txt # Creates myfile.txt.xz
# Decompress a file
unxz myfile.txt.xz # Restores myfile.txt
# or
xzcat myfile.txt.xz # Decompresses to stdout
# Search in compressed files
xzgrep "pattern" *.xz
# View compressed file
xzless archive.xz
# Compare compressed files
xzdiff file1.xz file2.xz
Warning
When processing large files or multiple files, use Slurm batch jobs to execute these utilities on compute nodes rather than login nodes.
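A batch job for a large compression task might look roughly like the sketch below, which writes a job script to the current directory and prints the submission command. All resource values, the job name, and the file big_dataset.tar are placeholders. Site-specific options such as the partition and account are omitted and need to be added according to your project's settings.

```shell
#!/bin/sh
# Write a hypothetical batch script; resource values are placeholders.
cat > compress_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=xz-compress
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:30:00
# Site-specific options such as --partition and --account are omitted here.

module load xz   # or a specific xz/<version>

# Use as many xz threads as CPU cores were allocated to the job.
xz -T "${SLURM_CPUS_PER_TASK:-1}" -k big_dataset.tar
EOF

echo "Submit with: sbatch compress_job.sh"
```

Matching the xz thread count to the allocated cores avoids oversubscribing the compute node.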