XZ
High compression ratio data compression library and tools
Overview
XZ is a free, general-purpose data compression library and set of tools that provide a high compression ratio.
Warning
The version of XZ affected by the embedded backdoor code, as outlined in CERT.EU 2024-032, was not installed on Discoverer and will never be installed in our software repository.
Note
We provide an XZ installation that is faster and more reliable than the system-wide one, and we recommend using it instead of the system-wide one (see below).
Available versions
To view available xz versions:
$ module avail xz
Build recipes and configuration details are maintained in our GitLab repository:
Build optimizations
Our XZ installations are optimized for maximum performance on Discoverer’s hardware. We build the XZ library code with recent LLVM Compiler Infrastructure compilers, which are the default compilers on the Discoverer Petascale Supercomputer.
Compiler optimizations:
- Link Time Optimization (LTO): Full LTO (-flto=full) is enabled for both compilation and linking, allowing cross-module optimizations that significantly improve performance.
- CPU-Specific Optimizations:
  - -march=native: Optimizes for the native CPU architecture, enabling all available instruction sets
  - -mtune=native: Tunes the generated code specifically for the target CPU
  - -mfma: Enables FMA (Fused Multiply-Add) instructions for improved floating-point performance
- Position Independent Code: -fPIC is used to enable shared library support.
Linker optimizations:
- LLD Linker: We use LLVM’s LLD linker (CMAKE_LINKER_TYPE=LLD) for faster linking and better optimization support.
- LTO at Link Time: -flto=full -Wl,--lto-O3 enables full link-time optimization with optimization level 3, allowing the linker to perform whole-program optimizations.
Build configuration:
- Release Build: CMAKE_BUILD_TYPE=Release ensures all optimizations are enabled.
- Hardware-Accelerated CRC: -DXZ_CLMUL_CRC=ON enables CLMUL (Carry-less Multiplication) hardware acceleration for CRC32 and CRC64 checksums, providing significant performance improvements on modern CPUs.
- Multi-threading: -DXZ_THREADS=yes enables multi-threaded compression and decompression support.
- Match Finders: Multiple match finder algorithms are enabled (hc3;hc4;bt2;bt3;bt4) to provide the best compression ratio and speed trade-offs.
- Checksum Support: All checksum types are enabled (crc32;crc64;sha256) for data integrity verification.
- Memory Optimization: -DXZ_ASSUME_RAM="512" makes XZ assume 512 MiB of RAM when the actual amount of memory cannot be detected.
- Full Feature Set: -DXZ_SMALL=OFF ensures all features are enabled, prioritizing performance over binary size.
Build system:
- Build Tool: Ninja build system is used for fast parallel builds.
- Parallel Compilation: Builds use 4 parallel compilation jobs for efficient resource utilization.
- Testing: All builds are tested using the comprehensive test suite (ctest) before installation, ensuring correctness and reliability.
- Dual Library Builds: Both shared (.so) and static (.a) libraries are built and installed, providing flexibility for different use cases.
These optimizations ensure that our XZ installation provides the fastest possible compression and decompression performance for CPU-based applications on Discoverer, while maintaining full compatibility with the standard XZ API.
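The options named in this section could be combined into a configure, build, and test sequence roughly as follows. This is only a sketch and not the actual recipe, which is kept in the GitLab repository: the print_xz_build helper, the clang compiler name, and the source and build directories are assumptions made for illustration, and the commands are printed rather than executed. It covers only options mentioned above and does not show how the shared and static variants are both produced.

```shell
#!/bin/sh
# Sketch only: prints configure/build/test commands assembled from the
# options described in this section. Nothing is executed.
# Assumptions (not taken from the real recipe): the helper name, the
# directory arguments, and "clang" as the C compiler name.

print_xz_build() {
    src_dir=$1
    build_dir=$2
    c_flags="-flto=full -march=native -mtune=native -mfma -fPIC"
    ld_flags="-flto=full -Wl,--lto-O3"

    cat <<EOF
cmake -S "$src_dir" -B "$build_dir" -G Ninja \\
  -DCMAKE_C_COMPILER=clang \\
  -DCMAKE_BUILD_TYPE=Release \\
  -DCMAKE_LINKER_TYPE=LLD \\
  -DCMAKE_C_FLAGS="$c_flags" \\
  -DCMAKE_EXE_LINKER_FLAGS="$ld_flags" \\
  -DCMAKE_SHARED_LINKER_FLAGS="$ld_flags" \\
  -DXZ_CLMUL_CRC=ON \\
  -DXZ_THREADS=yes \\
  -DXZ_ASSUME_RAM=512 \\
  -DXZ_SMALL=OFF
cmake --build "$build_dir" --parallel 4
ctest --test-dir "$build_dir"
EOF
}

# Hypothetical directories, for illustration only.
print_xz_build ./xz-src ./xz-build
```

Printing the commands first makes it easy to review the flag set before running anything on a shared system.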
Compiler support
Warning
From now on, we will support only LLVM builds of XZ. No other builds will be officially supported.
Supported builds
Production builds:
module avail xz
Legacy builds (retiring soon, deprecated, do not use)
module avail xz/*/*llvm # LLVM build (this is not default but we do not use the compiler name in the module name)
module avail xz/*/*gcc # GCC build (deprecated, will be retired soon)
module avail xz/*/*intel # Intel oneAPI build (deprecated, will be retired soon)
module avail xz/*/*aocc # AMD AOCC build (deprecated, will be retired soon)
Available libraries
XZ provides the liblzma shared library that is installed by default:
- liblzma.so - LZMA compression library
  This library implements the LZMA (Lempel-Ziv-Markov chain Algorithm) compression algorithm, providing high compression ratios with good performance.
  - Header file: lzma.h
  - Link flag: -llzma
  - pkg-config: liblzma
Note
The library uses optimized implementations and can be used in both C and C++ applications.
Library variants
The liblzma library is available as both static (.a) and shared (.so) libraries. The Environment Modules automatically configure the appropriate paths for dynamic linking, which is the recommended approach for HPC environments.
- Shared libraries (recommended):
  - liblzma.so is used by default
  - Automatically configured when loading the module
  - Recommended for HPC environments
- Static libraries:
  - liblzma.a is also available
  - Use only if your application specifically requires static linking
  - Requires explicit -static flag during linking
Linking your application
After loading the xz module, the environment variables are automatically configured. You can link your application using one of the following methods:
Method 1: Using environment variables (recommended)
# Load the module first
module load xz/<version>
# Link against liblzma - C code
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -llzma
clang -o myapp myapp.c $CFLAGS $LDFLAGS -llzma
# Link against liblzma - C++ code
g++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma
clang++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma
Method 2: Using pkg-config
# Load the module first
module load xz/<version>
# Link against liblzma - C code
gcc -o myapp myapp.c $(pkg-config --cflags --libs liblzma)
clang -o myapp myapp.c $(pkg-config --cflags --libs liblzma)
# Link against liblzma - C++ code
g++ -o myapp myapp.cpp $(pkg-config --cflags --libs liblzma)
clang++ -o myapp myapp.cpp $(pkg-config --cflags --libs liblzma)
Method 3: Manual linking
# Load the module first
module load xz/<version>
# Link against liblzma - C code
gcc -o myapp myapp.c -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
clang -o myapp myapp.c -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
# Link against liblzma - C++ code
g++ -o myapp myapp.cpp -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
clang++ -o myapp myapp.cpp -I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma
Static linking (if required):
If your application specifically requires static linking:
# C code
gcc -o myapp myapp.c $CFLAGS $LDFLAGS -llzma -static
clang -o myapp myapp.c $CFLAGS $LDFLAGS -llzma -static
# C++ code
g++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma -static
clang++ -o myapp myapp.cpp $CXXFLAGS $LDFLAGS -llzma -static
Note
The Environment Modules automatically set CFLAGS, CXXFLAGS, and LDFLAGS when you load the module. Using these variables is the recommended approach as they remain correct even if the module path changes.
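When a build script has to work both with and without pkg-config, Methods 2 and 3 can be combined. The following is a minimal sketch under stated assumptions: the lzma_flags helper is hypothetical, and the PKG_CONFIG override variable is a convention borrowed from common build systems, not something the xz module is known to define. XZ_ROOT is used as in Method 3.

```shell
#!/bin/sh
# Print compiler and linker flags for liblzma.
# Prefers pkg-config (Method 2) and falls back to XZ_ROOT paths (Method 3).
# "lzma_flags" is a hypothetical helper name; PKG_CONFIG may be set to
# select a different pkg-config executable.

lzma_flags() {
    pc=${PKG_CONFIG:-pkg-config}
    if command -v "$pc" >/dev/null 2>&1 && "$pc" --exists liblzma; then
        "$pc" --cflags --libs liblzma
    elif [ -n "${XZ_ROOT:-}" ]; then
        printf '%s\n' "-I$XZ_ROOT/include -L$XZ_ROOT/lib64 -llzma"
    else
        echo "liblzma not found: load the xz module first" >&2
        return 1
    fi
}

# Usage, after loading the xz module:
#   clang -o myapp myapp.c $(lzma_flags)
```

The function returns a non-zero status when neither source of flags is available, so a calling script can stop early instead of failing at link time.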
Replacing the system-wide xz installation
To use the liblzma.so library from our installation instead of relying on the system-wide installation:
module load xz/<version>
./your_program # will automatically use liblzma.so from xz installation
This way your executable will use the xz library from our installation instead of the system-wide one.
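To confirm which liblzma a dynamically linked executable actually resolves to, the output of ldd can be filtered. A minimal sketch, assuming a Linux system where ldd is available; the liblzma_from_ldd helper name is hypothetical, and your_program is the same placeholder as above.

```shell
#!/bin/sh
# Read ldd output on stdin and print the resolved path of liblzma, if any.
# Relevant ldd lines have the form: "<soname> => <path> (<address>)".
liblzma_from_ldd() {
    awk '$1 ~ /^liblzma\.so/ && $2 == "=>" { print $3 }'
}

# Usage, after loading the xz module:
#   ldd ./your_program | liblzma_from_ldd
# A path inside the xz module's installation directory indicates that the
# module's library is picked up rather than the system-wide one.
```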
Command-line utilities
XZ provides a comprehensive set of command-line utilities for compression, decompression, and working with compressed files. After loading the xz module, these utilities are available in your PATH.
Main compression/decompression tools:
- xz - Main compression and decompression tool
  The primary utility for compressing and decompressing files in the .xz format. It can also handle .lzma files when invoked under different names (see polymorphism below).
  - Compresses files to .xz format
  - Decompresses .xz and .lzma files
  - Supports various compression levels (0-9)
  - Supports multi-threading for faster compression
- xzdec - Simple decompressor
  A lightweight decompression-only tool for .xz files. It is smaller than xz and does not support compression, making it useful for embedded systems or when only decompression is needed.
- lzmadec - Simple LZMA decompressor
  A lightweight decompression-only tool specifically for .lzma files. Similar to xzdec but for the legacy LZMA format.
- lzmainfo - LZMA file information
  Displays information about .lzma compressed files, including compression method, uncompressed size, and other metadata.
Convenience tools (symlinks to xz):
Several tools are implemented as symlinks to the main xz binary. The xz program detects which name it was invoked under (using argv[0]) and adjusts its behavior accordingly. This polymorphism allows one binary to provide multiple interfaces:
- xzcat -> xz
  Decompresses .xz files to standard output (equivalent to xz -dc). Useful for piping decompressed data to other commands.
- lzcat -> xz
  Decompresses .lzma files to standard output. Provides compatibility with legacy LZMA format.
- lzma -> xz
  Compresses files to .lzma format (legacy format). When invoked as lzma, the tool uses the older LZMA format instead of the newer .xz format.
- unxz -> xz
  Decompresses .xz files (equivalent to xz -d). Provides an intuitive name for decompression operations.
- unlzma -> xz
  Decompresses .lzma files. Provides compatibility with legacy LZMA format.
Comparison tools:
- xzdiff - Compare compressed files
  Compares two compressed files by decompressing them and running diff. Useful for comparing versions of files stored in compressed format.
- xzcmp -> xzdiff
  Compares compressed files using cmp instead of diff. Useful for binary file comparisons.
- lzdiff -> xzdiff
  Compares .lzma compressed files using diff.
- lzcmp -> xzdiff
  Compares .lzma compressed files using cmp.
Search tools:
- xzgrep - Search compressed files
  Searches compressed files for patterns using grep. Decompresses files on-the-fly and searches the content without requiring manual decompression.
- xzegrep -> xzgrep
  Searches compressed files using egrep (extended regular expressions).
- xzfgrep -> xzgrep
  Searches compressed files using fgrep (fixed strings).
- lzgrep -> xzgrep
  Searches .lzma compressed files using grep.
- lzegrep -> xzgrep
  Searches .lzma compressed files using egrep.
- lzfgrep -> xzgrep
  Searches .lzma compressed files using fgrep.
Viewing tools:
- xzless - View compressed files with less
  Views compressed files using the less pager. Decompresses files on-the-fly for viewing.
- xzmore - View compressed files with more
  Views compressed files using the more pager. Decompresses files on-the-fly for viewing.
- lzless -> xzless
  Views .lzma compressed files using less.
- lzmore -> xzmore
  Views .lzma compressed files using more.
How polymorphism works:
The XZ utilities use a common Unix pattern called “name-based polymorphism” or “argv[0] polymorphism”. When a program is invoked, the operating system passes the program name as the first argument (argv[0]). The xz binary checks this name to determine its behavior:
- If invoked as xz, it compresses/decompresses .xz files
- If invoked as lzma, it compresses/decompresses .lzma files
- If invoked as xzcat or lzcat, it decompresses to stdout
- If invoked as unxz or unlzma, it forces decompression mode
This design allows:
- Space efficiency: One binary provides multiple tools
- Consistency: All tools share the same core implementation and behavior
- Compatibility: Legacy tool names (like lzma) continue to work
- Flexibility: Users can choose the most intuitive name for their task
All symlinks point to the same xz binary, which adapts its behavior based on how it was invoked. This is why you can use xzcat, lzcat, unxz, or unlzma and they all work correctly despite being the same underlying program.
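The dispatch-by-name pattern described above can be illustrated with a small shell function. This is only a sketch of the idea and not the actual xz source code: mode_for_name is a hypothetical helper, the output strings are invented for the illustration, and the mapping covers just the names listed in this section.

```shell
#!/bin/sh
# Illustration of argv[0]-style dispatch: map the name a program was invoked
# under to a file format and a default operating mode.
# This is NOT the real xz implementation.
mode_for_name() {
    name=$(basename "$1")    # strip any directory part, e.g. /usr/bin/unxz
    case $name in
        xz)     echo "format=xz mode=compress-by-default" ;;
        unxz)   echo "format=xz mode=decompress" ;;
        xzcat)  echo "format=xz mode=decompress-to-stdout" ;;
        lzma)   echo "format=lzma mode=compress-by-default" ;;
        unlzma) echo "format=lzma mode=decompress" ;;
        lzcat)  echo "format=lzma mode=decompress-to-stdout" ;;
        *)      echo "unknown name: $name" >&2; return 1 ;;
    esac
}

# In a real script, "$0" plays the role of argv[0]:
#   mode_for_name "$0"
mode_for_name /usr/bin/unxz    # prints: format=xz mode=decompress
```

Installing such a script once and creating symlinks to it under the other names would give each name its own behavior, which is the same arrangement the xz symlinks rely on.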
Example usage:
# Load the module
module load xz/<version>
# Compress a file
xz myfile.txt # Creates myfile.txt.xz
# Decompress a file
unxz myfile.txt.xz # Restores myfile.txt
# or
xzcat myfile.txt.xz # Decompresses to stdout
# Search in compressed files
xzgrep "pattern" *.xz
# View compressed file
xzless archive.xz
# Compare compressed files
xzdiff file1.xz file2.xz
Warning
When processing large files or multiple files, use Slurm batch jobs to execute these utilities on compute nodes rather than login nodes.
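A batch job for compressing several files might look like the sketch below. The resource values are placeholders, site-specific options such as partition and account are deliberately left out, and both the xz_threads helper and the XZ_MODULE variable are hypothetical. The -T option sets the number of xz worker threads and -k keeps the input files.

```shell
#!/bin/bash
#SBATCH --job-name=xz-compress
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8      # placeholder; xz uses this many threads
#SBATCH --time=01:00:00        # placeholder wall-time limit

# Number of threads for xz: the CPUs Slurm allocated to this task, or 1 when
# the script runs outside of a Slurm job.
xz_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Compress every file passed to the script, keeping the originals.
if [ "$#" -gt 0 ]; then
    # XZ_MODULE is a hypothetical variable; set it to a specific module
    # version to pin one, otherwise the default xz module is loaded.
    module load "${XZ_MODULE:-xz}"
    xz -T "$(xz_threads)" -k -- "$@"
fi
```

Saved as, for example, compress.sh, the script could be submitted with sbatch compress.sh followed by the files to compress, since sbatch passes the trailing arguments on to the script.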
Getting help
For additional assistance:
See the Getting help documentation