xnLZ is a multi-threaded LZMA (de)compressor heavily based on plzip, of which it is a fork, though it is NOT compatible with plzip anymore.
Its main purpose is to offer an alternative to the existing 4x4:lzma chain (also known as xlzma) in FreeArc (or other archivers that
can utilize external compressors), but with 64-bit support and the same cool features that made 4x4 so great. Although xnLZ was
intended to be used in <stdio> mode with FreeArc, it can also work on its own with files, or be chained with other tools
that support external compressors.
Here are the main features:
- xnLZ utilizes an entropy check that scans incoming data blocks during compression and can copy them directly if they are not
compressible enough; the threshold can be defined by the user. This means data that does not benefit from typical LZMA compression
can bypass the compression routine completely, speeding up the whole process. On combined data sets this usually gives anywhere
between a 50-300% speedup, and decompression speed benefits as well.
- Data blocks that are copied directly still have their CRC attached and are verified during test or decompression.
- xnLZ was designed to utilize all CPU cores during decompression through <stdio> for maximum speed. The complete infrastructure
for this actually existed in the original plzip (as of the time of the fork, at least); all that was needed was to raise the
buffer/packet limit, which was artificially hardcoded to a low value. Now the user can directly define how much buffer to allocate,
but this comes at the cost of using more memory during decompression. For example, to fully utilize a 4-core CPU with data
blocks of 256MB, ~1300MB-2GB may be needed during decompression, depending on settings. In my tests I was able to load my
CPU fully with a slots value of 128 on 256MB data blocks, which used only around ~1GB of RAM.
- xnLZ uses a hardcoded lzma:lc (literal context) parameter of 8 instead of the default 3, for maximum compression.
- xnLZ exposes the additional lzma:mc (match finder cycles) option for the user to set, which tends to have a big impact on compression speed.
- xnLZ never deletes the original file(s), and it went through other changes regarding settings and behavior.
xnLZ utilizes the same block structure format as plzip/lzip, but the header uses a different ID and contains an additional byte
that records whether the data was compressed or not.
Thanks to the original plzip/lzlib authors, to Matt Mahoney for the entropy code, and to Bulat Ziganshin for assistance.
Quick random benchmark on a combined data folder with lots of zip files and a few big exes and dlls:
Note: although I cannot confirm it yet, it is possible that the slower decompression against 7z with entropy skipping disabled may be due to the lc=8 parameter of lzma1. The settings were:
xlzma(4x4:lzma) & xnlz = d64m, b64m, fb32, mc32, E99% <- both use lzma1:lc8
7z 16.04 = 0=lzma2:d=64m:fb=32:lc=4:mc=32 <- use lzma2
orig data:  3.24g
xlzma(4x4): c 1:15m, d  5s, 2.76g
xnlz:       c 1:40m, d 13s, 2.74g
xnlz:E98:   c 1:23m, d 11s, 2.76g
7z:         c 8:12m, d 26s, 2.70g
xnlz:E100:  c 6:15m, d 49s, 2.71g
This is from a mail from the original plzip author, when I asked him back then why he did not use a bigger lc in his software:
"The reason to not use lc = 8 is that it requires 768 KiB of fast RAM vs the 24 KiB required by lc = 3. This would make decompression slow on machines with, say, 64 KiB of cache."
I may need to verify whether this is the case here, or whether lzma1/xnlz is just slower than lzma2 in general. Also, xlzma used lc=8 as well and did not suffer.
EDIT: here are the results on uncompressed data:
orig file: 660.3mb
7z:        c 1:15m, d 12s, 270.19mb
xnlz:      c 1:17m, d  7s, 270.85mb
The reason xnlz was slower during decompression in the first test, despite using all cores, was that the archive contained already-compressed data (zip files) and entropy skipping was disabled. Lzma2 was quicker because it handles compressed data better than lzma1. However, first, it is no match in speed for whole-block entropy skipping as done by xlzma or xnlz, and second, here it was finally slower when decompressing uncompressed data.