It is practically impossible to have a single-threaded LZ77 that is close to or faster than memcpy.
Originally Posted by lz77
Selkie has the same decompressor as Mermaid. The optimization on the compressor side consists of selecting LZ77 matches that have
either a small offset (within the L1/L2 cache) or a large match length. This can lead to a worse compression ratio, depending on the data files.
For its compression ratio, Selkie's decompression can be considered very fast.
In other words, fast decompression = L1/L2 cache optimization. There is no magic algorithm.
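To make the match selection idea concrete, here is a minimal sketch of such a filter. It is not Oodle's actual code: the names `accept_match` and `pick_match` and the thresholds `L2_WINDOW`, `LONG_MATCH` and `MIN_MATCH` are hypothetical values chosen only for illustration.

```python
# Hypothetical thresholds, chosen for illustration only (not Oodle's values).
L2_WINDOW = 256 * 1024   # assumed limit for a "cache-friendly" offset, in bytes
LONG_MATCH = 32          # assumed length that justifies a far (cache-missing) offset
MIN_MATCH = 4            # assumed minimum useful match length


def accept_match(offset: int, length: int) -> bool:
    """Keep a match only if it is near (small offset) or long enough."""
    if length < MIN_MATCH:
        return False
    if offset <= L2_WINDOW:
        return True                 # source bytes are likely still in L1/L2
    return length >= LONG_MATCH     # a far match must amortize the cache miss


def pick_match(candidates):
    """Return the longest accepted (offset, length) pair, or None."""
    best = None
    for offset, length in candidates:
        if not accept_match(offset, length):
            continue
        if best is None or length > best[1]:
            best = (offset, length)
    return best
```

With these assumed thresholds, `pick_match([(1_000_000, 20), (500, 6)])` returns the near 6-byte match and drops the far 20-byte one, which is exactly the kind of choice that costs compression ratio but keeps the decompressor in cache.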
Blosc also claims to be faster than memcpy.
In their benchmarks they are simply comparing Blosc with 8 threads against a single-threaded memcpy!
In my experiments, the only compressors that can be faster than memcpy are run-length encoding
and SIMD integer compression.
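One plausible reason run-length encoding can beat memcpy is that the decoder reads far fewer bytes than it writes, so on highly repetitive data it mostly does fills instead of copies. The sketch below only illustrates that data flow. It is not a benchmark, and `rle_encode` and `rle_decode` are hypothetical helper names, not from any library mentioned here.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode data as a list of (byte value, run length) pairs."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        # Extend the run while the byte value repeats.
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((data[i], j - i))
        i = j
    return runs


def rle_decode(runs) -> bytes:
    """Decode (value, count) pairs; each run is a fill, not a copy."""
    out = bytearray()
    for value, count in runs:
        out += bytes([value]) * count
    return bytes(out)
```

For example, 1000 zero bytes followed by 24 bytes of value 7 encode to just two pairs, so the decoder touches almost no input while producing 1024 output bytes.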
CPU: i5-6300HQ 2.30 GHz. Data: pd3d.tar, game data files from RAD Game Tools.
Note that all Oodle compressors are optimized for game data and can perform poorly on other data types. See this benchmark:
    C Size  ratio%    C MB/s   D MB/s  Name
  10778010    33.7      8.00      568  libdeflate 12
  11061382    34.6      6.38      319  zlib 9
  11463934    35.9      3.52     2428  oodle 119 (Selkie)
  14237227    44.6      3.72     3350  lzturbo 19
  14279732    44.7     46.72     2616  lz4 9
  31952900   100.0   6354.24     6489  memcpy
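The ratio% column above is the compressed size as a percentage of the original size (the memcpy row). A small sketch to reproduce it from the C Size column, where `ratio_percent` is a hypothetical helper name:

```python
ORIGINAL_SIZE = 31952900  # uncompressed size, taken from the memcpy row

# Compressed sizes copied from the "C Size" column of the table.
compressed_sizes = {
    "libdeflate 12": 10778010,
    "zlib 9": 11061382,
    "oodle 119 (Selkie)": 11463934,
    "lzturbo 19": 14237227,
    "lz4 9": 14279732,
}


def ratio_percent(compressed: int, original: int = ORIGINAL_SIZE) -> float:
    """Compressed size as a percentage of the original, one decimal place."""
    return round(100.0 * compressed / original, 1)


for name, size in compressed_sizes.items():
    print(f"{size:>10} {ratio_percent(size):>6.1f}  {name}")
```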