I'm proud to announce the release of a new version of my compressor BCE.
While BCE v0.2 and v0.3 were mostly research compressors, I've spent
some time making BCE v0.4 a more practical compressor.
The main differences to BCE v0.3 are:
- Improved performance (mostly by reducing cache misses)
- Improved stability (tested against Calgary, Silesia and enwik8/9)
- Improved compression ratio using an adaptive coder rather than the old uniform one (which is still available in the code)
- Multiple coders allow up to 8 threads at the encoding stage, synchronizing after each order (OpenMP)
- libdivsufsort instead of sais (times are now strongly influenced by bwt/unbwt)
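To illustrate what "synchronizing after each order" can look like with OpenMP, here is a minimal sketch. It is not BCE's actual code: `ToyCoder`, `run_orders` and the dummy workload are hypothetical names made up for this example, and the only things taken from the list above are the 8 coders and the per-order synchronization. Each coder works independently inside one order, and the implicit barrier at the end of the `parallel for` makes all threads wait before the next order starts.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one coder's state (not BCE's real coder).
struct ToyCoder {
    std::uint64_t work_done = 0;
};

constexpr int kCoders = 8;  // the post mentions up to 8 threads

// Runs `orders` passes over all coders. Inside one pass the coders are
// independent, so they can run in parallel. The implicit barrier at the
// end of the OpenMP loop is the "synchronize after each order" step.
std::array<ToyCoder, kCoders> run_orders(int orders) {
    std::array<ToyCoder, kCoders> coders{};
    for (int order = 0; order < orders; ++order) {
        #pragma omp parallel for num_threads(kCoders)
        for (int c = 0; c < kCoders; ++c) {
            // Dummy workload: each coder only touches its own state.
            coders[c].work_done +=
                static_cast<std::uint64_t>(order + 1) * (c + 1);
        }
        // All threads have finished this order before the next one begins.
    }
    return coders;
}
```

Compiled without OpenMP support the pragma is ignored and the loop simply runs serially with the same result, which is handy for checking that the threaded and unthreaded paths agree.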
File / Corpus | Compressed size | Comp. Time (s) | Decomp. Time (s) | Mem (KB)  | v0.3 Times
enwik8        | 20.926.428      | 20.8           | 22.7             | 422.668   | 100 / 188
enwik8.drt    | 20.722.282      | 12.2           | 13.5             | 274.208   |
enwik9        | 164.648.620     | 253.6          | 275.6            | 3.394.480 | 1.151 / 2.444
enwik9.drt    | 164.264.278     | 170.2          | 180.2            | 1.941.336 |
text8         | 19.603.636      | 20.8           | 22.7             | 430.132   |
Calgary       | 835.876         | ~9.8           | 10.5             | -         |
Silesia       | 47.576.935      | 74.9           | 92.6             | -         |

*.drt times don't include the time for DRT.
Machine: Core i7-4770K, 8 GB DDR3, Samsung 840Pro 128 GB, Fedora 22 64 bit in VBox (Host: Win 10, 6 GB RAM available), gcc 5.3.1

Memory for compression/decompression is 5N, but I've made the old
slow bitwise unbwt available (option: -ds) for low memory PCs. The memory this needs is
given as Mem (KB). It equals the memory the coding stage uses.
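As a rough check of what 5N means in practice, the following sketch computes the estimate for a given input size. It assumes that N is the input size in bytes and that 5N means 5 bytes of working memory per input byte; the function name is made up for this example and is not part of BCE.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: "5N" = 5 bytes of working memory per input byte.
constexpr std::uint64_t kBytesPerInputByte = 5;

// Hypothetical helper: estimated memory for the default (fast) bwt/unbwt path.
constexpr std::uint64_t estimated_bwt_memory_bytes(std::uint64_t input_bytes) {
    return kBytesPerInputByte * input_bytes;
}
```

Under that assumption enwik8 (10^8 bytes) needs about 500 MB and enwik9 (10^9 bytes) about 5 GB, which is where the -ds option becomes interesting on smaller machines.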
I've made the code available at https://github.com/akamiru/bce.
Maybe someone can help me build a Windows executable. Feel free
to contact me if you find any bugs, problems or possible improvements.
The original thread about BCE can be found here:
Thanks for your attention,
@Matt Mahoney: I would be very thankful if you could update LTCB and Silesia! If you have memory problems
encoding enwik9, I could upload a compressed archive so that you could decode it with the -ds option.