Thread: Best algorithms ranking

  1. #1
    Member
    Join Date
    Jun 2019
    Location
    Poland
    Posts
    24
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Best algorithms ranking

A ranking is needed which, in contrast to
http://prize.hutter1.net/#contestants
would include only free open-source programs that are also fast (say, up to 3x slower than 7-Zip).
Which programs would be at the top of such a list?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Slower than 7-zip doing what?
    LZMA encoding is pretty slow (1-2MB/s), and 7-zip includes ppmd too, so based on encoding speed, most BWT/CM/PPM compressors would fit.

    I guess you can search http://www.mattmahoney.net/dc/text.html for your criteria.

In any case, many of the best codecs (nanozip, rz and the rest of Christian's codecs, ppmonstr) are not open-source.
    But it doesn't mean that there's nothing to learn from them.
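As a rough illustration of how encoding speed can be measured (a sketch using Python's stdlib lzma/bz2 bindings on synthetic data, so the numbers will not match the 7-zip binaries themselves):

```python
import bz2
import lzma
import time

# Synthetic, mildly repetitive test data (~2.4 MB).
data = b"".join(b"the quick brown fox %d " % (i % 100) for i in range(100_000))

def measure(name, compress):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(out):,} bytes, {len(data) / elapsed / 1e6:.1f} MB/s")
    return out

lzma_out = measure("lzma -9", lambda d: lzma.compress(d, preset=9))
bz2_out = measure("bz2  -9", lambda d: bz2.compress(d, compresslevel=9))

# Round-trip sanity check: compression must be lossless.
assert lzma.decompress(lzma_out) == data
assert bz2.decompress(bz2_out) == data
```

Speeds on real corpora differ a lot from synthetic data, so measure on your own files before drawing conclusions.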

  3. #3
    Member
    Join Date
    Jun 2019
    Location
    Poland
    Posts
    24
    Thanks
    0
    Thanked 0 Times in 0 Posts
drt|lpaq9m is quite fast. Do most of the best-compressing archivers use neural networks?

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
NNs are not especially good for compression, and there are very few NN-based compressors
(basically NNCP and lstmcompress, plus a couple of others that include lstmcompress as a submodel).
It's usually plain contextual statistics... the trick is actually not in the mathematics, but rather in the programming of it -
there's a precision/memory/speed trade-off, and it's pretty hard to balance.
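The "plain contextual statistics" idea can be sketched in a few lines -- a toy order-1 bit model that keeps one probability per context and nudges it toward each observed bit (illustrative only, not any particular codec's model):

```python
# Toy adaptive bit model: a 12-bit fixed-point probability of a 1-bit,
# indexed by an 8-bit context. Real CM codecs mix many such models and
# feed the mixed prediction to an arithmetic coder.

class Order1BitModel:
    def __init__(self, rate=5):
        self.p = [2048] * 256  # P(bit=1) in [0, 4096), initialised to 0.5
        self.rate = rate       # update speed: the adaptivity/precision trade-off

    def predict(self, ctx):
        return self.p[ctx]

    def update(self, ctx, bit):
        # Move the estimate toward the observed bit by 1/2^rate of the error.
        if bit:
            self.p[ctx] += (4096 - self.p[ctx]) >> self.rate
        else:
            self.p[ctx] -= self.p[ctx] >> self.rate

model = Order1BitModel()
ctx, hits = 0, 0
bits = [1, 1, 1, 0, 1, 1, 1, 1] * 100  # mostly-ones stream
for b in bits:
    hits += (model.predict(ctx) >= 2048) == (b == 1)
    model.update(ctx, b)
print(f"guessed the next bit correctly {hits}/{len(bits)} times")
```

A model like this is cheap, but scaling it up (higher orders, hashed contexts, mixing) is exactly where the memory and speed costs appear.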

  5. #5
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    174
    Thanked 51 Times in 37 Posts
    To add to Shelwien's answer:

    If speed is a concern, the fastest you are going to get at the compression levels of nanozip would be MCM 0.83.
    http://encode.ru/threads/2127-MCM-LZ...ll=1#post43220
    http://www.mattmahoney.net/dc/text.html#1449

Beyond that, you are stuck with significant speed drops for compressors like paq, NNCP, etc.

  6. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by AndrzejB View Post
A ranking is needed which, in contrast to
http://prize.hutter1.net/#contestants
would include only free open-source programs that are also fast (say, up to 3x slower than 7-Zip).
Which programs would be at the top of such a list?
    The two best practical open source compressors are brotli and zstd. Brotli for tasks that benefit from density or compression speed, and zstd for faster decompression speed.

  7. #7
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
Make your own benchmarks with your own data using the TurboBench Compression Benchmark,
which includes lz77, bwt, and context-mixing (zpaq) algorithms.

  8. The Following User Says Thank You to dnd For This Useful Post:

    Jyrki Alakuijala (13th July 2019)

  9. #8
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
Isn't zpaq around 50x slower to decode than lzma? The original poster asked for no more than around 3x.

  10. #9
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    The two best practical open source compressors are brotli and zstd. Brotli for tasks that benefit from density or compression speed, and zstd for faster decompression speed.
Define "practical". Zstd and brotli are the most popular new open-source compressors and are backed by big companies. However, it is not clear they are the best solutions for the original poster (3x slower than 7zip at compressing? decompressing? what kind of data?). I find Brotli's compression very slow at high compression ratios.


    Code:
    Silesia
    i7-7700K @4.20GHz, 32GB RAM, Ubuntu 18.04
    
    |        Compressor           | Encoding (sec)  | Decoding (sec)  |    Size          |
    |-----------------------------|-----------------|-----------------|------------------|
    |Original                     |                 |                 |   211,938,580    |    
    |Gzip 1.6                     |        6.0      |       1.0       |    68,227,965    |        
    |Gzip 1.6    -9               |       14.3      |       1.0       |    67,631,990    |        
    |Zstd 1.3.3 -13               |       11.9      |       0.3       |    58,789,182    |
    |Brotli 1.0.5 -9              |       94.3      |       1.4       |    56,289,305    |
    |Lzma 5.2.2 -3                |       24.3      |       2.4       |    55,743,540    |
    |Bzip2 1.0.6 -9               |       14.1      |       4.8       |    54,506,769    |
    |Zstd 1.3.3 -19               |       45.2      |       0.4       |    53,977,895    |
    |Lzma 5.2.2 -9                |       65.0      |       2.4       |    48,780,457    |
    For me MCM 0.83 and BSC are the codecs to beat (although MCM may be too slow for the OP) and zstd has the best decompression speed vs ratio.
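The shape of such a benchmark is easy to reproduce with Python's stdlib codecs alone -- a sketch limited to gzip/bzip2/lzma, since brotli and zstd would need third-party bindings; numbers on synthetic data will not match the Silesia table above:

```python
import bz2
import gzip
import lzma
import time

# Stand-in corpus; for meaningful numbers, use real data (e.g. Silesia).
data = b"".join(b"benchmark line %06d with some shared words\n" % i
                for i in range(50_000))

codecs = {
    "gzip -9": (lambda d: gzip.compress(d, 9), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}

results = {}
print(f"{'codec':10} {'enc (s)':>8} {'dec (s)':>8} {'size':>12}")
for name, (enc, dec) in codecs.items():
    t0 = time.perf_counter(); packed = enc(data); t_enc = time.perf_counter() - t0
    t0 = time.perf_counter(); plain = dec(packed); t_dec = time.perf_counter() - t0
    assert plain == data  # the round-trip must be lossless
    results[name] = len(packed)
    print(f"{name:10} {t_enc:8.2f} {t_dec:8.2f} {len(packed):12,}")
```

The same harness extends to any codec with compress/decompress callables, which keeps the comparison apples-to-apples on one machine and one dataset.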

  11. #10
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by hexagone View Post
    I find Brotli's compression very slow at high compression ratios.
If you compile brotli with release/optimized settings, you get better performance - usually around a 3x difference.

    Quote Originally Posted by hexagone View Post
    Code:
    |Zstd 1.3.3 -13               |       11.9      |       0.3       |    58,789,182    |
    |Brotli 1.0.5 -9              |       94.3      |       1.4       |    56,289,305    |
    If I run them at quality -10 and -5 on silesia, I get:

    Code:
    |Zstd 1.4.1 -10               |       12.0      |       0.390       |    59,568,473    |
    |Brotli 1.0.7 -5              |       10.6      |       0.739       |    59,611,601    |
Zstd's low-quality matcher seems to be tuned better for longer data like Silesia's average file size of ~20 MB -- my guess is longer hashes for longer data (something we should do for brotli's encoder, too). Brotli's inherent compression density/speed advantage (which also slows down decoding) is clearer when using brotli's slow matcher (quality 10 or 11), or on shorter data. Brotli:11 gives 49,383,136 bytes vs. zstd -22 --ultra at 52,748,892 bytes, ~7% more bytes for zstd (12 minutes vs. 2.5 minutes of compression time).

I couldn't figure out how to choose the window size for zstd (like brotli's -w and --large_window parameters). What is the correct way to do that?
    Last edited by Jyrki Alakuijala; 14th July 2019 at 18:36.

  12. #11
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
My compressor Orz is now competitive with zstd/brotli, especially for text compression (like enwik8): orz compresses about 10x faster than zstd/brotli while getting the same compression ratio.

  13. The Following 2 Users Say Thank You to RichSelian For This Useful Post:

Jarek (18th July 2019), Stephan Busch (15th July 2019)

  14. #12
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by RichSelian View Post
My compressor Orz is now competitive with zstd/brotli, especially for text compression (like enwik8): orz compresses about 10x faster than zstd/brotli while getting the same compression ratio.
    Could it be that you are comparing compressors with different backward reference window size?

  15. #13
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Quote Originally Posted by RichSelian View Post
My compressor Orz is now competitive with zstd/brotli, especially for text compression (like enwik8): orz compresses about 10x faster than zstd/brotli while getting the same compression ratio.
    Congratulations!
I see you use Huffman - have you maybe tried ANS? It would not only let you get a slightly better compression ratio (especially order-1), but e.g. James' implementations are also faster: https://github.com/jkbonfield/rans_static
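For anyone curious what the switch from Huffman to ANS looks like, here is a minimal static rANS sketch in Python (toy frequencies, a single state; the linked rans_static gets its speed from interleaved states and SIMD, which this deliberately omits):

```python
# Minimal static rANS coder (12-bit precision) -- an illustrative sketch only.

PROB_BITS = 12
TOT = 1 << PROB_BITS      # symbol frequencies must sum to TOT
RANS_L = 1 << 23          # lower bound of the normalised coder state

def rans_encode(symbols, freq, cum):
    x, stream = RANS_L, []
    for s in reversed(symbols):               # rANS encodes in reverse order
        x_max = ((RANS_L >> PROB_BITS) << 8) * freq[s]
        while x >= x_max:                     # renormalise: push low bytes out
            stream.append(x & 0xFF)
            x >>= 8
        x = ((x // freq[s]) << PROB_BITS) + (x % freq[s]) + cum[s]
    return x, stream

def rans_decode(x, stream, freq, cum, n):
    buf, symbols = list(stream), []
    for _ in range(n):
        slot = x & (TOT - 1)
        for s in freq:                        # linear search is fine for a demo
            if cum[s] <= slot < cum[s] + freq[s]:
                break
        x = freq[s] * (x >> PROB_BITS) + slot - cum[s]
        while x < RANS_L:                     # renormalise: pull bytes back (LIFO)
            x = (x << 8) | buf.pop()
        symbols.append(s)
    return symbols

# Toy model: P(a)=1/2, P(b)=P(c)=1/4, scaled to sum to TOT.
freq = {"a": 2048, "b": 1024, "c": 1024}
cum = {"a": 0, "b": 2048, "c": 3072}
msg = list("abacabaabbcca" * 10)
state, stream = rans_encode(msg, freq, cum)
decoded = rans_decode(state, stream, freq, cum, len(msg))
assert decoded == msg
print(f"{len(msg)} symbols -> {len(stream)} stream bytes + a 32-bit state")
```

Unlike Huffman, the fractional-bit symbol costs come for free here; the order-1 gain mentioned above would come from switching freq/cum per previous-symbol context.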

