Thread: Silesia Open Source Compression Benchmark

  1. #1
    Member Alexander Rhatushnyak
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts

    Silesia Open Source Compression Benchmark
    deserves a separate topic because it's a different entity.

    Looks like the rows are sorted by the first column, though it's not explicitly stated yet.
    If that's true, why is xwrt two rows higher than it should be?

    What are the criteria for selecting options? For example, why was lpaq9m tested with -6 and not with -8 or -9?

    Also, why not suggest downloading silesia.7z or silesia.zpaq? 40.5 MB is way better than 67.6 MB.

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  2. #2
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    xwrt was in the wrong place. Thanks for noticing.

    I added lpaq9m 9. Usually I choose options for best compression if I know them. Anyone is welcome to suggest improvements. For example, there are probably better options for freearc, xwrt, and ppmvc than the ones I used, but I'm not going to do an exhaustive search because there are a lot of them.

    Edit: posted silesia.zpaq (47 MB, compressed with -m2). Decompression time is 83 sec on a 2 GHz T3200. Decompression of silesia.zip is 4 sec. So I guess it depends on your download speed. Mine is 7 Mb/s, so zip is faster.
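    A rough way to put numbers on "it depends on your download speed": the total wait is download time plus decompression time. This sketch assumes silesia.zip is the 67.6 MB figure from post #1, treats MB as 10^6 bytes and link speed as megabits per second; the helper names are hypothetical.

```python
# Figures from this thread; the zip size is assumed to be the 67.6 MB from post #1.
ZIP_MB, ZIP_DECOMP_S = 67.6, 4.0     # silesia.zip
ZPAQ_MB, ZPAQ_DECOMP_S = 47.0, 83.0  # silesia.zpaq (-m2), 2 GHz T3200


def total_seconds(size_mb: float, decomp_s: float, link_mbit_s: float) -> float:
    """Download time plus decompression time.

    size_mb is in megabytes, link_mbit_s in megabits per second (8 bits per byte).
    """
    return size_mb * 8 / link_mbit_s + decomp_s


def break_even_mbit_s() -> float:
    """Link speed at which both archives are fully unpacked at the same moment."""
    return (ZIP_MB - ZPAQ_MB) * 8 / (ZPAQ_DECOMP_S - ZIP_DECOMP_S)


if __name__ == "__main__":
    for link in (1.0, 2.0, 7.0, 20.0):
        z = total_seconds(ZIP_MB, ZIP_DECOMP_S, link)
        q = total_seconds(ZPAQ_MB, ZPAQ_DECOMP_S, link)
        print(f"{link:5.1f} Mb/s: zip {z:6.1f} s, zpaq {q:6.1f} s")
    print(f"break-even: {break_even_mbit_s():.2f} Mb/s")
```

    Under these assumptions zip wins at 7 Mb/s (about 81 s vs 137 s), and zpaq only comes out ahead on links slower than roughly 2.1 Mb/s.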
    Last edited by Matt Mahoney; 12th April 2012 at 22:26.

  3. #3
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Can you please explain the goals of this benchmark?
    Is it meant to replace "calgary" and "canterbury" for showing comparative results in scientific papers?
    If so, OK. Otherwise, without time and resource usage, I don't think it is even interesting.

  4. #4
    Member Alexander Rhatushnyak
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Quote (originally posted by Matt Mahoney):
    Edit: posted silesia.zpaq (47 MB, compressed with -m2). Decompression time is 83 sec on a 2 GHz T3200.
    silesia.7z is only 2% bigger, but decompression is most likely ~10 times faster. Why not post silesia.7z too?

  5. #5
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    49,729,056 silesia.7z, 131 sec, 11 sec. But I am personally fond of zpaq. I figure that when I develop better compression algorithms, I can replace the archive with a smaller one. I can't do that with 7zip.

    But anyway, the goal is to encourage research into better compression algorithms, not necessarily faster. Sort of like http://www.maximumcompression.com/data/summary_sf.php and http://mailcom.com/challenge/ which don't consider speed.

    Nobody would use paq8px_v69 to back up their files. It is too slow. But I still find it very useful for two things. First, when I am developing an algorithm for some new data type and I want to see what I might reasonably achieve. For example, in the Pistoia SequenceSqueeze contest, I ran tests with paq8px_v69 and then developed custom algorithms that were hundreds of times faster and still beat it. If I didn't beat it or come close, then I knew there was a problem with my algorithm.

    Second, many of the techniques can be applied usefully in practical compressors like lpaq9m or zpaq, which would not exist without this earlier research in extreme compression. This research wouldn't have happened if every benchmark considered speed. It's for the second reason that I only include open source programs.

  6. #6
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Some of the ppmd entries have pmd as the program name, and some have ppmd.

    Quote (originally posted by Matt Mahoney):
    But I still find it very useful for two things.
    Also, there are applications aside from data compression. For example, if you want to check whether a sequence looks really random, you'd better use model mixing and paq8, not RLE/LZ/BWT/PPM/etc.
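    As a reference point for what "model mixing" means here, this is a minimal sketch of a logistic mixer of the kind used in PAQ-style context-mixing compressors. It is written in floating point for readability (paq8 itself uses fixed-point arithmetic and many more inputs), and the class and parameter names are hypothetical. The mixer learns online which model to trust; if no mix of models predicts a sequence better than p = 0.5, its coding cost stays near one bit per bit, which is the sense in which the sequence "looks random".

```python
from __future__ import annotations

import math


def stretch(p: float) -> float:
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Logistic function, the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer:
    """Combines several models' P(bit = 1) with online-learned weights."""

    def __init__(self, n_inputs: int, learning_rate: float = 0.02, init_weight: float = 0.3):
        self.weights = [init_weight] * n_inputs
        self.learning_rate = learning_rate
        self.inputs = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs: list[float]) -> float:
        """Return the mixed P(next bit = 1); call before update()."""
        self.inputs = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.weights, self.inputs)))
        return self.p

    def update(self, bit: int) -> None:
        """Gradient step on coding cost: w_i += lr * (bit - p) * stretch(p_i)."""
        err = bit - self.p
        self.weights = [w + self.learning_rate * err * x
                        for w, x in zip(self.weights, self.inputs)]


if __name__ == "__main__":
    mixer = Mixer(2)
    # Model A predicts a 1 with p = 0.9; model B knows nothing (p = 0.5).
    for _ in range(1000):
        mixer.predict([0.9, 0.5])
        mixer.update(1)
    print(mixer.predict([0.9, 0.5]), mixer.weights)
```

    In the demo the uninformative model's input is stretch(0.5) = 0, so its weight never moves, while the useful model's weight keeps growing until the mixed prediction exceeds that model's own 0.9.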

  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote (originally posted by Alexander Rhatushnyak):
    For example, if you want to check whether a sequence looks really random, you'd better use model mixing and paq8, not RLE/LZ/BWT/PPM/etc.
    I think the only thing that really distinguishes PAQ from other methods is its use of sparse models. Sparse models are useful for structured data, but I think it would usually be far more efficient (in terms of compression ratio) to write a compressor that is aware of that structure than to add more models to a generalized PAQ-like compressor. Using super-heavyweight versions of PAQ to guess some of the structure should be the last resort. I've read that context mixing isn't infinitely scalable: beyond some threshold, adding another model makes compression worse. So a more "intelligent" tool for structure recognition is needed.
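    To make the sparse-model point concrete, here is a toy measurement, not PAQ's actual models: on hypothetical data made of 2-byte records (a 4-bit counter followed by a 4-bit noise field), a context that skips the previous byte and looks two bytes back predicts much better than the usual order-1 context. The entropies are empirical and ignore the cost of learning the counts; the function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def conditional_entropy(data: list, offset: int) -> float:
    """Empirical H(X[i] | X[i - offset]) in bits per symbol, from static counts."""
    counts = defaultdict(Counter)
    for i in range(offset, len(data)):
        counts[data[i - offset]][data[i]] += 1
    total = len(data) - offset
    bits = 0.0
    for ctx in counts.values():
        n_ctx = sum(ctx.values())
        for c in ctx.values():
            bits -= c * math.log2(c / n_ctx)
    return bits / total


def make_records(n_records: int, seed: int = 1) -> list:
    """Hypothetical 2-byte records: a 4-bit counter followed by a 4-bit noise field."""
    rng = random.Random(seed)
    data = []
    for r in range(n_records):
        data.append(r % 16)             # counter: previous record's counter + 1 (mod 16)
        data.append(rng.randrange(16))  # noise: unpredictable from any context
    return data


if __name__ == "__main__":
    data = make_records(10_000)
    print("order-1 context (previous byte):", round(conditional_entropy(data, 1), 2))
    print("sparse context (two bytes back):", round(conditional_entropy(data, 2), 2))
```

    On this data the order-1 estimate comes out at about 4 bits per byte and the sparse one under 3, because the counter field is fully determined by the same field one record earlier, while the byte right before it is pure noise.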
    Last edited by Piotr Tarsa; 15th April 2012 at 12:51.

  8. #8
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Fixed "pmd" and added uc2 (4 compression levels).

  9. #9
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Since Precomp is open source now, here are results for the newest version, v0.4.4:

    Code:
      Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
     -----------------------------------------------------------------------------------------------
     52612677  2799 16598 2441  1812 2860  2802 1246  3969 4941  8647  4051  440 precomp v0.4.4
     52612677  2799 16598 2441  1812 2860  2802 1246  3969 4941  8647  4051  440 precomp v0.4.4 -intense
     36603712  2094 10233 2181  1251 1765  2204  956  2352 3899  5667  3669  326 precomp v0.4.4 -cn | zpaq 7.05 -method 7
    
    211938580 10192 51220 9970 33553 6152 10085 6627 21606 7251 41458  8474 5345 Uncompressed
    230825350 10192 58181 9970 33553 6152 10085 6627 33532 7251 41458  8474 5345 precomp v0.4.4 -cn
     54506769  2799 17914 2441  1812 2862  2802 1246  4549 4940  8644  4051  441 bzip2
     38995519  2079 12019 2176  1246 1754  2204  946  3043 3898  5646  3656  323 zpaq 6.21 -method 7
    The first two lines show results for Precomp's built-in bzip2 compression. "-intense" makes no difference, which means all streams have proper headers and are already detected in normal mode.

    The last four lines are for reference. "-cn" disables Precomp's own compression, so it only decompresses the embedded streams and leaves the final compression to another program. Only the mozilla and samba files contain recompressible streams.

    I don't have enough memory for "precomp v0.4.4 -cn | cmix v8", so maybe someone else could try?
    Last edited by schnaader; 18th January 2016 at 16:58.
    http://schnaader.info
    Damn kids. They're all alike.

  11. #10
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I updated the Silesia benchmark. BTW zpaq 7.05 -method 7 is the same as -method 5.

  13. #11
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Finally, I got a result for Precomp + paq8l: first place!

    Code:
      Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
     -----------------------------------------------------------------------------------------------
     32540467  2009  8397 2094   964 1440  2084  812  1886 3774  5191  3609  275 precomp v0.4.4 -cn | paq8l -8
    
     33307593  1893  9565 2020   838 1338  2020  770  2591 3767  4689  3568  244 cmix v8
    This beats the cmix v8 result and gives an upper bound for the "precomp | cmix" result, which should come in below 31.430.000 bytes.
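    The 31.430.000 figure appears to follow from a simple substitution, sketched here under stated assumptions: take the cmix v8 total and replace its mozilla and samba sizes with the precomp | paq8l ones. That assumes precomp | cmix does at least as well as precomp | paq8l on those two files, and exactly as well as plain cmix on the other ten, which precomp -cn leaves unchanged (per the table in post #9). Per-file sizes are in thousands of bytes, so the estimate is only accurate to a few thousand bytes.

```python
# Per-file sizes in thousands of bytes, copied from the table above.
CMIX_V8_TOTAL = 33_307_593  # cmix v8, whole corpus, in bytes
cmix_v8 = {"mozilla": 9565, "samba": 2591}
precomp_paq8l = {"mozilla": 8397, "samba": 1886}


def estimated_upper_bound() -> int:
    """cmix v8 total with mozilla and samba swapped for their precomp | paq8l sizes."""
    saved_kb = sum(cmix_v8[f] - precomp_paq8l[f] for f in cmix_v8)
    return CMIX_V8_TOTAL - saved_kb * 1000


print(estimated_upper_bound())  # 31434593
```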

  15. #12
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    223
    Thanks
    106
    Thanked 102 Times in 63 Posts
    Nice results! I will try running "precomp | cmix" soon.

  16. #13
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    223
    Thanks
    106
    Thanked 102 Times in 63 Posts
    Here are the results for "precomp v0.4.4 -cn | cmix v8"

    dicke: 1893390
    mozil: 7577697
    mr: 2021025
    nci: 838417
    ooff: 1338298
    osdb: 2020774
    reym: 770398
    samba: 1702182
    sao: 3767215
    webst: 4688993
    x-ray: 3568238
    xml: 244307


    total: 30430934
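    A quick consistency check of the total, with the per-file sizes copied from the list above and the cmix v8 total taken from post #11:

```python
# precomp v0.4.4 -cn | cmix v8, per-file sizes in bytes.
PRECOMP_CMIX = {
    "dicke": 1_893_390, "mozil": 7_577_697, "mr": 2_021_025, "nci": 838_417,
    "ooff": 1_338_298, "osdb": 2_020_774, "reym": 770_398, "samba": 1_702_182,
    "sao": 3_767_215, "webst": 4_688_993, "x-ray": 3_568_238, "xml": 244_307,
}
CMIX_V8_TOTAL = 33_307_593  # cmix v8 without precomp

total = sum(PRECOMP_CMIX.values())
print(total, CMIX_V8_TOTAL - total)  # 30430934 2876659
```

    The per-file sizes add up to the posted total, about 2.9 MB below cmix v8 alone.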

  18. #14
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    precomp | cmix takes #1 on the Silesia corpus. http://mattmahoney.net/dc/silesia.html
    All of the gains from precomp come from mozilla and samba. Compared with cmix v8 alone, mozilla is 21% smaller and samba is 34% smaller.
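    The two percentages can be reproduced from the posted numbers. cmix v8 alone is only listed in thousands of bytes, so the figures are rounded; the helper name is hypothetical.

```python
def reduction(before_bytes: int, after_bytes: int) -> float:
    """Fraction by which after_bytes is smaller than before_bytes."""
    return 1 - after_bytes / before_bytes


# cmix v8 alone (thousands of bytes, post #11) vs precomp -cn | cmix v8 (bytes, post #13).
PAIRS = {
    "mozilla": (9565 * 1000, 7_577_697),
    "samba": (2591 * 1000, 1_702_182),
}

for name, (before, after) in PAIRS.items():
    print(f"{name}: {reduction(before, after):.0%} smaller")
# mozilla: 21% smaller
# samba: 34% smaller
```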

  19. #15
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    216
    Thanks
    97
    Thanked 128 Times in 92 Posts
    Results (all verified) for m1x2_0.6_100206. The source code is available for free download at https://sites.google.com/site/toffer86/m1-project
    Memory usage levels 8 and 9 give me the following error (I have 8 GB of RAM):
    terminate called after throwing an instance of 'std::bad_alloc'
    what(): std::bad_alloc

    optimize.zip contains one directory per Silesia file; each directory contains:
    - log.txt : stores the optimization progress log.
    - pop.txt : stores the state of the genetic optimizer.
    - best.txt: stores the state of the hill climbing algorithm.
    - opt.txt : stores the best parameter profile found during optimization.

    Usage example:
    m1 7 dickens\best.txt Silesia\dickens dickens.m1
    m1 d dickens\best.txt dickens.m1 dickens.m1.d
    Code:
    Memory usage 0 for all files    : 46.966.067
    Memory usage 1 for all files    : 46.003.431
    Memory usage 2 for all files    : 45.314.209
    Memory usage 3 for all files    : 44.870.039
    Memory usage 4 for all files    : 44.621.035
    Memory usage 5 for all files    : 44.497.589
    Memory usage 6 for all files    : 44.435.643
    Memory usage 7 for all files    : 44.405.349
    Best memory usage for every file: 44.405.257
    To compress and decompress a file, m1 needs a parameters file of size 5082 bytes.
    
    Sorted by name+memory usage         Sorted by name+size
         2.315.965 dickens-0.m1      2.225.459 dickens-7.m1
         2.265.013 dickens-1.m1      2.225.577 dickens-6.m1
         2.239.685 dickens-2.m1      2.225.843 dickens-5.m1
         2.229.567 dickens-3.m1      2.226.621 dickens-4.m1
         2.226.621 dickens-4.m1      2.229.567 dickens-3.m1
         2.225.843 dickens-5.m1      2.239.685 dickens-2.m1
         2.225.577 dickens-6.m1      2.265.013 dickens-1.m1
         2.225.459 dickens-7.m1      2.315.965 dickens-0.m1
    
        15.581.836 mozilla-0.m1     14.214.080 mozilla-7.m1
        15.152.122 mozilla-1.m1     14.240.922 mozilla-6.m1
        14.801.740 mozilla-2.m1     14.292.516 mozilla-5.m1
        14.546.038 mozilla-3.m1     14.382.650 mozilla-4.m1
        14.382.650 mozilla-4.m1     14.546.038 mozilla-3.m1
        14.292.516 mozilla-5.m1     14.801.740 mozilla-2.m1
        14.240.922 mozilla-6.m1     15.152.122 mozilla-1.m1
        14.214.080 mozilla-7.m1     15.581.836 mozilla-0.m1
    
         2.251.900 mr-0.m1           2.223.340 mr-7.m1
         2.237.652 mr-1.m1           2.223.390 mr-6.m1
         2.229.354 mr-2.m1           2.223.514 mr-5.m1
         2.225.144 mr-3.m1           2.223.858 mr-4.m1
         2.223.858 mr-4.m1           2.225.144 mr-3.m1
         2.223.514 mr-5.m1           2.229.354 mr-2.m1
         2.223.390 mr-6.m1           2.237.652 mr-1.m1
         2.223.340 mr-7.m1           2.251.900 mr-0.m1
    
         1.670.457 nci-0.m1          1.667.129 nci-7.m1
         1.668.325 nci-1.m1          1.667.129 nci-6.m1
         1.667.677 nci-2.m1          1.667.157 nci-5.m1
         1.667.259 nci-3.m1          1.667.223 nci-4.m1
         1.667.223 nci-4.m1          1.667.259 nci-3.m1
         1.667.157 nci-5.m1          1.667.677 nci-2.m1
         1.667.129 nci-6.m1          1.668.325 nci-1.m1
         1.667.129 nci-7.m1          1.670.457 nci-0.m1
    
         2.389.862 ooffice-0.m1      2.334.096 ooffice-7.m1
         2.363.850 ooffice-1.m1      2.334.392 ooffice-6.m1
         2.348.110 ooffice-2.m1      2.335.174 ooffice-5.m1
         2.340.200 ooffice-3.m1      2.336.746 ooffice-4.m1
         2.336.746 ooffice-4.m1      2.340.200 ooffice-3.m1
         2.335.174 ooffice-5.m1      2.348.110 ooffice-2.m1
         2.334.392 ooffice-6.m1      2.363.850 ooffice-1.m1
         2.334.096 ooffice-7.m1      2.389.862 ooffice-0.m1
    
         2.441.153 osdb-0.m1         2.368.383 osdb-7.m1
         2.409.683 osdb-1.m1         2.368.575 osdb-6.m1
         2.387.327 osdb-2.m1         2.368.909 osdb-5.m1
         2.374.795 osdb-3.m1         2.370.283 osdb-4.m1
         2.370.283 osdb-4.m1         2.374.795 osdb-3.m1
         2.368.909 osdb-5.m1         2.387.327 osdb-2.m1
         2.368.575 osdb-6.m1         2.409.683 osdb-1.m1
         2.368.383 osdb-7.m1         2.441.153 osdb-0.m1
    
         1.042.280 reymont-0.m1      1.031.526 reymont-7.m1
         1.034.814 reymont-1.m1      1.031.530 reymont-6.m1
         1.032.398 reymont-2.m1      1.031.568 reymont-5.m1
         1.031.788 reymont-3.m1      1.031.642 reymont-4.m1
         1.031.642 reymont-4.m1      1.031.788 reymont-3.m1
         1.031.568 reymont-5.m1      1.032.398 reymont-2.m1
         1.031.530 reymont-6.m1      1.034.814 reymont-1.m1
         1.031.526 reymont-7.m1      1.042.280 reymont-0.m1
    
         3.841.213 samba-0.m1        3.591.457 samba-7.m1
         3.729.807 samba-1.m1        3.593.033 samba-6.m1
         3.658.779 samba-2.m1        3.596.823 samba-5.m1
         3.622.355 samba-3.m1        3.605.539 samba-4.m1
         3.605.539 samba-4.m1        3.622.355 samba-3.m1
         3.596.823 samba-5.m1        3.658.779 samba-2.m1
         3.593.033 samba-6.m1        3.729.807 samba-1.m1
         3.591.457 samba-7.m1        3.841.213 samba-0.m1
    
         4.479.830 sao-0.m1          4.458.168 sao-7.m1
         4.464.462 sao-1.m1          4.458.180 sao-6.m1
         4.459.212 sao-2.m1          4.458.200 sao-5.m1
         4.458.248 sao-3.m1          4.458.200 sao-4.m1
         4.458.200 sao-4.m1          4.458.248 sao-3.m1
         4.458.200 sao-5.m1          4.459.212 sao-2.m1
         4.458.180 sao-6.m1          4.464.462 sao-1.m1
         4.458.168 sao-7.m1          4.479.830 sao-0.m1
    
         6.784.527 webster-0.m1      6.173.049 webster-7.m1
         6.531.025 webster-1.m1      6.174.331 webster-6.m1
         6.358.995 webster-2.m1      6.179.153 webster-5.m1
         6.252.563 webster-3.m1      6.198.745 webster-4.m1
         6.198.745 webster-4.m1      6.252.563 webster-3.m1
         6.179.153 webster-5.m1      6.358.995 webster-2.m1
         6.174.331 webster-6.m1      6.531.025 webster-1.m1
         6.173.049 webster-7.m1      6.784.527 webster-0.m1
    
         3.744.982 x-ray-0.m1        3.705.814 x-ray-6.m1
         3.731.328 x-ray-1.m1        3.705.906 x-ray-7.m1
         3.717.462 x-ray-2.m1        3.705.936 x-ray-5.m1
         3.709.136 x-ray-3.m1        3.706.676 x-ray-4.m1
         3.706.676 x-ray-4.m1        3.709.136 x-ray-3.m1
         3.705.936 x-ray-5.m1        3.717.462 x-ray-2.m1
         3.705.814 x-ray-6.m1        3.731.328 x-ray-1.m1
         3.705.906 x-ray-7.m1        3.744.982 x-ray-0.m1
    
           422.062 xml-0.m1            412.756 xml-7.m1
           415.350 xml-1.m1            412.770 xml-6.m1
           413.470 xml-2.m1            412.796 xml-5.m1
           412.946 xml-3.m1            412.852 xml-4.m1
           412.852 xml-4.m1            412.946 xml-3.m1
           412.796 xml-5.m1            413.470 xml-2.m1
           412.770 xml-6.m1            415.350 xml-1.m1
           412.756 xml-7.m1            422.062 xml-0.m1
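    The "Best memory usage for every file" line can be checked by taking the smaller of the level 6 and level 7 sizes for each file (the "Sorted by name+size" column shows those two levels are the smallest for every file). The sizes below are copied from the table; the variable names are hypothetical.

```python
# Compressed sizes in bytes at memory usage levels (6, 7), from the table above.
SIZES = {
    "dickens": (2_225_577, 2_225_459),
    "mozilla": (14_240_922, 14_214_080),
    "mr": (2_223_390, 2_223_340),
    "nci": (1_667_129, 1_667_129),
    "ooffice": (2_334_392, 2_334_096),
    "osdb": (2_368_575, 2_368_383),
    "reymont": (1_031_530, 1_031_526),
    "samba": (3_593_033, 3_591_457),
    "sao": (4_458_180, 4_458_168),
    "webster": (6_174_331, 6_173_049),
    "x-ray": (3_705_814, 3_705_906),
    "xml": (412_770, 412_756),
}

level6_total = sum(s6 for s6, _ in SIZES.values())
level7_total = sum(s7 for _, s7 in SIZES.values())
best_total = sum(min(pair) for pair in SIZES.values())
prefers_6 = [name for name, (s6, s7) in SIZES.items() if s6 < s7]

print(level6_total, level7_total, best_total, prefers_6)
# 44435643 44405349 44405257 ['x-ray']
```

    Only x-ray compresses better at level 6, which accounts for the 92-byte gap between the level 7 total and the per-file best; nci ties at both levels.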
    Attached Files
    Last edited by Mauro Vezzosi; 15th June 2016 at 00:26. Reason: Added some blank lines

  20. #16
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    216
    Thanks
    97
    Thanked 128 Times in 92 Posts
    I found two compressors that come with source code and are not yet included in LTCB or SOSCB (Silesia Open Source Compression Benchmark).

    CTS - Context Tree Switching (2011): http://jveness.info/software/default.html
    v1.0 CTS source and Windows binaries: http://jveness.info/software/cts-v1.zip
    Technical description: http://arxiv.org/abs/1111.3182

    SkipCTS - Skip Context Tree Switching: https://github.com/mgbellemare/SkipCTS
    On http://jveness.info/publications/default.html there are two links to the publication/conference paper: http://jveness.info/publications/icm...%20skipcts.pdf , http://dblp.uni-trier.de/rec/bibtex/.../BellemareVT14
