View Poll Results: I would prefer:

Voters: 23. You may not vote on this poll.
  • An improved BALZ v1.13 (ROLZ compression): 4 votes (17.39%)
  • An improved BCM v0.02 (BWT compression): 5 votes (21.74%)
  • The brand new CM-based compressor: 10 votes (43.48%)
  • The brand new LZ77-based compressor: 1 vote (4.35%)
  • Other, not listed: 3 votes (13.04%)

Thread: Flagship compressor - which one?

  1. #1
     encode (The Founder)

    Flagship compressor - which one?

    I want to choose my best compressor and make it my main and the flagship project. Which one of my compressors works the best for you? Poll and/or post your opinion!

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    PIMPLE v1.43 is still my favorite.

  3. #3
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
Here are my comments about your well-known compressors:

QUAD
I really like it. It is fast and efficient. You may focus on optimal parsing for future improvements. I think that without optimal parsing we cannot make a huge step.

BCM
I know it's your first BWT experiment, so anything can be expected. For me, it's a bit slow. I think this is mostly about BWT's worst case on certain data.

TC
It seems efficient, but not very fast. It works like a PAQ clone. Don't misunderstand me: I don't mean that you copied code from the PAQ source, just that it works deadly slow, like PAQ.

PIMPLE and PIM
They are very practical compressors. I like them.

LZPM
It's a strong archiver in your collection. For me, it's the big brother of QUAD. I wish QUAD had the same compression level at its current speed.

BALZ
I have disliked this project since it was started. I never liked it. As a note, its latest release is very slow and only benefits highly redundant data. It's impractical for me.

I think the best thing for you is to focus on PIMPLE/PIM or QUAD/LZPM. If you go the QUAD/LZPM way, you should keep the LZ nature (enough speed with an acceptable compression level). We don't need the highest compression from an LZ-based compressor, but you may add an extra option.

I hope these comments can help you choose your way. Good luck

  4. #4
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
Hmm, tough question... I'd like to see some new pure CM inspired by PIMPLE and TC, and/or a new segmentation filter; this combination could easily be your new flagship.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  5. #5
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
I think the new BCM (BWT compression) is a good step in the direction of becoming a star in the world of compression.

From 0.01 to 0.02 it compresses better in less time.

But I don't know: is there a big unsolved problem in this program?

Stephan Busch wrote:

"fails in compression of a special dataset with very redundant data, on which some others fail as well (e.g. BBB by Matt Mahoney and also Florin Ghido's QLFC had problems)."

"means in this case that it has been working for over 24 hours on that dataset in the sorting stage and didn't come to encoding of that certain block"

If it is an unsolvable problem for the program, maybe it would be better to leave BCM and do further improvements on BALZ / TC.

Or why not try to improve PIMPLE 2.0?

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
Quote Originally Posted by joerg View Post
but I don't know: is there a big unsolved problem in this program?
It may be fixed by adding an LZP preprocessor or improving the sorting algorithm.

Quote Originally Posted by joerg View Post
Or why not try to improve PIMPLE 2.0?
Improving PIMPLE means a complete rewrite - i.e. a brand new CM coder. PIMPLE is too slow. Since then, I have explored many new CM-related things: a new arithmetic encoder, new and efficient fast counters, fast mixers, etc.
Having said that, the new CM may be as fast as the current BCM at decompression while being MUCH stronger at compression. I noticed that the LZ/BWT stuff seriously limits CM in compression.

  7. #7
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts

a fast open-source context mixing compressor

I believe that many people are waiting for an open-source context mixing compressor (library) with the compression ratio of CCM/CMM/LPAQ and the speed of CCM. It could be done by creating a new compressor or by improving the speed of LPAQ.
This compressor should take the place of the widely used PPMd (e.g. WinRAR, WinZip, XMLPPM, SCMPPM, XBzip).

Przemyslaw

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
Quote Originally Posted by inikep View Post
I believe that many people are waiting for an open-source context mixing compressor (library) with the compression ratio of CCM/CMM/LPAQ and the speed of CCM. It could be done by creating a new compressor or by improving the speed of LPAQ.
This compressor should take the place of the widely used PPMd (e.g. WinRAR, WinZip, XMLPPM, SCMPPM, XBzip).

Przemyslaw
Yep, PPMd is quite old. These days we have many new things/inventions which may be successfully applied to a modern CM/PPM compressor. A modern, open-source, improved and optimized PPMd should be totally off the charts...

  9. #9
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
Quote Originally Posted by inikep View Post
I believe that many people are waiting for an open-source context mixing compressor (library) with the compression ratio of CCM/CMM/LPAQ and the speed of CCM
I noticed that CCM is not a pure CM compressor. It has a switch which turns on an LZ layer for extra speed. I think its speed comes from there; otherwise we could not reach such speed (1-3 MB/s on my laptop) with pure CM. toffer has written a pure CM which achieves ~1 MB/s, and he has done all the speed optimization that can be done. Both of them have an automatic switch to turn submodels on/off to gain speed. I think we could make a CM compressor which has an LZ submodel. I know a match model already does something similar, but I mean a totally different thing.

  10. #10
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
What do you mean by "brand new CM/LZ"? Writing new compressors? I'm mostly interested in statistical compression.

@osmanturan: there are more things which can be done, especially with higher orders.

  11. #11
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
Quote Originally Posted by toffer View Post
there are more things which can be done, especially with higher orders.
AFAIK, CCM has order 0-4 and order-6 contexts, a match model and some sparse models. It also has well-designed filters: an x86 and a delta transform. Roughly, the difference between CCM and CMM is the sparse models and the delta filter; both of them have the x86 transform. So, roughly, everything is the same.

Did you notice that CCM and CMM have the same speed on some files? These kinds of files do not compress well with dictionary-based methods. CCM is mainly optimized for common files; when compressing a common file it achieves about ~3 MB/s. I think this tells us it has an LZ layer.

  12. #12
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
I meant that when there are only a few symbols under a high-order context, you can use more space-efficient memory structures, e.g. store some tree-like structure per byte to avoid hashing (see the sketch after this post).

Are you sure about the CCM details? Christian sometimes said a bit about it, but never that much. High speeds on redundant data don't necessarily involve LZ; it could be the effect of increased cache efficiency (only a few contexts, compared to e.g. precompressed data).

What is the difference between an LZ submodel and a match model, in your terminology? Do you mean something like a built-in LZ preprocessor?

I've got some ideas to implement these things. You can also implement a "match model" without an external LZ buffer, e.g. by storing some chains which contain additional hashes for collision detection.

Without a match model CMM is faster. With better data structures for higher orders, it will be faster too...
On data where CCM can't use its filters, CMM outperforms it in most cases (but not that drastically). Christian did a really good job.
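
To make the higher-order idea above concrete: a minimal, hypothetical sketch (not CCM or CMM code; the slot count, counter width and the full-node policy are assumptions) of a per-context structure that stores the few symbols actually seen instead of hashing them.

Code:
    // Hypothetical per-context node for high orders: a handful of direct
    // slots instead of a hash table. High-order contexts rarely see more
    // than a few distinct next symbols, so a tiny array is smaller and
    // more cache-friendly.
    #include <cstdint>
    #include <cstdio>

    struct ContextNode {
        static const int kMaxSlots = 4;   // assumed; tune per order
        uint8_t sym[kMaxSlots];           // symbols seen under this context
        uint8_t cnt[kMaxSlots];           // bounded frequency counters
        uint8_t used = 0;

        int find(uint8_t s) const {
            for (int i = 0; i < used; ++i)
                if (sym[i] == s) return i;
            return -1;
        }
        void update(uint8_t s) {
            int i = find(s);
            if (i < 0) {
                if (used == kMaxSlots) return;  // full: a real coder would evict or escape
                i = used++;
                sym[i] = s;
                cnt[i] = 0;
            }
            if (cnt[i] < 255) ++cnt[i];
        }
    };

    int main() {
        ContextNode n;
        n.update('a'); n.update('a'); n.update('b');
        std::printf("count('a') = %d\n", n.cnt[n.find('a')]);  // prints 2
        return 0;
    }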

  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
Quote Originally Posted by toffer View Post
What do you mean by "brand new CM/LZ"? Writing new compressors?
Yep, from scratch...

  14. #14
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
I would suggest you do this, with CM of course. But something non-PAQish.

  15. #15
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
Quote Originally Posted by toffer
I meant that when there are only a few symbols under a high-order context, you can use more space-efficient memory structures, e.g. store some tree-like structure per byte to avoid hashing.
Yes, I know. It surely helps.

Quote Originally Posted by toffer
Are you sure about the CCM details? Christian sometimes said a bit about it, but never that much. High speeds on redundant data don't necessarily involve LZ; it could be the effect of increased cache efficiency (only a few contexts, compared to e.g. precompressed data).
Yes, I'm sure, because I have collected this information by reading all of his posts on this forum. Note that in the beginning of CCM he talked about an "LZ layer" a couple of times.

Quote Originally Posted by toffer
What is the difference between an LZ submodel and a match model, in your terminology? Do you mean something like a built-in LZ preprocessor?
We normally use a match model to predict the next bit. I would describe an LZ submodel like this: under certain conditions (such as a long match or a small LZ offset), totally switch off the other models. This kind of compressor becomes an LZ literal coder. But there is a difference for me: we only use the LZ layer under certain conditions. Also, I think collecting statistics from the matched phrase can benefit the CM side too. By doing this, we don't break the predictions with LZ (a sketch follows this post).

Quote Originally Posted by toffer
I've got some ideas to implement these things. You can also implement a "match model" without an external LZ buffer, e.g. by storing some chains which contain additional hashes for collision detection.
I'm sure that you always have good ideas. I want to see a running version of them.

Quote Originally Posted by toffer
Without a match model CMM is faster. With better data structures for higher orders, it will be faster too...
Surely a match model slows down a CM coder.

Quote Originally Posted by toffer
On data where CCM can't use its filters, CMM outperforms it in most cases (but not that drastically). Christian did a really good job.
Yes, I meant that before. It's well designed for common files. He always said that his filters are only a few lines, but the detection algorithm is much more.

Edit: corrected spelling mistakes...
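
A hedged sketch of the gating osmanturan describes (hypothetical; none of this is taken from CCM, CMM or any coder in this thread, and the two thresholds are assumptions): while a long, close match is active, the mixer is bypassed and each bit is coded against the match prediction with a near-deterministic probability.

Code:
    // Hypothetical "LZ submodel" gate: in a long, close match, skip the
    // expensive model mixing and trust the match almost completely.
    #include <cstdint>
    #include <cstdio>

    struct MatchState {
        int      match_len = 0;   // verified match length so far, in bytes
        uint32_t offset    = 0;   // distance back to the matched phrase
    };

    // Both probabilities use a 12-bit scale (0..4095).
    int predict_bit(const MatchState& m, int predicted_bit, int mixed_p) {
        const int      kMinLen    = 32;        // "long match" (assumed)
        const uint32_t kMaxOffset = 1u << 16;  // "small offset" (assumed)
        if (m.match_len >= kMinLen && m.offset < kMaxOffset) {
            // LZ literal-coder mode: other models are switched off entirely.
            return predicted_bit ? 4050 : 45;
        }
        return mixed_p;  // normal CM path: use the full mixer's output
    }

    int main() {
        MatchState m;
        m.match_len = 48; m.offset = 1024;
        std::printf("%d\n", predict_bit(m, 1, 2048));  // prints 4050
        return 0;
    }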

  16. #16
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
Quote Originally Posted by toffer View Post
Are you sure about the CCM details? Christian sometimes said a bit about it, but never that much.
On the old forum I found this (in the "CMM fast context mixing compressor" thread):
Quote Originally Posted by Christian
If I recall correctly, CCM mixes orders 1-6. It does nibble-based hashing with simple collision handling. Additionally, it has a match model which is quite similar to that of lpaq1. Besides, it does have some sparse models and models some other stuff implicitly. Further on, you can implement a mechanism to turn models on/off on the fly to improve speed. You should definitely add some probability mapping and dynamic model mixing, since plain "naked" models and fixed weights gently tend to leave a lot of bits behind. Some simple data filters can help, too. As a note, if you want to go for speed you'll have to decide against compression ratio almost always.
Quote Originally Posted by toffer View Post
Without a match model CMM is faster. With better data structures for higher orders, it will be faster too...
On data where CCM can't use its filters, CMM outperforms it in most cases (but not that drastically). Christian did a really good job.
Can you give us some results (time and ratio)? What is the decrease in compression ratio at CMM speed?
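
For readers unfamiliar with the "probability mapping" Christian recommends: a minimal SSE/APM sketch (not lpaq's actual code; real implementations interpolate between table points, which is omitted here for brevity). A model's probability is quantized, looked up in a small adaptive table indexed by a context, and the table's output is what gets coded.

Code:
    // Minimal APM/SSE sketch: refine a probability through a small
    // adaptive table indexed by (quantized probability, context).
    #include <cstdint>
    #include <cstdio>

    struct APM {
        static const int kCtx = 256;
        uint16_t t[kCtx][33];     // 33 quantization points, 12-bit scale
        int last_c = 0, last_i = 0;

        APM() {
            for (int c = 0; c < kCtx; ++c)
                for (int i = 0; i <= 32; ++i)
                    t[c][i] = (uint16_t)(i * 4095 / 32);   // start as identity
        }
        int refine(int p, int c) {            // p in 0..4095
            last_c = c;
            last_i = (p * 32 + 2048) / 4096;  // nearest quantization point
            return t[last_c][last_i];
        }
        void update(int bit) {                // call after coding the bit
            int cur = t[last_c][last_i];
            int target = bit ? 4095 : 0;
            t[last_c][last_i] = (uint16_t)(cur + ((target - cur) >> 5));
        }
    };

    int main() {
        APM a;
        int p = a.refine(1000, 37);
        a.update(1);                                       // bit turned out to be 1
        std::printf("%d -> %d\n", p, a.refine(1000, 37));  // second value is higher
        return 0;
    }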

  17. #17
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
Quote Originally Posted by osmanturan View Post
We normally use a match model to predict the next bit. I would describe an LZ submodel like this: under certain conditions (such as a long match or a small LZ offset), totally switch off the other models. This kind of compressor becomes an LZ literal coder. But there is a difference for me: we only use the LZ layer under certain conditions. Also, I think collecting statistics from the matched phrase can benefit the CM side too. By doing this, we don't break the predictions with LZ.
In the last days I've tried to join LZP and LZ77 with CM:

1. I joined flzp with lpaq8 (match lengths encoded using an order-2 arithmetic coder). I achieved about 10% faster compression, but 10% bigger files (the results are moderate, as they depend on the minimal match length).

2. I also joined quicklz with lpaq8. The compression ratio was too bad.

  18. #18
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
Quote Originally Posted by inikep View Post
In the last days I've tried to join LZP and LZ77 with CM:

1. I joined flzp with lpaq8 (match lengths encoded using an order-2 arithmetic coder). I achieved about 10% faster compression, but 10% bigger files (the results are moderate, as they depend on the minimal match length).

2. I also joined quicklz with lpaq8. The compression ratio was too bad.
This kind of sequential method gains (mostly) nothing. You cannot reach any good level by doing this. I think you forgot the LZ match offset/length pairs.

  19. #19
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
Quote Originally Posted by osmanturan View Post
This kind of sequential method gains (mostly) nothing. You cannot reach any good level by doing this. I think you forgot the LZ match offset/length pairs.
You can get very good results encoding LZ77 match offset/length pairs, as RZM does. Of course, with very sophisticated (and slow) parsing.

My idea was to encode long matches (16, 32, ..., and longer) with LZ77. Literals were encoded using lpaq.
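
One possible reading of the long-match scheme, as a hypothetical sketch (inikep doesn't give exact thresholds; the bucket layout here is an assumption): match lengths are grouped into power-of-two buckets starting at 16, and anything shorter stays literal so lpaq's contexts stay undisturbed.

Code:
    // Hypothetical length bucketing for "long matches only" LZ77:
    // 16..31 -> bucket 0, 32..63 -> bucket 1, 64..127 -> bucket 2, ...
    // Shorter matches are rejected and coded as literals by the CM stage.
    #include <cstdio>

    int length_bucket(int len) {
        if (len < 16) return -1;   // too short: leave it to the literal coder
        int b = 0;
        while ((32 << b) <= len) ++b;
        return b;
    }

    int main() {
        std::printf("%d %d %d\n",
                    length_bucket(10), length_bucket(16), length_bucket(100));
        // prints: -1 0 2
        return 0;
    }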

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
Quote Originally Posted by inikep View Post
You can get very good results encoding LZ77 match offset/length pairs, as RZM does. Of course, with very sophisticated (and slow) parsing.

My idea was to encode long matches (16, 32, ..., and longer) with LZ77. Literals were encoded using lpaq.
Better to use a nicely implemented bytewise LZP.

  21. #21
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Code:
    cm@e051 ~/shared/projects/milestones-cmm/devel/cmm4_01f_080527/bin/Release $ time ./run_cmm4 .cmm4_43 43
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 828527/842468 bytes (7.87 bpc)
      Speed: 192 kB/s (5068.4 ns/byte)
      Time: 4.27 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
      Applying x86 transform.
    Encoding: done.
      Ratio: 1184968/3870784 bytes (2.45 bpc)
      Speed: 235 kB/s (4149.0 ns/byte)
      Time: 16.06 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 451105/4067439 bytes (0.89 bpc)
      Speed: 276 kB/s (3525.6 ns/byte)
      Time: 14.34 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 3650013/4526946 bytes (6.45 bpc)
      Speed: 206 kB/s (4738.3 ns/byte)
      Time: 21.45 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 426899/20617071 bytes (0.17 bpc)
      Speed: 293 kB/s (3327.8 ns/byte)
      Time: 68.61 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
      Applying x86 transform.
    Encoding: done.
      Ratio: 1596934/3782416 bytes (3.38 bpc)
      Speed: 219 kB/s (4444.2 ns/byte)
      Time: 16.81 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 748289/4168192 bytes (1.44 bpc)
      Speed: 268 kB/s (3632.3 ns/byte)
      Time: 15.14 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 742157/4149414 bytes (1.43 bpc)
      Speed: 270 kB/s (3612.6 ns/byte)
      Time: 14.99 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 511397/4121418 bytes (0.99 bpc)
      Speed: 269 kB/s (3622.5 ns/byte)
      Time: 14.93 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 455580/2988578 bytes (1.22 bpc)
      Speed: 262 kB/s (3717.5 ns/byte)
      Time: 11.11 s
    
    real    3m21.906s
    user    3m19.324s
    sys     0m1.596s
    cm@e051 ~/shared/projects/milestones-cmm/devel/cmm4_01f_080527/bin/Release $ ls -l /tmp/*.cmm4_43
    -rw-r--r-- 1 cm cm  828527 2008-07-01 13:11 /tmp/A10.jpg.cmm4_43
    -rw-r--r-- 1 cm cm 1184968 2008-07-01 13:11 /tmp/AcroRd32.exe.cmm4_43
    -rw-r--r-- 1 cm cm  451105 2008-07-01 13:11 /tmp/english.dic.cmm4_43
    -rw-r--r-- 1 cm cm 3650013 2008-07-01 13:11 /tmp/FlashMX.pdf.cmm4_43
    -rw-r--r-- 1 cm cm  426899 2008-07-01 13:13 /tmp/FP.LOG.cmm4_43
    -rw-r--r-- 1 cm cm 1596934 2008-07-01 13:13 /tmp/MSO97.DLL.cmm4_43
    -rw-r--r-- 1 cm cm  748289 2008-07-01 13:13 /tmp/ohs.doc.cmm4_43
    -rw-r--r-- 1 cm cm  742157 2008-07-01 13:13 /tmp/rafale.bmp.cmm4_43
    -rw-r--r-- 1 cm cm  511397 2008-07-01 13:14 /tmp/vcfiu.hlp.cmm4_43
    -rw-r--r-- 1 cm cm  455580 2008-07-01 13:14 /tmp/world95.txt.cmm4_43
    cm@e051 ~/shared/projects/milestones-cmm/devel/cmm4_01f_080527/bin/Release $ time ./run_cmm4 .cmm4_43_no_mm 43
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 828442/842468 bytes (7.87 bpc)
      Speed: 199 kB/s (4902.3 ns/byte)
      Time: 4.13 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
      Applying x86 transform.
    Encoding: done.
      Ratio: 1207298/3870784 bytes (2.50 bpc)
      Speed: 253 kB/s (3849.3 ns/byte)
      Time: 14.90 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 463227/4067439 bytes (0.91 bpc)
      Speed: 308 kB/s (3169.1 ns/byte)
      Time: 12.89 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 3660702/4526946 bytes (6.47 bpc)
      Speed: 214 kB/s (4546.1 ns/byte)
      Time: 20.58 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 512532/20617071 bytes (0.20 bpc)
      Speed: 315 kB/s (3095.5 ns/byte)
      Time: 63.82 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
      Applying x86 transform.
    Encoding: done.
      Ratio: 1618920/3782416 bytes (3.42 bpc)
      Speed: 235 kB/s (4153.4 ns/byte)
      Time: 15.71 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 783433/4168192 bytes (1.50 bpc)
      Speed: 283 kB/s (3447.5 ns/byte)
      Time: 14.37 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 742851/4149414 bytes (1.43 bpc)
      Speed: 301 kB/s (3239.0 ns/byte)
      Time: 13.44 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 576707/4121418 bytes (1.12 bpc)
      Speed: 295 kB/s (3307.1 ns/byte)
      Time: 13.63 s
    CMM4 v0.1f by C. Mattern  Jul  1 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 118158 kB.
    Encoding: done.
      Ratio: 492872/2988578 bytes (1.32 bpc)
      Speed: 285 kB/s (3419.7 ns/byte)
      Time: 10.22 s
    
    real    3m7.444s
    user    3m5.196s
    sys     0m1.728s
    cm@e051 ~/shared/projects/milestones-cmm/devel/cmm4_01f_080527/bin/Release $ ls -l /tmp/*.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  828442 2008-07-01 13:18 /tmp/A10.jpg.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm 1207298 2008-07-01 13:18 /tmp/AcroRd32.exe.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  463227 2008-07-01 13:19 /tmp/english.dic.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm 3660702 2008-07-01 13:19 /tmp/FlashMX.pdf.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  512532 2008-07-01 13:20 /tmp/FP.LOG.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm 1618920 2008-07-01 13:20 /tmp/MSO97.DLL.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  783433 2008-07-01 13:20 /tmp/ohs.doc.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  742851 2008-07-01 13:21 /tmp/rafale.bmp.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  576707 2008-07-01 13:21 /tmp/vcfiu.hlp.cmm4_43_no_mm
    -rw-r--r-- 1 cm cm  492872 2008-07-01 13:21 /tmp/world95.txt.cmm4_43_no_mm
    10.595.869 bytes vs 10.886.984 bytes

    real 3m21.906s
    user 3m19.324s
    sys 0m1.596s

    vs

    real 3m7.444s
    user 3m5.196s
    sys 0m1.728s

Note that CMM would be faster; I only took out the match finding (the most time-consuming part), since other things would require more work (I joined some things in a tricky way).

  22. #22
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
Bytewise LZP is bad here, since the length codes interfere with the original alphabet. That means your context is destroyed right after a match. It would be better to use string substitution (a byte maps to a unique, frequently appearing string). The mapping could adapt slowly. That way you virtually increase the context while lowering the file size (a sketch follows below).

BTW: cmm1 is a CM/LZP hybrid.
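
toffer's substitution idea, as a hypothetical illustration (no coder in this thread works exactly this way; the table contents are made up and escaping is omitted): spare byte values stand for frequent strings, so each replacement emits a byte from the normal alphabet and the surrounding context stays meaningful, unlike LZP length codes.

Code:
    // Hypothetical adaptive string substitution: a spare code byte stands
    // for a frequent string. Simplified: the table is static here and the
    // code bytes are assumed never to occur in the raw input; a real
    // scheme would escape them and slowly adapt the table.
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <unordered_map>

    struct Substituter {
        std::unordered_map<uint8_t, std::string> table;  // code byte -> string

        std::string encode(const std::string& in) const {
            std::string out;
            size_t i = 0;
            while (i < in.size()) {
                bool hit = false;
                for (const auto& kv : table) {
                    const std::string& s = kv.second;
                    if (in.compare(i, s.size(), s) == 0) {
                        out.push_back((char)kv.first);   // one byte replaces s
                        i += s.size();
                        hit = true;
                        break;
                    }
                }
                if (!hit) out.push_back(in[i++]);
            }
            return out;
        }
    };

    int main() {
        Substituter sub;
        sub.table[0xF8] = "the ";                      // made-up mapping
        std::string enc = sub.encode("the cat and the dog");
        std::printf("19 -> %zu bytes\n", enc.size());  // prints 19 -> 13
        return 0;
    }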

  23. #23
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
Tips:

Is it possible to create an asymmetric CM? Like 192 MB for compression and 32 MB for decompression?

Only non-symmetric compressors have a chance in practical compression.

1536 MB for compression and 1500 MB for decompression is not a good idea.

Use CM carefully.

Does the compressor also filter non-redundant data?

Compression time is not so important. Decompression time is very important.

  24. #24
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
> Is it possible to create an asymmetric CM?
> Like 192 MB for compression and 32 MB for decompression?
Yes, it's possible to make a compressor which has different speeds for its compression and decompression stages. But I'm not sure about memory usage.

> Only non-symmetric compressors have a chance in practical compression.
I don't think so. Total time is what matters to me: compressing + transmitting + decompressing. Some asymmetric compressors have very time-consuming compression algorithms.

> 1536 MB for compression and 1500 MB for decompression is not a good idea.
Memory surely helps statistical compressors, i.e. PPM and CM. But a well-designed statistical compressor is generally sufficient under 512 MB of memory. Note that most people have at least 256 MB of RAM. BTW, why does nobody use file-mapped memory as a fallback? Windows supports it natively (see the sketch after this post).

> Use CM carefully.
Surely

> Does the compressor also filter non-redundant data?
I think a universal compressor should take care of different file structures rather than rely on generic filter algorithms.

> Compression time is not so important.
> Decompression time is very important.
You are surely a lover of LZ-based compressors
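
On the file-mapped fallback: a minimal Win32 sketch of what that could look like (an assumption about one possible wiring, not code from any compressor here). The model's tables are backed by a paging-file mapping instead of the heap, so the OS pages statistics in and out when physical RAM runs short.

Code:
    // Win32-only sketch: allocate model tables from a pagefile-backed
    // file mapping instead of the heap. Error handling kept minimal.
    #include <windows.h>
    #include <cstddef>
    #include <cstdio>

    void* alloc_mapped(size_t bytes, HANDLE* out_mapping) {
        HANDLE m = CreateFileMappingW(
            INVALID_HANDLE_VALUE,       // backed by the system paging file
            nullptr, PAGE_READWRITE,
            (DWORD)((unsigned long long)bytes >> 32),  // size, high part
            (DWORD)(bytes & 0xFFFFFFFFu),              // size, low part
            nullptr);
        if (!m) return nullptr;
        void* p = MapViewOfFile(m, FILE_MAP_ALL_ACCESS, 0, 0, bytes);
        if (!p) { CloseHandle(m); return nullptr; }
        *out_mapping = m;
        return p;   // use like malloc'd memory
    }

    int main() {
        HANDLE mapping = nullptr;
        void* tables = alloc_mapped(64u << 20, &mapping);  // 64 MB of tables
        if (!tables) return 1;
        std::printf("mapped 64 MB at %p\n", tables);
        UnmapViewOfFile(tables);
        CloseHandle(mapping);
        return 0;
    }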

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
Quote Originally Posted by lunaris View Post
Only non-symmetric compressors have a chance in practical compression.
Not really. If a compressor is *FAST* at compression, then the same decompression speed is OK!

  26. #26
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
I already presented a basic approach to asymmetric CM... Search this forum for M01a.

  27. #27
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
> Is it possible to create an asymmetric CM?
> Like 192 MB for compression and 32 MB for decompression?
> Yes, it's possible to make a compressor which has different speeds for its compression and decompression stages. But I'm not sure about memory usage.

I don't know M01 from toffer, but he explains some ideas about asymmetric CM. It's GPL v3 source code.

> Only non-symmetric compressors have a chance in practical compression.
> I don't think so. Total time is what matters to me: compressing + transmitting + decompressing. Some asymmetric compressors have very time-consuming compression algorithms.

Yes, it's important, but most projects which use compression (like package distribution) do not like aggressive compressors.

Even PPM and DMC are not adopted in actual projects.
The main problem?
Symmetric algorithms that are not so optimized.
Encode.ru is a very rare forum where people develop such algorithms.

> 1536 MB for compression and 1500 MB for decompression is not a good idea.
> Memory surely helps statistical compressors, i.e. PPM and CM. But a well-designed statistical compressor is generally sufficient under 512 MB of memory. Note that most people have at least 256 MB of RAM. BTW, why does nobody use file-mapped memory as a fallback? Windows supports it natively.

But 512 MB is very high and very limiting. The compressor must offer a lot of options, like FreeArc: symmetric and asymmetric, with an option for memory usage.

It is possible to use BCM to do this.

> Use CM carefully.
> Surely

> Does the compressor also filter non-redundant data?
> I think a universal compressor should take care of different file structures rather than rely on generic filter algorithms.

That is very hard to do and requires a lot of time. Sometimes it is more practical to use some filters.

> Compression time is not so important.
> Decompression time is very important.
> You are surely a lover of LZ-based compressors

No, I'm not; the LZ family has a lot of algorithms. Look at DMC, PPM, CM and SR.

  28. #28
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
> I don't know M01 from toffer, but he explains some ideas about
> asymmetric CM. It's GPL v3 source code.
Yes, it's a good example of asymmetric CM. Also, there is the asymmetric binary coder, which can be used instead of an arithmetic coder. AFAIK, it's currently a bit slower than an actual arithmetic coder. Still in development...

> Yes, it's important, but most projects which use compression (like
> package distribution) do not like aggressive compressors.
My laptop is fast enough (Core2 Duo 2.2 GHz, 2 GB RAM), but its raw write speed is around 20-25 MB/s. So, practically, I don't need a decompressor which does >20 MB/s. At least ~1 MB/s decompression speed with strong compression is sufficient most of the time for me.

> Even PPM and DMC are not adopted in actual projects.
Did you know that WinRAR uses PPM for text-based files? Also, WinZip 11 has an option for PPM in ZIP archives.

> The main problem?
Lazy developers

> Symmetric algorithms that are not so optimized.
Symmetric statistical algorithms are easier than asymmetric statistical algorithms. That's the reason why we see lots of symmetric compressors.

> Encode.ru is a very rare forum where people develop such algorithms.
Think about the WinRAR and WinZip 11 PPM usage again

> But 512 MB is very high and very limiting. The compressor must offer a lot
> of options, like FreeArc: symmetric and asymmetric, with an option for
> memory usage.
If I can extract a compressed file under 512 MB of RAM usage, there is no problem for me. If I had 256 MB, I would accept slow decompression due to file-mapped memory at that level. For me, 256 MB of RAM is very insufficient for users on WinXP, and I haven't even mentioned Vista, which needs approximately 1 GB of RAM.

> It is possible to use BCM to do this.
By BWT's nature, it's a bit hard to improve without improving the sorting algorithm.

> No, I'm not; the LZ family has a lot of algorithms.
> Look at DMC, PPM, CM and SR.
I knew them already. Thanks anyway.

  29. #29
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
Quote Originally Posted by osmanturan View Post
(post #28 quoted in full)
Well, maybe you are right. Actually, there are a lot of asymmetric compressors which people can choose from, like 7-Zip, WinZip, WinRAR and FreeArc.

Maybe encode can focus on high compression, especially for large packages and a lot of mixed data.

The compressor would probably stay restricted to compression lovers, though.

But remember: use CM carefully.

  30. #30
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
Quote Originally Posted by encode View Post
Better to use a nicely implemented bytewise LZP.
Already done. As I wrote several posts above:
I joined flzp with lpaq8 (match lengths encoded using an order-2 arithmetic coder). I achieved about 10% faster compression, but 10% bigger files (the results are moderate, as they depend on the minimal match length).

Quote Originally Posted by toffer
10.595.869 bytes vs 10.886.984 bytes (3% worse)
201.906s vs 187.444s (7% faster)
It's not so good. I've received similar or better results by disabling the 2 APM stages in lpaq.

Quote Originally Posted by encode
If a compressor is *FAST* at compression, then the same decompression speed is OK!
I agree. There is no need to think about asymmetric compression if you can improve the speed of symmetric compression/decompression (with the same compression ratio).

