Page 1 of 2
Results 1 to 30 of 50

Thread: CMM4

  1. #1
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts

    CMM4

    Hi!

    Despite all the other stuff going on, i tried to improve CMM4, and it stubbornly refuses to improve compression (i tried almost every idea on my todo list...) - at least i could improve it a bit, along with some speed optimizations. Memory requirements dropped a bit due to better SSE quantization. I compiled it using gcc-4.3.0 alpha (mingw): "May produce incorrect code". I verified (de)compression on my testset, but it's experimental anyway.

    http://freenet-homepage.de/toffer_86...0527-gcc4.3.7z

    Have fun!

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Thank you!

    Do you plan to release it as an open source project?

  3. #3
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts

    Hi Toffer

    I have tested your M01 along with Ilia's FCM2P3 and it scores in the top 20 of deflate-class compressors (near LHARK 0.4d). I'll put the results online later today. The newest CMM4 is on my ToDo-List. Is that avatar an image of you which I may use for my Hall of fame (along with the birthday)?

    Yours,

    Stephan

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thanks Chris!

    Mirror: Download

  5. #5
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Oh, sorry Stephan. I completely forgot it - of course you can use the picture & birthday.

  6. #6
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    hm.. there's going to be a small delay.. I want to include the newest CMM4 0.1f in the update and release the results tomorrow..

  7. #7
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    Thank you!

    Do you plan to release it as an open source project?
    Not until i think it's complete. During my next vacation i'll investigate context merging via state machines, and my aim is to generate them from training data. Once i'm successful, there won't be much more room for improvement (at least with the current architecture).

  8. #8
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    Quote Originally Posted by toffer View Post
    Not until i think it's complete. During my next vacation i'll investigate context merging via state machines, and my aim is to generate them from training data. Once i'm successful, there won't be much more room for improvement (at least with the current architecture).
    Why wait? I think that you should release the sources now. Some people can help you improve speed or compression ratio. It can speed up development of CMM. I believe that PAQ became the most powerful compressor because it was open source. Moreover, many ideas in CMM or CCM are taken from PAQ.

  9. #9
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    @inikep
    That is true.

    But for now, it's still my "little fun project" - once everything is cleaned up and works as i want, i'll release it.

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts

  11. #11
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks!

  12. #12
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi!

    I spent a little bit of free time on trying some of the ideas from my TODO-list. Here's a slightly improved version, it's faster and offers better compression on X86 data (it seeds context hashes differently).

    http://freenet-homepage.de/toffer_86/cmm4_02_080710.7z

    Looking forward to comments & tests, enjoy!

  13. #13
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thanks Chris!

    Mirror: Download

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    This benchmark http://ctxmodel.net/files/MIX/mix_v2.htm
    now includes the recent CMM4 results.
    Seems like it got slower on my machine

  15. #15
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Shelwien View Post
    This benchmark http://ctxmodel.net/files/MIX/mix_v2.htm
    now includes the recent CMM4 results.
    Seems like it got slower on my machine
    It's a bit slower on my system too (SFC files in SFC.QFC):
    (C2Q E6600 / 4gb RAM / Vista Prem.)

    CMM4 v0.1f :
    Code:
    G:\test\SFC>timer cmm41 67 sfc.qfc sfcqfc_cmm4_01f_67.cmm41
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    CMM4 v0.1f by C. Mattern  Jun  4 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 1640206 kB.
    Encoding: done.
      Ratio: 10807941/53135003 bytes (1.63 bpc)
      Speed: 824 kB/s (1184.2 ns/byte)
      Time: 62.92 s
    Global Time  =    64.740 = 00:01:04.740 = 100%

    CMM4 v0.2 :
    Code:
    G:\test\SFC>timer cmm4 67 sfc.qfc sfcqfc_cmm4_02_67_nr2.cmm42
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    CMM4 v0.2 by C. Mattern  Jul 10 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 1641934 kB.
    Encoding: done.
      Ratio: 10807938/53135003 bytes (1.63 bpc)
      Speed: 794 kB/s (1229.9 ns/byte)
      Time: 65.35 s
    Global Time  =    67.158 = 00:01:07.158 = 100%
    Last edited by pat357; 10th July 2008 at 19:49.
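The derived figures in the two logs above can be reproduced from the raw byte counts and times. The sketch below assumes that "kB" means 1024 bytes and that the speed is truncated rather than rounded; both are inferences from the printed numbers, not something stated in this thread. The function name is made up for illustration.

```python
# Recompute the derived figures from the raw numbers in the two logs.
# Assumption: "kB" is 1024 bytes and the printed speed is truncated.

def derived_stats(compressed_bytes, original_bytes, seconds):
    bpc = 8 * compressed_bytes / original_bytes      # bits per input byte
    kb_per_s = original_bytes / 1024 / seconds       # throughput in kB/s
    ns_per_byte = seconds * 1e9 / original_bytes     # time per input byte
    return bpc, kb_per_s, ns_per_byte

# (compressed bytes, original bytes, encoding time in seconds) from the logs
runs = {
    "CMM4 v0.1f": (10807941, 53135003, 62.92),
    "CMM4 v0.2": (10807938, 53135003, 65.35),
}

for name, args in runs.items():
    bpc, kb_per_s, ns_per_byte = derived_stats(*args)
    print(f"{name}: {bpc:.2f} bpc, {int(kb_per_s)} kB/s, {ns_per_byte:.1f} ns/byte")
```

Read this way, v0.2 saves 3 bytes on the tarred set while taking just under 4% more encoding time (65.35 s against 62.92 s).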

  16. #16
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You should try single file compression, which gives a significant gain (and you can see what the improved models do). The previous version was profiled against enwik. This version was profiled using (de)compression of the whole SFC. That is strange. Could you please compare the results (time & speed) without tarring everything together? Since i'm not doing any segmentation/data analysis, tarring bypasses the improved exe modelling and x86 filter!

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Here's SFC for you: http://ctxmodel.net/files/MIX/mix_v2_SFC.htm
    Of course, files are compressed separately in my tests.

  18. #18
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I'll try to use a profile from enwik, as before. This seems to be strange...

    -> Thanks to all of you for testing
    Last edited by toffer; 10th July 2008 at 21:15.

  19. #19
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    in my test cmm4 compresses my testfile very well

    cmm4 does not compress as well as paq8
    but paq8 needs 24 h and cmm4 needs only 16 min

    if memory use is increased, the compression ratio improves a little bit
    and the compression time increases a little bit

    cmm4 43 17.961.415 bytes
    cmm4 54 17.807.663 bytes
    cmm4 65 17.593.042 bytes
    cmm4 76 17.519.224 bytes
    cmm4 87 17.494.956 bytes

    but for me it is no big difference


    CMM4 v0.2 by C. Mattern Jul 10 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.

    cmm4 43

    Allocated 118222 kB.
    Ratio: 17961415/648331264 bytes (0.22 bpc)
    Speed: 663 kB/s (1471.5 ns/byte)
    Time: 954.04 s

    cmm4 54

    Allocated 232910 kB.
    Ratio: 17807663/648331264 bytes (0.22 bpc)
    Speed: 656 kB/s (1488.3 ns/byte)
    Time: 964.89 s

    cmm4 65

    Allocated 462286 kB.
    Ratio: 17593042/648331264 bytes (0.22 bpc)
    Speed: 649 kB/s (1502.4 ns/byte)
    Time: 974.08 s

    cmm4 76

    Allocated 921038 kB.
    Ratio: 17519224/648331264 bytes (0.22 bpc)
    Speed: 642 kB/s (1520.1 ns/byte)
    Time: 985.56 s

    cmm4 87

    Allocated 1838542 kB.
    Ratio: 17494956/648331264 bytes (0.22 bpc)
    Speed: 637 kB/s (1531.4 ns/byte)
    Time: 992.88 s

    good work !

    but because it does not support compressing a whole directory
    it will be difficult to run further comparison tests
    if i compress an iso file which contains several files the result is not good

    maybe a future version of the compressor
    could treat (compress) each file within the iso file separately?

    or
    what do you propose for testing "compress multiple files within a directory"?
    Last edited by joerg; 11th July 2008 at 11:46.
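To put "no big difference" into numbers, the five runs above can be compared against the smallest setting. This is only arithmetic on the figures already posted; the function name is made up for illustration.

```python
# (setting, allocated kB, compressed bytes) copied from the five runs above
RUNS = [
    ("43", 118222, 17961415),
    ("54", 232910, 17807663),
    ("65", 462286, 17593042),
    ("76", 921038, 17519224),
    ("87", 1838542, 17494956),
]

def tradeoff(runs):
    """Return (setting, memory factor, size saving in percent) relative to the first run."""
    _, base_mem, base_size = runs[0]
    return [
        (setting, mem / base_mem, 100.0 * (base_size - size) / base_size)
        for setting, mem, size in runs
    ]

for setting, mem_factor, saving in tradeoff(RUNS):
    print(f"cmm4 {setting}: {mem_factor:5.1f}x memory, {saving:.2f}% smaller output")
```

Each step roughly doubles the allocated memory, while the whole range from 43 to 87 shrinks the output by under 3%.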

  20. #20
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for testing.

    I don't recommend using it for archiving purposes, since different versions are usually incompatible. And i don't know if it contains bugs, i haven't encountered any since CMM4 .1b, but you can never be sure.

    At the moment i haven't implemented any archiver functionality; in most cases it is best to compress every individual file on its own.

    The main problem when compressing TARs or some other container which holds several different files is that when you cross a boundary between two files A and B, the model uses statistics from file A to compress file B, and it adapts relatively slowly. That's not a problem for homogeneous files (e.g. if A and B are both text files); there you even benefit. But if A is a text file and B contains PCM data, that's horrible.
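The boundary effect can be shown with a toy model. The sketch below is not CMM4's context-mixing model; it is a plain adaptive order-0 byte-frequency model, chosen only because it is small enough to read. The two inputs are synthetic stand-ins: repeated English text for file A, and evenly distributed byte values for a file B with very different statistics.

```python
import math

class Order0Model:
    """Toy adaptive model: P(byte) = count / total, with all counts starting at 1."""

    def __init__(self):
        self.counts = [1] * 256
        self.total = 256

    def cost_and_update(self, byte):
        bits = -math.log2(self.counts[byte] / self.total)  # ideal code length
        self.counts[byte] += 1
        self.total += 1
        return bits

def coded_bits(model, data):
    """Total ideal code length of data, updating the model as it goes."""
    return sum(model.cost_and_update(b) for b in data)

file_a = b"the quick brown fox jumps over the lazy dog. " * 200  # text-like
file_b = bytes(range(256)) * 20                                   # non-text stand-in

carried = Order0Model()
coded_bits(carried, file_a)                 # the model first learns file A
bits_carried = coded_bits(carried, file_b)  # then codes B with A's statistics

bits_fresh = coded_bits(Order0Model(), file_b)  # B coded with a fresh model

print(f"B after A:       {bits_carried / len(file_b):.2f} bits/byte")
print(f"B, fresh model:  {bits_fresh / len(file_b):.2f} bits/byte")
```

In this toy setting the carried-over statistics make file B more expensive to code than starting from scratch. If file B is replaced with more text, the carried model wins instead, which matches the homogeneous case described above.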

    During the next few weeks i'll have more time to work on it. I might include some mechanisms for either archiving and quick file detection or segmentation.

    Against what file types are you testing?

  21. #21
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    i do not want to use the program for real Backups for now
    (for the real backup-purposes i am using the wonderful 7zip)

    but i want to test it and compare the results

    the big file db.dmp is a dump-file from an oracle database

    first look:
    for this particular file cmm4 needs half the time of 7zip
    and compresses twice as well as 7zip

    because of this i want to test it with a set of 315020 files
    (20 GB within a directory with 10000 subdirectories)

    most of these files are *.doc *.txt *.ini files
    but there are *.exe *.dll *.mdb files and other files too

    it would be wonderful
    if you could complete your compressor
    with some mechanism for archiving (compressing a whole directory)
    or "quick file detection or segmentation" within a single file

    maybe:
    - an iso file (originally an image of a CD) can contain a whole directory tree
    - if we can compress each file within the iso file separately
    and store all compressed files in a new iso file in compressed form
    within the existing directory tree, then we have solved the problem too

    but i don't know whether this would be a practical way to implement an archiver

    thank you very much

    best regards joerg

  22. #22
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The easiest thing is to implement a "file collector", which groups files by file type and applies specific filters to files or tells the main compressor to use special models. To solve the problem of adaptation between different types it is sufficient to simply reset the model. The file list, along with a directory tree etc., could be stored within an archive.

    I think implementing this won't be a big deal, but will lead to massive improvements in benchmarks like maximumcompression, squeezechart, etc.

    If you want to estimate the performance do the following:

    1. make a file list sorted by extension (this isn't perfect)
    2. TAR together similar file types (e.g. txt, ini, html in one tar; exe, dll in another, ...)
    3. compress these TARs separately.
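Steps 1 and 2 above can be scripted. The following is a rough sketch, not part of CMM4: the extension-to-group table and both function names are made up for illustration and should be adapted to the test set. The output directory should lie outside the directory being scanned, otherwise the TARs themselves get picked up.

```python
import os
import tarfile
from collections import defaultdict

# Hypothetical grouping table; extend it to match the file types being tested.
GROUPS = {
    ".txt": "text", ".ini": "text", ".html": "text",
    ".exe": "binary", ".dll": "binary",
}

def group_files(root):
    """Step 1: list all files under root, bucketed by type and sorted by extension."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            groups[GROUPS.get(ext, "other")].append(os.path.join(dirpath, name))
    by_ext = lambda p: (os.path.splitext(p)[1].lower(), p)
    return {group: sorted(paths, key=by_ext) for group, paths in groups.items()}

def write_tars(root, out_dir):
    """Step 2: write one uncompressed TAR per group and return the TAR paths."""
    os.makedirs(out_dir, exist_ok=True)
    tar_paths = []
    for group, paths in sorted(group_files(root).items()):
        tar_path = os.path.join(out_dir, group + ".tar")
        with tarfile.open(tar_path, "w") as tar:
            for path in paths:
                tar.add(path, arcname=os.path.relpath(path, root))
        tar_paths.append(tar_path)
    return tar_paths
```

Step 3 is then one compressor run per TAR, in the same form as the invocations shown earlier in the thread (for example `cmm4 67 text.tar text.cmm4`).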

  23. #23
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi!

    I fixed the issues. It turns out that the windows compile is faster without using any profiling, while under linux things are as they are supposed to be. Strange, isn't it?! I've modified the data structures, which results in slightly worse compression, but it leaves room for future speed improvements, you'll see

    http://freenet-homepage.de/toffer_86/cmm4_02a_080712.7z

  24. #24
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thanks Chris!

    Mirror: Download

  25. #25
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    joerg, we use freearc for this purpose. it's already preconfigured for a lot of compressors. hope someone will share cmm settings with us

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    CMM is good! My tests have shown that in many cases it's even stronger than CCM by Christian Martelock (and MMM by Serge Mavrody). The only issue is its speed... So I think you can afford to trade some compression power for speed.

  27. #27
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    78
    Thanks
    436
    Thanked 22 Times in 17 Posts
    Quote Originally Posted by encode View Post
    (and MMM by Serge Mavrody)

  28. #28
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks!

    CCM beats CMM only when it uses its filters (that's what my experience shows). I don't have any except x86 (which uses a crappy "MZ" detection). As i said some time ago, the core compression engine's performance is equivalent to LPAQ1/2 (not on text) - usually better for very redundant files.
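As an illustration of why a start-of-file "MZ" check misses executables inside a container: DOS and Windows executables begin with the two bytes "MZ", but a TAR begins with the header of its first member. The sketch below assumes the check looks only at the first two bytes of the input, which the thread does not confirm; the helper names and the placeholder payload are made up.

```python
import io
import tarfile

def starts_with_mz(data: bytes) -> bool:
    """True if data begins with the 'MZ' magic of DOS/Windows executables."""
    return data[:2] == b"MZ"

def tar_bytes(name: str, payload: bytes) -> bytes:
    """Build an in-memory TAR holding a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

fake_exe = b"MZ" + bytes(62)               # placeholder payload, not a real executable
archive = tar_bytes("app.exe", fake_exe)

print(starts_with_mz(fake_exe))            # True: the file itself is detected
print(starts_with_mz(archive))             # False: the TAR starts with its own header
```

Under that assumption, compressing files one by one lets the x86 filter trigger, while a TAR of the same files does not.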

    ATM it is somehow between LPAQ and CCM in terms of compression and speed.

    @Ilia
    What's MMM, i've never heard of it? Any links?

    If you tested it, could you post some results? I'm unsure if it's as bug-free as the previous version, since i changed a lot of the source to make room for upcoming speed improvements.


    @Bulat:
    Isn't integrating CMM into FA trivial (just some config changes)? I'm unsure, since i only saw your website. If you could give me a link to the config layout i would do it.

  29. #29
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    78
    Thanks
    436
    Thanked 22 Times in 17 Posts
    MMM

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Some testing results with calgary.tar:

    CCM 1.30c, 7 -> 738,016 bytes
    CCMx 1.30c, 7 -> 725,423 bytes
    CMM4-0.1e, 77 -> 695,057 bytes
    CMM4-0.2a, 77 -> 694,336 bytes

    CMM4 v0.1e was very slow - working at ~10 KB/sec, new version is crazy - compresses even better at ~627 KB/sec!

    Congratulations!
