
Thread: CMV

  1. #1
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts

    CMV

    Hi.
    I am a long time lurker here and this is my first post.
    In the past I wrote programs (never published) based on DMC, LZ and PPM/CTW; then, about two and a half years ago, I started developing cmv to play with CM.
    I wanted to check whether there were improvements in using a counter that mixes one slow and one fast non-stationary counter, where the "best" of the two, as indicated by the mixer, influences the other (speeding up or slowing down the adaptation rate of the "worst"). These are the standard counters used in cmv.
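    A rough sketch of the dual-rate counter idea (illustrative Python only - the names, rates and especially the rate-coupling rule are made up for this post, not cmv's actual code):

```python
class DualRateCounter:
    """Two non-stationary estimates of P(bit=1): one fast, one slow."""

    def __init__(self, fast_rate=0.2, slow_rate=0.02):
        self.p_fast = 0.5
        self.p_slow = 0.5
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate

    def predict(self, w_fast):
        # w_fast in [0, 1] is the mixer's weight for the fast counter.
        return w_fast * self.p_fast + (1.0 - w_fast) * self.p_slow

    def update(self, bit, w_fast):
        self.p_fast += self.fast_rate * (bit - self.p_fast)
        self.p_slow += self.slow_rate * (bit - self.p_slow)
        # The counter the mixer currently trusts ("best") influences the
        # other one's adaptation rate (speeding it up or slowing it down);
        # this particular coupling rule is pure speculation:
        if w_fast > 0.5:
            self.slow_rate = min(self.fast_rate, self.slow_rate * 1.05)
        else:
            self.slow_rate = max(0.001, self.slow_rate * 0.95)
```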
    Cmv is a closed source, badly written, very slow (my main focus is on compression ratio, not speed), CLI Windows program.
    At the moment there is only a 32-bit version of the program; it can use up to 4 GB of RAM on a 64-bit OS, but full options need ~4.1 GB, so in that case I often disable bit 12 of the switches (variable order and memory model) because it seems already modelled quite well by the order-N and match models. In the next days/weeks I will install a new C/C++ compiler compatible with Windows 8.1 to make a 64-bit executable and test with all options enabled.
    Cmv is a single-file compressor, not an archiver; it handles file sizes up to 4 GB - 1, it saves the original and compressed file sizes, and it doesn't save and restore date, time, attributes, etc.
    Cmv has no filters; you can use paq8pxd16 -s0 or -f0, DRT, DC, etc.
    Initialization of the models can take some time.
    The switch shortcuts and operators are the worst characters I could choose, but I like them: enclose the model+switches in "".
    Use cmv only for tests on saved files. Future versions will break backward compatibility.
    "cmv -h" gives short help, "cmv -hv" the long one.
    If you have any questions, feel free to ask.


    Code:
    Some benchmarks:
    -     9.594.090 Maximum Compression corpus
    -    35.570.898 Silesia corpus
    -    18.153.319 ENWIK8
    -   150.226.739 ENWIK9 (-m2,3,+, to do with better options)
    -       613.782 Calgary corpus (14 files, tarball)
    -       331.785 Canterbury corpus (tarball)
    -   840.486.023 Huge Files Compression Benchmark (vm.dll) (-m1,3, to do with better options)
    - 1.286.407.213 Lossless Photo Compression Benchmark (-m1,3, to do with better options)
    -    62.002.677 Testing compressors with artificial data
    -    0.97376289 Generic Compression Benchmark
    More benchmarks are in the file Benchmarks-00.01.00-2015.09.06.zip.
    
    Some files compressed well by cmv compared with the best known:
    - LOG.txt and NUM.txt (Specific case - High redundant data in a pattern)
      23.009 LOG.cmv (-m2,0,0x01ec039f (^0x02000008))
      23.571 LOG.nz (-cO)
         819 NUM.cmv (-m2,0,0x00ec8bd9 (^0x0300884e))
       3.404 NUM.nz (???)
    - SRR062634.filt.fastq (Compression Competition -- $15,000 USD)
      14.958.423 SRR062634.filt.fastq.cmv (-m2,3,0x03ededff (>&b12))
      15.035.056 test0-8.paq8px (paq8px_v69 -8 !?)
      15.165.514 test0.paq8px (paq8px_v69 -5 ?)
    - x-ray (Silesia Open Source Compression Benchmark)
      3.568.??? x-ray.cmix7
      3.568.??? x-ray.cmix6
      3.568.??? x-ray.cmix5
      3.569.??? x-ray.cmix4
      3.569.??? x-ray.cmix3
      3.570.??? x-ray.cmix2
      3.571.523 x-ray.cmv (-m1,0,0x00a3619f (*^0x034e9400))
      3.577.??? x-ray.cmix1
    - FFADMIN.EXE (Why Does NanoZip Compress This File More Than PAQ8 And cmix?)
      5.015.668 FFADMIN.nz (-cc ?)
      5.265.328 FFADMIN.cmv (-m2,3,0x03ededff (>&b12))
      5.292.665 FFADMIN.cmix (v6 !?)
    Attached Files

  2. The Following 7 Users Say Thank You to Mauro Vezzosi For This Useful Post:

    Bulat Ziganshin (7th September 2015),Cyan (7th September 2015),Darek (10th September 2015),encode (1st March 2016),JamesB (17th September 2015),Mike (9th September 2015),Stephan Busch (6th September 2015)

  3. #2
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    317
    Thanks
    168
    Thanked 51 Times in 37 Posts
    Great!

    I will play around with it and report back with any bugs I find.

    EDIT: Wow, when using "-hv" there is an incredible number of options to play with! If your LZ program is just as complex and powerful as CMV, please reconsider releasing it publicly.
    Last edited by comp1; 7th September 2015 at 16:40.

  4. #3
    Member
    Join Date
    Aug 2015
    Location
    Nice
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Cmv is a closed source, badly written, very slow (my main focus is on compression ratio, not speed), CLI Windows program.
    Basically you joined to dump your garbage in here: a "closed source", "badly written", "slow", "Windows program". The only more useless thing than that would be the same program implementing a patented algorithm and proprietary "non-standards".

    Sorry if I come across as rude, but all this nonsense really angers me! Why do you think it was a great idea to make anything like that? What are you afraid of? That people will "steal" your "intellectual property"? And more importantly, why on Earth do you think I'll ever run anything like that on my machine? After studying compression from documentation written by other people and made available for free, what's the best you come up with? A proprietary, closed source blob for Windows...

    If you want your efforts to be taken seriously, please do yourself a favor and choose a free license instead. There are plenty.

    In the next days/weeks I would install a new C/C++ compiler compatible with Windows 8.1 to make a 64 bit executable to test all the options enabled.
    If you don't care enough to use a proper OS, you should at least install Cygwin and make a free program that is compatible with a free OS.

  5. #4
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts

    Hi!

    At first I have to mention that I found this compressor very interesting.

    However, I've had some issues putting the longer help (-hv) into a text file - the DOS redirection ">textfile" didn't work with it. Strange.
    Could you provide some instructions, or an extended help version of CMV, as a text/doc file?

    I've started to test it and the compression ratio is quite interesting, especially since time is not so important to me if we can achieve better results.

    Regards
    Darek
    Last edited by Darek; 7th September 2015 at 20:36.

  6. #5
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    coyote is right about some things but in my opinion the charges go too far. Blaming a release for being compiled for Windows only isn't right in my opinion. Windows has the biggest market share of all operating systems even though Windows is only made for the x86 architecture. There is a lot of so-called "free" software where the programmer doesn't release a compiled file at all but expects every user to have enough programming experience and software on his computer to compile the source code, so that in the end the software is compiled hundreds of times. But in this case most people just use different software because they can't compile the source code. There are even C compilers where you don't get the compiled file of the compiler but only its source code :-D Not releasing compiled files at all is a failure; releasing compiled files only for the biggest operating system is not so good, but acceptable.

    Quote Originally Posted by coyote
    ... free license ...
    Software which is under a license is not free. Public domain software is free, and freeware is pretty close to free. GPLed software, for example, is copyright-protected software. You are only allowed to copy it if you agree to a very restrictive license which reduces freedoms. Otherwise you would violate copyright law.

    Quote Originally Posted by coyote
    If you don't care enough to use a proper OS...
    I hope that you aren't talking about Linux. Because from a low-level programmer's point of view Linux is a catastrophe while Windows is the king of the hill. Did you ever try to create a "hello world" program for Linux where your "hello world" has non-ASCII characters? It's a pain in the a*s just to figure out which character encoding Linux uses. High-level languages hide this, but it is still a huge design bug in Linux. Then, if you would like to work with the local time, you again have to go through a hell of sh*t to figure out the local time. In Windows simple issues like these are solved much better. Besides the design issues, Linux also lacks good documentation, while Windows is pretty well documented. So I don't think it's appropriate to call Windows a bad operating system. I had that opinion too - I thought Linux was cool and fast - but that was in the past, before I learned to program software instead of just using software (including using a high-level compiler).

    I don't like the new direction Microsoft Corporation has taken since the release of Windows XP (integrating spyware, trojan horses, activation procedures, etc. into the operating system), but all in all Windows is still an acceptable operating system.
    Last edited by just a worm; 8th September 2015 at 00:09.

  7. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,475
    Thanks
    702
    Thanked 645 Times in 347 Posts
    Can you please stop talking about nonsense? You can do it in the off-topic areas if you really need to.

  8. The Following User Says Thank You to Bulat Ziganshin For This Useful Post:

    Intrinsic (8th September 2015)

  9. #7
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by comp1 View Post
    Wow, when using "-hv" there is an incredible number of options to play with! If your LZ program is just as complex and powerful as CMV, please reconsider releasing it publicly.
    I never published my old programs because they aren't good or innovative, especially the LZ ones; they are just like many others (probably except one on DMC and one on PPM (or was it CTW?!)).
    Cmv, instead, has a good compression ratio and I still have many things to do with it.

    The help of cmv can be quite hard to understand; ask about anything that is unclear.
    Some info about the "-m" switch (see also "cmv -hv").
    First of all, you must choose the method (default: -m1):
    -m0: "fast" and worse compression ratio.
    -m1: slow and good compression ratio.
    -m2: slower and better compression ratio.
    Then choose how much memory to give it (default: 0):
    0: 1x, minimum.
    1: 2x.
    2: 4x.
    3: 8x.
    Finally, choose optional models, mixers, memory, etc. (default: depends on the method); some shortcuts:
    <: disable all models and mixers (enables only the minimum options).
    -: disable some options (depends on -m<N>), about 2 times faster.
    +: enable more options (depends on -m<N>), about 2 times slower.
    *: enable all models.
    >: enable all options (full options).
    Number, &, |, ^: manual selection of the options.
    Example:
    cmv c "-m1,0,|b9" File_To_Compress File_Compressed_With_Default_Method_Plus_Word_Model.cmv
    cmv e File_Compressed_With_Default_Method_Plus_Word_Model.cmv File_Expanded
    Quote Originally Posted by Darek View Post
    However, I've had some issues putting the longer help (-hv) into a text file - the DOS redirection ">textfile" didn't work with it. Strange.
    Could you provide some instructions, or an extended help version of CMV, as a text/doc file?
    Cmv can write the OutputFile to stdout:
    cmv c . . < file1 | cmv e . . > file2
    To avoid mixing the two, it writes messages (help, errors, the "progress row" (what is it called?)) to stderr.
    Use "2>" to redirect stderr:
    cmv -hv 2> doc.txt
    (To redirect the "progress row" you can use "-vn -voFile.txt".)
    Quote Originally Posted by Darek View Post
    I've started to test it and the compression ratio is quite interesting, especially since time is not so important to me if we can achieve better results.
    Are you sure that time is not important? It's quite annoying to wait days to compress ENWIK9 (full options enabled).

    @coyote and @just a worm
    I don't know English very well and it takes me time to answer (and I have trouble editing the post).
    I'll answer next time.

    Bye

  10. #8
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Are you sure that time is not important? It's quite annoying to wait days to compress ENWIK9 (full options enabled).
    Yeaaah, I'm sure.

    The first thing is that I'm looking for the maximum compression ratio I can reach with my conditions - CPU/GPU performance, RAM constraints, etc. That's the idea - the maximum possible compression ratio.
    Of course, if compression time were counted in months, then I'd wait for more powerful machines. But I'll still try, test, compare. Time counted in days for tests is reasonable for me. Seriously.

    Maybe now some compressors look very heavy and slow, but in the next couple of years their status could change to really usable, like some CM compressors (i.e. nanozip or even PAQ).
    I started testing compressors on the Commodore 64, when the length of compressor programs was counted in bytes and no one even dreamed of today's capabilities.

    Thank you for the advice on how to convert the help to text. It works.

    I'll test CMV on my testbed and will try to give you feedback, then I'll wait for next versions.

    Darek

  11. #9
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts
    First, quick test on my testbed:

    [Attached image: cmv_tst1.jpg]

    1. Generally, there is a very small difference between the max option ">" and the all-models option "*"; however, the all-models option is a tiny bit better overall.

    2. I've spotted a memory allocation issue in the "-m2,3,>" mode, despite my computers having 16 and 32 GB of RAM.

    3. The compression ratio is comparable to CMIX v6 in the total test (CMV is worse by 2.3%); however, that compressor has a good text model, and the difference on textual files is about 16.8%. The CMIX v7 build from Byron Knoll is not stable and didn't finish the test with the Dictionary.

    4. paq8pxd v16 skbuild has similar compression to CMV for non-model files, but it beats CMV in total due to its special models and its algorithm for finding graphic, wav and text parts inside the file: there are a WAV model, a BMP/TIF/TGA model, a JPEG recompression model and a TEXT dictionary model.

    5. paq8kx_v7 - still unbeaten on my testbed due to good overall compression and use of all models (though these models are worse than paq8pxd v16's).

    6. There's no huge difference between the Word and non-Word option - 1.6% with CMV. If I understand correctly, the |b9 option switches the Word model on and &512 switches it off...

    I have a question then:

    Is there a more powerful option in CMV to compress better than the "*" and ">" switches, in general or for specific file types (wave, graphics, etc.)?

    Darek
    Last edited by Darek; 8th September 2015 at 17:11.

  12. The Following User Says Thank You to Darek For This Useful Post:

    Mauro Vezzosi (9th September 2015)

  13. #10
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by coyote View Post
    Why do you think it was a great idea to make anything like that?
    I don't think it was a great or small idea; I wrote cmv out of curiosity and for my own pleasure - I like programming and compression algorithms.
    Quote Originally Posted by coyote View Post
    That people will "steal" your "intellectual property"?
    Cmv is closed source, but the algorithms/information behind it are not: feel free to ask and I'll try to answer (just don't ask me to document the whole program exactly and in depth!).
    Quote Originally Posted by coyote View Post
    why on Earth do you think I'll ever run anything like that on my machine?
    I don't expect anyone to use cmv for any real application; I don't want to make money or sell it, and I don't want to be famous.
    If you want, you can use it for tests, or it may give you some ideas to try in your own program; otherwise ignore it.
    Quote Originally Posted by coyote View Post
    After studying compression from documentation written by other people...
    When I read some documentation, I understand only the first pages (e.g. the introduction); I don't understand the theory and the proofs.
    When I read source code, I am too lazy to understand exactly how it works (unless it's an easy piece of code); instead I try to understand the main idea behind it, whether it can be useful for me, whether I can replicate it in the same or a different way, etc.
    Quote Originally Posted by coyote View Post
    A proprietary, closed source blob for Windows
    There are many good compressors that are closed source: nanozip, PPMMonstr, zcm, pimple, etc. (sorry to all the other good compressors I haven't listed).
    Quote Originally Posted by coyote View Post
    If you want your efforts to be taken seriously, ...
    No, please, don't take my program seriously; take it as a toy to throw away tomorrow.

  14. #11
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by Darek View Post
    The first thing is that I'm looking for the maximum compression ratio I can reach with my conditions - CPU/GPU performance, RAM constraints, etc. That's the idea - the maximum possible compression ratio.
    I'm looking for the maximum compression ratio too, and I try new models, counters, parameter adaptation, etc.
    If I'm not wrong, if and when we have a quantum computer we can think about a CM compressor that, for every bit, reconstructs from scratch the models for the contexts of that bit only: it would need much less memory (just a few MB?) and it wouldn't have collisions in the hash tables (improving the compression ratio). The mixers would be handled in the standard way.
    Quote Originally Posted by Darek View Post
    I started testing compressors on the Commodore 64...
    The Commodore 64 was a great machine, I still have it somewhere.

  15. #12
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by Darek View Post
    First, quick test on my testbed: ...
    Thank you for your test!
    Quote Originally Posted by Darek View Post
    1. Generally, there is a very small difference between the max option ">" and the all-models option "*"; however, the all-models option is a tiny bit better overall.

    2. I've spotted a memory allocation issue in the "-m2,3,>" mode, despite my computers having 16 and 32 GB of RAM.

    3. The compression ratio is comparable to CMIX v6 in the total test (CMV is worse by 2.3%); however, that compressor has a good text model, and the difference on textual files is about 16.8%. The CMIX v7 build from Byron Knoll is not stable and didn't finish the test with the Dictionary.

    4. paq8pxd v16 skbuild has similar compression to CMV for non-model files, but it beats CMV in total due to its special models and its algorithm for finding graphic, wav and text parts inside the file: there are a WAV model, a BMP/TIF/TGA model, a JPEG recompression model and a TEXT dictionary model.

    5. paq8kx_v7 - still unbeaten on my testbed due to good overall compression and use of all models (though these models are worse than paq8pxd v16's).

    6. There's no huge difference between the Word and non-Word option - 1.6% with CMV. If I understand correctly, the |b9 option switches the Word model on and &512 switches it off...

    I have a question then:

    Is there a more powerful option in CMV to compress better than the "*" and ">" switches, in general or for specific file types (wave, graphics, etc.)?
    1. With -m2, ">" is "*" + 2x memory for some models. On such small files the improvement is small.
    2. Already mentioned in my first post:
    At the moment there is only a 32-bit version of the program; it can use up to 4 GB of RAM on a 64-bit OS, but full options need ~4.1 GB, so in that case I often disable bit 12 of the switches (variable order and memory model) because it seems already modelled quite well by the order-N and match models.
    3. Are you comparing cmv with cmix? Cmix is much better; I think the current "competitor" of cmv is FP8 v1-v3.
    4. and 5. Cmv has no specific models except for text (the Word model); I prefer to write generic models. Cmv has many sparse models.
    6. The Word model is useful on text-like data (your K.WAD and M.DBF files?); on other kinds of data it has little impact.
    The Word model can have the value 0, 1 or 2, and it takes 2 bits of the options.
    "|" switches on, "&" switches off. The number can be written in many ways: 512 = b9 = B21000000000.
    Switch on the first bit of the Word model: |512 = |b9 = |B21000000000.
    Switch off the first bit of the Word model: &512 = &b9 = &B21000000000.
    7. No more powerful or file-type-specific options exist. ">" enables all options but sometimes it isn't the best:
    x-ray best: -m1,0,0x00a3619f (*^0x034e9400).
    LOG.txt best: -m2,0,0x01ec039f (^0x02000008).
    NUM.txt best: -m2,0,0x00ec8bd9 (^0x0300884e).
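    The number notations can be checked mechanically; a small Python illustration (the option word 0x00a3619f is the x-ray setting quoted above; the rest is plain bit arithmetic, not cmv code):

```python
# The Word-model bit in its three spellings from above:
# decimal 512, "b9" (bit index 9), and "B2" + binary digits.
WORD_BIT = 1 << 9
assert WORD_BIT == 512 == 0b1000000000

options = 0x00a3619f              # the x-ray option word quoted above
with_word = options | WORD_BIT    # cmv "|512" / "|b9": switch the bit on
# cmv's "&512" switches the bit off; the plain-Python equivalent is an
# AND with the bit's complement:
without_word = with_word & ~WORD_BIT

assert with_word & WORD_BIT != 0
assert without_word == options    # bit 9 was clear in the original word
```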
    Last edited by Mauro Vezzosi; 9th September 2015 at 00:33. Reason: Forgot to quote the Darek's post

  16. #13
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    @Darek
    -m2 already enables Word model 1, therefore -m2,3 is the same as -m2,3,|b9.
    Could you check why you have 2 different file lengths for K.WAD? (2765921 and 2765907)
    Have you compared the expanded file with the original?
    If you have time, can you also test -m2,3,0x03ededff?
    TIA.

    EDIT
    I understand. The difference is 14 bytes, which is the standard cmv file header.
    You took 2765907 from the final row displayed by cmv, which is not the file length because the header is not taken into account.
    The file header can be 14 or 18 bytes, depending on whether you change an option or not.
    Use -vf to also display the final file length.

    Examples:
    cmv c MaximumCompression\world95.txt world95.txt.cmv
    In 2988578 out 399980 ratio 0.13384 bpb 1.0707 time 03m56 (235.66s)

    cmv c -vf MaximumCompression\world95.txt world95.txt.cmv
    MaxComprStd\world95.txt in 2988578 out 399980 (file 399994) ratio 0.13384 bpb 1.0707 time 03m55 ( 235.25 seconds) method 1,0,0x03ec0195

    cmv c -m0,3 -vf MaximumCompression\world95.txt world95.txt.cmv
    MaxComprStd\world95.txt in 2988578 out 481654 (file 481668) ratio 0.16116 bpb 1.2893 time 16s57 ( 16.57 seconds) method 0,3,0x00100010
    14-byte file header: no need to extend the header, because method and memory (e.g. -m0,3) are included in the standard 14-byte header.

    cmv c "-m1,0,|b0" -vf MaximumCompression\world95.txt world95.txt.cmv
    MaxComprStd\world95.txt in 2988578 out 399980 (file 399994) ratio 0.13384 bpb 1.0707 time 03m52 ( 232.38 seconds) method 1,0,0x03ec0195
    14-byte file header: no need to extend the header; -m1 already enables the sparse model, so "|b0" doesn't change the options.

    cmv c "-m1,0,|b9" -vf MaximumCompression\world95.txt world95.txt.cmv
    MaxComprStd\world95.txt in 2988578 out 397697 (file 397715) ratio 0.13307 bpb 1.0646 time 04m02 ( 242.35 seconds) method 1,0,0x03ec0395
    18-byte file header: the header must be extended, because cmv has to add 4 bytes to save the options when they differ from the default (-m1 doesn't enable Word model 1).
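    The header arithmetic above can be summarized in a couple of lines (illustrative Python sketch; the defaults table is a guess containing only the -m1 value shown in these examples):

```python
# 14-byte header when the options equal the method's default (which is
# implied by method+memory); 18 bytes (4 extra) when the option word
# must be stored explicitly. Only the -m1 default from the examples
# above is listed here.
DEFAULTS = {1: 0x03ec0195}

def header_size(method, options):
    return 14 if options == DEFAULTS.get(method) else 18

# The size differences in the world95.txt runs above:
assert 399994 - 399980 == header_size(1, 0x03ec0195) == 14
assert 397715 - 397697 == header_size(1, 0x03ec0395) == 18
```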

    I know, cmv is not user friendly.
    Last edited by Mauro Vezzosi; 9th September 2015 at 09:49. Reason: See EDIT

  17. #14
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    3. Do you compare cmv with cmix? Cmix is much better, I think that current "competitor" of cmv is FP8 v1-v3.
    I don't agree with comparing CMV's compression ratio to FP8's level. I've tested it already. FP8 wins only on files with special models (BMP/TIF/TGA/WAV/JPG/TXT) - to be honest, the FP8_v3 JPG model is the best of all the paq variants...

    FP8_v3, which is the best of these three FP8 programs, is about 6.3% worse than CMV for non-model files in my test - and this is quite a huge difference - see my new table.

    I've added another compressor to the table - WinRK 3.03b - and so you have (for my testbed) the three best compressors: CMIX, PAQ (in two variants: pxd and kx) and WinRK.

    You should rather compare to these three state-of-the-art compressors. The comparison for non-model files looks as follows:

    CMV vs. WinRK 3.03b - CMV wins by 1.1%,
    CMV vs. Paq8pxd v16 skbuild 3 - CMV wins by 0.2% - the best comparison in my opinion,
    CMV vs. Paq8kx v7 - CMV lost by 2.0%,
    CMV vs. Cmix v6 - CMV lost by 3.6%.

    Of course, because the Word model has no dictionary, comparisons on enwik8 and enwik9 will show the advantage of dictionary compressors, not of the compression itself.

    If you add special models for txt (esp. a dictionary) and multimedia files, then CMV could beat most contemporary compressors. In my view you have only two of them left to beat...

    table:

    [Attached image: cmv_tst2.jpg]

    Thank you for the compression options - I'll test them.
    K.WAD difference - I've checked it and compressed again, and it was some mistake of mine. Both scores are the same.

    Darek

  18. The Following 2 Users Say Thank You to Darek For This Useful Post:

    comp1 (9th September 2015),Mauro Vezzosi (11th September 2015)

  19. #15
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by just a worm View Post
    ...Because from a low-level programmer's point of view Linux is a catastrophe while Windows is the king of the hill.
    If you think you can make a better OS, please write one and share it with us.

    Turning back to IT: it's not so clear to me why smaller size is so much more relevant than speed.
    With lossless encoding we know there is an entropy limit, and practical multi-purpose compressor speed is (for me) the real interesting playground.

    Compressing for a week to gain some bytes is almost useless, like computing pi to a zillion digits.
    Maybe good for stressing hardware

  20. #16
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts
    Quote Originally Posted by fcorbelli View Post
    Compressing for a week to gain some bytes is almost useless, like computing pi to a zillion digits.
    Maybe good for stressing hardware
    @fcorbelli: From a practical perspective you are absolutely right, and we all probably use 7zip or Rar to archive files instead of CMIX. I agree with you 100%.

    However, we should differentiate the practical approach from the exploratory one - the race for the "best ever" compression ratio is a completely different topic from making the most effective archiver. They are two different paths. I am a fan of experimental compression but, to be fair, I understand the practical approach too.

    Right now it might look useless to use these COMPRESSORS, but algorithms found through the experimental approach could be used in future generations of practical ARCHIVERS.

    Some people nevertheless compute pi to a zillion digits... Why do they do it? Because there is something irrationally beautiful in the pursuit of infinity? Maybe...

  21. The Following 2 Users Say Thank You to Darek For This Useful Post:

    Gonzalo (14th October 2017),Skymmer (10th September 2015)

  22. #17
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    Completely agree with Darek and his excellent words. I can only add that maximum compression is the only compression area where you can definitely determine the best competitor for a given data set, because (N-1) bytes is always smaller than N bytes.
    Fast and effective compressors are nothing more than endless speculation, with the possibility of bias through different efficiency calculations, OSes, measurement tools, CPUs, bitness, the weather outside, the mood of the tester and so on.

  23. The Following 2 Users Say Thank You to Skymmer For This Useful Post:

    Darek (10th September 2015),Matt Mahoney (14th September 2015)

  24. #18
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by Darek View Post
    CMV vs. Paq8kx v7 - CMV lost by 2.0%,
    Most probably 2.0% is too much for the next version of cmv.

    Quote Originally Posted by Darek View Post
    If you add special models for txt (esp. dictionary) and multimedia files then CMV could beat most of contemporary compressors.
    At the moment I'm not interested in adding specific models; it takes time to implement a model for just one kind of file, and I prefer to write general or new (at least for me) models.
    However, I may change my point of view in the future.
    To improve compression on some files you can run a program (e.g. paq8pxd16 -s0) to filter the files before launching cmv (see Benchmarks.txt for some comparisons on txt, exe, bmp, etc.).
    Example:
    Without filter:
    cmv c -m2,0,"*" rafale.bmp rafale.bmp.cmv --> 653.804
    cmv o rafale.bmp.cmv rafale.bmp

    With filter:
    paq8pxd_v16 -s0 rafale.bmp
    cmv c -m2,0,"*" rafale.bmp.paq8pxd16 rafale.bmp.paq8pxd16.cmv --> 644.030, a 1.49% gain.
    cmv o rafale.bmp.paq8pxd16.cmv rafale.bmp.paq8pxd16
    paq8pxd_v16 rafale.bmp.paq8pxd16

    Quote Originally Posted by Darek View Post
    For me you have only two of it to beat
    Thank you for your opinion, but I don't race for the "best ever" compression ratio or to beat other compressors; my first "competitor" is cmv itself, not another program.
    Obviously I'm happy if my program has a good compression ratio.

    Thank you for your comparison table!
    Last edited by Mauro Vezzosi; 11th September 2015 at 01:00. Reason: Text tuning

  25. #19
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Most probably 2.0% is too much for the next version of cmv.
    I understand. The fact is that CMV is a compressor worth noticing and worth tracking the development of.

    [Attached image: cmv_tst3.jpg]

    I've tested some additional models and options. For my testbed "-m2,3,*" and "-m2,3,0x03ededff" are still the best selections, however all "-m1,3,*", "-m1,3,>" and "-m2,2,>" are also worth of watching.

    One change to table - I've added header's bytes to comparison to be fair to other compressors - these are measured as whole compressed file. Blue filling cells means the best score from table, yellow - the best score for CMV for file.

    I haven't noticed any problems and errors during compression - then from this side I cant help much. CMV works fine, w/o any issues and it's quite fast compared to CMIX.

    Great program! I'll wait for the next releases.

    Darek

  26. The Following User Says Thank You to Darek For This Useful Post:

    Mauro Vezzosi (12th September 2015)

  27. #20
    Member
    Join Date
    Sep 2015
    Location
    Madrid
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Great compression ratio. It's a pity that the decompression is not asymmetrical. Good for science, not so good practically. I guess it's the major problem for all context-mixing models...

  28. #21
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by Skymmer View Post
    Completely agree with Darek and his excellent words. I can only add that maximum compression is the only compression area where you can definitively determine the best competitor for a given data set, because (N-1) bytes is always smaller than N.
    Fast and effective compressors are nothing more than endless speculation, with the possibility of bias through different efficiency calculations, different OSes, measurement tools, CPUs, bitness, weather outside, mood of the tester and so on.
    You can make the "perfect" compressor, without much effort (but much time), with the (brutal) monkey algorithm, for a given dataset.
    But Shannon's theorem puts a very heavy "stone" on this research (maximum compression), yet says nothing about speed, so, in theory, you can reach O(n) or even O(1) time.
    And, on speed, you can achieve huge differences: not a few % (as in compression) but even order(s) of magnitude.

  29. #22
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Most probably 2.0% is too much for the next version of cmv.
    I mean 0.2.0.
    0.1.1 won't change compression ratio and will be backwards compatible.

    Quote Originally Posted by Darek View Post
    I've tested some additional models and options. For my testbed, "-m2,3,*" and "-m2,3,0x03ededff" are still the best selections; however, "-m1,3,*", "-m1,3,>" and "-m2,2,>" are also worth watching.
    Thank you very much, your table is interesting.
    The cmv best overall is 12.614.645.
    U.DOC has 2 yellow cells: is that right?

    Quote Originally Posted by Darek View Post
    I haven't noticed any problems or errors during compression, so from this side I can't help much. CMV works fine, without any issues, and it's quite fast compared to CMIX.
    Very good. Thank you for your effort!

    Quote Originally Posted by Polynauter View Post
    Great compression ratio. It's pity, that the decompression is not assymetrical. Good for science, not so good practically. I guess it's the major problem for all content-mix models...
    Statistical compressors need to do the same things in compression and decompression mode, therefore they are symmetrical (in both time and memory).
    Probably someone else can give you a better answer.
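    As a toy illustration of that symmetry (a minimal sketch, not cmv's actual counters): the decoder must rebuild exactly the same model state as the encoder, bit by bit, so both directions perform identical predictions and updates, hence roughly the same time and memory.

```python
class Counter:
    """Tiny adaptive bit predictor, a stand-in for a CM compressor's counters."""
    def __init__(self, rate=0.05):
        self.p = 0.5       # current probability of the next bit being 1
        self.rate = rate   # adaption rate

    def predict(self):
        return self.p

    def update(self, bit):
        self.p += self.rate * (bit - self.p)  # move p toward the seen bit


def model_trace(bits):
    """Probabilities the model emits while processing a bit stream."""
    c, out = Counter(), []
    for b in bits:
        out.append(c.predict())
        c.update(b)
    return out


bits = [1, 1, 0, 1, 0, 0, 1, 1]
# Encoder and decoder each run the model once over the same bit stream,
# so they do identical work: same prediction sequence, same updates.
assert model_trace(bits) == model_trace(bits)
```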

  30. The Following User Says Thank You to Mauro Vezzosi For This Useful Post:

    Darek (12th September 2015)

  31. #23
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by fcorbelli View Post
    You can make the "perfect" compressor, without much effort (but much time), with the (brutal) monkey algorithm, for a given dataset.
    IMHO, a practical (not theoretical) monkey algorithm is useful to tune some parameter values, not to achieve "big" improvements. See the M1 compressor on LTCB: "M1 0.2a is a free, open source (GPL) file compressor by Christopher Mattern, released Oct. 3, 2008. It uses context mixing with only two contexts. The contexts are 64 bits with some bits masked out. The masks and several other parameters were selected by a combination of a genetic and hill climbing algorithms running for several hours to 3 days to optimize compression on this benchmark as discussed ...".
    I'm just working on this "monkey": it has already found a better option value for NUM.txt: -m2,0,0x016cc19a, file length 807 (previous best: -m2,0,0x00ec8bd9, file length 819).
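    The "monkey" idea can be sketched as a simple bit-flipping hill climb over the option mask. cmv itself isn't runnable here, so the cost function below is a stand-in (Hamming distance to a target mask) instead of the real step, which would run cmv with the candidate mask and return the compressed file length; the two masks are the ones quoted above for NUM.txt, used only as toy values.

```python
import random

def hill_climb(cost, start_mask, bits=32, iters=1000, seed=1):
    """Flip one random bit per step; keep the flip only if it lowers cost."""
    rng = random.Random(seed)
    best_mask, best_cost = start_mask, cost(start_mask)
    for _ in range(iters):
        cand = best_mask ^ (1 << rng.randrange(bits))  # flip one random bit
        c = cost(cand)
        if c < best_cost:
            best_mask, best_cost = cand, c
    return best_mask, best_cost

# Stand-in cost function: Hamming distance to the mask reported as best for
# NUM.txt.  In real use this would be "compress NUM.txt with mask m and
# return the output file length".
BEST_KNOWN = 0x016CC19A
cost = lambda m: bin(m ^ BEST_KNOWN).count("1")

mask, final_cost = hill_climb(cost, start_mask=0x00EC8BD9)
```

    With a real compressor as the cost function, each step is one full compression run, which is why such searches take hours to days (as in the M1 quote).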

    Quote Originally Posted by fcorbelli View Post
    But Shannon's theorem puts a very heavy "stone" on this research (maximum compression), yet says nothing about speed, so, in theory, you can reach O(n) or even O(1) time.
    I understand your point of view but I agree with Darek and Skymmer.
    I'll continue my "research", so in the future we will have a compressor with a compression ratio very close to the Shannon limit in O(1) time.
    Last edited by Mauro Vezzosi; 12th September 2015 at 20:42. Reason: Text tuning

  32. The Following User Says Thank You to Mauro Vezzosi For This Useful Post:

    Darek (12th September 2015)

  33. #24
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    848
    Thanks
    483
    Thanked 333 Times in 246 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    The cmv best overall is 12.614.645.
    U.DOC has 2 yellow cells: is it right?
    Yes, absolutely: CMV's best overall score is 12'614'645 for my whole testbed (4th place! - especially impressive when we know that a couple of the compressors behind CMV use multimedia models!), and 5'881'429 for non-model files (3rd place!).

    U.DOC - that's my mistake. I painted the cells manually, and that's the reason. The proper table is attached.

    Attachment: cmv_tst3.jpg (corrected comparison table, 1.48 MB)

    Darek

  34. #25
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,253
    Thanks
    304
    Thanked 768 Times in 482 Posts
    Hi Mauro. Can you give me compress and decompress times, memory used, and hardware configuration for enwik8 and enwik9? Then I can add your results to LTCB benchmark. If you decide to make your code open source, then I will also add to the Silesia benchmark.

    Edit: cmv ranks 9th in LTCB. http://mattmahoney.net/dc/text.html
    enwik8 took 8 hours to compress and 8 to decompress, so I used your result for enwik9.
    Last edited by Matt Mahoney; 15th September 2015 at 18:33.

  35. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Mauro Vezzosi (15th September 2015)

  36. #26
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Can you give me compress and decompress times, memory used, and hardware configuration for enwik8 and enwik9? Then I can add your results to LTCB benchmark.
    It wasn't a serious test: I forgot to use timer.exe, in the meantime I hibernated the computer, I did some other tests, etc.
    I prefer to redo the final test when I make the 64-bit version of cmv, so I can enable all the options.
    Out of curiosity:
    ENWIK8 18.153.319: "cmv c -m2,3,0x03ededff" (0x03ededff == ">&b12"), time: ~20 hours, memory not checked (estimated: a little less than 4 Gb).
    ENWIK9 150.226.739: "cmv c -m2,3,+", time: ~60 hours, maximum memory I saw in Task Manager: 2.801.640 KB.
    ENWIK9 compression (or decompression) with all options enabled would take ~200 hours!
    Hardware configuration: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz (up to 3.50GHz), 8 GB RAM DDR3, Windows 8.1 64 bit.

    Quote Originally Posted by Matt Mahoney View Post
    If you decide to make your code open source, then I will also add to the Silesia benchmark.
    I know the Silesia benchmark lists only open source programs.
    At the moment I'm not interested in making my code open source; I'll think about it in the (far) future.

    Quote Originally Posted by Matt Mahoney View Post
    Edit: cmv ranks 9th in LTCB. http://mattmahoney.net/dc/text.html
    enwik8 took 8 hours to compress and 8 to decompress, so I used your result for enwik9.
    Thank you very much!
    The link of nanozip is broken, it should be http://mattmahoney.net/dc/text.html#1493, not http://mattmahoney.net/dc/text.html#1483.

  37. #27
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,253
    Thanks
    304
    Thanked 768 Times in 482 Posts
    Thanks. I updated LTCB.

  38. #28
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    723
    Thanks
    62
    Thanked 246 Times in 174 Posts
    cmv c -m2,3,0x03ededff enwik8 enwik8.cmv
    In 100000000 out 18153301 ratio 0.18153 bpb 1.4523 time 13h25 (48300.97s)
    18,153,319 bytes enwik8.cmv in 13:25:00.984 (48300.984 Seconds)

    cmv e enwik8.cmv enwik8.new
    In 18153301 out 100000000 ratio 0.18153 bpb 1.4523 time 13h32 (48743.66s)
    100,000,000 bytes enwik8.new in 13:32:23.676 (48743.676 Seconds)
    Compare Ok

    Memory use 3997MB.
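    For reference, the ratio and bpb figures in cmv's log follow directly from the sizes (the on-disk file is 18,153,319 bytes, 18 more than the logged payload, presumably cmv's stored-sizes header):

```python
in_bytes, out_bytes = 100_000_000, 18_153_301  # enwik8 and cmv's logged output size

ratio = out_bytes / in_bytes  # compressed fraction of the input
bpb = 8 * ratio               # bits per byte of input

print(round(ratio, 5), round(bpb, 4))  # matches the 0.18153 / 1.4523 in the log
```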

  39. The Following User Says Thank You to Sportman For This Useful Post:

    Mauro Vezzosi (17th September 2015)

  40. #29
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    723
    Thanks
    62
    Thanked 246 Times in 174 Posts
    I get a smaller size for enwik9:

    cmv c -m2,3,0x03ededff enwik9 enwik9.cmv
    In 1000000000 out 149648348 ratio 0.14965 bpb 1.1972 time 05d14 (482630.36s)
    149,648,366 bytes enwik9.cmv in 5 days 14:03:53.199 (482633.199 Seconds)

    Memory use same as enwik8, a little under 4GB.
    Last edited by Sportman; 23rd September 2015 at 12:17.

  41. The Following User Says Thank You to Sportman For This Useful Post:

    Mauro Vezzosi (23rd September 2015)

  42. #30
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    178
    Thanks
    84
    Thanked 101 Times in 73 Posts
    @Sportman: Thank you!
    Code:
                           ENWIK8        ENWIK9   ENWIK9 / ENWIK8   Memory MB
    -m2,3,+            18,218,283   150,226,739         8,2459329        2817
    -m2,3,0x03ededff   18,153,319   149,648,366         8,2435816        3997
    Gain                 0,356587%    0,3850000%                     41,88853%
    A lot more memory for a little gain in compression ratio (the models added by 0x03ededff are of secondary importance for text).
    I have already planned to try adding some kind of LZ stage to improve compression on text and data with "long" repetitions; perhaps I'll do it in the ~0.3.0 version (~2016).
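    The percentages in the table above can be reproduced directly from the sizes:

```python
plus_opt = {"enwik8": 18_218_283, "enwik9": 150_226_739, "mem_mb": 2817}  # -m2,3,+
mask_opt = {"enwik8": 18_153_319, "enwik9": 149_648_366, "mem_mb": 3997}  # -m2,3,0x03ededff

# Size gain of the 0x03ededff run relative to the "+" run, per file.
gain8 = 100 * (plus_opt["enwik8"] - mask_opt["enwik8"]) / plus_opt["enwik8"]  # ~0.357 %
gain9 = 100 * (plus_opt["enwik9"] - mask_opt["enwik9"]) / plus_opt["enwik9"]  # ~0.385 %

# The "Memory MB" column's 41.9% is the extra memory the 0x03ededff run needs.
extra = 100 * (mask_opt["mem_mb"] - plus_opt["mem_mb"]) / plus_opt["mem_mb"]  # ~41.89 %
```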

    Does anyone know which of the LTCB top-10 compressors use a dictionary preprocessor (specifically in the LTCB benchmark)?
    AFAIK the following compressors use a dictionary preprocessor:
    cmix
    durilca'kingsize
    paq8pxd_v12_biondivers_x64
    paq8hp12any
    drt|lpaq9m
    mcm
    xwrt

    The following don't use a dictionary preprocessor:
    cmv

    I don't know if the following compressors use a dictionary preprocessor or not in LTCB:
    zpaq
    nanozip

    Cmv with DRT dictionary preprocessor on ENWIK9:
    drt | cmv c -m2,3,+ = 141.286.403 (without lpqdict0.dic = ~454 KB)
