Thanks Chris!
Mirror: Download
A happy new year to everyone!
I hope everyone has had a beautiful holiday. Last weekend I found some time and motivation to look at CCM again. The new version is a bit faster again.
Most of the speed is gained by reducing or removing some data structures and lookup tables (LUTs). The resulting compression loss is more or less compensated by a slightly improved match model.
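Since the match model is only mentioned in passing, here is a minimal sketch of what a match model generally does in CM compressors. It finds the previous occurrence of the current context and predicts the byte that followed it. This is not CCM's code. The class name, `MIN_LEN` and the use of a dict in place of a hash table are hypothetical choices made for illustration.

```python
# Hypothetical sketch of a generic match model (not CCM's actual code).
MIN_LEN = 4  # assumed context length needed before a match is looked up


class MatchModel:
    def __init__(self, min_len=MIN_LEN):
        self.min_len = min_len
        self.table = {}         # last min_len bytes -> index of the byte that followed them
        self.history = bytearray()
        self.match_ptr = None   # index in history of the byte we expect next
        self.match_len = 0      # length of the current match in bytes

    def predict(self):
        """Return (expected_next_byte, match_length), or (None, 0) without a match."""
        if self.match_ptr is None:
            return None, 0
        return self.history[self.match_ptr], self.match_len

    def update(self, byte):
        expected, _ = self.predict()
        self.history.append(byte)
        if expected == byte:
            # Prediction confirmed: extend the current match.
            self.match_ptr += 1
            self.match_len += 1
        else:
            self.match_ptr, self.match_len = None, 0
        if len(self.history) >= self.min_len:
            key = bytes(self.history[-self.min_len:])
            if self.match_ptr is None and key in self.table:
                # Found an earlier occurrence of this context: follow it.
                self.match_ptr = self.table[key]
                self.match_len = self.min_len
            self.table[key] = len(self.history)


m = MatchModel()
for b in b"abcdabcdab":
    m.update(b)
print(m.predict())  # (99, 6): expects b"c" after a 6-byte match
```

In a real CM coder, the expected byte would be turned into a bit prediction whose confidence grows with `match_len` and then fed to the mixer. That part is omitted here.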
I want to thank Stephan Busch very much for testing some interim versions over the weekend.
download CCM 1.30
Thanks Chris!
Mirror: Download
Just when I thought such a good thing couldn't get better, it did. Here are some benchmarks of 1.26b vs. 1.30 on my system:
Test System: AMD Athlon 64 3000+ (Venice, single core) with 2 GB Corsair DDR CL2 RAM
ccm126b->ccm.exe 3 enwik8 (22325507 Bytes) - 1 m 36 s -
ccm126b->ccm.exe d enwik8 - 1 m 37 s -
ccm130->ccm.exe c 3 enwik8 (22351098 Bytes) - 1 m 25 s - compression 12.9% faster size .1% larger
ccm130->ccm.exe d enwik8 - 1 m 26 s - decompression 12.7% faster
ccm126b->ccmx.exe 3 enwik8 (21608039 Bytes) - 2 m 3 s -
ccm126b->ccmx.exe d enwik8 - 2 m 5 s -
ccm130->ccmx.exe c 3 enwik8 (21646059 Bytes) - 1 m 53 s - compression 8.8% faster size .18% larger
ccm130->ccmx.exe d enwik8 - 1 m 57 s - decompression 6.8% faster
ccm126b->ccm.exe 7 enwik8 (21960600 Bytes) - 1 m 39 s -
ccm126b->ccm.exe d enwik8 - 1 m 40 s -
ccm130->ccm.exe 7 enwik8 (21980533 Bytes) - 1 m 29 s - compression 11.2% faster size .09% larger
ccm130->ccm.exe d enwik8 - 1 m 28 s - decompression 13.6% faster
ccm126b->ccmx.exe 7 enwik8 (20819731 Bytes) - 2 m 7 s -
ccm126b->ccmx.exe d enwik8 - 2 m 10 s -
ccm130->ccmx.exe 7 enwik8 (20857925 Bytes) - 1 m 57 s - compression 8.5% faster size .18% larger
ccm130->ccmx.exe d enwik8 - 2 m 3 s - decompression 5.6% faster
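The percentages above can be reproduced from the listed times and sizes. Below is a small sketch, assuming "X% faster" means old time / new time - 1. This assumption matches the posted speed figures, give or take truncation in the last digit. The helper names are hypothetical.

```python
def mmss(minutes, seconds):
    """Convert an 'm s' timing into seconds."""
    return minutes * 60 + seconds


def speedup_pct(old_s, new_s):
    """Percent faster, computed as old/new - 1."""
    return (old_s / new_s - 1) * 100


def size_change_pct(old_bytes, new_bytes):
    """Positive means the new archive is larger, negative means smaller."""
    return (new_bytes / old_bytes - 1) * 100


print(f"{speedup_pct(mmss(1, 36), mmss(1, 25)):.2f}")  # 12.94
print(f"{size_change_pct(22325507, 22351098):.3f}")    # 0.115
print(f"{size_change_pct(21960600, 21980533):.3f}")    # 0.091
```

The ccm 7 result comes out positive, so the 1.30 archive at that level is slightly larger, not smaller.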
I don't know about everyone else out there, but CCM seems to strike a terrific balance of speed and ratio. However, I wouldn't mind seeing CCMX's ratio increased at the cost of some speed (since it is only about 30% slower). Maybe down to half the speed of regular CCM if it means getting 5% or so more ratio. It would also make it harder for FreeArc to catch up with CCM ratio-wise.
Thanks for all your hard work Christian!
Here are a few more, this time with enwik9 and lots of memory.
ccm126b->ccmx.exe 7 enwik9 (174146696 Bytes) - 20 m 4 s -
ccm126b->ccmx.exe d enwik9 - 20 m 24 s -
ccm130->ccmx.exe 7 enwik9 (174142092 Bytes) - 18 m 41 s - compression 7.4% faster size .0026% smaller
ccm130->ccmx.exe d enwik9 - 18 m 58 s - decompression 7.5% faster
Thank you all!
@Francis
I've seen your benchmark, good job! There are some broken characters when I view the page, though. Anyway, I can't wait for new numbers.
@Hahobas
Thanks for giving the new version a try. It's a little sad that the performance gain differs so much between platforms. I've moved to a C2D (Core 2 Duo) recently, and there I get much better results:
CCM 3 on ENWIK8:
1.26 -> 21802.25 KiB (ratio 22.33%, speed 2355 KiB/s), ~41s
1.30 -> 21827.24 KiB (ratio 22.35%, speed 2891 KiB/s), ~34s
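The ratio and speed figures can be cross-checked. The sketch below assumes that speed is measured on the uncompressed input and that 1 KiB = 1024 bytes. enwik8 is exactly 10^8 bytes.

```python
KIB = 1024
ENWIK8_BYTES = 100_000_000  # enwik8 is exactly 10^8 bytes


def ratio_pct(compressed_kib):
    """Compressed size as a percentage of the original enwik8 size."""
    return compressed_kib * KIB / ENWIK8_BYTES * 100


def seconds_at(speed_kib_s):
    """Time implied by a throughput measured on the uncompressed input."""
    return ENWIK8_BYTES / KIB / speed_kib_s


print(f"{ratio_pct(21802.25):.2f}%")  # 22.33%
print(f"{ratio_pct(21827.24):.2f}%")  # 22.35%
print(f"{seconds_at(2355):.1f} s, {seconds_at(2891):.1f} s")  # 41.5 s, 33.8 s
print(f"{(2891 / 2355 - 1) * 100:.1f}% faster")  # 22.8% faster
```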
The first interim version Stephan tested was much faster on his Opteron, too. I'll look into this further. Perhaps it's just a compiler switch again.
@Hahobas
In theory this is a very good suggestion. In practice, though, CCMX is just CCM plus ~10 more lines of code. I can't improve one version much without heavily affecting the other. And I do not want to develop two separate branches for several reasons - mostly time and motivation.
Nonetheless, I'll probably rewrite CCM or something similar from scratch one day, and that one will most probably have stronger compression.
ratio is down?
CCM 3 on ENWIK8:
1.26 -> 21802.25 KiB (ratio 22.33%, speed 2355 KiB/s), ~41s
1.30 -> 21827.24 KiB (ratio 22.35%, speed 2891 KiB/s), ~34s
@Hahobas
Being modest, I think CCM is the 2nd best compression program of the year.
Christian, isn't it possible to add more models which would be used only in CCMX?
Other suggestions:
- better table analysis with the ability to subtract successive elements
- splitting the input file into several pieces and compressing them in parallel (for multicore CPUs)
- segmentation (it seriously improves compression for durilca - look at the MFC results with and without the -t1 switch). http://www.compression.ru/ds/seg_file.rar was published by DS, durilca's author - I don't know whether it is the technique actually used there
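The "split and compress in parallel" suggestion can be sketched as follows. zlib is swapped in as a stand-in compressor because CCM has no library API. The block size and worker count are hypothetical. The sketch relies on CPython's zlib releasing the GIL during compression, so the threads can overlap.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB blocks


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut the input into independent, fixed-size pieces."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_parallel(data, block_size=BLOCK_SIZE, workers=4):
    """Compress each block independently; every block starts with an empty model."""
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: zlib.compress(b, 9), blocks))


def decompress_parallel(chunks, workers=4):
    """Blocks are independent, so they can also be decoded in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


data = b"hello world " * 300_000
assert decompress_parallel(compress_parallel(data)) == data
```

The trade-off is that each block starts with an empty model, so statistics learned in one block cannot help another. For a strong CM coder like CCM, this costs some ratio in exchange for the speedup.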
> - better table analysis with ability to substract successive elements
I've never seen a single case where a simple delta was the best solution,
actually. Sometimes just a proper 2D context is enough, sometimes it's
good to subtract an extrapolation from some previous lines, but a
simple delta was never the best.
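To illustrate the difference between the three options on a 2D table, here is a toy example. The table is hypothetical, built so that every column grows linearly from row to row. The number of distinct residual values serves only as a crude proxy for how hard the residuals would be to model. The example is constructed to favor extrapolation, so it illustrates the argument rather than proving it.

```python
def table(rows, cols):
    """Hypothetical data: each column grows linearly from row to row."""
    return [[(c + 1) * r + c * c for c in range(cols)] for r in range(rows)]


def residuals_delta(t):
    """Simple delta: each value minus its left neighbour in the same row."""
    return [row[c] - row[c - 1] for row in t for c in range(1, len(row))]


def residuals_prev_row(t):
    """Subtract the value directly above (a plain 2D context)."""
    return [t[r][c] - t[r - 1][c] for r in range(1, len(t)) for c in range(len(t[r]))]


def residuals_extrapolated(t):
    """Subtract a linear extrapolation from the two previous rows: 2*a - b."""
    return [t[r][c] - (2 * t[r - 1][c] - t[r - 2][c])
            for r in range(2, len(t)) for c in range(len(t[r]))]


def distinct(values):
    return len(set(values))


T = table(6, 4)
print(distinct(residuals_delta(T)),
      distinct(residuals_prev_row(T)),
      distinct(residuals_extrapolated(T)))  # 10 4 1
```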
Btw, I'm currently thinking about a design with dynamic symbol ranking
for unary coding. Well, typical solution is to sort symbols by context
order, then by MtF rank (aka number of different symbols since last
occurrence). And it works much better than seemingly "more valid"
solutions like descending probability order. But anyway, these are
empirical methods without any theoretical foundation.
Well, it might seem unrelated, but it's actually the same thing as with
deltas in 2D table encoding. I think it's obvious if you consider unary
encoding - delta gives rank 0 to the same value as the previous one, etc.
So, my current opinion is that rank0 should be assigned to the symbol
with max codelength in some context (descending codelength order).
Higher (=lesser) rank usually means more complex model and more precise
estimation in unary schemes, so that's why.
Any suggestions?
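Here is a minimal sketch of the ranking schemes discussed above. It computes MtF ranks, which for a previously seen symbol equal the number of distinct symbols since its last occurrence. It also shows the unary codes for those ranks and the proposed "descending codelength" order, where rank 0 goes to the least probable symbol. The function names and the probability table are hypothetical. The context-order part of the sort is left out.

```python
import math


def mtf_ranks(seq, alphabet):
    """Rank of each symbol in a move-to-front list. For a symbol seen before,
    this equals the number of distinct symbols since its last occurrence."""
    order = list(alphabet)
    ranks = []
    for s in seq:
        r = order.index(s)
        ranks.append(r)
        order.insert(0, order.pop(r))  # move the symbol to the front
    return ranks


def unary(rank):
    """Unary code: `rank` ones followed by a terminating zero."""
    return "1" * rank + "0"


def rank_by_codelength(probs):
    """Rank 0 goes to the symbol with the longest codelength (-log2 p)."""
    return sorted(probs, key=lambda s: -math.log2(probs[s]), reverse=True)


ranks = mtf_ranks("abba", "ab")
print(ranks, "".join(unary(r) for r in ranks))  # [0, 1, 0, 1] 010010
```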
> - segmentation (it seriously improves compression for durilca - look at MFC
> results with and without -t1 switch). http://www.compression.ru/ds/seg_file.rar
> is published by DS, durilca's author - i don't know whether it is technique
> actually used there
<div class="jscript"><pre>
-t1 - segmentation of file (default: (5,(0-3)));
-t1[(k[,(b1-t1)[,...]])]
k - precision of segmentation;
(b1-t1) - apply trick number t1 to block number b1;
(0-t1) - apply trick number t1 to all blocks;
</pre></div>
Actually, the first durilca releases even had a hidden "-l" switch
for dumping these segments into separate files... With it I made
a brute-force optimizer of durilca parameters for Werner.
Do you have a Linux version available yet?
I love compressing my large syslog files with CCMX, since I can use stdin and stdout now, and since it's text I get rather high ratios.
I'd love the speed gain to be available on Linux too.
@l1t
It depends on the data - for some files the ratio is "better". Anyway, in my opinion the speed increase more than justifies the very tiny compression loss.
@Bulat Ziganshin
Being modest myself, I think I could live with that.
@Bulat Ziganshin
It is possible, but this would be too much work. CCM isn't as construction-kit-like as PAQ, where you just plug in a new model and you're set.
@Bulat Ziganshin
Currently, I don't do any table analysis. Is it really worth it? But all your suggestions are already on my to-do list. Honestly, I rarely work on compression stuff these days. CCM means spare time and fun - when I look at that to-do list, it's not fun anymore. So I just put it aside.
@Jeroen
Sorry, I first want to sort out the compiler-related speed anomalies on Windows. Maybe the next version will get a Linux build again.
Christian, please make Unix versions too. Then all your users will be happy, and CCM will be useful for transferring data between systems.
Here is 1.30a, which is fully compatible with 1.30. I just used different compiler switches. On my system both versions are equally fast, but on Stephan's Opteron 1.30a is ~10% faster. So I recommend the new version.
If anyone has some time to spare feel free to compare both versions for speed and post your results (speed, system, # of runs).
download CCM 1.30a
About the linux version - I'll put something together in the next days.
Thanks Chris!
Mirror: Download
I have tried version 1.30a, and on my Intel Core 2 Duo E6600 it runs at about the same speed as the previous version! Still, it remains the "Monster of compression!"
Surprisingly CCM 1.30a didn't perform much differently. Out of curiosity, what compiler switches are you using?
Test System: AMD Athlon 64 3000+ (Venice, single core) with 2 GB Corsair DDR CL2 RAM
ccm130->ccm.exe 3 enwik8 (22351098 Bytes) - 1 m 24 s -
ccm130->ccm.exe d enwik8 - 1 m 25 s -
ccm130a->ccm.exe 3 enwik8 (22351098 Bytes) - 1 m 24 s - No Change
ccm130a->ccm.exe d enwik8 - 1 m 25 s - No Change
-----------------
ccm130->ccmx.exe 3 enwik8 (21646059 Bytes) - 1 m 51 s -
ccm130->ccmx.exe d enwik8 - 1 m 56 s -
ccm130a->ccmx.exe 3 enwik8 (21646059 Bytes) - 1 m 51 s - No Change
ccm130a->ccmx.exe d enwik8 - 1 m 54 s - decompression 1.7% faster
The best compressor in the world, for me!
MONSTER OF COMPRESSION
Input size: 306.189.145 bytes
                  Time comp.   Time dec.   Compressed size
1) CCMX 1.30a     343,695      344,236     141.140.962
2) CCM 1.30a      324,527      323,926     141.572.117
3) LPAQ8          787,837      794,032     141.591.047
Thanks Francesco and Hahobas.
@Hahobas
I'm using these switches: "-fomit-frame-pointer -fexpensive-optimizations -O3 -mmmx -s"
Before, I also used "-march=pentiumpro". It turned out that without it, CCM is 500 seconds faster on Stephan's system (5000 vs. 5500 seconds). For you and me it obviously doesn't make a difference. Any ideas?
What version of GCC are you using?
Hi Christian. Fantastic new release. Thanks for sharing your skill with us even when working on it isn't fun.
It's absolutely worth more than a good ratio improvement would be. The bigger file size is negligible. Keep going the way you are in developing a superb fast CM. Please don't improve compression at the cost of speed (the other direction is fine :-P).
@Hahobas
Last time I checked, it was 3.4.2.
@Simon Berger
Thanks! Substantial compression improvements are unlikely to happen until I do a complete rewrite. As for further speed improvements - if I simplify the algorithm even further, there'll only be NOOP methods left.
For Christian, "King of Compression":
For more speed (+10-20%):
download: gcc-4.1.2-mingw installer
download: Code::Blocks
Thanks a lot for the new version, Christian!
@Nania Francesco Antonio
It depends... I don't have any experience myself, but tests on a media codec pack forum have shown that GCC 4 binaries are 1-2% slower (and smaller) than those produced by GCC 3.
all tests run with default settings
no time information (sorry)
redline.tar - 363 MB (381,501,952 bytes)
redline.tar.ccm126 - 190 MB (199,526,883 bytes)
redline.tar.ccmx126 - 190 MB (199,344,496 bytes)
redline.tar.ccm130 - 190 MB (199,316,296 bytes)
redline.tar.ccmx130 - 189 MB (199,131,526 bytes)
redline.tar.ccm130a - 190 MB (199,316,296 bytes)
redline.tar.ccmx130a - 189 MB (199,131,526 bytes)
So I see a compression efficiency gain in version 1.30.
System
Core 2 Quad Q6600 overclocked to 3.2 GHz (8x400)
4x1 GB CAS4 800 MHz memory (1:1)
Windows Vista Ultimate 64-bit
ccmx 7 compression is slightly worse on enwik8, slightly better on enwik9. Speed is better.
http://cs.fit.edu/~mmahoney/compression/text.html#1741
There's really no one answer as to which GCC version is the best for performance. If the source code were available I'd be happy to mess around with different compiler options.
I'm no expert, but I thought I'd play with the switches for Matt's fpaqc.cpp for kicks and share the results.
I started out with the switches you use for CCM. I was able to get about 27% better performance using GCC 4.1.2-33 instead of GCC 3.4, so I would definitely try GCC 4.1.2 or newer with CCM.
Next, I tried fiddling with the switches to get even more performance out of fpaqc. By adding the following switches I was able to squeeze about 8% more speed out of it.
-floop-optimize2 -mtune=prescott
As I understand it, using mtune still maintains backwards compatibility with older processors, but just optimizes for a particular platform. I also found it strange that -mtune=prescott (a flavor of Pentium 4) gave me slightly better performance than -mtune=athlon64 (which is what my processor is).
Later, for curiosity's sake, I tried 64-bit compiles and was able to get an additional 8% performance. So if you ever feel generous, you could provide us with 64-bit binaries.
But again, this was not with CCM, so who knows what the best all around compiler options for everyone are.
@toffer
This is quite a surprising discovery by toffer. Is there any code in CCM that could be suffering from GCC's poor MMX/SSE support? I suppose one easy way to find out is to compile it with Intel's compiler and compare.
Hi again!
Sorry for the late reply. So, finally I have some fresh Linux binaries for you. I included binaries produced with GCC 3.4.6 and 4.1.2.
CCM 1.30a (win32+linux32)
I'm quite sure that the ones created with 4.1.2 are slower, but now you can try for yourselves. There won't be any Intel binaries; I do not want to spend my time on compiler-related issues and trials.
The Windows binaries are untouched. GCC 4.1.2 is much slower for me - at least with CCM.
@Nania Francesco Antonio
I tried Code::Blocks - it's really nice. Thanks for the tip.
@Hahobas
Frankly, I'm sure that GCC is messing up some code, but there isn't any code like the one posted. Anyway, I'd rather look into changes and improvements to the algorithm than tweak code.
I wonder how you do weighting without a dot product?! (or some vector math)
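For context on the question: in PAQ-style context mixing, each model's bit probability is stretched (logit), the results are combined as a weighted sum (a dot product), and the sum is squashed back into a probability. The weights are then updated online from the prediction error. Below is a minimal single-layer sketch. The initial weight of 0.3 and learning rate of 0.02 are hypothetical. How CCM actually does its weighting is not described in this thread.

```python
import math


def stretch(p):
    """Logit: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1 - p))


def squash(x):
    """Logistic function, the inverse of stretch."""
    return 1 / (1 + math.exp(-x))


class Mixer:
    """Hypothetical PAQ-style logistic mixer (not CCM's code):
    p = squash(dot(weights, stretch(inputs)))."""

    def __init__(self, n, lr=0.02):
        self.w = [0.3] * n
        self.lr = lr
        self.x = [0.0] * n
        self.p = 0.5

    def mix(self, probs):
        self.x = [stretch(p) for p in probs]
        dot = sum(w * x for w, x in zip(self.w, self.x))
        self.p = squash(dot)
        return self.p

    def update(self, bit):
        # Gradient step on coding cost: w_i += lr * (bit - p) * x_i
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]


m = Mixer(2)
before = m.mix([0.9, 0.5])
for _ in range(200):
    m.mix([0.9, 0.5])
    m.update(1)
print(before < m.mix([0.9, 0.5]))  # True: the confident model gained weight
```

An input of 0.5 stretches to 0, so it contributes nothing to the dot product and its weight never changes.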