http://cs.fit.edu/~mmahoney/compression/text.html#1796
Compression is worse, though.
Thanks Matt! I guess it's because the context model only allocates around 100 MB, compared to around 320 MB for CMM3. Tomorrow I'll upload a version with switches for memory usage.
BTW: Did you get any good results combining LZP (as in paq9a) with context mixing? I tried several variants with large speed gains but poor compression, so I decided to drop the LZP layer.
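For context on the LZP question: in its generic form, LZP takes the last few bytes as a context, looks up where that context last occurred, and predicts that the byte which followed it then will follow again. In an LZP + CM design a correctly predicted byte can be coded cheaply, which is where the speed gain can come from. The sketch below is a minimal Python illustration of that generic predictor only, assuming an order-4 context and a dict in place of a fixed-size hash table; it is not the LZP stage of paq9a or of any CMM version.

```python
def lzp_predictions(data: bytes, order: int = 4):
    """Yield (index, predicted byte or None) for every byte of `data`.

    Generic LZP (illustrative assumption, not paq9a/CMM code): the preceding
    `order` bytes form the context; the table maps each context to the index
    of the byte that followed its most recent occurrence. A dict stands in
    for the fixed-size hash table a real compressor would use.
    """
    table = {}
    for i in range(len(data)):
        predicted = None
        if i >= order:
            ctx = data[i - order:i]
            j = table.get(ctx)
            if j is not None:
                predicted = data[j]  # byte that followed this context last time
            # Overwrite the pointer: only the most recent occurrence is kept.
            table[ctx] = i
        yield i, predicted


def lzp_hit_rate(data: bytes, order: int = 4) -> float:
    """Fraction of predicted bytes that turned out to be correct."""
    hits = tried = 0
    for i, predicted in lzp_predictions(data, order):
        if predicted is not None:
            tried += 1
            hits += predicted == data[i]
    return hits / tried if tried else 0.0


print(lzp_hit_rate(b"abcdeabcdeabcde"))  # 1.0
```

On highly repetitive input nearly every prediction hits; on varied data misses are frequent, and each miss still has to be coded in full.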
M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
Another test...
Test machine: Intel PIII (Coppermine) @750 MHz, 512 MB RAM, Windows 2000 Pro SP4
Test File: ENWIK9 (1,000,000,000 bytes)
Timed with AcuTimer v1.2
ENWIK9 > 186,395,591 bytes
Elapsed Time: 01:47:57.718 (6477.718 Seconds)
OK, I managed to get it done today, not tomorrow (in fact, it is tomorrow now). I'm going to bed; here's an updated version with some small tweaks:
http://freenet-homepage.de/toffer_86/cmm4.exe
When compressing single files with a script or something similar, set the sliding window to the size of the largest file (unless that is a few hundred MB). Maybe use at most 128 MB, since the match model's pointers are replaced quickly (as in LZP). The most important thing is the context model's memory.
Please compare against the previous version using these switches: cmm4 43 input output
4: 2^4 MB = 16 MB sliding window; 3: 3*2^5 MB = 96 MB context model. In addition, a few MB are allocated for some tables, SSE, etc.
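Going by the example above, the first digit n appears to select a 2^n MB sliding window and the second digit m a 3*2^(m+2) MB context model. The context-model formula is extrapolated from the single data point m = 3 (96 MB), so treat it as an assumption; the hypothetical helper below only makes that arithmetic explicit and ignores the few extra MB for tables and SSE.

```python
def cmm4_memory_mb(switch: str):
    """Return (window_mb, context_model_mb) for a two-digit switch like "43".

    Assumed mapping, inferred from the "cmm4 43" example:
      window        = 2**n MB          (n = first digit)
      context model = 3 * 2**(m+2) MB  (m = second digit)
    Extra tables (SSE etc.) are not included.
    """
    if len(switch) != 2 or not switch.isdigit():
        raise ValueError("expected two digits, e.g. '43'")
    n, m = int(switch[0]), int(switch[1])
    return 2 ** n, 3 * 2 ** (m + 2)


for s in ("43", "55"):
    window, cm = cmm4_memory_mb(s)
    print(f"cmm4 {s}: {window} MB window + {cm} MB context model = {window + cm} MB")
```

Under this assumption, option 66 would need about 64 + 768 = 832 MB, which is consistent with the disk thrashing reported on a 512 MB machine later in the thread.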
Have fun & GN8!
Thanks.
Tested with parameter 43:
ENWIK8: 21,384,445
ENWIK9: 185,530,891
Thanks toffer!
Quick test...
Setting: 43
A10.jpg > 831,031
AcroRd32.exe > 1,333,257
english.dic > 472,348
FlashMX.pdf > 3,658,288
FP.LOG > 472,333
MSO97.DLL > 1,698,384
ohs.doc > 758,295
rafale.bmp > 746,153
vcfiu.hlp > 527,797
world95.txt > 473,466
Total = 10,971,352 bytes
Setting: 45
A10.jpg > 830,974
AcroRd32.exe > 1,332,505
english.dic > 472,135
FlashMX.pdf > 3,657,259
FP.LOG > 472,346
MSO97.DLL > 1,694,892
ohs.doc > 757,782
rafale.bmp > 745,756
vcfiu.hlp > 527,850
world95.txt > 472,950
Total = 10,964,449 bytes
Setting: 54
A10.jpg > 830,983
AcroRd32.exe > 1,332,812
english.dic > 472,154
FlashMX.pdf > 3,657,623
FP.LOG > 472,349
MSO97.DLL > 1,695,769
ohs.doc > 757,926
rafale.bmp > 745,803
vcfiu.hlp > 527,818
world95.txt > 473,094
Total = 10,966,331 bytes
Setting: 55
A10.jpg > 830,974
AcroRd32.exe > 1,332,505
english.dic > 472,135
FlashMX.pdf > 3,657,259
FP.LOG > 472,341
MSO97.DLL > 1,694,892
ohs.doc > 757,782
rafale.bmp > 745,756
vcfiu.hlp > 527,850
world95.txt > 472,950
Total = 10,964,444 bytes
Here's another quick release: I changed some SSE contexts. It's slightly slower, and compression has improved.
http://freenet-homepage.de/toffer_86/cmm4_080315_01a.7z
Could anyone try options 7<maximum possible> on enwik9? I downloaded enwik9, but my machine only has 1 GB of RAM.
And Nania, what about your MOC? I'm quite sure there'll be a visible improvement, especially when using more memory.
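SSE (secondary symbol estimation, called an APM in the PAQ family) takes the model's bit probability, looks it up again in a small table selected by an extra context, interpolates between the two nearest buckets, and adapts those buckets toward the coded bit. The Python sketch below is a generic floating-point illustration assuming 33 buckets over the stretch (logit) range [-8, 8] and a fixed adaptation rate of 0.02; the thread does not describe which SSE contexts CMM4 uses.

```python
import math


def stretch(p: float) -> float:
    """Logit: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch (logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))


class SSE:
    """Generic SSE/APM sketch (assumed parameters): per-context table over stretch(p)."""

    def __init__(self, num_contexts: int, buckets: int = 33, rate: float = 0.02):
        self.buckets = buckets
        self.rate = rate
        # Each row starts as (approximately) the identity mapping over stretch range [-8, 8].
        row = [squash(-8.0 + 16.0 * i / (buckets - 1)) for i in range(buckets)]
        self.table = [list(row) for _ in range(num_contexts)]
        self._last = None  # (context, lower bucket, interpolation weight)

    def refine(self, p: float, ctx: int) -> float:
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep stretch() finite
        x = min(max(stretch(p), -8.0), 8.0)
        pos = (x + 8.0) * (self.buckets - 1) / 16.0
        lo = min(int(pos), self.buckets - 2)
        w = pos - lo
        self._last = (ctx, lo, w)
        row = self.table[ctx]
        return (1.0 - w) * row[lo] + w * row[lo + 1]

    def update(self, bit: int) -> None:
        ctx, lo, w = self._last
        row = self.table[ctx]
        # Pull both neighbouring buckets toward the coded bit, weighted by closeness.
        row[lo] += self.rate * (1.0 - w) * (bit - row[lo])
        row[lo + 1] += self.rate * w * (bit - row[lo + 1])


sse = SSE(num_contexts=256)
for _ in range(100):
    sse.refine(0.5, ctx=7)  # the model keeps saying 50%...
    sse.update(1)           # ...but in context 7 the bit is always 1
print(round(sse.refine(0.5, ctx=7), 3), sse.refine(0.5, ctx=8))  # 0.934 0.5
```

Each extra SSE context multiplies the table size and adds a lookup per bit, which is one reason richer SSE contexts tend to cost some speed.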
Thanks toffer!
Quick test...
Setting: 43
A10.jpg > 830,969
AcroRd32.exe > 1,332,680
english.dic > 455,473
FlashMX.pdf > 3,657,793
FP.LOG > 456,548
MSO97.DLL > 1,698,323
ohs.doc > 757,037
rafale.bmp > 746,250
vcfiu.hlp > 528,184
world95.txt > 471,813
Total = 10,935,070 bytes
Setting: 45
A10.jpg > 830,898
AcroRd32.exe > 1,331,679
english.dic > 454,945
FlashMX.pdf > 3,656,703
FP.LOG > 456,506
MSO97.DLL > 1,694,552
ohs.doc > 756,439
rafale.bmp > 745,795
vcfiu.hlp > 528,164
world95.txt > 471,210
Total = 10,926,891 bytes
Setting: 54
A10.jpg > 830,918
AcroRd32.exe > 1,332,159
english.dic > 455,049
FlashMX.pdf > 3,657,080
FP.LOG > 456,518
MSO97.DLL > 1,695,568
ohs.doc > 756,617
rafale.bmp > 745,876
vcfiu.hlp > 528,166
world95.txt > 471,393
Total = 10,929,344 bytes
Setting: 55
A10.jpg > 830,898
AcroRd32.exe > 1,331,679
english.dic > 454,945
FlashMX.pdf > 3,656,703
FP.LOG > 456,493
MSO97.DLL > 1,694,552
ohs.doc > 756,439
rafale.bmp > 745,795
vcfiu.hlp > 528,164
world95.txt > 471,210
Total = 10,926,878 bytes
Intel Core 2 Duo E6600
SFC Test
Option 11 -> 11,020,842 bytes, comp. 65.030 s, dec. 67.405 s
Option 22 -> 10,984,616 bytes, comp. 61.893 s, dec. 67.831 s
Option 33 -> 10,971,419 bytes, comp. 62.827 s, dec. 68.517 s
Option 43 -> test failed (A10.jpg)
Could you upload the compressed and corrupted A10.jpg? I just tested it on my machine and it works.
@Nania Francesco Antonio: In what way did it fail?
The version I tested is CMM4 v0.0, not the latest one! The test program I use is automatic and compares the files for equivalence (content and length); it reported an error while verifying A10.jpg (only with option 43). I can't manage to download the latest CMM4 from the site!
@Nania Francesco Antonio: Try downloading from here...
http://rapidshare.com/files/99746159...15_01a.7z.html
With some severe disk thrashing, I managed to squeeze out one more test...
Setting: 66
A10.jpg > 830,894
AcroRd32.exe > 1,331,418
english.dic > 454,920
FlashMX.pdf > 3,656,513
FP.LOG > 456,499
MSO97.DLL > 1,694,242
ohs.doc > 756,351
rafale.bmp > 745,796
vcfiu.hlp > 528,131
world95.txt > 471,130
Total = 10,925,894 bytes
What a fantastic little compressor!
toffer
Please make it so that the next version of CMM defaults to 43 if settings are omitted from the command line.
@LovePimple: Setting the sliding window to 6 doesn't improve compression. The largest file is fp.log, at about 20 MB, so 32 MB (option 5) is enough.
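The rule of thumb from this thread (pick the smallest window that holds the largest file, but no more than about 128 MB) fits in a tiny helper. It assumes option n selects a 2^n MB window, as in the cmm4 43 example; window_option is a hypothetical name, not part of CMM4.

```python
MB = 1 << 20


def window_option(largest_file_bytes: int, max_option: int = 7) -> int:
    """Smallest n with a 2**n MB window >= the largest file, capped at max_option.

    Assumes option n selects a 2**n MB sliding window (cmm4 43 -> 16 MB);
    the default cap of 7 (128 MB) follows the 'at most 128 MB' advice.
    """
    n = 0
    while (MB << n) < largest_file_bytes and n < max_option:
        n += 1
    return n


print(window_option(20 * MB))   # 5  (about 20 MB, like fp.log -> 32 MB window)
print(window_option(500 * MB))  # 7  (capped at 128 MB)
```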
@LovePimple: OK, I'll fix this tomorrow.
@Nania Francesco Antonio: Hmm, on my computer it worked. I once had problems like this, and the reason was uninitialized memory: on some platforms, memory allocated with new always seems to be zeroed out. I'll check it.
But v 0.1a worked?
Thanks for the tests! Please only test the newest version. I'll see if I can find any memory errors like the possible one mentioned above.
@toffer: OK!
@toffer: Thanks!
SFC Test
CMM4, latest version
Option 43 -> 10,935,070 bytes, comp. 66.533 s, dec. 72.111 s (ok)
Option 55 -> 10,926,878 bytes, comp. 73.456 s, dec. 78.500 s (ok)
ENWIK8 Test
Ratio: 20958794/100000000 bytes (1.68 bpc)
Global Time (Timer) = 132.687 = 00:02:12.687
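The bpc figure is the compressed size in bits divided by the original size in bytes. A one-line Python check against the ENWIK8 result above (bits_per_char is just an illustrative name):

```python
def bits_per_char(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed size in bits divided by the number of input bytes."""
    return 8.0 * compressed_bytes / original_bytes


print(f"{bits_per_char(20_958_794, 100_000_000):.2f} bpc")  # 1.68 bpc
```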
Test File: VALLEY.CMB (19,776,230 bytes)
Settings: 25
Compressed Size: 8,250,786 bytes
Settings: 35
Compressed Size: 8,246,163 bytes
Settings: 45
Compressed Size: 8,243,414 bytes
Settings: 55
Compressed Size: 8,243,231 bytes
MOC Test result:
145,935,796 bytes, comp. 644.345 s, dec. 684.143 s (test ok)
Thanks again for testing. I haven't found any bugs concerning memory allocation/initialization, but I haven't been able to reproduce the failure either. I tested lots of files chosen randomly from my program partition.
I tried a small file (about 200 bytes) and a big one (just under 2 GB), with no success: "cmm4.exe has generated errors and will be closed by Windows" at around the 1.5 GB point (options 11 and 47).
VALLEY.CMB
option 47 -> 8,212,941 bytes
toffer
I'm looking forward to the next release of your awesome compressor.
Hi toffer,
I wonder whether you implement a momentum term in your NN? I've done several tests with a momentum term, and it seems useful: the gain is about 20 KB on the Calgary corpus. I believe this can be improved by choosing the right values. I chose a learning rate of 15 and a momentum coefficient of 3, and I've only done a few tests with different values. I will try a brute-force approach on valley.cmb. But first I have a really hard exam this Sunday, which includes advanced English questions. If I can't pass this exam, I won't be able to study for a master's degree at the university.
BIT Archiver homepage: www.osmanturan.com
@Zonder: Could you upload such a file (one of the smaller ones, of course)? I haven't encountered any problems. Big files (>2 GB) won't work. This is a glibc issue; I might switch to fopen64 if needed, but at the moment this compressor is experimental and I don't see a reason to do so.
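For the 2 GB limit: a signed 32-bit file offset (for example a 32-bit long used with fseek/ftell) can address at most 2^31 - 1 bytes, while large-file interfaces such as fopen64 use 64-bit offsets. Assuming that is the limit meant here, the Python sketch below only illustrates the arithmetic, emulating how such a 32-bit variable wraps around on typical two's-complement platforms; as_int32 is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold


def as_int32(value: int) -> int:
    """Value a signed 32-bit variable would hold after storing `value`
    (two's-complement wrap-around, emulated in Python)."""
    return (value + 2**31) % 2**32 - 2**31


print(as_int32(INT32_MAX))      # 2147483647 (last reachable byte offset)
print(as_int32(INT32_MAX + 1))  # -2147483648 (one byte past 2 GiB wraps negative)
```

The ~1.5 GB crash point reported above is still below 2^31 bytes, so this limit alone would not explain that crash.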
@osmanturan
No, I haven't done it yet. I spent some time tuning the SSE contexts while preserving speed. This will be a point for improvement in the future. Maybe one should add bias neuron(s) too?
A learning rate of 15? That is far too much! Learning rates are usually kept below 1, somewhere around 0.1 to 0.01.
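To make the learning-rate and momentum discussion concrete, here is a minimal Python sketch of a logistic mixer (a single-layer network over stretched model probabilities, as used in PAQ-style context mixing) whose weight update includes a momentum term and an optional constant bias input. The class name and the values lr=0.02 and momentum=0.3 are illustrative assumptions in plain floating point, not the settings or code of CMM4 or BIT, whose scaling the thread does not describe.

```python
import math


def stretch(p: float) -> float:
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class MomentumMixer:
    """Single-layer logistic mixer trained online with a momentum term (sketch).

    p = squash(sum_i w_i * x_i) with x_i = stretch(p_i). After each bit:
        delta_i = lr * (bit - p) * x_i + momentum * delta_i(previous)
        w_i    += delta_i
    (bit - p) * x_i is the negative gradient of the coding cost -log P(bit).
    With use_bias=True a constant input 1.0 is appended (a bias neuron).
    """

    def __init__(self, n_inputs: int, lr: float = 0.02, momentum: float = 0.3,
                 use_bias: bool = True):
        self.use_bias = use_bias
        self.w = [0.3] * n_inputs + ([0.0] if use_bias else [])
        self.delta = [0.0] * len(self.w)
        self.lr = lr
        self.momentum = momentum
        self.x = []
        self.p = 0.5

    def mix(self, probs):
        clamped = (min(max(p, 1e-6), 1.0 - 1e-6) for p in probs)
        self.x = [stretch(p) for p in clamped]
        if self.use_bias:
            self.x.append(1.0)
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit: int) -> None:
        err = bit - self.p
        for i, x in enumerate(self.x):
            self.delta[i] = self.lr * err * x + self.momentum * self.delta[i]
            self.w[i] += self.delta[i]


mixer = MomentumMixer(n_inputs=2)
before = mixer.mix([0.9, 0.5])
for _ in range(200):
    mixer.mix([0.9, 0.5])  # model 0 is informative, model 1 is not
    mixer.update(1)
print(before < mixer.mix([0.9, 0.5]))  # True
```

Momentum keeps a decaying sum of past updates, so weights keep moving in a consistent direction between bits; with a rate that is too large the same accumulation makes the weights overshoot, which is one reason to keep learning rates well below 1.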
Don't use valley.cmb. If I remember correctly, this file consists of several data types arranged sequentially, so your results will be biased by the last "data type" in the sequence. Use several different file types and select a value that works reasonably well across all of them.
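Selecting a value over several file types can be as simple as minimising the total compressed size over a file set. The sketch below (best_setting is a hypothetical helper) uses four of the per-file sizes LovePimple posted for settings 43, 45 and 54 of the first cmm4 build, with the memory setting standing in for a tunable model parameter. Tuned on FP.LOG alone it would pick 43, while the total over the four files favours 45: a small-scale version of the single-source bias described above.

```python
# Compressed sizes in bytes from the first cmm4 SFC run posted earlier in this thread.
RESULTS = {
    "43": {"FP.LOG": 472_333, "english.dic": 472_348, "MSO97.DLL": 1_698_384, "world95.txt": 473_466},
    "45": {"FP.LOG": 472_346, "english.dic": 472_135, "MSO97.DLL": 1_694_892, "world95.txt": 472_950},
    "54": {"FP.LOG": 472_349, "english.dic": 472_154, "MSO97.DLL": 1_695_769, "world95.txt": 473_094},
}


def best_setting(results, files=None):
    """Return the setting with the smallest total compressed size over `files`
    (all files of each setting when `files` is not given)."""
    def total(setting):
        sizes = results[setting]
        return sum(sizes[f] for f in (files or sizes))
    return min(results, key=total)


print(best_setting(RESULTS))              # 45 (smallest total)
print(best_setting(RESULTS, ["FP.LOG"]))  # 43 (FP.LOG alone picks a different value)
```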
Good luck for your exam!
I'll release a new version this weekend; I've got some improvements.
And, of course, thanks again for your feedback!
0.1a crashes on ALL files smaller than ~5 KB.
And I said that it crashes on big files smaller than 2 GB ("<2gb").
The Dr. Watson log says "Exception number: c0000094 (divide by zero)".
http://rapidshare.com/files/101100280/drwtsn32.log.html (it's for a small 4 KB file)
CMM crashed for Sportsman too:
http://www.encode.ru/forums/index.ph...page=1#msg9132