I found an interesting test set.
Open Office documents are zips containing several files, mostly XML. I wanted to know whether zip is a good tool for the task, and the results turned out to be quite interesting.
Data: a spreadsheet of archiver test results with some charts. Original size: 3 185 340 B. Zipped by OO: 210 481 B. Very redundant...
Test set is small, so timing is not very accurate, especially with fast compressors.
I can't measure how fast OO compresses the file. I could use a stopwatch, but it would be very inaccurate. Also, generating the XML files adds to the saving time and I have no idea by how much. The closest thing I could do was to create a zip with 7z.
Code:
Archiver            Size (B)  Time (s)
7z -tzip -mx=1        226850     0.187
7z -tzip -mx=3        226850     0.203
7z -tzip -mx=5        216749     0.640
7z -tzip -mx=7        208253     1.203
7z -tzip -mx=9        195038     5.437

OO seems to use something close to 7z -mx7, but a bit weaker. It takes probably 0.8-1s.
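For anyone who wants to repeat a level sweep like this without 7z, here is a minimal sketch using Python's zlib. It is the same Deflate family that zip uses, but not 7z's encoder, so sizes and times will not match the table. The function name `sweep_levels` and the sample data are made up for illustration.

```python
import time
import zlib


def sweep_levels(data: bytes, levels=(1, 3, 5, 7, 9)):
    """Compress `data` at each Deflate level; return (level, size, seconds) rows."""
    rows = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Round-trip check so a broken run cannot report a bogus size.
        assert zlib.decompress(packed) == data
        rows.append((level, len(packed), elapsed))
    return rows


# Hypothetical stand-in for the spreadsheet's XML: highly repetitive markup.
sample = b"<table:table-cell office:value='1.203'/>\n" * 20000
for level, size, seconds in sweep_levels(sample):
    print(f"level {level}: {size} B in {seconds:.3f} s")
```

As in the post, a single run on a small input gives noisy timings; repeating the sweep and taking the minimum per level would be more trustworthy.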
How good can zip be? I tried kzip, it claims to generate zips smaller than PKZIP by 1-3%.
Code:
Archiver            Size (B)  Time (s)
kzip /s0 /b0          189783    42.516
kzip /s0 /b128        194807    53.640
kzip /s0 /b256        189461    57.406
kzip /s0 /b512        187122    50.406
kzip /s0 /b1024       187816    46.641
kzip /s1 /b0          190463    33.406
kzip /s1 /b128        195240    57.562
kzip /s1 /b256        189862    43.656
kzip /s1 /b512        187444    42.593
kzip /s1 /b1024       188183    38.032
kzip /s2 /b0          319485     0.688
kzip /s2 /b128        325280     1.937
kzip /s2 /b256        319849     1.593
kzip /s2 /b512        317122     1.281
kzip /s2 /b1024       317677     1.125
kzip /s3 /b0         1787918     0.578
kzip /s3 /b128       1763812     1.812
kzip /s3 /b256       1771326     1.468
kzip /s3 /b512       1779670     1.156
kzip /s3 /b1024      1782342     1.015

Very slow, but the size got down to 187122 B, 11% smaller than the zip made by OO and 4% smaller than the best 7zip result. Very small.
Now other compressors... The best results:
Code:
Archiver            Size (B)  Time (s)
FastLZ opt -2         370534     0.015
FastLZ -2             365344     0.016
quick -0              311703     0.031
slug                  198995     0.046
NanoZip -cd           186876     0.093
4x4 1t                123047     0.171
4x4 2t                122051     0.234
4x4 4t                121034     0.296
FreeArc -m4 -ms       117940     0.875
FreeArc -m5 -ms       116319     1.421
FreeArc -m5           115312     1.437
FreeArc -m7           115305     1.453
CCM 0                 108787     1.578
CCM 1                 108579     1.609
CCM 2                 108481     1.735
CCM 3                 108433     1.860
CCMX 0                106744     2.031
CCMX 1                106225     2.094
CCMX 2                105849     2.250
CCMX 3                105634     2.437
FreeArc -max -ms       95661     2.515
FreeArc -max           94654     2.531
FreeArc -max -ma-      86912     3.734
NanoZip -cc            84755    16.297
PAQ8p -1               83124    66.625
PAQ8p -2               81566    67.578
PAQ8p -3               81040    68.375
PAQ8p -4               51999   499.015
PAQ8p -5               50780   501.672
PAQ8p -6               50046   513.891
PAQ8p -7               49870   538.828

FastLZ needs just 0.015s, that's over 200 MB/s. IO is definitely cached by OS.
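The 200 MB/s figure for FastLZ is simply the original size divided by the measured time. A small sketch of that arithmetic follows; the helper name `throughput_mb_s` is mine, and MB here means 10^6 bytes.

```python
ORIGINAL_SIZE = 3_185_340  # bytes, the uncompressed spreadsheet from the post


def throughput_mb_s(seconds: float, size_bytes: int = ORIGINAL_SIZE) -> float:
    """Input bytes processed per second, in units of 10**6 bytes."""
    return size_bytes / seconds / 1e6


print(f"FastLZ opt -2: {throughput_mb_s(0.015):.0f} MB/s")
print(f"PAQ8p -7:      {throughput_mb_s(538.828) * 1000:.1f} KB/s")
```

With times this short, a change of a few milliseconds moves the result a lot, which is the timing-accuracy caveat from the start of the post.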
Slug makes it smaller than OO while being 15 times faster.
4x4 1t almost halves the OO result and is 5 times faster (!!!).
Then there's nothing really interesting until PAQ8p -3/4... I tested several times; there were no memory issues, -4 really takes that long, and it does decompress its output. I tried to investigate: it seems to get consistent gains on almost all files and is always equally slow.
But there's another thing. Let's calculate efficiency as maximumcompression.com does:
Code:
Archiver            Size (B)  Time (s)  Efficiency(maximumcompression.com)
FastLZ opt -2         370534     0.015  340654353845826000.0
FastLZ -2             365344     0.016  176627773533093000.0
quick -0              311703     0.031  197866418022820.0
slug                  198995     0.046  46172317.6
NanoZip -cd           186876     0.093  17320813.6
4x4 1t                123047     0.171  4468.6
4x4 2t                122051     0.234  5324.4
4x4 4t                121034     0.296  5847.4
FreeArc -m4 -ms       117940     0.875  11243.8
FreeArc -m5 -ms       116319     1.421  14576.4
FreeArc -m5           115312     1.437  12815.3
FreeArc -m7           115305     1.453  12945.4
CCM 0                 108787     1.578  5682.1
CCM 1                 108579     1.609  5628.6
CCM 2                 108481     1.735  5987.3
CCM 3                 108433     1.860  6376.0
CCMX 0                106744     2.031  5505.4
CCMX 1                106225     2.094  5281.2
CCMX 2                105849     2.250  5385.7
CCMX 3                105634     2.437  5661.5
FreeArc -max -ms       95661     2.515  1460.9
FreeArc -max           94654     2.531  1278.2
FreeArc -max -ma-      86912     3.734  642.9
NanoZip -cc            84755    16.297  2079.1
PAQ8p -1               83124    66.625  6775.6
PAQ8p -2               81566    67.578  5534.4
PAQ8p -3               81040    68.375  5204.9
PAQ8p -4               51999   499.015  670.9
PAQ8p -5               50780   501.672  569.3
PAQ8p -6               50046   513.891  526.6
PAQ8p -7               49870   538.828  538.8

Ladies and gentlemen, welcome the new efficiency king, PAQ8p. Who cares that it gets 6 KB/s, what a great size! I wonder why didn't OO team choose to use it, maybe we should suggest it to them?
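The post doesn't spell this function out. Back-solving from the table, the efficiency column appears to follow time × 2^(10 × (size / best_size − 1)), where best_size is the smallest result in the set (49870 B here) and lower is better. The sketch below is that inferred form, not a formula quoted from maximumcompression.com, and `mc_efficiency` is a name made up here.

```python
def mc_efficiency(size: int, seconds: float, best_size: int) -> float:
    """Inferred form: the time is doubled for every 10% of size above the best."""
    return seconds * 2 ** ((size / best_size - 1) / 0.1)


BEST = 49870  # PAQ8p -7, the smallest size in the table

for name, size, seconds in [
    ("4x4 1t", 123047, 0.171),
    ("FreeArc -max", 94654, 2.531),
    ("PAQ8p -6", 50046, 513.891),
    ("PAQ8p -7", 49870, 538.828),
]:
    print(f"{name:14s} {mc_efficiency(size, seconds, BEST):10.1f}")
```

Because the size penalty is exponential and the time penalty only linear, anything more than a few tens of percent larger than the best result is buried no matter how fast it is, which is why the fast compressors get such enormous values.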
I know that some uses might be more sensitive to file size and less to speed than office documents, but that's just ridiculous.
I've been thinking about a different measure of efficiency for some time, and now it's time to show my take on the topic.
1. Copying is usually a very viable method of archiving, much more than PAQ. And IMO this is what archivers should be compared to.
2. Extreme slowness = 0 usefulness = 0 score.
3. Use of minimal size is wrong. If I was looking for something under 0.1s on this test, I couldn't care less about PAQ scores. And the fact that I tested it changed the ranking. Remove things slower than 0.15s and slug wins. Include them - NanoZip is better. You always have to recalculate everything to your time boundaries.
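Point 3 can be checked numerically. The sketch below assumes the maximumcompression.com-style score has the form time × 2^(10 × (size / best_size − 1)) with lower being better (a form inferred from the published values, not quoted from the site), and ranks slug and NanoZip -cd with and without a slow entry in the set. Names such as `rank` are illustrative.

```python
def mc_efficiency(size, seconds, best_size):
    # Inferred form of the score; lower is better.
    return seconds * 2 ** ((size / best_size - 1) / 0.1)


def rank(results):
    """Sort (name, size, seconds) rows by score, using the set's own best size."""
    best_size = min(size for _, size, _ in results)
    return sorted(results, key=lambda r: mc_efficiency(r[1], r[2], best_size))


fast = [("slug", 198995, 0.046), ("NanoZip -cd", 186876, 0.093)]
everything = fast + [("PAQ8p -7", 49870, 538.828)]

print([name for name, _, _ in rank(fast)])
print([name for name, _, _ in rank(everything)])
```

The relative order of the two fast compressors flips purely because an unrelated, much slower entry changed the reference size.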
My proposal is:
POWER(10;1/10)/((size/original_size)*LOG(size/original_size+1;2)*POWER(POWER(10;1/10);time/time_of_copying))
(A bit unreadable, but I won't learn TeX to show it better.)
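For readers who prefer conventional notation, the spreadsheet formula can be transcribed as follows. This is a straight transcription, assuming LOG(x;2) is the base-2 logarithm as in OO Calc; the symbols r, t and t_copy are shorthand introduced here for size/original_size, time and time_of_copying.

```latex
\[
\text{score}
  = \frac{10^{1/10}}
         {r \,\log_2(1 + r)\,\bigl(10^{1/10}\bigr)^{t/t_{\text{copy}}}}
  = \frac{10^{\,(1 - t/t_{\text{copy}})/10}}{r \,\log_2(1 + r)},
\qquad
r = \frac{\text{size}}{\text{original\_size}}
\]
```

In this form the time term is easy to read: every additional 10 copy-times of run time divides the score by 10.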
The higher the score, the better. XCOPY gets 1. I call archivers that score at least 1 "practical".
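A minimal Python sketch of the proposed score, transcribed from the spreadsheet formula. The post does not state time_of_copying, so it is a parameter here; the 0.141 s default is only what back-solving from the published scores suggests, and the function name is made up.

```python
import math

ORIGINAL_SIZE = 3_185_340  # bytes, from the post


def proposed_efficiency(size, seconds, copy_seconds=0.141,
                        original_size=ORIGINAL_SIZE):
    """Higher is better; a plain copy (ratio 1, copy time) scores exactly 1."""
    ratio = size / original_size
    # Same as 10^(1/10) / (10^(1/10))^(time/copy_time), folded into a single
    # exponent so that very slow runs underflow to 0 instead of overflowing.
    time_factor = 10.0 ** ((1 - seconds / copy_seconds) / 10)
    return time_factor / (ratio * math.log2(ratio + 1))


print(f"{proposed_efficiency(ORIGINAL_SIZE, 0.141):.2f}")  # the copy baseline
print(f"{proposed_efficiency(123047, 0.171):.2f}")         # 4x4 1t
print(f"{proposed_efficiency(49870, 538.828):.2f}")        # PAQ8p -7
```

With the assumed copy time this lands within rounding of the published values for the rows I checked (4x4 1t, slug, FreeArc -max -ma-); a different copy time shifts every score.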
Results:
Code:
Archiver            Size (B)  Time (s)  Efficiency(proposed)  Efficiency(maximumcompression.com)
FastLZ opt -2         370534     0.015  66.52                 340654353845826000.0
FastLZ -2             365344     0.016  68.26                 176627773533093000.0
quick -0              311703     0.031  90.80                 197866418022820.0
slug                  198995     0.046  213.82                46172317.6
NanoZip -cd           186876     0.093  224.14                17320813.6
4x4 1t                123047     0.171  450.79                4468.6
4x4 2t                122051     0.234  413.32                5324.4
4x4 4t                121034     0.296  379.77                5847.4
FreeArc -m4 -ms       117940     0.875  155.30                11243.8
FreeArc -m5 -ms       116319     1.421  65.44                 14576.4
FreeArc -m5           115312     1.437  64.86                 12815.3
FreeArc -m7           115305     1.453  63.20                 12945.4
CCM 0                 108787     1.578  57.83                 5682.1
CCM 1                 108579     1.609  55.18                 5628.6
CCM 2                 108481     1.735  45.00                 5987.3
CCM 3                 108433     1.860  36.72                 6376.0
CCMX 0                106744     2.031  28.66                 5505.4
CCMX 1                106225     2.094  26.10                 5281.2
CCMX 2                105849     2.250  20.38                 5385.7
CCMX 3                105634     2.437  15.08                 5661.5
FreeArc -max -ms       95661     2.515  16.16                 1460.9
FreeArc -max           94654     2.531  16.08                 1278.2
FreeArc -max -ma-      86912     3.734  2.67                  642.9
NanoZip -cc            84755    16.297  0.00                  2079.1
PAQ8p -1               83124    66.625  0.00                  6775.6
PAQ8p -2               81566    67.578  0.00                  5534.4
PAQ8p -3               81040    68.375  0.00                  5204.9
PAQ8p -4               51999   499.015  0.00                  670.9
PAQ8p -5               50780   501.672  0.00                  569.3
PAQ8p -6               50046   513.891  0.00                  526.6
PAQ8p -7               49870   538.828  0.00                  538.8

There's one more interesting thing.
Code:
Archiver            Size (B)  Time (s)  Efficiency
PAQ9a 1                98727     3.140  5.47
PAQ9a 2                97583     3.062  6.36
PAQ9a 3                97310     3.094  6.07
PAQ9a 4                97137     3.187  5.23
PAQ9a 5                96795     3.359  6.36
PAQ9a 6                97112     3.625  6.07
PAQ9a 7                97527     4.063  5.23
PAQ9a 8                98465     4.953  3.98

PAQ9a is the first and the only PAQ that's practical. Congratulations, no other (L)PAQ tested even came close.
That's because of LZP greatly reducing size for CM, right?
P.S.:
I write "Efficiency(maximumcompression.com)" because maximumcompression.com is the most popular site that uses this function; I don't know and don't care who came up with it.
EDIT:
I forgot to attach the results.
Same as you, I was thinking of comparing it somehow with copying, but gave up trying to find a solution. I'll look into your formula more deeply when I can get my hands on this stuff. Thanks for your research.