Enjoy the new release now!
Exactly!
Originally Posted by encode
A10.jpg > 832,221
AcroRd32.exe > 1,481,357
english.dic > 955,049
FlashMX.pdf > 3,760,791
FP.LOG > 643,043
MSO97.DLL > 1,892,075
ohs.doc > 830,167
rafale.bmp > 1,067,373
vcfiu.hlp > 691,677
world95.txt > 584,426
Total = 12,738,179 bytes
On my machine, LZPM compression times were too slow for repeated testing.
First run (uncached): LZPM v0.08 compressed FP.LOG to 643,043 bytes in 000:00:13:16.542 (796.542 seconds).
First run (uncached): LPAQ1 (7) compressed FP.LOG to 402,796 bytes in 000:00:02:47.986 (167.986 seconds).
First run (uncached): CCMx v1.23 (c 5) compressed FP.LOG to 437,856 bytes in 43.322 seconds.
First run (uncached): QUAD v1.12 (-x) compressed FP.LOG to 619,701 bytes in 22.399 seconds.
First run (uncached): QUAD v1.12 compressed FP.LOG to 717,207 bytes in just 3.379 seconds.
fp.log represents a corner case. This file contains a lot of long matches, and Flexible Parsing tries all variants within a match to find the best choice.
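To illustrate why that is costly, here is a toy optimal-parsing sketch (my illustration with made-up costs, not LZPM's parser): every possible length of each found match gets priced, so a file full of very long matches multiplies the inner-loop work per position.

/* Toy cost model: 1 unit per literal, 3 units per match of any length.
   match_len[i] is the longest match found at position i (assumed never
   longer than n - i); best[i] is the cheapest cost for input[i..n). */
#include <stdint.h>

#define MIN_MATCH 2

void parse(const uint32_t *match_len, uint32_t n, uint32_t *best)
{
    best[n] = 0;
    for (int64_t i = (int64_t)n - 1; i >= 0; i--) {
        best[i] = 1 + best[i + 1];            /* encode a literal */
        /* flexible parsing: price every length within the match */
        for (uint32_t len = MIN_MATCH; len <= match_len[i]; len++) {
            uint32_t cost = 3 + best[i + len];
            if (cost < best[i]) best[i] = cost;
        }
    }
}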
For comparison, on my machine LZPM compresses fp.log within 65 seconds.
The catch is in decompression. LZPM decompresses fp.log in less than half a second.
If decompression were as slow as compression, of course I would never allow such speeds. But getting extra compression with no loss in decompression speed is a good idea. In practice, you compress a file just once and decompress it many, many times.
Clearly that is a very reasonable compression time.
Originally Posted by encode
Perhaps someone can explain why LZPM compression times are more than 12x faster on your machine than they are on my Sempron 2400+?
The first timings are for compression, the second for decompression.
Process Time = 122.203 = 00:02:02.203 = 99%
Global Time = 122.328 = 00:02:02.328 = 100%
Process Time = 1.921 = 00:00:01.921 = 75%
Global Time = 2.546 = 00:00:02.546 = 100%
Process Time = 17.375 = 00:00:17.375 = 166%
Global Time = 10.437 = 00:00:10.437 = 100%
Process Time = 1.593 = 00:00:01.593 = 82%
Global Time = 1.922 = 00:00:01.922 = 100%
Process Time = 33.296 = 00:00:33.296 = 99%
Global Time = 33.359 = 00:00:33.359 = 100%
Process Time = 0.546 = 00:00:00.546 = 63%
Global Time = 0.859 = 00:00:00.859 = 100%
Process Time = 12.734 = 00:00:12.734 = 100%
Global Time = 12.719 = 00:00:12.719 = 100%
Process Time = 1.687 = 00:00:01.687 = 89%
Global Time = 1.891 = 00:00:01.891 = 100%
Process Time = 11.109 = 00:00:11.109 = 175%
Global Time = 6.328 = 00:00:06.328 = 100%
Process Time = 0.609 = 00:00:00.609 = 99%
Global Time = 0.610 = 00:00:00.610 = 100%
Process Time = 18.328 = 00:00:18.328 = 100%
Global Time = 18.328 = 00:00:18.328 = 100%
Process Time = 0.265 = 00:00:00.265 = 106%
Global Time = 0.250 = 00:00:00.250 = 100%
The key is not only a faster CPU (Intel Core 2 Duo 2.40 GHz) but also faster memory: 2 GB DDR2 @ 800 MHz is self-explanatory.
Originally Posted by LovePimple
A small L1 cache causes many cache misses, so the CPU has to fall back on the slower L2 cache, or even RAM. IMHO.
On a dual-core Pentium D 945 @ 3.4 GHz with 2 x 2048 KB L2 cache and dual-channel DDR2 @ 266.7 MHz, it took 284 seconds to compress fp.log and 0.6 seconds to decompress.
Core 2 Duo rules!
I think that modern games just force users to purchase new hardware! And one of the reasons for purchasing my new PC was the game F.E.A.R.
The only optimisations modern games get are optimisations for income...
According to http://agner.org/optimize/microarchitecture.pdf, Core 2 has:
The level-1 data cache is 32 kB, dual port, 8 way, 64 byte line size. The level-1 code cache has the same size as the data cache. There is one level-1 cache for each core, while the level-2 cache and bus interface unit is shared between the cores. The level-2 combined cache is 2 or 4 MB, 16 ways.
So the L1 cache of the Core 2 Duo is two times smaller than the L1 cache of an Athlon, Duron, or Sempron, but the Athlon's L1 cache is only 2-way, and its L2 cache 8-way. Core 2 has a much larger L2 cache than current Athlons. Core 2 also has much better cache logic, i.e. it decides more wisely which cache lines should be kept and which should be overwritten.
In short, Core 2 is a completely different architecture from K8/K9, and benchmark results are likely to be much different (speed-wise) than on Athlons.
The K8 architecture put the emphasis on memory read/write speed, while Core 2 emphasizes better use of the L2 cache.
Thanks to everyone for attempting to explain. I really didn't think that the difference in hardware was enough to account for the 12x-plus speed difference. Maybe 5x or 6x I could understand, but not more than 12x!
It would be interesting if more people were to post their benchmark timings and hardware spec for direct comparison.
Why is the compression time so much slower than the 65 seconds on Ilia's 2.4 GHz Core 2 Duo machine?
Originally Posted by nimdamsk
IMHO, the main catch is RAM: 266.7 MHz vs. 800 MHz. And again, the Core 2 Duo is superior to the Pentium D anyway.
Why can faster memory make such a difference?
LZPM uses 256 MB to store hash chains:
128 MB for "HEAD"
128 MB for "PREV"
During searching, we first access HEAD to find the first (latest) entry. After that, we traverse PREV, finding previous occurrences of the current string. But here we traverse the memory with large backward jumps, so faster memory access surely plays a huge role. I guess that for LZPM faster memory is more important than a larger L2 cache, since we need fast random access to 256+16 MB of memory. A larger cache can benefit decompression, since things like the literal model and the other models can fit entirely in that cache.
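A minimal sketch of such a HEAD/PREV scheme (my illustration: the entry widths, window size, and empty-slot sentinel are assumptions, not LZPM's source):

#include <stdint.h>

#define HASH_BITS 24
#define HASH_SIZE (1u << HASH_BITS)   /* one head per hash value */
#define WIN_SIZE  (1u << 24)          /* assumed 16 MB window */

/* The 128 MB figures above imply wider entries; 32-bit slots are
   used here for simplicity. Position 0 doubles as "empty". */
static uint32_t HEAD[HASH_SIZE];      /* latest position for each hash */
static uint32_t PREV[WIN_SIZE];       /* previous position, same hash */

/* Link position pos, whose leading bytes hash to h, into its chain. */
static void insert_pos(uint32_t h, uint32_t pos)
{
    PREV[pos & (WIN_SIZE - 1)] = HEAD[h];
    HEAD[h] = pos;
}

/* Walk the chain. Every PREV step is a large backward jump across
   256 MB of tables, which is why RAM latency dominates the search. */
static void find_matches(uint32_t h)
{
    for (uint32_t p = HEAD[h]; p != 0; p = PREV[p & (WIN_SIZE - 1)]) {
        /* ... compare the string at p with the current one,
               keep the longest match found so far ... */
    }
}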
version      size          in / out speed (kB/s)
LZPM 0.06    13,696,668    1,910 / 9,548
LZPM 0.07    13,581,851    1,053 / 9,866
LZPM 0.08    13,409,100      958 / 9,866
Thanks for testing!
Looks like LZPM has moved to a new level, totally outperforming QUAD!
Thank you, Matt!
The new parsing trick carries a bigger time penalty than I expected. Also, the CALL translator adds a few seconds at decompression; anyway, the record still stands.
Maybe in future versions I will reduce the memory usage by slimming down the HEAD structure.
Currently, I hash a 32-bit value (4 bytes) down to a 24-bit hash.
I have already tried 20...23-bit hashes. Generally speaking, compression stayed the same - sometimes compression sped up, sometimes it slowed down, but only a little bit.
Looks like LZMA uses 20 bits for its hash4 values.
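To illustrate, a 4-byte-to-HASH_BITS multiplicative hash could look like this (a sketch; the golden-ratio multiplier is my assumption, not LZPM's actual constant):

#include <stdint.h>

#define HASH_BITS 24                  /* current; 20..23 were also tried */

/* Fold 4 input bytes into a HASH_BITS-bit table index. */
static inline uint32_t hash4(const uint8_t *p)
{
    uint32_t x = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return (x * 2654435761u) >> (32 - HASH_BITS);  /* keep top bits */
}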
If I change the hash size, LZPM will use:
20-bit hash: 128 + 8 MB + 16 MB = 152 MB
21-bit hash: 128 + 16 MB + 16 MB = 160 MB
22-bit hash: 128 + 32 MB + 16 MB = 176 MB (preferable)
23-bit hash: 128 + 64 MB + 16 MB = 208 MB
24-bit hash: 128 + 128 MB + 16 MB = 272 MB (current)
Probably N/4 is optimal (22 bits).
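The figures above follow a simple formula; here is a throwaway check (the 8-byte HEAD entry width is inferred from the listed numbers, not confirmed):

#include <stdio.h>

int main(void)
{
    /* mem = 128 MB PREV + (1 << bits) * 8 bytes HEAD + 16 MB buffer */
    for (int bits = 20; bits <= 24; bits++) {
        unsigned long head_mb = ((1ul << bits) * 8) >> 20;
        printf("%d-bit hash: 128 + %lu + 16 = %lu MB\n",
               bits, head_mb, 128 + head_mb + 16);
    }
    return 0;
}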
20-bit hash: 28,302,346 bytes, 106 sec (152 MB mem use)
21-bit hash: 28,273,448 bytes, 107 sec (160 MB mem use)
22-bit hash: 28,265,901 bytes, 107 sec (176 MB mem use)
23-bit hash: 28,260,579 bytes, 108 sec (208 MB mem use)
24-bit hash: 28,259,984 bytes, 110 sec (272 MB mem use)
I expect that on other machines with smaller and/or slower memory, the benefit of a smaller hash can be greater.
A smaller hash can be used not for a speed improvement but for lower memory requirements. Like I said, with a 22-bit hash LZPM will use 176 MB instead of 272 MB, with just a tiny compression loss - preferable on large files.
For comparison, LZPM with 22-bit hash compresses ENWIK9 to 245,266,715 bytes.
Just one question: is it worth it?
For an N MB dictionary, it uses exactly 4N MB for PREV and 4N MB for HEAD.
Originally Posted by encode
His memory is really 533 MHz; DDR2 speeds start from 400 MHz (which is just 100 MHz of real speed multiplied by a 4x acceleration).
Originally Posted by encode
Note that this does not mean that LZMA uses a hash size of N.
Originally Posted by Bulat Ziganshin
LZMA uses 2-, 3-, and 4-byte hashing.
"hash" is the "head" in deflate terms;
"son" is the "prev".
It stores offsets in hash in the following manner:
p->hash[kFix3HashSize + hash3Value] =
p->hash[kFix4HashSize + hashValue] = p->pos;
In other words, the hash keeps offsets for the 2-, 3-, and 4-byte hashes.
/* LzHash.h */
#define kHash2Size (1 << 10)
#define kHash3Size (1 << 16)
#define kHash4Size (1 << 20)
#define kFix3HashSize (kHash2Size)
#define kFix4HashSize (kHash2Size + kHash3Size)
#define kFix5HashSize (kHash2Size + kHash3Size + kHash4Size)
Look at this carefully: LZMA uses 5+4+3+2 hashing now, and the size of the 5-byte hash isn't defined here. I wrote about version 4.43, which used the 4-3-2 model.
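To make that layout concrete, here is a hedged sketch of the 4-3-2 variant (the mixing formulas are illustrative assumptions; the real LzFind.c derives its values via a CRC table, and the 5-byte layer is omitted):

#include <stdint.h>

#define kHash2Size (1 << 10)
#define kHash3Size (1 << 16)
#define kHash4Size (1 << 20)
#define kFix3HashSize (kHash2Size)
#define kFix4HashSize (kHash2Size + kHash3Size)

/* One flat array holds all head tables back to back:
   [0, kHash2Size)               2-byte heads
   [kFix3HashSize, +kHash3Size)  3-byte heads
   [kFix4HashSize, +kHash4Size)  4-byte heads */
static uint32_t hashTable[kHash2Size + kHash3Size + kHash4Size];

static void insertPos(const uint8_t *cur, uint32_t pos)
{
    /* assumed mixing; the SDK uses a CRC-based variant */
    uint32_t h2 = ((uint32_t)cur[0] | ((uint32_t)cur[1] << 8)) & (kHash2Size - 1);
    uint32_t h3 = (h2 ^ ((uint32_t)cur[2] << 8)) & (kHash3Size - 1);
    uint32_t h4 = (h3 ^ ((uint32_t)cur[3] << 16)) & (kHash4Size - 1);

    hashTable[h2] =
    hashTable[kFix3HashSize + h3] =
    hashTable[kFix4HashSize + h4] = pos;  /* same chained-store pattern */
}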
I just briefly looked at LZMA SDK 4.49...
Whose memory is really only 533 MHz?
Originally Posted by Bulat Ziganshin
We were talking about nimdamsk's.
Originally Posted by LovePimple
Thanks, Bulat!
Originally Posted by Bulat Ziganshin
Played with some parsing ideas and further improved LZPM!
Some new results for LZPM 0.09:
ENWIK8: 27,986,111 bytes
ENWIK9: 242,929,442 bytes
world95.txt: 579,933 bytes
3200.txt: 4,898,392 bytes
book1: 267,448 bytes
bible.txt: 913,315 bytes
In addition, I made a few code optimizations.