15th August 2007, 11:42
15th August 2007, 13:02
Thank you, Ilia!
15th August 2007, 14:04
15th August 2007, 15:14
Fantastic! Thanks Ilia!
15th August 2007, 16:12
This version achieves about the same compression on the ENWIKs as TC 5.1 dev 7, which uses my old variant of ROLZ and an order-2 CM encoder for ROLZ-output coding!
16th August 2007, 01:26
Thanks! It has been tested:
version       size          in / out speed (kB/s)
LZPM 0.06    13 696 668    1 910 / 9 548
LZPM 0.07    13 581 851    1 053 / 9 866
LZPM 0.08    13 409 100      958 / 9 866
LZPM 0.09    13 400 783    1 028 / 9 866
Both faster and stronger!
17th August 2007, 04:01
17th August 2007, 10:28
18th August 2007, 23:03
Another compression improvement. The new parsing scheme is more balanced and advanced than the old one. I think it's crazy, considering how many conditions it checks. There is still no actual cost function that calculates the real cost of each choice; however, I use some machinery to emulate a cost function and skew the output statistics, and it is far more advanced than what QUAD used.
OK, some testing results with LZPM 0.10:
world95.txt: 578,813 bytes
bible.txt: 910,800 bytes
ENWIK8: 27,904,249 bytes
ENWIK9: 242,350,961 bytes
3200.txt: 4,873,698 bytes
PariahInterface.utx: 5,523,325 bytes
18th August 2007, 23:22
You have done a very good job with LZPM!
18th August 2007, 23:27
With the latest versions it's a fight for every byte!
19th August 2007, 14:00
Brief description of current parsing
The baseline flexible parsing is optimal in terms of the number of matches - much like the backwards parsing in QUANTUM. However, it's not optimal with adaptive encodings. In fact, in some cases it's cheaper to emit a single literal, or a shorter match with a smaller distance, etc. Modern optimal parsing (dynamic programming) is not about the optimal number of matches but about the optimal cost of each choice. With dynamic programming we try to find the optimal - i.e. cheapest - choice for each position in a given block (ROLZ2 uses 4 KB blocks). After that, we walk backwards collecting the units - this gives the optimal parse. Currently I don't know how this can fit into my LZPM. Anyway, with the latest versions I emulate the same thing. As you know, the basic idea behind FP is:
current_match + followed_match
for all variants of cutdown:
cutdowned_current_match + followed_after_cutdowned
If we find a greater sum, we cut down the current match to get higher compression.
With LZPM, instead of directly comparing bare match lengths, I made a special cost function which returns a "score" or value for each choice, based on things like:
whether the current unit is a literal (literals have a high priority)
So the new scheme looks like this:
getvalue(current_match, current_index) + getvalue(followed_match, followed_index)
for all variants of cutdown:
getvalue(cutdowned_match, cutdowned_match_index) + getvalue(followed_match1, followed_index1)
Currently, I'm tuning getvalue() to achieve better results.
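To make that concrete, here is a minimal C++ sketch of the scheme. All names and weights below are mine, purely for illustration - the real getvalue() weighs more factors and, as said, is still being tuned:

enum { MIN_MATCH = 3, LITERAL_SCORE = 14 };   // made-up constants

struct Match { int len; int index; };   // ROLZ match: length + table index

Match find_match(int pos);   // assumed: returns the longest match at pos

// Hypothetical score function: longer matches score higher, a larger
// table index is penalized (it costs more bits to code), and a too-short
// match is valued as a literal, which gets a fixed high-priority score.
int getvalue(const Match &m) {
    if (m.len < MIN_MATCH) return LITERAL_SCORE;
    return (m.len << 4) - m.index;       // toy weights
}

// One flexible-parsing step: compare the full match plus whatever follows
// it against every cut-down variant plus whatever follows *that*, and
// keep the pair with the highest combined score.
Match parse_step(int pos) {
    Match best = find_match(pos);
    int best_sum = getvalue(best) + getvalue(find_match(pos + best.len));
    for (int cut = best.len - 1; cut >= MIN_MATCH; cut--) {
        Match cur = { cut, best.index };
        int sum = getvalue(cur) + getvalue(find_match(pos + cut));
        if (sum > best_sum) { best_sum = sum; best = cur; }  // cut-down wins
    }
    return best;
}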
19th August 2007, 22:59
New results, LZPM 0.10:
world95.txt: 576,824 bytes
bible.txt: 906,818 bytes
ENWIK8: 27,829,513 bytes
ENWIK9: 241,694,333 bytes
3200.txt: 4,862,377 bytes
21st August 2007, 16:34
Well, here is an additional parameter which applies to ROLZ only, the so-called "table size". You can find a reference to it in RK's README. Anyway, check out the results with LZPM. The parsing scheme is the same in all modes, and the decompression speed is about the same in all modes, although the decoder needs some extra memory when dealing with a larger table. The disadvantage of a larger table is a drastically reduced compression speed - the compression time becomes 1.5-2X longer each time the table size doubles. Note that the memory usage at compression time is the same in all modes.
Results of LZPM 0.10 with ENWIK8:
LZPM 0.10, 1K: 29,361,878 bytes
LZPM 0.10, 2K: 28,507,021 bytes
LZPM 0.10, 4K: 27,837,078 bytes (current)
LZPM 0.10, 8K: 27,360,382 bytes
LZPM 0.10, 16K: 27,083,501 bytes
LZPM 0.10, 32K: 26,960,879 bytes
LZPM 0.10, 64K: 26,914,552 bytes
That said, starting with 16K the compression becomes very slow, even on my Core 2 Duo.
Probably, 8K is more than adequate.
Look at the results with ENWIK9, for comparison:
LZPM 0.10, 8K: 237,794,407 bytes
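In case the parameter itself is unclear, here is a rough C++ sketch of what a ROLZ offset table with a configurable size could look like - my guess at the structure, not LZPM's actual code:

#include <cstdint>
#include <vector>

// For each order-1 context (the previous byte) we remember the last
// TABLE_SIZE positions at which that context occurred, in a ring buffer.
const int TABLE_SIZE = 8 * 1024;   // the "8K" mode

struct RolzTable {
    std::vector<uint32_t> pos;     // 256 contexts * TABLE_SIZE slots
    std::vector<uint32_t> head;    // per-context ring-buffer write pointer
    RolzTable() : pos(256 * (size_t)TABLE_SIZE), head(256, 0) {}

    // Remember that context c occurred at position p.
    void update(uint8_t c, uint32_t p) {
        pos[c * TABLE_SIZE + head[c]] = p;
        head[c] = (head[c] + 1) % TABLE_SIZE;
    }

    // A match is coded as (length, index into this per-context ring), so
    // doubling TABLE_SIZE doubles the candidates the encoder must try,
    // while the decoder only has to keep a bigger table in memory.
    uint32_t position(uint8_t c, uint32_t index) const {
        return pos[c * TABLE_SIZE + index];
    }
};

With 4-byte positions and 256 contexts, a table like this costs 256 * TABLE_SIZE * 4 bytes - about 1 MB at 1K up to 16 MB at 16K - which roughly matches the extra decoder memory mentioned above.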
21st August 2007, 18:35
Originally Posted by encode
21st August 2007, 18:51
OK, some detailed results with ENWIK8:
program, table size: compressed size, compression time, memory needed for decompression
LZPM 0.10, 1K: 29,361,878 bytes, 29 sec, 17 MB
LZPM 0.10, 2K: 28,507,021 bytes, 52 sec, 18 MB
LZPM 0.10, 4K: 27,837,078 bytes, 104 sec, 20 MB
LZPM 0.10, 8K: 27,360,382 bytes, 215 sec, 24 MB
LZPM 0.10, 16K: 27,083,501 bytes, 423 sec, 32 MB
To get approximate timings for ENWIK9, multiply by 10.
For Matt's machine, multiply by 25!!!
215 * 25 = 5375 sec !!
423 * 25 = 10575 sec !!
Anyway, the new LZPM 0.10, even at 4K, already introduces many improvements, including some code optimizations - i.e. speed improvements along with a compression gain!
21st August 2007, 19:17
It's great to provide many options for users - from very fast compression to super-tight - so I suggest adding it to the program's parameters.
21st August 2007, 20:17
Well, the table size is hard-coded - that's better for LZPM's performance. The main option should be the choice of parsing method. Let's look at some results with various parsing strategies:
ENWIK8, current LZPM 0.10:
Greedy parsing: 30,581,042 bytes, 20 sec
Lazy matching with 1-byte lookahead: 29,156,102 bytes, 48 sec
Lazy matching with 2-byte lookahead: 28,759,282 bytes, 60 sec
FP with 1-byte lookahead: 29,014,783 bytes, 45 sec
FP with 2-byte lookahead: 28,538,462 bytes, 58 sec
FP with full search: 27,832,683 bytes, 104 sec
The current Flexible Parsing is OK, I guess. It should also be noted that simplified FP with just one or two bytes of lookahead is superior to an advanced Lazy Matching in both respects: speed and compression. (NOTE: the Lazy Matching used in this test takes the distance - the index, in ROLZ terms - of each match into account, and provides higher compression than the bare scheme.)
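For reference, the lazy matching tested above looks roughly like this - a simplified sketch reusing the Match/find_match/MIN_MATCH names from the earlier sketch, with assumed output helpers; the variant I actually tested also weighs the ROLZ index of each match, not just its length, and the 2-byte version simply peeks one position further:

#include <cstdint>

void emit_match(const Match &m);   // assumed output helpers
void emit_literal(uint8_t c);

// Lazy matching with 1-byte lookahead: before coding the match found at
// pos, peek at pos + 1 - if the match there is longer, it is cheaper
// overall to emit one literal now and take the longer match instead.
void lazy_parse(const uint8_t *buf, uint32_t end) {
    uint32_t pos = 0;
    while (pos < end) {
        Match m = find_match(pos);
        if (m.len >= MIN_MATCH) {
            Match next = find_match(pos + 1);
            if (next.len > m.len) {        // deferring pays off
                emit_literal(buf[pos++]);  // sacrifice one byte...
                continue;                  // ...and take the longer match
            }
            emit_match(m);
            pos += m.len;
        } else {
            emit_literal(buf[pos++]);
        }
    }
}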
21st August 2007, 20:27
With the maximum table size, compression improved by 3%. I think such a mode will be interesting for users (is the decompression speed the same?)
21st August 2007, 20:37
Yes, exactly the same - maybe even slightly faster (more matches = faster decompression). I'm just shocked by the speed impact. IMHO, at some point LZPM with a large table crosses the line. However, if I strictly limit the hash chain lengths, I can trade between speed and compression. Anyway, the larger table size gives an improvement mainly on text and large files. On binary ones, like EXEs, there is little to no improvement...
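Capping the chains would look something like this - a sketch with made-up names and sizes, not LZPM's actual code:

#include <cstdint>

struct Found { int len; uint32_t pos; };

extern uint32_t head[1 << 16];    // most recent position per hash value
extern uint32_t chain[1 << 24];   // previous position with the same hash
uint32_t hash(const uint8_t *p);  // assumed helpers
int match_length(const uint8_t *buf, uint32_t a, uint32_t b);

const int      MAX_CHAIN = 32;          // the "chain length" being tuned
const uint32_t NIL       = 0xFFFFFFFF;  // end-of-chain marker

// Walk the hash chain, but give up after MAX_CHAIN candidates: this
// bounds the work spent per position, trading compression for speed.
Found search(const uint8_t *buf, uint32_t pos) {
    Found best = { 0, 0 };
    uint32_t p = head[hash(buf + pos)];
    for (int n = 0; n < MAX_CHAIN && p != NIL; n++) {
        int len = match_length(buf, p, pos);
        if (len > best.len) { best.len = len; best.pos = p; }
        p = chain[p];                       // step to an older position
    }
    return best;
}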
Originally Posted by Bulat Ziganshin
21st August 2007, 21:16
world95.txt:
LZPM 0.09: 579,933 bytes
LZPM 0.10, 4K: 576,674 bytes
LZPM 0.10, 8K: 574,887 bytes
LZPM 0.10, 16K: 573,687 bytes
bible.txt:
LZPM 0.09: 913,315 bytes
LZPM 0.10, 4K: 906,667 bytes
LZPM 0.10, 8K: 899,516 bytes
LZPM 0.10, 16K: 895,462 bytes
3200.txt:
LZPM 0.09: 4,898,392 bytes
LZPM 0.10, 4K: 4,861,777 bytes
LZPM 0.10, 8K: 4,752,334 bytes
LZPM 0.10, 16K: 4,679,430 bytes
calgary.tar:
LZPM 0.09: 873,452 bytes
LZPM 0.10, 4K: 869,924 bytes
LZPM 0.10, 8K: 867,494 bytes
LZPM 0.10, 16K: 866,175 bytes
21st August 2007, 21:30
Most of these files are just too small. Only 3200 has a compressed size >1 MB.
21st August 2007, 21:37
The original file sizes:
world95.txt: 2,988,578 bytes
bible.txt: 4,047,392 bytes
3200.txt: 16,013,962 bytes
calgary.tar: 3,152,896 bytes
Later I will post results with larger files...
21st August 2007, 22:07
pak0.pak (183,997,730 bytes):
LZPM 0.10, 4K: 90,071,415 bytes, 91 sec
LZPM 0.10, 8K: 89,758,842 bytes, 138 sec
LZPM 0.10, 16K: 89,564,352 bytes, 208 sec
Redline.bgd (175,091,407 bytes):
LZPM 0.10, 4K: 81,712,031 bytes, 68 sec
LZPM 0.10, 8K: 81,528,811 bytes, 98 sec
LZPM 0.10, 16K: 81,473,612 bytes, 149 sec
As you can see, with non-textual data the speed penalty is not so huge.
22nd August 2007, 11:15
I just want to ask LovePimple: look at the results - what do you think? Is it worth it?
Higher compression is good; I think that even with 16K, LZPM can beat TurboLZ on ENWIK9. But LZPM must not cross the line on compression speed - it must stay usable for most people.
Having said that, I will do some additional tests with reduced hash chains - maybe I can gain compression with no speed loss...
22nd August 2007, 11:58
Possibly! I'm curious how it will perform on slightly older machines, such as Bulat's Duron and my own Sempron.
Originally Posted by encode
22nd August 2007, 12:08
I swear to never run lzpm in super-compression mode
22nd August 2007, 13:06
Some results with reduced hash chain length:
LZPM 0.10, 8K, chain len. 8: 28,749,962 bytes, 36 sec
LZPM 0.10, 8K, chain len. 16: 28,168,227 bytes, 52 sec
LZPM 0.10, 8K, chain len. 32: 27,785,277 bytes, 77 sec
LZPM 0.10, 8K, chain len. 64: 27,565,020 bytes, 108 sec
LZPM 0.10, 16K, chain len. 8: 28,692,673 bytes, 42 sec
LZPM 0.10, 16K, chain len. 16: 28,068,018 bytes, 67 sec
LZPM 0.10, 16K, chain len. 32: 27,634,668 bytes, 106 sec
LZPM 0.10, 16K, chain len. 64: 27,365,878 bytes, 161 sec
In addition, I have an idea for speeding up the current scheme. If LZPM keeps the same compression but compresses faster, it will be far more promising... The goal is to knock out all the other LZ coders...
22nd August 2007, 13:12
Originally Posted by encode
22nd August 2007, 14:10
OK, I decided to share several versions of LZPM (4K, 8K, 16K).
These are just DEBUG versions.
I PROHIBIT POSTING RESULTS OF THESE VERSIONS ON OFFICIAL BENCHMARKS (I.E. LARGE TEXT COMPRESSION, BLACK_FOX'S TEST AND ALL OTHERS). POST THEM ONLY HERE!!!
Anyway, you can check out what I'm talking about!
Here is a pack with all three executables:
lzpm-dbg-pack.rar (54 KB)