
Thread: lzpm 0.01 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Okay, here is the new compressor! It's still in a very early stage of development, but it needs to be tested!

    Enjoy!

    Link:
    lzpm001.zip (26 KB)


  2. #2
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Good compressor! Very fast! Good work! Hi!

  3. #3
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    On my testset: 13 780 972 bytes, compression speed 1 104 kB/s, decompression 7 789 kB/s. Compared to QUAD on my files, it compresses nearly as fast and decompresses 3x faster!

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Ilia!

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    MC SFC test:

    LZPM v0.01

    A10.jpg > 840,877
    AcroRd32.exe > 1,649,986
    english.dic > 1,020,553
    FlashMX.pdf > 3,777,554
    FP.LOG > 804,037
    MSO97.DLL > 2,022,912
    ohs.doc > 843,615
    rafale.bmp > 1,093,602
    vcfiu.hlp > 723,565
    world95.txt > 614,800

    Total = 13,391,501 bytes


    QUAD v1.12

    A10.jpg > 853,522
    AcroRd32.exe > 1,503,119
    english.dic > 915,432
    FlashMX.pdf > 3,831,098
    FP.LOG > 717,207
    MSO97.DLL > 1,913,758
    ohs.doc > 848,771
    rafale.bmp > 1,036,312
    vcfiu.hlp > 708,048
    world95.txt > 625,831

    Total = 12,953,098 bytes


    ENWIK8:

    LZPM v0.01

    Compressed Size = 29,154,666 bytes

    Compression Time = 00:03:34.484


    QUAD v1.12

    Compressed Size = 29,649,604 bytes

    Compression Time = 00:01:06.625

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    lzpm is potentially slower than QUAD at compression. However, the decompression speed can be made even faster - I have a few ideas, and lzpm will probably end up with faster decompression than LZMA (and, of course, faster than ROLZ2).
    Now I'm working on two things:
    + Arithmetic encoder
    + LZ decoder
    So the main LZ encoder stays a very rough draft for now, since currently I'm trying to get the fastest decompression.

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Yep, the new LZPM is faster than, or very close to, LZMA!!!

    I just implemented the same trick that LZMA uses - instead of keeping 0/1 counters, we keep just the probability of the next bit. As a result I get super-small and super-fast arithmetic compression!
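
    A minimal sketch of what such a probability-based bit model might look like - the type and function names here are just illustrative, and the constants follow the LZMA style (an 11-bit probability with an adaptation shift of 5); lzpm's actual values may differ:

        #include <stdint.h>

        /* one adaptive probability replaces the pair of 0/1 counters */
        typedef struct { uint16_t p; } BitModel;   /* P(next bit == 0) = p / 2048 */

        static void bm_init(BitModel *m) { m->p = 2048 / 2; }   /* start at 1/2 */

        static void bm_update(BitModel *m, int bit)
        {
            if (bit == 0)
                m->p += (2048 - m->p) >> 5;   /* saw a 0: move p toward 2048 */
            else
                m->p -= m->p >> 5;            /* saw a 1: move p toward 0    */
        }

    The range coder encodes each bit against m->p and calls bm_update() right after.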

    To be continued...

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    The new model provides some interesting results. Overall, compression gets higher, but it looks like in some cases we get slightly worse text compression.

    Okay, check out some results:

    A10.jpg:
    832,328 bytes [!]

    acrord32.exe
    1,634,409 bytes (Again, keep in mind, no filters used)

    world95.txt
    619,408 bytes

    Actually, the upcoming lzpm is just crazy!!!

  9. #9
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Good work, Ilia!

  10. #10
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Actually, the upcoming lzpm is just crazy!!!
    Crazy as in possible super-high compression ratio?

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by LovePimple
    Crazy as in possible super-high compression ratio?
    How fast the decompression is!
    Plus, higher binary compression!

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Some timings:

    Decompression of pak0.pak (183,997,730 bytes)
    lzma 4.43: 20 sec
    lzpm 0.02: 22 sec
    mcomp v1.00 (rolz2): 27 sec
    lzpm 0.01: 28 sec
    quad 1.12: 37 sec

    As you can see, the new lzpm 0.02 is faster than ROLZ2 and QUAD, and pretty close to LZMA!

  13. #13
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode
    the new lzpm 0.02 is faster than ROLZ2 and QUAD, and pretty close to LZMA!
    Very powerful! Increasing the compression ratio will also increase the decompression speed, won't it?

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Black_Fox
    Very powerful! Increasing the compression ratio will also increase the decompression speed, won't it?
    Yep, a better LZ layer can give some decompression speed boost (more matches = faster decompression). However, LZPM 0.02, at least currently, has exactly the same LZ part as v0.01. Anyway, 0.02 has higher compression overall and at the same time faster compression/decompression speed. It's due to a heavily improved model and arithmetic encoder!

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    A note about Flexible Parsing with this engine - I just tried it. With hashing, such an algorithm is inefficient - compression speed becomes significantly slower and the compression gain is tiny. I also tried Flexible Parsing with this engine using a brute-force approach - this one gives a nice compression gain, especially on text files. However, like I said, with the brute-force approach compression becomes 100X [!] or more slower!
    It's impractical!
    Probably in the next versions I'll enhance the current Lazy Matching - say, I will try a two-byte lookahead instead of just one.
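
    For reference, a hedged sketch of what one-step lazy matching looks like (the helper names - find_match, emit_match, emit_literal - and MIN_MATCH are placeholders of mine, not lzpm's real interface); a two-byte lookahead would simply peek one position further:

        #include <stddef.h>

        typedef struct { size_t dist, len; } Match;

        /* hypothetical helpers, declared only to make the sketch self-contained */
        Match find_match(size_t pos);              /* longest match starting at pos */
        void  emit_match(size_t dist, size_t len);
        void  emit_literal(unsigned char c);

        #define MIN_MATCH 3                        /* illustrative threshold */

        void lazy_parse(const unsigned char *in, size_t n)
        {
            size_t i = 0;
            while (i < n) {
                Match m = find_match(i);
                if (m.len >= MIN_MATCH && i + 1 < n) {
                    Match next = find_match(i + 1);    /* one-byte lookahead */
                    if (next.len > m.len) {            /* deferred match is better: */
                        emit_literal(in[i]);           /* emit a literal, try again */
                        i += 1;
                        continue;
                    }
                }
                if (m.len >= MIN_MATCH) { emit_match(m.dist, m.len); i += m.len; }
                else                    { emit_literal(in[i]);       i += 1;     }
            }
        }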

    Continue digging...

  16. #16
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode
    I also tried Flexible Parsing with this engine using a brute-force approach - this one gives a nice compression gain, especially on text files. However, like I said, with the brute-force approach compression becomes 100X [!] or more slower!
    Could be something like --ultra-brute mode from UPX

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Black_Fox
    Could be something like --ultra-brute mode from UPX
    Well, at least it is possible. An LZ engine with PAQ8 speed. However, the decompression speed stays untouched! If only PAQ8 had MUCH faster decompression...
    Anyway, currently I think it's not worth it. Some people say that QUAD's "-x" mode is also not really worth the compression time spent.

  18. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    cabarc and lzma use binary trees when doing optimal parsing

    btw, i'm working on a fast lz search engine. this may make optimal/flexible parsing practical without using bintrees. actually, the idea is the same as in quad - use a small array of matches for each hash index and cache a few symbols from the match in the hash entry in order to make searching faster
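
    roughly like this (a sketch only - the names and sizes are illustrative, not quad's actual code): each hash slot keeps a few candidates, and the first bytes of each candidate string are cached right in the slot, so most candidates can be rejected without touching the window at all:

        #include <stdint.h>

        #define SLOT_MATCHES 4

        typedef struct {
            uint32_t pos;      /* position in the window                     */
            uint32_t first4;   /* first 4 bytes at pos, cached in the table  */
        } Candidate;

        typedef struct {
            Candidate c[SLOT_MATCHES];   /* small array of matches per hash index */
        } HashSlot;

        /* cheap pre-check: only on a cache hit do the full compare in the window */
        static int quick_reject(const Candidate *cand, uint32_t cur_first4)
        {
            return cand->first4 != cur_first4;
        }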

    btw, where can i read about flexible parsing? i didn't understand it from your sources

  19. #19
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Well, at least it is possible. An LZ engine with PAQ8 speed
    Awesome!

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Bulat Ziganshin
    btw, where can i read about flexible parsing? i didn't understand it from your sources
    Some info on FP:
    http://arturocampos.com/ac_flexible_parsing.html


    Quote Originally Posted by Bulat Ziganshin
    cabarc and lzma use binary trees when doing optimal parsing
    Actually, I have only a very small understanding of binary trees. Probably I should look carefully at the LZMA source. Can you explain some basics, or provide pointers about these trees in human-readable form?

  21. #21
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Binary Trees are very easy to understand - have a look at http://en.wikipedia.org/wiki/Binary_tree and at the links on that page.

    B-Trees are also a good variant of BTs which you should know about.

    Have a nice day =P)

  22. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    well, do you understand the idea of hash chains, and in particular why their memory requirements are 4*dictionary size?

  23. #23
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    thometal
    for understanding the BT used for searching LZ matches, knowing the usual BT is a prerequisite, but it's not the whole picture

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Bulat Ziganshin
    well, do you understand the idea of hash chains, and in particular why their memory requirements are 4*dictionary size?
    I do. The idea of hashing and hash chains is quite simple. From LZSS.C by Haruhiko Okumura I got at least a slight understanding of what's going on.
    From the LZMA docs I've found that Binary Trees and Patricia Trees also use hashing. Do I already use a Binary Tree with LZPM? Since for each hash value I keep many "elements".

  25. #25
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    well, why are the memory reqs exactly 4*dictsize?

    i ask because it doesn't seem that you understand this important point

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    prev[], head[] &co.

    Bulat, better explain!

  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    But actually, I believe that we can change HASHSIZE - and control the memory usage...

  28. #28
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    so, let's start with hash chains

    in the 80's, lz searching was slow and required large amounts of memory. either classical binary trees (proposed by Bell in '84) or patricia trees (as in ar002) were used. their memory requirements were so high that an 8-16kb dictionary was the maximum and the programs were slow, so lzw compressors were quite popular

    hash chains, invented by R. Jung, dramatically decreased both memory requirements and compression time. with hc, there are two arrays - head[] and next[]. head[] may have an arbitrary size; it's determined only by the number of bits we keep in the hash value. it points to the last string with a given hash value. next[] has exactly one word for each byte in the dictionary and allows jumping to the previous string having the same hash as its index, i.e.:

    hash  = hash(p)          // p is the current position in the window
    match = head[hash]       // most recent string with this hash value
    while match do
        memcmp(match, p)     // check/extend the candidate match
        match = next[match]  // jump to the previous string with the same hash
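
    the corresponding update step, as a minimal C sketch (sizes are illustrative, and head[] is assumed to be initialized to NIL); note that next[] holds one 32-bit index per dictionary byte, which is exactly where the 4*dictsize memory requirement comes from:

        #include <stdint.h>

        #define HASH_BITS 16
        #define HASH_SIZE (1u << HASH_BITS)
        #define DICT_SIZE (1u << 20)          /* e.g. a 1 mb window            */
        #define NIL       0xFFFFFFFFu

        static uint32_t head[HASH_SIZE];      /* newest position for each hash  */
        static uint32_t next[DICT_SIZE];      /* previous position, same hash   */

        static void hc_insert(uint32_t pos, uint32_t hash)
        {
            next[pos]  = head[hash];          /* link to the previous occurrence */
            head[hash] = pos;                 /* this position is now the newest */
        }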

    by using this technique with the ar002 compression engine, arj outperformed all other compressors in the early 90's. then it was adopted by pkzip and the open-source zip, and the zip sources became the starting point for many other projects. using this technique allowed raising the dictsize up to 64kb in the mid-90s

    rar 2.0 was among the first compressors to use an even larger, 1mb dictionary. it continued to use HC, but that turned out to be too slow for such a large dictionary - rar2 speed was about 50-100 kb/s on today's cpus. the reason is that with a larger dict we have many more strings in each hash chain, so using hc with a 1mb dictionary is probably 16x slower than with a 64kb one

    of course, things may be even worse if someone ever tried to use it for optimal parsing. nevertheless, cabarc, published approx. in '97, worked faster than rar2 while having a larger dictionary and optimal parsing with *exhaustive* match searching....

  29. #29
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Thanks for the info!

    So, with a 16 MB dictionary, a single next[] needs 4X the buffer size!

    The scheme I make use of:

    Hash lookup:

    hashtable[HASHSIZE][NUMNODES];

    i.e. we hash only the first N bytes (say MINMATCH), and we have NUMNODES (say 256) different positions to check.

    At each step we update our hashtable like a list - the oldest entries are removed from the table.

    With this approach we can easily control the memory usage...
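
    A hedged sketch of that bucket update (not lzpm's actual code - the constants and helper names are illustrative). Each hash value owns NUMNODES recent positions, and inserting a new one pushes the oldest out:

        #include <stdint.h>
        #include <string.h>

        #define HASHBITS 12
        #define HASHSIZE (1u << HASHBITS)
        #define NUMNODES 256
        #define MINMATCH 3

        static uint32_t hashtable[HASHSIZE][NUMNODES];  /* 4 * HASHSIZE * NUMNODES bytes */

        static uint32_t hash_first_bytes(const unsigned char *p)
        {
            /* hash only the first MINMATCH bytes; the multiplier is just an example */
            uint32_t x = p[0] | (p[1] << 8) | ((uint32_t)p[2] << 16);
            return (x * 2654435761u) >> (32 - HASHBITS);
        }

        static void update_table(const unsigned char *p, uint32_t pos)
        {
            uint32_t *bucket = hashtable[hash_first_bytes(p)];
            /* shift the list down - the oldest position falls out of the table */
            memmove(bucket + 1, bucket, (NUMNODES - 1) * sizeof(uint32_t));
            bucket[0] = pos;                            /* newest position goes in front */
        }

    (A rotating write index per bucket would avoid the memmove; this literal version just mirrors the "update as a list" description.)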

  30. #30
    Member
    Join Date
    Mar 2007
    Posts
    34
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode
    From the LZMA docs I've found that Binary Trees and Patricia Trees also use hashing.
    Usual binary search trees do not use hashing, but hashing may be used to address a particular binary tree. Using multiple binary trees is reasonable in order to keep the trees small and to reduce the effort needed for tree restructuring.
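
    For illustration, a hedged sketch of that arrangement (names and sizes are mine, not LZMA's or lzpm's): each hash value owns the root of a small binary search tree of window positions, ordered by the string content at those positions. A real match finder also collects match lengths and relinks nodes while descending; this only shows how the hash addresses a tree:

        #include <stdint.h>
        #include <string.h>

        #define HASH_SIZE (1u << 16)
        #define WIN_SIZE  (1u << 20)
        #define NIL       0xFFFFFFFFu

        static uint32_t root[HASH_SIZE];     /* one tree per hash value (init to NIL) */
        static uint32_t lchild[WIN_SIZE];
        static uint32_t rchild[WIN_SIZE];

        static void bt_insert(const unsigned char *win, uint32_t pos,
                              uint32_t hash, size_t cmp_len)
        {
            uint32_t *link = &root[hash];
            while (*link != NIL) {
                uint32_t cur = *link;
                /* order positions by the strings that start at them */
                link = (memcmp(win + pos, win + cur, cmp_len) < 0)
                         ? &lchild[cur] : &rchild[cur];
            }
            *link = pos;
            lchild[pos] = rchild[pos] = NIL;
        }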

    Quote Originally Posted by encode
    Do I already use a Binary Tree with LZPM? Since for each hash value I keep many "elements".
    According to your previous descriptions: no, you don't.
