
Thread: lzpm 0.07 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Okay, new version is here:
    http://www.encode.ru/lzpm/lzpm.htm

    Enjoy!


  2. #2
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Thanks!
    Decompression is very fast

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Ilia!

  4. #4
    Tester

    Join Date
    May 2008
    Location
    St-Petersburg, Russia
    Posts
    182
    Thanks
    3
    Thanked 0 Times in 0 Posts
    great! will be tested this weekend

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test..

    QUAD v1.12

    AcroRD32.exe
    Compressed Size: 1,503,119 bytes
    Compression Time: 4.460 Seconds
    Decompression Time: 2.973 Seconds

    rafale.bmp
    Compressed Size: 1,036,312 bytes
    Compression Time: 2.742 Seconds
    Decompression Time: 1.297 Seconds

    world95.txt
    Compressed Size: 625,831 bytes
    Compression Time: 1.741 Seconds
    Decompression Time: 0.739 Seconds



    LZPM v0.07

    AcroRD32.exe
    Compressed Size: 1,623,309 bytes
    Compression Time: 7.864 Seconds
    Decompression Time: 1.070 Seconds

    rafale.bmp
    Compressed Size: 1,070,492 bytes
    Compression Time: 38.093 Seconds
    Decompression Time: 0.700 Seconds

    world95.txt
    Compressed Size: 585,659 bytes
    Compression Time: 31.226 Seconds
    Decompression Time: 0.418 Seconds

    QUAD compresses much faster than LZPM on my machine.

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    LZPM is inherently going to be slower than QUAD, since it's more LZ than anything else. For example, the built-in PPM in QUAD plays a role: sometimes it shows nice compression, sometimes it reduces decompression speed.
    Some properties of current Hash Chains (data type - avg speed):
    Incompressible data - fastest
    Binary data - fast
    Text data - slowest

    On text files, a match finder based on hash chains generates long hash chains, and LZPM checks them all. Apparently, with binary data the hash chains are shorter, and with compressed/random data the chains are shortest.
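To make the chain-length point concrete, here is a minimal sketch of a hash-chain match finder. It is not LZPM's actual code: the class name `HashChainFinder`, the constants `kHashBits` and `kMinMatch`, the hash function, and the `max_chain` limit are all assumptions made for illustration. The `steps` counter records how many chain candidates are visited, which is the cost that grows on repetitive, text-like data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative hash-chain match finder; NOT LZPM's actual code.
// All names and constants here are assumptions made for this sketch.
constexpr int kHashBits = 16;
constexpr std::size_t kMinMatch = 3;
constexpr std::int64_t kNone = -1;

struct Match {
    std::size_t length = 0;
    std::size_t distance = 0;
};

class HashChainFinder {
public:
    // Keeps a reference to data; the caller must keep the vector alive.
    explicit HashChainFinder(const std::vector<std::uint8_t>& data)
        : buf_(data),
          head_(std::size_t{1} << kHashBits, kNone),
          prev_(data.size(), kNone) {}

    // Longest match for pos among previously inserted positions.
    // Visits at most max_chain candidates; steps counts every visit.
    Match find(std::size_t pos, std::size_t max_chain, std::size_t& steps) const {
        Match best;
        if (pos + kMinMatch > buf_.size()) return best;
        std::int64_t cand = head_[hash(pos)];
        while (cand != kNone && max_chain-- > 0) {
            ++steps;
            const std::size_t c = static_cast<std::size_t>(cand);
            std::size_t len = 0;
            while (pos + len < buf_.size() && buf_[c + len] == buf_[pos + len]) ++len;
            if (len >= kMinMatch && len > best.length) {
                best.length = len;
                best.distance = pos - c;
            }
            cand = prev_[c];  // follow the chain to the next older candidate
        }
        return best;
    }

    // Puts pos at the head of the chain for its 3-byte hash.
    void insert(std::size_t pos) {
        if (pos + kMinMatch > buf_.size()) return;
        const std::uint32_t h = hash(pos);
        prev_[pos] = head_[h];
        head_[h] = static_cast<std::int64_t>(pos);
    }

private:
    std::uint32_t hash(std::size_t pos) const {
        const std::uint32_t v = static_cast<std::uint32_t>(buf_[pos]) |
                                (static_cast<std::uint32_t>(buf_[pos + 1]) << 8) |
                                (static_cast<std::uint32_t>(buf_[pos + 2]) << 16);
        return (v * 2654435761u) >> (32 - kHashBits);
    }

    const std::vector<std::uint8_t>& buf_;
    std::vector<std::int64_t> head_;  // most recent position per hash bucket
    std::vector<std::int64_t> prev_;  // previous position with the same hash
};

// Queries and then inserts every position; returns total candidates visited.
std::size_t count_chain_steps(const std::vector<std::uint8_t>& data, std::size_t max_chain) {
    HashChainFinder mf(data);
    std::size_t steps = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        mf.find(i, max_chain, steps);
        mf.insert(i);
    }
    return steps;
}
```

Calling `count_chain_steps` on a highly repetitive buffer and on a buffer with no repeated 3-byte sequences should show the difference in work done. Capping `max_chain` is one common way to bound compression time at some cost in ratio.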


  7. #7
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    The compression is very good but slow! excellent naturally in decompression!

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    The beauty of any LZ-based compressor is that the user or programmer can choose among different parsing strategies to trade speed against compression.

    For example, with LZPM I already tried:
    1. Greedy parsing - fastest
    2. Lazy matching with one byte lookahead - fast
    3. Lazy matching with two bytes lookahead - fast/normal
    4. Flexible parsing - currently slowest, but providing the best results with ROLZ scheme, at least to my current knowledge.

    I think that the current LZPM is not too slow. I simply compared it to ROLZ2 from mcomp.exe and LZMA from the LZMA SDK. The speed is also acceptable compared to many other modern LZ-based compressors like CABARC. Looking at the LZPM 0.06 results at MFC, I thought that I could add something to improve compression at the cost of speed. Lazy matching is probably more efficient in terms of compression speed vs. ratio, but Flexible Parsing moved LZPM to a new stage, especially when it deals with text files.
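The difference between greedy parsing and lazy matching can be sketched as follows. This is only an illustration, not LZPM's parser: it uses a brute-force LZ77-style match search instead of ROLZ, the names `Token`, `longest_match`, and `parse` are hypothetical, and Flexible Parsing is not shown. With `lookahead = 0` the parser is greedy; with `lookahead = 1` or `2` it emits a literal whenever a strictly longer match starts within the next one or two bytes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for this sketch: a literal or a (length, distance) match.
struct Token {
    bool is_match;
    std::size_t length;
    std::size_t distance;
};

constexpr std::size_t kMinMatch = 3;  // assumed minimum useful match length

// Brute-force longest match at pos against all earlier positions.
// Returns 0 if nothing of at least kMinMatch bytes is found.
std::size_t longest_match(const std::string& s, std::size_t pos, std::size_t& dist) {
    std::size_t best = 0;
    dist = 0;
    for (std::size_t c = 0; c < pos; ++c) {
        std::size_t len = 0;
        while (pos + len < s.size() && s[c + len] == s[pos + len]) ++len;
        if (len > best) {
            best = len;
            dist = pos - c;
        }
    }
    return best >= kMinMatch ? best : 0;
}

// lookahead = 0: greedy parsing. lookahead = 1 or 2: lazy matching.
std::vector<Token> parse(const std::string& s, int lookahead) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t dist = 0;
        const std::size_t len = longest_match(s, i, dist);
        bool defer = false;
        // Lazy step: if a strictly longer match starts within the lookahead
        // window, emit a literal now and take the better match afterwards.
        for (int k = 1; len > 0 && k <= lookahead && i + k < s.size(); ++k) {
            std::size_t d2 = 0;
            if (longest_match(s, i + k, d2) > len) {
                defer = true;
                break;
            }
        }
        if (len == 0 || defer) {
            out.push_back({false, 1, 0});
            i += 1;
        } else {
            out.push_back({true, len, dist});
            i += len;
        }
    }
    return out;
}
```

On the input `"abc_bcde_abcde"`, greedy parsing takes the 3-byte match `abc` and is then left with two literals, giving 12 tokens, while one-byte lazy matching emits `a` as a literal and takes the 4-byte match `bcde`, giving 11 tokens. The price is the extra match search at every deferred position, which is where the slowdown comes from.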


  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    0.07 moves up 3 spots on enwik9 but is 2.5 times slower for compression and uses 3.5x more memory for compression. Decompression still uses only 20 MB and is just as fast.
    http://cs.fit.edu/~mmahoney/compression/text.html#2464

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Thank you for testing!
    It's also nice to see that there is no compressor that compresses better with faster decompression.

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by LovePimple
    Quick test..
    Compression timings for LZPM 0.07 on my machine:

    acrord32.exe:
    Kernel Time = 0.093 = 00:00:00.093 = 5%
    User Time = 1.531 = 00:00:01.531 = 93%
    Process Time = 1.625 = 00:00:01.625 = 99%
    Global Time = 1.641 = 00:00:01.641 = 100%

    rafale.bmp:
    Kernel Time = 0.187 = 00:00:00.187 = 2%
    User Time = 7.343 = 00:00:07.343 = 97%
    Process Time = 7.531 = 00:00:07.531 = 100%
    Global Time = 7.516 = 00:00:07.516 = 100%

    world95.txt:
    Kernel Time = 0.062 = 00:00:00.062 = 2%
    User Time = 2.796 = 00:00:02.796 = 97%
    Process Time = 2.859 = 00:00:02.859 = 99%
    Global Time = 2.860 = 00:00:02.860 = 100%


  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    It's funny, but I improved compression again! Okay, some testing results for LZPM 0.08:

    world95.txt: 584,426 bytes
    fp.log: 643,043 bytes

    ENWIK8: 28,259,984 bytes
    ENWIK9: 245,221,254 bytes


  13. #13
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Little test on Pentium D 820:

    nero.exe 36,003,840
    Nero Burning ROM 7, 5, 7, 0

    quad
    7,572,146

    quad -x
    7,407,461

    lzpm
    7,448,386

    cabarc lzx:21
    7,578,441

    quad compression
    Process Time = 6.640 = 00:00:06.640 = 100%
    Global Time = 6.625 = 00:00:06.625 = 100%

    quad decompression
    Process Time = 3.656 = 00:00:03.656 = 84%
    Global Time = 4.328 = 00:00:04.328 = 100%

    quad -x compression
    Process Time = 15.375 = 00:00:15.375 = 99%
    Global Time = 15.391 = 00:00:15.391 = 100%

    quad -x decompression
    Process Time = 3.671 = 00:00:03.671 = 86%
    Global Time = 4.250 = 00:00:04.250 = 100%

    lzpm compression
    Process Time = 8.937 = 00:00:08.937 = 100%
    Global Time = 8.922 = 00:00:08.922 = 100%

    lzpm decompression
    Process Time = 1.953 = 00:00:01.953 = 77%
    Global Time = 2.531 = 00:00:02.531 = 100%

    cabarc lzx:21 compression
    Process Time = 33.296 = 00:00:33.296 = 99%
    Global Time = 33.359 = 00:00:33.359 = 100%

    cabarc decompression
    Process Time = 0.546 = 00:00:00.546 = 63%
    Global Time = 0.859 = 00:00:00.859 = 100%

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Note that CABARC has E8 transformer, QUAD has E8E9 transformer. LZPM uses a bare ROLZ algorithm. This fact for sure changes the real picture in this test.


  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Also just briefly compared LZPM 0.07 and 0.08.

    nero.exe (nero 6) (13,983,802 bytes)

    LZPM 0.07: 3,471,743 bytes
    LZPM 0.08: 3,464,135 bytes


  16. #16
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by nimdamsk
    lzpm
    7,448,386
    cabarc lzx:21
    7,578,441
    Impressive result from lzpm!

    Are your filters for PIM effectively compatible with LZPM too?

  17. #17
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Some more tests
    First times - compression, second - decompression.

    lzpm01
    7,579,372
    Process Time = 17.609 = 00:00:17.609 = 100%
    Global Time = 17.593 = 00:00:17.593 = 100%

    Process Time = 2.578 = 00:00:02.578 = 81%
    Global Time = 3.157 = 00:00:03.157 = 100%


    lzpm02
    7,447,288
    Process Time = 17.250 = 00:00:17.250 = 99%
    Global Time = 17.266 = 00:00:17.266 = 100%

    Process Time = 2.156 = 00:00:02.156 = 77%
    Global Time = 2.797 = 00:00:02.797 = 100%


    lzpm03
    7,449,582
    Process Time = 7.437 = 00:00:07.437 = 99%
    Global Time = 7.453 = 00:00:07.453 = 100%

    Process Time = 2.109 = 00:00:02.109 = 77%
    Global Time = 2.719 = 00:00:02.719 = 100%


    lzpm04
    7,463,407
    Process Time = 7.640 = 00:00:07.640 = 99%
    Global Time = 7.656 = 00:00:07.656 = 100%

    Process Time = 2.078 = 00:00:02.078 = 76%
    Global Time = 2.704 = 00:00:02.704 = 100%


    lzpm05
    7,444,864
    Process Time = 9.046 = 00:00:09.046 = 100%
    Global Time = 9.031 = 00:00:09.031 = 100%

    Process Time = 1.984 = 00:00:01.984 = 76%
    Global Time = 2.609 = 00:00:02.609 = 100%


    lzpm06
    7,448,386
    Process Time = 8.703 = 00:00:08.703 = 100%
    Global Time = 8.687 = 00:00:08.687 = 100%

    Process Time = 2.062 = 00:00:02.062 = 77%
    Global Time = 2.672 = 00:00:02.672 = 100%

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Black_Fox
    Are your filters for PIM effectively compatible with LZPM too?
    Yes, of course. At the least I can add the EXE filter, because many users have already tested LZPM on exactly their EXE files.

    Also I tested LZPM with the delta/multimedia filters. Unfortunately, in this case PPM-based algorithms do much better. Anyway, compression with such filters is better. One problem is MM detection. The PIM archiver has a special large module called "detector". It determines file types, reads their headers, and properly chooses/configures MM filters. I just won't add such a thing to LZPM, because the source of this module is larger than the current LZPM's source!

    To get a taste of what an E8/E9 transformer does, consider the following numbers:

    acrord32.exe:
    LZPM 0.08: 1,619,211 bytes
    LZPM 0.08+EXEFLT: 1,464,231 bytes

    mso97.dll:
    LZPM 0.08: 1,998,513 bytes
    LZPM 0.08+EXEFLT: 1,895,526 bytes

    Photoshop.exe:
    LZPM 0.08: 7,332,764 bytes
    LZPM 0.08+EXEFLT: 6,286,536 bytes

    Doom3.exe:
    LZPM 0.08: 1,860,838 bytes
    LZPM 0.08+EXEFLT: 1,735,073 bytes

    MPTRACK.EXE:
    LZPM 0.08: 529,540 bytes
    LZPM 0.08+EXEFLT: 507,765 bytes

    Reaktor.exe:
    LZPM 0.08: 2,218,921 bytes
    LZPM 0.08+EXEFLT: 2,082,418 bytes

    nero.exe:
    LZPM 0.08: 3,464,135 bytes
    LZPM 0.08+EXEFLT: 3,193,284 bytes


  19. #19
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Can't stop

    Some 3 Mpx photo from nature

    IMG_0862.bmp
    9,437,238

    lzpm06.exe
    8,080,899
    Process Time = 3.671 = 00:00:03.671 = 99%
    Global Time = 3.672 = 00:00:03.672 = 100%

    Process Time = 2.203 = 00:00:02.203 = 93%
    Global Time = 2.359 = 00:00:02.359 = 100%


    lzpm07.exe
    8,080,252
    Process Time = 3.546 = 00:00:03.546 = 99%
    Global Time = 3.547 = 00:00:03.547 = 100%

    Process Time = 1.953 = 00:00:01.953 = 92%
    Global Time = 2.109 = 00:00:02.109 = 100%


    cabarc lzx:21
    7,781,731
    Process Time = 13.750 = 00:00:13.750 = 99%
    Global Time = 13.765 = 00:00:13.765 = 100%

    Process Time = 0.234 = 00:00:00.234 = 107%
    Global Time = 0.219 = 00:00:00.219 = 100%






    FFMPEG-devel maillist archive

    2007-March.txt
    4,898,478

    lzpm06.exe
    709,683
    Process Time = 1.515 = 00:00:01.515 = 101%
    Global Time = 1.500 = 00:00:01.500 = 100%

    Process Time = 0.250 = 00:00:00.250 = 84%
    Global Time = 0.297 = 00:00:00.297 = 100%


    lzpm07.exe
    686,082
    Process Time = 5.531 = 00:00:05.531 = 100%
    Global Time = 5.516 = 00:00:05.516 = 100%

    Process Time = 0.218 = 00:00:00.218 = 77%
    Global Time = 0.281 = 00:00:00.281 = 100%


    cabarc lzx:21
    706,263
    Process Time = 5.843 = 00:00:05.843 = 100%
    Global Time = 5.829 = 00:00:05.829 = 100%

    Process Time = 0.062 = 00:00:00.062 = 132%
    Global Time = 0.047 = 00:00:00.047 = 100%





    Registry hive file

    software
    26,574,848

    lzpm06.exe
    4,821,118
    Process Time = 15.828 = 00:00:15.828 = 100%
    Global Time = 15.828 = 00:00:15.828 = 100%

    Process Time = 1.484 = 00:00:01.484 = 77%
    Global Time = 1.906 = 00:00:01.906 = 100%


    lzpm07.exe
    4,744,439
    Process Time = 63.734 = 00:01:03.734 = 99%
    Global Time = 63.782 = 00:01:03.782 = 100%

    Process Time = 1.234 = 00:00:01.234 = 71%
    Global Time = 1.734 = 00:00:01.734 = 100%


    cabarc lzx:21
    4,460,355
    Process Time = 34.796 = 00:00:34.796 = 99%
    Global Time = 34.859 = 00:00:34.859 = 100%

    Process Time = 0.406 = 00:00:00.406 = 43%
    Global Time = 0.938 = 00:00:00.938 = 100%






    7-Zip sources

    7z444.tar
    5,368,832

    lzpm06.exe
    611,012
    Process Time = 5.343 = 00:00:05.343 = 99%
    Global Time = 5.360 = 00:00:05.360 = 100%

    Process Time = 0.218 = 00:00:00.218 = 77%
    Global Time = 0.281 = 00:00:00.281 = 100%


    lzpm07.exe
    573,910
    Process Time = 93.812 = 00:01:33.812 = 99%
    Global Time = 93.875 = 00:01:33.875 = 100%

    Process Time = 0.203 = 00:00:00.203 = 72%
    Global Time = 0.282 = 00:00:00.282 = 100%


    cabarc lzx:21
    614,718
    Process Time = 6.078 = 00:00:06.078 = 100%
    Global Time = 6.078 = 00:00:06.078 = 100%

    Process Time = 0.062 = 00:00:00.062 = 132%
    Global Time = 0.047 = 00:00:00.047 = 100%

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    I think I should add an exe transformer to the next release (0.08) of LZPM.

    If you're for or against such addition, let me know.

    Why just E8? I made a few tests: E8/E9 can provide slightly higher compression on some EXE files, but on non-executable data it hurts compression a little, and more than E8 alone does. Also, on some executables E8 alone provides higher compression to begin with. In short, E8 hurts only a little in the worst case.


  21. #21
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    have you tried exe transformation from cabarc? maybe it will be better. it has some heuristic to avoid transformation on non-executable data, e.g. if relative offset is bigger than 12345678 or smaller than -12345678 then don't transform it to absolute offset. details are in cabsdk docs.
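A range-checked E8 transform along these lines might look like the sketch below. It is not cabarc's or LZPM's actual filter: the cutoff `kLimit`, the helper names, and the use of the opcode position as the base address are assumptions. Operands in the range [-pos, kLimit) are remapped onto the same range (relative to absolute, with wrap-around), which keeps the transform exactly invertible; everything outside that range is left alone. The bytes are read and written one at a time, so the code does not depend on host endianness or alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cutoff: operands at or above this value are never converted.
constexpr std::int64_t kLimit = 1 << 24;

// Reads a little-endian 32-bit signed value (x86 operand order).
std::int32_t load_le32(const std::uint8_t* p) {
    const std::uint32_t v = static_cast<std::uint32_t>(p[0]) |
                            (static_cast<std::uint32_t>(p[1]) << 8) |
                            (static_cast<std::uint32_t>(p[2]) << 16) |
                            (static_cast<std::uint32_t>(p[3]) << 24);
    return static_cast<std::int32_t>(v);
}

// Writes a little-endian 32-bit signed value.
void store_le32(std::uint8_t* p, std::int32_t s) {
    const std::uint32_t v = static_cast<std::uint32_t>(s);
    p[0] = static_cast<std::uint8_t>(v & 0xFF);
    p[1] = static_cast<std::uint8_t>((v >> 8) & 0xFF);
    p[2] = static_cast<std::uint8_t>((v >> 16) & 0xFF);
    p[3] = static_cast<std::uint8_t>((v >> 24) & 0xFF);
}

// encode = true:  relative CALL operand -> absolute (before compression).
// encode = false: absolute -> relative (after decompression).
void e8_transform(std::vector<std::uint8_t>& buf, bool encode) {
    for (std::size_t i = 0; i + 5 <= buf.size(); ++i) {
        if (buf[i] != 0xE8) continue;
        const std::int64_t pos = static_cast<std::int64_t>(i);
        std::int64_t v = load_le32(&buf[i + 1]);
        // Only values in [-pos, kLimit) are touched; the mapping below is a
        // bijection on that range, so the inverse can undo it exactly.
        if (v >= -pos && v < kLimit) {
            if (encode) {
                v = (v < kLimit - pos) ? v + pos : v - kLimit;
            } else {
                v = (v >= 0) ? v - pos : v + kLimit;
            }
            store_le32(&buf[i + 1], static_cast<std::int32_t>(v));
        }
        i += 4;  // skip the operand so its bytes are never treated as opcodes
    }
}
```

The benefit for the compressor is that several calls to the same function end up with identical 4-byte operands after encoding, which turns them into matchable strings. On non-executable data most 0xE8 bytes are followed by out-of-range values and are skipped, though a few accidental conversions can still cost a handful of bytes.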

  22. #22
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    If you're for or against such addition, let me know.
    For!

  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by donkey7
    have you tried exe transformation from cabarc?
    I have exactly the same here!

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by donkey7
    it has some heuristic to avoid transformation on non-executable data, e.g. if relative offset is bigger than 12345678 or smaller than -12345678 then don't transform it to absolute offset. details are in cabsdk docs.
    Anyway, if you pass it across non-executable data you'll see some tiny loss.

  25. #25
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    maybe add some threshold, e.g. if there was > 50% of failures (non-converted offsets) in the last 50 e8 sequences then disable the e8 transformer completely for the moment.

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Actually, the current e8 works pretty well. The loss in many cases is just a few bytes; on text files there is no difference at all, since 0xE8 is outside the ASCII character set. I won't add extra analysis to the filter since I want to keep MAX speed. I already tested it on ENWIK8: the decompression speed stayed almost untouched. In addition, this filter is more cleverly implemented than, say, QUAD's one. For example:

    int &addr = *(reinterpret_cast<int *>(&buf[i]));

    After that, we can do whatever we want with addr, instead of keeping a local variable, modifying it, and writing the new value back.

    The only problem is that in some cases additional E9 processing can give extra compression, especially on executables with a large amount of code, like doom3.exe and photoshop.exe. But I am targeting large file sets with mixed content, and in this case a single E8 works better anyway.


  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Tested decompression with ENWIK9: the penalty is less than one second on my PC! Note that if we decompress executables, the decompression speed will be even higher (remember the LZ77 rule: higher compression == more matches == faster decompression). I have already included this filter in LZPM. Even junkies from Microsoft inserted such toy to their LZX!

    acrord32.exe:
    LZPM 0.08: 1,481,357 bytes

    mso97.dll:
    LZPM 0.08: 1,892,075 bytes


  28. #28
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    encode
    did you use my approach?

    if you want max speed you can use following strategy:
    - process 4 kb of data with exe filter,
    - if the number of failures is lower than the threshold, process further,
    - otherwise suspend exe filter for 20 kb of data (do not try to transform this data) and restart filter after that.

    note that the threshold i've mentioned (50%) is an ad hoc value. probably it should be way lower (or higher, i don't know).
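The probe-and-suspend strategy above could be sketched roughly like this. The 4 KB and 20 KB sizes and the 50% threshold come from the post; everything else is an assumption for illustration, including the names `Range`, `block_passes`, and `plan_filter`, the cutoff `kLimit`, and the definition of a "failure" as an E8 byte whose operand lies outside [-kLimit, kLimit). The sketch only plans which ranges get filtered; a real codec would also need the decoder to reach the same decisions, for example by counting failures in a way that survives the transform or by storing flags.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kProbe = 4 * 1024;     // from the post: filter 4 KB at a time
constexpr std::size_t kSuspend = 20 * 1024;  // from the post: then skip 20 KB
constexpr double kThreshold = 0.5;           // the ad hoc 50% from the post
constexpr std::int64_t kLimit = 1 << 24;     // hypothetical plausibility cutoff

struct Range {
    std::size_t begin;
    std::size_t end;
    bool filtered;
};

// Returns true if the share of implausible E8 operands in [begin, end)
// does not exceed kThreshold. A block without any E8 bytes passes.
bool block_passes(const std::vector<std::uint8_t>& buf, std::size_t begin, std::size_t end) {
    std::size_t total = 0;
    std::size_t failures = 0;
    for (std::size_t i = begin; i < end && i + 5 <= buf.size(); ++i) {
        if (buf[i] != 0xE8) continue;
        const std::uint32_t u = static_cast<std::uint32_t>(buf[i + 1]) |
                                (static_cast<std::uint32_t>(buf[i + 2]) << 8) |
                                (static_cast<std::uint32_t>(buf[i + 3]) << 16) |
                                (static_cast<std::uint32_t>(buf[i + 4]) << 24);
        const std::int64_t rel = static_cast<std::int32_t>(u);
        ++total;
        if (rel < -kLimit || rel >= kLimit) ++failures;
        i += 4;  // skip the operand bytes
    }
    return static_cast<double>(failures) <= kThreshold * static_cast<double>(total);
}

// Splits the buffer into ranges and marks where the exe filter would run.
std::vector<Range> plan_filter(const std::vector<std::uint8_t>& buf) {
    std::vector<Range> plan;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::size_t end = std::min(pos + kProbe, buf.size());
        plan.push_back({pos, end, true});  // the probe block itself is filtered
        const bool ok = block_passes(buf, pos, end);
        pos = end;
        if (!ok && pos < buf.size()) {
            // Too many failures: leave the next stretch untouched, then retry.
            const std::size_t skip_end = std::min(pos + kSuspend, buf.size());
            plan.push_back({pos, skip_end, false});
            pos = skip_end;
        }
    }
    return plan;
}
```

The extra cost is one counter update per E8 byte, so the speed impact should be small, but whether it actually recovers the few lost bytes on non-executable data would need measuring.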

    Quote Originally Posted by encode
    higher compression == more matches == faster decompression
    i guess you wanted to write longer matches

  29. #29
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by donkey7
    did you use my approach?
    I did. Some time ago I tried almost ALL possible variants. The current approach is one of the best so far!
    By the way, try new LZPM:
    http://www.encode.ru/lzpm/index.htm


  30. #30
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by encode
    Even junkies from Microsoft inserted such toy to their LZX!
    these "junkies" invented both e8 transformation and price-optimal parsing
