
Thread: plzma codec

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts

    plzma codec

    http://nishi.dreamhosters.com/u/plzma_v2.rar

    08-06-2011 10:07 v0

    - First encoder version with integrated LZ transform + lzmarec entropy backend

    08-06-2011 17:37 v1

    - Integrated decoders
    - Support for lzma and lzmarec formats

    10-06-2011 00:19 v2

    - full integration
    - better interface
    - x64 version (up to ~1.5G window)
    - large page support


    Code:
     rc1     rd1     rc2     rd2
     88.530s 1.857s  88.608s 7.379s // x86, i7-930 @ 4.0ghz
     86.565s 1.903s  86.565s 6.740s // x64
     72.244s 1.856s  72.322s 6.677s // x64 + large pages
    
    113.390s 2.109s 113.141s 9.047s // x86, Q9450 @ 3.52ghz
    
     24760382        24441811       // enwik8 -d27

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Btw, the actual points of these stats were
    1) The fact that x64 version of encoder is only 2% faster, and lzma decoder is actually slower than 32-bit version.
    2) The fact that the version which uses large pages is 16% faster, while afaik coders like bsc still don't use this feature.

  3. #3
    Member
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    116
    Thanks
    18
    Thanked 32 Times in 11 Posts
    Some quick results:

    Code:
    size                 |  options   |  time (comp)   | time (decomp)  
    --------------------------------------------------------------------------------
    2.778.314.911 Bytes  |  -         |  -             | -
      775.746.870 Bytes  |  e d27     |  1501s (x64)   |  69s (x64) (double checked)
      757.549.235 Bytes  |  c d27     |  1512s (x64)   | 330s (x64)
      757.549.235 Bytes  |  c d27     |  1542s (x86)   | 335s (x86)
    Source is a tar of the installed games "Alien Breed 1, 2, 3" (compresses well with a large dictionary).
    All results are from an i7-2600K @ 3.2GHz.
    The decompressed data has no CRC errors.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    69s (x64) (double checked)
    that's expected - "e" is the lzma codec, "c" is the lzmarec one. lzmarec has slower decompression, but better compression. it's worth comparing with NANOZIP -cO, RZM and MCOMP (with equal dictionaries) - all are better than LZMA but slower

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Yes, that's how it is. The "c" mode provides a little better compression and much slower decoding.
    And the "e" method technically is supposed to produce normal .lzma files, although v2 has a bug which
    makes them incompatible (already fixed, will post later).

    Also, it's kinda lzma in both cases - both modes use the same matchfinder and parser, and run
    a separate lzma/lzmarec entropy backend in another thread. That's why compression times
    are very similar on a multicore system.
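    That pipeline shape can be sketched as follows (a hypothetical minimal illustration, not plzma's actual code): a parser thread pushes tokens into a queue, and an entropy-backend thread drains it. Wall time then tracks the slower stage, which is why swapping one backend for another of similar speed barely changes the totals.

    ```cpp
    #include <cassert>
    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    int main() {
        std::queue<int> q;
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
        std::vector<int> coded;

        // Backend thread: consumes tokens until the producer is done
        // and the queue is drained.
        std::thread backend([&] {
            for (;;) {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return !q.empty() || done; });
                if (q.empty()) break;        // producer finished, queue empty
                int tok = q.front(); q.pop();
                lk.unlock();
                coded.push_back(tok ^ 0x55); // stand-in for entropy coding
            }
        });

        // "Parser" (main thread): emit 100 dummy tokens.
        for (int tok = 0; tok < 100; tok++) {
            std::lock_guard<std::mutex> lk(m);
            q.push(tok);
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
        backend.join();

        assert(coded.size() == 100);
        assert(coded[0] == (0 ^ 0x55));
        return 0;
    }
    ```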

  6. #6
    Member
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    116
    Thanks
    18
    Thanked 32 Times in 11 Posts
    I've done another test with the same source as mentioned above (but with a 1 GB dictionary) and I'm very impressed (size-wise):

    Code:
                                       | options          | size              |time(comp)| input source
    -----------------------------------|------------------|-------------------|----------|---------------------------
    plzma_v2                           | e d30            | 617.565.734 Bytes | 1775s    | tar (grouped by extension)
    *m7zRepacker 1.0.32.307 (7zip 9.20)| m1 d1024 mem1024 | 625.638.082 Bytes | lifetime | tar (grouped by extension)
    freearc 0.666                      | mx md1024        | 641.836.428 Bytes | 546s     | source folder
    7zip 9.20                          | mx md1024m       | 714.470.090 Bytes | 764s     | source folder
    (*the m7zRepacker results were obtained some months ago)

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    In plzma you can specify the actual window size in bytes.
    E.g. plzma e enwik9 enwik9.lzma 1000000000
    Unfortunately 2G already won't work due to some 32-bit overflows,
    but it's still possible to specify windows up to 1.5G or so.
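    The kind of 32-bit overflow meant here can be sketched like this (illustrative only; the actual overflow site in plzma isn't shown in the thread). Once a window-derived quantity crosses 2^32, 32-bit size arithmetic silently wraps:

    ```cpp
    #include <cassert>
    #include <cstdint>

    int main() {
        // Hypothetical: size arithmetic done in 32 bits wraps once
        // window-derived quantities pass 2^32.
        uint32_t window = 2147483646u;   // 2G-2, the stated practical maximum
        uint32_t tables = window * 3u;   // e.g. match-finder structures ~3x window
        assert(tables < window);         // wrapped around instead of growing

        // The same arithmetic in 64 bits gives the intended value:
        uint64_t tables64 = (uint64_t)window * 3u;
        assert(tables64 == 6442450938ULL);
        return 0;
    }
    ```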

  8. #8
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    Here are a couple of tests.
    OS: Win XP Pro SP3 \ Win XP Pro x64 SP2
    Hardware: Intel Core i5-2500K, Turbo and SpeedStep options are turned off to provide a stable CPU freq and precise results.
    Testbed: enwik9 \ levels.db1 file from S.T.A.L.K.E.R: Clear Sky game (745 166 775 bytes)
    Versions: plzma_v2 \ 7z 9.22 beta \ lzma.exe (SDK 9.22 beta)
    Options:
    PLZMA: 27 10000 273 8 0 0 4096 128 16 272
    7z: same dict\mc\fb\lc\lp\pb options and -slp
    lzma.exe: same dict\mc\fb\lc\lp\pb options. Note that the -slp option is not available in lzma.exe.
    Also: timetest by Shelwien for measurements, output redirected to another physical HDD for both compression and decompression.

    To avoid possible confusion... 7z\plzma means x86 compiles, 7z64\plzma64 means x64 compiles. The table columns are: which compile is executed on which OS version \ compressed size \ compression time \ decompression time

    Code:
    enwik9
    ------
    7z on XP86            205094698 - 754.359s  - 21.313s
    7z on XP64            205094698 - 616.532s  - 20.015s
    7z64 on XP64          205094698 - 622.047s  - 19.969s
    lzma.exe on XP64      205094596 - 802.141s  - 19.828s
    plzma on XP86         202807027 - 1114.265s - 73.344s
    plzma on XP64         202807027 - 1158.969s - 73.672s
    plzma64 on XP64       202807027 - 919.984s  - 68.938s
    
    
    levels.db
    ---------
    7z on XP86            216362006 - 215.031s  - 22.218s
    7z on XP64            216362006 - 200.625s  - 22.219s
    7z64 on XP64          216362006 - 194.297s  - 21.968s
    lzma.exe on XP64      216361900 - 217.828s  - 21.750s
    plzma on XP86         203161580 - 381.844s  - 101.625s
    plzma on XP64         203161580 - 392.578s  - 101.406s
    plzma64 on XP64       203161580 - 334.281s  - 100.719s
    I suppose the compressed size results do not require any comments, but the idea of the test was also to see how different compiles perform on different OSes.
    So: PLZMA definitely follows the rule: x64 compile on an x64 OS is faster than x86 on an x86 OS, which is faster than x86 on an x64 OS.
    7z's results are quite indefinite. The improved speed on the x64 OS is the result of the -slp option, I suppose, but it's strange that the x86 compile on the x64 OS performs faster than the x64 compile for enwik9.

    The current tests are quite brief, since I haven't utilized the full potential of PLZMA. So next time we'll see how plzma and 7z perform with a higher dictionary on a huge testbed.
    Anyway, PLZMA shows impressive results. Thanks Eugene! It's definitely a really good job.

    Edit 1: Added results of LZMA.exe from LZMA SDK 9.22 beta
    Edit 2: Retested plzma with default kNumOpts matchStep alignStep lenStep parameters.
    Last edited by Skymmer; 11th June 2011 at 18:07.

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Thanks, but 2x-3x slower encoding is really strange.
    In my tests vs the SDK's lzma.exe, plzma encoding had the same speed as lzma -mt1,
    and was 25% slower without -mt1 (which is OK, because I don't have parser threading, for simplicity),
    but compression was the same.
    And with 7z we get much faster encoding, but significantly worse compression - I guess its
    threading is different from the SDK coder...

  10. #10
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    Added results of lzma.exe from SDK 9.22 beta to the initial table (message #8).

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Ok, I didn't notice it before, but it appears that plzma is so much slower because of non-default extra parameters (128 1 1)
    which improve compression at the cost of speed.

  12. #12
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    Result table updated. (message #8).
    PLZMA retested with default kNumOpts matchStep alignStep lenStep parameters.
    Previously it was 4096 128 1 1;
    now it's 4096 128 16 272.
    In case anybody wants to look into previous results - look here

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Thanks, that's much better, though the 2x difference on "levels.db" encoding is still problematic,
    but at least I know what causes it (lack of parser threading).

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    http://nishi.dreamhosters.com/u/plzma_v3.rar

    Added another backend with better text compression.

    12-06-2011 02:05 v3

    - BUG: files can't be decoded with original decoder
    - BUG: size mismatch in stats on enwik8
    - BUG: only part of memory usage is reported
    - 3rd backend / c# backend selection
    - memory usage optimization (10M less or so)

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    About "c# backend":

    Code:
    "c" command:
      plzma c book1 book1.arc -- compress book1 to book1.arc using lzmarec model tuned to SFC
    plzma c wcc386 wcc386.arc 20 9999 273 7 0 0 -- d20 mc9999 fb273 lc7 lp0 pb0
      plzma c0 book1 book1.arc -- same as 'e'
      plzma c1 book1 book1.arc -- same as 'c'
      plzma c2 book1 book1.arc -- use lzmarec model tuned to enwik8

  16. #16
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    v3 vs. v2 test, conducted with 27 10000 273 8 0 0 4096 128 16 272

    Code:
    enwik9
    ------
    plzma64_v2 c      202807027 - 919.984s - 68.938s
    plzma64_v3 c      202806355 - 923.250s - 68.484s
    plzma64_v3 c2     201957812 - 923.859s - 69.203s
    
    levels.db
    ---------
    plzma64_v2 c      203161580 - 334.281s - 100.719s
    plzma64_v3 c      203160908 - 336.703s - 100.594s
    plzma64_v3 c2     211789848 - 337.437s - 104.485s
    v3 performs a little slower but also provides a little gain in compression. Also, as expected, c2 gets better results on enwik9 and worse results on levels.db. Anyway, nice progress.
    Unfortunately I have bad news too. When I tried to run plzma64_v3 with a dictionary of 30 on enwik9, it crashed immediately. It happens for both c2 and c. After some trials I found that a dictionary of 930000000 doesn't crash at the beginning, but still crashes at about 38%.
    Last edited by Skymmer; 13th June 2011 at 16:28.

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Erm, are you saying that v2 works with d30, but v3 doesn't?
    In theory, large pages shouldn't work when more than physical memory is requested
    (they can't be swapped), but it supposedly checks pointers to allocated memory, so
    that behavior is really weird.

    Ok, crashes reproduced, thanks - I'll see how it happens.

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    http://nishi.dreamhosters.com/u/plzma_v3a.rar

    1. The new parameter is a set of flags that enables some len loops in the parsing optimizer.
    Thus, f_lenloop=7 should further improve compression (at the cost of speed, of course).

    2. Fixed the memory bug - now it shouldn't crash at all, and should support window
    sizes of up to 2147483646 or so (2G-2).

    Code:
    14-06-2011 00:06 v3a
    
     - BUG: crashes on memory allocation
     - new advanced option (f_lenloop)

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    http://nishi.dreamhosters.com/u/plzma_v3p.rar

    Tried attaching psrc (parallel rangecoder).
    It works all right, but somehow both compression
    and speed became a little worse (and memory usage too).

    Code:
           c.mem  c.size   c.time    d.mem d.time
    [15_0] 1370MB 24760312 113.984s  150MB 2.125s
    [15_1] 1370MB 24441741 113.843s  150MB 9.063s
    [15_2] 1370MB 24354967 114.250s  150MB 8.797s
    
    [16_0] 1443MB 24765618 111.265s  222MB 2.797s
    [16_1] 1443MB 24474148 110.969s  222MB 9.281s
    [16_2] 1443MB 24389610 111.125s  222MB 9.172s
    (here 15=v03a, 16=v03p, 0/1/2 - plzma modes)

    Well, here we can see why: (these are core load plots)

    http://nishi.dreamhosters.com/u/01.png (dec)
    http://nishi.dreamhosters.com/u/02.png (enc)

    The processing time depends only on the main thread
    in both encoding and decoding, and an extra thread
    in the pipeline surely adds some overhead, so no
    wonder that it's slower.

    But at least it works at all, with rc functions
    replaced with
    Code:
      uint rc_Decode( uint P ) {
        uint p = qmap[(P+mask)>>shift];
        uint bit = rs.ptr[p]++;
        return bit;
      }
    
      uint rc_Encode( uint P, uint bit ) {
        uint p = qmap[(P+mask)>>shift];
        put( p+(bit<<7) );
        return bit;
      }
    As to speed, I'll think about that later - I didn't even use PGO,
    and the lzmarec model surely has a lot of things to optimize for speed.
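    The quantization step in those rc replacements can be sketched as follows. The names qmap/shift/mask come from the quoted snippet; the concrete widths (11-bit lzma-style probabilities, 128 buckets, so the bit lands in bit 7 as in put(p+(bit<<7))) are assumptions for illustration: the full-precision probability P is reduced to a small bucket index, and each coded event becomes one byte-sized symbol for the secondary entropy backend.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    int main() {
        const int kProbBits = 11;               // lzma-style probability precision
        const int shift     = kProbBits - 7;    // reduce to a 7-bit bucket index
        const int mask      = (1 << shift) - 1; // rounds up instead of truncating

        // Identity-ish quantization table; a real qmap could be non-uniform.
        std::vector<uint8_t> qmap(1 << (kProbBits - shift + 1));
        for (size_t i = 0; i < qmap.size(); i++)
            qmap[i] = (uint8_t)(i < 128 ? i : 127);

        auto encode = [&](uint32_t P, uint32_t bit) -> uint8_t {
            uint32_t p = qmap[(P + mask) >> shift];
            return (uint8_t)(p + (bit << 7));   // symbol: bucket + bit in bit 7
        };

        uint8_t sym = encode(1024, 1);          // P = 0.5 in 11-bit fixed point
        assert((sym >> 7) == 1);                // the bit is recoverable
        assert((sym & 127) == qmap[(1024 + mask) >> shift]);
        return 0;
    }
    ```

    With 128 buckets plus the bit, the alphabet fits in a byte, which is what lets an independent coder thread consume the (bucket, bit) stream.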

  20. #20
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    275
    Thanks
    6
    Thanked 23 Times in 16 Posts
    Could you provide quick result on book1?

  21. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Code:
    plzma c0 BOOK1 1 --> 261189 // lzma
    plzma c0 BOOK1 1  20 999999 273 5 0 0 6000 132 7 1 0 --> 260758
    plzma c1 BOOK1 1 --> 258379
    plzma c2 BOOK1 1 --> 258118
    plzma c2 BOOK1 1  20 999999 273 5 0 0 --> 257864
    Not sure what you mean though... it's still LZ77.

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    enwik9 stats (i7-930 @ 4.0Ghz, ramdrive)
    Btw, c0 is lzma... while LTCB lists 211,776,220 for it.

    Code:
    c.mem    d.mem
    10110MB  975MB 
    
    c0.size   c0.time   d0.time  c1.size   c1.time   d1.time  c2.size   c2.time   d2.time
    196320366 8722.593s 15.865s  193935647 8729.519s 54.179s  193374977 8718.022s 55.365s
    196255274 8870.467s 15.881s  193894674 8872.666s 54.148s  193347870 8866.005s 55.334s
    196251545 8872.027s 15.881s  193891615 8875.162s 54.164s  193345080 8867.362s 55.333s
    196244988 8871.652s 15.834s  193886312 8878.095s 54.179s  193337347 8868.657s 55.318s
    196244810 8873.274s 15.834s  193882655 8875.271s 54.132s  193335804 8865.428s 55.303s
    196194943 8867.206s 15.834s  193791439 8880.560s 54.148s  193244178 8869.858s 55.286s
    196184882 8883.727s 15.834s  193788310 8892.837s 54.148s  193240160 8889.108s 55.303s
    
    c0.size   c1.size   c2.size           
    196320366 193935647 193374977 plzma c# enwik9 1  1000000000 999999999 273 8 0 0 4096 128 16 272 0
    196255274 193894674 193347870 plzma c# enwik9 2  1000000000 999999999 273 8 0 0 4096 128 16 1 0  
    196251545 193891615 193345080 plzma c# enwik9 3  1000000000 999999999 273 8 0 0 4096 128 1 1 0   
    196244988 193886312 193337347 plzma c# enwik9 4  1000000000 999999999 273 8 0 0 4096 16 1 1 0    
    196244810 193882655 193335804 plzma c# enwik9 5  1000000000 999999999 273 8 0 0 4096 1 1 1 0     
    196194943 193791439 193244178 plzma c# enwik9 6  1000000000 999999999 273 8 0 0 6000 1 1 1 0     
    196184882 193788310 193240160 plzma c# enwik9 7  1000000000 999999999 273 8 0 0 6000 1 1 1 7

  23. #23
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Shelwien View Post
    Btw, the actual points of these stats were
    1) The fact that x64 version of encoder is only 2% faster, and lzma decoder is actually slower than 32-bit version.
    2) The fact that the version which uses large pages is 16% faster, while afaik coders like bsc still don't use this feature.
    I did try large pages in bsc. But on my Win7 x64, VirtualAlloc always returns an out-of-memory error. Do you have a sample.cpp so I can integrate it?
    Enjoy coding, enjoy life!

  24. #24
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I did try large pages in bsc.
    > But on my Win7 x64 VirtualAlloc always return out of memory error.

    It's an obvious RTFM case.
    Large pages are not swappable, so there's a risk of denial-of-service
    if a guest app is allowed to abuse them.
    Thus you have to acquire the "SeLockMemoryPrivilege" privilege to use them.
    There's an example in the 7z sources, but it does something weird
    (it enables that thing for the current user instead of the process, afair),
    so I ended up implementing it on my own.

    Warning: having lots of free memory is not the same as having an equivalent
    amount of large pages. The address space gets fragmented with time, so
    you might have to reboot to allocate anything.
    Once I had 15.5G of free memory and 90M of free large pages.

    > Do you have a sample.cpp so I can integrate?

    http://nishi.dreamhosters.com/u/2Mpages.cpp
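    For reference, the standard Win32 sequence for this looks roughly like the sketch below (this is not the contents of 2Mpages.cpp, and error handling is trimmed): enable SeLockMemoryPrivilege on the process token, then request MEM_LARGE_PAGES memory. Note the account must also have the "Lock pages in memory" right assigned, otherwise AdjustTokenPrivileges has nothing to enable and the VirtualAlloc call fails.

    ```cpp
    #include <windows.h>

    // Windows-only sketch; returns NULL on failure.
    static void* AllocLargePages(size_t size) {
        HANDLE token;
        TOKEN_PRIVILEGES tp = {};
        OpenProcessToken(GetCurrentProcess(),
                         TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token);
        LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &tp.Privileges[0].Luid);
        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
        CloseHandle(token);

        // Size must be a multiple of the large-page granularity (2M on x86/x64).
        size_t page = GetLargePageMinimum();
        if (page == 0) return NULL;              // large pages unsupported
        size = (size + page - 1) & ~(page - 1);
        return VirtualAlloc(NULL, size,
                            MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                            PAGE_READWRITE);
    }
    ```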

  25. #25
    Member toi007's Avatar
    Join Date
    Jun 2011
    Location
    Lisbon
    Posts
    35
    Thanks
    0
    Thanked 0 Times in 0 Posts

    I'm not 100% thrilled with plzma, just because it's 2x to 3x slower than rar:

    original file ------- sample.avi of 713.280 Kb
    plzma c ------------- sample.arc of 698.260 Kb in 54 mins
    rar max compression - sample.rar of 707.913 Kb in 28 mins

    but I did get impressed by:

    original file ------- sample.psd of 10.401 Kb
    plzma c ------------- sample.arc of 3.634 Kb in 20 secs
    rar max compression - sample.rar of 7.156 Kb in 06 secs

    I have a slow 32-bit XP computer.

  26. #26
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Btw, c0 is lzma... while LTCB lists 211,776,220 for it.
    Quote Originally Posted by metacompressor.com
    Code:
    size	  ct	cm  dt dm  opts
    203930511 1438 1788 31 165 7za460 d160m fb=248 mc=1000000000 lc8 pb0
    205095285 1031 1358 28 134 7z900a d128m fb=248 mc=1000000000 lc8 pb0

  27. #27
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    http://nishi.dreamhosters.com/u/plzma_v3c.rar
    Code:
    29-10-2011 03:37 v3c
     - stdin/stdout support

  28. #28
    Expert Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Somehow this program escaped my attention. Now fixed. http://mattmahoney.net/dc/text.html#1933

  29. #29
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    .
    Last edited by Bulat Ziganshin; 18th January 2013 at 04:18.

  30. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Uh, thanks, but there are many mistakes:

    1) The most recent version is 3c, not 3p: 3a->3p->3b->3c (see history.txt in the archive).
    v3p is a not-very-successful test of a parallel rangecoder in plzma - it's slower and compresses a little worse than 3b/3c.
    2) The listed result (193,240,160) actually corresponds to 3b/c2, not 3p/c0 (3c should be similar).
    3) The max mc value is 2^32-1, not 999999999.
    4) e=c0 is the lzma backend, c=c1 is the backend optimized for SFC, c2 is optimized for enwik8.
    c0 uses modified stats initialization, so none of the modes are binary-compatible with lzma.
    5) The enwik8 result with "plzma.exe c2 enwik8 1 100000000 999999999 273 8 0 0 6000 1 1 1 7" is 24,206,571, not 24,550,691.
    6) For window sizes, both log2 (25=32M) and direct values are supported; small values are treated as log2.
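    Point 6 amounts to a parameter convention like the following sketch (the exact log2-vs-direct threshold is an assumption, not taken from plzma sources):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Small arguments are read as log2 of the window size,
    // large ones as a direct byte count.
    static uint64_t ParseWindow(uint64_t v) {
        return (v <= 31) ? (1ULL << v) : v;  // 25 -> 32M, 1000000000 -> ~0.93G
    }

    int main() {
        assert(ParseWindow(25) == 33554432);           // log2 form: 32M
        assert(ParseWindow(1000000000) == 1000000000); // direct byte count
        return 0;
    }
    ```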

