
Thread: LZPM's future

  1. #61
    encode (The Founder)
    Results are quite similar.

    Quote Originally Posted by donkey7
    why use fixed size offsets and lengths? gamma encoding is very efficient and easy to decode.
    I already use something like that, using the higher bits of a control byte:

    0??????? - literal run
    10?????? - match with small offset
    11?????? - match with large offset, or recent match or whatsoever

    I may extend the scheme:
    0???????
    10??????
    110?????
    111?????
    and so on...
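    For illustration, a decoder could classify such a control byte by its leading bits alone. A minimal C sketch (the tag names are illustrative, not LZPM's actual ones):

    /* Hedged sketch: classify a control byte by its high-bit prefix. */
    enum tag { LITERAL_RUN, SHORT_MATCH, LONG_MATCH };

    static enum tag classify(unsigned char c)
    {
        if ((c & 0x80) == 0x00) return LITERAL_RUN;  /* 0??????? */
        if ((c & 0xC0) == 0x80) return SHORT_MATCH;  /* 10?????? */
        return LONG_MATCH;                           /* 11?????? */
    }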


  2. #62
    encode (The Founder)
    By the way, my LZSS compresses ENWIK9 to 392,062,293 bytes and decompresses it in 5 seconds!

  3. #63
    encode (The Founder)
    Generally speaking, all the fancy tricks - keeping the recent match offset (which mainly helps on EXEs), a variable-length (bytewise) offset - help a little, at the cost of decoder complexity. Worth it? Even the basic, super simple scheme works! Maybe we can do something more clever to achieve higher compression.

    The basic scheme:

    0 flag; 7-bit count; <count+1> bytes - literal run
    1 flag; 7-bit match length-4; 16-bit offset - match

    Extreme simplicity.
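    For reference, a minimal C sketch of a decoder for this basic scheme (bounds checks omitted; little-endian offsets and the meaning of a zero offset are assumptions):

    #include <stddef.h>
    #include <string.h>

    /* Decode the basic control-byte scheme: 0+count = literal run, 1+len+offset = match.
       Assumes well-formed, in-bounds input. */
    static size_t decode(const unsigned char *src, size_t srclen, unsigned char *dst)
    {
        const unsigned char *end = src + srclen;
        unsigned char *out = dst;

        while (src < end) {
            unsigned c = *src++;
            if ((c & 0x80) == 0) {                   /* 0 flag: literal run of count+1 bytes */
                size_t count = (size_t)(c & 0x7F) + 1;
                memcpy(out, src, count);
                src += count;
                out += count;
            } else {                                 /* 1 flag: match */
                size_t len = (size_t)(c & 0x7F) + 4; /* 7-bit match length-4 */
                size_t off = (size_t)src[0] | ((size_t)src[1] << 8); /* 16-bit offset */
                const unsigned char *from = out - off; /* distance vs. distance-1 is format-specific */
                src += 2;
                while (len--)                        /* byte-by-byte: source may overlap output */
                    *out++ = *from++;
            }
        }
        return (size_t)(out - dst);
    }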

  4. #64
    encode (The Founder)
    A link to Stac LZS decompressor source code:
    http://homepages.rya-online.net/plun...lzs-1.0.tar.gz

    Classic!

  5. #65
    Trixter (Member)
    Quote Originally Posted by encode
    A link to Stac LZS decompressor source code
    God, STAC was crap. I mean, I still have my Stacker 1.0 and 2.0 compression boards (6MB/s decompression in hardware, LOL) and the *idea* was really sexy at the time, but looking at what Stac LZS really is, it's disgusting. Charles Bloom's LZP outperforms it, even at order 1 with byte-aligned codes. Sheesh.

    Thanks for the source link!

  6. #66
    Trixter (Member)
    Quote Originally Posted by donkey7
    gamma encoding is very efficient and easy to decode
    I agree about easy to decode, but I've never understood why it is efficient -- any number past 15 is larger than 8 bits, so isn't it only really "useful" for a range of 0-15? I know that in coding compression routines, closer matches are preferred so that the offsets are smaller, but I have a hard time believing Elias gamma is the best choice for compression because hardly any offsets have that kind of locality (i.e. are 15 bytes or less away from the current position).

    I still consider myself a beginner in compression (I have never coded PPM, for example) so maybe I'm missing something obvious... if so, please tell me

  7. #67
    donkey7 (Member)
    well, the lower 8 bits of the offset should be passed in another byte, i.e.:

    offset % 256 is passed in a separate byte
    offset / 256 is gamma encoded

    so it will be fast enough and space efficient.

    or we can use something like Rice coding: http://www.monkeysaudio.com/theory.html
    and some tricky guessing algo.
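    A sketch of that split in C (put_bit/put_byte are assumed bit/byte output helpers, not from any real codebase):

    /* Emit an offset as suggested above: low 8 bits as a raw byte,
       the high part (offset / 256) Elias-gamma coded. */
    void put_bit(int bit);       /* assumed helpers */
    void put_byte(unsigned b);

    static void put_gamma(unsigned n)        /* classic Elias gamma, n >= 1 */
    {
        int bits = 0;
        for (unsigned t = n; t > 1; t >>= 1)
            bits++;                          /* bits = floor(log2(n)) */
        for (int i = 0; i < bits; i++)
            put_bit(0);                      /* unary prefix of zeros */
        for (int i = bits; i >= 0; i--)
            put_bit((n >> i) & 1);           /* binary part, MSB first */
    }

    static void put_offset(unsigned offset)
    {
        put_byte(offset & 0xFF);             /* low 8 bits stay byte-aligned */
        put_gamma((offset >> 8) + 1);        /* high part; +1 so zero is codable */
    }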

  8. #68
    encode (The Founder)
    I did another improvement - added a 128K dictionary.

    New results:

    ENWIK8: 42,623,382 bytes
    ENWIK9: 377,636,479 bytes

    world95.txt: 912,246 bytes

    3200.txt: 7,070,837 bytes

    bible.txt: 1,292,450 bytes

    book1: 363,158 bytes

    I think it's interesting. I may come up with some funny name and release a new file compressor.

  9. #69
    Shelwien (Administrator)
    Btw, another speed optimization technique is code generation.
    That is, you can even use full Huffman codes instead of a
    couple of fixed match layouts, and still have the same decompression
    speed - by unrolling all the branches, e.g. per byte of input data.
    But of course, it's the decompressor which has to generate that code
    from the Huffman table or something.

  10. #70
    encode (The Founder)
    Currently I'm playing with optimal parsing. Actually, I don't quite understand why my current scheme doesn't work with LZ77.
    I get ALL matches within a given block.

    path[N][2]; // path[i][0] - longest match at the i'th position, path[i][1] - its offset

    OK, we get all matches. After that I tried many things - parsing backwards, applying flexible parsing...

    Currently only the lazy matching works. I must be doing something wrong...

  11. #71
    toffer (Programmer)
    How do you handle phrases which go past the current parsing block? This could be a problem.
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  12. #72
    encode (The Founder)
    The parsing block is the entire file... or a large enough block, e.g. 16 MB. The parser collects all the longest matches within reach, i.e. within a fixed distance which is effectively the dictionary size.

  13. #73
    donkey7 (Member)
    The Storer & Szymanski method for optimal parsing must work everywhere where symbols have fixed probabilities, and your coding scheme has that property.

    As S&S described, you should parse backwards. Additionally, each entry should contain information about the first encoded symbol - whether it's a match, or the length of the literal run.

    Which coding scheme do you use?

  14. #74
    Shelwien (Administrator)
    > Currently I'm playing with an optimal parsing. Actually I'm not quite understand
    > why my current scheme not works with LZ77.

    Let's suppose that i is the number of already processed
    input bytes, and x[i] is the length of the optimal code for that input.
    Then we can
    1) Store a counter byte and a sequence of literals of length 1..128 - so
    if( x[i]+2<x[i+1] ) x[i+1]=x[i]+2, t[i+1]=1, L[i+1]=1
    ...
    if( x[i]+129<x[i+128] ) x[i+128]=x[i]+129, t[i+128]=1, L[i+128]=128
    2) Store a 2-byte reference (match) of length 2..l:
    if( x[i]+2<x[i+2] ) x[i+2]=x[i]+2, t[i+2]=2, L[i+2]=2
    ...
    if( x[i]+2<x[i+l] ) x[i+l]=x[i]+2, t[i+l]=2, L[i+l]=l

    And then we'd know, being at position i, that we can encode input bytes
    c[i-L[i]..i-1] as a literal sequence if t[i]==1 or as a match if t[i]==2

    Also all this can be performed using small (2*129+1) circular buffers,
    as we'd never reach further than 129 bytes back or forward.
    (well, or whatever limit there is for match length)
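    A rough C sketch of that forward pass, assuming the costs above (one counter byte per literal run, 2 bytes per match) and an abstract longest_match() helper:

    #include <limits.h>

    int longest_match(const unsigned char *buf, int i, int n); /* assumed helper */

    /* x[i] = cheapest code size (bytes) for the first i input bytes;
       t[i]/L[i] remember how position i was reached (1 = literal run, 2 = match). */
    void forward_parse(const unsigned char *buf, int n,
                       unsigned *x, unsigned char *t, unsigned char *L)
    {
        x[0] = 0;
        for (int i = 1; i <= n; i++) x[i] = UINT_MAX;

        for (int i = 0; i < n; i++) {
            if (x[i] == UINT_MAX) continue;

            /* 1) literal run: one counter byte plus k literals, k = 1..128 */
            for (int k = 1; k <= 128 && i + k <= n; k++)
                if (x[i] + 1 + k < x[i + k]) {
                    x[i + k] = x[i] + 1 + k;
                    t[i + k] = 1;
                    L[i + k] = (unsigned char)k;
                }

            /* 2) match: a fixed 2-byte reference covering 2..l bytes */
            int l = longest_match(buf, i, n);
            for (int k = 2; k <= l && i + k <= n; k++)
                if (x[i] + 2 < x[i + k]) {
                    x[i + k] = x[i] + 2;
                    t[i + k] = 2;
                    L[i + k] = (unsigned char)k;
                }
        }
    }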

  15. #75
    encode (The Founder)
    Quote Originally Posted by donkey7
    storer & szymanski
    I did it!!!

    Finally it works! It's cool!

  16. #76
    encode (The Founder)
    Comparison between Greedy and Storer's parsing. My LZSS, 128KB dictionary.

    ENWIK8
    Greedy: 43,270,480 bytes
    Storer: 42,178,117 bytes

    ENWIK9
    Greedy: 383,533,462 bytes
    Storer: 373,568,795 bytes

    world95.txt
    Greedy: 931,959 bytes
    Storer: 902,023 bytes

    bible.txt
    Greedy: 1,302,072 bytes
    Storer: 1,273,202 bytes

    book1
    Greedy: 367,331 bytes
    Storer: 358,744 bytes


  17. #77
    encode (The Founder)
    And here, if we switch back to the tagging output, we get:

    world95.txt
    128K: 868,034 bytes
    256K: 797,370 bytes [!]

    bible.txt
    128K: 1,260,275 bytes
    256K: 1,166,396 bytes

    book1
    128K: 351,286 bytes
    256K: 330,961 bytes

    Again, no Huffman or arithmetic, pure bytewise LZ77... This means decompression speed is faster than memcpy()...

  18. #78
    Trixter (Member)
    Quote Originally Posted by encode
    Again no Huffman or arithmetic, pure bytewise LZ77... This means, decompression speed is faster than memcpy()...
    You're kidding, I'm sure, because nothing beats memcpy unless your scheme also accounts for runs of a single character for RLE (and your source has a large percentage of data that supports it).

    I *love* to see optimal parsing in use, even if I don't fully understand S&S's paper on how to implement it. Will you be releasing compressor+decompressor so we can have some fun playing?

  19. #79
    encode (The Founder)
    Quote Originally Posted by Trixter
    Will you be releasing compressor+decompressor so we can have some fun playing?
    Well, yes I will. Mainly, the goals of this compression are:
    + Smallest, fastest and simplest decoder which needs no extra memory for decompression. A simple ASM implementation of the decoder should be possible - i.e. everything is done to keep the decoder as simple as possible, making use of ASM features; i.e. the compressed stream format is optimized for an ASM decoder.
    + The compressor should generate the tightest and smallest compressed output possible, with no compromise - i.e. it may be slow and eat lots of memory. That's why I use Optimal Parsing here.

    Currently, I'm still undecided about the LZSS parameters. Most likely I will use a 128K dictionary. The main question is which method to use to code literals/matches:
    1. Tagging - i.e. one byte represents eight flags, as in the original LZSS.
    2. Control bytes - i.e. literal run/match.

    The first one provides better compression, but handles already-compressed data badly - we may inflate the input by up to 9/8 (800KB -> 900KB, for example).
    The second one handles incompressible data very well - even in the worst case we increase the data just a bit. In addition, this approach should be faster - no tag decoding is needed.
    However, most likely I will use tagging - since it gives higher compression (see my posts above).

    So, the current compressed stream format is:

    Tag - represents eight flags:
    0 - literal
    1 - match

    Match is encoded in three bytes:

    LLLLLLLO OOOOOOOO OOOOOOOO - 7-bit match length, 17-bit offset. The lowest bit of the offset is kept in the lowest bit of the first byte.

    During decoding we may do:

    MOVZX ECX,BYTE PTR[ESI] // on modern processors MOVZX is not slower than MOV
    MOVZX EAX,WORD PTR[ESI+1] // subject to change
    ADD ESI,3
    SHR ECX,1
    ADC EAX,EAX
    XOR EAX,-1 // should be faster than NOT EAX, since XOR is pairable
    ADD EAX,EDI
    ADD ECX,4 // MINMATCH = 4
    XCHG EAX,ESI
    REP MOVSB
    MOV ESI,EAX // MOV is faster than XCHG
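    Roughly the same step as a C sketch, for reference (the stored offset is assumed to be distance-1, matching the XOR/ADD trick above; names are illustrative):

    /* Decode one three-byte match: LLLLLLLO OOOOOOOO OOOOOOOO. */
    static void copy_match(const unsigned char **src_p, unsigned char **dst_p)
    {
        const unsigned char *src = *src_p;
        unsigned char *dst = *dst_p;

        unsigned ctrl = src[0];                          /* LLLLLLLO */
        unsigned off  = ((unsigned)(src[1] | (src[2] << 8)) << 1)
                      | (ctrl & 1);                      /* 16 high bits + low bit = 17-bit offset */
        unsigned len  = (ctrl >> 1) + 4;                 /* 7-bit length, MINMATCH = 4 */
        const unsigned char *from = dst - off - 1;       /* like EDI - offset - 1 */

        while (len--)
            *dst++ = *from++;                            /* overlapping copy, like REP MOVSB */

        *src_p = src + 3;
        *dst_p = dst;
    }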

    And the last thing - what to call this program/algorithm. Temporarily I call it LZS. Maybe I should call it LZSS, or THUNDER, BULLET, or the name of a video game hero - I will search through my favorite games to find some name. By the way, as you know, PIMPLE is from Battletoads - a PPM Battletoad which should do the smash...

    Any suggestions and ideas are welcomed!

  20. #80
    donkey7 (Member)
    The S&S method is very simple. For a scheme with fixed-size offsets and lengths:

    1. find the longest match for every position.
    2. in reverse order, parse the output and compute costs (a cost is the cost of coding the file from the current position to the end):
    - for every position, examine all possible match lengths and choose the one that produces the lowest coding cost. The coding cost is the cost of the current match plus the coding cost at the position after the current match.
    3. in forward direction, parse the output of the previous step:
    - if there's a match of length 0, then output a literal and go to the next position,
    - if there's a match of length > 0, then output the match and skip to the position after the current match.


    If literal flags are clustered (i.e. instead of escaping every literal with a flag you write the length of the literal run) then you must also remember the length of the literal run from the current position (it can be >= 0).
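    A hedged C sketch of steps 2 and 3 for a fixed-cost scheme (the costs are placeholders, match finding and actual emission are left abstract, and the literal-run clustering mentioned above is omitted):

    #include <stdlib.h>

    #define LITERAL_COST 9    /* illustrative: 1 flag bit + 8 literal bits */
    #define MATCH_COST   24   /* illustrative: 1 flag bit + 7-bit length + 16-bit offset */
    #define MINMATCH     4

    /* match_len[i] = longest match found at position i (step 1, done elsewhere).
       On return, take[i] = chosen match length at i, or 0 for a literal. */
    void optimal_parse(const int *match_len, int n, int *take)
    {
        int *cost = malloc((size_t)(n + 1) * sizeof *cost); /* cost[i] = bits to code i..n-1 */
        if (!cost) return;
        cost[n] = 0;

        for (int i = n - 1; i >= 0; i--) {                  /* step 2: backward pass */
            cost[i] = LITERAL_COST + cost[i + 1];
            take[i] = 0;
            for (int len = MINMATCH; len <= match_len[i] && i + len <= n; len++)
                if (MATCH_COST + cost[i + len] < cost[i]) {
                    cost[i] = MATCH_COST + cost[i + len];
                    take[i] = len;
                }
        }
        free(cost);

        /* step 3 (forward pass) then walks take[]: where take[i] == 0 emit a literal
           and advance by 1, otherwise emit a match of take[i] bytes and skip over it. */
    }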

  21. #81
    donkey7 (Member)
    encode:
    you can use a custom LUT to map coded lengths to real lengths, i.e. use only 6 bits for the length, where a value < 60 is mapped to the same value, 60 -> 100, 61 -> 140, 62 -> 210, 63 -> 320

    ie.
    mov ecx,[lut+ecx*4]
    instead of
    add ecx,4
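    A minimal sketch of filling such a table in C (the mapping values are the ones above; the table layout itself is just an assumption):

    /* 64-entry table mapping the 6-bit coded length to the real length. */
    static unsigned length_lut[64];

    static void init_length_lut(void)
    {
        for (unsigned i = 0; i < 60; i++)
            length_lut[i] = i;      /* values below 60 map to themselves */
        length_lut[60] = 100;
        length_lut[61] = 140;
        length_lut[62] = 210;
        length_lut[63] = 320;
    }
    /* decoder side: len = length_lut[code];  -- the mov ecx,[lut+ecx*4] above */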

  22. #82
    Shelwien (Administrator)
    I wonder, why do you ignore that post?
    Is there a mistake?
    Because if not, then it's much simpler than the multipass method described above.

  23. #83
    donkey7 (Member)
    Maybe it won't work?

    Igor Pavlov in 7-Zip does optimal parsing in the forward direction (although it's not really optimal, as symbols have non-fixed probabilities), but it requires decoding matches in the reverse direction, so it's also a multipass algorithm. The advantage of Igor's approach is that he doesn't need to store lots of matches in an auxiliary array while parsing (only max_match * position_slots, where position_slot is described in the LZX algorithm description, as different offsets are encoded with different codes of different size), so it is memory efficient.

    Igor's method is described somewhere on this forum.

  24. #84
    Shelwien (Administrator)
    > maybe it won't work?

    So, why?
    It covers all the possible encodings and has small memory requirements.
    And it would still be optimal with entropy coding, though that would be slightly tricky to implement.

  25. #85
    donkey7 (Member)
    Well, Ilia can test your scheme and post the results. I bet it wouldn't be optimal on files like pic from the Calgary corpus, and probably on almost all others.


    Quote Originally Posted by Shelwien
    And would still be optimal with entropy coding, though that would be slightly tricky to implement.
    AFAIK nobody has discovered a method for globally optimal parsing with adaptive encodings.

  26. #86
    encode (The Founder)
    Quote Originally Posted by Shelwien
    Wonder, why do you ignore that post?
    Firstly, the idea is not clear to me. Secondly, it might not really work. At first sight it's something like Flexible Parsing - we check all variants within the reach of the current match. Such a thing works with the ROLZ-based LZPM, and with all my LZP variants (PIMv1). Strangely, this does NOT work with LZ77, even if we process the array of all collected matches in such a forward manner. Very strange, however, it is.

    S&S is very simple and clear to implement, and it works!

    In addition, with S&S we hit global optimality - if blocksize = filesize.

    Quote Originally Posted by donkey7
    encode:
    you can use custom lut to map coded lengths to real lenghts. ie. use only 6 bits for length where value < 60 is mapped to the same value, 60 -> 100, 61 -> 140, 62 -> 210, 63 -> 320

    ie.
    mov ecx,[lut+ecx*4]
    instead of
    add ecx,4
    Cool idea! Can you describe the ASM implementation details more deeply? Can we completely eliminate branches here? I.e. how to build the LUT, and so on.

  27. #87
    Shelwien (Administrator)
    >> Wonder, why do you ignore that post?
    >
    > Firstly the idea is not clear for me.

    Well, whatever. You already have a working implementation anyway.
    And I think that what I tried to explain there should be pretty similar to S&S,
    just with different data structures and single-pass delayed encoding in a sliding window.
    So that it would work in conjunction with adaptive entropy coding, though of course
    I don't have an optimal parsing scheme for static coding.

    >> you can use custom lut to map coded lengths to real lenghts. ie. use only 6 bits
    >> for length where value < 60 is mapped to the same value, 60 -> 100, 61 -> 140,
    >> 62 -> 210, 63 -> 320
    >>
    >> ie.
    >> mov ecx,[lut+ecx*4]
    >> instead of
    >
    > Cool idea! Can you describe the ASM implementation details more deeply? Can we
    > here completely eliminate branches? i.e. how to get LUT, and so on.

    He meant that you can have a nonlinear mapping of offset/length field values to the
    real values with a lookup table (LUT).

  28. #88
    encode (The Founder)
    Quote Originally Posted by Shelwien
    He meant that you can have a nonlinear mapping of offset/length field values to the
    real values with a lookup table (LUT).
    Hm... I thought this could be done with simple computations only - I think no additional data structures should be needed. However, quantized match lengths should be a good idea. There are many possibilities here. We may map the longest match to another value, for example:
    if (len == 127) len += len;
    or
    if (len & 64) len <<= 1;
    Even in the Deflate specification the longest match has a high priority.
    Also, such quantization stuff doesn't work with offsets - we lose too many matches...
    I have an idea about variable-length match lengths and offsets which fit into a 24-bit integer, to allow:
    short matches at smaller offsets
    long matches at large offsets
    Maybe I'm wrong here and we will not see any compression improvement...

  29. #89
    donkey7 (Member)
    You can use cmov* instructions when targeting i686 or higher. Note that cmov takes only a register or memory source, so a constant has to be staged in a spare register first.

    For example:

    mov ebx,ecx
    mov eax,512
    cmp ecx,126
    cmove ebx,eax
    mov eax,1024
    cmp ecx,127
    cmove ebx,eax
    mov ecx,ebx

    I'm using ebx as a temporary register to break dependency chains. It would be useful with more quantized values, e.g.:

    with a 6-bit length:

    mov ebx,ecx
    mov eax,120
    cmp ecx,60
    cmove ebx,eax
    mov eax,240
    cmp ecx,61
    cmove ebx,eax
    mov eax,480
    cmp ecx,62
    cmove ebx,eax
    mov eax,960
    cmp ecx,63
    cmove ebx,eax
    mov ecx,ebx


    if (len == 127) len += len;

    could be rewritten as:

    mov eax,254
    cmp ecx,127
    cmove ecx,eax

  30. #90
    encode (The Founder)
    Yep, CMOV is cool.

    Actually, I want to achieve compression comparable to Deflate, even with byte I/O only. Currently, my LZSS with a 256K dictionary already gives higher text compression than Deflate...

    Since the offset takes the most space, I should try variable-length coding for it.

    For example:

    0??????? ???????? - match length + short offset
    1??????? ???????? ???????? - match length + large offset

    Depending on the match length, we may use different offset coding.
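    As a sketch, decoding those two forms could look like this in C (the exact field split is not specified above, so treating the 7 bits after the flag as the match length is an assumption):

    /* Read one match: 0 flag = two-byte form (8-bit offset),
       1 flag = three-byte form (16-bit offset). Returns the advanced source pointer. */
    static const unsigned char *read_match(const unsigned char *src,
                                           unsigned *len, unsigned *off)
    {
        unsigned c = *src++;
        *len = c & 0x7F;                    /* assumed: 7-bit match length */
        if ((c & 0x80) == 0) {              /* short offset */
            *off = *src++;
        } else {                            /* large offset, little-endian assumed */
            *off = (unsigned)src[0] | ((unsigned)src[1] << 8);
            src += 2;
        }
        return src;
    }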
