
Thread: First experiment: simple LZP / ROLZ ish toy

  1. #1
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts

    First experiment: simple LZP / ROLZ ish toy

    Hi all,

    I recently decided to start exercising my C skills again by implementing some of the data compression algorithms I learned over the years.

    The first one I did I don't remember the name for: basically, keep a rolling hash of the data and look it up in a hash table; if the stored prediction matches the next byte, output a 1 flag, if not, a 0 flag and the literal. Optimise to stay byte-aligned by gathering 8 flags and emitting them together (a rough sketch follows below).
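
    For the curious, a minimal sketch of that flag-plus-literal scheme in C (the names and the hash constant here are illustrative, not from the actual code):
    Code:
    #include <stdint.h>
    #include <stddef.h>

    #define CTX_BITS 20   /* 2^20 entries -> a 1MB prediction table */

    /* Predict the next byte from a hash of the last few bytes: emit a 1 flag on a
     * hit, or a 0 flag plus the literal. Flags are gathered 8 at a time into one
     * byte written at the start of each group. Decoding mirrors this loop. */
    static size_t predictor_compress(const uint8_t *in, size_t n, uint8_t *out) {
        static uint8_t predict_tab[1u << CTX_BITS];  /* predicted byte per context */
        uint32_t ctx = 0;
        size_t i = 0, o = 0;
        while (i < n) {
            size_t flag_pos = o++;                   /* reserve room for 8 flags  */
            uint8_t flags = 0;
            for (int bit = 0; bit < 8 && i < n; bit++, i++) {
                uint32_t h = (ctx * 2654435761u) >> (32 - CTX_BITS);
                if (predict_tab[h] == in[i]) {
                    flags |= (uint8_t)(1u << bit);   /* hit: flag bit only        */
                } else {
                    predict_tab[h] = in[i];          /* miss: learn the byte,     */
                    out[o++] = in[i];                /* then emit the literal     */
                }
                ctx = (ctx << 8) | in[i];            /* roll the context along    */
            }
            out[flag_pos] = flags;
        }
        return o;
    }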

    That actually performed a lot better than I expected when I expanded the hash table to 1MB. Even better than `lz4 -1` on enwik8.

    Next, I implemented a simple LZP or ROLZ scheme.

    Using a table (8k at first, then 64k) of arrays of references (at first 8, then 16 per entry), I hashed the last (2 bytes, then 3) and scanned the list of references for the longest match. I then output a match flag, the index, and the length.

    If the length is short enough, I can store all that in a single byte, so even match lengths of 2 are acceptable.

    Sequential literals are gathered into blocks with a (0, len) byte.

    In either case, if the length field hits the maximum for its size, an additional byte follows and is added on, allowing match lengths of (2 + 15 + 255) in the original layout, then (2 + 7 + 255) in the 16-deep one. Literal runs can be up to (128 + 255) bytes long.

    Literal run byte: 0lll llll (l = length - 1)
    Original match flag byte: 1iii llll (i = index, l = length - 2)
    Revised match flag byte: 1lll iiii
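
    To make those layouts concrete, here is a rough sketch of how a decoder might pull the fields apart, using the revised (1lll iiii) layout; the names are illustrative and the copy itself is elided:
    Code:
    #include <stdint.h>

    /* Only the field extraction is shown; the actual copy is elided. */
    typedef struct { int is_match; unsigned index, length; } token_t;

    static const uint8_t *parse_token(const uint8_t *p, token_t *t) {
        uint8_t tok = *p++;
        if (tok & 0x80) {                          /* match token: 1lll iiii       */
            t->is_match = 1;
            t->index    = tok & 0x0F;              /* slot within the context list */
            t->length   = ((tok >> 4) & 0x07) + 2; /* minimum match length is 2    */
            if (t->length == 2 + 7)                /* field saturated: extra byte  */
                t->length += *p++;                 /* up to 2 + 7 + 255            */
        } else {                                   /* literal run token: 0lll llll */
            t->is_match = 0;
            t->length   = (tok & 0x7F) + 1;        /* 1..128 literals              */
            if (t->length == 128)                  /* saturated: extra byte        */
                t->length += *p++;                 /* up to 128 + 255              */
        }
        return p;
    }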

    Again, the results are not bad. I've done no optimisation for speed, but it manages to compress enwik9 in ~24 seconds, resulting in 485,910,956 bytes, and decompression takes ~12 seconds.

    For enwik8 it compresses to 55,343,408 bytes.

    Some questions:
    1. Would this count as LZP, or ROLZ, or is LZP typically a ROLZ?
    2. I currently update the hash tables for every byte [up to -3 from the current] - is there a better scheme?

    The longest match I've managed in enwik8 was 264 bytes. I guess my next step is to start gathering statistics to see how it's really performing.

  2. The Following 2 Users Say Thank You to FunkyBob For This Useful Post:

    encode (2nd April 2019),Mike (2nd April 2019)

  3. #2
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    Forgot to mention I will post the code to github soon... if anyone expresses an interest.

  4. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. ROLZ is "reduced offset" LZ, where distance is counted within a context
    LZP is a corner case of ROLZ, where only one offset per context is stored
    http://ctxmodel.net/files/LZP-DS_v0.rar
    http://mattmahoney.net/dc/flzp.zip
    https://github.com/IlyaGrebnov/libbs...p/lzp.cpp#L168

    2. Another approach was mentioned here: https://encode.ru/threads/2681-SQUID...ll=1#post59629
    But to actually compete with existing implementations, you'd need optimal parsing most likely.

    3. Did you see LZTP? https://encode.ru/threads/3005-TP-LI...ig-compression
    Usually there are just packed flags, but they extended this approach further and use interleaved bit and byte streams.
    I think it has some potential, despite being troublesome at encoding stage.

  5. #4
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    A few more details (sorry, last night I was tired and on cold/flu meds).

    It uses a 16MB buffer for history, so 24-bit offset values. Combined with the (64k x 16 x 4 = 4MB) hash table, it requires 20MB to compress / decompress. The buffer is not a sliding history - offsets are absolute, and the buffer is discarded when a new frame is started.
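
    For reference, the working set described above amounts to something like this (a sketch; the names are illustrative):
    Code:
    #include <stdint.h>

    #define HASH_SLOTS  (1u << 16)    /* 64k hashed contexts               */
    #define HASH_DEPTH  16            /* offsets remembered per context    */
    #define HISTORY_LEN (1u << 24)    /* 16MB frame, so 24-bit offsets fit */

    typedef struct {
        uint32_t offsets[HASH_SLOTS][HASH_DEPTH]; /* absolute offsets: 4MB       */
        uint8_t  history[HISTORY_LEN];            /* discarded when a frame ends */
    } rolz_state_t;                               /* ~20MB total, as above       */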

    Originally I was storing and checking the first byte, for hash validation, in the top byte of the offset, but that hurt compression.

    I played with a few trivial hash functions, and did see improved results... so perhaps an area worth more investigation.

    Shelwien,
    thanks for the links - I had not looked in depth at those. I had read Charles' paper on LZP.
    From the look of it, it may be worthwhile experimenting with different dimensions for the hash table - smaller hash, deeper lists.

    --
    Curtis

  6. #5
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    29
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Predictor Compression Protocol (PPP) = LZP.

    I have a simple program, *praq*, that is similar to PPP/LZP. Of the two lzpn programs in lzpn.zip, one is faster; the other spends longer searching for match strings.

    You might want to take a look at praq.zip and lzpn.zip :

    https://sites.google.com/site/dataco...uide/predictor

    The programs use my variable-length coding function put_vlcode() in the ucodes.c library, which can write universal codes such as Elias-Gamma and Exponential-Golomb codes.
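
    put_vlcode() itself is not reproduced here; purely as an illustration of the idea, an Elias-Gamma writer might look roughly like this (the bit-writer interface is made up, and a GCC/Clang builtin is assumed):
    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* A tiny MSB-first bit writer plus an Elias-Gamma coder (final flush omitted). */
    typedef struct { uint8_t *buf; uint64_t acc; int nbits; size_t pos; } bitw_t;

    static void put_bits(bitw_t *w, uint32_t val, int n) {
        w->acc = (w->acc << n) | val;              /* append n bits          */
        w->nbits += n;
        while (w->nbits >= 8) {                    /* spill whole bytes      */
            w->nbits -= 8;
            w->buf[w->pos++] = (uint8_t)(w->acc >> w->nbits);
        }
    }

    /* Elias-Gamma for x >= 1: k zero bits, then the (k+1)-bit value itself,
     * where k = floor(log2(x)). */
    static void put_elias_gamma(bitw_t *w, uint32_t x) {
        int k = 31 - __builtin_clz(x);             /* index of the top set bit */
        put_bits(w, 0, k);
        put_bits(w, x, k + 1);
    }
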
    Last edited by compgt; 5th April 2019 at 11:21. Reason: put_vlcode()

  7. #6
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    Ah, yes, this looks like the first codec I implemented in my recent experimentation. Excellent for low-memory situations because it uses a fixed amount of space and can be decoded in a _very_ simple loop. With a 1MB hash table it can achieve 58,018,831 bytes on enwik8, and takes under 3 seconds on my laptop (i7-7500U).

    After a machine failure, I've rewritten my LZ code, this time using a smaller hash table (4k) and a longer list (12). It now has a fixed 2-byte match encoding, and compresses enwik8 to 53,077,086 bytes in just under 51 seconds.

    Then I started playing with hash functions, table sizes, and some code optimisations. With a 64k hash table and "direct" 2-byte lookup, it manages 51,834,139 on enwik8 in roughly the same time.

    I was about to experiment with optimal and lazy parsing, but realised (a) that doesn't work well with my current search scheme, and (b) this current approach likely loses MANY possible matches.

    So I'm currently looking at efficient ways to find many matches concurrently, using vector operations...

    --
    C

  8. #7
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    29
    Thanks
    12
    Thanked 0 Times in 0 Posts
    the joys of LZ coding...

  9. #8
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by FunkyBob View Post
    A few more details (sorry, last night I was tired and on cold/flu meds).

    It uses a 16MB buffer for history, so 24-bit offset values. Combined with the (64k x 16 x 4 = 4MB) hash table, it requires 20MB to compress / decompress. The buffer is not a sliding history - offsets are absolute, and the buffer is discarded when a new frame is started.

    Originally I was storing and checking the first byte, for hash validation, in the top byte of the offset, but that hurt compression.

    I played with a few trivial hash functions, and did see improved results... so perhaps an area worth more investigation.

    Shelwien,
    thanks for the links - I had not looked in depth at those. I had read Charles' paper on LZP.
    From the look of it, it may be worthwhile experimenting with different dimensions for the hash table - smaller hash, deeper lists.

    --
    Curtis
    Don't drop the whole compressed block and start a new one; you can just drop the first half of the block and use the last half for matching.
    Absolute offsets are not an issue: you can batch-adjust them when a half-block is finished.
    This strategy is implemented in orz, see https://github.com/richox/orz/blob/master/src/main.rs.
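
    The idea, sketched in C rather than the actual orz code (names and sizes are illustrative):
    Code:
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* When the second half of the history fills up, slide it down over the first
     * half and rebase the absolute offsets in the table in one batch. Entries
     * that pointed into the discarded half are invalidated. */
    #define HISTORY_LEN (1u << 24)
    #define HALF        (HISTORY_LEN / 2)
    #define TABLE_SIZE  (1u << 16)
    #define DEPTH       16

    static void slide_history(uint8_t history[HISTORY_LEN],
                              uint32_t table[TABLE_SIZE][DEPTH]) {
        memmove(history, history + HALF, HALF);    /* keep only the newer half   */
        for (size_t i = 0; i < TABLE_SIZE; i++)
            for (int j = 0; j < DEPTH; j++)
                table[i][j] = (table[i][j] >= HALF)
                            ? table[i][j] - HALF   /* rebase surviving offset    */
                            : UINT32_MAX;          /* pointed into dropped data  */
    }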

  10. The Following User Says Thank You to RichSelian For This Useful Post:

    FunkyBob (7th April 2019)

  11. #9
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    So last night I got my faster match function working (4 bytes at a time, not 1) with significant improvements. I might detail that in another post; not because it's novel, but more as documentation for others.

    I also tried using a shorter list of matches (16, not 12, as recommended by a rant by cbloom). It hurt compression ratio, but (of course) drastically improved speed. Additionally, this means I can pack a (short ... up to 7 bytes) match into a single byte.
    enwik8 compresses to 54,246,622 bytes in ~8 seconds.

    I had considered cutting up the history into smaller chunks, and just rolling them along. Might investigate that on my long flight today [heading to Copenhagen from Australia...]

    This was all triggered by an idea for a fast lookup index structure, so I might also try that, in what I might describe as a "Length Limited LZ"...

    --
    C

  12. #10
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    So after more and more playing, I now have:

    - a new API that avoids lots of memory allocation and all file IO
    - a 64k x 16 buffer of history pointers
    - a faster 8-byte-per-loop string matcher (gotta love ctzl; see the sketch below)
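
    A sketch of that kind of matcher (little-endian and a GCC/Clang ctz builtin assumed; not the actual code):
    Code:
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Compare 8 bytes per iteration; on a mismatch, ctz of the XOR gives the
     * first differing byte. */
    static size_t match_length(const uint8_t *a, const uint8_t *b, size_t max) {
        size_t len = 0;
        while (len + 8 <= max) {
            uint64_t x, y;
            memcpy(&x, a + len, 8);                /* unaligned-safe 64-bit loads */
            memcpy(&y, b + len, 8);
            uint64_t diff = x ^ y;
            if (diff)
                return len + (__builtin_ctzll(diff) >> 3);
            len += 8;
        }
        while (len < max && a[len] == b[len])      /* finish the tail bytewise    */
            len++;
        return len;
    }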

    I just tried switching from a 2-byte context to 3 (hashed as a<<8 ^ b<<4 ^ c) and it regained a lot of the lost compression, and then some.

    It now takes ~11sec to compress enwik8 to 51,391,141
    and ~1m38sec to compress enwik9 to 449,420,341

    I'm now trying double the depth... which requires changing the encoding, and is slower.

    After that, I guess next step is to try lazy parsing...

    --
    Curtis

  13. #11
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    So, with 32 entries per hash slot (and so only up to MML+2 for single-byte match encoding, instead of +6):
    - encoding time roughly doubles, as we test for twice as many potential matches.
    - compression improves slightly.

    enwik8 : 49,997,389
    enwik9 : 435,913,931

  14. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    LZ4 result is supposedly 42,870,164?
    I think you need to beat at least that?

    Also, maybe use a better hash function? Like, involving some multiplication by a prime number?

  15. #13
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    So, indeed, a hash function based on multiplying each byte by a different prime, then xoring and masking to size... improved compression a tiny amount.

    So here I am, with the improved hash function:
    Code:
    return (a * 6151) ^ (b * 3079) * (c * 1543) & HASH_MASK;
    
    enwik8: 49,157,057
    enwik9: 428,393,440

    Instead, I tried a lazy parser... with poor results.

    First, I tried checking the current match against the one at the next position; if the next was longer, I encoded a literal and then the next match. Compression was worse.

    enwik8: 50,140,228
    enwik9: 435,721,549

    Then I adjusted it so the next match had to be a bit longer if there were no pending literals. Worse again.

    enwik8: 59,880,961
    enwik9: 519,594,101

    Now I adjust for encoding size (match lengths of 3+MML or more require a second byte). This regained some of the compression:

    enwik8: 49,330,209
    enwik9: 429,692,408

    Next, if the next position had a longer match, I emitted a literal, and cycled on... so any number of positions could be skipped so long as we kept getting better matches.
    Compression was oddly identical.

    enwik8: 49,330,209
    enwik9: 429,692,408
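
    A sketch of one way to express that lazy test with the encoding-size compensation folded in; the helper names are made up and the real rule may differ in detail:
    Code:
    /* The helper names (match_t, match_cost, should_defer) are illustrative. */
    typedef struct { unsigned length; unsigned index; } match_t;

    /* Bytes needed to encode a match: one token byte, plus an extension byte
     * once the length no longer fits the short form (MML+3 and up here). */
    static int match_cost(unsigned length, unsigned mml) {
        return 1 + (length >= mml + 3);
    }

    /* Defer if the match starting one byte later saves more than the literal we
     * must emit to reach it; the literal also costs a run-header byte when no
     * literal run is currently open. */
    static int should_defer(const match_t *cur, const match_t *next,
                            unsigned mml, int pending_literals) {
        int cur_gain     = (int)cur->length  - match_cost(cur->length, mml);
        int next_gain    = (int)next->length - match_cost(next->length, mml);
        int literal_cost = 1 + (pending_literals == 0);
        return next_gain - literal_cost > cur_gain;
    }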

    Am going to add a "make silesia" to my makefile so I can see results on that corpus.

  16. #14
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    So, as a detour, I implemented a more "traditional" LZ that outputs either a literal run (of 1 to 128 literals) or a match.

    Matches are variable-sized: 2- or 3-byte offsets, 1- or 2-byte lengths.

    I've added lazy parsing, and got it to re-use the work to find the next match when available.

    I've pushed "encoding size" compensation down into the "find best match" loop. [Slowed things, but improved compression].

    And it still uses a hash table of the N last offsets as places to look for matches.

    Oh, and I switched my hashing function to crc32, using the [yes, non-portable] Intel crc32 instruction. It's fast, and has a decent spread.
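
    A sketch of that kind of CRC-based context hash using the SSE4.2 intrinsic (the 3-byte context load and table size here are illustrative choices, not the actual code):
    Code:
    #include <stdint.h>
    #include <nmmintrin.h>   /* SSE4.2: _mm_crc32_u32; compile with -msse4.2 */

    #define HASH_BITS 16     /* illustrative table size */

    /* Hash a 3-byte context with the hardware CRC32c instruction. */
    static uint32_t crc_hash3(uint8_t a, uint8_t b, uint8_t c) {
        uint32_t ctx = ((uint32_t)a << 16) | ((uint32_t)b << 8) | c;
        return _mm_crc32_u32(0, ctx) & ((1u << HASH_BITS) - 1);
    }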

    Results? Actually, not so bad.

    Code:
    Reference: lz4 -BD -9 -k
    File   | Compressed  | Time
    enwik8 |  42,203,342 | ~11.5sec
    enwik9 | 374,086,002 | ~1m31sec
    
    HASH_DEPTH = 64:
    
    File   | Compressed  | Time
    enwik8 |  42,914,828 | ~38.8sec
    enwik9 | 374,719,846 | ~5m42sec
    
    HASH_DEPTH = 128:
    
    File   | Compressed  | Time
    enwik8 |  41,486,119 | ~1m9sec
    enwik9 | 362,226,644 | ~10m7sec
    
    Sure, my time is FAR slower, but the compression is now competitive with lz4!

    Profiling indicates about 50% of the time is spent in find_match_length... so ideally I need to find ways to avoid calling it - reduce the number of candidates, and increase their quality.

    Disappointingly, when I back-ported these changes to my rolz engine, its compression became _worse_.


    --
    Curtis

  17. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    You can replace crc32(x) & 0xFFF with (x*123456791)>>20, which is about as fast but more portable.
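
    For reference, the same suggestion written out as a function (the generalisation to an arbitrary hash width is an addition, not Bulat's exact code):
    Code:
    #include <stdint.h>

    /* Multiply by a large odd constant and keep the top hashbits bits. */
    static inline uint32_t mul_hash(uint32_t x, int hashbits) {
        return (x * UINT32_C(123456791)) >> (32 - hashbits);
    }
    /* mul_hash(x, 12) matches (x*123456791)>>20; a 16-bit table would use >>16 */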

  18. #16
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    Thanks, Bulat!

    I used >> 16 instead, since I want a 16-bit hash... it works, but results for enwik8 worsened a little...

    crc32: 42,914,828
    new : 42,922,469

  19. #17
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    343
    Thanks
    235
    Thanked 226 Times in 123 Posts
    From my paq8px experience, I can say that crc32 vs a multiplicative hash gives comparable results; small fluctuations between files, and between different multipliers, are normal.
    (A one-file test does not say much.)
    Please try different multipliers and see which works best in your case, testing them on multiple files. The multiplier does not have to be a prime; it just needs to be a large odd number.
    One of the better candidates is the Knuth one:

    // Golden ratio of 2^32 (not a prime)
    #define PHI32 UINT32_C(0x9E3779B9) // 2654435769
    // Golden ratio of 2^64 (not a prime)
    #define PHI64 UINT64_C(0x9E3779B97F4A7C15) // 11400714819323198485

    I would certainly try that one. In paq8px it is a core constant and works really well.

  20. #18
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    343
    Thanks
    235
    Thanked 226 Times in 123 Posts
    Quote Originally Posted by FunkyBob View Post
    So here I am, with the improved hash function:
    Code:
    return (a * 6151) ^ (b * 3079) * (c * 1543) & HASH_MASK;
    When you need to hash a fixed number of values (and the values may include the number zero), something like this seems to work best:
    Code:
    #include <stdint.h>
    #include <assert.h>

    // Golden ratio of 2^32 (not a prime)
    #define PHI32 UINT32_C(0x9E3779B9) // 2654435769

    // Some more arbitrary magic (prime) numbers
    #define MUL32_1 UINT32_C(0x993DDEFF)
    #define MUL32_2 UINT32_C(0xE9C91DC1)

    // hashing 3 values:
    uint32_t hash(const uint32_t x0, const uint32_t x1, const uint32_t x2) {
        return (x0+1)*PHI32 + (x1+1)*MUL32_1 + (x2+1)*MUL32_2;
    }

    uint32_t hash_finalize(const uint32_t hash, const int hashbits) {
        assert(0<hashbits && hashbits<=32);
        return hash>>(32-hashbits);
    }

    Use it like this:
    Code:
    uint32_t hashval = hash_finalize(hash(a,b,c),16);

    I don't know how it performs in your case, I recommend you give it a try.

  21. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    CRC is better than multiplication since every output bit depends on every input bit, but the difference isn't much. Multipliers closer to 2^31 should be a bit better than mine.

    A function like (x0+1)*PHI32 + (x1+1)*MUL32_1 + (x2+1)*MUL32_2 is slower than CRC, and it's also zero for the trivial FFFFF...FFF input.

  22. #20
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    Thanks all for your input. You sure know how to make a newcomer feel welcome in a community

    Well, I feel right now I'm better off finding a better data structure to yield higher quality match candidates, as most of my time appears to be spent in the match length function.

    Fine-tuning the hashing function can come later, if it's needed at all.

  23. #21
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    With much help from Shelwien and others, I've now got something that (a) compresses fast, and (b) compresses OK*

    https://github.com/funkybob/zoom/tree/master/basiclz4

    (* it's not as good as LZ4, but it's not far off).

    Literals and Matches are differentiated by the high bit of the first byte being 0 (literal) or 1 (match).

    Literal runs are encoded with a preceding run length byte, 0 meaning 1 literal, 0x7f meaning 128.

    Matches are encoded as the length | 0x80, followed by the 16-bit offset.
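
    A rough sketch of a decoder for this format; it assumes the low 7 bits hold the match length directly and the 16-bit offset is a little-endian distance back from the current output position, which the actual code may do differently:
    Code:
    #include <stdint.h>
    #include <stddef.h>

    static size_t decode(const uint8_t *in, size_t in_len, uint8_t *out) {
        size_t i = 0, o = 0;
        while (i < in_len) {
            uint8_t tok = in[i++];
            if (tok & 0x80) {                        /* match                       */
                unsigned len = tok & 0x7F;
                unsigned off = in[i] | (in[i + 1] << 8);
                i += 2;
                const uint8_t *src = out + o - off;  /* may overlap the output      */
                for (unsigned k = 0; k < len; k++)
                    out[o++] = src[k];               /* byte-wise to allow overlap  */
            } else {                                 /* literal run: tok = count-1  */
                unsigned run = (unsigned)tok + 1;    /* 0 -> 1 literal, 0x7f -> 128 */
                for (unsigned k = 0; k < run; k++)
                    out[o++] = in[i++];
            }
        }
        return o;
    }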

    It has two modes, greedy or lazy; currently it requires recompiling to switch between them. Lazy is 50 to 100% slower, for a not-insignificant gain in compression.

    The "lazy" parsing has no limit to how many matches it will skip, so long as it keeps finding better matches.

    This, of course, opens the corner case of skipping so many literals that it could have encoded a smaller match in there, though I think this is unlikely.

    From my analysis, it's lagging behind LZ4's compression because:

    1. LZ4 can encode match / literal lengths in 4 bits

    2. LZ4 omits explicit match/literal flags by just assuming every match is preceded by literals, and just wearing the cost [4 bits] when it's wrong.

    As it is, I'm pleased that I've got this far, and now hope to improve compression with smarter parsing experiments.

    --
    Curtis

  24. #22
    Member FunkyBob's Avatar
    Join Date
    Jun 2017
    Location
    Australia
    Posts
    24
    Thanks
    1
    Thanked 4 Times in 3 Posts
    Thanks to Shelwien for spotting a bug, the fix for which may have even given a tiny performance improvement.

    Whilst searching for the problem, I noticed some of the lazy parsing was not really winning me anything: whilst I was accounting for the overhead of creating a new literal run _before_ the match, I wasn't accounting for it after.

    As it turns out, upping the penalty of starting a new literal run from 1 to 2 has fixed this, and slightly improved compression.

