Thread: Musings on RLE vs "Fibonacci coding"

  1. #1 by JamesB (Member, Cambridge, UK)

    Musings on RLE vs "Fibonacci coding"

    I've been experimenting with ways to pre-process data containing lots of runs prior to entropy encoding. Ideally I would be using an entropy encoder that has run lengths built into the model, so that it can use one distribution for run lengths and one for the symbols (or, better yet, one run-length distribution per symbol, so we learn which symbols are repeated and which are not). That does work well on my data, but initially I explored something far simpler that didn't require much work: a dictionary replacement scheme.

    Firstly, my data is just 7-bit ASCII, so I restricted myself to a very basic scheme: 128 additional symbols indicating runs of N occurrences of specific symbols, where N is taken from the Fibonacci series. E.g. ABCCCCCCCCCCCCBBBB may become A B <8C> <3C> C <3B> B, where perhaps symbol 128 = <8C>, 129 = <3C> and 130 = <3B>.
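
    For illustration only, here is a minimal sketch of that substitution idea. It is not the actual fib_code.c: the marker allocation table, the symbol set and the helper names are all made up, and it prints the tokens rather than writing real bytes.

    Code:
    /* Illustrative sketch only, not the actual fib_code.c: replace runs of a
     * symbol with single-byte markers whose lengths come from the Fibonacci
     * series, decomposed greedily, e.g. 12 C's -> <8C> <3C> C.
     * The marker allocation table below is made up. */
    #include <stdio.h>
    #include <string.h>

    #define NFIB 5
    static const int fib[NFIB] = {2, 3, 5, 8, 13};          /* marker run lengths */

    /* Hypothetical allocation: a few frequent symbols each get NFIB marker
     * codes in the otherwise unused 128..255 byte range. */
    static int marker(int c, int f) {
        static const char coded[] = "BC?>@<#";
        const char *p = c ? strchr(coded, c) : NULL;
        return p ? 128 + (int)(p - coded) * NFIB + f : -1;
    }

    static void emit(int sym, int f) {                       /* print one token */
        if (f < 0) printf("%c ", sym);
        else       printf("<%d%c>(=%d) ", fib[f], sym, marker(sym, f));
    }

    static void encode(const unsigned char *in, size_t len) {
        for (size_t i = 0; i < len; ) {
            size_t run = 1;
            while (i + run < len && in[i + run] == in[i]) run++;
            size_t left = run;
            for (int f = NFIB - 1; f >= 0; f--)              /* greedy Fibonacci split */
                while (left >= (size_t)fib[f] && marker(in[i], f) >= 0) {
                    emit(in[i], f);
                    left -= fib[f];
                }
            while (left--) emit(in[i], -1);                  /* remainder as literals */
            i += run;
        }
        putchar('\n');
    }

    int main(void) {
        const char *demo = "ABCCCCCCCCCCCCBBBB";
        /* prints: A B <8C>(=136) <3C>(=134) C <3B>(=129) B */
        encode((const unsigned char *)demo, strlen(demo));
        return 0;
    }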

    I compared this to a variety of traditional run-length encoders, but mostly the Mespotine RLE worked best on this specific data set (lossily compressed DNA quality values). The data looks something like this (with added line wraps for ease of viewing):

    Code:
    ????????????????????????????????????????????????????????????????????????????????
    ??????>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>99999999?????????????????????????????????????????????
    ????????????????????????????????????????????????===========================#####
    #####################################################################::.....<..?
    *::::###########################################################################
    ##########@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<####################????????????????????????????
    ????????????????????????????????????????????????????????????????????????????????
    ????????????????????????????????????????????????????????????????????????????????
    ??????????????>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>?????????????????????????????????????????????
    ????????????????????????????????????????????????????????@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>8888888*;;;;;;;;;;;;
    ;;;;;;.????55))?????????????????009999999&>>.@777*:::::::::::::::::999922=======
    Ideal for RLE... So first, our RLE implementation:

    Code:
    @ seq3c[comp]; ls -l /tmp/qc
    -rw-r--r-- 1 jkb team117 101000000 Mar 13 17:50 /tmp/qc
    @ seq3c[comp]; rle4 < /tmp/qc | wc -c
    1990383
    @ seq3c[comp]; rle4 < /tmp/qc | entropy16b
    len =        1990383 bytes, entropy16  =    1069035.7, 4.29680 bits per byte
    len =        1990383 bytes, entropy8   =    1254047.5, 5.04043 bits per byte
    len =        1990383 bytes, entropy8o1 =     886391.5, 3.56270 bits per byte
    @ seq3c[comp]; rle4 < /tmp/qc | gzip -9 |wc -c
    1074169
    @ seq3c[comp]; rle4 < /tmp/qc | bzip2 -9 |wc -c
    918833
    @ seq3c[comp]; rle4 < /tmp/qc | xz -9 |wc -c
    948696
    @ seq3c[comp]; rle4 < /tmp/qc | zstd -22 |wc -c
    989464
    @ seq3c[comp]; rle4 < /tmp/qc | bro --quality 11 | wc -c
    951326
    We can see from the entropy16b tool that all the programs beat the basic order-0 entropy estimate, but none beat order-1 (whose figures do not include the size of the order-1 frequency table). So complex LZ compression algorithms aren't useful here.
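
    (For reference, order-0 and order-1 entropy estimates of this kind can be computed along the lines of the sketch below. This is purely illustrative and is not the actual entropy16b tool; as noted above, it also ignores the cost of storing the frequency tables.)

    Code:
    /* Minimal sketch of order-0 and order-1 byte-entropy estimates, in the
     * spirit of (but not identical to) the entropy16b figures quoted above.
     * Build with: cc entropy_sketch.c -lm */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        static long long f0[256], f1[256][256];          /* symbol and bigram counts */
        long long n = 0;
        int c, prev = 0;

        while ((c = getchar()) != EOF) {
            f0[c]++;
            if (n) f1[prev][c]++;
            prev = c;
            n++;
        }
        if (n < 2) return 0;

        double h0 = 0, h1 = 0;
        for (int i = 0; i < 256; i++) {
            if (f0[i]) h0 -= f0[i] * log2((double)f0[i] / n);
            long long ctx = 0;
            for (int j = 0; j < 256; j++) ctx += f1[i][j];
            for (int j = 0; j < 256; j++)
                if (f1[i][j]) h1 -= f1[i][j] * log2((double)f1[i][j] / ctx);
        }
        /* totals are in bits; neither includes the cost of storing the tables */
        printf("entropy8   = %12.1f bytes, %.5f bits per byte\n", h0 / 8, h0 / n);
        printf("entropy8o1 = %12.1f bytes, %.5f bits per byte\n", h1 / 8, h1 / (n - 1));
        return 0;
    }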

    Now I try the fib_code replacement:

    Code:
    @ seq3c[comp]; fib_code < /tmp/qc | wc -c
    2373984
    @ seq3c[comp]; fib_code < /tmp/qc | entropy16b         
    len =        2373984 bytes, entropy16  =    1305327.7, 4.39877 bits per byte
    len =        2373984 bytes, entropy8   =    1795256.5, 6.04977 bits per byte
    len =        2373984 bytes, entropy8o1 =     820013.0, 2.76333 bits per byte
    @ seq3c[comp]; fib_code < /tmp/qc | gzip -9 | wc -c
    1127506
    @ seq3c[comp]; fib_code < /tmp/qc | bzip2 -9 | wc -c
    928864
    @ seq3c[comp]; fib_code < /tmp/qc | xz -9 | wc -c
    980016
    @ seq3c[comp]; fib_code < /tmp/qc | zstd -22 | wc -c
    992135
    @ seq3c[comp]; fib_code < /tmp/qc | bro --quality 11 | wc -c
    988158
    In all cases it's poorer, except one! The order-1 entropy is smaller because it is easier to predict the next symbol from the previous one when using shrinking run-length markers. Possibly this is simply because I have length markers per symbol instead of one set shared across all symbol types (which is how plain RLE works).

    This therefore means the order-1 encoded size from rANS comes out better with fib_code.

    Code:
    @ seq3c[comp]; cat /tmp/qc | ./a.out  -o1 2>/dev/null | wc -c
    1112593
    @ seq3c[comp]; rle4 < /tmp/qc | ./a.out  -o1 2>/dev/null | wc -c
    905611
    @ seq3c[comp]; fib_code < /tmp/qc | ./a.out  -o1 2>/dev/null | wc -c
    850297
    The order-1 codec doesn't approach the estimated ~820k entropy, due to the size of the tables, but it's not a bad approximation. (This is a tweak on my old codecs, which took around twice as much space for the frequency table.)

    I'm still exploring different approaches to this data while trying to retain high speed (so ideally static analysis rather than dynamically updated probabilities), but I'm curious whether anyone has explored optimal ways of encoding data while trying to shrink the order-1 entropy instead of the order-0 entropy. Clearly this is a hacky approach, and the right approach is the one mentioned at the start: separate the run lengths and symbols into their own streams and encode each independently. However, this alternative gets surprisingly close and has the advantage of working as part of a pipeline without modifying my existing codecs.

    The very hacky fib_code source can be seen here:

    https://github.com/jkbonfield/mpeg_m...omp/fib_code.c

    I just noticed the comments are totally bogus. Like I say, hacky!

    Edit: traditional RLE (with a guard byte) plus bsc beats all of these anyway, at 792356 bytes, and bsc benefits more from that traditional RLE than from the other methods I tried. Obviously CM is going to do better still. My point though isn't maximum compression at any cost.

  2. The Following 2 Users Say Thank You to JamesB For This Useful Post:

    Cyan (20th March 2017), Jarek (22nd March 2017)

  3. #2 by Jarek (Member, Kraków, Poland)
    Hi, so the standard Fibonacci coding is something different ( https://en.wikipedia.org/wiki/Fibonacci_coding ) - it handles the simplest constrained coding (no '11'; MERW can handle general constraints: https://en.wikipedia.org/wiki/Maxima...py_Random_Walk ).

    Your writing of e.g. <11C> as <8C> <3C> is not a good idea because of the freedom in how to write it: you could alternatively use <3C> <8C>.
    So you have artificially added one bit of information to distinguish these two situations - you waste lg(n!) bits every time you split a run into n parts this way.

    To do it right, split the information into letter and repetition count - giving two types of data, encoded with separate probability distributions ... going toward modern LZ, which splits into 4 types of information (literal, literal_length, offset, match_length).

  4. #3 by dnd (Member, Worldwide)
    For simple encoding/preprocessing you can try TurboRLE, or use bit-io with the run lengths gamma encoded.
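
    For example, a run length n >= 1 could be Elias-gamma coded with simple bit-io along these lines (an illustrative sketch only; the bitwriter struct and names are made up):

    Code:
    /* Sketch of Elias gamma coding for run lengths (n >= 1), as one way to do
     * the bit-io suggestion above: write floor(log2 n) zero bits, then the
     * binary representation of n itself, MSB first. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint8_t *buf; size_t bitpos; } bitwriter;

    static void put_bit(bitwriter *bw, int b) {
        if (b) bw->buf[bw->bitpos >> 3] |= 0x80u >> (bw->bitpos & 7);
        bw->bitpos++;
    }

    static void put_gamma(bitwriter *bw, uint32_t n) {       /* n must be >= 1 */
        int nbits = 0;
        while ((n >> nbits) > 1) nbits++;                    /* nbits = floor(log2 n) */
        for (int i = 0; i < nbits; i++) put_bit(bw, 0);      /* unary length prefix */
        for (int i = nbits; i >= 0; i--) put_bit(bw, (n >> i) & 1);
    }

    int main(void) {
        uint8_t out[16] = {0};
        bitwriter bw = { out, 0 };
        put_gamma(&bw, 1);   /* 1        */
        put_gamma(&bw, 4);   /* 00 100   */
        put_gamma(&bw, 12);  /* 000 1100 */
        printf("%zu bits used\n", bw.bitpos);                /* 1 + 5 + 7 = 13 */
        return 0;
    }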

  5. #4 by JamesB (Member, Cambridge, UK)
    Quote Originally Posted by Jarek View Post
    Hi, so the standard Fibonacci coding is something different ( https://en.wikipedia.org/wiki/Fibonacci_coding ) - it handles the simplest constrained coding (no '11'; MERW can handle general constraints: https://en.wikipedia.org/wiki/Maxima...py_Random_Walk ).

    Your writing of e.g. <11C> as <8C> <3C> is not a good idea because of the freedom in how to write it: you could alternatively use <3C> <8C>.
    So you have artificially added one bit of information to distinguish these two situations - you waste lg(n!) bits every time you split a run into n parts this way.

    To do it right, split the information into letter and repetition count - giving two types of data, encoded with separate probability distributions ... going toward modern LZ, which splits into 4 types of information (literal, literal_length, offset, match_length).
    That's a good point, but specifically I'm aiming at minimising order-1 entropy, so repeating the lengths in a fixed order means 8 is often followed by 5, 3 or 2. The multiplicity of ways to encode this is automatically compensated for by the order-1 encoding.

    I agree though it's not the "right way" to do this, which is obviously to use separate data series. I'm just exploring whether there are simple "good enough" approaches that also work, partly just for fun and partly because the simplicity appeals.

    Edit: I also have a little zlib driver test program so I can try out Z_RLE. It works OK, but Huffman isn't really the best here, so it's pretty mediocre overall on this data. Z_RLE gives an almost identical size to TurboRLE's srle method coupled with Z_HUFFMAN_ONLY, so I guess zlib's RLE-via-LZ implementation is pretty basic and similar to srle.
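
    For anyone wanting to try the same thing, selecting the Z_RLE strategy goes through zlib's deflateInit2(). A minimal one-shot sketch (not my actual driver program) looks like this:

    Code:
    /* Minimal sketch of selecting zlib's Z_RLE strategy (one-shot, in-memory).
     * Z_HUFFMAN_ONLY or Z_FILTERED can be swapped in via the last argument.
     * Link with -lz. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    static size_t deflate_rle(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_len) {
        z_stream s;
        memset(&s, 0, sizeof(s));
        if (deflateInit2(&s, 9, Z_DEFLATED, 15, 8, Z_RLE) != Z_OK) return 0;
        s.next_in   = (unsigned char *)in;  s.avail_in  = (uInt)in_len;
        s.next_out  = out;                  s.avail_out = (uInt)out_len;
        int ret = deflate(&s, Z_FINISH);
        size_t n = (ret == Z_STREAM_END) ? s.total_out : 0;
        deflateEnd(&s);
        return n;
    }

    int main(void) {
        unsigned char in[80], out[256];
        memset(in, '?', sizeof(in));                         /* a long run, RLE-friendly */
        printf("%zu -> %zu bytes\n", sizeof(in),
               deflate_rle(in, sizeof(in), out, sizeof(out)));
        return 0;
    }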

    PS. Thanks for the comment about Fibonacci coding. I didn't bother to check before naming my experiment Fibonacci codes. They are totally different things.
    Last edited by JamesB; 20th March 2017 at 23:52.

  6. #5 by JamesB (Member, Cambridge, UK)
    Quote Originally Posted by dnd View Post
    For simple encoding/preprocessing you can try TurboRLE or use bit-io with the run length gamma encoded
    I thought about bit-io which is fast, but only worth it if there is no downstream entropy encoding or the downstream coding is bit-based.

    Thanks for the idea of TurboRLE. It's fast for sure, but doesn't get the same results:

    I modified your main test code to spit data out to temporary files: _trle (TurboRLE), _srle (TurboRLE with escape), _mrle (Mespotine RLE) and _frle (my Fibonacci lengths).

    Code:
    @ seq3c[tmp/TurboRLE]; entropy16b < _trle
    len =        1912528 bytes, entropy16  =    1044787.5, 4.37029 bits per byte
    len =        1912528 bytes, entropy8   =    1209509.8, 5.05931 bits per byte
    len =        1912528 bytes, entropy8o1 =     882756.9, 3.69252 bits per byte
    @ seq3c[tmp/TurboRLE]; entropy16b < _srle
    len =        2506666 bytes, entropy16  =    1167371.0, 3.72565 bits per byte
    len =        2506666 bytes, entropy8   =    1380648.2, 4.40633 bits per byte
    len =        2506666 bytes, entropy8o1 =     955941.8, 3.05088 bits per byte
    @ seq3c[tmp/TurboRLE]; entropy16b < _mrle
    len =        2119389 bytes, entropy16  =    1087927.8, 4.10657 bits per byte
    len =        2119389 bytes, entropy8   =    1292273.5, 4.87791 bits per byte
    len =        2119389 bytes, entropy8o1 =     885514.3, 3.34253 bits per byte
    @ seq3c[tmp/TurboRLE]; entropy16b < _frle
    len =        2373984 bytes, entropy16  =    1305327.7, 4.39877 bits per byte
    len =        2373984 bytes, entropy8   =    1795256.5, 6.04977 bits per byte
    len =        2373984 bytes, entropy8o1 =     820013.0, 2.76333 bits per byte
    So you see the fib RLE output has the smallest order-1 entropy, but is rather poor for both order-0 and the unencoded size. It has a strict 7-bit requirement for starters. However, it was tested and tweaked with my specific data in mind. I tested it on a small chunk of BWTed enwik9 though and it still does well (minus the 8-bit chars). It's not so hot on the entire thing, mainly due to no blocking in the naive implementation.

    On much smaller blocks of data the Mespotine RLE seems to be best for entropy.
    Last edited by JamesB; 20th March 2017 at 23:56. Reason: Fixed table (ran on wrong input file!)

  7. The Following User Says Thank You to JamesB For This Useful Post:

    dnd (21st March 2017)

  8. #6 by Shelwien (Administrator, Kharkov, Ukraine)
    What about qlfc? ( http://nishi.dreamhosters.com/u/bsc240_qlfc_v2.rar )
    It also includes an advanced RLE part, and your data looks like a ranking transform would work well on it.

  9. The Following User Says Thank You to Shelwien For This Useful Post:

    JamesB (21st March 2017)

  10. #7 by JamesB (Member, Cambridge, UK)
    Qad worked well, but I can't test your bsc_qlfc as the rar doesn't decode under linux. It's rather slow though compared to bsc.

    Is qlfc a BWT post-processing step (like MTF), or a modification of the BWT algorithm in its own right?

  11. #8 by Shelwien (Administrator, Kharkov, Ukraine)
    > I can't test your bsc_qlfc as the rar doesn't decode under linux.

    What about http://rarlab.com/rar_add.htm ?
    Also 7-zip should be able to unpack it.

    Anyway, here I've attached the v0 source for you - you probably won't be able to compile v2 anyway on your Linux.
    It should compile with "g++ qlfc.cpp -o qlfc", and can then be used as "qlfc c/d input output".

    > It's rather slow though compared to bsc.

    It's a ripped-off postcoder from bsc (the -e2 one though), so it can't really be slower than bsc (except for MT, I guess).

    > Is qlfc a BWT post-processing step (like MTF), or a modification of the BWT algorithm in its own right?

    qlfc is a tricky reverse-MTF, which allows the symbol code to be used as context for its rank; this improves compression a lot compared to plain MTF.

    And the bsc postcoder consists of this reverse-MTF, then RLE, then an advanced CM with logistic mixing and SSE.

    As it is, it would maybe still be too slow for your purpose, but you should be able to just drop the CM part in this case -
    the qlfc+RLE+bitcode part converts enwik8.bwt to 214320941 bits = 26,790,118 bytes, for example.
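
    For contrast, a plain MTF ranking transform is just the sketch below. qlfc itself is the more elaborate reverse variant with the symbol-as-context trick described above, so treat this purely as an illustration of the ranking idea:

    Code:
    /* For contrast with qlfc: a plain MTF (move-to-front) ranking transform.
     * Each symbol is replaced by its current rank in a recency list, so runs
     * of a symbol become runs of zeros. */
    #include <stdio.h>
    #include <string.h>

    static void mtf(const unsigned char *in, unsigned char *out, size_t len) {
        unsigned char order[256];
        for (int i = 0; i < 256; i++) order[i] = (unsigned char)i;
        for (size_t i = 0; i < len; i++) {
            int r = 0;
            while (order[r] != in[i]) r++;                   /* find current rank */
            out[i] = (unsigned char)r;
            memmove(order + 1, order, r);                    /* move symbol to front */
            order[0] = in[i];
        }
    }

    int main(void) {
        const unsigned char s[] = "aaabbbbaab";
        unsigned char r[sizeof(s) - 1];
        mtf(s, r, sizeof(s) - 1);
        for (size_t i = 0; i < sizeof(r); i++) printf("%d ", r[i]);
        putchar('\n');                                       /* 97 0 0 98 0 0 0 1 0 1 */
        return 0;
    }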
    Attached Files

  12. The Following 3 Users Say Thank You to Shelwien For This Useful Post:

    JamesB (21st March 2017), Mike (21st March 2017), RamiroCruzo (21st March 2017)

  13. #9 by JamesB (Member, Cambridge, UK)
    Many thanks Shelwien. I meant qad was slow, not bsc. Sorry - bsc has always seemed to have a really good speed vs size tradeoff.

    Qlfc does indeed seem to work well: 844657 for this little demo file (862255 with the fast qlfc mode). FWIW, the best full bsc output I can get is 768020, but that's one entire block (bsc e /tmp/qc _ -e2 -m0 -p -b1024).
    In reality though we need some level of random access, although it's unclear quite how fine-grained people want it. (I suspect it changes depending on usage.) However, even on small subsections qlfc looks like a good performer, e.g. the first 1Mb encodes to 18589 bytes (17254 with full bsc), smaller than my aging paq8o9 binary manages. The fib_code doesn't work well on small data sets, alas. Ho hum!

    Test data attached, in case anyone is curious.
    Attached Files

  14. #10 by JamesB (Member, Cambridge, UK)
    Trivial RLE + entropy encoder example, as an alternative to the fib-coding idea. This code is a basic byte-wise range coder (yours in fact, Shelwien) and a simple dynamic model, with separate modelling of literals (order-1) and run lengths (order-0, with the symbol as context).
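
    (Not the attached code, just a minimal sketch of that kind of split: literals go to one stream for an order-1 coder, run lengths go to a second stream, with the repeated symbol remembered so it can be used as the context for the length. Buffer sizes and names are arbitrary.)

    Code:
    /* Sketch of an RLE front-end that splits into a literal stream and a
     * run-length stream, keeping the symbol each length belongs to so the
     * entropy coder can use it as context. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_RUN 255                                      /* so a length fits one byte */

    typedef struct {
        unsigned char lit[1 << 16];     size_t nlit;         /* literal stream            */
        unsigned char len[1 << 16];     size_t nlen;         /* run-length stream         */
        unsigned char len_ctx[1 << 16];                      /* symbol context per length */
    } streams;

    static void rle_split(const unsigned char *in, size_t n, streams *s) {
        for (size_t i = 0; i < n; ) {
            size_t run = 1;
            while (i + run < n && in[i + run] == in[i] && run < MAX_RUN) run++;
            s->lit[s->nlit++]   = in[i];                     /* literal, coded order-1 */
            s->len_ctx[s->nlen] = in[i];                     /* context for the length */
            s->len[s->nlen++]   = (unsigned char)(run - 1);
            i += run;
        }
    }

    int main(void) {
        const unsigned char demo[] = "??????>>>>>>>>####?";
        static streams s;
        rle_split(demo, strlen((const char *)demo), &s);
        for (size_t i = 0; i < s.nlen; i++)
            printf("'%c' x %d\n", s.lit[i], s.len[i] + 1);   /* '?' x 6, '>' x 8, ... */
        return 0;
    }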

    Entire file:

    840008 fib_code | rans -o1
    822486 rle5
    879244 qlfc? (not sure what block size it uses).
    768020 bsc -e2 -m0 -p

    ~1Mb blocks:

    1062065 fib_code | rans -o1
    908350 rle5
    879244 bsc -e2 -m0 -p

    Basically I need to decide if straight libbsc is fast enough, and if so simply use it. It's still pretty quick and not that much slower than my very basic range-coder variant anyway. Bsc does really well on this data set - it even beats cmix (just) on my 1Mb subset!
    Attached Files

  15. #11 by Shelwien (Administrator, Kharkov, Ukraine)
    That's a really old rc though - I'd recommend something like https://github.com/Shelwien/ppmd_sh/...pmd/sh_v1x.inc now.

    > not sure what block size it uses

    Whole file.

  16. #12 by Jyrki Alakuijala (Member, Switzerland)
    Quote Originally Posted by Jarek View Post
    going toward modern LZ, which splits into 4 types of information (literal, literal_length, offset, match_length).
    In brotli we join the literal_length and match_length. They can often be interestingly correlated, and this can be captured with a joint distribution.

  17. The Following User Says Thank You to Jyrki Alakuijala For This Useful Post:

    JamesB (22nd March 2017)

  18. #13 by Jarek (Member, Kraków, Poland)
    Quote Originally Posted by JamesB
    That's a good point, but specifically I'm aiming at minimising order-1 entropy, so repeating the lengths in a fixed order means 8 is often followed by 5, 3 or 2. The multiplicity of ways to encode this is automatically compensated for by the order-1 encoding.
    You are right: by sorting, e.g. always using descending numbers of repetitions, and fitting an entire <8C> into a single byte, your o1 rANS will remove the problem of the added ordering information, by fixing the probability of e.g. <3C><8C> at zero.

    Quote Originally Posted by Jyrki Alakuijala View Post
    In brotli we join the literal_length and match_length. They can often be interestingly correlated, and this can be captured with a joint distribution.
    Sure, there remains a large potential to exploit the correlation - at the cost of complexity and memory needs, such as the number and parameters of the tables for entropy coding.
    As it is very dependent on data type and specific content, flexibility in the coding scheme might be worth considering: from choosing between a few options, up to the encoder analyzing correlations, calculating an optimized coding scheme and even compiling an encoder/decoder for that scheme.

  19. #14 by JamesB (Member, Cambridge, UK)
    The discussion about LZ is an interesting one which I hadn't really thought about for a while.

    Some of the data I'm looking at has random matches but no "meaningful" distant repetitions. This means naive attempts to spot and model these usually end up increasing entropy instead of reducing it, by shifting randomness in the literals to randomness in the offset & length codes. However, there is a correlation between the values at neighbouring bases (hence order-1, and to a lesser extent order-2, encoding working well). This was evident right from my early days of compressing the DNA qualities with good old zlib - Z_RLE offered better (and MUCH faster) compression than Z_DEFAULT_STRATEGY or Z_FILTERED. However, better entropy encoders than the basic Huffman in zlib, coupled with better separation of models, can only improve the situation, so I should investigate this again.

    This is also why I ended up using a trivial but fast order-1 model, but the more recent instruments have quantised the number of discrete qualities from ~40 to 8 and soon to 4. That makes runs much more likely and much longer (especially given the correlation to the previous few values), making it important to model runs instead of relying on the previous value alone. I now realise the fib-code thing is a bit of a poor hack, albeit one that does still apparently work. An adaptive model helps a lot more, but is slower. (Speed matters when you have petabytes of data!) The good news is there are blatantly obvious ways to do a rapid first pass when you only have 4 values! Packing values together improves the exploitation of neighbouring-value correlation (it's basically comparable to using a larger context, but faster) while also speeding up the entropy coder. I think I'll end up with something that is part bit-packing, part run-length modelling and part straight entropy encoding.
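
    For example, with only 4 distinct values a rapid first pass could pack four 2-bit codes per byte along these lines (a sketch; the value-to-code table here is made up):

    Code:
    /* Sketch of the bit-packing idea for 4-level quality data: four 2-bit
     * values per byte.  The mapping from quality characters to 2-bit codes
     * is hypothetical; real data would need its own table. */
    #include <stdio.h>
    #include <string.h>

    static const char Q[4] = { '#', ':', '<', '?' };         /* assumed 4 quality levels */

    static int code(int c) {                                 /* char -> 2-bit value */
        for (int i = 0; i < 4; i++) if (Q[i] == c) return i;
        return 0;                                            /* sketch: no error handling */
    }

    static size_t pack2(const unsigned char *in, size_t n, unsigned char *out) {
        size_t o = 0;
        for (size_t i = 0; i < n; i += 4) {
            unsigned char b = 0;
            for (size_t j = 0; j < 4 && i + j < n; j++)
                b |= (unsigned char)(code(in[i + j]) << (2 * j));
            out[o++] = b;
        }
        return o;                                            /* ~n/4 bytes before entropy coding */
    }

    int main(void) {
        const unsigned char q[] = "????????<<<<::##";
        unsigned char packed[8];
        size_t n = pack2(q, strlen((const char *)q), packed);
        printf("%zu bytes -> %zu bytes\n", strlen((const char *)q), n);
        return 0;
    }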

    Thanks all for the thought provoking discussions.

  20. #15 by hexagone (Member, California)
    This is what I get:

    Code:
    time ./kanzi -compress -input=qc -output=qc.bwt -overwrite -block=101m -entropy=none -transform=bwt -verbose=0
    
    real    0m1.468s
    user    0m1.368s
    sys     0m0.100s
    
    
    time ./qlfc c qc.bwt qc.qlfc
    
    
    real    0m0.254s
    user    0m0.228s
    sys     0m0.020s
    
    qc.qlfc => 764882
    
    time ./kanzi -compress -input=qc -output=qc.knz -overwrite -block=101m -entropy=cm -transform=rlt+bwt -verbose=1
    Kanzi 1.0 (C) 2017,  Frederic Langlet
    Encoding ...
    
    
    Encoding:          470 ms
    Input size:        101000000
    Output size:       770200
    Ratio:             0.00762574
    Throughput (KB/s): 209846
    Maybe a limited-depth radix sort instead of a full BWT would work well on this kind of data, at a much faster speed.

  21. The Following User Says Thank You to hexagone For This Useful Post:

    JamesB (27th March 2017)
