
Thread: ELI5: FSE/ANS

  1. #1
    Member
    Join Date
    Sep 2016
    Location
    China
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question ELI5: FSE/ANS

    I'm trying to learn how ANS/FSE/Zstandard work, but I find the information online a bit difficult to get through. Can anybody here explain to me in simple terms how ANS and FSE work, and how they compare to one another? Once I've better understood the general picture, I hope it will be much simpler to read the papers. Thanks.

  2. #2
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    600
    Thanks
    186
    Thanked 186 Times in 112 Posts
    How to explain ANS simply is a difficult question - everybody does it in a different way. At the top of this thread are links to explanations by a few people: http://encode.ru/threads/2078-List-o...mplementations

    It would be great to find a really intuitive explanation, especially for Wikipedia article - here is a draft: https://en.wikipedia.org/wiki/Draft:...umeral_Systems

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    2,966
    Thanks
    153
    Thanked 802 Times in 399 Posts
    I think that it would be better to explain rANS first in that wiki
    article, rather than uABS - that form looks pretty confusing.
    Also, even for rANS there's only the "optimized" example with shifts etc.,
    which is also not the most comprehensible.
    The "basic concepts" section seems ok though, so why not continue from that?

    Say we have a coder state "s" and a "frequency table" freq[], where
    freq[i] is the number of occurrences of the i'th symbol in the data.
    Also, total = Sum(freq[i]); cdf[i] = Sum(freq[j], j=0..i-1);
    In this case, the optimal static code would require spending -log2(freq[i]/total)
    bits for encoding each occurrence of the i'th symbol.
    So, we want to do something to "s" (the coder state) in such a way that
    log2(s')-log2(s) = -log2(freq[i]/total), and it's possible to decode i and s from s'.
    Let's start with
    s' = s*total + cdf[i] - it adds -log2(1/total) (= log2(total)) bits
    and allows us to decode i and s from s', but we still need
    to subtract log2(freq[i]) bits somehow.
    But we know that any value in the range cdf[i]..cdf[i]+freq[i]-1 would have
    the same effect, so we can put additional information in that range, and
    thus compensate for these log2(freq[i]) bits.
    Of course, we can take these bits from some other source, eg. another data stream
    (which, btw, could make sense in these fashionable interleaved schemes),
    or we can take them from the only other source - the value of s.
    So we'd split it like this:
    sd = s/freq[i]; sr = s % freq[i];
    and then compute s':
    s' = sd*total + cdf[i] + sr;
    ie
    s' = (s/freq[i])*total + Sum(freq[j],j=0..i-1) + s%freq[i];
    And suddenly, this satisfies all the requirements:
    cdf[i] + sr' = s' % total; sd' = s'/total;
    "i" can be found from cdf[i]+sr' value, and then we'd also know sr'.
    Then s = sd'*freq[i]+sr';

    Now, if "s" is supposed to fit in a cpu register, we'd have
    to occasionally move some part of its value to a data stream,
    but basically any convenient (and lossless) method can be used for this.
    For example, we can take 8 high bits of "s":
    l = (log2(s)-8); out_bits = s >> l; s = s % (1<<l)
    (And again, this might make sense in some implementation)
    But usually it's easier and faster to use the low byte:
    out_bits = s % 256; s = s/256;
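    The derivation above can be sketched directly as a toy Python snippet. The 3-symbol frequency table is just a hypothetical example, and the byte-wise renormalization is shown only as a comment; this is a sketch of the rANS step, not a production coder:

```python
from itertools import accumulate

freq = [3, 3, 2]                        # hypothetical frequency table
total = sum(freq)                       # total = Sum(freq[i])
cdf = [0] + list(accumulate(freq))      # cdf[i] = Sum(freq[j], j=0..i-1)

def encode_step(s, i):
    # s' = (s/freq[i])*total + cdf[i] + s%freq[i]
    return (s // freq[i]) * total + cdf[i] + s % freq[i]

def decode_step(sp):
    r = sp % total                      # r = cdf[i] + sr'
    i = max(k for k in range(len(freq)) if cdf[k] <= r)
    s = (sp // total) * freq[i] + (r - cdf[i])   # s = sd'*freq[i] + sr'
    return i, s

# In a real coder, before "s" outgrows a register we would emit its low byte:
#   out_byte = s % 256; s = s // 256   (and read it back before decoding)

# Round trip; note that rANS decodes symbols in reverse (LIFO) order.
s = 1
for sym in [0, 1, 2]:
    s = encode_step(s, sym)
decoded = []
for _ in range(3):
    sym, s = decode_step(s)
    decoded.append(sym)
assert decoded == [2, 1, 0] and s == 1
```

    One can check by hand that each step grows the state by roughly log2(total/freq[i]) bits, matching the optimal code length above.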

    In conclusion, I'd have to add that in my opinion, the actual benefit of
    this method (ie rANS) is not speed, as wiki article says, but precision.
    For example, I tried replacing my rangecoder with rANS in ppmd, and
    speed remained the same (note that divisions were used both in rANS and rc,
    because ppmd works with freq tables) - in fact rANS encoding should have been
    slower, because I didn't add caching and coding in reverse order.
    But rANS showed the same precision as a slow/precise rangecoder
    (ie log2 freq sum rounded up to a byte), while speed-optimized rangecoder
    added some overhead.

    Same way, interleaved/vectorized rangecoders are also possible,
    but add even more overhead due to limited precision (again, speed is not a problem).
    Also, rangecoders don't affect the compressed file format so much,
    so they won't ever be completely replaced by rANS.


  5. #4
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    600
    Thanks
    186
    Thanked 186 Times in 112 Posts
    Anybody can contribute to the Wikipedia article - please do so if you know how to improve it.

    I have started with uABS because it seems the simplest to explain - I see it as a direct continuation of "basic concepts": just straightforward formulas for a single parameter p, which becomes the standard binary system for p=1/2 ... and it was the original motivation for the name (it was the first variant).
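    For reference, the uABS update formulas can be written as a tiny Python sketch (a hypothetical toy using exact fractions so the ceiling formulas are exact; p = 3/10 is just an example probability of bit 1, and for p = 1/2 it reduces to the standard binary system):

```python
from fractions import Fraction
from math import ceil

p = Fraction(3, 10)          # Pr(bit == 1); example value

def encode_bit(x, bit):
    if bit:
        return (x * p.denominator) // p.numerator   # floor(x / p)
    return ceil((x + 1) / (1 - p)) - 1              # ceil((x+1)/(1-p)) - 1

def decode_bit(x):
    bit = ceil((x + 1) * p) - ceil(x * p)           # equals 1 with density ~p
    if bit:
        return 1, ceil(x * p)
    return 0, x - ceil(x * p)

# Round trip; as with all ANS variants, decoding runs in reverse order.
x = 5
for b in [1, 0, 1]:
    x = encode_bit(x, b)
bits = []
for _ in range(3):
    b, x = decode_bit(x)
    bits.append(b)
assert bits == [1, 0, 1] and x == 5
```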
    For rANS, even rABS, it's more complicated, as we additionally need to choose the denominator (2^n). The optimized version is the one written there, as it is probably the one actually used - sure, a lot of information is missing, but a Wikipedia article should be relatively compact.
    tANS is kind of a different story; maybe there should be a separate article about it, e.g. Finite State Entropy (?) - focusing on automata, starting with a Huffman decoder (?)

    Regarding benefits, both arithmetic coding and ANS allow us to get as close to Shannon as we want - the main practical advantage is speed.
    Sure, for some specific applications different tradeoffs might be crucial, but again - this should be only a compact introduction to the subject for someone who just wants to get basic intuitions, a short piece of code to apply directly, etc.

    Also - it should provide links to more detailed descriptions. Maybe you could describe your experience in detail in e.g. a blog post we could link to there?

  6. #5
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    422
    Thanks
    133
    Thanked 147 Times in 95 Posts
    Quote Originally Posted by Shelwien View Post
    In conclusion, I'd have to add that in my opinion, the actual benefit of
    this method (ie rANS) is not speed, as wiki article says, but precision.
    For example, I tried replacing my rangecoder with rANS in ppmd, and
    speed remained the same (note that divisions were used both in rANS and rc,
    because ppmd works with freq tables) - in fact rANS encoding should have been
    slower, because I didn't add caching and coding in reverse order.
    But rANS showed the same precision as a slow/precise rangecoder
    (ie log2 freq sum rounded up to a byte), while speed-optimized rangecoder
    added some overhead.
    I'd say it's very much speed being the benefit, with the penalty of having to structure your algorithm to encode in reverse. Arguably it's trading encode-division + decode-multiply (ANS) for encode-multiply + decode-division (arith), plus needing only one state instead of two (high / low) in non-binary entropy encoding. As you say, the division can sometimes be replaced by lookup tables if you're processing large blocks with static probabilities.

    For adaptive coding with non-stationary probabilities, I think it would be hard to remove the division without also losing precision (via quantisation and lookup tables), so we're essentially picking arithmetic coding over rANS coding based on encode or decode speed priority.

    The situation may well be different for binary coders though, where our arithmetic boils down to a very simple operation (with the cost of needing 8 times as many) for both encoder types.

    Same way, interleaved/vectorized rangecoders are also possible,
    but add even more overhead due to limited precision (again, speed is not a problem).
    Also, rangecoders don't affect the compressed file format so much,
    so they won't ever be completely replaced by rANS.
    Indeed interleaving / vectorization is possible for both.

    I know you vectorised binary arithmetic coding some time ago. http://encode.ru/threads/1153-Simple...ll=1#post23197

    Much later I posted an interleaved byte-based static arithmetic coder, but within a week Fabian posted his interleaved rANS alternative that was better. http://encode.ru/threads/1867-Unroll...thmetic-coding

    However, I think the precision argument is again misplaced. This mainly depends on whether you are adopting block-based approaches for speed or adaptive approaches for precision. Obviously arithmetic coding has a long history in adaptive coding (PPM etc.), but Charles & Fabian have demonstrated that adaptive ANS works well too, with vectorisation for the updates.

  7. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    2,966
    Thanks
    153
    Thanked 802 Times in 399 Posts
    > was the original motivation for the name (it was the first variant).

    That's ok, but the problem is that it looks like a magic trick,
    while in the rANS form it's clear how the number (state) is rearranged and why.

    Btw, I also don't like the standard explanation of arithmetic coding
    (one with infinite fraction) for the same reason - people who use that
    to learn arithmetic coding are unable to modify the coder for non-standard
    applications.

    > Arguably it's trading encode-division + decode-multiply (ANS) with encode-multiple + decode-division (arith),

    But that's true only for static coders, while for adaptive bytewise coders (like ppmd)
    both multiplication and division would be present in both cases.

    > plus needing only one state instead of 2 (high / low) in non-binary entropy encoding.

    AC decoder also has basically a single state register, so does that really matter?

    > I know you have vectorised binary arithmetic coding some time ago.

    Unfortunately it's not static, and it also works with file i/o, so it's hard for it
    to compete with modern memory-to-memory implementations.
    But on AVX etc, I'm not really sure which decoder (AC or rANS) would be faster,
    when properly optimized.

    > However I think the precision argument is again misplaced.

    The main problem is that rANS can just truncate state to 16 bits or less
    for comparison with symbol ranges, while AC would noticeably lose precision if
    less than 32 bits of code is used. Otherwise, decoder state updates are very similar
    in optimized binary rANS and AC.

    > demonstrated adaptive ANS works well too with vectorisation for the updates

    That can be used with AC too, especially in cases where updates are only performed
    once in a while.

    The problem is that everybody focused too much on this specific model, so there'd
    be sudden 100x drop in speed if we'd try to replace a simple order0 model with
    something more advanced, be it fsm counters or a probability mapping.

  8. #7
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    600
    Thanks
    186
    Thanked 186 Times in 112 Posts
    Quote Originally Posted by Shelwien View Post
    >That's ok, but the problem is that it looks like a magic trick,
    while in rANS form its clear how the number (state) is rearranged and why.
    I have added motivation for uABS, but for rANS it is more complicated - it would need adding some formalism, which might scare some readers.
    But please improve it if you see a way.

  9. #8
    Member
    Join Date
    Sep 2016
    Location
    China
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    First of all, I believe you should not link to IEEE, as the papers are behind a paywall. Isn't there any better source? Secondly, I think the introduction should be rewritten, because it sounds like ANS was created by Facebook/Apple. I think you should just add the authors' names, then mention Zstandard and Apple LZFSE as example implementations.

  10. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    2,966
    Thanks
    153
    Thanked 802 Times in 399 Posts
    He (Jarek Duda) is actually the author :)
    http://cyber.sci-hub.cc/MTAuMTEwOS9w...4/duda2015.pdf

    Also, the first implementation is probably this - https://cs.fit.edu/~mmahoney/compression/fpaqa.zip (2007);
    it just wasn't popular until the vectorization breakthrough.

  11. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,483
    Thanks
    719
    Thanked 653 Times in 349 Posts
    i think that it was popularized by Andrew Polar, but he wasn't great at low-level programming. Then Matt approached it with a binary codec. That was a failure too - ANS isn't better than arithmetic for binary coders. And finally Cyan arrived with FSE (which isn't vectorized, just an efficient low-level implementation)
    Last edited by Bulat Ziganshin; 28th September 2016 at 23:13.


  13. #11
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    600
    Thanks
    186
    Thanked 186 Times in 112 Posts
    The first variant (uABS, originally with switched encoder and decoder) was born in my physics MSc thesis in mid-2006 (Polish: http://chaos.if.uj.edu.pl/%7Ekarol/prace/Duda06.pdf , alongside Maximal Entropy Random Walk). Both the concept and the name were inspired by earlier work on Complex Base Numeral Systems (for my CS and math MScs).
    I translated it into English as the first version of http://arxiv.org/abs/0710.3861 (October 2007); an early tANS variant can be seen in its later versions (end of 2007).
    The comp.compression discussion started by Mark Nelson was crucial - thanks to it, Matt Mahoney found out about ANS and implemented fpaqb and fpaqc (November 2007): https://groups.google.com/forum/#!to...on/_cYQFMijqXE
    Regarding Andrew Polar, I have just found an email from him about his implementation: 22nd April 2008.
    ... 2009 Wolfram Simulator ...
    ... a few years later, intensive activity:
    29.10.2013 Yann's thread: http://encode.ru/threads/1821-Asymetric-Numeral-System
    11.11.2013 rewritten arxiv - http://arxiv.org/abs/1311.2540
    26.11.2013 rABS: https://groups.google.com/d/msg/comp...I/c5CdgBXRx2cJ
    16.12.2013 FSE: http://fastcompression.blogspot.com/...-breed-of.html
    31.12.2013 rANS: https://groups.google.com/d/msg/comp...I/DnfEbRrHBUUJ
    7.01.2014 updated arxiv with tANS, rANS names (inspired by mRNA, tRNA etc.)
    2.02.2014 Fabian's rANS: https://fgiesen.wordpress.com/2014/02/02/rans-notes/

    Generally: http://encode.ru/threads/2078-List-o...mplementations
    Last edited by Jarek; 29th September 2016 at 00:23.


