# Thread: ELI5: FSE/ANS

1. ## ELI5: FSE/ANS

I'm trying to learn how ANS/FSE/Zstandard work, but I find the information online a bit difficult to get through. I'd like to know if anybody here can explain to me in simple terms how ANS and FSE work, and how they compare to one another. Once I've understood the general picture better, I hope it will be much simpler to read the papers. Thanks.

2. How to explain ANS simply is a difficult question - everybody does it in a different way. Here are links to a few people's explanations: http://encode.ru/threads/2078-List-o...mplementations

It would be great to find a really intuitive explanation, especially for the Wikipedia article - here is a draft: https://en.wikipedia.org/wiki/Draft:...umeral_Systems

3. I think it would be better to explain rANS first in that wiki
article, rather than uABS - that form looks pretty confusing.
Also, even for rANS there's only the "optimized" example with shifts etc.,
which is not the most comprehensible either.
The "basic concepts" section seems OK though, so why not continue from that?

Say we have a coder state "s" and a "frequency table" freq[], where
freq[i] is the number of occurrences of the i'th symbol in the data.
Also, total = Sum(freq[i]); cdf[i] = Sum(freq[j], j=0..i-1);
In this case, the optimal static code would require spending -log2(freq[i]/total)
bits for encoding each occurrence of the i'th symbol.
So, we want to transform "s" (the coder state) in such a way that
log2(s')-log2(s) = -log2(freq[i]/total), and it's possible to decode i and s from s'.
Let's start with
s' = s*total + cdf[i] - it adds log2(total) bits (= -log2(1/total))
and allows us to decode i and s from s', but we still need
to subtract log2(freq[i]) bits somehow.
But we know that any value in the range cdf[i]..cdf[i]+freq[i]-1 would have
the same effect, so we can put additional information in that range, and
thus compensate for these log2(freq[i]) bits.
Of course, we can take these bits from some other source, e.g. another data stream
(which, btw, could make sense in the fashionable interleaved schemes),
or we can take them from the only other source available - the value of s itself.
So we'd split it like this:
sd = s / freq[i]; sr = s % freq[i];
and then compute s':
s' = sd*total + cdf[i] + sr;
i.e.
s' = (s/freq[i])*total + Sum(freq[j], j=0..i-1) + s%freq[i];
And suddenly, this satisfies all the requirements:
cdf[i] + sr' = s' % total; sd' = s' / total;
"i" can be found from the cdf[i]+sr' value, and then we'd also know sr'.
Then s = sd'*freq[i] + sr';
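The step described above can be checked directly in code. Here is a minimal sketch (the function names and the toy freq table are mine, not from the post); note that the decoder returns symbols in the reverse of encoding order, since the last symbol encoded is the first one recovered:

```python
def rans_encode_step(s, i, freq, cdf, total):
    # s' = floor(s / freq[i]) * total + cdf[i] + (s mod freq[i])
    return (s // freq[i]) * total + cdf[i] + (s % freq[i])

def rans_decode_step(s2, freq, cdf, total):
    slot = s2 % total                 # equals cdf[i] + sr'
    # find i with cdf[i] <= slot < cdf[i] + freq[i]
    # (linear scan for clarity; assumes all freq[i] > 0)
    i = max(k for k in range(len(freq)) if cdf[k] <= slot)
    sr = slot - cdf[i]
    s = (s2 // total) * freq[i] + sr  # s = sd' * freq[i] + sr'
    return i, s
```

For example, with freq = [3, 1, 4] (total = 8, cdf = [0, 3, 4]) and starting state s = 1, encoding the symbols 2, 0, 1 in turn takes the state through 5, 10, 83; decoding from 83 yields 1, 0, 2 and returns the state to 1.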

Now, if "s" is supposed to fit in a CPU register, we'd have
to occasionally move some part of its value to a data stream,
but basically any convenient (and lossless) method can be used for this.
For example, we can take the 8 high bits of "s":
l = log2(s) - 8; out_bits = s >> l; s = s % (1<<l);
(And again, this might make sense in some implementations.)
But usually it's easier and faster to use the low byte:
out_bits = s % 256; s = s / 256;
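Putting the low-byte renormalization together with the encode/decode step gives a complete (if naive) streaming rANS coder. This is only a sketch under the usual constraints (state kept in [L, 256*L), with L a multiple of total); the function names and toy tables are mine, not from the post:

```python
L = 1 << 16  # lower bound of the normalized state range; must be a multiple of total

def rans_encode(msg, freq, cdf, total):
    s, out = L, []
    for i in reversed(msg):                      # rANS encodes in reverse order
        # renormalize: flush low bytes while the next step would overflow 256*L
        while s >= (L // total) * 256 * freq[i]:
            out.append(s & 0xFF)
            s >>= 8
        s = (s // freq[i]) * total + cdf[i] + (s % freq[i])
    return s, out

def rans_decode(s, out, n, freq, cdf, total):
    msg = []
    for _ in range(n):
        slot = s % total
        i = max(k for k in range(len(freq)) if cdf[k] <= slot)
        s = (s // total) * freq[i] + slot - cdf[i]
        while s < L:                             # refill from the byte stream,
            s = (s << 8) | out.pop()             # consuming bytes in reverse
        msg.append(i)
    return msg, s
```

Since the decoder exactly inverts each encoder step, decoding a full message returns the state to L and consumes every emitted byte.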

In conclusion, I'd have to add that in my opinion, the actual benefit of
this method (i.e. rANS) is not speed, as the wiki article says, but precision.
For example, I tried replacing my rangecoder with rANS in ppmd, and
speed remained the same (note that divisions were used both in rANS and the rc,
because ppmd works with freq tables) - in fact rANS encoding should have been
slower, because I didn't add caching and coding in reverse order.
But rANS showed the same precision as a slow/precise rangecoder
(i.e. the log2 freq sum rounded up to a byte), while a speed-optimized rangecoder
added some overhead.

In the same way, interleaved/vectorized rangecoders are also possible,
but they add even more overhead due to limited precision (again, speed is not a problem).
Also, rangecoders don't affect the compressed file format so much,
so they won't ever be completely replaced by rANS.

4. ## The Following 3 Users Say Thank You to Shelwien For This Useful Post:

Bulat Ziganshin (24th December 2016), Jarek (28th September 2016), RamiroCruzo (10th April 2017)

5. Anybody can contribute to the Wikipedia article - please do so if you know how to improve it.

I have started with uABS because it seems the simplest to explain - I see it as a direct continuation of "basic concepts": just straightforward formulas for a single parameter p, which become the standard binary system for p = 1/2 ... and it was the original motivation for the name (it was the first variant).
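For reference, the uABS formulas can be written down in a few lines. This sketch (my own naming; the probability value in the example is arbitrary) uses an exact rational p = num/den for the probability of bit 1, to avoid floating-point issues; for p = 1/2 it reduces to the standard binary system:

```python
def ceil_div(a, b):
    # exact ceiling of a/b for positive integers
    return -(-a // b)

def uabs_encode(x, bit, num, den):
    # p = num/den is the probability of bit 1
    if bit:
        return x * den // num                      # C(1, x) = floor(x / p)
    return ceil_div((x + 1) * den, den - num) - 1  # C(0, x) = ceil((x+1)/(1-p)) - 1

def uabs_decode(x, num, den):
    # bit = ceil((x+1)*p) - ceil(x*p), which is always 0 or 1
    bit = ceil_div((x + 1) * num, den) - ceil_div(x * num, den)
    if bit:
        return 1, ceil_div(x * num, den)           # new state: ceil(x * p)
    return 0, x - ceil_div(x * num, den)           # new state: x - ceil(x * p)
```

As with rANS, decoding recovers the bits in reverse order of encoding; and for p = 1/2 the formulas collapse to appending one bit to the state (1 -> 2x, 0 -> 2x+1).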
For rANS, even rABS, it's more complicated, as we additionally need to choose the denominator (2^n). The article gives the optimized version, as this is probably the one actually used - sure, a lot of information is missing, but a Wikipedia article should be relatively compact.
tANS is kind of a different story; maybe there should be a separate article about it, e.g. Finite State Entropy (?) - focusing on automata, starting with a Huffman decoder (?)

Regarding benefits, both arithmetic coding and ANS allow us to get as close to the Shannon limit as we want - the basic practical advantage is speed.
Sure, for some specific applications different tradeoffs might be crucial, but again - this should be only a compact introduction to the subject for someone who just wants to get basic intuitions, a short code snippet to apply directly, etc.

Also - it should provide links to more detailed descriptions - maybe you could describe your experience in detail in e.g. a blog post to link there?

6. Originally Posted by Shelwien
In conclusion, I'd have to add that in my opinion, the actual benefit of
this method (i.e. rANS) is not speed, as the wiki article says, but precision.
For example, I tried replacing my rangecoder with rANS in ppmd, and
speed remained the same (note that divisions were used both in rANS and the rc,
because ppmd works with freq tables) - in fact rANS encoding should have been
slower, because I didn't add caching and coding in reverse order.
But rANS showed the same precision as a slow/precise rangecoder
(i.e. the log2 freq sum rounded up to a byte), while a speed-optimized rangecoder
added some overhead.
I'd say it's very much speed being the benefit, with the penalty of having to structure your algorithm to encode in reverse. Arguably it's trading encode-division + decode-multiply (ANS) for encode-multiply + decode-division (arith), plus needing only one state instead of 2 (high/low) in non-binary entropy encoding. As you say, the division can sometimes be replaced by lookup tables if you're processing large blocks with static probabilities.

For adaptive coding with non-stationary probabilities, I think it would be hard to remove the division without also losing precision (via quantisation and lookup tables), so we're essentially picking arithmetic coding over rANS coding based on encode or decode speed priority.

The situation may well be different for binary coders though, where our arithmetic boils down to a very simple operation (with the cost of needing 8 times as many) for both encoder types.

In the same way, interleaved/vectorized rangecoders are also possible,
but they add even more overhead due to limited precision (again, speed is not a problem).
Also, rangecoders don't affect the compressed file format so much,
so they won't ever be completely replaced by rANS.
Indeed interleaving / vectorization is possible for both.

I know you vectorised binary arithmetic coding some time ago. http://encode.ru/threads/1153-Simple...ll=1#post23197

Much later I posted an interleaved byte-based static arithmetic coder, but within a week Fabian posted his interleaved rANS alternative that was better. http://encode.ru/threads/1867-Unroll...thmetic-coding

However, I think the precision argument is again misplaced. This mainly depends on whether you are adopting block-based approaches for speed or adaptive approaches for precision. Obviously, arithmetic coding has a long history in adaptive coding (PPM etc.), but Charles & Fabian have demonstrated that adaptive ANS works well too, with vectorisation for the updates.

7. > was the original motivation for the name (it was the first variant).

That's OK, but the problem is that it looks like a magic trick,
while in the rANS form it's clear how the number (state) is rearranged and why.

Btw, I also don't like the standard explanation of arithmetic coding
(the one with the infinite fraction) for the same reason - people who use it
to learn arithmetic coding are unable to modify the coder for non-standard
applications.

> Arguably it's trading encode-division + decode-multiply (ANS) with encode-multiple + decode-division (arith),

But that's true only for static coders, while for adaptive bytewise coders (like ppmd)
both multiplication and division would be present in both cases.

> plus needing only one state instead of 2 (high / low) in non-binary entropy encoding.

An AC decoder also has basically a single state register, so does that really matter?

> I know you have vectorised binary arithmetic coding some time ago.

Unfortunately it's not static, and it also works with file I/O, so it's hard for it
to compete with modern memory-to-memory implementations.
But with AVX etc., I'm not really sure which decoder (AC or rANS) would be faster
when properly optimized.

> However I think the precision argument is again misplaced.

The main problem is that rANS can just truncate the state to 16 bits or less
for comparison with symbol ranges, while AC would noticeably lose precision if
fewer than 32 bits of code are used. Otherwise, decoder state updates are very similar
in optimized binary rANS and AC.

> demonstrated adaptive ANS works well too with vectorisation for the updates

That can be used with AC too, especially in cases where updates are only performed
once in a while.

The problem is that everybody focused too much on this specific model, so there'd
be a sudden 100x drop in speed if we tried to replace a simple order-0 model with
something more advanced, be it FSM counters or a probability mapping.

8. Originally Posted by Shelwien
That's OK, but the problem is that it looks like a magic trick,
while in the rANS form it's clear how the number (state) is rearranged and why.
I have added the motivation for uABS, but for rANS it is more complicated - it would need some added formalism, which might scare some readers.
But please improve it if you see a way.

9. First of all, I believe you should not use links to IEEE, as the papers are behind a paywall. Isn't there any better source? Secondly, I think the introduction should be rewritten, because it sounds like ANS was created by Facebook/Apple. I think you should just add the authors' names, then add Zstandard and Apple LZFSE as example implementations.

10. He (Jarek Duda) is actually the author :)
http://cyber.sci-hub.cc/MTAuMTEwOS9w...4/duda2015.pdf

Also, the first implementation is probably this one - https://cs.fit.edu/~mmahoney/compression/fpaqa.zip (2007) -
it just wasn't popular until the vectorization breakthrough.

11. I think it was popularized by Andrew Polar, but he wasn't great at low-level programming. Then Matt approached it with a binary codec. That was a failure too - ANS isn't better than arithmetic coding for binary coders. And finally Cyan arrived with FSE (which isn't vectorized, just an efficient low-level implementation).

12. ## The Following 2 Users Say Thank You to Bulat Ziganshin For This Useful Post:

Cyan (28th September 2016), willvarfar (30th September 2016)

13. The first variant (uABS, originally with switched encoder and decoder) was born in my physics MSc thesis in mid-2006 (Polish: http://chaos.if.uj.edu.pl/%7Ekarol/prace/Duda06.pdf , alongside Maximal Entropy Random Walk). Both the concept and the name were inspired by earlier work on Complex Base Numeral Systems (for my CS and math MScs).
I translated it into English as the first version of http://arxiv.org/abs/0710.3861 (October 2007); an early tANS variant can be seen in its later versions (end of 2007).
The comp.compression discussion started by Mark Nelson was crucial - thanks to it, Matt Mahoney found out about ANS and implemented fpaqb and fpaqc (November 2007): https://groups.google.com/forum/#!to...on/_cYQFMijqXE
Regarding Andrew Polar, I have just found an email from him about his implementation: 22nd April 2008.
... 2009 Wolfram Simulator ...
... a few years later, intensive activity:
29.10.2013 Yann's thread: http://encode.ru/threads/1821-Asymetric-Numeral-System
11.11.2013 rewritten arxiv - http://arxiv.org/abs/1311.2540
26.11.2013 rABS: https://groups.google.com/d/msg/comp...I/c5CdgBXRx2cJ
16.12.2013 FSE: http://fastcompression.blogspot.com/...-breed-of.html
31.12.2013 rANS: https://groups.google.com/d/msg/comp...I/DnfEbRrHBUUJ
7.01.2014 updated arxiv with tANS, rANS names (inspired by mRNA, tRNA etc.)
2.02.2014 Fabian's rANS: https://fgiesen.wordpress.com/2014/02/02/rans-notes/

Generally: http://encode.ru/threads/2078-List-o...mplementations

14. ## The Following 3 Users Say Thank You to Jarek For This Useful Post:

JamesB (29th September 2016), Mike (28th September 2016), willvarfar (30th September 2016)
