4th September 2014, 10:32
Which ANS is the Best
Do you have proof that your ANS is the best?
Post your results in this thread.
5th September 2014, 00:25
I haven't got benchmarks to hand right now, but of the open source code it's almost certainly the tANS implementation in FSE as an overall winner.
Ryg's SIMD rANS will be best decoding speed I think, but it's not compatible with the non-SIMD version so you have to know you'll always have SIMD available. My own modification on Ryg's 32-bit version contain an order-1 entropy encoder which I don't think any others do, so it is "best" in one way, and relatively weak in others (obviously the speed). My order-0 variant beats Ryg's non-SIMD ones, but obviously slower than the SIMD.
There are some old benchmarks comparing them in:
Last edited by JamesB; 5th September 2014 at 00:27.
5th September 2014, 00:55
Indeed tANS is probably the most interesting variant (it also allows for simultaneous encryption), however, rANS especially with alias method has also interesting exceptional properties.
Regarding fastest implementations, dnd claims to have a faster one in his benchmark, but unfortunately without any confirmation. We can only test it inside lzturbo and zhuff still has almost twice faster decoding ...
Regarding squeezing the maximal rate for given tANS size in reasonable time, probably spread_tuned_s() from toolkit will be hard to beat, but it requires more information about probabilities than just quantizer. It could probably be improved a bit by some slow exhaustive search.
5th September 2014, 08:53
Note that latest version of FSE spread symbols using (partly) P also.
Specifically, it flags all symbols with probability < 1 quantization step, and place them at the end of the table.
The rest of the spread is the same as spread_fast(), except that we avoid overwriting the end area.
It empirically proved to offer the best trade-off (taking also in consideration the cost of header description).
Details can be found here :
5th September 2014, 09:13
Yes I noticed - placing p[s]<0.7/L symbols at the end of the table is indeed the first step of extending over the pure quantizer - and the most significant in terms of the rate.
With a good tuned spread we could store in the header (compressed) a better quantization than q[s]~L*p[s], then decoder would first find q[s] and use some tuned spread ... but finding a good trade-off here is a tough work for a tiny income ...
5th September 2014, 18:29
I'd forgotten about Charles Bloom's ANS implementations. It's interesting to note that his rANS is faster than his tANS, and pretty close to FSE.
9th September 2014, 01:19
There's no simple answer to even the question of "which ANS implementation is fastest".
Some things you have to consider :
1. How often are you sending statistics? eg. how many symbols are coded per table initialization?
In general rANS has faster table construction than tANS does, so the tradeoff will change based on how many symbols are coded per table.
2. Are you doing other bit-IO ?
eg. lots of exp-golomb type schemes are used in JPEG/LZ77/etc. which want to send raw bits.
If so, then tANS may be preferrable since it already uses a bit buffer. rANS which is byte-by-byte needs an extra bit buffer.
3. Are you able to code multiple symbols interleaved?
In a highly serial coding structure you may not be able to take advantage of that parallelism at all, since each symbol is dependent on the previous. (eg. trying to do something like context coding with ANS ruins interleaving)
Last edited by cbloom; 3rd August 2016 at 20:37.
The Following 5 Users Say Thank You to cbloom For This Useful Post:
Bulat Ziganshin (9th September 2014),Cyan (9th September 2014),Gonzalo (9th September 2014),Jarek (9th September 2014),Matt Mahoney (9th September 2014)
9th September 2014, 05:28
BTW... Can you please point me where the beginning of this ANS topic is? Somehow i lost the argument behind, but i want to understand it as deep as i can.
Thanks in advance!
9th September 2014, 16:13
It can be possible in some circumstances, but it isn't applicable everywhere. My strategy for order-1 entropy encoding with rANS is simply to start the codec in 4 separate spots per data chunk and interleave those 4 together. It's not ideal and not as quick as order-0 with interleaving of adjacent input bytes, but it works.
Originally Posted by cbloom
Enwiki9 gave 644057148 O(0) at 176MB/sec enc, 325MB/sec dec (icc) and 484003410 bytes O(1) at 107MB/sec enc and 165MB/sec dec. So around half speed for order-1.
Last edited by JamesB; 9th September 2014 at 16:20.