13th June 2019, 18:55
> Q1. 1<<10..1<<16: how is it derived?
I just know it from experience; compressed size per SCALElog (768,771 bytes is the input size):

   size  SCALElog
427,800     8
379,664     9
362,129    10
355,447    11
352,591    12
351,322    13
350,746    14
350,483    15
I mean, it's obvious that higher precision is better for prediction quality.
But I only know the actual impact from empirical testing - it depends on the sample data,
so it's hard to derive mathematically.
> Is there any dependency on the number of possible symbols or range?
With a larger alphabet it can be possible to get away with lower precision.
For example, ppmd works pretty well with 6-bit counters and a byte alphabet.
Anyway, if you only care about compression, it's always better to use higher precision.
I can observe some gains even with 64-bit counters vs 32-bit.
But most codecs are designed with speed as the top priority,
and that puts a hard limit on usable precision.
In fact, we can interpret a counter as a lossily compressed representation of context history.
So higher counter precision means more information about context preserved.
But, for example, more than 16 bits of probability precision requires
2x the memory (since the CPU can only work directly with 8/16/32/64-bit types),
and a 17x17-bit product won't fit in a 32-bit variable.
Both of these facts are bad for coding speed.
There's also a relation between probability precision and rangecoder, yes.
1) range>maxprob always has to be true, otherwise the range value could become 0,
and there'd be a decoding error at that point.
2) the rc should be able to shift the range up by >= the number of probability bits,
otherwise (1) is impossible.
In unoptimized code we can just use a range-normalization loop,
but it's certainly faster to put 1-2 ifs there instead.
LZMA only has 1 if for range renorm, since it knows that
range can only be reduced by (11-log2i(31))=7 bits per step,
so a single 8-bit shift can compensate for that.
>> SCALE/minprob < (1<<8)
> Q2. I could not understand the idea behind this equation. Practically, what does it mean?
After (de)coding a symbol, the new rc range is range*p/SCALE,
i.e. range shrinks by a factor of SCALE/p, and there's only one renorm iteration,
which can add at most 8 bits back to range.
From that we can derive the minimum probability which won't cause rc underflow.
> got the results as shown in the attached image NoOfBitsVsStepsTo31Prob.PNG.
> Does this observation have anything to do with the choice of probability model
> that we are using?
I don't think so.
(p>>5) becomes 0 for p<32,
so p-=(p>>5) can't produce a p value lower than 31.
That in turn affects rc precision, but there's no magic in this specific value.
Likewise, for some data types, compression may be better with p-=(p>>4) or p-=(p>>6).
In fact, I just use counters with multiplication instead, like (p-=(p*C)/SCALE),
and optimize C value for sample data.
> To observe the difference we tried plotting the probability graph and could
> not see the difference they are talking about, as the graph seems to be similar
> in both cases (saturation occurs in both cases). Please find the attached
> graphs for both cases.
There is a difference.
Try testing with nonstationary data, eg. BWT output
http://nishi.dreamhosters.com/u/BWT16.rar
A counter like p = SCALE*n1/(n0+n1); with n0 or n1 incremented per bit
would converge to a fixed value at some point
(and would also have overflow problems with the counts when there's an infinite volume of data).
While a counter like p = (p*C) + bit*(1-C)
can adapt to recently processed data.