1. On the Q Coder

Due to popular demand, and probably not enough other sources available, I here start a short series of posts on the Q coder and its variants.

2. The Following 4 Users Say Thank You to thorfdbg For This Useful Post:

Bulat Ziganshin (8th February 2014),cade (7th February 2014),Jarek (10th February 2014),Matt Mahoney (8th February 2014)

3. What is the Q Coder? The Q-Coder and its family (QM-coder, Q15-Coder, MQ-Coder) is a series of binary arithmetic coders that only work by shifting and adding. Multiplications are not required. Divisions are not required. Thus, it is pretty fast, and can be efficiently be implemented in hardware. The Q-coder family was developped by Joanne Mitchell, back at her time at IBM. The QM coder is used in the arithmetic coding variant of JPEG, the MQ coder in JBIG and JPEG 2000. It has been superceeded by the various arithmetic coders in MPEG which are entirely table-driven and still faster.

4. The Q coder is a binary arithmetic coder, that is, its alphabet is {0,1}. Its internal state consists of registers A,B,C and T, where A is the size of the coding interval, and C the codepoint within the interval, though both are represented in a non-obvious fixpoint representation. Otherwise, it uses the standard arithmetic coding paradigm, where the code interval is divided in two subintervals. The lower subinterval [0,Q] represents the LPS (least probable symbol), the interval [Q,1] the MPS (most probable symbol), where LPS and MPS can be either 0 or 1. The coder automatically adjusts the symbol probability Q to the source, and it also automatically switches between LPS and MPS. In all practical implementations, Q and LPS depend on an additional information, the context, coming from the outside and driving the coder. Thus, Q is actually a conditional probability depending on the context. The B register is a byte-buffer, the T register counts the number of available bits in C.

5. Like regular arithmetic coding, the code interval size stays not constant one, but is only divided proportionally to Q and gets smaller and smaller as coding proceeds. The size of the code interval is A. By proper normalization, the Q coder arranges always 0.75<A<1.5, with some tricks for overflow propagation. One can see easily that under LPS coding, C<-C, A<-Q*A, and under MPS coding C<-C+Q*A, A<-(1-Q)*A. By assuming that A is always about one (with some errors, only causing coding performance), the following simplifications can be made: Q*A is about A, hence LPS coding becomes C<-C,A<-Q. By the same reasoning, MPS coding becomes C<-C+Q, A<-A-Q.

6. Registers in the Q coder are not kep in floating point, but in integer, normalized such that 0.75 = 0x8000 (a bit strange indeed). Since 0.75<A<1.5, the topmost bit of the A register must always be 1. This is the reason for this strange normalization. Once it is clear whether the coding sub-interval is in the left or right subinterval of [0,1], a classical arithmetic coder would push out bits, and renormalize. The Q coder does not immediately flush such bits out, but rather moves them to the upper bits of the C register, where T counts how many spare bits are available. This way, an overflow can ripple from the lower bits to the upper bits and adjust the bits there. Once eight bits accumulated in the upper part of the C register, they are removed there and buffered in the B register. This will still allow for a carry-over from C to B, the classical "carry resolution" problem of arithmetic coding. The Q coder family has a couple of tricks how to deal with this situation which differs between QM and MQ. More on this later.

7. As in traditional A-Coding, the coder requires normalization if the coding interval becomes too small, which is here when A shrinks below 0.75 or equivalently, if the MSB of A becomes zero. If this is the case, double A, and double C - left shift. Remember that the upper bits from coding are by that shifted towards the upper half of C. Also decrement the T register then, indicating that one more bit became accumulated, and one less bit is free in the upper half of C. If T becomes zero, the upper bits of C are moved into B, cleared from C and T is reset to the number of bits removed from C to B. This is typically 8 (though can be 7 for the MQ coder, see below why).

8. Carry over resolution: Of course, as in classical ACoding, the carry-over problem also exists. By adding Q to C (see above), bits could overflow into the upper half of C, and since B extends C, could carry over into B. If B carries over, the written stream would have to be adjusted, which is of course impossible because the data is already written out. There are two solutions for this: In the QM coder, the code never writes a B register of 255 (0xff) to the stream since this could ripple over to the previous byte written. Instead, it just remembers how many 0xff are stuffed, and as soon as a carry over into one of the 0xff appears, or a carry over can be prevented by a non-0xff-bit, either the right amount of 0x00 (first case) or 0xff (second case) are written. This is the "classical approach".

9. The problem with the classical approach is that the latency of the QM coder can become arbitrarely large, i.e. it may take an infinite amount of time (in case of bad luck) until a byte pops out of the coder because so many 0xff's have been delays. In the MQ coder, the problem is resolved by removing only 7 bits from the C register after a 0xff has been cut off before. The 8th bit remains zero. Now, any carry over from the C register into the B register can ripple into this free bit, avoiding a carry over into the stream. This is called "bitstuffing". After eight one-bits, a single zero-bit is stuffed into the stream to stop a carry-over.

10. Conditional exchange: Remember that part of the Q coder estimate was that A*(1-Q) = A-Q for the MPS coding. Obviously, the MPS interval should always be bigger than the LPS interval, that is, A-Q > A/2. However, this is (with the above brute-force simplification) not the case if A < 2*Q. In this case, the QM and QM coders use a special coding mode called "conditional exchange". Under conditional exchange coding, that is, whenever A < 2*Q, encoding of an LPS uses the MPS-algorithm, and encoding of an MPS uses the LPS algorithm. This is simply because in such a case the LPS interval becomes bigger than the MPS interval, thus intervals just be used the other way around.

11. By this, we have the following coding rules: For LPS coding: If A < 2Q: A<-A-Q, C<-C+Q (exchange case), ELSE A<-Q (regular LPS).

For MPS coding: If A < 2Q: A<-Q (exchange), ELSE A<-A-Q, C<-C+Q (regular MPS case)

12. Q coding is made adaptive by taking Q from a table, and the table index driven by a state machine that observes the incoming data. The state machine is indexed by S, the current state. To each state S belongs a probability estimate, Q(S), a next state for LPS coding NextLPS(S), and a next state for MPS coding NextMPS(S). That is, if an MPS is detected, the probabily estimate for the MPS interval is enlarged by going to a state with a larger Q, and vice versa, going to a state with a smaller Q for an LPS. On certain states, namely if Q is about 0.5, MPS and LPS are exchanged. This is by a "switch" flag SWITCH(S), which instructs the encoder to exchange LPS and MPS if SWITCH(S) is set *and* an LPS has been coded.

13. The state adaption thus looks like this: For LPS coding: If SWITCH(S) then: MPS <- 1- MPS. Then continue always with: S<-NextLPS(S),Q<-Q(S).

For MPS coding: S<-NextMPS(S), Q<-Q(S). This is a little bit modified to enable a "quick coding path", that is, to reduce the complexity in the most probable execution path, which is (obviously) MPS coding. The switching and state adaption is only made if there is additional complexity in first place, namely if we need to renormalize. Thus, the above MPS-State adaption changes to:

If A < 0x8000 (A < 0.75) then: S<-NextMPS(S),Q<-Q(S). Otherwise, do nothing.

14. Finally, the whole coder is made context-sensitive by not having a single Q and S, but by making Q and S dependent on a context CTX that is passed in as a secondary parameter. That is, Q = Q(CTX), and S = S(CTX). The number of contexts depend heavily on the coding target and can vary between probably 10..20 up to 100.

15. maybe you merge your posts to one

16. Finally, a bit of code. Hopefully, you are able to discover all of the above:
```
#if 0
gcc-3.4 -o arthdeco -ggdb3 arthdeco.c
exit
#endif

/*
** Simple predictive arithmetic audio coding, (c) 2004 Thomas Richter,
** THOR Software.
** \$Id: arthdeco.c,v 1.4 2004-10-10 20:06:33 thor Exp \$
**
*/

#include <stdio.h>
#include <string.h>

/*
** context of the arithmetic coder: index into the probability table,
** most probable symbol
*/
struct MQContext {
unsigned char index;
unsigned char mp;
};

/*
** Arithmetic coder structure.
** The MQ coder is patented software, (c) IBM corp.
*/
struct MQCoder {
unsigned long     a;       /* the inverval */
unsigned long     c;       /* computation register */
int               ct;      /* bit counter */
int               flag;
unsigned char     b;       /* b output register */
};

/*
** MQ coder lookup tables
*/
const unsigned short Qe_Value[] = {
0x5601,0x3401,0x1801,0x0ac1,0x0521,0x0221,0x5601,0x5401,0x4801,0x3801,
0x3001,0x2401,0x1c01,0x1601,0x5601,0x54ff,0x5401,0x527d,0x5101,0x4c5f,
0x4801,0x3f80,0x3801,0x35f7,0x3401,0x31f6,0x3001,0x2801,0x2401,0x2201,
0x1c01,0x1801,0x1601,0x1401,0x1201,0x1101,0x0ac1,0x09c1,0x08a1,0x0521,
0x0441,0x02a1,0x0221,0x0141,0x0111,0x0085,0x0049,0x0025,0x0015,0x0009,
0x0005,0x0001,0x5601
};

/*
** MSB/LSB switch flags
*/
const unsigned char Qe_Switch[] = {
1,0,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0
};

/*
** Next state on MPS coding
*/
const unsigned char Qe_NMPS[] = {
1,2,3,4,5,44,7,8,9,10,
11,12,13,35,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,45,47,48,49,50,
51,51,52
};

/*
** Next state on LPS coding
*/
const unsigned char Qe_NLPS[] = {
1,6,9,12,35,39,6,14,14,14,
20,22,25,27,14,14,14,15,16,17,
18,19,20,21,22,23,24,25,26,27,
28,29,30,31,32,33,34,35,36,37,
38,39,40,41,42,43,44,45,46,47,
48,49,52
};

/*
** Open the MQ coder for writing
*/
static void OpenForWrite(struct MQCoder *self)
{
self->a     = 0x8000;
self->c     = 0;
self->ct    = 12;
self->b     = 0;
self->flag  = 0;
}

{
unsigned char t;

self->b     = getchar();
self->c     = self->b << 16;
t           = getchar();
self->ct    = 8;
if (self->b == 0xff) {
if (t < 0x90) {
self->c += t << 8;
self->ct--;
}
}
self->c  += t << 8;
self->b   = t;
self->c <<= 7;
self->ct -= 7;
self->a   = 0x8000;
}

static int GetBit(struct MQCoder *self,struct MQContext *ctx)
{
unsigned long q = Qe_Value[ctx->index];
unsigned char t;
int d;

self->a -= q;
if ((self->c >> 16) >= q) {
/* MPS case */
self->c -= q << 16;
if (self->a & 0x8000) {
/* short MPS case, return immediately. */
return ctx->mp;
}
/* MPS exchange here */
d = (self->a < q);  /* d == 1 on LPS */
} else {
/* LPS exchange here */
d = (self->a >= q); /* d == 1 on LPS */
self->a = q;
}
if (d) {
/* LPS decoding. Check for MPS/LPS exchange, adjust index */
d ^= ctx->mp;
if (Qe_Switch[ctx->index])
ctx->mp  = d;
ctx->index = Qe_NLPS[ctx->index];
} else {
/* MPS decoding */
d          = ctx->mp;
ctx->index = Qe_NMPS[ctx->index];
}
/* Now renormalize */
do {
if (self->ct == 0) {
t = getchar();
self->ct = 8;
if (self->b == 0xff) {
if (t < 0x90) {
self->c += t << 8;
self->ct--;
}
}
self->c += t << 8;
self->b  = t;
}
self->a <<= 1;
self->c <<= 1;
self->ct--;
} while((self->a & 0x8000) == 0);

return d;
}

/*
** Put out a bit thru the MQ coder.
*/
static void PutBit(struct MQCoder *self,struct MQContext *ctx,int bitf)
{
unsigned long q = Qe_Value[ctx->index];
char bit        = (bitf)?1:0;

self->a -= q;
/*
** Check for MPS/LPS coding
*/
if (bit == ctx->mp) {
/*
** MPS coding
*/
if (self->a & 0x8000) {
/* Short MPS case, no renormalization */
self->c += q;
return;
} else {
/* context change */
if (self->a < q) {
/* MPS/LPS exchange case */
self->a  = q;
} else {
self->c += q;
}
ctx->index = Qe_NMPS[ctx->index];
}
} else {
/* LPS coding here */
if (self->a < q) {
self->c += q;
} else {
self->a  = q;
}
ctx->mp   ^= Qe_Switch[ctx->index];
ctx->index = Qe_NLPS[ctx->index];
}
/*
** Renormalize now.
*/
do {
self->a <<= 1;
self->c <<= 1;
if (--self->ct == 0) {
if (self->b < 0xff) {
if (self->c & 0x8000000) {
/* Overflow into the b register, remove
** carry bit and go on */
self->b++;
self->c &= 0x7ffffff;
}
}
if (self->b == 0xff) {
/* We either have an 0xff here, or generated one due to carry.
** in either case, must have buffered something or the overflow
** could not have happened.
*/
putchar(0xff);
self->b  = self->c >> 20;
self->c &= 0xfffff;
self->ct = 7;
} else {
if (self->flag)
putchar(self->b);
self->b  = self->c >> 19;
self->c &= 0x7ffff;
self->ct = 8;
}
self->flag = 1;
}
} while((self->a & 0x8000) == 0);
}

/*
** Flush out the remaining bits
*/
static void Flush(struct MQCoder *self)
{
int k;
/*
** Number of bits to push out is then in k.
*/
self->c <<= self->ct;
for(k = 12 - self->ct;k > 0;k -= self->ct,self->c <<= self->ct) {
if (self->b < 0xff) {
if (self->c & 0x8000000) {
self->b++;
self->c &= 0x7ffffff;
}
}
if (self->b == 0xff) {
putchar(0xff);
self->b  = self->c >> 20;
self->c &= 0xfffff;
self->ct = 7;
} else {
if (self->flag)
putchar(self->b);
self->b  = self->c >> 19;
self->c &= 0x7ffff;
self->ct = 8;
}
self->flag = 1;
}
if (self->b < 0xff) {
if (self->c & 0x8000000) {
self->b++;
}
}
if (self->b != 0xff && self->flag)
putchar(self->b);
}

int compress(void)
{
struct MQContext ctxt[256*8];
struct MQContext *cx;
struct MQCoder   coder;
int c,code,i;
int last = 0;
int lb   = 0;

/* reset all contexts */
memset(&ctxt,0,sizeof(ctxt));
OpenForWrite(&coder);

do {
c = getchar();
if (c == EOF)
break;
code  = (c - last) & 0xff;
cx    = ctxt + (lb << 3);
lb    = code;
code ^= (code >> 1);
for(i = 0;i < 8;i++,cx++) {
PutBit(&coder,cx,code & (1 << i));
}
last = c;
} while(1);

Flush(&coder);

return 0;
}

int decompress(void)
{
unsigned char grey[256];
struct MQContext ctxt[256*8];
struct MQContext *cx;
struct MQCoder   coder;
int code,i,last = 0,lb = 0;

/* reset all contexts */
memset(&ctxt,0,sizeof(ctxt));
/*
** Initialize the inverse grey coding table.
*/
for(i = 0;i < 256;i++)
grey[i ^ (i >> 1)] = i;

while(!feof(stdin)) {
cx    = ctxt + (lb << 3);
for(i = 0,code = 0;i < 8;i++,cx++) {
if (GetBit(&coder,cx))
code |= (1 << i);
}
code  = grey[code];
lb    = code;
code += last;
last  = code;
putchar(code);
}

return 0;
}

int main(int argc,char **argv)
{

if (argv[1] && !strcmp("-d",argv[1]))
return decompress();
else
return compress();
}
```

17. The Following User Says Thank You to thorfdbg For This Useful Post:

JamesB (7th February 2014)

18. A word about patents: Conditional exchange was patented by Mitsubishi(?). Bitstuffing and parts of the Q coder by IBM. (All "as far as I know") IBM was greedy enough not to release the QM coder under FRAND conditions for 10918-1, with the result that only the Huffman coding part of 10918 became popular. Later on, W. Withers (by Pegasus, back then) offered the ELS-coder under FRAND conditions. ELS-coding also only uses additions and subtractions, but uses a different trick, namely by storing Q in the logarithmic domain: ELS = Extended logarithmic scale, and "by pure coincidence", also the nick-name of its inventor. IBM took this as a note and released the MQ coder under FRAND conditions for later standards, thus it became available under FRAND for JBIG, JBIG2 and JPEG 2000. Around 2003(? I don't remember precisely), ITU approached JPEG to standardize a variant of 10918-1 replacing the (still Mitsubishi-patented) QM coder with Q15-coder with bit-stuffing. JPEG declined - basically because it was pointless (Huffman was already dominating, the advantage was small, not compatible, and patents run out a couple of years later anyhow). This became ITU.T 851 (sp?), ITU only, non-ISO, another unused standard.

19. MQ and beyond: Nowadays, QM and MQ are actually no longer used in new codecs, while binary adaptive encoding is. The additions and adjustments of C and A registers are now more or less replaced entirely by table lookups that drive the register contents depending on their old value and their state. This is more precise and avoids a performance loss due to the replacement of the multiplication. More or less everything within recent MPEG developments works along these lines. Still, Q coding was a breakthrough and a masterpiece of Joanne because it was the first very practical approach of arithmetic coding that was hardware friendly and extremely fast. Unfortunately, the last time I saw her her health wasn't all the best - sorry about this. We owe you much!

The above code is only an illustration, and there are a couple of obvious and a couple of not-so-obvious methods to make it faster. Most of the branches can be eliminated, thus avoiding any branch-penalties on modern processors, and the whole thing can be made to "fly" if you know how to do it. Sorry that I cannot post this here, it's part of our "real code" which is "damn fast"(tm) using exactly our tiny little secrets. But in general, the common coding path for MPS-coding is, as you already see, just an addition, and that's what is executed most of the time.

20. Hope this helps a bit on understanding Q coding. Sorry about any mistakes in our today's mini-series, they are mine, all mine. Please feel free to correct me if my memory was failing, certainly possible that I mixed a couple of things, and hope this helps.

21. The Following 2 Users Say Thank You to thorfdbg For This Useful Post:

boxerab (29th December 2016),Matt Mahoney (8th February 2014)

22. thorfdbg, maybe you have heard of some large alphabet approximations (or table-based versions) of range/arithmetic coding? (I would like to compare with ANS)

23. Originally Posted by Jarek
thorfdbg, maybe you have heard of some large alphabet approximations (or table-based versions) of range/arithmetic coding? (I would like to compare with ANS)
Actually, no, I did not. Sorry to say that I haven't really haven't had the chance to look into ANS. I downloaded the article, but did not yet study it. Anyhow, the typical application is really binarization, then AC. Huffman and variations are also used a lot.

The problem with large alphabets and AC is that the interval size computation can take rather long, which you would typically want to avoid. Besides, it is not hard to see that a context-driven binary AC with properly defined probabilities is actually equivalent to a larger-alphabet AC. (Want to try this exercise?) Thus, the binary version is pretty much all that is needed.

Greetings,
Thomas

24. Thank you Thomas,
I thought that, to avoid multiplication, we could for example put into table the range rescaling of the range coding - it could be a slight speedup, but rather still far behind rANS, not to mention tANS/FSE.
Indeed the preparation for given probability distribution is costly, but the income is a few times smaller computational cost of actual data processing - it can be worth it if distribution doesn't vary so much.
Thanks,
Jarek

25. Thanks Thomas,
Very interesting post.

Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•