# Thread: Would it be SR or PPM?

1. ## Would it be SR or PPM?

I have an idea for a compression algorithm similar to both SR and PPM. Basically it would have multiple contexts of variable length (as in PPM), but in every context I would maintain an MTF-like recency rank (possibly with the addition of some short match history) instead of statistics, thus making it similar to SR.

What should I call it?

2. PPMd uses such ranks (especially ppmonstr) and is still considered PPM.
I think symbol ranking is PPM without local stats (i.e. you keep symbol ranks, but not their statistics per context).

3. How does PPMd use recency ranks? I thought PPMd just encodes a flag (or a series of flags if the symbol is not found in higher contexts) per symbol, plus possibly an interval for a specific symbol (if there is more than one unmasked symbol in the current context).

Matt's definitions are:
PPM (Prediction by Partial Match): order-n, modeled in longest context matched, but dropping to lower orders for byte counts of 0.
SR (Symbol Ranking): order-n, modeled by time since last seen.
My idea is: order-n, modeled in longest context matched by time since last seen, but dropping to lower orders if not present at current order.

Additionally I would want to include a history tag per context. Each history tag would consist of up to five 3-bit numbers; each number would be the rank of the symbol matched (with 111b = 7 meaning an unseen symbol, for example). For very rare contexts (up to 6 occurrences) that would provide both symbol-ranking information and full statistics. So it would be at least partially a PPM.

Full idea is then: order-n, modeled in longest context matched by time since last seen and a short history of matches, but dropping to lower orders if not present at the current order.

4. I'd probably classify it as a new algorithm (or maybe a type of CM, depending on the model details).

I considered something similar when I wrote SR2, but I wanted to keep it fast. The idea was to have two queues, like orders 3 and 4, then emit a code indicating which queue and which position in the queue. It gets complicated because a character might be in both queues, and you would want to give it a higher probability in that case. SR2 already keeps a count of the number of consecutive occurrences of the character at the front for modeling purposes.

5. For ppmonstr it's kinda like this:
Code:
```
// c = input byte
flag = 0;
sum  = total;                       // total freq of symbols still unmasked
for( i=0; i<N; i++ ) {
  p = rank[i].freq*SCALE/sum;       // probability that rank i's symbol matches
  flag = (c==rank[i].symbol);
  rc.Encode( flag, SSE(p) );        // binary flag, coded through SSE
  if( flag ) break;
  sum -= rank[i].freq;              // mask out the rejected symbol
}
if( flag==0 ) { /* escape: process the lower order */ }
```
My opinion is that
1. CM is a model which explicitly mixes predictions from different contexts -
implicit mixing like escapes + masking doesn't count (and gives different results anyway).
2. PPM is a model which tries to encode a symbol in one context, then goes to next context if it failed.
3. SR is kinda like PPM, but doesn't keep symbol statistics per context.

6. So ppmonstr models by frequency but uses an MTF queue to speed up the linear search? Or does it also model by rank order (implicitly in SSE, maybe)?

7. 1. Afair ppmonstr uses the same ranking by frequency as ppmd.
(It's "afair" because I don't remember whether it's a completely strict sorting or
some kind of approximation.)
2. Aside from rank==0, I don't see the rank itself used anywhere in probability estimation,
but I don't think it would become "less PPM" if I added the rank to the SSE context.
