+ Reply to Thread
Results 1 to 7 of 7

Thread: Would it be SR or PPM?

  1. #1
    Member
    Join Date
    Jun 2009
    Location
    Cracov, Poland
    Posts
    713

    Would it be SR or PPM?

    I have an idea for compression algorithm similar to both SR and PPM. Basically it would have multiple contexts of variable length (like in PPM) but in every context I would maintain MTF-like recency rank (possibly with addition of some short match history) instead of statistics, thus making it similar to SR.

    How to call that?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,895
    PPMd uses such ranks (especially ppmonstr) and is still considered PPM.
    I think symbol ranking is PPM without local stats (ie you keep symbol ranks, but not their statistics per context).

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Cracov, Poland
    Posts
    713
    How does PPMd use recency ranks? I thought PPMd just encodes a flag (or a series of flags if symbol not found in high contexts) per symbol + possibly interval for specific symbol (if there's more than one unmasked symbol in current context).

    Matt's definitions are:
    PPM (Prediction by Partial Match): order-n, modeled in longest context matched, but dropping to lower orders for byte counts of 0.
    SR (Symbol Ranking): order-n, modeled by time since last seen.
    My idea is: order-n, modeled in longest context matched by time since last seen, but dropping to lower orders if not present at current order.

    Additionally I would want to include a history tag per each context. Each history tag would consist of up to 5 3-bit numbers, each number would be the rank of symbol matched (but number 111b = 7 would mean unseen symbol for example). For very rare contexts (up to 6 occurences) that would provide both symbol ranking information and full statistics. So it would be at least partially a PPM.

    Full idea is then: order-n, modeled in longest context matched by time since last seen and short history of matches, but dropping to lower orders if not present at current order.

  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    1,294
    I'd probably classify it as a new algorithm (or maybe a type of CM depending on the model details).

    I considered something similar when I wrote SR2 but I wanted to keep it fast. The idea was to have two queues, like orders 3 and 4, then emit a code indicating which queue and which position in the queue. It gets complicated because a character might be in both queues and you would want to give it a higher probability in that case. SR2 already keeps a count of the number of consecutive occurrences of the character at the front for modeling purposes.

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,895
    for ppmonstr its kinda like this:
    Code:
    // c = input byte
    sum = total;
    for( i=0; i<N; i++ ) {
       p = rank[i].freq*SCALE/sum;
       flag = (c==rank[i].symbol);
       rc.Encode( flag, SSE(p) );
       if( flag ) break;
       sum -= rank[i].freq;
    }
    if( flag==0 ) // process the lower order
    My opinion is that
    1. CM is a model which explicitly mixes predictions from different contexts -
    implicit mixing like escapes + masking doesn't count (and gives different results anyway).
    2. PPM is a model which tries to encode a symbol in one context, then goes to next context if it failed.
    3. SR is kinda like PPM, but doesn't keep symbol statistics per each context.

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    1,294
    So ppmonstr models by frequency but uses a MTF queue to speed up linear search? Or does it also model by rank order too (implicit in SSE maybe)?

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,895
    1. Afair ppmonstr uses the same ranking by frequency as ppmd.
    (Its "afair" because I don't remember whether its a completely strict sorting, or
    some kind of approximation)
    2. Aside from rank==0, I don't see the rank itself used anywhere in probability estimation,
    but I don't think that it'd become "less PPM" if I add the rank to SSE context.

+ Reply to Thread

Similar Threads

  1. PPM o4 vs ST4
    By Gribok in forum Data Compression
    Replies: 17
    Last Post: 30th July 2010, 21:39
  2. PPMX v0.05 - new PPM-based compressor
    By encode in forum Data Compression
    Replies: 49
    Last Post: 28th July 2010, 03:47
  3. PPM with sparse contexts
    By encode in forum Data Compression
    Replies: 5
    Last Post: 9th July 2010, 03:37
  4. Poor compression of bit-version of PPM
    By Stefan in forum Data Compression
    Replies: 20
    Last Post: 16th March 2010, 17:58
  5. PPMX - a new PPM encoder
    By encode in forum Data Compression
    Replies: 14
    Last Post: 30th November 2008, 18:03

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts