Results 1 to 4 of 4

Thread: State transitions in PPMd

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts

    State transitions in PPMd

    Is there any clear explaination of state transitions in PPMd? I've read the sources (ppmdrevs) and the paper (PPM: one step to practicality), and it's still not clear.

    An example, with N = 2:
    abcdefghabc

    At the second ab state, if I follow a successor chain, how does PPMd recognize that the second ab is the same state as the first ab (N = 2)? I can use a trie or hash table for the same purpose, but I would prefer to understand this method.

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    If you know how on-line suffix tree construction algorithm works then you know what is suffix pointer - for the record: a pointer to a node representing currently represented substring without the first symbol. IIRC PPMd uses trees that also have suffix pointers, so you can move from order(N) context to order(N - 1) context in O(1) time. Other difference between PPMd and suffix trees is that in PPMd the inner paths are not compressed - it allows for simplifications and space savings at lower model orders, but it hurts at high model orders (e.g. > 10).

    PPMd always tries to code symbol at the highest possible order context. When it fails (it doesn't find the symbol in that context) it uses suffix pointer to move to lower order context and retries. It also keeps track of symbols that were present in previous trials, so it removes them from current context to remove possible redundancy.

    Because of the highest possible order context approach it isn't always true that after each 'ab' PPMd will use the same context - if the maximum context order is higher than 2 then it can use different context that also ends with 'ab'.

  3. #3
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Thanks for the explanation, although I'm still not clear on transitions. I understand most other topics in standard PPM (update exclusion, escape history, etc), it's only the state transitions in PPMd that I don't understand.

    I'm assuming in my example than the maximum N = 2 (fixed length). Is there any clear description of the structure of how this is done, or any example with the simple string I wrote above?

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    If maximum N = 2 then PPMd doesn't create nodes for longer contexts so after coding symbol at order 2 it follows the suffix pointer to get to order 1 and looks for the recently processed symbol and possibly goes to order 2 again.

    With your example:
    abcdefghabc and N = 2

    The first symbols ie abcdefgh will be coded at order(-1) I think (ie fixed probabilities).
    The second 'a' should be processed at order(0).
    After processing the second 'a' PPMd will go to the leaf that corresponds to the first 'a'.
    After that PPMd should code 'b' at order(1) with 'a' as an context.
    Similarly for 'c'.

    Not that leaves have a pointer to buffer (IIRC) so that PPMd doesn't need to have all leaf nodes at depth N. It's a form of compression of 'terminal' paths, ie those that don't branch anymore.
    In other words - PPMd only creates nodes when there is some branching at the node or descendant nodes.
    Otherwise it uses that pointer to buffer in leaf node and pretends there are single-symbol contexts.



    If you're inclined then you can look how Ukkonnen's online suffix tree construction algorithms works. It's tricky to understand but it's not complex in itself (ie there's not a lot of different steps). You can then view the PPMd state transitions like trasitions in Ukkonen's algorithm but change two things - decompress internal (ie non-leaf) paths (decompress only on splitting a leaf node) and limit the effective depth of tree.

  5. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    cade (24th February 2014)

Similar Threads

  1. Natural Language Processing, PPMd, PPMZ and frozen models
    By patentsguy in forum Data Compression
    Replies: 7
    Last Post: 6th June 2012, 21:01
  2. Compiling PPMd var J1 on Ubuntu
    By Piotr Tarsa in forum The Off-Topic Lounge
    Replies: 2
    Last Post: 18th December 2011, 20:17
  3. PPMd + TSE
    By Serge Osnach in forum Data Compression
    Replies: 4
    Last Post: 30th April 2011, 10:21
  4. State machines
    By encode in forum Data Compression
    Replies: 3
    Last Post: 3rd November 2010, 21:58
  5. PPMd reformatting
    By Shelwien in forum Data Compression
    Replies: 25
    Last Post: 16th April 2009, 05:06

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •