> Yes the contexts are not bounded.
Then I'd advise getting it working with bounded contexts
first. Actually, I'm not aware of any PPM implementation
which can properly handle unbounded contexts... their
compression usually gets worse when you increase the
context order past some point (depending on the file).
order  ppmd_vJ  vJ_sh5  ppmy_3c  3c+SSE
4      216021   216065  216527   215707
5      210767   210711  209697   209401
6      209844   209597  207793   207654
> I've tried (long ago) the original Cormack DMC
> implementation and my recollection is that compression was
> not impressive at all, so I didn't even consider it as a
> contender to start from.
Yeah, but it's old... and other implementations might be
tuned for non-textual data...
But there's not much choice if you want a bitwise PPM:
1. Use bitwise statistics and coding, but bytewise escapes.
2. Only add the "proven" contexts to the tree (the DMC way).
3. Keep all statistics but discard "unworthy" higher orders (LOE etc).
Also, this actually applies to bytewise PPM as well, but there
it's much more probable that the context with maximal order would
also give the best predictions (for stationary data, at least).
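For illustration, option 2 looks roughly like this in code - a minimal DMC-style core where a successor state gets cloned only once the edge leading to it is "proven" frequent enough. All names and thresholds here are made up, and real DMC starts from a larger initial state machine, not a single state:

```c
#include <stdint.h>

/* Minimal DMC-style sketch: states hold 0/1 transition counts, and
   cloning splits the target's counts proportionally when an edge
   becomes frequent enough (thresholds are arbitrary here). */
#define DMC_MAX 65536
typedef struct { int next[2]; uint32_t c[2]; } DmcState;
static DmcState dmc[DMC_MAX];
static int dmc_n, dmc_cur;

static void dmc_init(void) {
    dmc_n = 1; dmc_cur = 0;
    dmc[0].next[0] = dmc[0].next[1] = 0;  /* degenerate 1-state start */
    dmc[0].c[0] = dmc[0].c[1] = 1;
}

static uint32_t dmc_p1_12bit(void) {      /* P(next bit = 1), 12-bit scale */
    DmcState *s = &dmc[dmc_cur];
    return (uint32_t)((uint64_t)s->c[1] * 4096 / (s->c[0] + s->c[1]));
}

static void dmc_update(int bit) {
    DmcState *s = &dmc[dmc_cur];
    int nx = s->next[bit];
    uint32_t out = ++s->c[bit];
    uint32_t in  = dmc[nx].c[0] + dmc[nx].c[1];
    if (out > 2 && in > out + 2 && dmc_n < DMC_MAX) {
        int nw = dmc_n++;                 /* clone nx for this edge only */
        dmc[nw] = dmc[nx];
        for (int b = 0; b < 2; b++) {     /* split counts proportionally */
            dmc[nw].c[b] = dmc[nx].c[b] * out / in;
            if (dmc[nw].c[b] == 0) dmc[nw].c[b] = 1;
            dmc[nx].c[b] -= dmc[nw].c[b];
            if (dmc[nx].c[b] == 0) dmc[nx].c[b] = 1;
        }
        s->next[bit] = nx = nw;
    }
    dmc_cur = nx;
}
```

The point is that no context ever enters the model until its predecessor edge has accumulated evidence, which is exactly the "proven contexts only" policy.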
> Bit version PPM has very simple trie organization
Use hashes if you want it simple - it's kinda natural with
bitwise statistics, and any tree is more complicated than
a hash, especially when you try to implement something
like a sliding window with it.
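A sketch of what I mean, assuming a simple direct-lookup hash table of 12-bit bit probabilities (names, constants, and the hash are all made up; collisions are just ignored here):

```c
#include <stdint.h>

#define TBL_BITS 20
#define TBL_SIZE (1u << TBL_BITS)

/* Hashed bitwise model: one 12-bit P(bit=1) per (context, order) slot,
   no tree maintenance at all. */
static uint16_t p1[TBL_SIZE];

static uint32_t hash_ctx(uint64_t ctx, int order) {
    uint64_t h = (ctx + (uint64_t)order) * 0x9E3779B97F4A7C15ull;
    return (uint32_t)(h >> (64 - TBL_BITS));
}

static int predict(uint64_t ctx, int order) {
    uint32_t i = hash_ctx(ctx, order);
    if (p1[i] == 0) p1[i] = 2048;          /* lazy init to P(1)=0.5 */
    return p1[i];
}

static void update(uint64_t ctx, int order, int bit) {
    uint32_t i = hash_ctx(ctx, order);
    if (bit) p1[i] += (4096 - p1[i]) >> 5; /* shift-based adaptation */
    else     p1[i] -= p1[i] >> 5;
}
```

With this layout every order is just another table lookup, and old statistics fade out by adaptation instead of needing explicit tree surgery.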
> and no need for escape probabilities
Escapes would still be useful for deterministic contexts
(where only one bit value has been encountered).
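For example, a hypothetical estimator which reserves an escape-like probability for the unseen bit value in a deterministic context, instead of letting it go to zero (the formula is made up for illustration):

```c
#include <stdint.h>

/* For a bit context that so far saw only one value, reserve an
   escape-like probability for the unseen bit that shrinks with the
   run length, similar in spirit to PPM's deterministic-context
   estimate; otherwise use the plain frequency ratio. */
static uint32_t det_p1_12bit(uint32_t n0, uint32_t n1) {
    if (n1 == 0)                   /* only zeros seen: small P(1) */
        return 4096 / (2 * n0 + 2);
    if (n0 == 0)                   /* only ones seen: large P(1) */
        return 4096 - 4096 / (2 * n1 + 2);
    return (uint32_t)((uint64_t)n1 * 4096 / (n0 + n1));
}
```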
> Did someone try to build a probabilistic context in such a
> manner that the context is represented not by 0/1 but via some
> state in-between (let's say [0...255]) according to the 0/1
> probability of the node state?
Here it sounds more like using the statistics as a context
rather than a fuzzy context, but if that's what you mean, then it's
common enough, known as SSE, APM, or quantized-history mapping.
But that's actually different from my idea of a "fuzzy context",
which imho should be either a quantized context (by coding with
overlapped intervals, for example), or a mix of all contextual estimations
with weights determined by similarity.
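If SSE/APM is what was meant, a minimal sketch looks like this: a secondary table indexed by (small context, quantized primary probability), initialized to the identity mapping and adapted toward the actual bits. Names, sizes, and rates here are all made up:

```c
#include <stdint.h>

/* SSE/APM sketch: refine a primary 12-bit probability by quantizing
   it and looking it up in a per-context table, then adapting the
   used entry toward the observed bit. No interpolation, for brevity. */
#define APM_CTX 256
#define APM_Q   33   /* quantization knots over [0..4095] */

static uint16_t apm[APM_CTX][APM_Q];
static int apm_ready, last_c, last_q;

static void apm_init(void) {
    for (int c = 0; c < APM_CTX; c++)
        for (int q = 0; q < APM_Q; q++)            /* identity mapping */
            apm[c][q] = (uint16_t)(q * 4096 / (APM_Q - 1));
    apm_ready = 1;
}

static int apm_p(int ctx, int p /* 0..4095 */) {
    if (!apm_ready) apm_init();
    last_c = ctx & (APM_CTX - 1);
    last_q = p * (APM_Q - 1) / 4095;               /* nearest-lower knot */
    return apm[last_c][last_q];
}

static void apm_update(int bit) {                  /* after coding the bit */
    uint16_t *e = &apm[last_c][last_q];
    if (bit) *e += (4096 - *e) >> 6;
    else     *e -= *e >> 6;
}
```

So the primary estimate itself becomes (part of) the context, which is the "statistics as a context" reading of the question above.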
> however at the moment of 0/1 prediction only the current
> (longest context in the trie) is used to assess P(0/1).
Well, the worst case for such a model would be a dictionary file
(like english.dic) - actually PPM and BWT have the same problem
with it, but it would be even worse if you just used the max-order
context for a bit.
I mean, words have a lot of common suffixes, which PPM would use
as a context for the next word... but those statistics would always
be completely wrong if it's a dictionary.
And though less frequently, the same happens with plaintext
as well - it's quite common to get "accidental" high-order contexts
which provide random predictions, which is especially bad when
only the maximal-order context is used.
>> (Because with mixing its actually hard to get more than 216k)
> Quite encouraging ;o(
Well, it's a fact: last time, I implemented a tree for bytewise
statistics (plain numbers of context occurrences), then added
a dummy model (a linear mix of contextual probability
distributions, with fixed weights), and right away it was 215.7k or something.
Also, ppmy is not much smarter than that.
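That kind of dummy mixer, reduced to bit probabilities for brevity, is just a weighted average with fixed weights (the weights below are made up for illustration; a bytewise version would do the same per symbol of the distribution):

```c
#include <stdint.h>

/* Fixed-weight linear mix of per-order 12-bit bit probabilities:
   higher orders typically get larger weights.  No adaptation at
   all - this is the "dummy model" baseline. */
static uint32_t mix_fixed(const uint32_t *p, const uint32_t *w, int n) {
    uint64_t num = 0, den = 0;
    for (int i = 0; i < n; i++) {
        num += (uint64_t)w[i] * p[i];   /* weighted sum of estimates */
        den += w[i];
    }
    return (uint32_t)(num / den);       /* weighted average, 12-bit scale */
}
```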