Got an idea for CM coder yesterday - again with compressed data structures basically.
There're currently two incompatible methods -
bytewise coders like ppmd, which use trees for statistics, and
bitwise coders like paq with hashtables and bitwise statistics.
And there're many benefits in each method,
but they don't combine - binary tree would take too much memory,
and dynamic symbol ranking like in ppmd doesn't work with probability-based statistics.
But now i've got a workaround - use fixed binary trees like in bitwise coders
but store them in compressed form, without empty nodes.
This allows to use some benefits of bytewise coders with bitwise.
For example, a coder like ppmd can be given 1G for encoding,
but if it uses only 10M (small file or something), then it can be decoded with 10M,
and it doesn't work like that with hashtables -
if paq compressed a 100 byte file with 1G hashtable,
it won't be able to decode it without the same table.
One problem though, is that i don't know how fast this can be :)
That is, ppmd-like speed is probably possible,
but access to that kind of data structure would be complicated
Well, it would be useful somewhere anyway :)
To be specific, I'm talking about something similar to nibble mask
tree in my "tree" compressor.
Tree nodes there are packed like this:
Latter version is actually more compact with more symbols
Suppose we have a context with '0' x 3, '1' x 2, 'A' x 1
(ie context history is 00011A)
Then ppmd's tree symbol table would be like this:
(3*(1+1+4)=18 bytes with byte counters)
And the context node in "tree" would be like this:
0000,0000,0001,1000 // 3x,4x enabled
0000,0000,0000,0011 // x0,x1 enabled = 0x30,0x31
0000,0000,0000,0010 // x1 enabled = 0x41
(ie if we add 5 more symbols 'B'-'F', it would be 18+6*5=48
And its main point is that it allows to locate a symbol
by code without linear scan.
But these masks were inherited from another experiment with
alternative statistical storage structures ("ctx" with layered
context masks), so I didn't notice that it is actually a compressed
data structure - a full frequency table for 256 symbols, where
zero freqs are compressed by disabling their bits in low nibble masks,
and all-zero masks are again compressed by high nibble mask.
Anyway, this same bitmask compression method can be applied to
binary trees as well.
So now I can make a bitwise CM with good counters (linear or fsm
instead of freqs) and a tree with precise statistics
(so more cache-friendly and without extra multiplication per order
for hash functions).