1. ## Modelling techniques

Any idea on restructuring this or adding something?

1. Basic statistics
1.1. CM (context modelling) with simple incremental counters (frequencies) -
despite common beliefs it can be faster than working with probabilities.
1.2. CM with shift-based probability counters (fpaq0-like)
1.3. combinatorical multiplication-based probability counters (frequency simulation)
1.4. state machine counters
1.5. delayed counters
1.6. delayed counters with context interpolation and indirect update
1.7. (=2.5) +context quantization
2. Secondary estimation
2.1. Using a quantized probability in context
2.2. Nonlinear probability mapping
2.3. +Interpolation
2.4. +Indirect update
3. Prediction merging techniques
3.1. Switching (eg. by amortized code length)
3.2. Static linear mixing
3.4. +Indirect update
3.5. Multi-dimensional version of "secondary estimation"
3.6. Update back-propagation
4. Precision control
4.1. Using interval arithmetics to calculate the prediction error (and make a correction)
4.2. Adaptive correction by using the prediction error in contexts
5. Parameter optimization techniques
5.0. Manual
5.1. Simple bruteforce
5.2. CM-driven bruteforce (using correlations between parameter set and output size)
5.3. Bayesian (likelihood-based) (eg. by polynomial approximation)
6. Symmetry control
6.1. Blockwise redundancy check
6.2. Statistical segmentation
6.3. Adaptive model optimization and parameter values encoding
7. Applied modelling
7.1. Hash function design
7.2. Speed optimization
7.2.1. Alphabet decomposition by Huffman's algorithm
7.2.2. Faster processing of probable cases in general
7.3. Serial improvement (adaptively using a secondary model to determine the design of primary model,
eg. symbol ranking)
7.4. Speculative processing (mostly relevant for decoding and threaded implementations)

2. Btw, I don't have any foundations but feel that "modeling" is something different from "modelling".
Like http://www.start-modeling.com

3. It would be great to provide some code examples for us. For example:
1.2. CM with shift-based probability counters (fpaq0-like)
pr+=((y<<12)-pr)>>5;

...

Just to create a small CM FAQ/glossary and intro!

4. You know, it'd take at least a page of description text for each technique
and a link to ready-to-use coder as example to be of some use.

Though in theory I have an idea about making a site dedicated to CM,
but i want my own blog engine for it and being lazy to write it.

5. For educational purposes would be better if we will provide one or two demo encoders with different techniques - just a different Predictor class or something. FPAQ0*, FCM1, M01 are good examples.
Also it would be super-cool if Chris will release his own small/demo open source CM encoder!
I'm still very interested in his CM/FSM-counters. I'm pretty sure that some day he will do that, or at least his ideas will be available to public! For example LZTURBO was already disassembled...

6. Make a wiki is very good!

7. Originally Posted by encode
For educational purposes would be better if we will provide one or two demo encoders with different techniques - just a different Predictor class or something. FPAQ0*, FCM1, M01 are good examples.
I think that's an excellent idea!

8. What about different alphabet decompositions? You mentioned huffman, but what about unary coding (by symbol rank)?
A long time ago i tried a fixed decomposition in buckets of length 4, 8 and 12 (nibble aligned). Not as good as huffman, but the results were better than ordinary flat decomposition (expect "random" data).

CM/FSM-counters
What do you mean here?

9. Originally Posted by toffer
What do you mean here?
1.4. state machine counters

10. Originally Posted by encode
1.4. state machine counters

11. I intended to write a list of techniques with increasing "advancement".
Of course its not complete, and its true that the only example of
alphabet decomposition there is 7.2.1, state machine counters is a
completely unrelated topic.

Also its nice that somebody finally said something relevant,
but I'd like to see some really different ideas from already mentioned.
There're some for sure.

12. Well, if you mention huffman decomposition you should also mention unary and flat dec. Huffman decomposition was already used within some CTW implementations; but isn't unary decomposition a bit more advanced?

Why should state machine counters be unrelated? They were introduced with PAQ (not that old) and do a nice job at higher orders, since similar contexts are merged - and the PAQ implementation fits them into bytes, albeit they're constructed heuristically (not based on empirical data).

13. > Well, if you mention huffman decomposition you should also mention unary and flat dec.

Its 7.2.1, an applied modelling example. There's also 7.2.2 which could cover unary etc.
But its about optimizations (mostly speed) by applying modelling to compressor itself.

And alphabet decomposition (along with whole coding) just isn't covered in the list.
Maybe you're right and we should make it a separate category somewhere between
(1) and (2).
Though imho coding techniques deserve another list.

> Huffman decomposition was already used within some CTW implementations;
> but isn't unary decomposition a bit more advanced?

Its not as its simpler to implement.
Especially if its blockwise-static huffman tree with which we need to remap
statistics from one decomposition to another.

> Why should state machine counters be unrelated?

I meant that they're unrelated to unary coding and decomposition at whole.
Generally, you can use any decomposition with any counters.

> They were introduced with PAQ (not that old)

In public, maybe. I think its a common technique in communication/media industry.
Eg. I doubt that it was adopted into h264 from paq.

> PAQ implementation fits them into bytes, albeit they're

ppmonstr isn't that worse with byte counters too, but without
any quantization and lookups.

Also I was only impressed with rnd-driven updates, when paq appeared.
While other things seemed common sense.

14. Originally Posted by Shelwien
Btw, I don't have any foundations but feel that "modeling" is something different from "modelling".
Like http://www.start-modeling.com
In Matt's paper Adaptive Weighing of Context Models for Lossless Data Compression we may find the Online Modeling term. So I beleive that Modeling is correct word.

15. They're both correct, as you can see in any dictionary.
But google shows how they're currently used.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•