# Thread: Simple bytewise context mixing demo

1. This is the C++ source of the simplest possible order-16 bytewise mixing coder.
It's called "green" because it doesn't use any trees.
There are no complex data structures at all, so it should be easy to understand.
And it still compresses book1 to 215815 bytes, which should provide
a reasonable baseline for statistical text compression, since no dynamic mixing
or adaptive counters are used there.

http://ctxmodel.net/files/green/green_v0.rar

Still, the algorithm has O(N^2)-ish complexity due to source rescans
for high-order statistics collection, so for tests with larger files
(like >3M) I'd suggest using a "less ecological" coder, which
has the same mixing function but uses a tree to store the
statistics:

http://ctxmodel.net/files/tree_v1.rar
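The rescan idea itself is simple enough to sketch. The following is a hypothetical minimal version, not green's actual code (names like `collect_counts` are mine): to code the next byte, scan the already-processed prefix for earlier occurrences of the current order-N context and count which bytes followed them.

```cpp
#include <cstring>

// Hypothetical sketch of tree-less statistics collection:
// to get the order-N counts, rescan the already-coded prefix
// buf[0..pos) for earlier occurrences of the current context
// (the last `order` bytes) and count the bytes that followed.
// This is O(pos) per coded byte, hence the O(N^2)-ish total.
static void collect_counts(const unsigned char* buf, int pos, int order,
                           unsigned freq[256], unsigned& total) {
  std::memset(freq, 0, 256 * sizeof(unsigned));
  total = 0;
  if (pos < order) return;
  const unsigned char* ctx = buf + pos - order;  // current context
  for (int i = order; i < pos; i++)
    if (std::memcmp(buf + i - order, ctx, order) == 0) {
      freq[buf[i]]++;  // byte that followed this occurrence
      total++;
    }
}
```

A DEPTH parameter capping this scan (the rescan limit mentioned in the chat below) would trade compression for speed.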

2. Code:
```<toffer> very simple, indeed
<toffer> but that weighting formula is just ad-hoc i guess?
<Shelwien> not quite
<Shelwien> that weight divides the freq by total
<Shelwien> to convert to probabilities
<Shelwien> while the actual weight for probability distributions is (order+1)
<Shelwien> as to 8192 - 1/8192 is the weight of order-(-1)
<toffer> i see
<toffer> or probability precision
<toffer> well
<toffer> isn't that too inaccurate?
<toffer> anyway it'd be ok just for a demonstration
<Shelwien> sure, if you can suggest a better simple weighting formula - you're welcome :)
<toffer> try paq1's w ~ order^2
<Shelwien> i tried order^2 and it worked better for low orders (like o2-o3)
<Shelwien> but it causes some precision problems with higher orders
<toffer> i meant actually inaccurate due to the two involved divisions
<Shelwien> that's intended actually
<toffer> deterministic contexts should receive higher weights
<Shelwien> i know :)
<Shelwien> I have a separate mixing function in ppmy and ash
<Shelwien> with whole tables of constants
<Shelwien> but is there really a reason to promote that kind of approach now?
<Shelwien> instead, i'd better implement a bytewise logistic mixer or something
<toffer> you mean non-tree based?
<Shelwien> i mean ad hoc weighting functions
<toffer> i don't see any reason in it - only attracting people
<Shelwien> anyway, if you can improve it with some simple modification - just do it :)
<toffer> what i asked for previously is the "depth" parameter
<Shelwien> its a rescan limit
<toffer> but why green?
<Shelwien> because i saved a tree with it :)```

3. Funny, currently I'm doing the same. After finetuning and some more finetuning and using 80mb of memory for calculation of escape-probabilities, I decided to try to find a simple unpolished version to mix the orders together. Book1 is doing 219000 in my case.

4. http://www.ctxmodel.net/files/green/green_v1.rar
- book1 = 214150 (with DEPTH=8192)
- Mixup() unrolled
- fixed the weighting function
- some simplifications

Made a mistake in v0 - with DEPTH=512 the actual book1
result was 217815, not 215815 as I thought.

> I decided to try to find a simple unpolished version to mix the orders together.

Well, it's different in my case... I just got tired of people getting scared off
by ppmd/ppmy/ash/tree/etc data structures... and I couldn't suggest
anything simpler with good compression. So now we have a simple example.

5. http://www.ctxmodel.net/files/green/green_v2.rar
v2
book1 = 211691
- added hardcoded coefs for the weighting function

-----

The weighting function is still static

6. Shelwien,

Nice compact sourcecode, with impressive results compared to the size and complexity of the algorithm.

One question: How's the array coef generated?

One comment: your example would gain power with some more comments to explain the brief notation. For example explain the theory used in the simple mixer.

Edit: 493,187 on world95.txt for an order-16 compressor. Compared to the 211k on book1, it feels like the weighting is overoptimized for book1, because similarly efficient algorithms of mine that do 211k on book1 would do around 460k on world95.txt. Maybe I'm wrong. Still, the simplicity of the weighting alone is not enough to rule out overoptimization.

7. Nice compact sourcecode, with impressive results compared to the size and complexity of the algorithm.
I concur

8. For modelling you generally divide a model into two components: structure and parameters. The model structure is usually fixed, but different parameter values yield different performance. In almost every engineering science it is common sense to tune model parameters to a given training set - but not in compression. I wonder why there are only a few people around who have realized the importance of that... Wrong model parameters introduce redundancy, and thus worse compression.

The static weights are parameters, generated via parameter optimization - minimize the entropy w.r.t. the model parameters. Shelwien has a simple perl script for that. The algorithm can be described like this:

- the model has N parameters x1, ..., xN; each xi is a bitstring
- cycle through the parameters i=1..N indefinitely, searching along a single degree of freedom (a single xi) at a time
- modify the current xi by flipping bits randomly and keep the change if the cost function value (compression) improves
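A minimal C++ sketch of that loop (hypothetical; the actual tool is a perl script driving the compressor, so `cost` here stands for a full compress-and-measure run):

```cpp
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical sketch of the described optimizer: cycle through the
// parameters, flip a random bit of one xi, and keep the change only
// if the cost (e.g. compressed size) improves; otherwise revert.
static std::vector<uint32_t> optimize(
    std::vector<uint32_t> x,
    const std::function<double(const std::vector<uint32_t>&)>& cost,
    int rounds) {
  double best = cost(x);
  for (int r = 0; r < rounds; r++)
    for (size_t i = 0; i < x.size(); i++) {  // one degree of freedom at a time
      uint32_t saved = x[i];
      x[i] ^= 1u << (std::rand() % 32);      // flip a random bit of xi
      double c = cost(x);
      if (c < best) best = c;                // improvement: keep it
      else x[i] = saved;                     // otherwise revert
    }
  return x;
}
```

In practice the cost evaluation dominates, since each candidate requires a full compression run.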

To do some advertising: I have a small set of rather complex genetic optimization algorithms implemented in M1. Btw my current implementation with just four active models (compared to 16) compresses book1 to 210391. I guess this is a pretty good example of the importance of parameter tuning.

Afair world95.txt is rather large. book1 is a poor reference for big-file compression, since it's just ~770k. Longer inputs amortize the statistics of higher-order contexts, so an optimization will assign higher weights to higher orders.

9. I tried compressing enwik8. After 10 hours the output file filled up my disk. Here is where it got stuck when I killed it:
Code:
```C:\tmp\green_v2>timer green c \res\enwik8 enwik8.ari

Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
Encoding.
^Cp=30856583/100000000 out=7101125

C:\tmp\green_v2>dir
Volume in drive C is OS
Volume Serial Number is 66E6-426E

Directory of C:\tmp\green_v2

01/22/2010  11:55 AM    <DIR>          .
01/22/2010  11:55 AM    <DIR>          ..
05/22/2000  05:47 PM           768,771 BOOK1
01/18/2010  10:45 PM               108 c.bat
01/22/2010  11:55 AM    22,594,666,496 enwik8.ari
01/20/2010  05:09 PM             4,268 green.cpp
01/20/2010  05:14 PM            21,504 green.exe
01/19/2010  03:02 PM               715 icl.cfg
01/18/2010  10:47 PM             1,581 sh_v1m.inc
01/18/2010  10:46 PM                87 test.bat
01/18/2010  10:47 PM               684 timer.inc
10 File(s) 22,595,464,585 bytes
2 Dir(s)               0 bytes free```

10. > Nice compact sourcecode, with impressive results compared
> to the size and complexity of the algorithm.

Why, thanks
The idea was to get somebody else to write a better new CM, though

> One question: How's the array coef generated?

Bruteforce search for a day, more or less like toffer explained.
Some time ago I did that manually though; it's not really a
problem with just that many coefs.

> your example would gain power with some more
> comments to explain the brief notation.

And I really hate the "i++; // increment i" style,
so just ask if you really don't understand something.

> For example explain the theory used in the simple mixer.

You see, there's no theory.

Code:
```w = A/(B+total/(C+1))
= A*(C+1)/(B*(C+1)+total)
~= W/total

P[c] = Sum[ W[j] * o[j].freq[c]/o[j].total, j=0..maxorder ] / Sum[ W[j] ]```
It's just a weighted sum of the contexts' probability distributions
with static weights.
It's only written that way to avoid a 32-bit overflow.
The only important point there is that the weights for
deterministic contexts are much bigger.

> Compared to the 211k on book1, it feels like the weighting
> is overoptimised on book1, because simular efficient
> algorithms from my side doing 211k on book1 would
> do around 460k on world95.txt.

Note that it's a _static_ mixing.
Also, world95 is hardly a good example of a stationary text.

> Maybe I'm wrong. The simplicity of the weighting would not
> be enough to say that speaking of optimisation of
> whatsoever is impossible.

There's no weighting, simple or not - only static weights.
That was the point of this demo - not the development of
a new CM coder for my collection.

So testing it on world95, let alone enwik8, wasn't intended -
there wasn't enough precision for that.
But ok, here I've fixed the overflows and retuned it for world95
(btw its coefs show how weird that file is).
Also added a modification with a nibble tree for faster testing
(see http://ctxmodel.net/rem.pl?-31 http://ctxmodel.net/rem.pl?-32)

http://ctxmodel.net/files/green/green_v3.rar
Code:
```           green       tree
book1     215040     215037
world95   473691     471902
enwik8         ?  21605873*
* tested with order10, 1.2G memory```
- world95 parameter profile
- 64-bit mixing
- tree-based version (aka tree_v2)

11. green_v3 enwik8 -> 21,819,822 in 111475 sec. (about 32 hours). Didn't test decompression.

12. Well, I'd prefer to see some comments on the actual algorithm -
like whether it makes sense as a CM tutorial, or whether
it's possible to simplify it further somehow.

Also, an update with extra 2 days of world95 tuning:
http://ctxmodel.net/files/green/green_v3a.rar
Code:
```           green     tree
book1     214846   214843
world95   473047   471244```
- updated world95 parameter profile
