1. ## State tables

Which 8-bit state tables (for bit histories) are currently best? Which ones can I include in my next TarsaLZP version licensed under New BSD License?

In some TarsaLZP versions licensed under Public Domain I've included Matt's state tables which are however GPL licensed, so probably that was a GPL violation. What do you think?

2. 1. You can include the one from rc_v5, its public domain.
There's even a way to tune it to specific data, but the process is rather slow
(would take a few days likely).

2. You can use this, its technically written by me and is also public domain:
Code:
```struct fsm {
byte s[2]; // next state after bits 0,1
word n1;   // bit==1 freq
word n0;   // total freq (0+1)
word pp;
};
fsm cstate[256];
int Init_States( int x=0, int y=0 ) {
static int p = 0;
if( (x==57) || (y==57) ) x<=y ? y-- : x--;
if( r==0xFF ) {
r = p;
int p0 = p++;
cstate[p0].n0 = y+x;
cstate[p0].n1 = y;
cstate[p0].s[0] = Init_States( x+1, (y+3)/4 );
cstate[p0].s[1] = Init_States( (x+3)/4, y+1 );
cstate[p0].pp = (x*4+1)*SCALE/(x*4+1 + y*4+1);
}
return r;
}```
Although I guess it would be a good way to annoy Christian, because its derived from his ccm code.

So the best approach would be probably to understand how it works and make something similar on your own.

It also contains the only public example of parametric fsm counters.
But unfortunately its also not the simplest thing to use.

4. Personally, I'd really appreciate if you could spend some time on development of a proper fsm counter solution.
As currently all known implementations are rather inefficient.

3. I have played with Christian's generator and made my own, which works much better for my TarsaLZP. I've saved 86 kb on enwik8 and 990 kb on enwik9 when I used following generator for generating FSM for LZP models.

Code:
```#include <assert.h>
#include <ctype.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

uint8_t nlogd[1 << 11];
int8_t ilog[1 << 8];

void inline static initLogLut() {
ilog[0] = -1;
for (int64_t i = 1; i < 256; i++)
ilog[i] = 1 + ilog[i / 2];
for (int64_t i = 0; i < 1 << 11; i++)
nlogd[i] =
round(log2((double) (i + (1 << 11)) / (1 << 11)) * (1 << 11)) - i;
}

int8_t inline static iLog2(uint64_t const value) {
return 63 - __builtin_clzl(value);
}

uint64_t inline static nLog2(uint64_t const value) {
int8_t const ilog = iLog2(value);
uint64_t const norm = ilog > 11 ? value >> ilog - 11 : value << 11 - ilog;
//return (ilog - 1 << 11) + nlogd[norm - (1 << 11)] + norm;
return (ilog - 1 << 11) + norm; // broken version works better :)
}

struct fsm1 {
uint64_t s[2]; // next state after bits 0,1
int16_t n0, n1;
} cstate1[12345678];
uint64_t freqmask1[64 * 64 * 3 * 3];
int64_t p = 0;
int const limit = 20;

int min(int a, int b) {
return a < b ? a : b;
}

int max(int a, int b) {
return a < b ? b : a;
}

int clamp(int a) {
return min(a, limit);
}

#if 0

int divisor(int a, int b) {
return a + b * 2 / 5;
}

int repeated(int a, int b) {
return a + 1;
}

int opposite(int a, int b) {
return divisor(a, b) > 0 ? b * (a + 1) / divisor(a, b) : b;
}
#else

int divisor(int a, int b) {
return nLog2(b * 5 / 7);
}

int repeated(int a, int b) {
return divisor(a, b) > (1 << 11) ? ((a + 1 + (b << 7) / divisor(a, b)) << 11) / divisor(a, b) : a + 1;
//return divisor(a, b) > 0 ? (a + b / divisor(a, b)) / divisor(a, b) : a + 1;
}

int opposite(int a, int b) {
return divisor(a, b) > 0 ? (b << 11) / divisor(a, b) : b;
}
#endif

int Init_States(int x, int y, int c1, int c0) {
x = min(x, limit);
y = min(y, limit);
uint64_t * const r = &freqmask1[((y * 64 + x) * 3 + c1) * 3 + c0];
if (*r == UINT64_MAX) {
*r = p;
int p0 = p++;
cstate1[p0].n0 = x;
cstate1[p0].n1 = y;
cstate1[p0].s[0] = Init_States(repeated(x, y), opposite(x, y), c0, 0);
cstate1[p0].s[1] = Init_States(opposite(y, x), repeated(y, x), c0, 1);
}
return *r;
}

int main() {
Init_States(0, 0, 2, 2);
printf("%ld\n", p);
puts("static final int State_table[][] = {");
for (int i = 0; i < 256; i++)
printf("   {%3lu, %3lu, %3d, %3d, %3d, %3d},\n",
cstate1[i].s[0], cstate1[i].s[1],
cstate1[i].n0, cstate1[i].n1,
cstate1[i].n0 + cstate1[i].n1,
i);
puts("};");
return EXIT_SUCCESS;
}```
This generates symmetric FSMs. I'll think on assymetric FSMs, as it's somewhat pointless to remember long runs of LZP mismatches.

Update:
I've discovered that code above doesn't call initLogLut() thus arrays are zeroed (usually). That seems broken, but actually works better on enwiks so I'm leaving it as it is.

4. State tables from ZPAQ (libzpaq) are public domain. Actually the tables are generated at run time using parameters you can tune. There might be room for improvement because I discovered after writing the standard that a few states are unreachable.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•