Results 1 to 4 of 4

Thread: State tables

  1. #1
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts

    State tables

    Which 8-bit state tables (for bit histories) are currently best? Which ones can I include in my next TarsaLZP version licensed under New BSD License?

    In some TarsaLZP versions licensed under Public Domain I've included Matt's state tables which are however GPL licensed, so probably that was a GPL violation. What do you think?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. You can include the one from rc_v5, its public domain.
    There's even a way to tune it to specific data, but the process is rather slow
    (would take a few days likely).

    2. You can use this, its technically written by me and is also public domain:
    Code:
    struct fsm {
      byte s[2]; // next state after bits 0,1
      word n1;   // bit==1 freq
      word n0;   // total freq (0+1)
      word pp;
    };
    fsm cstate[256];
    byte freqmask[64*64];
    int Init_States( int x=0, int y=0 ) {
      static int p = 0;
      if( p==0 ) memset( freqmask, 0xFF, sizeof(freqmask) );
      if( (x==57) || (y==57) ) x<=y ? y-- : x--;
      byte& r = freqmask[y*64+x];
      if( r==0xFF ) {
        r = p;
        int p0 = p++;
        cstate[p0].n0 = y+x;
        cstate[p0].n1 = y;
        cstate[p0].s[0] = Init_States( x+1, (y+3)/4 );
        cstate[p0].s[1] = Init_States( (x+3)/4, y+1 );
        cstate[p0].pp = (x*4+1)*SCALE/(x*4+1 + y*4+1);
      }
      return r;
    }
    Although I guess it would be a good way to annoy Christian, because its derived from his ccm code.

    So the best approach would be probably to understand how it works and make something similar on your own.

    3. There's also https://sites.google.com/site/toffer86/m1-project
    I'm not sure about its license, but toffer probably won't be too greedy about it.
    It also contains the only public example of parametric fsm counters.
    But unfortunately its also not the simplest thing to use.

    4. Personally, I'd really appreciate if you could spend some time on development of a proper fsm counter solution.
    As currently all known implementations are rather inefficient.

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I have played with Christian's generator and made my own, which works much better for my TarsaLZP. I've saved 86 kb on enwik8 and 990 kb on enwik9 when I used following generator for generating FSM for LZP models.

    Code:
    #include <assert.h>
    #include <ctype.h>
    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    uint8_t nlogd[1 << 11];
    int8_t ilog[1 << 8];
    
    void inline static initLogLut() {
        ilog[0] = -1;
        for (int64_t i = 1; i < 256; i++)
            ilog[i] = 1 + ilog[i / 2];
        for (int64_t i = 0; i < 1 << 11; i++)
            nlogd[i] =
                round(log2((double) (i + (1 << 11)) / (1 << 11)) * (1 << 11)) - i;
    }
    
    int8_t inline static iLog2(uint64_t const value) {
        return 63 - __builtin_clzl(value);
    }
    
    uint64_t inline static nLog2(uint64_t const value) {
        int8_t const ilog = iLog2(value);
        uint64_t const norm = ilog > 11 ? value >> ilog - 11 : value << 11 - ilog;
        //return (ilog - 1 << 11) + nlogd[norm - (1 << 11)] + norm;
        return (ilog - 1 << 11) + norm; // broken version works better :)
    }
    
    struct fsm1 {
        uint64_t s[2]; // next state after bits 0,1
        int16_t n0, n1;
    } cstate1[12345678];
    uint64_t freqmask1[64 * 64 * 3 * 3];
    int64_t p = 0;
    int const limit = 20;
    
    int min(int a, int b) {
        return a < b ? a : b;
    }
    
    int max(int a, int b) {
        return a < b ? b : a;
    }
    
    int clamp(int a) {
        return min(a, limit);
    }
    
    #if 0
    
    int divisor(int a, int b) {
        return a + b * 2 / 5;
    }
    
    int repeated(int a, int b) {
        return a + 1;
    }
    
    int opposite(int a, int b) {
        return divisor(a, b) > 0 ? b * (a + 1) / divisor(a, b) : b;
    }
    #else
    
    int divisor(int a, int b) {
        return nLog2(b * 5 / 7);
    }
    
    int repeated(int a, int b) {
        return divisor(a, b) > (1 << 11) ? ((a + 1 + (b << 7) / divisor(a, b)) << 11) / divisor(a, b) : a + 1;
        //return divisor(a, b) > 0 ? (a + b / divisor(a, b)) / divisor(a, b) : a + 1;
    }
    
    int opposite(int a, int b) {
        return divisor(a, b) > 0 ? (b << 11) / divisor(a, b) : b;
    }
    #endif
    
    int Init_States(int x, int y, int c1, int c0) {
        x = min(x, limit);
        y = min(y, limit);
        uint64_t * const r = &freqmask1[((y * 64 + x) * 3 + c1) * 3 + c0];
        if (*r == UINT64_MAX) {
            *r = p;
            int p0 = p++;
            cstate1[p0].n0 = x;
            cstate1[p0].n1 = y;
            cstate1[p0].s[0] = Init_States(repeated(x, y), opposite(x, y), c0, 0);
            cstate1[p0].s[1] = Init_States(opposite(y, x), repeated(y, x), c0, 1);
        }
        return *r;
    }
    
    int main() {
        memset(freqmask1, 0xFF, sizeof (freqmask1));
        Init_States(0, 0, 2, 2);
        printf("%ld\n", p);
        puts("static final int State_table[][] = {");
        for (int i = 0; i < 256; i++)
            printf("   {%3lu, %3lu, %3d, %3d, %3d, %3d},\n",
                cstate1[i].s[0], cstate1[i].s[1],
                cstate1[i].n0, cstate1[i].n1,
                cstate1[i].n0 + cstate1[i].n1,
                i);
        puts("};");
        return EXIT_SUCCESS;
    }
    This generates symmetric FSMs. I'll think on assymetric FSMs, as it's somewhat pointless to remember long runs of LZP mismatches.

    Update:
    I've discovered that code above doesn't call initLogLut() thus arrays are zeroed (usually). That seems broken, but actually works better on enwiks so I'm leaving it as it is.
    Last edited by Piotr Tarsa; 30th January 2012 at 00:19.

  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    State tables from ZPAQ (libzpaq) are public domain. Actually the tables are generated at run time using parameters you can tune. There might be room for improvement because I discovered after writing the standard that a few states are unreachable.

Similar Threads

  1. Sac: (State-of-the-Art) Lossless Audio Compression
    By Sebastian in forum Data Compression
    Replies: 38
    Last Post: 25th April 2016, 16:01
  2. Compression state-of-art discussion with C.Bloom
    By Shelwien in forum Data Compression
    Replies: 7
    Last Post: 27th October 2011, 06:18
  3. State machines
    By encode in forum Data Compression
    Replies: 3
    Last Post: 3rd November 2010, 21:58
  4. History tables for LZ compressors
    By willvarfar in forum Data Compression
    Replies: 1
    Last Post: 26th August 2010, 17:19
  5. Delta: binary tables preprocessor
    By Bulat Ziganshin in forum Forum Archive
    Replies: 14
    Last Post: 1st April 2008, 09:43

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •