
Thread: combining prediction with entropy coder

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts

    combining prediction with entropy coder

    Hi all, here is my question:

    There is a sequence of integers (for simplicity, they are all 16-bit integers) and there is a prediction model for the sequence, e.g., a_{n}^{~} = k_1 * a_{n-1} + k_2 * a_{n-2} + k_3. I can of course compute residuals e_{n} = a_{n}^{~} - a_{n} and then compress e_{n} with an entropy coder (a conventional arithmetic coder or ANS), but better compression might be achieved if the prediction sequence is fed to the entropy coder to drive an adaptive probability distribution. While I understand the principle, I don't know how to do it in practice: how do I map prediction values (integers) to probabilities? Are there any concrete examples (write-ups, source code, existing packages) to follow? Thanks in advance for any pointers.
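    Just to make the setup concrete, this is roughly what I have (the coefficients k1, k2, k3 are placeholders):

    #include <stdint.h>
    #include <stddef.h>
    #include <math.h>

    /* Sketch only: predict a[n] from the two previous samples, take the residual
     * e[n] = a~[n] - a[n] as written above. */
    void compute_residuals(const int16_t *a, int32_t *e, size_t n,
                           double k1, double k2, double k3)
    {
        for (size_t i = 2; i < n; i++) {
            double pred = k1 * a[i - 1] + k2 * a[i - 2] + k3;   /* a~_n */
            e[i] = (int32_t)lround(pred) - a[i];                /* e_n  */
        }
    }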

  2. #2
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    If you compute e_{n} you already have enough information to pack it with ANS: you have an upper bound of 2^16, which is your max range, and if your residuals are low (hovering around 0) it should be feasible to create a probability mapping for rANS simply by doing this: prob = (2^16 - e_{n}). This has the effect of assigning a larger probability to lower values, even though there is no probability model involved; it's just a mapping from residual to a pseudo frequency.

    Then just feed the prob and the max range into something like the public-domain rANS implementation by Fabian Giesen. However, the prob can never be zero, so add +1 to it so that the ANS state always grows at least a little and remains fully decodable.
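    Roughly, the mapping could look like this (just a sketch: it assumes the signed residual is first folded to a non-negative value with a zigzag step, which isn't spelled out above; the resulting freq together with the 2^16 range is what you would hand to the rANS encoder):

    #include <stdint.h>

    /* Fold a signed 16-bit residual to a non-negative value:
     * 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ... */
    static uint32_t zigzag16(int32_t e)
    {
        return (uint32_t)(e >= 0 ? 2 * e : -2 * e - 1);
    }

    /* Pseudo frequency as described above: smaller residuals get larger
     * frequencies.  The clamp serves the same purpose as the +1 mentioned
     * above: the frequency must never be zero. */
    static uint32_t residual_to_freq(int32_t e)
    {
        uint32_t f = (1u << 16) - zigzag16(e);   /* prob = 2^16 - e */
        return f ? f : 1;
    }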

  3. The Following User Says Thank You to Lucas For This Useful Post:

    smjohn1 (13th November 2018)

  4. #3
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by smjohn1 View Post
    Hi all, here is my question:

    There is a sequence of integers (for simplicity, they are all 16-bit integers) and there is a prediction model for the sequence, e.g., a_{n}^{~} = k_1 * a_{n-1} + k_2 * a_{n-2} + k_3. I can of course compute residuals e_{n} = a_{n}^{~} - a_{n} and then compress e_{n} with an entropy coder (a conventional arithmetic coder or ANS), but better compression might be achieved if the prediction sequence is fed to the entropy coder to drive an adaptive probability distribution. While I understand the principle, I don't know how to do it in practice: how do I map prediction values (integers) to probabilities? Are there any concrete examples (write-ups, source code, existing packages) to follow? Thanks in advance for any pointers.
    https://github.com/webmproject/libwe...ctor_enc.c#L46

    I use Shannon entropy combined with an exponential negative cost of -0.1 * exp(-A |x|) bits. The negative cost favors small values, which are more likely to lead to good histogram clustering results later, even if plain Shannon entropy would look better at this stage.
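    In pseudo-code terms (a sketch of the formula above, not the actual libwebp code; A is a tuning constant):

    #include <math.h>

    /* Biased residual cost in bits: the Shannon cost minus a small exponential
     * bonus, so near-zero residuals look slightly cheaper than their entropy
     * alone would suggest. */
    static double biased_cost_bits(double shannon_bits, double x, double A)
    {
        return shannon_bits - 0.1 * exp(-A * fabs(x));
    }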

  5. The Following User Says Thank You to Jyrki Alakuijala For This Useful Post:

    smjohn1 (13th November 2018)

  6. #4
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    Thanks. But if I do this, what is the difference compared with just feeding e_{n} directly to rANS or FSE? FSE is of course different from rANS, but wouldn't rANS itself take care of the probability of e_{n}?

    Quote Originally Posted by Lucas View Post
    If you compute e_{n} you already have enough information to pack it with ANS: you have an upper bound of 2^16, which is your max range, and if your residuals are low (hovering around 0) it should be feasible to create a probability mapping for rANS simply by doing this: prob = (2^16 - e_{n}). This has the effect of assigning a larger probability to lower values, even though there is no probability model involved; it's just a mapping from residual to a pseudo frequency.

    Then just feed the prob and the max range into something like the public-domain rANS implementation by Fabian Giesen. However, the prob can never be zero, so add +1 to it so that the ANS state always grows at least a little and remains fully decodable.

  7. #5
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Feeding those pseudo frequencies into rANS is equivalent to optimally bit-packing your error codes. FSE, by contrast, would use a static model to compress your data, so it may do better than bit packing if your data has an unusual distribution.

    Since you're dealing with 16-bit data, you could also just use an order-1 transpose followed by FSE for the highest ratio, without having to develop a complex, wide probability model for 16-bit symbols:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Byte histogram, split into four partial histograms to break the serial
     * dependency on a single counter array; the partials are merged at the end. */
    void compute_freqs(uint32_t * histo, uint8_t * src, size_t size) {
        uint32_t freqs_unroll[4][256] = {{0}};
        size_t i = 0;
        while ((i + 4) < size) {
            freqs_unroll[0][src[i + 0]]++;
            freqs_unroll[1][src[i + 1]]++;
            freqs_unroll[2][src[i + 2]]++;
            freqs_unroll[3][src[i + 3]]++;
            i += 4;
        }

        while (i < size) {
            freqs_unroll[0][src[i]]++;
            i++;
        }

        for (i = 0; i < 256; i++)
            histo[i] = freqs_unroll[0][i] + freqs_unroll[1][i]
                     + freqs_unroll[2][i] + freqs_unroll[3][i];
    }

    /* Order-1 transpose: each byte is written into the bucket of its preceding
     * byte (its order-1 context), so bytes that share a context end up adjacent.
     * Returns the last source byte (the sentinel), which the decoder needs as
     * the context of the first byte. */
    uint8_t order_1_transpose(void * src_v, void * dst_v, size_t size) {
        uint8_t * src = (uint8_t*)src_v;
        uint8_t * dst = (uint8_t*)dst_v;

        size_t i = 0, pos = 0, start = 0;
        uint32_t bucket[256], freqs[256];
        memset(bucket, 0, 256 * sizeof(uint32_t));
        memset(freqs, 0, 256 * sizeof(uint32_t));
        compute_freqs(freqs, src, size);

        /* prefix sums: bucket[i] = first output slot for context byte i */
        for (i = 0; i < 256; i++) {
            bucket[i] = start;
            start += freqs[i];
        }

        uint8_t sentinel = src[size - 1];
        pos = bucket[sentinel]++;        /* context of the first byte wraps around */
        for (i = 0; i < size; i++) {
            dst[pos] = src[i];
            pos = bucket[src[i]]++;      /* the next byte goes into this byte's bucket */
        }

        return sentinel;
    }

    /* Inverse of order_1_transpose.  The transposed data is a permutation of the
     * original, so the histogram (and hence the bucket layout) is identical. */
    void order_1_untranspose(void * src_v, void * dst_v, size_t size, uint8_t sentinel) {
        uint8_t * src = (uint8_t*)src_v;
        uint8_t * dst = (uint8_t*)dst_v;

        size_t i = 0, pos = 0, start = 0;
        uint32_t bucket[256], freqs[256];
        memset(bucket, 0, 256 * sizeof(uint32_t));
        memset(freqs, 0, 256 * sizeof(uint32_t));
        compute_freqs(freqs, src, size);

        for (i = 0; i < 256; i++) {
            bucket[i] = start;
            start += freqs[i];
        }

        pos = bucket[sentinel]++;
        for (i = 0; i < size; i++) {
            dst[i] = src[pos];
            pos = bucket[dst[i]]++;
        }
    }


    This only has 1 byte of overhead, since you must transmit the sentinel, but it breaks those 16-bit inputs apart into a set of 8-bit, context-sorted buckets so you can use a traditional byte-wise entropy coder like FSE or rANS.
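    A quick round-trip sanity check using the functions above (the entropy-coding step in the middle is omitted; relies on the same includes):

    #include <stdio.h>

    int main(void)
    {
        uint8_t src[8] = { 1, 2, 1, 2, 3, 1, 2, 3 };
        uint8_t fwd[8], back[8];

        uint8_t sentinel = order_1_transpose(src, fwd, sizeof(src));
        /* ...entropy-code fwd with FSE/rANS and transmit sentinel (1 byte)... */
        order_1_untranspose(fwd, back, sizeof(back), sentinel);

        printf("round trip %s\n", memcmp(src, back, sizeof(src)) ? "failed" : "ok");
        return 0;
    }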

  8. The Following User Says Thank You to Lucas For This Useful Post:

    smjohn1 (14th November 2018)

  9. #6
    Member
    Join Date
    Dec 2014
    Location
    Berlin
    Posts
    29
    Thanks
    35
    Thanked 26 Times in 12 Posts
    Here is a hint on how to do it with a context model, so that the predictions are learned and end up receiving the maximal possible probability from an arithmetic coder: https://encode.ru/threads/1211-Compr...ll=1#post50274

  10. The Following User Says Thank You to pothos2 For This Useful Post:

    smjohn1 (16th November 2018)

  11. #7
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by smjohn1 View Post
    How do I map prediction values (integers) to probabilities?
    This is typically done by "contexts". That is, instead of having a single statistical model from which you derive the probabilities (e.g., by counting symbols and then using the relative frequency as an estimate of the probability), you have a statistical model that is indexed by the context. Or, to put it simply: instead of a one-dimensional frequency table freq[symbol], you use a two-dimensional table freq[symbol][context] that is indexed by both the symbol and its context. A context can be as simple as the preceding symbol, but typically you want to avoid such simple contexts, as you will likely not collect enough samples (in a 2-d table) to arrive at a robust probability estimate. Hence, one would typically not use the preceding symbol directly, but some function derived from it. For example, for text compression, the context could say whether the preceding symbol (= letter) was a consonant or a vowel.
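    As an illustration only (a sketch, not anybody's production code), such a table could look like this; the coder queries ctx_prob() before coding each symbol and calls ctx_update() afterwards, and the decoder mirrors both calls:

    #include <stdint.h>

    /* 2-d adaptive count table, here indexed as freq[context][symbol]
     * (the index order relative to the text above is immaterial).  The
     * context is a coarse function of the preceding byte -- its top three
     * bits -- so each row collects enough samples for a usable estimate. */
    enum { NCTX = 8, NSYM = 256 };

    typedef struct {
        uint32_t freq[NCTX][NSYM];   /* counts per (context, symbol) */
        uint32_t total[NCTX];        /* running total per context    */
    } CtxModel;

    static void ctx_init(CtxModel *m)
    {
        for (int c = 0; c < NCTX; c++) {
            for (int s = 0; s < NSYM; s++)
                m->freq[c][s] = 1;           /* never allow zero probability */
            m->total[c] = NSYM;
        }
    }

    static int ctx_of(uint8_t prev_symbol) { return prev_symbol >> 5; }

    /* Probability estimate p(sym | ctx) handed to the arithmetic coder. */
    static double ctx_prob(const CtxModel *m, int ctx, uint8_t sym)
    {
        return (double)m->freq[ctx][sym] / (double)m->total[ctx];
    }

    static void ctx_update(CtxModel *m, int ctx, uint8_t sym)
    {
        m->freq[ctx][sym]++;
        m->total[ctx]++;
    }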

  12. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts

  13. #9
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by thorfdbg View Post
    This is typically done by "contexts"
    Both are used. Context modeling is more noise resistant, but often simple prediction is more efficient.

