
Thread: Finite State Entropy

  #1 Cyan (Member, France)

    Just completed an entropy coder, based on Jarek Duda's ANS theory.
    It's released as open source code with a BSD license, at https://github.com/Cyan4973/FiniteStateEntropy .

    It's designed to be a drop-in replacement for block-based entropy coders, typically Huffman and range coders.
    Its speed is pretty decent, and it can give a serious challenge to the fastest Huffman implementations.
    But FSE delivers the compression performance of an arithmetic coder,
    and therefore handles probabilities precisely, critically those beyond 50%.

    Arithmetic compression performance at Huffman speed: that seems like a good deal.

    Blog entry:
    http://fastcompression.blogspot.fr/2...-breed-of.html

    FSE only uses additions, masks, and shifts.
    I initially designed it to work on retro platforms, but it's now good for modern CPUs too.


  #2 Matt Mahoney (Expert, Melbourne, Florida, USA)
    How would I use this as a drop-in replacement for an arithmetic coder? From looking at the source code, it looks to me like it implements a fixed order 0 model.

    Here are some test results on proba70.bin
    Code:
    1,048,576 proba70.bin
      171,702 proba70.fpaqa
      171,044 proba70.fpaqc
      171,146 proba70.fpaqc-arith
      165,760 proba70.fpaq0 (ratio 6.326)
    fpaqa uses an adaptive order 0 model and table driven ANS coder which I implemented in 2007. The compressor stores the predictions on a stack so they are coded in blocks in reverse order.

    fpaqc uses the same model except the ANS coder is implemented without tables except for a table of reciprocals to optimize division (2007).

    fpaqc-arith is the same model with an arithmetic coder. To compile: g++ fpaqc.cpp -DARITH

    fpaq0 uses a stationary order 0 model with an arithmetic coder. I tested with a few other compressors and this seems to be the best model for this file. For example, it compresses better than zpaq -method 6 but about the same as zpaq -method s4.0c256 which is also a near-stationary order 0 model.

    Also compare lpaq1 and lpaq1a at http://mattmahoney.net/dc/text.html#1440
    These use identical models except that lpaq1 uses arithmetic coding and lpaq1a uses the ANS coder from fpaqc.

    Here are timing results for all the fpaq variants. My implementation of ANS (fpaqa, fpaqb, fpaqc) is somewhat slower than arithmetic coding. fpaq0p is probably the most similar model (with a different adaptation rate). The really fast versions use tightly integrated arithmetic coders that would require some effort to convert to ANS. http://mattmahoney.net/dc/text.html#5586
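    As an aside, here is one way the "table of reciprocals" trick can work: division by a small divisor becomes a multiply, a shift, and a one-step fixup. This is only an illustration of the idea, not fpaqc's actual code, and the table size and names are made up.
    Code:
    #include <stdint.h>

    #define MAX_DIV 4096                       /* covers 12-bit totals (an assumption) */
    static uint32_t recip[MAX_DIV];

    static void initRecip(void)
    {
        for (uint32_t d = 1; d < MAX_DIV; d++)
            recip[d] = (uint32_t)((1ULL << 32) / d);   /* truncated 32.32 reciprocal */
    }

    /* q = x / d without a hardware divide */
    static uint32_t divByRecip(uint32_t x, uint32_t d)  /* requires 1 <= d < MAX_DIV */
    {
        uint32_t q = (uint32_t)(((uint64_t)x * recip[d]) >> 32);
        if ((uint64_t)(q + 1) * d <= x) q++;            /* truncated reciprocal may undershoot by 1 */
        return q;
    }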

  #3 Cyan (Member, France)
    Quote Originally Posted by Matt Mahoney View Post
    From looking at the source code, it looks to me like it implements a fixed order 0 model.
    Yes, that's what I meant by "block-based entropy coder": probabilities are fixed for the whole block of input data.
    Adaptation is done by cutting the input data into smaller blocks, but that's not per-symbol probability adaptation.
    This methodology is fine for zlib, zhuff, and so on (I guess FreeArc too, maybe?), but it's not suitable for fpaq/zpaq/paq, which favor adaptive binary models.

    Quote Originally Posted by Matt Mahoney View Post
    fpaqa uses an adaptive order 0 model and table driven ANS coder which I implemented in 2007.
    Of course! Sorry, the binary versions were completely out of my mind when I wrote the blog article.
    They obey a completely different set of properties, and are meant for a different use case (i.e. fpaq/zpaq/paq and equivalents).
    I will nonetheless mention them in the article.

  #4 Jarek (Member, Kraków, Poland)
    Thanks, Yann - indeed, while ABS for the bit-wise adaptive case is practically equivalent to arithmetic coding, ANS for fixed probabilities (the "block-based" case) allows encoding a symbol from a large alphabet in a single table lookup - it should be faster than Huffman, with precision like arithmetic coding.

    The number of states should be a few times larger than the alphabet size to work in the deltaH ~ 0.001 bits/symbol regime.
    You could make it faster by also putting the bitwise operations into the table - the whole "(symbol, state) -> (bit sequence, new state)" rule - but at the cost of a lower states-to-alphabet ratio to remain in L1 cache.
    You could also increase the base b, e.g. to 4, to halve the number of bit transfers at the cost of precision ... generally there are lots of options to optimize among.

    Another advantage of ANS is that we can slightly perturb the initialization procedure using a pseudo-random number generator initialized with a cryptographic key (and, for example, the block number): e.g. choosing between the lowest-weight symbol and the second best one.
    This way we simultaneously get decent encryption practically for free.

    The advantage of Huffman is that we can adaptively modify the tree on the run, but we could also do it in ANS - swap some appearances between symbols and renumber the following ones ... but it would be costly for a large table.

    About materials, the recent paper should be more readable, and here is a poster gathering the basic information.
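    To make the "(symbol, state) -> (bit sequence, new state)" idea concrete, below is a self-contained toy tANS codec: table construction with the simplest sequential symbol spread, backward encoding onto a bit stack, and forward decoding with one table lookup per symbol. It only illustrates the mechanism - it is not FSE's actual code (FSE spreads symbols differently and streams bits instead of stacking them), and the 16-state table, the 3-symbol alphabet and all names are made up for the example.
    Code:
    /* toy tANS: 16 states, 3-symbol alphabet, frequencies normalized to the table size */
    #include <stdio.h>
    #include <string.h>

    #define TABLELOG  4
    #define TABLESIZE (1 << TABLELOG)
    #define ALPHABET  3

    typedef struct { unsigned char symbol; unsigned char nbBits; unsigned short newStateBase; } DCell;

    static int floorLog2(unsigned v) { int r = 0; while (v >>= 1) r++; return r; }

    int main(void)
    {
        const unsigned freq[ALPHABET] = { 10, 4, 2 };   /* must sum to TABLESIZE */
        unsigned char  spread[TABLESIZE];
        DCell          dtable[TABLESIZE];               /* state -> (symbol, nbBits, next-state base) */
        unsigned short etable[ALPHABET][TABLESIZE];     /* (symbol, sub-range value) -> state */
        unsigned       next[ALPHABET];

        /* 1. spread symbols over the state table (simplest sequential layout) */
        unsigned pos = 0;
        for (unsigned s = 0; s < ALPHABET; s++)
            for (unsigned i = 0; i < freq[s]; i++) spread[pos++] = (unsigned char)s;

        /* 2. build decode and encode tables together */
        for (unsigned s = 0; s < ALPHABET; s++) next[s] = freq[s];
        for (unsigned i = 0; i < TABLESIZE; i++) {
            unsigned s  = spread[i];
            unsigned x  = next[s]++;                     /* x in [freq[s], 2*freq[s]) */
            unsigned nb = TABLELOG - floorLog2(x);       /* bits needed to return to [16,32) */
            dtable[i].symbol       = (unsigned char)s;
            dtable[i].nbBits       = (unsigned char)nb;
            dtable[i].newStateBase = (unsigned short)((x << nb) - TABLESIZE);
            etable[s][x - freq[s]] = (unsigned short)i;
        }

        /* 3. encode in *reverse* symbol order, pushing bits onto a stack */
        const unsigned char msg[] = { 0, 1, 0, 0, 2, 0, 1, 0, 0, 0, 2, 1 };
        const unsigned msgLen = (unsigned)sizeof(msg);
        unsigned char bitStack[256]; unsigned bitCount = 0;
        unsigned state = 0;                              /* arbitrary initial state index */
        for (int j = (int)msgLen - 1; j >= 0; j--) {
            unsigned s = msg[j];
            unsigned X = TABLESIZE + state;
            while (X >= 2 * freq[s]) { bitStack[bitCount++] = (unsigned char)(X & 1); X >>= 1; }
            state = etable[s][X - freq[s]];
        }

        /* 4. decode forward: one table lookup per symbol, popping bits from the stack */
        unsigned char out[sizeof(msg)];
        for (unsigned j = 0; j < msgLen; j++) {
            out[j] = dtable[state].symbol;
            unsigned nb = dtable[state].nbBits, b = 0;
            while (nb--) b = (b << 1) | bitStack[--bitCount];
            state = dtable[state].newStateBase + b;
        }

        printf("%s\n", memcmp(out, msg, msgLen) ? "roundtrip FAILED" : "roundtrip OK");
        return 0;
    }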

  #5 Cyan (Member, France)
    Quote Originally Posted by Jarek View Post
    The advantage of Huffman is that we can adaptively modify the tree on the run, but we could also do it in ANS - swap some appearances between symbols and renumber the following ones ... but it would be costly for a large table.
    Indeed, even for Huffman this is a costly operation.
    I guess the most common algorithm used is Vitter's, and it is way slower than any standard static implementation. In the end, it's not even better than simply cutting the data into smaller blocks, with an optimal tree described in the header of each of them.

    My guess: block-based algorithms are not meant to be per-symbol adaptive; it just puts them too far away from their comfort zone.
    Per-symbol statistical adaptation is best achieved using Matt's BAC (Binary Arithmetic Coding) and ABS (Asymmetric Binary System).


    Btw, another potential advantage of Huffman is that the header can be smaller:
    since it's enough to transmit the number of bits per symbol, this value can typically be encoded using 4 bits or even less. But for arithmetic coding, and for FSE, it's necessary to provide the exact frequency, which costs more bits.
    It may not sound like much, but these added bits translate into a heavier header, and put a lower limit on the size of the input blocks for the algorithm to remain efficient. As a consequence, anything that can reduce this header size is welcome.

  #6 Piotr Tarsa (Member, Kraków, Poland)
    Quote Originally Posted by Cyan View Post
    Btw, another potential advantage of Huffman is that the header can be smaller:
    since it's enough to transmit the number of bits per symbol, this value can typically be encoded using 4 bits or even less. But for arithmetic coding, and for FSE, it's necessary to provide the exact frequency, which costs more bits.
    It may not sound like much, but these added bits translate into a heavier header, and put a lower limit on the size of the input blocks for the algorithm to remain efficient. As a consequence, anything that can reduce this header size is welcome.
    You can store logarithms instead of exact frequencies. What matters is the code lengths, and logarithms are exactly that. You can fit the logarithm of a single frequency in one byte - I think that would be enough.
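    For illustration, "a logarithm in one byte" could look like the following, assuming a 4.4 fixed-point format (4 integer bits, 4 fractional bits, i.e. 1/16-bit resolution); the format choice and the function name are made up:
    Code:
    #include <math.h>
    #include <stdint.h>

    /* quantize the ideal code length -log2(freq/total) into one byte, 4.4 fixed point */
    static uint8_t lengthToByte(uint32_t freq, uint32_t total)   /* requires 1 <= freq <= total */
    {
        double bits = -log2((double)freq / (double)total);       /* ideal length, in bits */
        int q = (int)(bits * 16.0 + 0.5);                        /* round to 1/16-bit steps */
        if (q < 1)   q = 1;                                      /* never promise 0 bits */
        if (q > 255) q = 255;                                    /* caps the length at 15.9375 bits */
        return (uint8_t)q;
    }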

  #7 Bulat Ziganshin (Programmer, Uzbekistan)
    What's the difference between the logarithm of a frequency and a bit length?

  #8 Piotr Tarsa (Member, Kraków, Poland)
    Basically, there's no difference, other than that Huffman implies integral bit lengths, while we don't need to comply with that in the case of arithmetic coding.

    In fact, we can do some sort of hybrid mode, i.e. store fixed-point logarithms for the few most frequent values and then, for the rest, store integral logarithms using the usual tricks for compacting bit lengths.

  #9 Bulat Ziganshin (Programmer, Uzbekistan)
    Piotr: since our goal is optimal use of the codespace, it seems that the optimal solution is to store all frequencies as fixed-precision logarithms, e.g. with 8 binary digits: 0.0101 0001, 0.0001 0111, 0.0000 0101, 0.0000 0001...

    Cyan, your bit-flushing code assumes that nbBitsOut+nbBitsOut2 <= 25

  #10 Piotr Tarsa (Member, Kraków, Poland)
    Quote Originally Posted by Bulat Ziganshin View Post
    Piotr: since our goal is optimal use of the codespace, it seems that the optimal solution is to store all frequencies as fixed-precision logarithms, e.g. with 8 binary digits: 0.0101 0001, 0.0001 0111, 0.0000 0101, 0.0000 0001...
    That's what I was talking about.

    The precision of the frequencies of infrequent symbols doesn't contribute much to the compression ratio improvement (over Huffman), so we can provide bit lengths rounded to integers for most symbols, as I said, saving header space and probably saving space overall.

  #11 nburns (Member, San Diego)
    The optimal solution was ultimately found by one of Shannon's own pupils, David A. Huffman, almost by chance. His version became immensely popular, not least because he could prove, a few years later, that his construction method was optimal: there was no way to build a better distribution.
    Huffman is optimal, given its assumptions. Optimal is a slippery word, because nothing can be optimal without assumptions, and using the right assumptions, anything can be optimal.

    Huffman makes a set of codes that matches a distribution. The assumption is that each code represents a single thing and is always the same. That assumption implies that each code must be a whole number of bits.

  #12 Jarek (Member, Kraków, Poland)
    Regarding the probabilities for the first block: this block can be shorter and therefore weakly compressed - it is mainly used to estimate further probabilities.
    So we can store just some very rough approximations of these initial probabilities.
    For example, 1 bit (b_i = 0,1) per symbol i (32 bytes for 256 used symbols), with the probability of symbol i taken as (b_i+1) / sum_i (b_i+1).
    The question of how to do it optimally is an interesting general problem: for a given information limit, how to represent the best approximation of a probability distribution from some family ...

    About Huffman: by grouping symbols, it allows us to approach the Shannon limit as closely as we want - in this sense it is optimal.
    However, to get within distance dH, the number of symbols we need to group is roughly proportional to 1/dH, so we would need huge alphabets of size about 2^(1/dH) - for dH ~ 0.002 bits/symbol it becomes completely impractical (see the picture in the poster).

  #13 Bulat Ziganshin (Programmer, Uzbekistan)
    Piotr, I mean that we don't need to store bit lengths at all: all symbols with smaller frequencies get the fixed 0.0000 0001 amount. If we encode the higher frequencies with only 8-bit precision, we shouldn't store bit lengths above 8 bits - instead we assign all such symbols the fixed 1/256 frequency. I believe this will optimize the overall code length for tables+data, although I have no proof.

  #14 Cyan (Member, France)
    Quote Originally Posted by Bulat Ziganshin View Post
    Cyan, your bit-flushing code assumes that nbBitsOut+nbBitsOut2 <= 25
    Correct. I'll add a #define protection for that. Thanks!
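    One possible shape for such a protection, assuming at most FSE_MAX_TABLELOG bits are flushed per symbol (the exact constant and limit in fse.c may differ):
    Code:
    /* hypothetical compile-time guard: two consecutive flushes must fit the 25-bit budget */
    #if (2 * FSE_MAX_TABLELOG) > 25
    #  error "FSE_MAX_TABLELOG too large for the current bit-flushing strategy"
    #endif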

  #15 Cyan (Member, France)
    Quote Originally Posted by Matt Mahoney View Post
    Here are some test results on proba70.bin
    Code:
    1,048,576 proba70.bin
      171,702 proba70.fpaqa
      171,044 proba70.fpaqc
      171,146 proba70.fpaqc-arith
      165,760 proba70.fpaq0 (ratio 6.326)
    Thanks for the comparison metric.
    Indeed, proba70 (and all the probaXX files) is a very "static" file, which means the probability distribution does not vary much across the file.

    By default, FSE gets 166,091 bytes on it.
    This is with the default "32 KB" block mode, which means the header is repeated for each block.

    If I "cheat a bit" by forcing "1 MB" block mode, and therefore a single header for the whole file,
    then FSE gets 165,635 bytes, which is in line with fpaq0.

    This brings us back to the issue mentioned earlier in this thread: the header contributes to the overall cost, and can probably be optimized to cost less than it does today.

  #16 Jarek (Member, Kraków, Poland)
    A single header means just static compression - for large, uniform data it is very reasonable, and ANS is perfect for this purpose.
    We can also use higher-order or context-dependent methods here, switching between the corresponding coding tables (many of them, probably for a smaller alphabet).

  #17 Bulat Ziganshin (Programmer, Uzbekistan)
    1. All lzh/ari codecs I know use alphabets with more than 256 symbols, since they combine literal characters and matches into one dictionary. So the first improvement I propose is to support 16-bit symbols. One possible implementation:

    #ifndef SYMBOL
    #define SYMBOL unsigned char
    #endif

    although ideally one source file should provide both 8-bit and 16-bit routines.


    2. Next is a header-less mode. Do I understand correctly that it's enough to supply both the coder and the decoder with the same counting[] tables? In that case, providing lower-level functions that rely on caller-provided frequencies would allow using the codec in a semi-dynamic mode, where each block is encoded using the stats of the previous block, and would also allow alternative header encodings without modifying the FSE sources. Moreover, I feel that header encoding doesn't belong to the core of the algorithm implementation and may be considered part of the [example] client code.

    3. Next is a variant with reversed compression direction. It may be useful, since overlapping modeling and encoding should make compression faster, while for decompression the fastest way is multithreaded H/ARI/FSE decoding followed by a single-threaded LZ step.

    4. I'd like to see higher-order variants of the code, e.g. I'm thinking about an LZ + order-1 coder.


  #18 Jarek (Member, Kraków, Poland)
    A higher-order or context-dependent method, instead of e.g. "tableDecode[state]", will use "tableDecode[history/context][state]" - many smaller coding tables.
    Since encoding runs in the backward direction, the history/context here means the one seen during the later, forward-direction decoding.

    The minimal header problem is indeed interesting: how to find the best description/approximation, minimizing "header size + imprecision cost".
    For large data the header cost becomes negligible (unless using some high-order or complex context model) - static compressors can then be better than adaptive ones, as they use the whole data to construct the optimal model - the gain can be larger than the header cost.

    ps. As I have written in a different thread, for the "reversed entropy coding" problem we can use generalizations of the Kuznetsov-Tsybakov problem to encode such that the decoder doesn't have to know the probabilities used - no header is needed.
    I don't see how to do it for standard entropy coding, but maybe something like that is possible? The information about the probability distribution seems somehow redundant ...

  #19 Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Jarek View Post
    Since encoding runs in the backward direction, the history/context here means the one seen during the later, forward-direction decoding.
    It should be no problem for fixed-order prediction.

    Quote Originally Posted by Jarek View Post
    A higher-order or context-dependent method, instead of e.g. "tableDecode[state]", will use "tableDecode[history/context][state]" - many smaller coding tables.
    Hopefully someone will write the entire code, but if I understood you correctly, it may be problematic: e.g. with 256 contexts and 16 KB per context we would need too much memory (it would not fit in L1/L2 cache), and with smaller encoding tables the precision will be lower. For arithmetic coding this is not much of a problem, since we only need 256 encoding-table entries per context.

  #20 Jarek (Member, Kraków, Poland)
    As I have written, it would rather require a smaller alphabet - for example of size 16, which would need a few hundred bytes per context instead of 16 KB.
    A size-16 alphabet means there are 4 binary arithmetic coding steps per such single step - it should still be much faster.
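    A structural sketch of the per-context idea (not FSE code: the cell layout, the context choice and the bit-reader callback are all assumptions). With 16 contexts of 64 states each and 4-byte cells, the whole set is about 4 KB and stays comfortably in L1:
    Code:
    #include <stddef.h>

    #define NB_CONTEXTS   16                 /* e.g. the previous symbol, for a 16-symbol alphabet */
    #define CTX_TABLELOG  6
    #define CTX_TABLESIZE (1 << CTX_TABLELOG)

    typedef struct { unsigned char symbol; unsigned char nbBits; unsigned short newStateBase; } DCell;
    typedef unsigned (*ReadBitsFn)(void* src, unsigned nbBits);  /* caller-provided bit reader */

    /* order-1 decoding: the previously decoded symbol selects which small table drives the step */
    static void decodeOrder1(unsigned char* dst, size_t n,
                             const DCell table[NB_CONTEXTS][CTX_TABLESIZE],
                             unsigned state, ReadBitsFn readBits, void* src)
    {
        unsigned ctx = 0;
        for (size_t i = 0; i < n; i++) {
            const DCell* c = &table[ctx][state];
            dst[i] = c->symbol;
            state  = c->newStateBase + readBits(src, c->nbBits);
            ctx    = dst[i];                 /* context for the next symbol */
        }
    }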

  #21 biject.bwts (Member, Texas)
    Quote Originally Posted by Jarek View Post
    A higher-order or context-dependent method, instead of e.g. "tableDecode[state]", will use "tableDecode[history/context][state]" - many smaller coding tables.
    Since encoding runs in the backward direction, the history/context here means the one seen during the later, forward-direction decoding.

    The minimal header problem is indeed interesting: how to find the best description/approximation, minimizing "header size + imprecision cost".
    For large data the header cost becomes negligible (unless using some high-order or complex context model) - static compressors can then be better than adaptive ones, as they use the whole data to construct the optimal model - the gain can be larger than the header cost.

    ps. As I have written in a different thread, for the "reversed entropy coding" problem we can use generalizations of the Kuznetsov-Tsybakov problem to encode such that the decoder doesn't have to know the probabilities used - no header is needed.
    I don't see how to do it for standard entropy coding, but maybe something like that is possible? The information about the probability distribution seems somehow redundant ...
    I used to think the header size problem was interesting, but not any more. The optimal solutions are all bijective. Some are better suited for a large number of symbols, and some are better suited for when only a small subset of the available symbols is used in the file to compress. But in general, if it is not known how many symbols are used, and if the file is IID, then the best Huffman is a bijective adaptive Huffman. And if you have the time, a bijective adaptive arithmetic file compressor will compress the smallest.

    If one wants to write an optimal header using symbols and bit lengths, that is not the way to go unless you have extra, complex rules on how to handle the data that follows, so that the data plus those rules lead back to the same file again under compression. You can use a million hybrid ways to do it, such as how I did bijective DC: the header there can easily be used to make a bijective Huffman or arithmetic coder. In that case the header is at the front of the file and the data follows. The resulting file is such that any subset of the file would decompress and compress back to the same subset, so it is fully bijective.

  #22 nburns (Member, San Diego)
    Quote Originally Posted by Jarek View Post
    About Huffman: by grouping symbols, it allows us to approach the Shannon limit as closely as we want - in this sense it is optimal.
    However, to get within distance dH, the number of symbols we need to group is roughly proportional to 1/dH, so we would need huge alphabets of size about 2^(1/dH) - for dH ~ 0.002 bits/symbol it becomes completely impractical (see the picture in the poster).
    The bigger the alphabet, the bigger the tree. I think that just moves the problem somewhere else. If you dig up the original papers on Huffman, you should find the proof of optimality. It's a valid proof, but for a different set of conditions than the ones here.

  #23 nburns (Member, San Diego)
    Quote Originally Posted by Jarek View Post
    The minimal header problem is indeed interesting: how to find the best description/approximation, minimizing "header size + imprecision cost".
    For large data the header cost becomes negligible (unless using some high-order or complex context model) - static compressors can then be better than adaptive ones, as they use the whole data to construct the optimal model - the gain can be larger than the header cost.
    Theoretically, the header cost for static should equal the startup cost for adaptive, because they contain the same amount of information.

    Yann --
    Storing only the delta for headers after the first would probably help. I haven't taken a look at the code, so I'm not sure if you're doing that already.

  #24 nburns (Member, San Diego)
    Quote Originally Posted by Bulat Ziganshin View Post
    Piotr, I mean that we don't need to store bit lengths at all: all symbols with smaller frequencies get the fixed 0.0000 0001 amount. If we encode the higher frequencies with only 8-bit precision, we shouldn't store bit lengths above 8 bits - instead we assign all such symbols the fixed 1/256 frequency. I believe this will optimize the overall code length for tables+data, although I have no proof.
    I think 1/256 would be too high. You could treat the frequencies as being relative to each other, though, so the denominator would be the sum of all of them. That would help.

    The frequencies are really integer counts. If they are highly skewed, Elias gamma would do a decent job, I think.
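    For reference, a minimal sketch of an Elias gamma writer over an MSB-first bit buffer (the BitWriter type and the names are made up; the buffer must be zero-initialized and large enough). A count of 1 then costs a single bit, and larger counts grow logarithmically:
    Code:
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint8_t* buf; size_t bitPos; } BitWriter;   /* buf assumed zero-filled */

    static void putBit(BitWriter* w, int bit)
    {
        if (bit) w->buf[w->bitPos >> 3] |= (uint8_t)(0x80 >> (w->bitPos & 7));
        w->bitPos++;
    }

    /* Elias gamma for n >= 1: floor(log2 n) zeros, then n itself in binary, MSB first */
    static void putGamma(BitWriter* w, uint32_t n)
    {
        int len = 0;
        while ((n >> len) > 1) len++;                /* len = floor(log2 n) */
        for (int i = 0; i < len; i++) putBit(w, 0);  /* unary length prefix */
        for (int i = len; i >= 0; i--) putBit(w, (int)((n >> i) & 1));
    }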

  #25 Cyan (Member, France)
    Quote Originally Posted by Bulat Ziganshin View Post
    1. All lzh/ari codecs I know use alphabets with more than 256 symbols, since they combine literal characters and matches into one dictionary. So the first improvement I propose is to support 16-bit symbols. One possible implementation:

    #ifndef SYMBOL
    #define SYMBOL unsigned char
    #endif

    although ideally one source file should provide both 8-bit and 16-bit routines.
    Indeed, that's how zip works, and by way of consequence many other lzh algorithms.
    Personally, I'm not fond of it, since it introduces some complexity which can be solved differently.

    Anyway, if that's the way many programmers want to use it, there should be a mode for it.
    It's more complex than it sounds: the decoding state table is directly affected.
    A cell is exactly 4 bytes, and I can't increase the size of the symbol that easily.
    I guess one way to proceed would be to "share some space" with '.nbBits',
    resulting in an extended symbol range from 8 bits to 12 bits (4096 values).
    This should be large enough for most lzh schemes with more than 256 values.
    For a larger number of values than that, the modifications would be more substantial (and most likely no longer compatible with the L1 cache requirement).
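    For illustration, such a 4-byte cell could be laid out as the following bitfields (a hypothetical layout, not the actual FSE structure):
    Code:
    /* hypothetical 32-bit decode cell: 16-bit next-state base, 12-bit symbol, 4-bit shift */
    typedef struct {
        unsigned newState : 16;
        unsigned symbol   : 12;   /* up to 4096 distinct symbols */
        unsigned nbBits   : 4;    /* up to 15 bits read per decoded symbol */
    } FSE_decode12_t;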


    Quote Originally Posted by Bulat Ziganshin View Post
    2. Next is a header-less mode. Do I understand correctly that it's enough to supply both the coder and the decoder with the same counting[] tables? In that case, providing lower-level functions that rely on caller-provided frequencies would allow using the codec in a semi-dynamic mode, where each block is encoded using the stats of the previous block, and would also allow alternative header encodings without modifying the FSE sources. Moreover, I feel that header encoding doesn't belong to the core of the algorithm implementation and may be considered part of the [example] client code.
    Yes, I agree,
    this is definitely one of the key properties of the interface.
    The code is almost ready for it.
    As you say, header encoding doesn't belong to the core of the algorithm implementation and may be considered part of the client code.


    Quote Originally Posted by Bulat Ziganshin View Post
    3. Next is a variant with reversed compression direction. It may be useful, since overlapping modeling and encoding should make compression faster, while for decompression the fastest way is multithreaded H/ARI/FSE decoding followed by a single-threaded LZ step.
    I'm not sure I understand this part.
    Keep in mind that, regarding direction, ANS theory is a bit special, since it requires encoding and decoding to be performed in opposite directions.
    The way FSE deals with it is that it encodes in the backward direction and outputs in the forward direction,
    while decoding is performed in the forward direction while reading the compressed input in the backward direction.
    A bit complex, but it works well for block mode.


    Quote Originally Posted by Bulat Ziganshin View Post
    4. I'd like to see higher-order variants of the code, e.g. I'm thinking about an LZ + order-1 coder.
    From an algorithm perspective, this point is equivalent to using multiple tables to encode a single bitstream.
    It should cause no problem as long as all the state tables have the same size.
    (For different sizes, though, it's a bit more complex...)

    Now, obviously, the major issue with this is the total size of the tables, which is very likely to become larger than the L1 cache, directly affecting speed.

  #26 biject.bwts (Member, Texas)
    Quote Originally Posted by nburns View Post
    Theoretically, the header cost for static should equal the startup cost for adaptive, because they contain the same amount of information.
    No, not theoretically. The length of a header followed by data will in general be greater than proper adaptive coding, even if you think of the information as the same. As an example, you can think of storing the header as one number and the data as a second number, while the adaptive output can be thought of as a single number. A single number can be the bijective combination of two numbers, and it requires less space than representing the two numbers separately.

  #27 biject.bwts (Member, Texas)
    I know most people don't worry about the few bits saved by doing things bijectively, but below is an example of why a header followed by data is not as good as a single number containing what most would consider the same information.

    I use the Cantor pairing function of http://en.wikipedia.org/wiki/Cantor_...iring_function

    Let X and Y be two numbers, encoded as a universal number plus a binary-coded number (which has zero waste), versus the number Z, the result of the Cantor pairing function for X and Y, written as a single binary-coded number (which also has no waste).

    Starting at zero, to match the article.
    For the binary-coded integers starting at zero (add two, drop the high-order 1 bit from the binary representation):
    0 -> 0
    1 -> 1
    2 -> 00
    3 -> 10
    6 -> 000
    9 -> 110
    24 -> 01011

    For the universal numbers, using the Elias gamma code shifted to start with zero:
    0 -> 1
    1 -> 010
    2 -> 011
    3 -> 00100

    For <3,3>: 00100 10 = 7 bits.
    For 24: 01011 = 5 bits, so the single number wins here.

    For <0,3>: 1 00100 = 6 bits.
    For 9: 110 = 3 bits, so the single number wins here.

    For <3,0>: 00100 1 = 6 bits.
    For 6: 000 = 3 bits, so the single number wins here.

    Of course, you could use a different universal number representation if you like.
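    For reference, the pairing itself is just z = (x+y)(x+y+1)/2 + y, which reproduces the <3,3> -> 24, <0,3> -> 9 and <3,0> -> 6 values above. A small sketch with its inverse (the names are made up):
    Code:
    #include <math.h>
    #include <stdint.h>

    /* Cantor pairing: a bijection between pairs of naturals and naturals */
    static uint64_t cantorPair(uint64_t x, uint64_t y)
    {
        uint64_t s = x + y;
        return s * (s + 1) / 2 + y;                /* <3,3> -> 24, <0,3> -> 9, <3,0> -> 6 */
    }

    static void cantorUnpair(uint64_t z, uint64_t* x, uint64_t* y)
    {
        uint64_t w = (uint64_t)((sqrt(8.0 * (double)z + 1.0) - 1.0) / 2.0);
        uint64_t t = w * (w + 1) / 2;
        if (t > z) { w--; t = w * (w + 1) / 2; }   /* guard against sqrt rounding (fine for small z) */
        *y = z - t;
        *x = w - *y;
    }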

  #28 nburns (Member, San Diego)
    Quote Originally Posted by biject.bwts View Post
    No, not theoretically. The length of a header followed by data will in general be greater than proper adaptive coding, even if you think of the information as the same. As an example, you can think of storing the header as one number and the data as a second number, while the adaptive output can be thought of as a single number. A single number can be the bijective combination of two numbers, and it requires less space than representing the two numbers separately.
    If you made a header and bijectively merged it with the data, I wouldn't consider that an adaptive algorithm.

  #29 biject.bwts (Member, Texas)
    Quote Originally Posted by nburns View Post
    If you made a header and bijectively merged it with the data, I wouldn't consider that an adaptive algorithm.
    In general, neither would I consider it an adaptive algorithm. But it is easy to make a header that can be concatenated with any file, such that the result of the two can be thought of as the output of an adaptive bijective algorithm. Note I am not saying bijectively combine, but simply placing the data after the header, if any. What you call the header, or a trimmed version of it, would be the result of a bijective compression. The shortened header would still be bijective and valid; the data just follows the header if more data is to be compressed.

  #30 nburns (Member, San Diego)
    I tried to start playing with it, but it keeps segfaulting in FSE_decompress:

    Code:
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000405485 in FSE_decompress (dest=0x6084d0 "", originalSize=44032, compressed=0x7ffff7ed7013 "F\002") at ../fse.c:625
    625            bitStream = *(U32*)ip;
    (gdb) list
    620            U32 state;
    621    
    622            bitCount = (((*(U32*)ip)-1) & 7) + 1 + 24;
    623            iend = ip + (((*(U32*)ip)+7) / 8);
    624            ip = iend - 4;
    625            bitStream = *(U32*)ip;
    626            bitCount -= FSE_MAX_TABLELOG;
    627            state = (bitStream >> bitCount) & FSE_MAXTABLESIZE_MASK;
    628    
    629            while (op<oend)
    (gdb)
    Edit:
    That was on Ubuntu, by the way, with gcc 4.7.2.

    I get the same result in 32-bit Cygwin, however:

    Code:
    ~/git/FiniteStateEntropy/test$ ./fse COPYING -o FOO
    FSE : Finite State Entropy, capability demo by Yann Collet (Dec 20 2013)
    Compressed 18092 bytes into 10758 bytes ==> 59.46%
    ~/git/FiniteStateEntropy/test$ ./fse -d FOO > BAR
    FSE : Finite State Entropy, capability demo by Yann Collet (Dec 20 2013)
    Segmentation fault (core dumped)
    ~/git/FiniteStateEntropy/test$
    Am I using it wrong?


