Page 5 of 7 FirstFirst ... 34567 LastLast
Results 121 to 150 of 204

Thread: Finite State Entropy

  1. #121
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
Well, at least you can read the assembler directly, which is better than just speculating as I do.
I wish I could do the same...

Thanks for the confirmation!

  2. #122
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,475
    Thanks
    702
    Thanked 645 Times in 347 Posts
I'm speculating, but I'm 99% sure. There are no other variants. Maybe I've seen that effect in asm listings of my own code; I don't remember exactly.

  3. #123
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
Maybe one way to move forward would be to move the symbolTT table into the innermost compression function.
It would still be allocated on the stack, and would be recalculated each time the compression function is called,
but since its size only depends on nbSymbols, this is a fraction of the cost of the much larger state table.

Now, one problem is that it requires 'normalizedCounter' to be built first, so that would change the interface contract.

  4. #124
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,475
    Thanks
    702
    Thanked 645 Times in 347 Posts
You can just copy this table to the stack.

  5. #125
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
    Good point, I'll try that

  6. #126
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
OK, this version is much better.

I still have to complain about a ~2% performance loss on GCC/Linux, but that's much more manageable.
We are getting pretty close to a github update.

    Btw, I should have a look at this "branching" methodology...
Attached Files

  7. #127
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
Is it possible to quickly compute the compressed data size, based only on the symbol distribution, without actually performing the compression?
    Last edited by caveman; 15th January 2014 at 01:34.

  8. #128
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    572
    Thanks
    180
    Thanked 177 Times in 108 Posts
caveman, yes, we can analytically calculate the expected number of bits per symbol, assuming for example an i.i.d. symbol distribution - the tests in the paper were performed this way.
First we need to find the stationary probability distribution of the used states (the dominant eigenvector); then we can calculate the expected number of bits/symbol it produces - here is the Mathematica procedure I used: https://dl.dropboxusercontent.com/u/12405967/ans.nb

However, these values are a bit better than the experimental ones - this should be discussed somewhere here: http://encode.ru/threads/1821-Asymet...l-System/page2
Also, I was only using precise initialization, while the github implementation uses a less accurate table generation.
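The analytic calculation Jarek describes can be sketched outside Mathematica as well. The following Python toy (not FSE code; the 4-state table, counts, and sequential spread are illustrative choices) builds a tiny tANS automaton, finds the stationary distribution of encoder states by power iteration, and compares the resulting expected bits/symbol with the Shannon entropy:

```python
import math

# Toy tANS automaton: quantized counts must sum to the table size L.
counts = {"a": 3, "b": 1}                 # probabilities 3/4 and 1/4
L = sum(counts.values())                  # states live in [L, 2L)

# Sequential spread: assign the L table positions to symbols.
spread = [s for s, c in counts.items() for _ in range(c)]
enc, seen = {s: {} for s in counts}, {s: 0 for s in counts}
for pos, s in enumerate(spread):
    enc[s][counts[s] + seen[s]] = L + pos # i-th state of s <- x = counts[s]+i
    seen[s] += 1

def step(x, s):
    """Encode s from state x: renormalize, then jump; returns (x', bits)."""
    bits = 0
    while x >= 2 * counts[s]:
        x >>= 1                           # stream out one low bit
        bits += 1
    return enc[s][x], bits

p = {s: c / L for s, c in counts.items()}

# Stationary distribution of the encoder state chain (dominant eigenvector),
# obtained here by plain power iteration.
pi = {x: 1.0 / L for x in range(L, 2 * L)}
for _ in range(200):
    nxt = dict.fromkeys(pi, 0.0)
    for x, w in pi.items():
        for s, ps in p.items():
            nxt[step(x, s)[0]] += w * ps
    pi = nxt

expected = sum(pi[x] * ps * step(x, s)[1] for x in pi for s, ps in p.items())
entropy = -sum(ps * math.log2(ps) for ps in p.values())
# expected sits slightly above the entropy bound, matching Jarek's remark
# that the analytic values are close to (but not below) optimal.
```

At this toy scale the gap between `expected` (~0.824 bits/symbol) and the entropy (~0.811) is the redundancy of the quantized 4-state table; the ans.nb procedure does the same computation for realistic table sizes.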

  9. #129
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
Quote Originally Posted by caveman View Post
Is it possible to quickly compute the compressed data size, based only on the symbol distribution, without actually performing the compression?
There are 2 parts:

1) Compute, no; estimate, yes.
If the point is to have a rough idea of the final size, it can be done.
If the objective is to know the exact size of the compressed bitStream before generating it, then I don't know how to do that with 100% accuracy.

2) Quickly: yes, but...
it requires some pre-computed tables for higher accuracy, which in turn require memory.
Making them a compulsory part of a portable reference library would risk excluding low-memory systems.
So it depends on the intended usage.
A work-around could be to use rough, pessimistic approximations instead.


In both cases, the solution is no different from what could be achieved for arithmetic compression, since we have the same input and almost the same compression efficiency.
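Cyan's "estimate, yes" can be made concrete in a few lines (a hedged sketch, not part of FSE): the order-0 Shannon estimate computed from symbol counts alone. Real FSE or arithmetic output lands slightly above it, because of quantized probabilities, table headers, and stream termination:

```python
import math
from collections import Counter

def estimate_compressed_bits(data: bytes) -> float:
    """Order-0 Shannon estimate of the entropy-coded size, from counts only."""
    n = len(data)
    return sum(c * math.log2(n / c) for c in Counter(data).values())
```

For example, `estimate_compressed_bits(b"ab" * 32)` gives 64.0 bits (64 symbols from a uniform 2-symbol alphabet): a quick lower-bound-style figure, which is exactly the "rough idea of final size" use case.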

  10. #130
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    572
    Thanks
    180
    Thanked 177 Times in 108 Posts
    Large independent tests of FSE: https://groups.google.com/a/webmproj...Y/T1cZQLnJrZ4J

I have just realized that we can also use the huge speedup of large-alphabet ANS for the binary, bit-by-bit case - by grouping symbols of the same quantized probability.
Let's say we use 64 different quantizations of the probability of the least probable symbol (1/128 precision) - then we can use 64 8-bit buffers, add binary choices bit by bit to the corresponding buffers, and use one of 64 ANS coding tables for a 256-symbol alphabet every time a buffer accumulates 8 binary symbols.
All these 64 tables can work on the same state - the decoder knows which probability's symbols run out.
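The grouping scheme above can be sketched in a few lines of Python (a hypothetical interface, not FSE code; the `emit` callback stands in for one 256-alphabet ANS encoding step using the table for that probability class):

```python
# Route binary decisions into one buffer per quantized probability class;
# each time a buffer holds 8 bits, that byte goes through the 256-symbol
# ANS table of its class, turning the slow bit-by-bit path into a bytewise one.
NUM_CLASSES = 64                          # probability quantized in 1/128 steps

class GroupedBinaryEncoder:
    def __init__(self, emit):
        # emit(cls, byte) stands in for one 256-alphabet ANS step with
        # table `cls`; all 64 tables may share a single ANS state.
        self.buffers = [(0, 0)] * NUM_CLASSES   # (bit accumulator, bit count)
        self.emit = emit

    def put(self, bit, cls):
        acc, n = self.buffers[cls]
        acc, n = (acc << 1) | bit, n + 1
        if n == 8:                        # buffer full: one bytewise ANS step
            self.emit(cls, acc)
            acc, n = 0, 0
        self.buffers[cls] = (acc, n)
```

The decoder side is symmetric: it knows which class's buffer runs out next, so it can refill buffers by decoding one byte from the matching table. The block-end handling (partially filled buffers) is the complication Jarek addresses a few posts later.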

  11. #131
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
    Interesting,
I wonder who skal is; he seems to understand the topic pretty fast.

Anyway, I saw your comment at the end regarding precise initialization, and I can only invite you to test your code, which seems almost functional, by integrating it into the github version, for example.
I hope you can prove me wrong regarding the high speed impact for a comparatively small gain, as observed in a previous version developed before release.

  12. #132
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
    Quote Originally Posted by Cyan View Post
    I wonder who's skal, he seems to understand the topic pretty fast.
Pascal Massimino, he was known as a French demo maker:
    http://demozoo.org/sceners/252/
    http://fr.linkedin.com/in/skaaaaaaaaal
    He joined Google in 2006 and notably works on WebP.
    Last edited by caveman; 16th January 2014 at 16:45.

  13. #133
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
OK, thanks for the info, now that's much clearer...

  14. #134
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    572
    Thanks
    180
    Thanked 177 Times in 108 Posts
Indeed - I think video compression, like Google's VP codecs, is what will gain the most from ANS, as you want high-quality video decompression without killing e.g. a smartphone battery.
We can directly use it there for large static distributions like DCT coefficients...

...but the trick from my previous post - divide binary choices into, let's say, 64 8-bit buffers for the different quantized binary probability distributions, and use a 256-symbol ANS every time a buffer fills up - also allows a roughly 5x decoding speed gain for varying binary choices.
One issue is encoding the remaining values of the buffers when the block ends - we could do it using direct ABS formulas, but we would still need to additionally store the number of remaining symbols in each buffer, so that the decoder knows how many to load at the start - storing these numbers would require e.g. 3 bits * 64 = 24 bytes per block - not very much...
...but it can be avoided, again at the cost of encoding time - this time, use 64 large buffers, feed them with the succeeding binary symbols, and start applying the 256-symbol ANS only after processing the whole data block - this way the decoder can start by loading 8 binary choices into each of the 64 buffers (assume at least one group per buffer, even if it will not be entirely used - the unused positions are filled with the more probable symbol).
There is a small complication at the end of the decoded block - the decoder has to detect that fewer than 8 symbols remain for a given probability, and switch to the direct ABS formulas instead.

Most current arithmetic coding applications could get a huge speedup this way...

ps. If this is too costly from a memory point of view, using a 16-symbol alphabet would cost about 20 times less and would give only a ~3x speedup...
    Last edited by Jarek; 16th January 2014 at 22:43.

  15. #135
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,058
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I'm a native English speaker, so I'll throw in my 2 cents. I like FSE_count, FSE_sizeof_counts, and FSE_count_normalized. Frequency is afaik just a fancier word for count. Distribution might be technically correct, but it sounds more obfuscated.

  16. The Following User Says Thank You to nburns For This Useful Post:

    Cyan (17th January 2014)

  17. #136
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,475
    Thanks
    702
    Thanked 645 Times in 347 Posts
nburns, what would be a good name for a procedure that takes unnormalized counters, does some calculations, and returns normalized counters (aka distribution, aka proportions of symbols)?

  18. #137
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,058
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
nburns, what would be a good name for a procedure that takes unnormalized counters, does some calculations, and returns normalized counters (aka distribution, aka proportions of symbols)?
    FSE_normalize_counts

  19. #138
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    572
    Thanks
    180
    Thanked 177 Times in 108 Posts
Pascal Massimino has written that he has made some "obvious optimizations" to the decoder (which, as he said, was already faster than FSE), which made it nearly twice as fast (from being 50% slower than VLC to similar speed):
    https://groups.google.com/a/webmproj...Y/dPqqSeaQhhYJ

  20. #139
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
    Unfortunately, there is no code provided.

  21. #140
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,058
    Thanks
    54
    Thanked 71 Times in 55 Posts
He mentioned getting better compression from an FSE-style coder than from fpaqc. I think this must be model-related because, in my experiments, BWTS+MTF+fpaqc gave compression results that FSE couldn't touch.

  22. #141
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    396
    Thanks
    126
    Thanked 137 Times in 88 Posts
A basic order-1 FSE-style encoder could be a very strong alternative to simple modelling + arithmetic coding.

I have my own hacky order-1 arithmetic coder (based on Shelwien's code) that removes all divisions for encoding, but still leaves one for decoding. That makes it close to FSE for encoding speed, but still annoyingly tardy on decoding. I should investigate FSE more to see how easy it is to convert to higher orders, or at least order-1; anything higher than that becomes hard to do in a static block-by-block manner rather than with adaptive modelling.
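To put numbers on how much a static order-1 split can buy over order-0 (a sketch of the modelling gain only, computed from ideal Shannon sizes; the function names are made up and this is not an FSE conversion):

```python
import math
from collections import Counter, defaultdict

def order0_bits(data: bytes) -> float:
    """Ideal order-0 coded size, from global symbol counts."""
    n = len(data)
    return sum(c * math.log2(n / c) for c in Counter(data).values())

def order1_bits(data: bytes) -> float:
    """Ideal coded size with one static frequency table per previous symbol,
    i.e. the block-by-block order-1 scheme discussed above.
    (The first symbol and the per-context table headers are ignored.)"""
    ctx = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        ctx[prev][cur] += 1
    total = 0.0
    for counter in ctx.values():
        n = sum(counter.values())
        total += sum(c * math.log2(n / c) for c in counter.values())
    return total
```

On an extreme input like `b"ab" * 64`, order-0 needs 128 bits while order-1 drops to 0, since each context is deterministic; on real data the gap is smaller, but this is the headroom a static order-1 FSE block would be chasing, minus the cost of shipping one normalized table per context.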

  23. #142
    Member ivan2k2's Avatar
    Join Date
    Nov 2012
    Location
    Russia
    Posts
    35
    Thanks
    13
    Thanked 6 Times in 3 Posts
With -march=pentiumpro added to the Makefile compiler options I've got a ~10% decompression speedup. Tested several times to be sure.

  24. #143
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    370
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Quote Originally Posted by ivan2k2 View Post
With -march=pentiumpro added to the Makefile compiler options I've got a ~10% decompression speedup. Tested several times to be sure.
You could also test with -march=native; most likely you will get even more speed.

  25. #144
    Member ivan2k2's Avatar
    Join Date
    Nov 2012
    Location
    Russia
    Posts
    35
    Thanks
    13
    Thanked 6 Times in 3 Posts
    Quote Originally Posted by thometal View Post
You could also test with -march=native; most likely you will get even more speed.
Already tried 'native', but it's surprisingly slower than 'pentiumpro'.

  26. #145
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    370
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Quote Originally Posted by ivan2k2 View Post
Already tried 'native', but it's surprisingly slower than 'pentiumpro'.
Oh, strange, very strange.

Which CPU do you have?

  27. #146
    Member ivan2k2's Avatar
    Join Date
    Nov 2012
    Location
    Russia
    Posts
    35
    Thanks
    13
    Thanked 6 Times in 3 Posts
    Quote Originally Posted by thometal View Post
Oh, strange, very strange.

Which CPU do you have?
I have 2 laptops, with an i5 3320M and an i3 3217U.

And here are 32-bit binaries compiled with default options and with -march=pentiumpro.
Attached Files

  28. #147
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by thometal View Post
Oh, strange, very strange.

Which CPU do you have?
Well, it's not strange.
I've been playing with march/mtune several times in my life and I don't think I've ever seen native being the fastest. The winner always looked random. Even if native has the best chance of winning, it's no surprise to see it lose.

  29. #148
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
Yes, I can confirm the same experience:
-march options tend to be a bit too "random", and it's difficult to guarantee that one value will always be better for everyone - not even for the architecture it's supposed to target.

I'm glad -march=pentiumpro gives you better results, and I'm likely to try it too, just to witness the result.

  30. #149
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
I tested -march=pentiumpro with GCC v4.8.1,
and indeed, the gains are pretty large.
Test system: Core 2 Duo, 3 GHz, Linux Mint Petra, 64-bit

    normal version (-O3)
    $ ./fse32 proba70.bin
    FSE : Finite State Entropy, 32-bits demo by Yann Collet (Feb 3 2014)
    proba70.bin : 1048575 -> 165911 (15.82%), 160.9 MB/s , 204.9 MB/s

    march version (-O3 -march=pentiumpro)
    $ ./fse32 proba70.bin
    FSE : Finite State Entropy, 32-bits demo by Yann Collet (Feb 3 2014)
    proba70.bin : 1048575 -> 165911 (15.82%), 172.5 MB/s , 255.1 MB/s

That's a really big improvement,
so I'm likely to adopt it for the 32-bit version.
Thanks ivan2k2 for reporting it!

Obviously, it only works for the 32-bit version: the Pentium Pro is not 64-bit capable, so compilation fails for that target.
    Last edited by Cyan; 3rd February 2014 at 22:34.

  31. #150
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    845
    Thanks
    425
    Thanked 240 Times in 97 Posts
The beta branch of FSE
now supports an experimental U16 input mode:
https://github.com/Cyan4973/FiniteSt...ropy/tree/beta

It means it can accept an alphabet size > 256.
The alphabet size can be defined using #define MAX_NB_SYMBOLS, at the top of fse.c.

The current default value is 286, which matches the zlib alphabet (literals + EOF + match).

This is still beta, and although it passes tests, it still needs more testing before being merged into master.


    [Edit] : tests completed, now merged into master
    Last edited by Cyan; 5th February 2014 at 11:29.

  32. The Following 2 Users Say Thank You to Cyan For This Useful Post:

    Bulat Ziganshin (4th February 2014),samsat1024 (4th February 2014)


