Results 1 to 12 of 12

Thread: BCM and openbwt 1.5

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts

    BCM and openbwt 1.5

    Hi all, I am trying to see effects of all BWT + SSTs combined with CM. openbwt 1.5 has a nice sample program including many SSTs, but followed by RLE and Simple Structured Coding Model (test_encoder2), and BCM has bwt + CM. I plan to use sst_test.c in openbwt, but replace test_encoder2 with BCM ( - bwt of course ). My question: what are the meanings of inputs and outputs of

    int test_encoder2(const unsigned char *T, const int *A, int n, int cs)

    I understand T is the input buffer, n is the size of T, return value is size of compressed data, but what about A and cs?

    Tried to contact Yuta Mori directly, but it seems hard to find his contact info through google.

    Thx in advance for any help.

  2. #2
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    *A is temporary space for frequency/bucket based transforms, cs is the alphabet size in bytes (1 for char, or 4 for integer), this mainly used for integer packing schemes like the distance coder.
    Hope this helps.

  3. The Following User Says Thank You to Lucas For This Useful Post:

    smjohn1 (13th February 2018)

  4. #3
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    So if I use CM as in BCM, just ignore cs then? What about A? just encode A like T ?

    Quote Originally Posted by Lucas View Post
    *A is temporary space for frequencies, cs is the alphabet size in bytes (1 for char, or 4 for integer), this mainly used for integer packing schemes like the distance coder.
    Hope this helps.

  5. #4
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Yes you'd use cs = 1. As for *A the best compressing transform in OpenBWT is the sorted rank coder, so you'd need to encode *A as a header of 256 integers, I think BCM has it's own integer arithmetic coder so it should be easy. However it is worth noting that RLE0 adds an extra character into the alphabet (257 symbols) so you'd need to add an extra bit into BCM's arithmetic coder to account for all possible symbols.

  6. #5
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    In my opinion; instead of merging BCM with OpenBWT it'll likely be easier to fuse specific second stages from OpenBWT into BCM. Worth noting that OpenBWT only keeps track of the total shannon entropy which is ludicrously accurate, you can expect a small compression loss of about 1% if you use a real arithmetic coder. I took the liberty of deobfuscating sorted rank coding last year https://github.com/loxxous/Jampack/blob/master/rank.cpp which you can easily use as a drop in for BCM, just encode the frequency data somewhere in the output stream and you're off to the races. (Maybe replace the Error function dependency with an exit(0) to make your life easier.)

  7. The Following 2 Users Say Thank You to Lucas For This Useful Post:

    Bulat Ziganshin (14th February 2018),smjohn1 (13th February 2018)

  8. #6
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    Thx! See private msg ( in order not to bother other people with details ... )

    Quote Originally Posted by Lucas View Post
    In my opinion; instead of merging BCM with OpenBWT it'll likely be easier to fuse specific second stages from OpenBWT into BCM. Worth noting that OpenBWT only keeps track of the total shannon entropy which is ludicrously accurate, you can expect a small compression loss of about 1% if you use a real arithmetic coder. I took the liberty of deobfuscating sorted rank coding last year https://github.com/loxxous/Jampack/blob/master/rank.cpp which you can easily use as a drop in for BCM, just encode the frequency data somewhere in the output stream and you're off to the races. (Maybe replace the Error function dependency with an exit(0) to make your life easier.)

  9. #7
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    Quote Originally Posted by smjohn1 View Post
    Hi all, I am trying to see effects of all BWT + SSTs combined with CM
    You can already do that with Kanzi (with a very limited choice of SST) by providing the BWT as transform, say RANK+ZRLT as SST and CM as the entropy coder.
    My experience is that it is usually better (compression wise) to CM encode directly the output of the BWT as the SST tends to destroy correlations that the CM can encode efficiently.
    SSTs work well when the entropy coder is weaker.
    Last edited by hexagone; 14th February 2018 at 02:49.

  10. The Following User Says Thank You to hexagone For This Useful Post:

    smjohn1 (13th February 2018)

  11. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Context Mixing by itself is just generic technique that allows to mix up predictions from multiple sources

    BCM directly encodes BWT output, hence its 3 models are optimized for this particular data

    OTOH, BSC contains 3 another models, which are optimized for the compression of MTF output, so it provides efficient compression with SST

    The net result is that BCM and BSC compression rates on ENWIK are pretty close, but BCM sources are an order of magnitude shorter, while BSC is several times faster because it processes data after RLE

    Now, the main question - why are you trying all these things? Do you just performing research trying to find something new? Or you need to reach some particular speed/compression ratio? Or you interested in any breakthrough? I ask that because I have some ideas how to improve speed/compression ratios both for BCM and for ST+SST, including GPU support.

    One particular domain which isn't yet implemented properly is fast BWT decompression - Lucas, may be you want to finish that using my guide?

  12. #9
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    One particular domain which isn't yet implemented properly is fast BWT decompression - Lucas, may be you want to finish that using my guide?
    Ah a challenge to improve my old bwt work, when I have time yes, I'll try release my next project early and get back to bwt. By 'guide' are you referring to your data compression research page on Github? PM me your ideas or maybe start a new thread to discuss them, you've got me interested.

  13. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    It's not much, so i will write here: you tried to figure out how many "cores" GPU contains and then split work into equal parts. Instead, you should have thousands of work batches and just start a kernel with one thread per batch. It's up to GPU to split these jobs between threads. Anyway, each GPU "core" can run 7-16 threads simultaneously. The proper implementation should reach speed of 1 GB/s, since GF970 can perform more than 1e9 random memory accesses per second

    There are a lot of other BWT/ST-related optimizations I can propose to implement. In particular, there are two efficient GPU BWT implementations (in NVBio and CUDPP), so it would be great to see them employed by real BWT compressor.

    The rest of my ideas are based on experience with Compression-Research suite and reading through all important BWT papers (of first 10-15 years of BWT existence). In short, these are about making super-fast post-BWT coder, including (pre-BWT) LZP, QLFC and entropy coder. Compression ratio should be ~10% worse than BSC, with speed 1000 MB/s with GPU and 500 MB/s on plain i7-8700

  14. #11
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Bulat Ziganshin View Post

    Now, the main question - why are you trying all these things? Do you just performing research trying to find something new? Or you need to reach some particular speed/compression ratio? Or you interested in any breakthrough? I ask that because I have some ideas how to improve speed/compression ratios both for BCM and for ST+SST, including GPU support.
    So the reason is to continuing my post on compressing CT like pixel data. As already posted in the other post, simple prediction + BCM turns to be quite well, at the compression ratio on par with JPEG2000 and JS-Charles. But then I read a paper claiming BTW+IFC+simple structure encoder yields much compression ratio ( 10% ) than JPEG2000 or JS-Charles, which I cannot verify, since there is no source code available. So I'd like to see if different SSTs coupled with CM would be yield better results, and openbwt-v1.5 has a nice package of many SSTs included. Lucas has kindly given me very helpful info of how to integrate CM from BCM to openbwt-v1.5, and I will do tests once I have more spare time.

    If you could give a pointer to BSC source code, I can test it too.

    Sometimes, practical results are quite far from theoretical expectations.

  15. #12
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    31
    Thanked 22 Times in 15 Posts
    About image codec i have a mostly experiments image transform ,color space + prediction(speed results do not include the transform)
    girl.ppm
    Code:
          463061    19.6       9.55       9.02   bcm                              out
          471246    20.0      13.58      11.15   bcmec                            out
          487385    20.7      34.62      48.56   fpaq0p_sh                        out
          487413    20.7       3.55      80.62   lzma 9                           out
          501646    21.3     523.82     745.20   fse                              out
          514900    21.8     155.22    1627.10   fpc 0                            out
          524309    22.2     724.82    1422.12   fpc 16                           out
          532622    22.6     786.70    1452.77   fsehuf                           out
          553203    23.4       3.47     292.21   zlib 9                           out
         1051947    44.6     354.57    2277.32   trle                             out
    abandoned factory.ppm
    Code:
         1669411    21.2       9.10       7.28   bcm                              out
         1676821    21.3      13.50      10.98   bcmec                            out
         1720427    21.8      33.62      46.67   fpaq0p_sh                        out
         1769266    22.4       2.68      73.77   lzma 9                           out
         1772181    22.5     535.22     769.41   fse                              out
         1791522    22.7     150.58    1625.41   fpc 0                            out
         1823733    23.1     715.83    1421.21   fpc 16                           out
         1848250    23.4     776.99    1350.61   fsehuf                           out
         1930873    24.5       2.58     287.94   zlib 9                           out
         3583961    45.5     343.97    2151.98   trle                             out

Similar Threads

  1. OpenBWT-2.0.0 development
    By smjohn1 in forum Data Compression
    Replies: 11
    Last Post: 7th November 2017, 17:45
  2. BCM 0.12 is here!
    By encode in forum Data Compression
    Replies: 89
    Last Post: 19th June 2013, 22:57
  3. BCM's future
    By encode in forum Data Compression
    Replies: 17
    Last Post: 9th August 2009, 01:00
  4. BCM v0.05 is here! [!]
    By encode in forum Data Compression
    Replies: 19
    Last Post: 8th March 2009, 21:12
  5. BCM v0.03 is here! [!]
    By encode in forum Data Compression
    Replies: 25
    Last Post: 14th February 2009, 15:42

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •