Results 1 to 4 of 4

Thread: A possible trick for rangecoders

  1. #1
    Member
    Join Date
    Dec 2010
    Location
    Netherlands
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts

    A possible trick for rangecoders

    Lately I've been trying to understand how PARCODER from Shelwien works. And I may have found a possible trick to it that saves some bytes. Perhaps this could be useful for other rangecoders as well.
    Maybe this is completely outdated or maybe I've even overlooked something and it doesn't really work all too well, but I thought I should mention it anyway. Note that I'm quite a beginner with compression so I could use incorrect terms or explain something oddly/incorrectly perhaps. I'm just a beginner who thinks he may have found some trick to it

    In this compressor a 256 table is initialised with 01h for each byte and then the range gets divided by 100h (256) at the first stage. But I figured that often not all bytes are used in the input file, especially not with text files. Which means that you could divide the range by a smaller number and thus create bigger ranges to work with.

    In the output file the first byte from the actual compressor engine is the carry byte which, as far as I can tell, never gets higher than 01h (Heck, I've never even seen it being a 01h ). From this carry byte that leaves us with 7 option/switch bits, and to be safe I'll use the left-most bit for this trick to indicate whether the trick should be used (1) or not (0).

    In the initialisation stage create a 256 table with all dwords being 00h's (This table will later on be used by the compressor engine instead of the default 256 table with all 01h's) and assign a 01h for each byte that's used in the input file. Then count the number of different bytes that are used (The 01h's) +1 to get the new divisor. This new divisor will always be higher than 01h and below 100h (Or 100h if all bytes are used, in that case skip the whole thing) which means that you can nicely store it as a byte. Store it after the carry byte and set (1) the left-most bit in the carry byte.
    The only downside to this trick is that you also need to store the new 256 table in the output, you can do that with 8 dwords (Append them after the new divisor byte) where each bit stands for a byte in the 256 table and whether it's 00h (0) or 01h (1) (Or perhaps use somekind of compression for this if possible) and thus you'll lose 32 bytes +1 for the new divisor byte, so 33 bytes total are extra added to the output.
    You actually need to somehow check if the new divisor saves more than 33 bytes, if not then just use the default 256 bytes. I'm not really sure how this can be checked efficiently.

    So yeah, there you go. The range gets divided by the number of actual bytes used from the input file. From a test with book1bwt, compression was improved with 233 bytes.

    I hope this made sense and is somehow useful (And that I didn't make a mistake somewhere), if not, oh well

    Regards,
    Last edited by TheVoid; 22nd February 2011 at 11:33.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. Parcoder is a proof-of-concept for rc vectorization, there's not much sense to tweak its compression.
    The model for parcoder was introduced somewhere in aridemo series, and there're other models too, some
    might be better for bwt data.

    http://compression.ru/sh/aridemo0.rar
    http://compression.ru/sh/aridemo1.rar
    http://compression.ru/sh/aridemo2.rar
    http://compression.ru/sh/aridemo3.rar
    http://compression.ru/sh/aridemo4.rar
    http://compression.ru/sh/aridemo5.rar
    http://compression.ru/sh/aridemo6.rar
    http://compression.ru/sh/coders6a.rar
    http://compression.ru/sh/coders6b.rar
    http://compression.ru/sh/coders6c2.rar

    2. I don't really understand what you tried to describe, but surely its possible to encode bitmaps of used symbols or
    even frequency tables, and it would slightly improve compression of order0 coder.
    However, it doesn't scale so well, and makes things really complicated when used with higher order models

  3. #3
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    ??? ? ??? ??????????? - ?? ???? ???????????? ????? ? ??????? ????????? ??????????? ????????
    ???????? ? 1 ?? 0. ?? ??? ??????? ????? 233 ????? ?? book1bwt

  4. #4
    Member
    Join Date
    Dec 2010
    Location
    Netherlands
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    So I wasn't making much sense afterall

    I guess in order to understand what I was talking about you would need to know how it works on Assembly level. But it seems 2 pretty much answers it Shelwien.

    And thanks for the links Shelwien, I'll check those out.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •