22nd February 2011, 11:30
A possible trick for rangecoders
Lately I've been trying to understand how PARCODER from Shelwien works. And I may have found a possible trick to it that saves some bytes. Perhaps this could be useful for other rangecoders as well.
Maybe this is completely outdated or maybe I've even overlooked something and it doesn't really work all too well, but I thought I should mention it anyway. Note that I'm quite a beginner with compression so I could use incorrect terms or explain something oddly/incorrectly perhaps. I'm just a beginner who thinks he may have found some trick to it
In this compressor a 256 table is initialised with 01h for each byte and then the range gets divided by 100h (256) at the first stage. But I figured that often not all bytes are used in the input file, especially not with text files. Which means that you could divide the range by a smaller number and thus create bigger ranges to work with.
In the output file the first byte from the actual compressor engine is the carry byte which, as far as I can tell, never gets higher than 01h (Heck, I've never even seen it being a 01h ). From this carry byte that leaves us with 7 option/switch bits, and to be safe I'll use the left-most bit for this trick to indicate whether the trick should be used (1) or not (0).
In the initialisation stage create a 256 table with all dwords being 00h's (This table will later on be used by the compressor engine instead of the default 256 table with all 01h's) and assign a 01h for each byte that's used in the input file. Then count the number of different bytes that are used (The 01h's) +1 to get the new divisor. This new divisor will always be higher than 01h and below 100h (Or 100h if all bytes are used, in that case skip the whole thing) which means that you can nicely store it as a byte. Store it after the carry byte and set (1) the left-most bit in the carry byte.
The only downside to this trick is that you also need to store the new 256 table in the output, you can do that with 8 dwords (Append them after the new divisor byte) where each bit stands for a byte in the 256 table and whether it's 00h (0) or 01h (1) (Or perhaps use somekind of compression for this if possible) and thus you'll lose 32 bytes +1 for the new divisor byte, so 33 bytes total are extra added to the output.
You actually need to somehow check if the new divisor saves more than 33 bytes, if not then just use the default 256 bytes. I'm not really sure how this can be checked efficiently.
So yeah, there you go. The range gets divided by the number of actual bytes used from the input file. From a test with book1bwt, compression was improved with 233 bytes.
I hope this made sense and is somehow useful (And that I didn't make a mistake somewhere), if not, oh well
Last edited by TheVoid; 22nd February 2011 at 11:33.
22nd February 2011, 12:58
1. Parcoder is a proof-of-concept for rc vectorization, there's not much sense to tweak its compression.
The model for parcoder was introduced somewhere in aridemo series, and there're other models too, some
might be better for bwt data.
2. I don't really understand what you tried to describe, but surely its possible to encode bitmaps of used symbols or
even frequency tables, and it would slightly improve compression of order0 coder.
However, it doesn't scale so well, and makes things really complicated when used with higher order models
22nd February 2011, 13:03
??? ? ??? ??????????? - ?? ???? ???????????? ????? ? ??????? ????????? ??????????? ????????
???????? ? 1 ?? 0. ?? ??? ??????? ????? 233 ????? ?? book1bwt
22nd February 2011, 19:42
So I wasn't making much sense afterall
I guess in order to understand what I was talking about you would need to know how it works on Assembly level. But it seems 2 pretty much answers it Shelwien.
And thanks for the links Shelwien, I'll check those out.
Tags for this Thread