27th December 2015, 02:03
New here: Misc Efforts
so, I have made various compressors generally for other purposes.
a few designs are at least vaguely notable:
which, while seemingly otherwise a slight tweak on Deflate, gets BZIP2-like compression without sacrificing much speed, and fairly easily allows an encoder/decoder that is bitstream-compatible with Deflate (IOW: less code if Deflate is also in use).
on my K10 it can get in-memory decode speeds of around 475 MB/sec, though it falls behind ZLIBH on pure Huffman benchmarks (~140 MB/s vs 205 MB/s on this PC).
and a video codec:
which, while not winning much in terms of epic speeds or bitrate, does at least do pretty ok at both, and also has fairly solid encode speeds (so it works pretty well for screen capture). it is lightweight enough to do real-time video encoding on a single CPU thread on the Raspberry Pi 2, and can stream webcam video from the original RasPi (mostly for robotics uses).
ADD: decoding a video to BGRA via its VfW API (single threaded), I get around 150 Mpix/sec (1680x1050 at ~90 fps).
XviD generally gets around 105 Mpix/sec, and H.264 around 160.
some of my past, faster VQ codecs got around 200 Mpix/sec per thread on the BGRA path, but had worse bitrate and quality. this one can at least give ok video quality at around 0.2-0.4 bpp (the video in question was 0.12 bpp).
ADD2: it turns out a bug caused the benchmark to do a needless BGRA-to-RGBA conversion; fixing it gives around 225 Mpix/sec. another optimization (skipping the decoding of skipped blocks) bumped BGRA decode to ~280-384 Mpix/sec.
it uses a combination of SMTF and AdRice for a lot of its entropy coding. it was based on an earlier MTF+AdRice LZ77 compressor, which gave better compression than Deflate in the sub-500-1000-byte range, mostly (I think) due to a lower constant overhead.
though not quite as fast, it does show up ok in entropy benchmarks:
where I had added a few of mine at the bottom:
BGB ZH0 = BTLZH used as a plain Huffman coder (basically, LZ matches are disabled in the encoder, but that is about it);
BGB MTFRice0 = SMTF+AdRice.
note SMTF here:
noting that it is at least in a similar speed range to some of the other Huffman compressors.
granted, its compression is often worse than static Huffman, but it tends to be pretty close; it has the partial advantage of supporting adaptive coding, and it is reasonably simple and cheap (relatively little state or memory is required compared with Huffman, and less code is needed to encode or decode it).
while lookup tables are used (for speed), they are effectively constant (e.g., you could put them in ROM on an MCU or similar).
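to illustrate the "effectively constant" point: the unary-prefix part of a Rice decode can be driven by a table that depends only on the next 8 bits of the stream, never on adaptive state. the layout below is my own guess at a generic version, not BTLZH's actual tables:

```c
#include <stdint.h>

/* For an 8-bit peek of the bitstream, give the count of leading 1 bits
   (the Rice quotient, if the unary run terminates within the byte) and
   the number of bits consumed including the stop bit. len == 8 with
   q == 8 signals "run continues into the next byte". Since nothing here
   depends on adaptive state, the table could live in ROM. */
typedef struct { uint8_t q; uint8_t len; } UnaryEnt;

void build_unary_table(UnaryEnt tab[256]) {
    for (int b = 0; b < 256; b++) {
        int q = 0;
        while (q < 8 && (b & (0x80 >> q)))
            q++;
        tab[b].q   = (uint8_t)q;
        tab[b].len = (uint8_t)(q < 8 ? q + 1 : 8);
    }
}
```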
granted, it may be the case that FSE or ANS renders all this fairly moot, but oh well...
Last edited by cr88192; 28th December 2015 at 20:01.
29th December 2015, 01:03
It's a good thing to try these variations.
Even if they don't end up beating world records, they help you get a better understanding of how it works.
This objective alone is worthwhile.
29th December 2015, 04:42
yeah. the entropy coding strategies are neither the best compression nor the fastest around.
but, at least, they are reasonably straightforward to implement and are reasonably well behaved.
in my case, both the AdRice coding and the Huffman coding are handled pretty similarly: both basically boil down to "here are some bitstream bits, shove them through a lookup table and advance the stream N bits". SMTF adds a few conditionals and some swapping, but manages not to kill speed. while it seems like it should suck pretty badly in terms of compression, it manages to work pretty ok (pretty similar to Adaptive Huffman, but much faster). the "adaptive" aspect is given by a 4-bit k-factor, which is updated for each symbol (via the decode lookup), along with updating the contents of the SMTF table.
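a toy round-trip of the SMTF + adaptive-Rice idea described above might look roughly like this. the bit layout, the k-update rule, and the swap-toward-front step are my guesses at a generic version, not BTIC1H's actual bitstream:

```c
#include <stdint.h>

/* Trivial MSB-first bit buffer (buf must start zeroed for encoding). */
typedef struct { uint8_t buf[4096]; int pos; } Bits;
static void put_bit(Bits *b, int v) {
    if (v) b->buf[b->pos >> 3] |= 0x80 >> (b->pos & 7);
    b->pos++;
}
static int get_bit(Bits *b) {
    int v = (b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1;
    b->pos++;
    return v;
}

/* Rice code: unary quotient (q ones, then a 0), then k raw low bits. */
static void rice_put(Bits *b, unsigned v, int k) {
    unsigned q = v >> k;
    while (q--) put_bit(b, 1);
    put_bit(b, 0);
    for (int i = k - 1; i >= 0; i--) put_bit(b, (v >> i) & 1);
}
static unsigned rice_get(Bits *b, int k) {
    unsigned q = 0, v;
    while (get_bit(b)) q++;
    v = q << k;
    for (int i = k - 1; i >= 0; i--) v |= (unsigned)get_bit(b) << i;
    return v;
}

/* Per-symbol k update (hypothetical rule): shrink k when the quotient
   was zero, grow it when the quotient was large. */
static int adapt_k(int k, unsigned v) {
    unsigned q = v >> k;
    if (q == 0 && k > 0) k--;
    else if (q > 1 && k < 15) k++;
    return k;
}

/* SMTF state: symbol table plus its inverse (symbol -> index). */
typedef struct { uint8_t tab[256]; uint8_t inv[256]; } Smtf;
static void smtf_init(Smtf *s) {
    for (int i = 0; i < 256; i++) { s->tab[i] = (uint8_t)i; s->inv[i] = (uint8_t)i; }
}
/* Swap the used symbol one slot toward the front (cheaper than full MTF). */
static void smtf_update(Smtf *s, int i) {
    if (i > 0) {
        uint8_t a = s->tab[i], b = s->tab[i - 1];
        s->tab[i - 1] = a; s->tab[i] = b;
        s->inv[a] = (uint8_t)(i - 1); s->inv[b] = (uint8_t)i;
    }
}

void smtf_rice_encode(Bits *b, const uint8_t *src, int n) {
    Smtf s; int k = 4;
    smtf_init(&s);
    for (int j = 0; j < n; j++) {
        int i = s.inv[src[j]];         /* symbol -> SMTF index */
        rice_put(b, (unsigned)i, k);
        k = adapt_k(k, (unsigned)i);
        smtf_update(&s, i);
    }
}
void smtf_rice_decode(Bits *b, uint8_t *dst, int n) {
    Smtf s; int k = 4;
    smtf_init(&s);
    for (int j = 0; j < n; j++) {
        unsigned i = rice_get(b, k);
        dst[j] = s.tab[i];             /* SMTF index -> symbol */
        k = adapt_k(k, i);
        smtf_update(&s, (int)i);
    }
}
```

the encoder and decoder apply the same k update and the same swap after each symbol, so the two stay in lockstep without any side information.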
for the video codec, blocky VQ may also seem like archaic, nasty technology (vs DCT or other transform-based designs), but I was drawn to it mostly because, IME, it goes so much faster, and with a lot less "heavy lifting" needed to make it fast, than is common with other approaches. blocky VQ isn't really new, though most traditional blocky-VQ codecs (such as RPZA or Cinepak) tend to be pretty awful in terms of image quality and bitrate. combining the speed of VQ with better compression and image quality seems like it could be useful for at least some use-cases.
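the core reason blocky-VQ decoding is so cheap is visible in a two-color block sketch like the one below: no transform, no per-pixel arithmetic, just a table/mask lookup per pixel. this is only a generic illustration of the technique, not BTIC1H's actual format:

```c
#include <stdint.h>

/* Decode one 4x4 block into a BGRA/RGBA surface: two endpoint colors
   plus a 16-bit mask, one bit per pixel selecting which color to use.
   (Real codecs like RPZA also have 4-color and raw-block modes.) */
void vq_block_decode(uint32_t *dst, int stride,
                     uint32_t color_a, uint32_t color_b, uint16_t mask)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                ((mask >> (y * 4 + x)) & 1) ? color_b : color_a;
}
```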
though, I suspect most people automatically look down on it, given the number of "dude, just use MJPEG / H.264 / ..." responses I have gotten. but I have my reasons, and the bitrate/quality isn't actually all that bad (ex: 1 hour of 1050p30 screen capture in 4GB, averaging 0.16 bpp and 8 Mbps, while still having pretty acceptable video quality and using about 10-15% CPU load).
ADD: my main BTLZH encoder (BTLZA) does support range coding in a manner similar to VP8/VP9 (bitwise range coding applied to the Huffman-coded output), but it isn't really used, as it rarely did much to improve compression and had the drawback of making things a lot slower.
was working on CBR encoding for BTIC1H, but for full-motion 720p or 1080p, quality goes to crap much below 8 Mbps (it looks ok at 8 Mbps, but 3-5 Mbps looks pretty poor). better would be possible if the encoder were smarter, but tradeoffs are made in the name of real-time encoding.
Last edited by cr88192; 30th December 2015 at 02:19.