
Thread: State of the art byte compression (for 8-bit computers)

  1. #1
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts

    State of the art byte compression (for 8-bit computers)

    I suppose I have a very special interest. I write programs for ZX Spectrum and compression techniques that we tend to use there are all relatively simplistic. I wrote a survey of what is available for a Z80 programmer last year; you can find it here (if you don't mind that it is in Russian): http://hype.retroscene.org/blog/dev/740.html (being one year old it is not fully up to date anymore, but still mostly relevant). Historically, the compression on ZX Spectrum went from RLE, to byte-based LZ77-style affairs, to mixed bit/byte-based LZ77-style variations. People tried and implemented compressors using Huffman codes (Exomizer in my survey uses it), but it tends to be useful only at the top end of compression. There are attempts at using arithmetic coders, even LZMA, but they tend to be very narrow in scope, because of the performance issues and memory limitations during decompression. Useful decompressors cannot be too complex (so that they do not occupy too much memory). Useful decompressors cannot use (large) memory buffers (Exomizer uses ~160 bytes and it is already considered a limitation).

    Basically, competitive compressors at the top end tend to be sophisticated variations on the LZ77 pattern, e.g. with partial matches (as in "Hrust 1") or with mechanisms for re-using recent offsets (as in "aPLib"). They have to have well-optimizing compressors, which do not need to be particularly fast, because no-one ever deals with files much larger than ~40Kb. These techniques are fairly well understood and mature.

    However, if one is interested in achieving high decompression speed, one pretty much has to use byte-based LZ77-type compression techniques, which have been mostly forgotten since the 1990s. Hence, most of these methods are now being re-discovered, partly due to the renaissance of such methods on PCs. For example, there are several ports of LZ4 for Z80. If one plots a diagram showing compression ratio vs decompression speed, one can make a guess at the Pareto frontier as applied to Z80-based systems, such as ZX Spectrum. This is what the result from my tests looks like:

    [Attachment 6119: 3beafc.png - compression ratio vs. decompression speed for Z80 decompressors]

    It is therefore clear to me that there is a substantial gap in what is currently available. It seems very likely that compressors with relatively high decompression speeds can be made to compress significantly better than LZ4. However, my experience tells me that to retain high decompression speeds we pretty much must forget about using bit-based (or mixed bit/byte-based) data streams. This is where my question comes from.

    What would you consider to be the current "best" (subjectively, of course) byte-based compression techniques? I tried to search around, but it is not always easy to find explicit information about the specific compression methods used. I can see that LZF would be an example of such a technique; it definitely compresses better than LZ4, but is still quite far behind even the simplest mixed bit/byte-based LZ77 compressors. How close can the byte-based compressors get to the mixed ones? Are there any (many?) successful examples of such compressors?

  2. The Following User Says Thank You to introspec For This Useful Post:

    svpv (9th November 2018)

  3. #2
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Is this LZ4 or LZ4HC? It seems that with such a small input file you can go for optimal parsing, so the limitation then becomes the file format. LZ4 is a simple format so even with optimal parsing it's not going to match some other tools, but it may work OK.

    There's also Lizard (formerly LZ5): https://github.com/inikep/lizard

  4. #3
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Well, to obtain my results I used smallz4, so it is an optimal parser indeed. I had a look at Lizard as well. Its byte-based version ("lizard -29") has a worse compression ratio than LZF.
    Starting with "lizard -30" it uses some form of Huffman coding, which makes it definitely too slow in my book (and frankly, "lizard -49" compresses only about 1% better than LZF, so it is worse compression-ratio-wise than every single compressor in my diagram, except for LZ4).

  5. #4
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    - in 1995 there were tools on CP/M for Z80 like ARK / ARC

    - in 2018 I think for Z80 you should try LZO compression:
    (http://www.oberhumer.com/opensource/lzo/)

    - there is an LZ4 decoder version for Z80:
    (https://groups.google.com/forum/#!to...4c/A6TLHThL0c8)

    - decoders (for Z80) exist too for *.arj, *.zip

    best regards

  6. #5
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    @joerg, I am pretty sure that ARC used LZW. I would be very surprised if it could be competitive without Huffman on top and, either way, it definitely would not fit within my specified memory footprint. I do know that *.arj, *.zip, even *.rar (old versions) can be decompressed on Z80. However, the resulting performance is very, very poor. The memory footprint is also not very good. Hence, none of these standard solutions fit the described profile. None of these are byte-based packers, by the way.

    In addition, I know of at least four different LZ4 decoders for Z80; in fact, I wrote the one that is closest to the Pareto frontier on my diagram. However, I asked about formats that would increase the compression ratio compared to LZ4 as much as possible, while not using mixed bit/byte data stream formats.

    Last, but not least, I am currently trying to understand the decompressor for LZO1X-999; I find the code really impenetrable. However, yes, this one is on my todo list, because its compression ratio is halfway between LZ4 and ZX7 (which is a ZX-specific simplistic bit/byte packer with optimal parsing) and because its speed seems to suggest that it may be byte-based.
    Last edited by introspec; 23rd August 2018 at 01:04.

  7. #6
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    for the 8-bit era, I remember PMARC did the best.

  8. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I think BWT might be interesting there. bzip2?
    Simple unBWT is like 3 lines of C code and MTF/RLE are not much more either.
    A ready-to-use version fit for ZX might not exist, but should be easy to make.
    https://en.wikipedia.org/wiki/Burrow...eler_transform
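    For illustration, here is a minimal inverse BWT sketch in C (illustrative only, not taken from any particular tool; it assumes the primary index, i.e. the row of the original string among the sorted rotations, is stored alongside the block):
    Code:
    #include <stdlib.h>
    
    /* L = BWT output (last column), n = block size, primary = stored index.
       Note the LF[] table costs a few bytes per input byte. */
    void unbwt(const unsigned char *L, unsigned char *out, int n, int primary)
    {
        int count[256] = {0};
        int *LF = malloc(n * sizeof(int));
        int i, p, sum = 0;
    
        for (i = 0; i < n; i++) LF[i] = count[L[i]]++;    /* rank of L[i] among equal bytes before it */
        for (i = 0; i < 256; i++) { int c = count[i]; count[i] = sum; sum += c; }
        for (i = 0; i < n; i++) LF[i] += count[L[i]];     /* LF[i]: where L[i] sits in the sorted first column */
        for (i = n - 1, p = primary; i >= 0; i--) {       /* walk the cycle, emitting the text backwards */
            out[i] = L[p];
            p = LF[p];
        }
        free(LF);
    }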

  9. #8
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @introspec: there was a thread in this forum: Ultra-fast LZ from encode
    ---
    https://encode.ru/threads/550-Ultra-...ght=ultra+fast
    ---
    ULZ uses only byte I/O - no Huffman, no arithmetic compression, not even bit I/O. I'm still wondering how, with such a simple compression format and such a simple decompressor (a few lines of code), it achieves the compression it does
    ---
    http://compressme.net/#downloads ... Byte-aligned LZ77 compressor a la LZ4/LZOP with an optimal parsing and ultra fast decompression
    ---
    http://mattmahoney.net/dc/text.html ... the world's fastest decompressor (search for ulz)
    ---
    you should consult encode directly, because he has not published the source code or algorithm

    best regards

  10. The Following User Says Thank You to joerg For This Useful Post:

    encode (9th November 2018)

  11. #9
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    @roytam, I do not know much about PMARC, but I know that one of its compression formats - POPCOM - is a direct predecessor of LZ7 (it is an LZSS with an Elias gamma code for match lengths and two types of offsets). Frankly, I am pretty sure that these old 1990s technologies are pretty much irrelevant nowadays, because no-one had optimal parsers at the time and even lazy-evaluation optimizers were not nearly as advanced as they are now. Basically, modern compressors are very complex and adaptable compared to the compressors of old. However, what works best for 8-bit platforms are techniques exploiting asymmetry to the maximum: for the small files relevant to 8-bitters, one can run a truly optimal (or nearly optimal) parser on a PC and squeeze everything out of a specific compression format. Thus, the format itself comes to the forefront, and this is the reason why I am looking at formats so closely.

    @Shelwien, well, I'll say honestly, I am complaining about performance issues, and you propose BWT!!! However, I like this crazy idea, because it may well be useful for tiny intros. When I have more time, I'll definitely experiment with it.

    @joerg, ULZ has about 1% worse compression ratio than LZ4, which is why I am not considering it. However, I have already implemented an LZF decompressor, for another of encode's packers, and it is definitely in the correct ballpark for speed. It does not close the gap completely, but it reduces it quite a bit.
    Last edited by introspec; 23rd August 2018 at 14:50.

  12. #10
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    909
    Thanks
    531
    Thanked 359 Times in 267 Posts
    Here is a site with C64 packers. As I remember there was a very efficient compressor called Cruel Cruncher, but there could be some more powerful ones there.
    ftp://www.zimmers.net/pub/cbm/c64/packers/index.html

  13. #11
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    @Darek, the comment for Cruel Cruncher says "1988 RGJ of ATG, Executable arithm. packer v2.2+". I guess it is very likely to be a compressor incorporating an arithmetic coder, i.e. definitely slow and definitely not something byte-aligned.

    It also says at the top of your page: "These crunchers and packers are presented here mainly for historical reasons. If you are looking for a fast and decent cruncher for serious use, check Pasi Ojala's PuCrunch, a cross-platform cruncher." Funnily enough, the diagram that I included in my first post contains the results of my tests for PuCrunch. It is clear that, from the point of view of a modern Z80 programmer, PuCrunch is also no longer relevant. Its compression ratio is worse than that of Exomizer, aPLib or Hrust 1; its decompression is slower than even the top-compressing Exomizer.

  14. The Following User Says Thank You to introspec For This Useful Post:

    Darek (23rd August 2018)

  15. #12
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Since some posts here disappeared, I am going to write a brief summary of what I have found about byte-based compression since I started this thread.

    1. By far the best discussions of the issues and relevant ideas for byte-aligned compressors can be found in blog posts by Charles Bloom, esp. his series of posts: https://cbloomrants.blogspot.com/201...nclusions.html

    2. I do not have access to Charles Bloom's Oodle compression suite. It is very likely, from the available compression test results, that Oodle's faster packers (Selkie and, possibly, Mermaid) are byte-aligned. Their modern design means that they are very likely to be the best currently available compressors for compression ratio vs decompression speed. They seem to tick all the boxes in the requirements listed in my original post. Nevertheless, since none of my programming projects pay any money, and Oodle is neither publicly available nor cheap, I cannot really discuss it.

    3. These are some selected results from my recent tests:
    Code:
    lzoma 0.2 c,7,100000                       138732
    Exomizer 2.11                              139579
    Appack (ver. by r57shell, 17/10/2017)      140724
    
    ZX7                                        153912
    bCrush 0.1.0 --optimal                     154826
    Crush 1.1 -9                               155697
    Crush 1.00 cx                              157332
    --------------------------------------------------
    lzop 1.03 -9                               158095
    LZ5 1.4.1 -15                              158632
    lzop 1.03 -7                               158654
    
    doboz                                      161924
    LZ5 1.5.0 -15                              163431
    
    LZF 1.03 cx                                167634
    Lizard -29                                 170601
    
    smallz4 1.2 -9                             173235
    --------------------------------------------------
    Uncompressed:                              314069
    Compressors above the first horizontal line all use mixed bit/byte streams. This flexibility gives them a higher compression ratio. Exomizer and aPLib have Z80 decompressors and both are fairly popular in appropriate applications. All compressors below the first horizontal line use byte-aligned formats of compressed data. Crush, funnily enough, seems to sit approximately at the borderline of what looks possible with publicly available byte-aligned compressors. Otherwise, Crush trails quite far behind most compressors using mixed compressed streams.

    4. The decompression speed for LZ4 on Z80 is very high: decompression takes only about 3x the time of the fastest possible memory copy, or about 1.5x the time of the "default" copying command LDIR. Thus, I am not really interested in compressors with a compression ratio below LZ4. Instead, I am more fascinated by the top end of byte-aligned compression, mainly because this is where, in my opinion, new Pareto-optimal compressors can be made. In this category there are basically the following compressors:
    • LZOP contains an implementation of LZO1X, the best byte-based compression algorithm in this test. It is interesting; I partly read it, but I am still studying it and would love to discuss it at some future point in time. However, the algorithm is somewhat complex, the source code is seriously unorthodox and the documentation is next to non-existent, so I am not overly keen.
    • LZ5 v.1.4.1 is almost as good, but comes with much clearer documentation and clarity of format, and is a pleasure to work with. The key to its good compression ratio seems to be its use of a "repeated offset" command. I implemented a simple WIP decompressor for Z80 and its speed is amazing given its compression ratio. This is especially relevant because the part of the compressed data format responsible for long (3-byte offset) matches cannot be used for small Z80 files. Thus, there is clear scope for modifying LZ5 in ways that would benefit a Z80 implementation very significantly. At the same time, other versions of LZ5/Lizard are really unpredictable offerings as far as compression ratio is concerned. The fact that LZ5 1.5 compresses much worse than LZ5 1.4.1 is weird. It seems that the author is mostly concerned with the compression/decompression speed, so the compression ratio gets overlooked.
    • Doboz also looks neat, but it cannot really compete with LZ5 v.1.4.1.
    • LZF offers a fair improvement in compression ratio compared to LZ4, but cannot compete as is with either LZO or LZ5. However, given the difference in compression ratios between LZ4 and LZ5, I am pretty confident that the addition of repeated offsets to LZF is likely to produce a byte-aligned compressor with a ratio able to compete with Crush.
    Last edited by introspec; 9th November 2018 at 15:13. Reason: Added compression results for ZX7

  16. The Following User Says Thank You to introspec For This Useful Post:

    jibz (9th November 2018)

  17. #13
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Some random thoughts...
    1) It would be nice if something like that were also ranked/labelled by the presence of compressor/decompressor source under some reasonable license (e.g. "any OSI-approved license" isn't a bad watermark, I guess). That could help in evaluating some (de)compressors for other, somewhat similar, strange uses. I've attempted lurking on Z80 resources, but it felt more like the warez scene of the 90s, where nobody gives a crap about licensing. That may be fine for some informal uses, but it kills off many other uses where similar compression properties could be handy as well.

    2) In some selected cases/configurations LZ4 manages to decompress faster than memcpy... that would be due to reduced bus contention in one direction, I guess. BLOSCLZ manages to do it more often on "larger" ARMs, but its ratio is just awful when it does.

    3) As for LZ5 1.x: take a look at the actual git commits, including the -dev branch of the 1.x flavour. Inikep has been changing both the levels and the parameters he uses in the match finder, likely "balancing" the levels or so. So -15 could mean different things across versions, and drawing conclusions from that is unreliable. It is better to see which matchfinder is actually used and with which parameters - there is a very specific table in the source that defines levels vs matchfinders vs parameters. Speaking for myself, I've managed to match crush's ratio (the original one) on the Linux kernel after tweaking the matchfinder parameters to be slower but search harder. It pays for itself on things like the Linux kernel, since there are e.g. some large areas of zeros to chop out. So obviously longer matches improve ratio, at a cost in speed. I guess it could be further improved as something like RLE + LZ, but that would complicate things.

    4) As for crush vs ratio... isn't exomizer using something like Huffman? Though its source looked rather complicated and the statement prohibiting commercial use made me rather unmotivated, if I remember correctly. Lzoma goes for rather advanced features that pay for themselves, plus strong "optimizing" parsing. So saying crush "trails behind" is only partially true, I guess.

    And I'm curious... can you share some of your data or something similar? I'm curious what your bitmaps look like (in the sense of data properties vs compression). I know some strange kinds of data can give "strange" results - and I have the idea that some of your data is of this kind as well.

  18. #14
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    1) I understand your concerns regarding licensing, but the reality is that there has been no real commercial software development for Z80 for quite a while now. I do not belong to the warez scene, and a lot of people programming Z80 nowadays do it mainly for the unusual challenge or simply for nostalgic reasons. I am not saying licenses are somehow bad; they just represent something very remote from reality for hobbyist programmers. Reading this forum I realized the limitations of this approach and will definitely try to include licenses for at least my own work in the future. From my notes:
    * LZOMA comes with a GPLv2;
    * Exomizer sources have something custom prohibiting commercial exploitation, but Z80 decompressors are under LGPLv2.1;
    * aPLib comes with a custom freeware license, its Z80 decompressors are written by the same people who wrote Z80 decompressors for Exomizer, so I am guessing LGPLv2.1 can be obtained by contacting the authors.

    2) Yes, I am aware of this, but the reason for it is mostly the much lower memory bandwidth compared to the CPU speed. If you have a Z80-based machine, the CPU is the slow guy, since memory bandwidth is rarely limited to the same extent. You are more likely to benefit from realtime decompression of disk data or, say, tape data. Hence, the main application of fast decompression is not speeding up data copying, but realtime decompression of program data. For that, the combination of speed and compression ratio offered by LZ4 is extremely competitive.

    3) You are right, I'll have to tweak the LZ5 compressor some more; I have only used ready-made binaries thus far. Anyway, like I said earlier, I am very interested in trying to make a modification of LZ5 designed specifically for Z80. This way I can re-tune the file format to be more convenient to decompress, remove the inefficiency caused by the support of very long offsets, which are irrelevant on Z80, and also play with other features. I do not have much time to do it at present, but it is definitely on my TODO list now.

    4) Exomizer is a somewhat strange beast, with a small Huffman tree transferred as part of the compressed data. It is one of the very few *practical* compressors with entropy coding on Z80. This comes at a cost: it is about 3 times slower at decompression than aPLib, and it requires a 156-byte memory buffer for decompression (the only one in the compressor shoot-out in my first post). It is used quite a lot, both on Z80 and 6502 (which is its home platform). Similarly to what you mentioned elsewhere, I do not consider Exomizer an LZ77/LZSS type coder, not in the classical sense, and I would draw the line similarly to you, by stating that it has to have a memory buffer to operate. However, to the best of my knowledge, there exist packers for Z80 which use a Huffman tree and simply keep traversing it in realtime during decompression (see the sketch below). Hence, they are crazily slow, but do not need any extra memory buffers. So "no memory buffer" is an unreliable borderline. Anyway, the reason Exomizer is in my list is simply to indicate where the compression ratios achievable with entropy coding are.
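    To illustrate what I mean by a tree-walking decoder, here is a sketch (illustrative only; the node layout and bit order are hypothetical, not those of any specific Z80 packer) - the tree shipped with the data is the only state, so no decode-table buffer is needed:
    Code:
    #include <stdint.h>
    
    typedef struct { int16_t left, right; } Node;   /* leaf when left < 0; symbol = -left - 1 */
    
    static const uint8_t *bit_src;                  /* compressed stream */
    static int bit_pos;
    
    static int get_bit(void)                        /* MSB-first bit reader */
    {
        int b = (bit_src[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1;
        bit_pos++;
        return b;
    }
    
    static int huff_decode_symbol(const Node *tree)
    {
        int n = 0;                                  /* start at the root */
        while (tree[n].left >= 0)                   /* internal node: one bit per step */
            n = get_bit() ? tree[n].right : tree[n].left;
        return -tree[n].left - 1;                   /* leaf: recover the byte value */
    }
    This also shows why such decoders are so slow: every compressed bit costs a bit-reader call and a node hop.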

    More to the point regarding Crush; the main reason why I talk about "trailing" is to do with the fact that Crush is behind the simplest mixed bit/byte compressor in my tests above (I think I talked about it in one of my lost posts). Check out the diagram with my last year's tests: https://encode.ru/attachment.php?attachmentid=6119
    All commonly used compressors fill the space between Exomizer (best compressing) and ZX7 (fastest decompressing while using a mixture of bits and bytes). LZ4 stands really far apart. This thread is mostly about me working to fill the gap between LZ4 and ZX7, basically. Now, ZX7 is a modern compressor, originating from the Spectrum community. It has a Windows compressor with optimal parsing and very well tuned decompressors. It is also very primitive (LZSS-style bit flags signalling literals/matches, Elias gamma codes for match lengths, 7/11 bit codes for offsets, with an extra bit selecting the current range of offsets). Basically, in my personal opinion, you would struggle to be more primitive than ZX7 in a modern mixed bit/byte compressor. Unsurprisingly, it does not actually compress all that well compared to the competition. Crush, even with optimal parsing, is definitely behind ZX7 in all of my tests, and this is the reason why I do not believe it can be relevant on Z80. Just to illustrate my point, I will now edit my table of compression results and add ZX7 to it.
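    To make the contrast with byte-aligned formats concrete, a mixed bit/byte decoder of this general kind boils down to a loop like the one below. This is a schematic in C only - NOT the actual ZX7 bitstream - built from the ingredients listed above (a flag bit per item, Elias gamma match lengths, a one-byte short offset with an extension bit pulling extra offset bits from the bit stream):
    Code:
    #include <stdint.h>
    #include <stddef.h>
    
    static const uint8_t *in;                       /* compressed stream: bytes and bits interleaved */
    static uint8_t bitbuf;
    static int bitcnt;
    
    static int get_bit(void)
    {
        if (bitcnt == 0) { bitbuf = *in++; bitcnt = 8; }   /* refill the bit reservoir */
        bitcnt--;
        return (bitbuf >> bitcnt) & 1;
    }
    
    static unsigned get_gamma(void)                 /* Elias gamma code, returns >= 1 */
    {
        unsigned n = 0, value = 1;
        while (get_bit() == 0) n++;                 /* unary prefix: number of extra bits */
        while (n--) value = (value << 1) | get_bit();
        return value;
    }
    
    void unpack_sketch(const uint8_t *src, uint8_t *dst, size_t outlen)
    {
        uint8_t *end = dst + outlen;
        in = src;  bitcnt = 0;
    
        while (dst < end) {
            if (get_bit() == 0) {                   /* flag 0: literal, taken as a whole byte */
                *dst++ = *in++;
            } else {                                /* flag 1: match */
                unsigned len    = get_gamma() + 1;  /* schematic minimum match length of 2 */
                unsigned offset = (unsigned)*in++ + 1;     /* short offset: one whole byte */
                if (get_bit()) {                    /* extension bit: a few more high offset bits */
                    unsigned hi = 0;
                    int k;
                    for (k = 0; k < 3; k++) hi = (hi << 1) | get_bit();
                    offset += hi << 8;
                }
                const uint8_t *p = dst - offset;
                while (len--) *dst++ = *p++;        /* byte-by-byte copy handles overlapping matches */
            }
        }
    }
    Every literal and every match field here costs one or more calls into the bit reader; that per-item overhead is exactly what a byte-aligned format avoids.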

    Last, but not least. All my tests are based on two test sets.
    * http://introspec.retroscene.org/comp...estset_full.7z is my main test set, which is used to generate diagrams like the one I attached and measure decompression speeds.
    For practical reasons, I am also using a much smaller test set that is a subset of the main test set:
    * http://introspec.retroscene.org/comp...set_express.7z

    Files in both sets are organized into 5 folders:
    * calgary - all files from the Calgary corpus <=64Kb.
    * canterbury - all files from the Canterbury corpus <=64Kb.
    * graphics - uncompressed ZX Spectrum-specific images in several formats.
    * music - uncompressed ZX Spectrum music in several tracker formats, as well as some beeper music, which also contains replayer code.
    * mixedzx - a mixture of actual binaries from games and demos, with mixed code and graphics. Some are uncompressed, some are partly compressed, some are fairly well compressed.

    In practice, folder "music" tends to behave similarly to folders "calgary" and "canterbury", whereas "graphics" is remarkably similar to "mixedzx".
    Last edited by introspec; 9th November 2018 at 18:05. Reason: Added the information about my test sets

  19. #15
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by introspec View Post
    How close can the byte-based compressors get to the mixed ones?
    I learned English from a reference manual of Z80 machine language and can sympathize with your cause.

    Gipfeli might be interesting to you. It codes literal symbols in 6, 8 or 10 bit sequences for 7.77 % savings over byte-based approaches like snappy or lz4, and it gets another 7.77 % from static entropy coding of distances and lengths. Gipfeli has been overshadowed by more recent work and is not interesting to look at for general purpose compression, but it is a minimal step up from byte-based compressors.

  20. #16
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    I guess trying to run compression on things like microcontrollers or unusual (pre-boot/bare metal/etc.) environments, or on "strange" data for specific purposes, is also more of an "unusual challenge" or curiosity than anything else. Though I try to keep it more pragmatic, and some things could eventually find some real-world use. This prompts me to consider more mundane things like licenses. It's boring; in a perfect world I wouldn't care about it, but in the real world I consider it a "declaration of intentions" by the program's authors, to get an idea of what to expect next, whether there is some catch, etc. That way it doesn't even look totally silly.

    As for "CPU is the slow guy, since memory bandwidth is rarely limited to the same extent" - speeding up RAM is "rather exotic" by any standard. I've had some fun with 68xx 8-bit things, so I have coarse idea on tradeoffs.

    As for LZ5, tracking the git history of 1.x (including the -dev branch) is not a bad idea to see all the tweaks around match finders and level tuning. At some point even the number of levels changed (some LZ5 1.x versions had something like 12 levels, some 16, if I remember correctly). I guess some things were never released as binaries, and eventually it turned into the (quite different) LZ5 2.0 that looks a bit like "zstd-lite". One can try to chop a few extra bytes vs the stock version by using more aggressive matchfinder limits, but I imagine that is mostly rewarding on large, well-compressible data. Somehow I caught myself never having looked at how it works for smaller data. I asked Inikep once about the "technical" limits, and he has been kind enough to put them into the source, near the levels table; the overall appearance of the table gives a good idea of how the levels are created and what is changed. The technical limits are basically where the algorithm could/would break or lose compatibility with the formal spec, so pushing to them may or may not make sense. Either way, what it outputs at the maximum level isn't necessarily the absolute best it can do.

    As for crush... while it can't do wonders, using a smaller W_BITS in bcrush (it should match in both the encoder and the decoder) could help with the ratio for "small" data.
    It could look like this:
    - The best "stock" bcrush -x does on a particular mixed binary: 262144 -> 130634
    - The best I was able to get with W_BITS changed adequately (256K file -> W_BITS=18, matched in the encoder and decoder): 128991.

    Or, on 3600 bytes of code, stock bcrush -x goes to 2567, while W_BITS = 17 goes down to 2518 instead (this file isn't terribly redundant; lzoma can do 2187 at best). Setting W_BITS below the size of the block/file to chew on obviously hurts as well.

    A similar idea holds for many other files. It wouldn't beat LZOMA, but it is not terribly bad for a virtually free lunch obtained in about 5 minutes of looking at the bcrush source. I guess Z80-sized things are likely to do best with W_BITS=17 (which seems to be the technical minimum; below that things can break).
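    In practice the change is a one-liner in the source, rebuilt identically on both sides (the exact declaration and its location differ between crush and bcrush versions, so treat this as a guess at the spelling rather than a patch):
    Code:
    /* window size constant in the crush/bcrush source: must be identical
       in the compressor and the decompressor builds */
    const int W_BITS = 17;   /* stock value is larger; ~17 seems to be the practical minimum */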

    As for filling the gap... I see, there is a large gap, and I can imagine it is caused by the difference between being "byte aligned" and "bit aligned" and the resulting change in decoding complexity. I guess trying to fit in between could be challenging, possibly demanding some relatively unusual/hybrid approach. Maybe something in the spirit of https://cbloomrants.blogspot.com/201...-12-lznib.html if the goal is to fill that gap? Also something from http://cbloomrants.blogspot.com/2012...ors-leave.html could make sense. The most obvious problem is that dealing with units smaller than a byte when the "CPU is the slow guy" could hurt speed considerably, especially if the "slow guy" lacks efficient bit-level operations (I have no reasonable expertise in Z80 assembly).

  21. #17
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by xcrh View Post
    As for LZ5, tracking the git history of 1.x (including the -dev branch) is not a bad idea to see all the tweaks around match finders and level tuning.
    Yes, I already downloaded all versions. I will go through the parser settings; however, I believe it will be more important to re-define the file format.

    The existing format looks like this:
    [1_OO_LL_MMM] [OOOOOOOO] - 10-bit offset, 3-bit match length, 2-bit literal length
    [00_LLL_MMM] [OOOOOOOO] [OOOOOOOO] - 16-bit offset, 3-bit match length, 3-bit literal length
    [010_LL_MMM] [OOOOOOOO] [OOOOOOOO] [OOOOOOOO] - 24-bit offset, 3-bit match length, 2-bit literal length
    [011_LL_MMM] - last offset, 3-bit match length, 2-bit literal length
    First, 24-bit offsets are truly meaningless in 8-bit world. Hence, tokens 010llmmm are not used at all for my data.
    Second, 16-bit offsets are a little too long too, so I am sure there is some win in re-purposing some of these bits for something else.
    Third, the short offsets are a little too long as well, as the experience from other Z80 packers suggests.
    Fourth, the minimum match length does not depend on the offset length. This is inefficient, as suggested e.g. by jibz's work.
    Fifth, the "free" code 010llmmm can be used, for example, as the offset before last. Not quite LZMA, but still an opportunity to represent more complex structures in data.
    Plenty of possibilities, just not enough time in a day... (see the decoder sketch below)
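    For reference, dispatching on these tokens would look roughly like this. It is NOT a working LZ5 1.x decoder - the length bases, the extension codes for longer literal/match runs, end-of-block handling and the real offset byte order are all glossed over - it only shows the shape of the token decoding:
    Code:
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    
    void decode_sketch(const uint8_t *src, size_t srclen, uint8_t *dst)
    {
        const uint8_t *end = src + srclen;
        uint32_t last_offset = 1;
    
        while (src < end) {
            uint8_t  t    = *src++;
            uint32_t mlen = t & 7;                     /* MMM: low 3 bits of every token */
            uint32_t litlen, offset;
    
            if (t & 0x80) {                            /* 1_OO_LL_MMM + 1 byte: 10-bit offset */
                litlen = (t >> 3) & 3;
                offset = (((uint32_t)t >> 5) & 3) << 8 | *src++;
            } else if ((t & 0xC0) == 0x00) {           /* 00_LLL_MMM + 2 bytes: 16-bit offset */
                litlen = (t >> 3) & 7;
                offset = (uint32_t)src[0] << 8 | src[1];  src += 2;
            } else if ((t & 0xE0) == 0x40) {           /* 010_LL_MMM + 3 bytes: 24-bit offset */
                litlen = (t >> 3) & 3;
                offset = (uint32_t)src[0] << 16 | (uint32_t)src[1] << 8 | src[2];  src += 3;
            } else {                                   /* 011_LL_MMM: re-use the last offset */
                litlen = (t >> 3) & 3;
                offset = last_offset;
            }
    
            memcpy(dst, src, litlen);                  /* literals come before the match */
            dst += litlen;  src += litlen;
    
            {                                          /* overlap-safe byte-by-byte match copy */
                const uint8_t *p = dst - offset;
                while (mlen--) *dst++ = *p++;
            }
            last_offset = offset;
        }
    }
    Note how everything is fetched on byte boundaries; this is where the decompression speed advantage over bit-stream formats comes from.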


    Quote Originally Posted by xcrh View Post
    As for crush... while it can't do wonders, using a smaller W_BITS in bcrush (it should match in both the encoder and the decoder) could help with the ratio for "small" data.
    This type of thing has been successfully attempted on the Spectrum. I mentioned ZX7 several times before. Also on the same diagram is something called Pletter 5c. It is, well, not the same, but almost the same thing as if someone took LZ7 with 7 possible offset codes, tried each of them with an optimal parser and made a note, as part of the compressed data file, of the one that worked best. Charles Bloom calls this "parameter optimization": http://cbloomrants.blogspot.com/2012...ors-leave.html
    Of course, he did not invent it

    Quote Originally Posted by xcrh View Post
    As for filling the gap... I see, there is a large gap, and I can imagine it is caused by the difference between being "byte aligned" and "bit aligned" and the resulting change in decoding complexity.
    No, it is only there for historical reasons. Byte packers were the thing in the early 1990s. Then everyone switched to complex LZ77/LZSS-style compressors, and the byte packers were mostly forgotten. Of course, my intuition at the start of the project was that the transition from byte packers to bit packers on PC is very smooth, with one technology gradually overtaking the other as the compression ratio grows. Hence, I expected to see the same on Z80. This is what my current preliminary results look like:
    [Attachment 6288: Clipboard02.png - updated compression ratio vs. decompression speed diagram with the old and new Pareto frontiers]
    (The dashed line is the old Pareto frontier, the solid line is the new one; the decompressors on the new Pareto frontier for aPLib, Pletter5d, MegaLZ, LZ5, LZF and LZ4 are mine.) In my mind this is already plenty of evidence that fast decompression has been mostly overlooked thus far and that the gap is only there because people did not attempt to revive byte-based compression on Z80.

  22. #18
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by introspec View Post
    Yes, I already downloaded all versions. I will go through the parser settings; however, I believe it will be more important to re-define the file format.

    The existing format looks like this:
    [1_OO_LL_MMM] [OOOOOOOO] - 10-bit offset, 3-bit match length, 2-bit literal length
    [00_LLL_MMM] [OOOOOOOO] [OOOOOOOO] - 16-bit offset, 3-bit match length, 3-bit literal length
    [010_LL_MMM] [OOOOOOOO] [OOOOOOOO] [OOOOOOOO] - 24-bit offset, 3-bit match length, 2-bit literal length
    [011_LL_MMM] - last offset, 3-bit match length, 2-bit literal length
    First, 24-bit offsets are truly meaningless in 8-bit world. Hence, tokens 010llmmm are not used at all for my data.
    Gipfeli's command encoding has two types of commands: emit literals and LZ77 copy. (((now it is easy for me to say that one command that does both emit-literals and lz77-copy would have been better, but I had not yet figured it out in early 2011)))

    Emit literals has 00 in the command code stream. Six bits after 00 indicates how many literals to emit (with some more complications for longer literal sequences). Literals have 6, 8 or 10 bit coding in the literal stream.

    LZ77 copy has six variations (010, 011, 100, 101, 110, 111 in the command code stream) defining the number of bits used for copy length and for distance: (length bits, distance bits) = one of (2, 10), (2, 13), (2, 16), (3, 10), (3, 16), (6, 16).

  23. #19
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Gipfeli's command encoding has two types of commands: emit literals and LZ77 copy. (((now it is easy for me to say that one command that does both emit-literals and lz77-copy would have been better, but I had not yet figured it out in early 2011)))
    Sorry, I meant to reply to you too. Thanks for the details; I'll definitely have a look at Gipfeli as well, to see if it can potentially make it onto the Pareto frontier.

    I am not actually sure if the idea of combining literals with match-copying is all that great for compression ratios. It seems to work fairly well for byte compressors and is, possibly, one of the main reasons behind the high speed of LZ4. However, none of the better compressing LZSS-style schemes use it. There is always the possibility that this is because people did not try it, but my guess is that it simply does not pay off for bit-based coding schemes.

  24. #20
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Yes, I already downloaded all versions. I will go through the parser settings;
    I gave a quick shot to "stock" LZ5 1.5 (somewhat beyond 1.5 point, I built version at 645655ffba8ae1eebf8d7cf0b4f16cb0610a9da2 - one of latest commits I know working with all goodies, before it started to turn into Lizard), and I can confirm something strange happened to high levels. It really compresses worse than it meant to, underestimating what LZ5 can do considerably.

    For example, a level like this:
    { MAXD_LOG, MAXD_LOG+1, 28, 24, 1<<14, 4, 1<<14, 1, LZ5HC_optimal_price }, // level 15
    ...beats the stock level 15 (yes, it is meant to have 15 levels) to dust, making things normal again. That table is in the file lz5hc_common.h, just in case. These settings are slow, especially on large redundant data, so beware. It seems LZ5HC_optimal_price generally works best.

    however, I believe it will be more important to re-define the file format.
    Makes a lot of sense. Initially LZ5 apparently targeted larger data sets; the 24-bit offsets imply this. Unfortunately, Z80 does not benefit from them. This makes me wonder - I guess you've seen https://cbloomrants.blogspot.com/201...-12-lznib.html and https://cbloomrants.blogspot.com/201...ge-window.html, right? These were rather fancy explorations, weren't they? While you don't want the "large window" part of that, the article name is deceptive: Bloom also tests tricks in the stream encoding that could be good for ratio. Bloom concludes that neither encoding is a big win - and that is true on his mixed data set. But if we look specifically at small data, 3-3-2-B MML3-4 or so seems to be the obvious winner (unless his small-sized data all have a strange bias). And then the LZNib approach seems to target somewhat higher ratios (at some cost in speed?)

    Fifth, the "free" code 010llmmm can be used, for example, as the offset before last. Not quite LZMA, but still an opportunity to represent more complex structures in data.
    I guess it can be good for ratio. If you manage to do so and retain more or less optimal parsing, it would be interesting to see what it can do, because LZ5 isn't really bad in terms of ratio already, especially for a byte-aligned thing. And then, if one is crazy about ratio, prefilters aren't a bad idea either.

    Plenty of possibilities, just not enough time in a day...
    Yeah... and that's what sometimes makes me OK with the "good enough" kind of thing, or happy with the "low hanging fruit". But it looks so cool when people try to improve things further. The appearance of your new Pareto frontier is a very good example.

    No, it is only there for historical reasons.
    I remember Yann saying that LZ4 only uses its stream format for historical reasons: it appeared when he was just learning things and he did roughly the first thing he imagined; later he simply refused to change the stream to avoid breaking existing uses. He wrote that if he had to reinvent it later, he would do it another way. So it seems to be somewhat true - there is room for improvement any day. Bit-based things have an inherent advantage: they do not have to bother with aligning on byte boundaries, which otherwise means either giving up some ratio to keep alignment or going for complicated encoding rules (at which point it can start looking a bit like entropy coding: slower to decode and a larger decoder). Say, Bloom talked about "carryover" tricks and using "ranges" rather than "bits" - that is good for ratio, but it can get rather complicated.

    It is, well, not the same, but almost the same thing as if someone took LZ7 with 7 possible offset codes, tried each of them with an optimal parser and made a note, as part of the compressed data file, of the one that worked best
    Interesting. Somehow I got the idea that a slightly similar approach could be a "cheap-n-dirty" way to improve the "optimal" crush results further for virtually no effort: it is enough to un-constify the slots & window bits and signal that to the decoder - then the encoder can choose what works best for a particular scenario, possibly trying several combinations in the process.

  25. The Following User Says Thank You to xcrh For This Useful Post:

    introspec (18th November 2018)

  26. #21
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    rcm is LZ-based, like LZ4.

  27. #22
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by xcrh View Post
    I gave a quick shot to "stock" LZ5 1.5 (somewhat beyond 1.5 point, I built version at 645655ffba8ae1eebf8d7cf0b4f16cb0610a9da2 - one of latest commits I know working with all goodies, before it started to turn into Lizard), and I can confirm something strange happened to high levels. It really compresses worse than it meant to, underestimating what LZ5 can do considerably.

    For example, a level like this:
    { MAXD_LOG, MAXD_LOG+1, 28, 24, 1<<14, 4, 1<<14, 1, LZ5HC_optimal_price }, // level 15
    ...beats the stock level 15 (yes, it is meant to have 15 levels) to dust, making things normal again. That table is in the file lz5hc_common.h, just in case. These settings are slow, especially on large redundant data, so beware. It seems LZ5HC_optimal_price generally works best.
    Sorry about the delay, I was very busy at work. I compiled the ver.1.5.0 sources and can confirm that your set of settings works better. Nevertheless, ver.1.5.0 with your settings still does not compress quite as well as ver.1.4.1. I am still in the process of trying to understand what is causing this (ver.1.4.1 is not compatible with VS, and I use VS2010, so I cannot compare the versions completely easily). Briefly, the parser did not change much at all, but the match finder did undergo some changes, so it is the most likely culprit. I hope to report more when I have more to report.

    BTW, it is not surprising that LZ5HC_optimal_price works better than LZ5HC_optimal_price_bt. As far as I can see, they use the same parser, but LZ5HC_optimal_price_bt does not attempt to generate all matches, i.e. that version is not optimal anymore.

  28. #23
    Member
    Join Date
    May 2008
    Location
    USA
    Posts
    42
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by introspec View Post
    This thread is mostly about me working to fill the gap between LZ4 and ZX7
    Over in the 8086 world, I came to the conclusion that it wasn't necessary to try to fill the gap -- most of my compression experiments need either fastest decompression (LZ4) or "reasonable compression at a fast decompression rate" (ZX7) and I didn't want to fall into the rabbit hole of trying to improve the compression ratio or speed of either one. In fact, to round out the effort, I created both a small and a fast (unrolled, uses REP STOS for runs) 8086 version of both LZ4 and ZX7. (To give credit where due, Peter Ferrie wrote the size-optimized portion of ZX7_8086.ASM.)

    While the 8086 isn't quite as slow as the ZX Spectrum, the fast decompressors I wrote have a worst case of 3x memcpy for LZ4 and 9x memcpy for ZX7, with LZ4 actually able to beat memcpy if there are a lot of runs (this is due to the fact that REP STOSW is faster than REP MOVSW on 808x CPUs). Even at the 9x worst case, ZX7 was more than fast enough to be completely unnoticeable when loading+decompressing off of floppy disk, as used in the recent kickstarter game Planet X3.

    I just looked at Lizard, which seems to be probably the only thing that could beat LZ4 in terms of both size and decomp speed, but I'm not interested in it for 8086 because it requires keeping track of 4 streams, and the 8086 only has two combinations of two index registers you can use at the same time, so it would be difficult to implement at top speed (my LZ4 decompressor uses only registers for all variables, which is another reason it is fast). Also, it wasn't clear from reading the block file format if it supported runs, which are important on my platform for speed reasons as previously mentioned.

    I guess my point is: I don't see the need to try to find a middle ground between LZ4 and ZX7. If you still think there is, I'd be very curious to know what situation you're in where you feel a middle ground is the most appropriate solution.

  29. #24
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by Trixter View Post
    I guess my point is: I don't see the need to try to find a middle ground between LZ4 and ZX7. If you still think there is, I'd be very curious to know what situation you're in where you feel a middle ground is the most appropriate solution.
    Good to see you here, Trixter

    My argument goes like this:
    1) A lot of my thinking stems from the review of compression technologies for Z80 that I did in the summer of 2017. Unfortunately, I wrote it up in Russian, but you can take your chances with Google Translate if you wish: http://hype.retroscene.org/blog/dev/740.html The key takeaway from that review, from my perspective, is the diagram indicating the approximate Pareto frontier for compression ratio/decompression speed on Z80. The diagram is also included in the first post of this thread.

    2) The diagram shows that LZ4 is within a factor of 1.5 of the "standard" copying on Z80. This is pretty close to the maximum one can hope for. There are things that can be done to increase speed even further (long story, some other time), but fundamentally, I am happy with the decompression speed offered by LZ4. LZ4 also comes with very good infrastructure (e.g. the wide availability of optimal compression that was completed on this forum last autumn, the possibility to have dictionaries for batches of similar files, availability of everything in source code, etc. etc.). This is my default compressor for real-time decompression (e.g. I used it in a complex effect, decompressing 0.4K each frame, to reduce the memory footprint of pre-calculated data).

    At the same time, most of my coding is for ZX Spectrum and memory tends to be fairly tight even for 128K demos. Thus, I was interested to know if one can compress even slightly better with comparable speed. The priority for this is not very high at present for me, but my experiments with LZF have been fairly positive and I have a somewhat crazy decompressor for LZF that can compete with LZ4 at speed while reliably beating its ratio.

    3) The real weakest link in this story is ZX7. The Pareto frontier showed quite clearly that ZX7 decompression is too slow for its ratio. Basically, even in 2017 you could have a better compression ratio that would also decompress faster. Given that I used my own highly tuned ZX7 decompressor in the tests, I knew that the technology had to change. So in 2017 I went for Pletter (slightly faster to decompress, slightly better ratio than ZX7). More recently, after I did some more research (see the more recent Pareto diagram here: https://encode.ru/threads/3001-State...ll=1#post58633),
    I established that at least for Z80 you can get noticeably better decompression speed from MegaLZ, while having better ratio than ZX7. This is my current default compressor for demo kernel.

    4) However, the fact that ZX7 was usually fine for me in terms of compression ratio was still eating at me. The Pareto frontier clearly suggested that there must be something there, 1.5 times faster with a similar ratio. It was obvious that ZX7 is too intensive in its use of the bit stream. There had to be a way to be more byte-based and be just as good at compressing. This is what I was really hunting for. And I believe I actually found it: LZ5 1.4.1 has ALMOST the same compression ratio as ZX7, and I can decompress it on Z80 about 1.5 times faster than MegaLZ (check out the recent Pareto diagram linked just above). Unfortunately, Lizard went for another kind of compromise, and is not nearly as useful on Z80.

    This is not a complete story yet. LZ5 1.4.1 has been optimized for compression; LZ5 1.5 is already not capable of the same compression ratio. The compression format is also very impractical for Z80-targeted data: the short offsets are too large, and the long offsets are not used at all. Basically, LZ5 1.4.1 needs to be re-formulated and re-tuned for the small files typical of 8-bitters. I do not know when I will have time for this, but believe me, when this work is done, ZX7 will be simply blown away. Byte-based compression can come back to 8-bit computers in a big way.

