
Thread: blz4 and bcrush

  1. #1
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts

    blz4 and bcrush

    These are the little experiments I wrote using ssparse and leparse from BriefLZ to produce LZ4- and CRUSH-compatible compressed files. They were mentioned in the BriefLZ 1.2.0 thread, which has sadly disappeared.

    Here are some results for the Silesia compression corpus:

    Code:
    | File    |   Original | `blz4 --optimal` | `lz4 -12 -l` |  `lz4x -9` |
    | :------ | ---------: | ---------------: | -----------: | ---------: |
    | dickens | 10.192.446 |        4.380.430 |    4.380.430 |  4.380.436 |
    | mozilla | 51.220.480 |       22.025.940 |   22.025.988 | 22.025.952 |
    | mr      |  9.970.564 |        4.190.675 |    4.190.774 |  4.190.681 |
    | nci     | 33.553.445 |        3.621.482 |    3.621.567 |  3.621.495 |
    | ooffice |  6.152.192 |        3.535.258 |    3.535.258 |  3.535.261 |
    | osdb    | 10.085.684 |        3.951.474 |    3.951.474 |  3.951.478 |
    | reymont |  6.627.202 |        2.063.060 |    2.063.060 |  2.063.060 |
    | samba   | 21.606.400 |        6.100.521 |    6.100.539 |  6.100.549 |
    | sao     |  7.251.944 |        5.668.742 |    5.668.742 |  5.668.742 |
    | webster | 41.458.703 |       13.835.336 |   13.835.336 | 13.835.347 |
    | xml     |  5.345.280 |          759.868 |      759.901 |    759.871 |
    | x-ray   |  8.474.240 |        7.177.203 |    7.177.203 |  7.177.203 |
    blz4 --optimal saves only a few bytes at best, so it is hardly worth the huge time requirement. I am still not entirely convinced it is in fact bit-optimal; there was some discussion about this in the BriefLZ thread.

    Output is in the legacy frame format (8MiB block size, no dependency between blocks), and it should be compatible with the parsing restrictions described at https://github.com/lz4/lz4/blob/dev/...lock_format.md

    Code:
    | File    |   Original | `bcrush --optimal` | `crush cx` | `crushx -9` |
    | :------ | ---------: | -----------------: | ---------: | ----------: |
    | dickens | 10.192.446 |          3.148.963 |  3.350.093 |   3.343.930 |
    | mozilla | 51.220.480 |         18.037.611 | 18.760.573 |  18.281.301 |
    | mr      |  9.970.564 |          3.367.533 |  3.532.785 |   3.428.968 |
    | nci     | 33.553.445 |          2.407.286 |  2.624.037 |   2.750.658 |
    | ooffice |  6.152.192 |          2.832.224 |  2.958.518 |   2.871.884 |
    | osdb    | 10.085.684 |          3.424.687 |  3.545.632 |   3.457.335 |
    | reymont |  6.627.202 |          1.523.547 |  1.644.701 |   1.610.306 |
    | samba   | 21.606.400 |          4.720.964 |  4.912.141 |   4.911.613 |
    | sao     |  7.251.944 |          5.344.713 |  5.472.035 |   5.368.466 |
    | webster | 41.458.703 |          9.766.251 | 10.430.228 |  10.322.130 |
    | xml     |  5.345.280 |            535.316 |    563.744 |     561.118 |
    | x-ray   |  8.474.240 |          5.717.405 |  5.958.603 |   5.747.141 |
    Source code available here:

    https://github.com/jibsen/blz4
    https://github.com/jibsen/bcrush
    Attached Files
    Last edited by jibz; 7th November 2018 at 11:27. Reason: Add note about blz4 output format

  2. The Following 5 Users Say Thank You to jibz For This Useful Post:

    Cyan (7th November 2018),encode (7th November 2018),introspec (7th November 2018),xcrh (7th November 2018),xezz (10th November 2018)

  3. #2
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts
    I found a cached copy of the BriefLZ thread including my last reply from yesterday; I'm going to repeat it here since it was about blz4:

    Quote Originally Posted by Cyan
    Quick question @jibz :
    How do you get 2,798,592 bytes on the canterbury corpus ?
    Oh, it appears the canterbury.tar on this machine has unix line endings for some reason. Here are the results for cantrbry.tar available from http://corpus.canterbury.ac.nz/

    Code:
    original         2,821,120
    lz4x -9            930,661 (legacy)
    smallz4 -9         930,636 (incl. 3 extra bytes for new format)
    lz4 -12 -l         930,634 (legacy)
    blz4 --optimal     930,627 (legacy, optimality under discussion)
    Quote Originally Posted by introspec
    Charles Bloom described an optimizing parse for LZ4 here: https://cbloomrants.blogspot.com/201...mal-parse.html
    He did not believe it to be optimal; however, when he later returned to this parse, he convinced himself that his new parse is optimal: https://cbloomrants.blogspot.com/201...nclusions.html (see esp. discussion in the comments section). I am not fully following their argument, it may be partly because I do not have a good intuitive understanding of LZSS-style backward parse. Would you agree that the parse they discuss in the comments is truly bit optimal? Are you aware of a counter-example showing that their parse may be wrong?
    Thanks for the link, that was very interesting! Reading the comments there, it seems my solution is similar to that of Wylie.

    For each literal I store the number of literals up to the next match, and use that to set the cost of the literal to the cost of the next match encoding this run of literals. In order to avoid long runs of literals preceding a match suddenly jumping up in cost, I select the match whenever the choice is between a literal and a match of the same cost.
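
    To make that concrete, here is a minimal C sketch of the per-position decision rule, assuming hypothetical arrays cost[] and lits_to_next_match[] indexed by file position and updated in a backward pass. This is only an illustration of the idea described above, not the actual blz4/ssparse code:

    Code:
    /* Illustration only: one step of a hypothetical backward LZ4 parse.
     * cost[i] is the minimal byte cost of encoding data[i..end),
     * lits_to_next_match[i] is the number of literals from i up to the
     * next chosen match. */

    /* extension bytes needed for a literal run of length n
     * (the LZ4 token itself holds run lengths 0..14) */
    static unsigned long lit_ext_bytes(unsigned long n)
    {
        return n < 15 ? 0 : 1 + (n - 15) / 255;
    }

    static void choose_at(unsigned long *cost, unsigned long *lits_to_next_match,
                          unsigned long i, unsigned long match_len,
                          unsigned long cost_if_match)
    {
        /* taking a literal at i: one literal byte, plus an extra length byte
         * if it pushes the run in front of the next match over a threshold */
        unsigned long run = lits_to_next_match[i + 1] + 1;
        unsigned long cost_if_literal = cost[i + 1] + 1
                                      + lit_ext_bytes(run) - lit_ext_bytes(run - 1);

        /* cost_if_match is the byte cost of the match at i plus cost[i + match_len] */
        if (match_len >= 4 && cost_if_match <= cost_if_literal) {
            /* prefer the match on equal cost, so that preceding literals can
             * be absorbed by its token without an extra length byte */
            cost[i] = cost_if_match;
            lits_to_next_match[i] = 0;
        } else {
            cost[i] = cost_if_literal;
            lits_to_next_match[i] = run;
        }
    }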

    I have done a lot of testing, and had some files where the results with one of the others were better than for blz4, but rerunning the tests I can't find any file where this is the case -- perhaps it was an older version of blz4. If it turns out this is optimal after all, that's great.

    My argument for optimality would go something like this: The thing that could cause it to be sub-optimal is if we have the choice between a literal and a match, and the literals preceding this position end up changing the cost of the match (if we choose the match), or of the next match (if we choose the literal), in a way that means we made the wrong choice. At worst, this could cost us one byte in the literal run length encoding, so if the difference in cost between the literal and the match is non-zero, then taking the lowest cost option is best (it saves us at least one byte and might cost us one, as opposed to definitely losing one byte and perhaps one more).

    So the question is what to do if the literal and the match have the same cost. We note that if the next match is 4 or more positions ahead, then the match is at least one byte shorter than the literals, so we only have the same cost if the next match is 1, 2, or 3 literals ahead. Choosing the match means more preceding literals can be encoded without extra cost (14 as opposed to 13, 12, or 11). The only way choosing the literal could be better, given they have the same cost, is if the next match already had 15 literals in front of it, so it could absorb more preceding literals than the match without increasing in size, but since we only have the same cost if it is no more than 3 ahead, this is not the case. So choosing the match on even cost is the best choice.

    I still have a feeling there might be something I have overlooked though.
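
    For reference, the byte costs that argument relies on can be written down directly from the LZ4 block format. A self-contained sketch (not blz4 code) counting token, literal length extension, literals, 2-byte offset and match length extension:

    Code:
    #include <assert.h>
    #include <stddef.h>

    /* extension bytes for a 4-bit length field that saturates at 15 */
    static size_t length_ext_bytes(size_t len_code)
    {
        return len_code < 15 ? 0 : 1 + (len_code - 15) / 255;
    }

    /* byte cost of one LZ4 sequence: num_lits literals followed by a match
       of length match_len (match_len >= 4; the offset always costs 2 bytes) */
    static size_t sequence_cost(size_t num_lits, size_t match_len)
    {
        return 1 + length_ext_bytes(num_lits) + num_lits
             + 2 + length_ext_bytes(match_len - 4);
    }

    int main(void)
    {
        /* up to 14 literals fit in the token; the 15th costs an extra byte,
           and after that one more byte per further 255 literals */
        assert(length_ext_bytes(14) == 0);
        assert(length_ext_bytes(15) == 1);
        assert(length_ext_bytes(15 + 255) == 2);

        /* a match 4 or more positions ahead: covering those 4 positions with
           literals costs more than extending the current match by 4 */
        assert(sequence_cost(4, 8) > sequence_cost(0, 12));
        return 0;
    }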

    I plan to clean up blz4 a little and put it on GitHub so you can test it if you like.

    But as Lucas points out, optimal parsing trying all possible match positions is not practical for larger files. Since LZ4 has fixed size encoding of the match distance, you only need the longest match and not the closest one for each length, so blz4 is actually doing way too much work. It should be a lot faster to replace the match finding with a suffix array or tree to get the longest match.

    I think ssparse might be O(N^2). The thing that could save it from being O(N^3) is that we are checking matches in order from closest and back, and the distance encoding is non-decreasing, so a match further away can only be better if it is longer. This means that if we have already found a match of length m, we can check whether the byte m ahead matches before we start checking the rest of the match. Together with the fact that matches are allowed to overlap the current position, this means that for a file of all zeroes you get: at position size - k there is a match with offset 1 and length k; it checks the size - k - 2 preceding positions, but for each of them the check of the byte at size - k + k fails because the match cannot extend past the end, so each position takes on the order of size operations.

    There could be some cleverly constructed worst-case input I haven't thought of yet though.
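
    A brute-force sketch of that shortcut (hypothetical code, not the actual ssparse implementation). It also reproduces the all-zeroes behaviour described above: the closest candidate extends to the end of the buffer, and every farther candidate is rejected by the single byte check:

    Code:
    #include <stddef.h>

    static size_t longest_match(const unsigned char *data, size_t size,
                                size_t pos, size_t *best_pos)
    {
        size_t best_len = 0;

        for (size_t cand = pos; cand-- > 0; ) {         /* closest first */
            /* cheap rejection: to beat best_len, the candidate must at least
               match the byte best_len ahead of the current position */
            if (best_len > 0
             && (pos + best_len >= size
              || data[cand + best_len] != data[pos + best_len]))
                continue;

            size_t len = 0;
            while (pos + len < size && data[cand + len] == data[pos + len])
                ++len;                                  /* overlap allowed */

            if (len > best_len) {
                best_len = len;
                *best_pos = cand;
            }
        }
        return best_len;
    }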

  4. The Following User Says Thank You to jibz For This Useful Post:

    introspec (7th November 2018)

  5. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    The unreleased LZ4X v1.50 compresses this cantrbry.tar down to 930,627 bytes. And compressed sizes of ENWIK9 are equal as well!

  6. The Following User Says Thank You to encode For This Useful Post:

    jibz (7th November 2018)

  7. #4
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Many thanks for releasing these programs; I'll run them on my data asap (I'll just need to re-write my utility for stripping raw LZ4 streams, so that I can compare like for like).

    Quote Originally Posted by jibz View Post
    I found a cached copy of the BriefLZ thread including my last reply from yesterday.
    Can you possibly share this cached copy with me? I still have not processed some of the things that were discussed there, which is a shame. My email is zxintrospec@gmail.com.

  8. #5
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts
    Quote Originally Posted by introspec View Post
    Can you possibly share this cached copy with me? I still have not processed some of the things that were discussed there, which is a shame.
    Yes, I found the discussion very interesting. I have attached a markdown backup of the thread, hope that's okay.
    Attached Files

  9. The Following User Says Thank You to jibz For This Useful Post:

    introspec (8th November 2018)

  10. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    BTW, I have no clue how the BriefLZ thread could disappear?!

  11. #7
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Yesterday morning all posts after something like 25th October simply disappeared. So it is not specific to BriefLZ. It feels like the forum just went to a week-old backup.

  12. #8
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    I think I've been sleeping and had a fancy dream in which there was a rather fancy chat about compression, including the kinds of compression I like. Now I've woken up and it seems it's gone... (uhm, there was an announcement of a new version of BriefLZ, or was that just a dream?). But alright, now there is something new, and it's rather cool too. It seems to be based on similar techniques, but now it also targets the crush bitstream. Needless to say, that sounds like a good idea, given that crush was not too bad already.

    I quickly ran these on some data I typically use to benchmark things, and I have to say that targeting the crush bitstream with the same technique as the new BriefLZ seems to be a good idea.
    The most interesting comparison was compressing an ARM Linux kernel I use as a relatively well-known benchmark where I know what to expect. Uncompressed it is 13,926,620 bytes, a fairly compressible mix of code and data, large sections of zeros and so on. To set some bounds, LZ4X can compress it to 6,600,257 bytes and LZOMA -9 goes as low as 4,941,192. Most LZs without entropy coding (at least the bit-based ones, or those with a larger dictionary than LZ4) should fit somewhere in between, I guess.

    - "Stock" crush -2 goes down to 5563920 bytes.
    - Xezz's version with -9 goes for 5439442 - it isn't most "optimal" one can get, but still "optimized" compared to "stock" and wins noticeably.
    - bcrush -9 goes below of that for 5382470 - and gets there in reasonable time. An improvement, any day!
    - bcrush -x (optimal) goes down to 5360771 - squeezes even a bit more. But it took about 5 hours to complete (!!!).
    - blz4 -9 does not beats LZ4X, being just 6602039 but at least takes reasonable time to complete.
    - blz4 -x ... haven't completed yet
    - new brieflz -9 goes for shy 6056712, one can get within this size using byte-level LZ5 1.x that is faster to decompress. Slightly more with Lizard -29, but comparable.
    - new brieflz -x (optimal) ... haven't completed yet . I wouldn't expect huge win.

    I've tried other data as well, and the overall picture stays the same. Usually blz4 -x can win a few more bytes vs other LZ4 compressors, while -9 would normally lose to LZ4X. So blz4 only seems useful for setting a high score at the optimal level, if one can live with its speed (fine on small data, some hours on something like a Linux kernel). Then, bcrush does way better than the stock or even xezz versions. So both bcrush -9 and -x make sense for "precompressed" data in the crush bitstream (the latter is really slow though, way slower than e.g. lzoma, while not beating lzoma's ratio, of course).

    So I guess the new brieflz has been best to date as ... a new crush. It seems changing the bitstream gives a significant gain while decompression speed/complexity stays more or less in the same ballpark.

    P.S. And if someone is too lazy to install python, meson, ninja and so on, just like me, both things can fortunately be built with a mere "gcc -O3 *.c -o bcrush" one-liner
    Last edited by xcrh; 7th November 2018 at 23:00. Reason: Whoops, wrong number, brieflz got underrated by mistake

  13. The Following User Says Thank You to xcrh For This Useful Post:

    jibz (8th November 2018)

  14. #9
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by encode View Post
    BTW, I have no clue how the BriefLZ thread could disappear?!
    Does it run on your own hosting? Or is it some "shared" hosting etc. managed by someone else? It looks as if the database got reverted to a roughly week-old state, losing all changes since that point (including all new posts, etc.). As an example, I clarified what I consider "pure LZ" somewhere around the LZOMA discussion where someone claimed it isn't LZ. But now it's all gone and I don't have any related posts in my profile. I had been curious to see whether people agree or disagree with that particular view.

  15. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    The forum is managed by the webmaster. I just informed him, so let's wait and see.

  16. #11
    Member
    Join Date
    Mar 2016
    Location
    Croatia
    Posts
    181
    Thanks
    74
    Thanked 10 Times in 10 Posts
    Well, even shared web hosting these days has daily backups to an external location... I am saying this from personal experience.

  17. #12
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Some little extras:
    - blz4 -x ... hasn't completed yet <- that one failed to complete in 12 hours... either it hung or it is way too slow; it isn't trivial to track actual progress when it takes so long, so I finally gave up and killed the process.
    - the new brieflz -x (optimal) ... hasn't completed yet. I wouldn't expect a huge win. <- that one made it to 6,029,426... some improvement over -9, but compression took about 5 hours and the ratio isn't terribly exciting (rather close to LZO/LZ5 1.x or so).

    So I think, out of all these, bcrush is the most interesting thing around. Well, different people may or may not share this view, especially introspec, who seems to have rather unusual data to chew on.

  18. #13
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    I did some testing of both of these programs on my data. The new bcrush --optimal seems to provide a 1.6% improvement in compression ratio compared to the standard crush 1.00 cx. The alternative compressor crush 1.1 -9 mentioned by xcrh is clearly not optimal, as it is trailing about 0.5% behind bcrush --optimal. So, for situations where every ounce of compression must be squeezed out of the crush file format, bcrush is clearly the tool to use.

    Testing blz4 --optimal was more difficult because of the differences between the frame formats used by different LZ4 compressors (smallz4 is an important competitor, but it does not support legacy frames like most other compressors do). Thus, I was forced to write a small tool for precisely measuring the raw size of the compressed data in an .lz4 file. Using this tool I got the following results:
    Code:
                                      total size    raw size
    
    calgary.tar (uncompressed)        3,265,536
    
    lz4.exe -12 -l (ver.1.8.0)        1,231,229    1,231,221
    lz4x.exe -9 (ver.1.3)             1,231,214    1,231,206
    smallz4.exe -9 (ver.1.0)          1,231,215    1,231,200
    
    lz4.exe -12 -l (ver.1.8.2)        1,231,205    1,231,197
    smallz4.exe -9 (ver.1.2)          1,231,207    1,231,192
    blz4.exe --optimal (ver.0.1.0)    1,231,198    1,231,190
    
    
    cantrbry.tar (uncompressed)       2,821,120
    
    lz4.exe -12 -l (ver.1.8.0)          930,676      930,668
    lz4x.exe -9 (ver.1.3)               930,661      930,653
    
    lz4.exe -12 -l (ver.1.8.2)          930,634      930,626
    smallz4.exe -9 (ver.1.0)            930,636      930,621
    smallz4.exe -9 (ver.1.2)            930,636      930,621
    blz4.exe --optimal (ver.0.1.0)      930,627      930,619
    My interpretation of these results is as follows:
    • lz4.exe -12 (ver.1.8.0) is clearly not optimal;
    • lz4x.exe -9 (ver.1.3) is also not optimal;
    • smallz4.exe -9 (ver.1.0) is definitely not optimal;
    • lz4.exe -12 (ver.1.8.2) seems to produce output that is 5 bytes longer than the output of smallz4.exe -9 (ver.1.2), which in turn is 2 bytes longer than the output of blz4.exe --optimal. I'll have to look at these files more closely, but assuming I did not make any mistakes, these apparently systematic differences could have something to do with the end-of-frame rules of the LZ4 data format.

  19. The Following User Says Thank You to introspec For This Useful Post:

    jibz (9th November 2018)

  20. #14
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by introspec View Post
    lz4.exe -12 (ver.1.8.2) seems to produce output that is 5 bytes longer than the output of smallz4.exe -9 (ver.1.2), which in turn is 2 bytes longer than the output of blz4.exe --optimal. I'll have to look at these files more closely, but assuming I did not make any mistakes, these apparently systematic differences could have something to do with the end-of-frame rules of the LZ4 data format.
    No, my guess was wrong. It is just a coincidence. Here are some more tests on files from the Silesia corpus:
    Code:
               lz4.exe -12 (ver.1.8.2)    smallz4.exe -9 (ver.1.2)    blz4 --optimal (ver.0.1.0)
    
    ooffice         [3,535,250]                   3,535,253                  [3,535,250]
    reymont         [2,063,052]                   2,063,057                  [2,063,052]
    sao             [5,668,734]                   5,668,738                  [5,668,734]
    x-ray            7,177,191                   [7,172,982]                  7,177,191
    xml                759,893                      759,870                    [759,860]
    (raw compressed size is indicated, not the actual file size. I highlighted the best results for each file). I think it is safe to say that none of the three best compressors are actually optimal.

  21. #15
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts
    I believe x-ray is about 80k larger than 8MiB, so smallz4 has the advantage of being able to find matches across the 8MiB boundary, while lz4 (assuming -l) and blz4 do not. That likely accounts for the comparatively large difference on that file.

    You can use lz4 -12 -BD to get similar results using the new format with lz4. blz4 has no support for the new format at the moment, so you will have to resort to files below 8MiB to directly compare I am afraid.

  22. #16
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts
    Quote Originally Posted by encode View Post
    The unreleased LZ4X v1.50 compresses this cantrbry.tar down to 930,627 bytes. And compressed sizes of ENWIK9 are equal as well!
    Thanks, I take that as a good sign.

    xcrh and introspec, thank you very much for testing and the interesting results you're posting, it's great!

    The thing I find most interesting about (hopefully) bit optimal results is that it says something about the relative strength of the compressed data format alone. When you are comparing the stock compressors you are really comparing both the format and the algorithm, which makes it harder to be conclusive -- what if a new version came out with a better compression algorithm, perhaps it would change the result.

    But the optimal parsers explore the limits of how well the algorithm can do -- we can see that no matter how much money/time/machines we throw at BriefLZ 1.x, it will not beat the compression ratio of CRUSH on this type of file.

    Of course they are so slow that they are not practical for large files. There might be some niches where they are useful though, like compression of binary blobs that go into embedded hardware and similar, where it might not matter that you spend a week compressing the final image if it means your device has room for one more useful function.

    LZOMA is very interesting. In a way it is in a different class of algorithms than BriefLZ/LZ4/CRUSH -- if you think about it, the repeat offset is really a simple type of entropy encoding. For instance Huffman coding assigns shorter codes to symbols that appear more often, in the same way the repeat offset coding assigns a shorter code to an offset that has occurred recently and is more likely to occur again. The fascinating thing to me is that it appears to have the potential to improve compression ratio significantly while not affecting decompression speed and footprint much.

    Sadly, I think this also means there is no known efficient way to achieve bit optimal results for it, since the repeat offset introduces a dependency between past and future matches.
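
    As a toy illustration of the repeat-offset idea (the bit I/O and code lengths here are invented for the example; LZOMA's actual bitstream is different), a decoder might read offsets like this:

    Code:
    #include <stddef.h>

    struct bitreader {
        const unsigned char *src;
        size_t pos;
        unsigned bitbuf, bitcount;
    };

    static unsigned get_bit(struct bitreader *br)
    {
        if (br->bitcount == 0) {
            br->bitbuf = br->src[br->pos++];
            br->bitcount = 8;
        }
        unsigned bit = (br->bitbuf >> 7) & 1;   /* MSB first */
        br->bitbuf <<= 1;
        br->bitcount--;
        return bit;
    }

    /* decode one match offset: a 1-bit "repeat" flag either reuses
       *last_offset (recently used offsets get the short code) or is
       followed by a full 16-bit offset (the long code) */
    static size_t decode_offset(struct bitreader *br, size_t *last_offset)
    {
        if (get_bit(br))
            return *last_offset;                /* 1 bit total */
        size_t off = 0;
        for (int i = 0; i < 16; ++i)            /* 17 bits total */
            off = (off << 1) | get_bit(br);
        *last_offset = off;
        return off;
    }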

  23. The Following User Says Thank You to jibz For This Useful Post:

    introspec (11th November 2018)

  24. #17
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    I guess "relative strength of compressed data format alone" varies considerably, depending on nature of data and other factors like typical amount of data, to point things could reverse depending on scenario. So I guess it fair to point it for some "typical" use cases or so. So speacialized task specific format/preprocessing could score win eventually. Just like these "slowly changed samples" thing (or was this topic also lost?).

    A trivial example of a reversal: while bcrush tends to perform better than brieflz "on average", take 256K of zeros. Give it to brieflz (even an old version would do; it is able to make it "optimal" very fast). And then to bcrush... or any other crush I'm aware of. Even LZ4X wins over all the crushes here, despite the LZ4 way of representing numbers playing really badly in this case (most of the LZ4 stream would consist of 0xFF bytes, for an obvious reason). BriefLZ scores a win. However, it is a very specific scenario of questionable practical importance (the "zero eaters" of ECL suggest there could be some use). As a side effect, this little experiment pinpointed "troublesome" data for blz4: blz4 gets really stuck trying to compress this. I guess that explains why I failed to compress the Linux kernel in 12 hours. bcrush -x and blzpack -x are less affected by this - they get slow as well, but not THAT bad.
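
    For reference, a rough count of why the LZ4 stream for this case is almost all 0xFF bytes - a sketch assuming the whole 256K block is covered by one literal plus a single offset-1 match, and ignoring LZ4's end-of-block literal requirements:

    Code:
    #include <stdio.h>

    int main(void)
    {
        /* 256 KiB of zeros as one LZ4 sequence: the match length code
           saturates at 15 in the token and then needs one extension byte
           per 255, nearly all of them 0xFF */
        unsigned long zeros = 256UL * 1024;
        unsigned long match_len = zeros - 1;    /* everything after the first literal */
        unsigned long ext_bytes = 1 + (match_len - 4 - 15) / 255;
        printf("~%lu match length extension bytes\n", ext_bytes);  /* ~1028 */
        return 0;
    }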

    As for embedded, the tradeoffs can be wildly different across areas and scales. Yet one thing remains: it's good to have some extra headroom, in case bugs/vulns emerge, it proves impossible to live without some feature, and so on. Failing to consider this leads to a very painful experience. If it comes to a situation where gaining a few bytes matters, development has likely ground to a halt in many other places already, and it would be very impractical. So most realistic projects stick to "good enough", and +/- 1% isn't really meant to be a deal breaker; coming up short by that margin is a very hazardous situation. Secondary considerations can also apply, be it increased startup time, licensing and so on. Though good compression never hurts, and there are surely cases where data is compressed once and compression time isn't very important.

    As for LZOMA... I guess it can be formulated like this: bitstreams seem to be about trying to represent a bunch of numbers in a compact way that is also (hopefully) fast and light to decode. Or not so fast and light for the heavyweights that are all about ratio at all costs (like LZMA). This makes things rather vague. From this angle, Huffman can be viewed as a way to write "more frequent" numbers using fewer bits and "less frequent" ones using more bits, at which point it has something in common even with byte-aligned schemes, not to mention mixed and bit-aligned ones, all eventually trying to achieve a similar thing. But there is a difference: one can feed even raw bytes (not processed by LZ) into algorithms like Huffman and often get some compression, at which point it surely looks like two separate algorithms piped one into the other. LZOMA is somewhat closer to this border than simpler things, but I have trouble imagining how one could use its coding without the LZ counterpart. Would it make any sense? At which point I guess it is more like a "1-phase LZ" than a "2-phase pipe". As a counter-example, the Lizard stream seems to be heavily tailored for a 2nd phase, to the degree that the stream is explicitly split into sub-blocks of different nature; I guess at least one of the reasons is to make it easier to hand these to "entropy coders". Interestingly, I have never seen "1-phase" LZs with a bitstream like that.

    On a side note, LZOMA packs 256K of zeros down to 25 bytes (some of them headers). Overall it seems that while it isn't "bit perfect", its packer is "optimizing" and manages to reach a more or less reasonable "working point" where the ratio starts looking appealing, yet decompression isn't terribly slow/complicated and does not need large structures or complicated algorithms. So I'm curious whether that thing is okay on Z80.

  25. #18
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by jibz View Post
    I believe x-ray is about 80k larger than 8MiB, so smallz4 has the advantage of being able to find matches across the 8MiB boundary, while lz4 (assuming -l) and blz4 do not. That likely accounts for the comparatively large difference on that file. You can use lz4 -12 -BD to get similar results using the new format with lz4. blz4 has no support for the new format at the moment, so you will have to resort to files below 8MiB to directly compare I am afraid.
    Yes, sorry, I should have thought of it first before submitting my results. In any case, I think with this utility of mine I am getting more consistent numbers in the tests:
    Code:
    "calgary.tar" (3,265,536 bytes)      file size    raw size (no of frames)
    
    lz4.exe -12 -l (ver.1.8.0)           1,231,229         1,231,221 (1)
    smallz4.exe -9 (ver.1.0)             1,231,215         1,231,200 (1)
    lz4.exe -12 -l (ver.1.8.2)           1,231,205         1,231,197 (1)
    smallz4.exe -9 (ver.1.2)             1,231,207         1,231,192 (1)
    blz4.exe --optimal (ver.0.1.0)       1,231,198       [ 1,231,190 (1) ]
    lz4x -9 (ver.1.49 beta)              1,231,198       [ 1,231,190 (1) ]
    
    
    "cantrbry.tar" (2,821,120 bytes)
    
    
    lz4.exe -12 -l (ver.1.8.0)             930,676           930,668 (1)
    lz4.exe -12 -l (ver.1.8.2)             930,634           930,626 (1)
    smallz4.exe -9 (ver.1.0)               930,636           930,621 (1)
    smallz4.exe -9 (ver.1.2)               930,636           930,621 (1)
    blz4.exe --optimal (ver.0.1.0)         930,627         [ 930,619 (1) ]
    lz4x -9 (ver.1.49 beta)                930,627         [ 930,619 (1) ]
    
    
    "ooffice" (6,152,192 bytes)
    
    
    smallz4.exe -9 (ver.1.0)             3,535,274         3,535,255 (2)
    lz4.exe -12 -BD (ver.1.8.2)          3,535,276         3,535,253 (2)
    smallz4.exe -9 (ver.1.2)             3,535,272         3,535,253 (2)
    lz4.exe -12 -l (ver.1.8.0)           3,535,259         3,535,251 (1)
    lz4.exe -12 -l (ver.1.8.2)           3,535,258       [ 3,535,250 (1) ]
    blz4.exe --optimal (ver.0.1.0)       3,535,258       [ 3,535,250 (1) ]
    lz4x -9 (ver.1.49 beta)              3,535,258       [ 3,535,250 (1) ]
    
    
    "xml" (5,345,280 bytes)
    
    
    lz4.exe -12 -l (ver.1.8.0)             760,126           760,118 (1)
    lz4.exe -12 -BD (ver.1.8.2)            759,917           759,894 (2)
    lz4.exe -12 -l (ver.1.8.2)             759,901           759,893 (1)
    smallz4.exe -9 (ver.1.0)               759,890           759,871 (2)
    smallz4.exe -9 (ver.1.2)               759,889           759,870 (2)
    blz4.exe --optimal (ver.0.1.0)         759,868         [ 759,860 (1) ]
    lz4x -9 (ver.1.49 beta)                759,868         [ 759,860 (1) ]
    P.S. Just in case it is useful to someone else, I am attaching to this post my Windows command line tool weighLZ4.exe, which prints out the number of frames plus the size of raw compressed data within .lz4 files (it should work for both normal and legacy file formats).
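
    For anyone who prefers source over an .exe, the idea can be sketched in a few lines of C for the legacy frame format (magic 0x184C2102 followed by blocks, each prefixed by a 4-byte little-endian compressed size). This is only an illustration of the approach, not introspec's weighLZ4 tool, and it does not handle the newer frame format:

    Code:
    #include <stdio.h>
    #include <stdint.h>

    static int read_u32le(FILE *f, uint32_t *v)
    {
        unsigned char b[4];
        if (fread(b, 1, 4, f) != 4)
            return 0;
        *v = (uint32_t)b[0] | (uint32_t)b[1] << 8
           | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
        return 1;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2)
            return 1;
        FILE *f = fopen(argv[1], "rb");
        if (!f)
            return 1;

        const uint32_t legacy_magic = 0x184C2102;
        uint64_t raw = 0;
        unsigned frames = 0;
        uint32_t v;

        while (read_u32le(f, &v)) {
            if (v == legacy_magic) {
                ++frames;                /* start of a (new) legacy frame */
            } else {
                raw += v;                /* compressed block of v bytes */
                if (fseek(f, (long)v, SEEK_CUR) != 0)
                    break;
            }
        }
        printf("%u frame(s), %llu bytes of raw compressed data\n",
               frames, (unsigned long long)raw);
        fclose(f);
        return 0;
    }
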
    Attached Files
    Last edited by introspec; 13th November 2018 at 16:25. Reason: Added the command line tool for inspecting the raw compressed size of .lz4 files

  26. The Following User Says Thank You to introspec For This Useful Post:

    encode (13th November 2018)

  27. #19
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by jibz View Post
    the optimal parsers explore the limits of how well the algorithm can do -- we can see that no matter how much money/time/machines we throw at BriefLZ 1.x, it will not beat the compression ratio of CRUSH on this type of file
    This is exactly the reason why I obsess over optimal parsers! Having an optimal or very nearly optimal parser allows me to fix at least one coordinate on the compression ratio / decompression speed diagram, which then allows me to estimate how big the gap to the Pareto frontier is and assess whether I am likely to reach the speed needed to be Pareto-efficient.

    Quote Originally Posted by jibz View Post
    LZOMA is very interesting. In a way it is in a different class of algorithms than BriefLZ/LZ4/CRUSH -- if you think about it, the repeat offset is really a simple type of entropy encoding. For instance Huffman coding assigns shorter codes to symbols that appear more often, in the same way the repeat offset coding assigns a shorter code to an offset that has occurred recently and is more likely to occur again. The fascinating thing to me is that it appears to have the potential to improve compression ratio significantly while not affecting decompression speed and footprint much.
    I guess it is a matter of interpretation. Charles Bloom gives two interpretations: one is that the repeat offset allows encoding a long match interrupted by a literal; the other is that it provides a mechanism for efficient encoding of periodic data. The first interpretation is very much in the style of LZ77 (some ZX Spectrum compressors used matches with gaps explicitly). The second is more like a form of entropy coding. Either way, this seems to be a rare form of entropy coding that can be (somewhat surprisingly to me) applied to byte packers. The discovery that LZ5 does it was extremely illuminating to me.

    Quote Originally Posted by jibz View Post
    Sadly, I think this also means there is no known efficient way to achieve bit optimal results for it, since the repeat offset introduces a dependency between past and future matches.
    I was trying to think about the complexity of an optimal parse for this case. Would it not be O(lzwindow*N^2)?

  28. #20
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by xcrh View Post
    On a side note, LZOMA packs 256K of zeros down to 25 bytes (some of them headers). Overall it seems that while it isn't "bit perfect", its packer is "optimizing" and manages to reach a more or less reasonable "working point" where the ratio starts looking appealing, yet decompression isn't terribly slow/complicated and does not need large structures or complicated algorithms. So I'm curious whether that thing is okay on Z80.
    I don't have time to read LZOMA properly at the moment (its getXXX macros are driving me crazy anyway). However, it is clearly not using an explicit Huffman code or anything similar. Thus, it is bound to be computationally easier than Exomizer, while better at compressing data. I am sure that the Z80 is just as capable at bit tricks as any other contemporary. So for me it is definitely an interesting technology. The reason it is not a high priority for me is that I mainly need reasonably fast decompression, and it is clear that although LZOMA will be faster than Exomizer, it won't be fast enough for my main area of interest.

  29. The Following User Says Thank You to introspec For This Useful Post:

    encode (15th December 2018)

  30. #21
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by introspec View Post
    The discovery that LZ5 does it was extremely illuminating to me.
    And overall, bitstream encoding like that somewhat starts to resemble a reduced version of Huffman, no? In the sense that it is a variable-length bit encoding of some kind (going for bits within bytes to some extent, despite being byte-aligned). Yet once again, unlike Huffman this does not make much sense without the LZ counterpart, being more of an optimization than a "dedicated" entropy coding phase.

    However, it is clearly not using an explicit Huffman code or anything similar.
    Sometimes you can even see raw literals (e.g. strings) in the stream; that does not happen with "real" Huffman and other full-fledged entropy coders, I guess. There is just nothing wrong with putting bytes offset by a few bits into a BIT stream, at which point things would become "unreadable".

    I am sure that the Z80 is just as capable at bit tricks as any other contemporary.
    "Best" things I know were able to extract (or inject) bits by specified masks - so extracting or injecting bit field can take like 1 asm instruction :P. But it relatively exotic it seems.

    P.S. bcrush comes with a little free lunch: even with -x "optimal" one can get "a bit more optimal" still - by adjusting the window size to the data size (the decompressor has to be adjusted the same way). So at Z80-like (or microcontroller-like) sizes bcrush looks a bit better with W_BITS 17, etc. On the Linux kernel it seems slightly better with 22. And so on.

  31. #22
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Somehow introspec made me curious about "small data", where things don't exceed 64K or so. It seems I found even more "free lunch" for "small data" with bcrush. While it isn't advertised in the source, it seems to be fine with W_BITS 16 as well (15 would break, just as >23 would; the good thing is that it gets detected at compile time). This chops a few extra bytes off small data. So I had some fun running that version on some files from introspec's test set - and it is easy to guess the outcome. It got considerably more competitive in terms of ratio (hey, really unusual and interesting data, btw). As a glaring example: alterego.tzx with bcrush -x went down from 16,383 ("stock" bcrush -x) to 15,083 (same thing, but W_BITS 16, decompression OK). I wonder how this ratio scores on introspec's radar.

    So I guess (b)crush could benefit from a (rather trivial) heuristic: pick the smallest W_BITS whose window is >= the data size (as long as it stays within 16...23). Unfortunately it seems there is no "standard" way to transfer W_BITS to the decoder, and it is meant to be a compile-time constant. That said... wouldn't this bunch of constants look better in "crush.h" just once, rather than 3 copies spread across crush.c, depack.c and depack_file.c?
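
    A minimal sketch of that heuristic (hypothetical helper, not in the bcrush source; W_BITS is currently a compile-time constant, so both encoder and decoder would need to be built with the chosen value):

    Code:
    #include <stddef.h>

    /* smallest w_bits whose window (1 << w_bits) covers the input,
       clamped to the range the source accepts */
    static int pick_w_bits(size_t data_size, int min_bits, int max_bits)
    {
        int w = min_bits;
        while (w < max_bits && ((size_t)1 << w) < data_size)
            ++w;
        return w;
    }

    /* e.g. pick_w_bits(data_size, 16, 23): 40 KiB of data -> 16,
       a 14 MB kernel image -> 23 (clamped) */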

    Another pleasant property of bcrush is that it somehow virtually never crashed/misbehaved during all kinds of strange experiments so far. Either it works or reports decode error or assertion fails.

  32. #23
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts
    I think if you lower SLOT_BITS to 3, you can experiment with W_BITS down to 8. But I haven't tried it.

  33. The Following User Says Thank You to jibz For This Useful Post:

    xcrh (15th November 2018)

  34. #24
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Hmm.. interesting. That actually seems to work.

    It proved to be a mixed bag: on its own, SLOT_BITS = 3 tends to hurt compression considerably on any data I tried. As a result, say, (W_BITS/SLOT_BITS) 15/3 proved to be rather pointless, losing to the (more straightforward) 16/4 on any data I tried. On the other hand, a 12/3 version manages to beat 16/4 by a small margin - as long as the data is around 2..6K or so, of course. On larger data a larger window pays for itself, obviously.

    That makes me think compression ratios could be cheaply and easily improved for a crush(-like) format by transferring the slot bits and window bits to the decoder and choosing an appropriate combo for the data; it seems to take no changes to the decompressor code, except these would no longer be "const". Maybe even a ZX7-style approach, trying a few different sets and selecting the best, if one is really crazy about ratios. For SLOT_BITS = 4 the heuristic is rather simple: selecting the smallest W_BITS that fits the bill does the trick. For SLOT_BITS = 3 it seems to be less straightforward; the 3-bit version starts to perform reasonably at a window of about 12 bits. Above that, SLOT_BITS = 3 loses to e.g. 16/4, even if 16 bits for the window would be too much.
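
    One possible way to carry the two parameters, sketched as a single header byte in front of the compressed stream - this is not part of the existing crush/bcrush format, and adding it would of course break compatibility with current decoders:

    Code:
    #include <assert.h>

    /* pack W_BITS (low 5 bits) and SLOT_BITS (high 3 bits) into one byte;
       the accepted ranges here just mirror the values discussed above */
    static unsigned char pack_params(unsigned w_bits, unsigned slot_bits)
    {
        assert(w_bits >= 8 && w_bits <= 23 && slot_bits >= 3 && slot_bits <= 4);
        return (unsigned char)((slot_bits << 5) | w_bits);
    }

    static void unpack_params(unsigned char hdr, unsigned *w_bits, unsigned *slot_bits)
    {
        *w_bits = hdr & 0x1F;
        *slot_bits = hdr >> 5;
    }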

  35. #25
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    46
    Thanks
    36
    Thanked 18 Times in 12 Posts
    Quote Originally Posted by xcrh View Post
    It proved to be a mixed bag: on its own, SLOT_BITS = 3 tends to hurt compression considerably on any data I tried. As a result, say, (W_BITS/SLOT_BITS) 15/3 proved to be rather pointless, losing to the (more straightforward) 16/4 on any data I tried. On the other hand, a 12/3 version manages to beat 16/4 by a small margin - as long as the data is around 2..6K or so, of course. On larger data a larger window pays for itself, obviously.
    Windows >64K are obviously inefficient for small-file datasets or computers that cannot address more. Hence, any bits describing offsets beyond 64K were never used, and the wins you described were easy to come by, simply because that offset data was clearly redundant. On the other hand, once you move below 64K the situation gets more interesting. You lose some when you restrict offsets to a number below 64K, because you cannot represent matches that are further away. However, you also win some, because the matches you can represent are represented more efficiently. There is a sweet spot, of course, dependent on the typical data, where the two are in balance.

    This is why simply inventing a new file format for an LZSS-style engine is rarely competitive. The best compressors tend to have formats tuned on "representative on average" data, whatever that may mean. The majority of data compressors on the ZX Spectrum have window sizes between 2 KB and 4 KB. Partly this is due to the restrictions of the original hardware for native compression, but I am sure there is more to it. In my experiments with several compressors of this type, and also in your experiments, this tends to be close to the optimal window size for typical data.
