
Thread: LZOMA

  1. #1
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts

    LZOMA

    I just came across a couple compression libraries I hadn't heard of, and AFAIK haven't been discussed here. One of them is LZOMA which, based on the README (and the name), takes elements from LZMA and LZO. It sounds like it targets the same space as LZHAM and Brotli, favoring fast decompression and a high ratio at the expense of compression speed, and is GPLv2 licensed.

    The README contains a brief description of the algorithm which I'll reproduce (in part) here:

    Compressed format has some features similar to both LZO and LZMA. Does not use range coding. Special bit added to matches that follow literals, indicating to re-use previous offset instead of always storing the offset for each match. This allows to more efficiently compress patterns like abcdEabcdFabcdGabc, as offset will be stored only for first match.

    This idea allows much higher compression than classical LZ algorithms but compressor is much more complicated.
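
To make the repeat-offset idea concrete, here is a rough decoder-side sketch of how such a "reuse previous offset" flag can work. This is only an illustration of the concept, not LZOMA's actual bitstream layout; the reader callbacks are placeholders.

Code:
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit/token reader; the real format packs these differently. */
typedef struct {
    int      (*read_bit)(void *st);      /* returns 0 or 1 */
    uint32_t (*read_offset)(void *st);
    uint32_t (*read_length)(void *st);
    int      (*read_literal)(void *st);  /* returns a byte, or -1 at end */
    void      *st;
} toy_reader;

size_t toy_decode(toy_reader *r, uint8_t *out, size_t out_cap)
{
    size_t   pos = 0;
    uint32_t prev_off = 0;
    int      after_literal = 0;

    while (pos < out_cap) {
        if (r->read_bit(r->st) == 0) {              /* literal */
            int c = r->read_literal(r->st);
            if (c < 0)
                break;
            out[pos++] = (uint8_t)c;
            after_literal = 1;
        } else {                                    /* match */
            uint32_t off, len;
            /* Only matches that follow literals carry the extra flag,
               so "abcdEabcdFabcdG" stores the offset just once. */
            if (after_literal && r->read_bit(r->st))
                off = prev_off;                     /* reuse last offset */
            else
                off = r->read_offset(r->st);
            if (off == 0 || off > pos)
                break;                              /* malformed input */
            len = r->read_length(r->st);
            while (len-- && pos < out_cap) {
                out[pos] = out[pos - off];
                pos++;
            }
            prev_off = off;
            after_literal = 0;
        }
    }
    return pos;
}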

  2. The Following 4 Users Say Thank You to nemequ For This Useful Post:

    alef (11th January 2016),Bulat Ziganshin (2nd September 2015),Stephan Busch (2nd September 2015),xcrh (2nd January 2016)

  3. #2
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
Did anyone manage to compile that?

  4. #3
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
This version uses divsufsort, and qsort for small blocks.
Fixed a bug for incompressible blocks.
Attached Files

  5. The Following 2 Users Say Thank You to xezz For This Useful Post:

    alef (11th January 2016),comp1 (30th December 2015)

  6. #4
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    174
    Thanked 51 Times in 37 Posts
    Has anyone figured out how to compress with maximum ratio?

  7. #5
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by xezz View Post
This version uses divsufsort, and qsort for small blocks.
Fixed a bug for incompressible blocks.
    You might want to consider sending changes back to the author (via github) instead of just posting them here.

    Also, you might want to consider using a more common archiver (sorry, Bulat). FreeArc is cool, but it's a pretty exotic format which a lot of people can't read. For example, AFAIK there are no RPMs for Fedora, and no support in libarchive (which means no support in file-roller, or other tools which use libarchive). I also don't see it in Debian/Ubuntu, FreeBSD ports, Arch, Gentoo, etc. Even on Windows and OS X FreeArc is generally not installed, and (AFAIK) none of the more standard archivers support FreeArc archives.
    Last edited by nemequ; 1st January 2016 at 11:25.

  8. The Following 2 Users Say Thank You to nemequ For This Useful Post:

    schnaader (8th January 2016),xcrh (2nd January 2016)

  9. #6
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    ok

  10. #7
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by Stephan Busch View Post
Did anyone manage to compile that?
I did. In fact there is a simple makefile which gives an idea of how to do it. I've built it under 64-bit Ubuntu Linux, using gcc 5.2.1 and -O3 optimizations. No silly -m32 was used, because I do not even have 32-bit libs installed and bringing them in for a single program would be utterly unwise. So it has been built as 64-bit code.

The packer builds as is; the unpacker from the github code requires a small change if you do not want to use the (32-bit) assembly. The unpacker C source contains the unpacker function, it is only named a bit differently. So it takes patching the name and getting rid of "extern ". Then it builds perfectly. And it even works. These adventures were worth it, because the most unusual LZ of my whole life welcomed me.

To use some known reference I've compressed a (previously decompressed) Linux kernel binary; after all, this thing targets similar uses and I've run other benches on this file. This is a ~22 MB ELF file, mostly x86-64 code, plenty of data & holes, fairly redundant overall.

    Some observations:
1) This thing seems to try an e8 transform, but it does not seem to be aware of x86-64, so it has no effect and it performs as a "pure" compressor with no preprocessing cheats. Ok, the compression result counts even more!
2) This is easily the slowest LZ compressor on Earth. The speed estimate is something like 1-4 KB/second for the kernel. It took me a few hours to compress the 22 MB kernel. Wow.
3) It also took a whole 1 GB of RSS memory to do it. Whoa, that's an appetite to match.
4) Compression ratio. The waiting was worth it. The compressed kernel size is between brotli 6 and brotli 7. But wait, brotli uses entropy coding?! This thing did it with an almost-pure bit-oriented LZ data format, if I understand it correctly. Sounds like an achievement for entropy-less LZ, right? Btw, the output is still pretty redundant, and extra entropy coding or a 2nd pass could shave quite a bit more off the output size.
5) Decompression speed. Ok, it takes about 0.3 seconds to decrunch the 22 MB kernel; decompression speed is really quite okay and more or less on par with UCL/zlib/brotli on this system.

    It looks like this:
    6141476 Jan 2 02:33 _x8664-linux-kernel.lzoma

For reference, lzbench on the same file with zlib 9, crush level 2 and brotli 6 & 7 gives (compression speed, decompression speed, compressed size, ratio):
crush 1.0 level 2           0.25 MB/s   79 MB/s   7267494   31.88
zlib 1.2.8 level 9          3.22 MB/s   85 MB/s   6781199   29.74
brotli 2015-10-29 level 6   2.75 MB/s   80 MB/s   6163692   27.03
brotli 2015-10-29 level 7   1.58 MB/s   75 MB/s   6108121   26.79

    ...so it did between brotli 6 and 7, using its default
    #define level 3
    #define match_level 1000

...but looking at the source, it also supports level 5 and match_level 10000000. Ok, I'm a coward and haven't dared to try it. Maybe it would pwn brotli 8, but I'm not sure I'd be patient enough to wait for it.

    Code:
    $ ./lzoma-pack _x8664-linux-kernel out
    got 16777216 bytes, packing _x8664-linux-kernel into out...
    stats noe8 13054985 e8 13054890
    reverted e8
    init_same done.
    qsort done.
    sort done.
    res=40806940  
    res bytes=5100868
    out bytes=5100027
    got 6022016 bytes, packing _x8664-linux-kernel into out...
    stats noe8 5351026 e8 5350994
    reverted e8
    init_same done.
    qsort done.
    sort done.
    res=8334375  
    res bytes=1041797
    out bytes=1041431
    closing files let=1852858 lz=1299350 olz=928736
    bits lzlit=4080944 let=14822864 olz=1302628 match=20909136 len=8016040
Conclusion: this thing should be renamed to LZMonstr or something like that. Truly monstrous brotli-grade compression in what still seems to be just a bit-aligned LZ without any entropy coding at all. Uhm, I thought crush level 2 was quite slow and memory hungry, but I have to admit I was really wrong about what to consider slow and a memory hog when it comes to LZ compression.

  11. The Following 2 Users Say Thank You to xcrh For This Useful Post:

    nemequ (2nd January 2016),willvarfar (5th January 2016)

  12. #8
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by xezz View Post
This version uses divsufsort, and qsort for small blocks. Fixed a bug for incompressible blocks.
Er, sorry for my ignorance, but how is one supposed to decompress this ARC data format? It does not seem to be MS-DOS ARC, I guess, because text at the end of the file mentions it used lzma:21, but the headers do not look like anything I can remember. Is it some modern kind of pkunzip.zip? And wouldn't it be wiser to just "fork" the source repo on github and commit the changes there, possibly filing a pull request to the original author? At the end of the day, while I can diff the source myself, it is much better when the version control system does it on its own as a normal course of action. Not to mention it would give much better overall "visibility" to the code, so plenty of random people can stumble on it via search. And I bet this thing is worth it, since the github version easily scores as the most powerful "entropyless" LZ I've ever seen.

  13. #9
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by comp1 View Post
    Has anyone figured out how to compress with maximum ratio?
The source gives us a hint that one can use at least level 5 instead of 3 and also tweak match_level to much higher values; read the commented-out part. But, uhm, it is slow as hell even with default parameters, and I guess raising levels is going to make it even slower. And looking at the output data I got for the Linux kernel, I bet a 2nd compression phase like entropy coding would improve compression further, at least in some cases (of course at the cost of decompression speed).

  14. #10
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
Er, sorry for my ignorance, but how is one supposed to decompress this ARC data format?
The archiver is FreeArc.
And wouldn't it be wiser to just "fork" the source repo on github and commit the changes there, possibly filing a pull request to the original author?
I don't know how to use github well yet. I sent mail to the author.

  15. #11
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by xezz View Post
The archiver is FreeArc.
Hmm, a really strange program. It claims to be open source, BUT...
- When I try to use the svn it refers to, it returns just an empty page with no data. Am I doing something wrong?
- When I try to get the source, the server claims the file is not found. For both 0.666 and 0.7 alpha.
- When I try to view the directory to see if the file was renamed, the server tells me I do not have rights. Reasonable, but...

There are binary files, and at least some of them download, but they do not support my OS/CPU combo and I'm not a big fan of downloading binaries without source. Are there technical issues on this server, or something? After a day of trying I still haven't figured out how to decompress it :\. So it would be very kind to at least use some more popular/supported archive format, e.g. 7z. I'm still curious what this bug was, what you've changed and how it performs.

I don't know how to use github well yet. I sent mail to the author.
Actually, using github is quite easy; at my former job even a non-techie manager somehow learned the basics of it. I'm in no way a git expert either, but I can handle git well enough to, say, switch to the -dev branch of LZ5 and give the most recent version a shot to see whether the claims about beating Crush 2 are true. Furthermore, thanks to git I've been able to pinpoint the table where the compression levels are set up, to get an idea of how to trade speed vs compression ratio further, creating my own LZ5 level 16 settings. This squeezed out a few extra bytes, but not much. Actually, crush still wins on some data, but the sight of a byte-aligned LZ beating a good bit-aligned LZ looks very interesting, almost like this bit-aligned LZ beating many LZ+entropy schemes.
    Last edited by xcrh; 2nd January 2016 at 14:55.

  16. #12
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
Btw...
1) I think I've found the bug related to incompressible blocks myself; it seems to check a var which was never touched.
2) The source also contains #ifdef LZX. I guess it has something to do with the LZX compressor? If I build it like this, the ratio drops quite a lot.
3) Whoever asked about maximum compression: I gave it a try, uncommenting level 5 and match_level 10000000, on a relatively small (1 MB) file vs the default, so I had a chance to get a result in the foreseeable future. If we assume the defaults are "fucking slow", these settings hinted at by the author are (fucking slow)^(fucking slow). Forget megabytes per second. What about measuring compression speed in kilobytes per minute instead? At the end of the day, it compresses better. But...

Basically, the "ultimate" settings killed compression speed very hard, while it was already below the floor. Ok, at the end of the day it shaved an extra ~200 bytes off a megabyte file. IMHO these ultimate settings only make sense if you're really going for the Guinness World Record in bit-aligned LZ compression and you really need to shave a few extra bytes at ABSOLUTELY ANY cost. I have not dared to compress the Linux kernel using these settings; a rough estimate gives me the idea it could fail to complete within a whole night. The defaults are already "impressive enough" while seriously faster in terms of compression speed. Choosing between a few KB/sec and a few KB/min... the choice is pretty clear for almost any practical use case.

    It has been like this:
    -rw-r--r-- 1 user group 1104524 Jul 20 06:45 adbacdcacdacca (original file)
    -rw-rw-r-- 1 user group 266711 Jan 3 03:39 adbacdcacdacca.5.zzz (level 5, match_level 10000000 as described above)
    -rw-rw-r-- 1 user group 284517 Jan 3 03:06 adbacdcacdacca.lzxzzz (default level 3, -DLZX=1, checking how LZX define works)
    -rw-rw-r-- 1 user group 266931 Jan 2 07:24 adbacdcacdacca.zzz (author's default, level 3)
    Last edited by xcrh; 3rd January 2016 at 04:35.

  17. #13
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
Hmm, a really strange program...
OK, I posted a 7z format version. It adds some options.
I think I've found the bug related to incompressible blocks myself; it seems to check a var which was never touched.
My posted lzoma avoids this bug.
The source also contains #ifdef LZX. I guess it has something to do with the LZX compressor? If I build it like this, the ratio drops quite a lot.
Maybe reserved or from an old version...
Attached Files

  18. The Following User Says Thank You to xezz For This Useful Post:

    xcrh (4th January 2016)

  19. #14
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by xezz View Post
OK, I posted a 7z format version. It adds some options.

My posted lzoma avoids this bug.

Maybe reserved or from an old version...
I see, you've made level, block size and match level configurable. Nice idea! You also cleaned out some debug stuff which was hogging quite a lot of memory in the compressor.

As for LZX, the code seems to be WIP/experimental and looks like a leftover from some tryouts. If I define LZX=1, it works and generates valid data which can be decompressed, but the ratio gets seriously worse.

    And some things to consider...
1) IMHO, deleting the copyright header isn't the best idea ever. This is easily the tightest "entropyless" LZ I've ever seen and I think it is a good idea to mention the one who created it. Knowledge of the licensing terms does not hurt either. You can add yourself as well, I guess, since there were quite a few changes.

2) More readable variable names are a good idea too. The original source wasn't super-good at it, sure. But I can decipher its main() any day, while your main() gives me a headache. Why don't you use some more meaningful names? Reading code like if(*c<58&&*c>47){if(a<9)n=n*10+*c-48,a++;} really feels like reading (uncommented) disassembly. What are a, c, 58, 9 and 48? Not like I can immediately guess. Since this is not assembly, it is a really good idea to use more or less descriptive names for vars & constants. Same goes for T, SA and LCP in "divsufsort(T,SA,LCP,n);". Sure, I can decode it in dire need, but... it would really take some extra time.

3) Not really sure why all parameters are "off by one", so level, match and block are not what I expect. Well, it is mentioned for block size; it turns out the same idea applies to level and match. Wouldn't it be more logical to leave computations to computers?

4) typedef unsigned int U32; <--- Ok, are you sure "unsigned int" is always 32 bits on various CPUs/bitnesses/compilers and so on? The original code isn't perfect in this regard either and uses a strange mix of types. But still, the world is full of strange machines and compilers, and things like uint32_t were invented for a reason (see the sketch at the end of this post).

5) Does using some fancy lib which is like 2x the size of the rest of the code improve anything? It makes the total source size 3x the original. And what is the advantage? I'm curious. I feel like I'm nuts and seriously missing something...

    6) At the end of compression your program crashes:
    bits lzlit=194968 let=676664 olz=77378 match=667526 len=518810 time=170.122
    *** Error in `./lzoma2': free(): invalid pointer: 0x00007ffd85ec72fe ***
    Aborted

Looking at the message, I can see it completed compression and was about to shut everything down. At this point there was some wrong call to free(). I've used 64-bit Ubuntu and gcc 5.2 to compile it.
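
Regarding point 4, a minimal sketch of what I mean by fixed-width types; this is just standard C99 <stdint.h>/<inttypes.h>, not tied to any particular code in lzoma:

Code:
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* uint32_t is guaranteed to be exactly 32 bits on any implementation
   that provides it, unlike "unsigned int". */
typedef uint32_t U32;

int main(void)
{
    U32 x = 0xDEADBEEFu;
    /* PRIu32 expands to the right printf conversion for uint32_t */
    printf("x = %" PRIu32 ", sizeof x = %zu\n", x, sizeof x);
    return 0;
}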

  20. #15
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    174
    Thanked 51 Times in 37 Posts
    Quote Originally Posted by xezz View Post
OK, I posted a 7z format version. It adds some options.

My posted lzoma avoids this bug.

Maybe reserved or from an old version...
    Could anyone compile a 64-bit Windows binary for testing?

  21. #16
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    216
    Thanks
    97
    Thanked 128 Times in 92 Posts
I tried to compile a 64-bit exe and I found an issue in the original 0.2 version (sometimes on >~30/40 MB .7z files).
You can reproduce it this way:
    7z1514-x64\7z.exe a -mx1 ENWIK8 ENWIK8
    lzoma c ENWIK8.7z ENWIK8-c.lzoma
    lzoma d ENWIK8-c.lzoma ENWIK8-d.lzoma
    fc /b ENWIK8.7z ENWIK8-d.lzoma
    Code:
    C:\lzoma-0.2>7z1514-x64\7z.exe a -mx1 ENWIK8 ENWIK8
    
    7-Zip [64] 15.14 : Copyright (c) 1999-2015 Igor Pavlov : 2015-12-31
    Scanning the drive:
    1 file, 100000000 bytes (96 MiB)
    Creating archive: ENWIK8.7z
    Items to compress: 1
    
    Files read from disk: 1
    Archive size: 34908481 bytes (34 MiB)
    Everything is Ok
    
    C:\lzoma-0.2>lzoma c ENWIK8.7z ENWIK8-c.lzoma
    output=ENWIK8-c.lzoma
    block size:16777216, compression level:[3, 1000]
    got 16777216 bytes, packing ENWIK8.7z into ENWIK8-c.lzoma...
    stats noe8 262077 e8 262084
    init_same done.
    suffix sort done.
    sort done.
    res=149047651
    res bytes=18630957 got 16777216 bytes, packing ENWIK8.7z into ENWIK8-c.lzoma...
    stats noe8 262501 e8 262497
    reverted e8
    init_same done.
    suffix sort done.
    sort done.
    res=149048518
    res bytes=18631065 got 1354049 bytes, packing ENWIK8.7z into ENWIK8-c.lzoma...
    stats noe8 21196 e8 21194
    reverted e8
    init_same done.
    suffix sort done.
    sort done.
    res=12029958
    res bytes=1503745 closing files let=0 lz=0 olz=0
    bits lzlit=0 let=0 olz=0 match=0 len=0 time=29.048
    size:34908481 -> 34908493 (100.00%)
    time:29.113 s
    
    C:\lzoma-0.2>lzoma d ENWIK8-c.lzoma ENWIK8-d.lzoma
    output=ENWIK8-d.lzoma
    size:34908493 -> 34908481 (100.00%)
    time:0.040 s
    
    C:\lzoma-0.2>fc /b ENWIK8.7z ENWIK8-d.lzoma
Comparing files ENWIK8.7z and ENWIK8-D.LZOMA
    00011AA6: A5 4B
    00011AA7: 37 52
    00011AA8: C6 C7
    00017E18: 73 8B
    00017E19: 3B B9
    00017E1A: 32 33
    0001A7BB: 4E 09
    0001A7BC: 7D 25
    0001A7BD: 4C 4E
    0001C8D9: 9B 74
    0001C8DA: AD 76
    0001C8DB: 54 56
    0001F88E: 49 D7
    0001F88F: 62 5A
    0001F890: 58 5A
    00029174: DC 50
    00029175: 1E B0
    00029176: 28 2A
    0002B352: F1 43
    0002B353: BA 6E
    [...]

  22. #17
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    @xcrh
    1) Sorry, my mistake.
    2) You are correct.
But in this case, I didn't feel the need for complicated names. I'm sorry.
*c<58&&*c>47 means *c<='9'&&*c>='0' (see the sketch after this list).
a<9 is to avoid overflow.
SA: Suffix array, ISA: Inverse SA, LCP: Longest Common Prefix.
These names are often used in papers.


3) It's to make typing the arguments easier.
4) I'm not yet accustomed to C/C++.
5) Maybe the reason is divsufsort. It's very fast.
6) It's strange... my system does not report that error.
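
For reference, an equivalent but more readable form of the quoted digit-parsing one-liner could look like the sketch below (just an illustration, not the code from the attached archive):

Code:
#include <stdint.h>

/* Sketch: parse a decimal number from a string, accepting at most
   9 digits (the a<9 guard against overflow in the original). */
static uint32_t parse_small_uint(const char *p)
{
    uint32_t value   = 0;
    int      ndigits = 0;

    while (*p >= '0' && *p <= '9') {     /* same test as *c>47 && *c<58 */
        if (ndigits < 9) {               /* same guard as a<9 */
            value = value * 10u + (uint32_t)(*p - '0');   /* n*10+*c-48 */
            ndigits++;
        }
        p++;
    }
    return value;
}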


    @Mauro Vezzosi
The first block looks e8-transformed, but not incompressible, and e8back() is not called. It's a bug.
    Last edited by xezz; 5th January 2016 at 00:58.

  23. The Following User Says Thank You to xezz For This Useful Post:

    xcrh (5th January 2016)

  24. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
I think that for(;ISA[SA[i]]=i--;); should be for(;ISA[SA[i]]=i;i--); to ensure correctness.
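
Presumably the point is C's sequencing rules: in ISA[SA[i]]=i--, the read of i inside SA[i] and the modification of i by i-- are unsequenced, which is undefined behavior, so the compiler may index SA with either the old or the new value. A sketch of a well-defined form (SA/ISA/n are generic placeholders here, not the exact arrays from the attached source):

Code:
/* Filling an inverse suffix array.  Splitting the decrement out of the
   assignment makes the evaluation order explicit and avoids the
   unsequenced read/modification of i (undefined behavior in C). */
void build_isa(const int *SA, int *ISA, int n)
{
    for (int i = n - 1; i >= 0; i--)
        ISA[SA[i]] = i;
}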

  25. #19
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by xezz View Post
But in this case, I didn't feel the need for complicated names. I'm sorry.
Uhm, at the very least the parameters passed to main() are normally named argc and argv, so everyone seeing these names immediately gets that the code is related to command-line parsing, etc. Technically you can use whatever names, but that only makes sense if you're taking part in an obfuscated C code contest. Otherwise... programs use some common conventions, and doing it in unusual ways only makes reading the code harder for no apparent reason.

Also, variable names starting with _ (underscore) are basically "reserved for internal use" and should not be used by programmers in ordinary code. Some vars/macros/etc. with _names are actually used by compilers internally, and if you manage to collide with an existing internal name, strange things can happen. Most of the time it works, but if you take a look around, _varnames aren't generally used in C programs unless something special is going on.

    SA: Suffix array, ISA: Inverse SA, LCP: Longest Common Prefix.
Yeah, after getting curious what they are, I googled around and stumbled on a mention of these, somewhere around Matt Mahoney's site and his book or so.

4) I'm not yet accustomed to C/C++.
They are tricky and allow all sorts of tricks, yet not all tricks should be done in the normal course of things. Using main() with parameter names other than argc and argv is one of them; just take a look at other command-line programs... And C types... uhm, before C99 appeared they were, ahem, strange/troublesome. Roughly speaking, before C99 the standards did not require int, char, long, ... to have any particular well-defined size. This made programs fragile, and writing a portable program could be quite a challenge. With C99 people got fed up with it and the standard got some new types defined which are guaranteed to have the width one expects, not something else. Can you imagine, char can be, say, 16 bits?! It does not violate the standard; the standard only demands char be 8 bits or larger. So 16 bits is ok, and some devs implemented a C compiler using 16-bit chars, just because their CPU is inherently unable to deal with bytes and can only deal with 16-bit words. Their implementation is strange but does not violate any standards. It's those who think char is 8 bits, int is 32 bits, etc. who are making really wrong assumptions. The real shame? AFAIK MS still lacks complete C99 support in MSVC, 16 years later. This forced many of my fellows to retreat to gcc or clang, which are much better at it. Yes, there are some headers defining the C99 types for non-C99 compilers, or one can go for C++, but types alone are IMHO an utterly stupid reason to use a C++ compiler.

E.g. if we take a look at something like Crush, it uses C++ for no reason. It in no way counts as C++ code, it is just C with a couple of small goodies; the most C++ish thing is probably the namespace. Everything else is basically plain C. But it is C++, so most C programs would have a hard time calling it as a lib. That can be solved, but most people around would not be in the mood to deal with such code unless it is something groundbreaking.

If you want some really awful examples related to types, etc., take a look at LZO or UCL; these are old enough that they have a lot of cruft to support even really awkward/old/defective/obsolete compilers. To make it harder, they even support some defective segmented memory models from the MS-DOS age, etc., which makes the code rather troublesome to read since it is full of conditional defines. But they support... almost everything that has a C compiler. A somewhat better code example could be something like LZ4. It is still fairly portable, but does not support horribly obsolete cruft, so it is somewhat easier to read.

Another example is byte order. Machines can be big endian or little endian. When you just write, say, an int to a file "directly", it is thrown into the file as is; no guarantees are made about byte order. So good luck reading your file with straightforward code on a machine with a different endianness. Portably writing files meant for data interchange is a bit more tricky, just like networking. The original code of the mentioned program gives us some bad examples of this. So if someone wanted to, e.g., turn it into a serious lib for practical use, things like this would have to be fixed.
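
A minimal sketch of the usual fix: write and read multi-byte values in one fixed byte order (little-endian here) with shifts, instead of dumping the in-memory representation. This is a generic illustration, not code from the lzoma source:

Code:
#include <stdint.h>

/* Store a 32-bit value in little-endian order regardless of the host. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v         & 0xff);
    p[1] = (uint8_t)((v >> 8)  & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load it back; behaves identically on big- and little-endian machines. */
static uint32_t get_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}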

5) Maybe the reason is divsufsort. It's very fast.
Hmm, I see. I failed to notice the difference because on most data I've fed into this thing it spends most of its time somewhere else; sorting is a very small part of the overall compression time.

6) It's strange... my system does not report that error.
This seems to be a message from glibc, which in recent versions got more accurate about catching invalid memory operations. Some libc implementations can just silently swallow invalid memory operations, but that can lead to exploitable vulns (probably not the case in this program). Better safe than sorry...
    Last edited by xcrh; 5th January 2016 at 04:37.

  26. The Following User Says Thank You to xcrh For This Useful Post:

    xezz (6th January 2016)

  27. #20
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    This is getting a bit off topic, but what the hell, I'll bite…

    Quote Originally Posted by xcrh View Post
Also, variable names starting with _ (underscore) are basically "reserved for internal use" and should not be used by programmers in ordinary code. Some vars/macros/etc. with _names are actually used by compilers internally, and if you manage to collide with an existing internal name, strange things can happen. Most of the time it works, but if you take a look around, _varnames aren't generally used in C programs unless something special is going on.
Not just "basically", they are reserved. § 7.1.3 of both C99 and C11 (there is probably a similar section in C90, but I didn't bother to look it up). There are a few name spaces where an underscore followed by a *lowercase* letter is allowed (labels, struct/union members, etc.), but they are forbidden in the ordinary name space.

    Quote Originally Posted by xcrh View Post
AFAIK MS still lacks complete C99 support in MSVC, 16 years later. This forced many of my fellows to retreat to gcc or clang, which are much better at it. Yes, there are some headers defining the C99 types for non-C99 compilers, or one can go for C++, but types alone are IMHO an utterly stupid reason to use a C++ compiler.
    FWIW, Visual Studio as of 2015 is almost C99-compliant. According to their web site:

    C99 Conformance Visual Studio 2015 fully implements the C99 Standard Library, with the exception of any library features that depend on compiler features not yet supported by the Visual C++ compiler (for example, <tgmath.h> is not implemented).
    That includes stdint.h. That said, I'm aware of at least one bug in their standard library which prevents it from being C99-compliant even though conformance wouldn't require compiler support (compare the return value of vsnprintf in standard C to Microsoft's implementation), but for the most part it's finally okay to just assume C99 support… nobody really uses the type-generic math functions anyways.
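
For anyone curious about the vsnprintf difference: C99 requires it to return the number of characters that would have been written given enough space, which allows the common measure-then-allocate pattern below; the old Microsoft _vsnprintf instead returned a negative value on truncation. This is my own illustration, not Squash code:

Code:
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate a buffer of exactly the right size for the formatted string,
   relying on the C99 rule that vsnprintf(NULL, 0, ...) returns the
   required length (excluding the terminating NUL). */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);

    int needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (needed < 0) {
        va_end(ap2);
        return NULL;
    }

    char *buf = malloc((size_t)needed + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)needed + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}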

    Quote Originally Posted by xcrh View Post
E.g. if we take a look at something like Crush, it uses C++ for no reason. It in no way counts as C++ code, it is just C with a couple of small goodies; the most C++ish thing is probably the namespace. Everything else is basically plain C. But it is C++, so most C programs would have a hard time calling it as a lib. That can be solved, but most people around would not be in the mood to deal with such code unless it is something groundbreaking.
    I probably should have mentioned it on this forum a long time ago, but I have a port of CRUSH to straight C in Squash (see crush.c and crush.h). Some changes to make it easier to use as a library are also included (e.g., my version is thread-safe).

    Quote Originally Posted by xcrh View Post
Another example is byte order. Machines can be big endian or little endian. When you just write, say, an int to a file "directly", it is thrown into the file as is; no guarantees are made about byte order. So good luck reading your file with straightforward code on a machine with a different endianness. Portably writing files meant for data interchange is a bit more tricky, just like networking. The original code of the mentioned program gives us some bad examples of this. So if someone wanted to, e.g., turn it into a serious lib for practical use, things like this would have to be fixed.
    It's actually more fun than that. The byte order can change *within the same program*. Apparently it's not too uncommon for compilers which can target both the CPU and a DSP simultaneously.

    Quote Originally Posted by xcrh View Post
This seems to be a message from glibc, which in recent versions got more accurate about catching invalid memory operations. Some libc implementations can just silently swallow invalid memory operations, but that can lead to exploitable vulns (probably not the case in this program). Better safe than sorry...
    You should try running the program in valgrind, or using AddressSanitizer (included in recent versions of GCC and Clang). They're both very good at exposing problems like that.

  28. #21
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by nemequ View Post
    This is getting a bit off topic, but what the hell, I'll bite…
If it's up to me, it's fine ... as long as there is good technical reasoning :P

    Not just "basically", they are reserved.
I do not use _ at the beginning of C vars, so do not bite me for this.

    FWIW, Visual Studio as of 2015 is almost C99-compliant.
Er, my calendar shows it is 2016, 16+ years later. Maybe VS is developed/used by MacLeod, or something. But for us mortals, 16 years is quite a number.

    but for the most part it's finally okay to just assume C99 support… nobody really uses the type-generic math functions anyways.
Finally, we can use C99 goodies and (hopefully) not face yells from VS users. Honestly speaking, I have not had Windows machines around me for about 8 years, so I do not track VS development much. But I can admit that 8 years back, the lack of C99 wasn't great either.

    I probably should have mentioned it on this forum a long time ago, but I have a port of CRUSH to straight C in Squash (see crush.c and crush.h). Some changes to make it easier to use as a library are also included (e.g., my version is thread-safe).
Whoa, you really should mention that (and you just did). Though your crush version appears to be uber-generic, etc. Not really sure I need THAT much for my "internal use", experiments and so on. I usually care about relatively simplistic (de)compression from a source buffer to a destination. Also, the original version appears to use static memory allocation. It has a useful property: if it has started, it has started. Failure isn't an option. Either the user GTFOs at startup or it just works. Though it kills flexibility in choosing dict size.

    But when I read code like that in crush_compress
    Code:
      int* head = (int*)ctx->alloc((HASH1_SIZE+HASH2_SIZE) * sizeof(int), ctx->user_data);
      int* prev = (int*)ctx->alloc(W_SIZE * sizeof(int), ctx->user_data);
    
      memset (head, 0, (HASH1_SIZE+HASH2_SIZE) * sizeof(int));
      memset (prev, 0, W_SIZE * sizeof(int));
...it raises a question: these are large chunks, and the OS, running other stuff, may or may not have this much in its backpack. What happens if it fails? I'm no expert in your lib, but at first glance it seems that in case of a memory allocation failure it can just crash on the following memset? Have I got it wrong?
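
Something like the following is what I'd expect, just a sketch of the usual check-before-use pattern on top of the quoted snippet; the ctx->free callback and the CRUSH_FAILED_MALLOC return value are made-up names, not Squash's real API:

Code:
  /* Sketch only: check the allocations before touching them. */
  int* head = (int*)ctx->alloc((HASH1_SIZE+HASH2_SIZE) * sizeof(int), ctx->user_data);
  int* prev = (int*)ctx->alloc(W_SIZE * sizeof(int), ctx->user_data);

  if (head == NULL || prev == NULL) {
    /* release whichever allocation succeeded, then report the failure
       instead of crashing inside memset */
    if (head != NULL) ctx->free(head, ctx->user_data);   /* hypothetical */
    if (prev != NULL) ctx->free(prev, ctx->user_data);   /* hypothetical */
    return CRUSH_FAILED_MALLOC;                          /* hypothetical */
  }

  memset (head, 0, (HASH1_SIZE+HASH2_SIZE) * sizeof(int));
  memset (prev, 0, W_SIZE * sizeof(int));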

    It's actually more fun than that. The byte order can change *within the same program*.
ARM can change the endianness of memory accesses on the fly (a matter of one bit set in one of the registers), and the size of the instruction set (ARM <-> Thumb mode). In reality it is more complicated; not all combinations are supported by all devices. Typically ARMs are used in little-endian mode, but it is possible to stumble on a BE system. Nobody in their sane mind switches data endianness at runtime, for obvious reasons, though Thumb-to-ARM code switches are used. A similar idea goes for MIPS: most are little endian, but a few run in BE mode. Most "real" DSPs I've seen were a secondary processor with an entirely different instruction set and its very own program, though modern CPUs have "DSPish" things like multiply-accumulate.

    You should try running the program in valgrind, or using AddressSanitizer
I think I have a rough idea of where it crashes from the program's behavior itself, but that could be a good idea.
    Last edited by xcrh; 8th January 2016 at 00:15.

  29. #22
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by xcrh View Post
I do not use _ at the beginning of C vars, so do not bite me for this.
Not at all, just wanted to point out that you might want to be forceful about telling other people not to do it. A lot of people think it's just frowned upon; they don't realize that it's actually disallowed.

    Quote Originally Posted by xcrh View Post
Er, my calendar shows it is 2016, 16+ years later. Maybe VS is developed/used by MacLeod, or something. But for us mortals, 16 years is quite a number.
    Oh, I agree it's been far too long (and they're still not there!). TBH I actually started telling VS users to piss off a couple years ago, but they finally got their act together enough that I can just tell people to upgrade. In fact, now that I think about it, I'm certain the reason they added the C99 support they did was because of me. Yup, that's definitely it.

    Quote Originally Posted by xcrh View Post
Whoa, you really should mention that (and you just did). Though your crush version appears to be uber-generic, etc. Not really sure I need THAT much for my "internal use", experiments and so on. I usually care about relatively simplistic (de)compression from a source buffer to a destination.
    Meh, I didn't think anyone really cared about crush. From my perspective it's not overly interesting, and I've never seen anyone express much interest in it after the initial announcement…

    It's really only as generic as I needed it to be to support streaming with minimal effort. The original version just used putc/getc (IIRC, maybe it was read/write or something), so converting it to use callbacks to replace those functions was the easiest way to support stuff other than file i/o.

    Quote Originally Posted by xcrh View Post
Also, the original version appears to use static memory allocation. It has a useful property: if it has started, it has started. Failure isn't an option. Either the user GTFOs at startup or it just works. Though it kills flexibility in choosing dict size.
    Right, that was a big part of the reason it wasn't usable for me as it was. It's not thread-safe to use a file-scope buffer, and there is way too much data to put everything on the stack. That leaves the heap. I suppose I could have used TLS, but that seems a bit absurd.

If I were writing it from scratch I would probably have it accept a workmem parameter like LZO does, but that would have required more work than what I did and it wouldn't have been useful to me. If I thought people really cared I might still do that, but the API would probably require some restructuring so compression and decompression contexts are different types (IIRC they have very different memory requirements).

    Quote Originally Posted by xcrh View Post
    But when I read code like that in crush_compress
    Code:
      int* head = (int*)ctx->alloc((HASH1_SIZE+HASH2_SIZE) * sizeof(int), ctx->user_data);
      int* prev = (int*)ctx->alloc(W_SIZE * sizeof(int), ctx->user_data);
    
      memset (head, 0, (HASH1_SIZE+HASH2_SIZE) * sizeof(int));
      memset (prev, 0, W_SIZE * sizeof(int));
...it raises a question: these are large chunks, and the OS, running other stuff, may or may not have this much in its backpack. What happens if it fails? I'm no expert in your lib, but at first glance it seems that in case of a memory allocation failure it can just crash on the following memset? Have I got it wrong?
    Oops. I usually check mallocs, not sure what I was thinking there; I've pushed an updated version.

    Anyways, if you want to keep talking about crush we should probably switch to its thread (or use the Squash bug tracker). I feel bad stealing this one.

  30. #23
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by nemequ View Post
    just wanted to point out that you might want to be forceful about telling other people not to do it.
    Yep, I should
    TBH I actually started telling VS users to piss off a couple years ago
You're a verrrrrry patient person.

    Meh, I didn't think anyone really cared about crush. From my perspective it's not overly interesting
Depends on how you #define interesting. Overall it is one of the "pure" LZs (lacking entropy coding) which isn't bad in terms of ratio, yet decompression is simple. Due to its bit-aligned nature it isn't very fast, so by modern standards the overall tradeoff isn't groundbreaking. But since it isn't bad in compression ratio and is simple at decompression, it has been used as a kind of milestone: sure, these ratios are better at THESE decompression speeds. I guess crush decompression could also be optimized, e.g. in ARM assembly, but byte-aligned LZs have inherent advantages in this regard. OTOH, my experiments on an ARM board have shown that many "fast" LZ+entropy schemes can actually be "not so fast" to decompress on ARM. Ironically, that's where I need the speed most. E.g. brotli or lzham are faster than lzma even on ARM, sure, but on ARM they are still seriously slower than zlib, like 1.5x-2x slower. Zstd is only marginally faster than zlib, may or may not beat it at ratio, and is nowhere close to lz4 and lz5 in terms of decomp speed. Not to mention that trying to get something like LZHAM to decompress a kernel for me can be the ultimate headache; it is just way too far off the track. What if I only have a CPU, memory and the source data? No files, no threads, possibly not even malloc, at least not a "real" one. That's where one looks at things from a different angle, and a simple decompressor which is not picky about its environment is a good bonus any day. LZOMA seems interesting in this context: a simple decompressor which isn't picky, a tight ratio, and no reason to seriously degrade decomp speed on e.g. ARM, etc.

    It's really only as generic as I needed it to be to support streaming with minimal effort.
TBH most of the time when I care about compression it isn't a generic file archiver streaming 20 GB through it, but some more specific cases, with input and output fully in memory. As you can see, I like to compress e.g. the Linux kernel, which is (self-)decompressed before starting, either by its own stub or by the boot loader. On e.g. ARM systems I can face slow storage and a weak CPU at once, so it can be interesting to try to improve on that, but it means really different tradeoffs. And at the end of the day I can always resort to several "independent blocks". Not uber-optimal, but it works. LZOMA seems to do this when the input exceeds MAX_BLOCK, and still achieves a remarkable ratio. I guess it is possible to improve the LZOMA ratio on my kernel example in a "cheap and dirty" way, by raising MAX_BLOCK, but it would also make compression even more memory hungry.

The mentioned approach happens to be ok for loading some pre-compressed resources which are supposed to be fully in RAM anyway, etc. But sure, if I were up for some generic file-based compressor shuffling 20 GB of something, I would agree this approach isn't exactly best. Though once again, people may want to access data randomly with reasonable overhead, and in this case "independent blocks" aren't bad either. Actually, all the filesystem compression I've seen used independent blocks of fairly limited size; it hurts ratio but allows random access at a sane speed.

    Right, that was a big part of the reason it wasn't usable for me as it was. It's not thread-safe to use a file-scope buffer, and there is way too much data to put everything on the stack. That leaves the heap. I suppose I could have used TLS, but that seems a bit absurd.
In some cases I care about, the whole notion of a "thread" may be missing. What is a thread if I have a CPU, memory, data, and nothing else around? No, sure, threads aren't bad. But C is a versatile thing and can run in quite strange environments.

    If I were writing it from scratch I would probably have it accept a workmem parameter like LZO does,
Yeah, in case I haven't said it already: despite the scary code, LZO seems to have quite a reasonable API, which is quite nice to use. But darn, reading its code... uhhh, it seems to have accumulated a lot of bones and dust for really weird & obsolete things.

    Anyways, if you want to keep talking about crush we should probably switch to its thread (or use the Squash bug tracker). I feel bad stealing this one.
Yeah, it would probably be better to stop talking about crush here. But it has something in common: LZ without "real" entropy coding and a simple decompressor. Somehow this approach still manages to surprise me, reaching new heights here and there. Of course there are some limits. One obvious example is compressing real-world bitmap images without preprocessing; even LZOMA would not show groundbreaking results. There are just not that many matches that LZ-style algos can catch "directly".

  31. #24
    Member
    Join Date
    Jan 2016
    Location
    Russia, Samara
    Posts
    3
    Thanks
    2
    Thanked 2 Times in 2 Posts
    Thanks to everyone for your interest in LZOMA code.

    Quote Originally Posted by xcrh View Post
As you can see, I like to compress e.g. the Linux kernel, which is (self-)decompressed before starting, either by its own stub or by the boot loader.
This was the primary LZOMA goal; I had some experimental code that decompressed and loaded a Linux kernel. Another possible usage would be read-only filesystems like squashfs.
But it was not usable due to the very slow compression speed and huge compressor memory requirements, and I have not had time to rewrite that yet (LZOMA was written in my spare time; my main job has nothing to do with data compression).

I will look at the divsufsort version, but it is probably still slow, as it still uses nearly-optimal parsing starting from the end of the buffer.
The LZOMA compressed format is like a simplified LZMA with static entropy coding instead of "real" entropy coding. It should be possible to adapt LZMA or Tornado/FreeArc code to make a much faster LZOMA packer.

It is possible to improve the compression ratio by using real entropy coding, but at the cost of decompressor speed.
Currently, decompression does not use any additional memory except the input/output buffers, only a few variables that fit into CPU registers even on x86.
The packer allows writing the compression tokens (literals, LZ matches, LZ lengths, repeated LZ lengths, flags) into separate files, for experiments with entropy coding.

The "#ifdef LZX" parts in the source code were experiments; I was trying to combine length/distance codes like MS LZX does. That did not seem to improve compression ratio or decompression speed (but further experiments may be useful).

  32. The Following User Says Thank You to alef For This Useful Post:

    xcrh (12th January 2016)

  33. #25
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
I am interested in pack2.c, because its output is also compressible.
    Code:
           pack2  gzip   pack2+gzip lzma   pack2+lzma
    book1  329255 299755 258692     261275 250862
    enwik6 364300 341976 290746     288253 255542
But there is no unpacker.

  34. The Following User Says Thank You to xezz For This Useful Post:

    Bulat Ziganshin (11th January 2016)

  35. #26
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    174
    Thanked 51 Times in 37 Posts
    Quote Originally Posted by xezz View Post
I am interested in pack2.c, because its output is also compressible.
    Code:
           pack2  gzip   pack2+gzip lzma   pack2+lzma
    book1  329255 299755 258692     261275 250862
    enwik6 364300 341976 290746     288253 255542
But there is no unpacker.
    Now I am interested too. We need a Windows executable for the packer and unpacker for testing.

  36. #27
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by alef View Post
    Thanks to everyone for your interest in LZOMA code.
Thanks for quite a bit of extra fun. After so many years, LZ algos still manage to put on a show in unexpected ways.

This was the primary LZOMA goal; I had some experimental code that decompressed and loaded a Linux kernel. Another possible usage would be read-only filesystems like squashfs.
The kernel compresses pretty well, even without E8E9, though in my test run it seems it did 2 independent blocks. I guess increasing MAX_BLOCK so the whole kernel fits would give a more impressive result. But since there are a few tables dependent on it, memory usage in the compressor is going to increase as well. Though I think some hybrid approach could do better. E.g. I can see that the large areas full of zeros in a modern kernel remain quite redundant; it seems the Linux kernel would like some RLE, etc. But just doing a 2nd pass would kill decompression speed for sure. Hmm.

But it was not usable due to the very slow compression speed and huge compressor memory requirements, and I have not had time to rewrite that yet (LZOMA was written in my spare time; my main job has nothing to do with data compression).
Well, since people only build kernels once in a while, it can make some sense. Sometimes. Though I would agree the compression speed is slow and it takes a lot of memory, so it is a good thing for Guinness records.

I will look at the divsufsort version, but it is probably still slow, as it still uses nearly-optimal parsing starting from the end of the buffer.
TBH I initially failed to notice what had changed in the divsufsort version, since it seems this parsing takes almost all of the compressor's time, and sorting is hardly a major contributor to the overall time. So overall speed is almost unchanged for compressible data. Maybe it speeds up incompressible things a bit, since parsing can go quite fast in those cases.

The LZOMA compressed format is like a simplified LZMA with static entropy coding instead of "real" entropy coding.
Seems this idea is getting popular. I.e. LZ5 uses something like Huffman codes... without any real Huffman, etc. Just some smarter/more compact encoding, still mostly trying to stay on byte boundaries to avoid a massive speed loss.

It should be possible to adapt LZMA or Tornado/FreeArc code to make a much faster LZOMA packer.
I wonder if that would hurt the ratio, etc.? At the end of the day, one can just grab LZ5 if they are ok with a ratio around zlib and 2-3x better decompression speed.

It is possible to improve the compression ratio by using real entropy coding, but at the cost of decompressor speed.
Yep, and that's a problem. For example, my experiments on an ARM board have shown that virtually any entropy coding kills speed a lot. Even ZSTD, which looks nice on x86, spirals down to something barely better than zlib on ARM. Most other schemes using entropy coding are even worse. And x86s are fast, unlike ARMs...

Currently, decompression does not use any additional memory except the input/output buffers, only a few variables that fit into CPU registers even on x86.
At the end of the day, that's what I like so much about many of the LZs around.

The packer allows writing the compression tokens (literals, LZ matches, LZ lengths, repeated LZ lengths, flags) into separate files, for experiments with entropy coding.
Thanks for the explanation. I wondered why that had to be there, and this explains it. But TBH I'm thinking entropy coding could make it seriously slower to decompress, killing all the fun.

The "#ifdef LZX" parts in the source code were experiments; I was trying to combine length/distance codes like MS LZX does. That did not seem to improve compression ratio or decompression speed (but further experiments may be useful).
On all the data I've tried it on, it actually hurt the compression ratio by a noticeable margin.

P.S. The original compressor seems to have a bug in main(): it does bres=pack(n); but then if (res==n), while res was never set in that scope. I bet it meant if (bres == n), for something like detecting incompressible data, right? It also looks like there is a quite useless memcpy and buffer; one could just copy src to dest, unless I've seriously missed some point.

  37. #28
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    Quote Originally Posted by xezz View Post
I am interested in pack2.c, because its output is also compressible.
    Code:
           pack2  gzip   pack2+gzip lzma   pack2+lzma
    book1  329255 299755 258692     261275 250862
    enwik6 364300 341976 290746     288253 255542
But there is no unpacker.
Actually, even the output of the "usual" version is compressible to some degree in some cases.
    Code:
    22799232 _x8664-linux-kernel
     6141476 _x8664-linux-kernel.lzoma
     5701446 _x8664-linux-kernel.lzoma.gz
...but doing something like this is not really interesting in terms of decompression speed. Decompressing lzoma + lzma is slower than lzma alone, which is quite slow to decompress on its own.

  38. #29
    Member
    Join Date
    Jan 2016
    Location
    Russia, Samara
    Posts
    3
    Thanks
    2
    Thanked 2 Times in 2 Posts
pack2.c was just an experiment with byte-aligned encoding. It was made to estimate compression ratio only. The compressed stream it produces may not be correct, and that may be the reason why it compresses so well with lzma. Please do not waste time on pack2.c unless you want to continue those experiments. If you want to estimate possible improvements to the compression ratio, it would be better to produce the separate token streams with pack.c, then encode them separately with entropy encoders.

I have already fixed the res/bres bug in the original pack.c, and I am also planning to clean up the code from older experiments soon, to make it more readable.

pack.c is so slow because it does a very active search for repeated LZ matches, like
LZ<offset, length> | Literals | LZ<same offset, length2> | Literals | LZ<same offset, length3> ...
If anyone knows a fast algorithm to find such matches, please let me know (a rough illustration of the idea follows below).
There are also some heuristics to limit that search (early versions without them were even slower).
Also, the minimal match length in pack.c is 2 for a match and 1 for a repeated match, which is yet another reason why compression is slow, but it increases the compression ratio.
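
For readers unfamiliar with the idea, a rough sketch of checking a repeated-offset match at some position during parsing is shown below. This is only a generic illustration; pack.c's real search explores such candidates far more aggressively:

Code:
#include <stddef.h>
#include <stdint.h>

/* Length of the match at buf[pos] against the data prev_offset bytes
   back.  A repeated-offset match needs no offset bits in the output,
   only a flag and a length, so even very short matches can pay off. */
static size_t rep_match_len(const uint8_t *buf, size_t pos, size_t buf_len,
                            size_t prev_offset)
{
    size_t len = 0;
    if (prev_offset == 0 || prev_offset > pos)
        return 0;
    while (pos + len < buf_len &&
           buf[pos + len] == buf[pos + len - prev_offset])
        len++;
    return len;
}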

As for kernel compression, my experiments show that the e8 transform is very useful for x86 kernels, but I have not yet adapted it for x86-64.
I also tried compressing the kernel with RLE first; it actually reduced the compression ratio a bit.
And yes, increasing the lzoma block size to compress the kernel in one block would compress your kernel better.

  39. #30
    Member
    Join Date
    Jan 2016
    Location
    Russia, Samara
    Posts
    3
    Thanks
    2
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by xcrh View Post
    6141476 Jan 2 02:33 _x8664-linux-kernel.lzoma
The latest lzoma pack.c version from the git repository may compress files larger than MAX_SIZE a bit better, as it now uses the previous block as history for the next blocks.
In my tests it compressed enwik8 to 28593071 bytes with default settings; compression time was about 40 minutes.
So the compression ratio was between brotli -q 9 and brotli -q 10.
Decompression speed on my system (x86-64) is between gzip and brotli; zstd decompresses about twice as fast.
I have not compared speed on other platforms yet; I will probably test on ARM and MIPS later.

On binaries, without an e8 filter, on my test Linux kernel, brotli still compresses a bit better, while zstd had a worse compression ratio than lzoma.

lzoma still does not use any adaptive entropy coding (no coding for literals, static bit encoding for lengths, and an almost-static encoding for distances that depends only on the maximal possible distance), but looking at the zstd decompression speed, I will probably try that.
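
To illustrate what "static bit encoding" means here: a static code assigns every value a fixed bit pattern up front and never adapts to the data. An Elias-gamma-style code for lengths is a classic example of the idea; this is only an illustration of the concept, not LZOMA's actual length code:

Code:
#include <stdint.h>

/* Elias-gamma-style static code: a value v >= 1 is written as k zero
   bits followed by the (k+1)-bit binary form of v, where
   k = floor(log2(v)).  Small values get short codes, and the code
   never changes with the data -- no adaptive model, no tables. */
static void put_gamma(uint32_t v,
                      void (*put_bit)(int bit, void *st), void *st)
{
    int k = 0;
    while ((v >> (k + 1)) != 0)
        k++;                        /* k = floor(log2(v)) */
    for (int i = 0; i < k; i++)
        put_bit(0, st);             /* unary prefix: k zeros */
    for (int i = k; i >= 0; i--)
        put_bit((int)((v >> i) & 1), st);  /* k+1 bits of v, MSB first */
}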

  40. The Following User Says Thank You to alef For This Useful Post:

    xcrh (19th January 2016)

