
Thread: Very slow compression with very fast decompression?

  1. #1
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts

    Very slow compression with very fast decompression?

    I've been testing many compressors for compression and decompression speed, but it seems like you can't really do better than DEFLATE and LZMA2, as far as text is concerned. For example, with my rockyou.txt as a test file, the gzip'ed file decompressed twice as fast as the xz'ed file, with the xz file being ~4.6MB smaller.

    Sizes and decompression speed:

    139921497 rockyou.txt
    043716819 rockyou.txt.gz 0m2.032s
    038785200 rockyou.txt.xz 0m5.409s
    032583172 rockyou.txt.tan 3m48.014s

    Is it possible to decode as fast as LZMA2 and yet have better compression? Compression speed is not an issue.

    .tan is TANGELO 2.4

  2. #2
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Have you tried zopfli?

  3. #3
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Well, you can do better than gzip without sacrificing decompression speed by using Zopfli. I think requiring high decompression speed places limits on the compression you can achieve, because you have to limit the complexity of the stream and the amount of data that needs to be touched during decompression. A sliding window is ideal for maximizing cache utilization, and gzip doesn't use arithmetic coding. I'm not intimately familiar with LZMA2, but it represents another level of trade-off.

    Maybe you could try stripping away some parts of Tangelo, or seeing whether there's an even more efficient way to do sliding-window LZ77 that doesn't add complexity to the decoder. I'm not sure that anyone has achieved the full potential of LZ77 compression, because you run into something like the traveling salesman problem when trying to compute the absolute optimal factorization into dictionary phrases. I saw a paper by Manzini and Ferragina on a flavor of optimal LZ77 that showed impressive results, but I couldn't find an implementation and I don't remember all the details.
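    The greedy-vs-optimal parsing point above can be made concrete with a toy sliding-window LZ77 factorizer (a hypothetical sketch, not any particular codec): greedy parsing simply takes the longest match at each position, while an optimal parser would instead choose matches that minimize total encoded size, which is where the hard combinatorial problem lives.

```python
def lz77_greedy(data: bytes, window: int = 4096):
    """Greedily factor data into literals and (distance, length) pairs.

    Greedy = take the longest window match at every position. This is
    NOT optimal parsing; a shorter match now can enable a much longer
    match later, which is exactly the optimization problem discussed above.
    """
    i, factors = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Allow k to run past i: overlapping matches are legal in LZ77.
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= 3:                       # emit a (distance, length) pair
            factors.append((best_dist, best_len))
            i += best_len
        else:                                   # emit a literal byte
            factors.append(data[i])
            i += 1
    return factors

def lz77_decode(factors) -> bytes:
    """Replay the factorization; byte-at-a-time copy handles overlaps."""
    out = bytearray()
    for f in factors:
        if isinstance(f, tuple):
            dist, length = f
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(f)
    return bytes(out)
```

    Note how cheap the decoder is compared to the encoder's match search: that asymmetry is the whole appeal of LZ77-family codecs for this use case.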

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    I think LZMA has been extensively optimized toward exactly this goal, which is what made it the world's best compression algorithm. So if you find what you want, it will knock LZMA off the throne.

  5. #5
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Mangix View Post
    Is it possible to decode as fast as LZMA2 and yet have better compression? Compression speed is not an issue.
    Not in my tests, although right now those only show the default compression levels. However, if you're willing to sacrifice a bit on compression ratio, you might want to take a look at Doboz. Also, LZ4 HC achieves pretty good compression ratio with extremely fast decompression.

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Doboz: close to zlib's compression ratio.
    LZ4: even worse.

    These are the exact opposite of what was requested.

  7. #7
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    doboz: close to zlib compression ratio
    lz4: even worse

    these are absolutely opposite things to requested one
    Opposite would be a lower compression ratio and slower decompression. Doboz and LZ4 HC both decompress much faster than LZMA2 and than other algorithms with comparable compression ratios, by sacrificing compression speed. Also, I forgot to mention LZHAM, which comes *very* close to LZMA2's compression ratio (97% on enwik9, according to the Large Text Compression Benchmark) but decompresses much faster (xz takes 4 times as long).

    It's pretty clear that he cares about two things: compression ratio and decompression speed. LZMA2 may have the best decompression speed at or above the compression ratio it provides, but for a relatively small decrease in compression ratio there are algorithms which decompress *much* faster. If compression ratio were all he cared about, he wouldn't be interested in LZMA2; he would be interested in PAQ/ZPAQ.

  8. #8
    Member Grey Haze's Avatar
    Join Date
    May 2013
    Location
    United States
    Posts
    1
    Thanks
    1
    Thanked 0 Times in 0 Posts
    On The Bit-Complexity Of Lempel-Ziv Compression
    http://zola.di.unipi.it/rossano/wp-c...f/sicomp13.pdf

  9. #9
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    You can try Lzturbo https://sites.google.com/site/powturbo/home/

    Input RAM-disk:
    139,921,497 bytes, rockyou.txt password list

    Output RAM-disk:
    86,003,058 bytes, 0.538 sec. - 0.089 sec., LZ4 - 0
    83,815,234 bytes, 0.610 sec. - 0.115 sec., lzturbo - 10
    80,731,331 bytes, 4.312 sec. - 0.093 sec., LZ4 - 1
    78,926,464 bytes, 0.887 sec. - 0.142 sec., lzturbo - 11
    78,852,360 bytes, 2.463 sec. - 0.136 sec., lzturbo - 12
    77,990,468 bytes, 0.588 sec. - 0.176 sec., lzturbo - 20
    76,635,519 bytes, 5.994 sec. - 0.086 sec., LZ4 - 2
    75,838,736 bytes, 50.026 sec. - 0.145 sec., lzturbo - 19
    75,791,237 bytes, 1.751 sec. - x.xxx sec., eXdupe - 1
    74,883,081 bytes, 0.507 sec. - 0.416 sec., Qpress - 1
    73,088,582 bytes, 0.940 sec. - 0.473 sec., Qpress - 2
    71,358,053 bytes, 2.420 sec. - 0.322 sec., Qpress - 3
    70,283,041 bytes, 2.733 sec. - 0.189 sec., lzturbo - 22
    69,551,050 bytes, 0.874 sec. - 0.174 sec., lzturbo - 21
    66,931,499 bytes, 7.059 sec. - 2.319 sec., WinZpaq - 1
    62,907,353 bytes, 14.117 sec. - 2.365 sec., WinZpaq - 2
    62,672,792 bytes, 3.502 sec. - x.xxx sec., eXdupe - 2
    62,163,480 bytes, 53.843 sec. - 0.205 sec., lzturbo - 29
    61,216,816 bytes, 2.168 sec. - 2.648 sec., Bsc - 6
    60,224,563 bytes, 1.379 sec. - 1.206 sec., FreeArc - 1
    59,619,910 bytes, 1.844 sec. - 2.603 sec., Bsc - 5
    59,583,383 bytes, 9.518 sec. - x.xxx sec., eXdupe - 3
    58,460,982 bytes, 0.859 sec. - 0.558 sec., lzturbo - 30
    56,069,991 bytes, 2.402 sec. - 0.959 sec., WinRAR - 1
    55,982,598 bytes, 1.547 sec. - 2.256 sec., Bsc - 4
    53,384,030 bytes, 1.101 sec. - 0.589 sec., lzturbo - 31
    53,371,931 bytes, 3.664 sec. - 1.282 sec. zlite64
    53,371,931 bytes, 3.359 sec. - 1.332 sec. zlite
    53,272,405 bytes, 4.420 sec. - 1.440 sec., FreeArc - 2
    52,406,275 bytes, 1.131 sec. - 1.512 sec., NanoZip - F
    52,009,382 bytes, 3.606 sec. - 0.588 sec., lzturbo - 32
    51,984,577 bytes, 9.373 sec. - 0.898 sec., WinRAR - 2
    51,873,958 bytes, 0.865 sec. - 0.968 sec., NanoZip - f
    50,662,352 bytes, 1.424 sec. - 1.679 sec., Bsc - 3
    49,962,294 bytes, 17.753 sec. - 0.908 sec., WinRAR - 3
    48,860,271 bytes, 14.891 sec. - 2.295 sec., 7-Zip - 4
    48,593,712 bytes, 11.505 sec. - 2.311 sec., 7-Zip - 3
    48,309,789 bytes, 11.146 sec. - 2.964 sec., FreeArc - 3
    48,240,846 bytes, 10.057 sec. - 2.352 sec., 7-Zip - 2
    47,790,775 bytes, 8.857 sec. - 2.403 sec., 7-Zip - 1
    46,585,234 bytes, 61.924 sec. - 19.664 sec., WinZpaq - 4
    46,585,234 bytes, 29.853 sec. - 19.696 sec., WinZpaq - 3
    46,491,721 bytes, 5.453 sec. - 0.971 sec., NanoZip - dp
    46,230,853 bytes, 4.019 sec. - 0.967 sec., NanoZip - d
    45,909,309 bytes, 6.272 sec. - 0.960 sec., NanoZip - dP
    43,158,484 bytes, 60.337 sec. - 0.558 sec., lzturbo - 39
    42,933,767 bytes, 34.634 sec. - 11.866 sec., NanoZip - o
    42,335,648 bytes, 51.366 sec. - 15.344 sec., NanoZip - O
    42,139,185 bytes, 7.024 sec. - 1.106 sec., NanoZip - Dp
    41,847,491 bytes, 6.850 sec. - 1.101 sec., NanoZip - D
    41,826,758 bytes, 7.726 sec. - 1.104 sec., NanoZip - DP
    38,782,888 bytes, 35.672 sec. - 2.908 sec., FreeArc - 4
    38,705,050 bytes, 50.012 sec. - 2.285 sec., 7-Zip - 5
    37,210,666 bytes, 75.700 sec. - 2.174 sec., lzturbo - 49
    32,611,697 bytes, 44.294 sec. - x.xxx sec., ZCM - 7

    Single thread

  10. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    FreeArc's *xx modes are very asymmetric too, though weaker than LZMA(2).
    I know of only two highly asymmetric general-purpose codecs that would be stronger than LZMA, both created by Shelwien:
    PLZMA - doesn't meet your goals because of its slower decompression.
    A (IIRC nameless) modification to LZMA's CM that gave a tiny improvement over the original.

  11. #11
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Grey Haze View Post
    On The Bit-Complexity Of Lempel-Ziv Compression
    http://zola.di.unipi.it/rossano/wp-c...f/sicomp13.pdf
    That's it. Thanks.

  12. #12
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Oh, I forgot: there's also LZMH, LZMA's faster but weaker brother.

  13. #13
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    Quote Originally Posted by RichSelian View Post
    Have you tried zopfli?
    Yes. The size I posted was actually created with KZIP plus many random trials with zipmix and a conversion to .gz using kzip2gz. I tried zopfli -i1000, but it gave a larger file. Traditional gzip -9 gives a size of ~50MB.

    Quote Originally Posted by Sportman View Post
    I tried lzturbo and it is indeed very fast. It decompresses in 2.280 seconds on my PC while producing a smaller file. Very competitive indeed. It would be really nice if it supported decompressing to stdout, though. Also, I've heard that it contains a lot of stolen code.

    I guess I should explain my reasoning for wanting fast decompression. rockyou.txt is a wordlist which gets fed to a different program, either through stdout or through the file itself. My idea is that instead of storing gigabytes of wordlists similar to rockyou uncompressed, I could compress them and then pipe the decompressed result, like so: xzcat | program. This would be especially good on storage-limited media like flash drives.

    I can do this with .gz, .xz, and .tan files (I modified the source code to do this), but not so much with most others. In the comparison above, NanoZip is also very competitive, but the same problem appears.

    So it seems that for my purposes, it's currently hard to beat DEFLATE and LZMA2, depending on the desired decompression speed.
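    The same xzcat-style pipeline can be done in-process. A minimal sketch using Python's standard gzip module (lzma.open works the same way for .xz files; the file name below is made up for illustration):

```python
import gzip

def words(path):
    """Yield one word per line from a gzip-compressed wordlist,
    decompressing on the fly so the full list never touches disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")

# Feed the consumer a generator instead of a decompressed file:
#     for word in words("rockyou.txt.gz"):
#         try_password(word)        # hypothetical consumer
```

    This is equivalent to `xzcat file.xz | program` but keeps everything in one process, which can matter when the consumer can't read from stdin.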

  14. #14
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    If the wordlist is sorted, you could just suppress the common prefixes. There is a program that does this.
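    A minimal sketch of that prefix-suppression idea (front coding, the technique behind tools like word-list-compress; the in-memory encoding below is made up for illustration): each word in the sorted list is stored as the length of the prefix it shares with the previous word, plus the differing suffix.

```python
import os

def front_encode(words):
    """Encode a sorted wordlist as (shared_prefix_len, suffix) pairs."""
    prev, out = "", []
    for w in words:
        n = len(os.path.commonprefix([prev, w]))
        out.append((n, w[n:]))
        prev = w
    return out

def front_decode(pairs):
    """Rebuild the wordlist by reusing the previous word's prefix."""
    prev, out = "", []
    for n, suffix in pairs:
        w = prev[:n] + suffix
        out.append(w)
        prev = w
    return out
```

    On password lists, which are sorted and share long prefixes ("password", "password1", "password123"...), this alone removes a lot of redundancy, and decoding is a trivial sequential pass, so it composes well with a fast general-purpose codec on top.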

  15. #15
    Member
    Join Date
    Jul 2013
    Location
    Abu
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by nburns View Post
    If the wordlist is sorted, you could just suppress the common prefixes. There is a program that does this.
    What program?

  16. #16
    Member
    Join Date
    May 2013
    Location
    ARGENTINA
    Posts
    54
    Thanks
    62
    Thanked 13 Times in 10 Posts
    .................................................. .................................................. .........
    Last edited by GOZARCK; 5th November 2013 at 20:26.

  17. #17
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Sshingen View Post
    What program?
    I've seen the package in Ubuntu, and I don't have an Ubuntu machine handy, but this might be it, or equivalent to it: http://linux.die.net/man/1/word-list-compress

    Edit: here's another one that was linked to from that manpage: http://linux.die.net/man/1/prezip-bin

