Thread: lightweight ROLZ compression utility

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts

    lightweight ROLZ compression utility

    https://github.com/richox/zlite

    I made this toy while working with petabytes of log data at Baidu Inc. On our log data (Apache logs with many URL-encoded Chinese words), it compresses much better than gzip, while compression time is almost halved.

    zlite also performs well on enwik, though not as well as on log data, but it performs badly on binary data.

    Simple enwik8 benchmark on my Thinkpad-x220, Windows 7:

    Code:
    Tool     Compressed Size     Encode     Decode
    zlite    33975840            3.283s     1.321s
    gzip     36518322            6.635s     1.268s
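
As a rough check of the figures above, the sizes and times can be converted into ratios. The sketch below copies the numbers from the table and assumes enwik8 is 100,000,000 bytes (its standard size); the helper names are made up for illustration.

```python
# Figures copied from the enwik8 table in this post.
# Assumption: enwik8 is 100,000,000 bytes (its standard size).
ENWIK8_BYTES = 100_000_000

RESULTS = {
    "zlite": {"size": 33_975_840, "encode_s": 3.283, "decode_s": 1.321},
    "gzip": {"size": 36_518_322, "encode_s": 6.635, "decode_s": 1.268},
}


def compression_ratio(compressed_size, original_size=ENWIK8_BYTES):
    """Compressed size as a fraction of the original (smaller is better)."""
    return compressed_size / original_size


def relative(a, b):
    """How large a is relative to b, e.g. 0.5 means a is half of b."""
    return a / b


for name, r in RESULTS.items():
    print(f"{name}: {compression_ratio(r['size']):.2%} of original, "
          f"encode {r['encode_s']}s, decode {r['decode_s']}s")

zl, gz = RESULTS["zlite"], RESULTS["gzip"]
print(f"zlite output is {relative(zl['size'], gz['size']):.1%} of gzip's size")
print(f"zlite encode time is {relative(zl['encode_s'], gz['encode_s']):.1%} of gzip's")
```

On these numbers zlite's output is about 7% smaller than gzip's and its encode time is roughly half, while its decode time is slightly slower.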

  2. The Following 4 Users Say Thank You to RichSelian For This Useful Post:

    Bulat Ziganshin (20th August 2013),GOZARCK (20th August 2013),Nania Francesco (20th August 2013),Sportman (21st August 2013)

  3. #2
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    forget my reply
    Last edited by just a worm; 20th August 2013 at 13:06.

  4. #3
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I wanted to alert you that the WCC 2013 test goes wrong on an .ISO file.

  5. #4
    Member
    Join Date
    May 2013
    Location
    ARGENTINA
    Posts
    54
    Thanks
    62
    Thanked 13 Times in 10 Posts
    Is there a compiled build available to test?

  6. #5
    Member
    Join Date
    May 2013
    Location
    ARGENTINA
    Posts
    54
    Thanks
    62
    Thanked 13 Times in 10 Posts
    .................................................. ............
    Last edited by GOZARCK; 5th November 2013 at 20:27.

  7. #6
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Can you test FreeArc 2 and Bsc 0, 4, 5, 6 on your Thinkpad-x220 with your log file and enwik8?

    Input RAM-disk:
    656,657,589 bytes, IIS log file

    Output RAM-disk (compressed size, compression time - decompression time, tool - setting):
    8,779,571 bytes, 52.505 sec. - 51.036 sec., ZCM - 7
    10,654,711 bytes, 46.561 sec. - 45.316 sec., ZCM - 0
    11,476,287 bytes, 10.101 sec. - 6.386 sec., FreeArc - 3
    13,892,288 bytes, 1.892 sec. - 1.851 sec., Bsc - 6
    14,435,852 bytes, 1.392 sec. - 1.736 sec., Bsc - 5
    14,490,689 bytes, 15.992 sec. - 10.537 sec., FreeArc - 5
    14,500,254 bytes, 13.606 sec. - 8.864 sec., FreeArc - 4
    14,716,422 bytes, 169.555 sec. - 168.172 sec., WinZpaq - 4
    14,958,200 bytes, 3.659 sec. - 1.614 sec., Bsc - 0
    15,716,912 bytes, 1.080 sec. - 1.601 sec., Bsc - 4
    16,395,752 bytes, 67.246 sec. - 10.595 sec., NanoZip - o
    16,803,552 bytes, 3.047 sec. - 4.497 sec., FreeArc - 2
    19,903,130 bytes, 1.053 sec. - 1.522 sec., Bsc - 3
    25,086,555 bytes, 67.833 sec. - 1.776 sec., 7-Zip - 5
    25,153,225 bytes, 2.719 sec. - 1.289 sec., zlite
    25,153,225 bytes, 2.892 sec. - 1.239 sec., zlite64
    27,279,862 bytes, 41.922 sec. - 0.968 sec., WinRAR - 5
    27,336,258 bytes, 33.497 sec. - 0.962 sec., WinRAR - 4
    27,550,717 bytes, 24.459 sec. - 0.958 sec., WinRAR - 3
    28,697,183 bytes, 16.034 sec. - 0.962 sec., WinRAR - 2
    29,169,224 bytes, 39.097 sec. - 1.948 sec., NanoZip - D
    29,444,951 bytes, 12.770 sec. - 1.825 sec., 7-Zip - 4
    29,914,986 bytes, 29.475 sec. - 1.967 sec., NanoZip - Dp
    30,164,022 bytes, 1.920 sec. - 0.523 sec., lzturbo - 32
    30,238,600 bytes, 29.962 sec. - 1.996 sec., NanoZip - DP
    30,649,206 bytes, 1.655 sec. - 1.303 sec., NanoZip - F
    30,758,520 bytes, 11.015 sec. - 1.874 sec., 7-Zip - 3
    31,445,500 bytes, 791.419 sec. - 0.338 sec., lzturbo - 29
    32,610,215 bytes, 0.840 sec. - 0.532 sec., lzturbo - 31
    33,341,412 bytes, 9.598 sec. - 2.004 sec., 7-Zip - 2
    33,809,356 bytes, 18.852 sec. - 5.849 sec., WinZpaq - 2
    35,202,109 bytes, 21.908 sec. - 14.860 sec., WinZpaq - 3
    35,615,429 bytes, 10.700 sec. - 5.739 sec., WinZpaq - 1
    36,720,511 bytes, 9.067 sec. - 2.252 sec., 7-Zip - 1
    37,045,006 bytes, 2.180 sec. - 1.364 sec., FreeArc - 1
    37,277,217 bytes, 0.674 sec. - 0.529 sec., lzturbo - 30
    37,357,361 bytes, 4.263 sec. - 1.132 sec., WinRAR - 1
    37,559,992 bytes, 8.301 sec. - 2.043 sec., NanoZip - dP
    38,088,820 bytes, 6.423 sec. - 2.048 sec., NanoZip - d
    38,100,212 bytes, 7.613 sec. - 2.059 sec., NanoZip - dp
    38,847,607 bytes, 68.693 sec. - x.xxx sec., eXdupe - 3
    39,037,543 bytes, 1.667 sec. - 0.331 sec., lzturbo - 22
    42,242,008 bytes, 0.678 sec. - 0.328 sec., lzturbo - 21
    42,731,407 bytes, 1.500 sec. - 1.205 sec., NanoZip - f
    45,880,242 bytes, 613.189 sec. - 0.302 sec., lzturbo - 19
    49,380,202 bytes, 0.479 sec. - 0.295 sec., lzturbo - 20
    51,955,489 bytes, 1.873 sec. - 0.306 sec., lzturbo - 12
    53,444,842 bytes, 0.846 sec. - 0.310 sec., lzturbo - 11
    57,957,550 bytes, 0.514 sec. - 0.262 sec., lzturbo - 10
    58,735,496 bytes, 0.473 sec. - 0.193 sec., LZ4 - 0
    60,887,860 bytes, 1.445 sec. - 0.676 sec., Qpress - 2
    61,034,299 bytes, 7.113 sec. - x.xxx sec., eXdupe - 2
    61,811,329 bytes, 5.253 sec. - x.xxx sec., eXdupe - 1
    62,824,542 bytes, 5.297 sec. - 0.558 sec., Qpress - 3
    64,437,514 bytes, 0.856 sec. - 0.666 sec., Qpress - 1

    All 1 thread.
    Last edited by Sportman; 21st August 2013 at 12:00. Reason: Updated/added zlite/zlite64 with Bulat compiles

  8. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Very fast. I updated LTCB. http://mattmahoney.net/dc/text.html

    Edit: on the Silesia corpus, decompression of mozilla and sao did not verify. Output was the right size but contents differed.
    Last edited by Matt Mahoney; 21st August 2013 at 05:15.
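
A failure like the one reported above (right size, wrong contents) is only caught by comparing the restored bytes, not the lengths. The following is a minimal sketch of such a round-trip check. It uses Python's zlib as a stand-in codec because zlite has no Python binding that I know of, and `corrupting_decompress` is a hypothetical buggy decoder invented for the demonstration.

```python
import hashlib
import zlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def roundtrip_ok(original: bytes, compress, decompress) -> bool:
    """Return True only if decompress(compress(x)) reproduces x exactly."""
    restored = decompress(compress(original))
    # Compare digests of the contents, not only the lengths.
    return len(restored) == len(original) and sha256(restored) == sha256(original)


def corrupting_decompress(blob: bytes) -> bytes:
    """Hypothetical buggy decoder: output has the right size, one byte is wrong."""
    out = bytearray(zlib.decompress(blob))
    if out:
        out[0] ^= 0xFF
    return bytes(out)


sample = b"stand-in data for a corpus file " * 1000
print(roundtrip_ok(sample, zlib.compress, zlib.decompress))        # True
print(roundtrip_ok(sample, zlib.compress, corrupting_decompress))  # False
```

For a command-line tool the same idea applies: hash the original file and the decompressed file and compare the two digests.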

  9. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    GCC 4.7.2 -O3 -funroll-all-loops -mtune=generic -msse2 compilations
    Attached Files

  10. The Following 2 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    GOZARCK (21st August 2013),Sportman (21st August 2013)

  11. #9
    Member
    Join Date
    May 2013
    Location
    ARGENTINA
    Posts
    54
    Thanks
    62
    Thanked 13 Times in 10 Posts
    .................................................. ......
    Last edited by GOZARCK; 5th November 2013 at 20:27.

  12. #10
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    My own compiles with Bulat's settings show an improvement in compression speed but not in decompression speed. I haven't really experimented with compiler settings.


    Code:
    mangix@Mangix ~/devstuff/zlite
    $ time ./zlite.exe d < rockyou.txt.zlite > /dev/null

    real    0m3.403s
    user    0m3.291s
    sys     0m0.078s

    mangix@Mangix ~/devstuff/zlite
    $ time gunzip -c < rockyou.txt.gz > /dev/null

    real    0m1.971s
    user    0m1.872s
    sys     0m0.093s
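
A single `time` run can be noisy, so a comparison like the one above is usually repeated and the fastest run kept. The sketch below shows that pattern in-process. It is only an illustration: zlib and lzma from the Python standard library stand in for the two decoders (they are not zlite or gunzip), the input is synthetic, and `best_of` is a made-up helper name.

```python
import lzma
import time
import zlib


def best_of(fn, repeats=5):
    """Run fn several times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Synthetic, highly repetitive input; real measurements should use real files.
data = b"GET /index.html?q=%E4%BD%A0%E5%A5%BD HTTP/1.1\n" * 20000

# zlib and lzma are stand-ins here, not zlite or gunzip themselves.
blobs = {
    "zlib": (zlib.compress(data), zlib.decompress),
    "lzma": (lzma.compress(data), lzma.decompress),
}

for name, (blob, decompress) in blobs.items():
    seconds, restored = best_of(lambda: decompress(blob))
    assert restored == data  # a timing only counts if the output is correct
    print(f"{name}: {len(blob)} bytes, best decode time {seconds:.4f}s")
```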

  13. #11
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts

    Bugs fixed. Please check out the newer version!

  14. The Following 2 Users Say Thank You to RichSelian For This Useful Post:

    GOZARCK (21st August 2013),Sportman (21st August 2013)

  15. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    gcc 4.7.2
    gcc -O3 -funroll-all-loops -mtune=generic -s -msse2 -o zlite.exe -m32 zlite.c -static
    gcc -O3 -funroll-all-loops -mtune=generic -s -o zlite64.exe -m64 zlite.c -static

    btw, you can install gcc-mingw by running http://citylan.dl.sourceforge.net/pr...ds-install.exe

    the version i've used to compile is x64-4.7.2-release-posix-sjlj-rev11 but you can try the newer 4.8.1
    Attached Files
    Last edited by Bulat Ziganshin; 21st August 2013 at 21:02.

  16. The Following 2 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    GOZARCK (22nd August 2013),Sportman (21st August 2013)

  17. #13
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Input RAM-disk:
    100,000,000 bytes, enwik8

    Output RAM-disk (compressed size, compression time - decompression time, tool - setting):
    19,970,859 bytes, 32.809 sec. - 34.086 sec., ZCM - 7
    20,509,197 bytes, 6.937 sec. - 4.161 sec., NanoZip - o
    21,982,479 bytes, 40.677 sec. - 27.597 sec., WinZpaq - 4
    22,431,159 bytes, 18.785 sec. - 16.378 sec., FreeArc - 5
    22,439,870 bytes, 2.848 sec. - 1.616 sec., Bsc - 0
    22,921,637 bytes, 17.284 sec. - 15.143 sec., FreeArc - 4
    23,118,264 bytes, 1.579 sec. - 2.906 sec., Bsc - 6
    23,585,000 bytes, 1.383 sec. - 2.651 sec., Bsc - 5
    23,977,259 bytes, 10.332 sec. - 10.604 sec., FreeArc - 3
    24,809,062 bytes, 1.047 sec. - 2.236 sec., Bsc - 4
    25,740,361 bytes, 27.453 sec. - 28.848 sec., ZCM - 0
    25,899,690 bytes, 66.321 sec. - 1.175 sec., 7-Zip - 5
    26,831,928 bytes, 3.400 sec. - 6.914 sec., FreeArc - 2
    27,515,715 bytes, 6.948 sec. - 1.092 sec., NanoZip - DP
    27,735,684 bytes, 6.567 sec. - 1.076 sec., NanoZip - D
    27,853,726 bytes, 1.080 sec. - 1.721 sec., Bsc - 3
    27,866,058 bytes, 6.426 sec. - 1.091 sec., NanoZip - Dp
    29,225,075 bytes, 38.142 sec. - 0.596 sec., WinRAR - 5
    29,329,336 bytes, 30.837 sec. - 0.585 sec., WinRAR - 4
    29,671,168 bytes, 19.491 sec. - 0.592 sec., WinRAR - 3
    30,509,207 bytes, 4.830 sec. - 0.952 sec., NanoZip - dP
    30,564,697 bytes, 9.635 sec. - 0.584 sec., WinRAR - 2
    30,650,787 bytes, 4.269 sec. - 0.957 sec., NanoZip - dp
    30,800,309 bytes, 16.320 sec. - 9.105 sec., WinZpaq - 3
    30,940,496 bytes, 3.337 sec. - 0.950 sec., NanoZip - d
    31,266,016 bytes, 2.097 sec. - 0.386 sec., lzturbo - 32
    32,203,391 bytes, 13.985 sec. - 1.363 sec., 7-Zip - 4
    32,461,829 bytes, 7.658 sec. - x.xxx sec., eXdupe - 3
    32,651,694 bytes, 11.657 sec. - 1.376 sec., 7-Zip - 3
    32,919,788 bytes, 86.783 sec. - 0.170 sec., lzturbo - 29
    33,204,163 bytes, 8.880 sec. - 1.669 sec., WinZpaq - 2
    33,416,867 bytes, 8.029 sec. - 1.415 sec., 7-Zip - 2
    33,975,840 bytes, 2.349 sec. - 0.961 sec., zlite
    33,975,840 bytes, 2.511 sec. - 0.937 sec., zlite64
    34,707,520 bytes, 5.917 sec. - 1.533 sec., 7-Zip - 1
    35,632,652 bytes, 0.834 sec. - 0.338 sec., lzturbo - 31
    35,756,965 bytes, 4.872 sec. - 1.576 sec., WinZpaq - 1
    39,136,924 bytes, 0.557 sec. - 0.343 sec., lzturbo - 30
    39,487,914 bytes, 1.096 sec. - 0.815 sec., FreeArc - 1
    39,541,490 bytes, 1.884 sec. - 0.151 sec., lzturbo - 22
    40,234,135 bytes, 1.777 sec. - 0.685 sec., WinRAR - 1
    40,608,757 bytes, 0.937 sec. - 1.262 sec., NanoZip - F
    41,929,879 bytes, 79.371 sec. - 0.082 sec., lzturbo - 19
    42,628,330 bytes, 0.690 sec. - 0.131 sec., lzturbo - 21
    43,799,314 bytes, 2.735 sec. - x.xxx sec., eXdupe - 2
    44,421,925 bytes, 1.950 sec. - 0.082 sec., lzturbo - 12
    46,381,757 bytes, 0.675 sec. - 0.808 sec., NanoZip - f
    47,296,540 bytes, 1.857 sec. - 0.238 sec., Qpress - 3
    47,619,485 bytes, 0.734 sec. - 0.088 sec., lzturbo - 11
    48,741,972 bytes, 0.675 sec. - 0.325 sec., Qpress - 2
    49,812,562 bytes, 0.396 sec. - 0.130 sec., lzturbo - 20
    53,236,008 bytes, 0.402 sec. - 0.084 sec., lzturbo - 10
    53,415,225 bytes, 0.350 sec. - 0.065 sec., LZ4 - 0
    54,147,930 bytes, 1.642 sec. - x.xxx sec., eXdupe - 1
    54,294,428 bytes, 0.400 sec. - 0.317 sec., Qpress - 1

    All 1 thread.
    Last edited by Sportman; 21st August 2013 at 23:47.
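
Listings in this format are easy to turn into ratios and throughput figures. The sketch below parses a few lines copied from the enwik8 list above. It assumes the first time on each line is compression and the second is decompression, and that `x.xxx` marks a time that was not measured; the function names are made up for illustration.

```python
import re

# Assumed line layout: "<size> bytes, <t1> sec. - <t2> sec., <tool>[ - <setting>]"
# Assumption: t1 is compression time, t2 is decompression time,
# and "x.xxx" marks a time that was not measured.
LINE_RE = re.compile(
    r"^\s*([\d,]+) bytes, ([\d.]+) sec\. - ([\d.]+|x\.xxx) sec\., (.+?)\s*$"
)

INPUT_BYTES = 100_000_000  # enwik8, as stated in the post


def parse_line(line):
    """Parse one result line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    size, enc, dec, tool = m.groups()
    return {
        "tool": tool,
        "size": int(size.replace(",", "")),
        "encode_s": float(enc),
        "decode_s": None if dec == "x.xxx" else float(dec),
    }


def encode_mb_per_s(row, input_bytes=INPUT_BYTES):
    """Compression throughput in MB/s (1 MB = 10**6 bytes)."""
    return input_bytes / 1e6 / row["encode_s"]


sample = """\
33,975,840 bytes, 2.349 sec. - 0.961 sec., zlite
32,461,829 bytes, 7.658 sec. - x.xxx sec., eXdupe - 3
53,415,225 bytes, 0.350 sec. - 0.065 sec., LZ4 - 0
"""
rows = [r for r in map(parse_line, sample.splitlines()) if r]
for row in sorted(rows, key=lambda r: r["size"]):
    print(f"{row['tool']:<12} {row['size'] / INPUT_BYTES:6.2%} "
          f"{encode_mb_per_s(row):7.1f} MB/s")
```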

  18. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Added zlite to Silesia benchmark. http://mattmahoney.net/dc/silesia.html

  19. #15
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    OK!
    WCC 2013 benchmark passed!

  20. #16
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    another new compressor here: https://github.com/richox/zling
    based on zlite, but ROLZ is changed to order-1. zling provides better compression and faster decompression than zlite.
    Last edited by RichSelian; 1st November 2013 at 13:20.
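
The post does not spell out what the switch to order-1 involves. In ROLZ schemes generally, a match is coded as an index into a short list of earlier positions that followed the same context, rather than as a raw distance, and "order-1" usually means that the context is the single preceding byte. The toy sketch below illustrates that idea only. It is not zling's or zlite's code or file format, it emits Python tuples instead of an entropy-coded stream, and `MIN_MATCH`, `MAX_MATCH` and `SLOTS` are made-up values.

```python
from collections import defaultdict, deque

MIN_MATCH = 3    # shortest match worth coding (toy value)
MAX_MATCH = 255  # longest match length (toy value)
SLOTS = 16       # positions remembered per context (toy value)


def _context(buf, pos):
    """Order-1 context: the single byte before pos (0 at the very start)."""
    return buf[pos - 1] if pos > 0 else 0


def _new_table():
    # One short most-recent-first list of positions per context byte.
    return defaultdict(lambda: deque(maxlen=SLOTS))


def rolz_encode(data: bytes):
    table, tokens, pos, n = _new_table(), [], 0, len(data)
    while pos < n:
        candidates = table[_context(data, pos)]
        best_len, best_idx = 0, -1
        for idx, cand in enumerate(candidates):
            length = 0
            while (length < MAX_MATCH and pos + length < n
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_idx = length, idx
        if best_len >= MIN_MATCH:
            # The match is named by its slot index, not by a distance.
            tokens.append(("match", best_idx, best_len))
            step = best_len
        else:
            tokens.append(("lit", data[pos]))
            step = 1
        candidates.appendleft(pos)  # register only token start positions
        pos += step
    return tokens


def rolz_decode(tokens) -> bytes:
    table, out = _new_table(), bytearray()
    for token in tokens:
        pos = len(out)
        candidates = table[_context(out, pos)]
        if token[0] == "match":
            _, idx, length = token
            src = candidates[idx]
            for i in range(length):  # byte by byte, so overlaps work
                out.append(out[src + i])
        else:
            out.append(token[1])
        candidates.appendleft(pos)  # mirror the encoder's table update
    return bytes(out)


text = b"abracadabra abracadabra abracadabra"
tokens = rolz_encode(text)
print(len(tokens), "tokens for", len(text), "bytes")
print(rolz_decode(tokens) == text)
```

Because the decoder rebuilds the same per-context lists from its own output, a slot index is enough to locate the match source, which is what keeps the coded "offset" small.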

  21. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  22. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    RichSelian (20th November 2013)

  23. #18
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts

  24. #19
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    zling does not work when compiled with gcc 4.8.1.
    Could you put a working build compiled by you online?

  25. #20
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    zling crashed during decompression (window pops up: this program is not working). Compiled with MinGW gcc 4.8.0 -O2 in 32 bit Vista.
    Code:
    C:\tmp\zling-20131120>timer32 .\zling.exe e \res\enwik8 enwik8.zling
    zling:
       light-weight lossless data compression utility
       by Zhang Li <zhangli10 at baidu.com>
    
     16.78 MB =>   3.78 MB 22.55%, 2.497 sec
     33.55 MB =>   7.55 MB 22.50%, 5.052 sec
     50.33 MB =>  11.34 MB 22.52%, 8.405 sec
     67.11 MB =>  15.12 MB 22.53%, 11.504 sec
     83.89 MB =>  18.89 MB 22.52%, 14.454 sec
    100.00 MB =>  22.53 MB 22.53%, 16.831 sec
    
    encode: 100000000 => 22530992, time=16.834 sec
            time_rolz:  15.157 sec
            time_polar: 0.870 sec
    
    Commit   =     24848 KB  =     25 MB
    Work Set =     25880 KB  =     26 MB
    
    Kernel Time  =     0.249 =    1%
    User Time    =    14.227 =   84%
    Process Time =    14.476 =   85%
    Global Time  =    16.851 =  100%
    
    C:\tmp\zling-20131120>.\zling.exe d enwik8.zling enwik8
    zling:
       light-weight lossless data compression utility
       by Zhang Li <zhangli10 at baidu.com>
