
Thread: Fast Zlib compression

  1. #1
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post

    Fast Zlib compression

    Hi everybody,
    I'm a new user here. I'd like to offer my x86 assembly patch for zlib's compression function, which gives a 20-120% speedup with a slightly better compression ratio.
    Library, test results and brief description are available here:
    http://www.gildor.org/en/projects/zlib
    Last edited by gildor; 21st November 2012 at 10:09.

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    what is the license?

  3. #3
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    match.asm is faster:
    Code:
    H:\>timer minigzip.exe -9 20100202_dups.txt
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    
    Kernel Time  =     0.015 = 00:00:00.015 =   0%
    User Time    =     2.015 = 00:00:02.015 =  99%
    Process Time =     2.031 = 00:00:02.031 = 100%
    Global Time  =     2.031 = 00:00:02.031 = 100%
    
    H:\>timer minigzip2.exe -9 20100202_dups.txt
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    
    Kernel Time  =     0.046 = 00:00:00.046 =   4%
    User Time    =     0.937 = 00:00:00.937 =  95%
    Process Time =     0.984 = 00:00:00.984 = 100%
    Global Time  =     0.984 = 00:00:00.984 = 100%
    But it doesn't compress better in my test:
    Code:
    H:\>minigzip2.exe -8 20100202_dups.txt
    
    H:\>dir *.gz
     Volume in drive H has no label.
     Volume Serial Number is 20CD-37EF
    
     Directory of H:\
    
    15/03/2012  21:29         1,376,018 20100202_dups.txt.gz
                   1 File(s)      1,376,018 bytes
                   0 Dir(s)     665,772,032 bytes free
    
    H:\>minigzip.exe -8 20100202_dups.txt
    
    H:\>dir *.gz
     Volume in drive H has no label.
     Volume Serial Number is 20CD-37EF
    
     Directory of H:\
    
    15/03/2012  21:29         1,371,619 20100202_dups.txt.gz
                   1 File(s)      1,371,619 bytes
                   0 Dir(s)     665,513,984 bytes free
    Test data:
    http://roy.orz.hm/test/20100202_dups.txt.gz

    minigzip using zlib-1.2.5 with original match686.asm:
    http://roy.orz.hm/test/minigzip.exe

    minigzip using zlib-1.2.5 with fast_zlib.zip match.asm:
    http://roy.orz.hm/test/minigzip2.exe

    The -9 level is patched with encode's lazy-matching patch (2-byte lookahead):
    http://roy.orz.hm/test/deflate_lm2.patch

  4. #4
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post
    Quote (originally posted by Bulat Ziganshin):
    what is the license?
    It's free. If you need a "written license", any suggestions are welcome.

  5. #5
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post
    Quote (originally posted by roytam1):
    minigzip using zlib-1.2.5 with original match686.asm:
    http://roy.orz.hm/test/minigzip.exe

    minigzip using zlib-1.2.5 with fast_zlib.zip match.asm:
    http://roy.orz.hm/test/minigzip2.exe
    I've compared these executables (after un-UPX'ing): the files differ in 10 bytes in the executable header, so you did not compile minigzip with my patch.
    -9 level is patched using encode's lazy matching with 2 byte lookahead patch:
    http://roy.orz.hm/test/deflate_lm2.patch
    I had not seen that patch before, thanks for the information.

  6. #6
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Quote (originally posted by gildor):
    I've compared these executables (after un-UPX'ing): the files differ in 10 bytes in the executable header, so you did not compile minigzip with my patch.

    I had not seen that patch before, thanks for the information.
    I recompiled both again and confirmed that match.asm is used in minigzip2.exe; it produces the same result.
    The -8 level is untouched, so it should reflect the performance of the match finders:
    Code:
    H:\>timer minigzip2.exe -8 20100202_dups.txt
    Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
    
    Kernel Time  =     0.031 =    5%
    User Time    =     0.468 =   87%
    Process Time =     0.500 =   93%
    Global Time  =     0.532 =  100%
    
    H:\>ls.exe -l *.gz
    -rw-rw-rw-   1 user     group     1376018 Mar 16 15:39 20100202_dups.txt.gz
    
    H:\>timer minigzip.exe -8 20100202_dups.txt
    Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
    
    Kernel Time  =     0.015 =    1%
    User Time    =     0.921 =   98%
    Process Time =     0.937 =  100%
    Global Time  =     0.934 =  100%
    
    H:\>ls.exe -l *.gz
    -rw-rw-rw-   1 user     group     1371619 Mar 16 15:40 20100202_dups.txt.gz

  7. #7
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post
    Could you post minigzip recompiled with my library?
    Edit: also, if you compiled it with Visual C, please post the pdb file too so I can verify which longest_match function is compiled in.
    Last edited by gildor; 16th March 2012 at 10:48.

  8. #8
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Quote (originally posted by gildor):
    Could you post minigzip recompiled with my library?
    Same URL as above. I double-checked that it is using match.asm. Without match*.asm, linking fails with complaints about the missing longest_match and the like.

  9. #9
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Quote (originally posted by gildor):
    Could you post minigzip recompiled with my library?
    Edit: also, if you compiled it with Visual C, please post the pdb file too so I can verify which longest_match function is compiled in.
    ICC 9.1 doesn't generate a pdb for me (although I set /Zi and /DEBUG); the debug info seems to be embedded. .map files are generated, though.
    http://roy.orz.hm/test/minigzip.exe
    http://roy.orz.hm/test/minigzip.map

    http://roy.orz.hm/test/minigzip2.exe
    http://roy.orz.hm/test/minigzip2.map
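
One way to check which object file supplied longest_match is to search the linker's .map file for the symbol. The following is only a minimal sketch: the helper name find_symbol and the sample map lines (addresses, object names) are made up for illustration, and real .map layouts differ between linkers.

```shell
#!/bin/bash

# find_symbol(mapfile, symbol)
# Prints every line of a linker .map file that mentions the symbol (case-insensitive).
find_symbol()
{
    grep -i -- "$2" "$1"
}

# Demo on hypothetical map lines; the addresses and object names are invented.
tmpmap=$(mktemp)
cat > "$tmpmap" <<'EOF'
 0001:00003a10       _longest_match             00404a10 f   match.obj
 0001:00003c40       _deflate                   00404c40 f   deflate.obj
EOF
find_symbol "$tmpmap" longest_match
rm "$tmpmap"
```

Against the real files this would be run as find_symbol minigzip2.map longest_match; the object or library name on the matching line shows where the function came from.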

  10. #10
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Even when I use the supplied zlibwapi.dll, I get the same result:
    Code:
    E:\>cl -DZLIB_WINAPI minigzip.c zlibwapi.lib
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    minigzip.c
    Microsoft (R) Incremental Linker Version 10.00.40219.01
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    /out:minigzip.exe
    minigzip.obj
    zlibwapi.lib
    
    E:\>minigzip -8 20100202_dups.txt
    
    E:\>ls -l 2*.gz
    -rw-rw-rw-   1 user     group     1376018 Mar 16 16:42 20100202_dups.txt.gz

  11. #11
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post
    Sorry, I had compared the executables with Microsoft's "comp.exe" utility, which gave a completely wrong result. I've now compared them with a different tool and can see that minigzip2.exe really is compiled with my matcher.

    I've run some tests with your executables. My PC here is a Pentium Dual-Core E5700 @ 3.00 GHz running 32-bit Windows 7.

    Test script (I'm using GNU bash here; it has a builtin "time" command):
    Code:
    #!/bin/bash
    
    # check(file, minigzip.exe, suffix)
    function check()
    {
        echo "Checking: compressing $1-$3 with $2 ..."
        cp tests/$1 $1-$3            # copy file (because minigzip will replace it with compressed version)
        time -p ./$2 -9 $1-$3        # compress
        ls -l $1-$3.gz
        echo
    }
    
    # check2(file)
    function check2()
    {
        echo "Original file is"
        ls -l tests/$1
        echo
        check $1 minigzip.exe orig
        check $1 minigzip2.exe fast
    }
    
    check2 src.txt
    check2 20100202_dups.txt
    check2 UDKGame.exe
    Files for testing:
    • src.txt, 77 MB, the source code of UDKGame.exe combined into a single file
    • 20100202_dups.txt, 4.7 MB, your directory listing from one of the posts above
    • UDKGame.exe, 66 MB, the executable from the Unreal Development Kit
    (Sorry, I cannot provide these files; they were simply the first things I could grab at work for a quick test.)

    Results:
    Code:
    Original file is
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 77129743 Dec 30 02:28 tests/src.txt
    
    Checking: compressing src.txt-orig with minigzip.exe ...
    real 8.73
    user 0.00
    sys 0.01
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 16748104 Mar 16 12:39 src.txt-orig.gz
    
    Checking: compressing src.txt-fast with minigzip2.exe ...
    real 5.94
    user 0.00
    sys 0.01
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 16748014 Mar 16 12:39 src.txt-fast.gz
    
    Original file is
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 4736230 Mar 16 11:42 tests/20100202_dups.txt
    
    Checking: compressing 20100202_dups.txt-orig with minigzip.exe ...
    real 1.87
    user 0.00
    sys 0.00
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 1359266 Mar 16 12:40 20100202_dups.txt-orig.gz
    
    Checking: compressing 20100202_dups.txt-fast with minigzip2.exe ...
    real 0.90
    user 0.00
    sys 0.01
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 1359266 Mar 16 12:40 20100202_dups.txt-fast.gz
    
    Original file is
    -rwxr-xr-x 1 Konstantin Nosov ?????????????? 66696544 Mar  1 21:12 tests/UDKGame.exe
    
    Checking: compressing UDKGame.exe-orig with minigzip.exe ...
    real 11.44
    user 0.00
    sys 0.01
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 22704811 Mar 16 12:40 UDKGame.exe-orig.gz
    
    Checking: compressing UDKGame.exe-fast with minigzip2.exe ...
    real 7.33
    user 0.00
    sys 0.00
    -rw-r--r-- 1 Konstantin Nosov ?????????????? 22704766 Mar 16 12:40 UDKGame.exe-fast.gz
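
To put the numbers above side by side, here is a small sketch that computes the speedup and the size difference for each file from the "real" times and .gz sizes in the listing. The helper name speedup_percent and the definition of speedup (original time divided by patched time, minus one) are assumptions made for illustration; the thread does not say how the 20-120% figure was measured.

```shell
#!/bin/bash

# speedup_percent(orig_seconds, fast_seconds)
# Prints (orig / fast - 1) * 100, rounded to a whole percent.
speedup_percent()
{
    awk -v orig="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", (orig / fast - 1) * 100 }'
}

# Columns: file, real time (orig), real time (fast), .gz size (orig), .gz size (fast)
# All values are copied from the results listing above.
while read -r name t_orig t_fast s_orig s_fast; do
    echo "$name: $(speedup_percent "$t_orig" "$t_fast")% faster, $(( s_orig - s_fast )) bytes smaller"
done <<'EOF'
src.txt 8.73 5.94 16748104 16748014
20100202_dups.txt 1.87 0.90 1359266 1359266
UDKGame.exe 11.44 7.33 22704811 22704766
EOF
```

Under that definition the three files come out at 47%, 108% and 56% faster, with the patched build producing output that is 90, 0 and 45 bytes smaller respectively.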

  12. #12
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote (originally posted by gildor):
    Sorry, I had compared the executables with Microsoft's "comp.exe" utility, which gave a completely wrong result.
    If needed, another Microsoft utility - "fc" with "/b" switch - will do binary comparison of two files.

  13. #13
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post
    Quote (originally posted by Black_Fox):
    If needed, another Microsoft utility - "fc" with "/b" switch - will do binary comparison of two files.
    Thank you, I know. But I did not realize that MS's "comp.exe" performs a text comparison (its output looks like a normal binary comparison: offset, byte 1 and byte 2).
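
In a Unix-like shell such as the GNU bash used for the test script earlier in the thread, a byte-wise comparison can also be done with the standard cmp utility. This is a minimal sketch; the helper name count_diff_bytes and the demo files are made up for illustration and are not the executables discussed here.

```shell
#!/bin/bash

# count_diff_bytes(file1, file2)
# Prints how many byte positions differ between two files of equal length.
count_diff_bytes()
{
    # cmp -l prints one line per differing byte: offset, then both byte values in octal.
    # cmp exits with status 1 when the files differ, so count lines instead of checking status.
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Demo on two throwaway files (hypothetical data)
tmpdir=$(mktemp -d)
printf 'MZ-header-AAAA' > "$tmpdir/a.bin"
printf 'MZ-header-ABBA' > "$tmpdir/b.bin"
echo "differing bytes: $(count_diff_bytes "$tmpdir/a.bin" "$tmpdir/b.bin")"
rm -r "$tmpdir"
```

Two builds that really use different match finders would be expected to differ in far more than a handful of header bytes, so the count alone is already a useful sanity check.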

  14. #14
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    Gildor, nice to see you here

  15. #15
    Member gildor's Avatar
    Join Date
    Mar 2012
    Location
    Russia
    Posts
    7
    Thanks
    0
    Thanked 3 Times in 1 Post
    Hi,

    After a long time without any news, this library has received a major update.

    https://github.com/gildor2/fast_zlib

    The code was polished and all the problems found were fixed. The C version is now fully functional: it produces output identical to the Asm version and is only slightly slower.
    Both 32-bit and 64-bit platforms are supported (64-bit via the C version). A new test application was added. Test results are here: https://github.com/gildor2/fast_zlib/wiki
    For those who want to use this library but are too lazy to build it, a precompiled version is here: https://github.com/gildor2/fast_zlib/releases

