Page 1 of 2
Results 1 to 30 of 32

Thread: ZPAQ self extracting archives

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

    ZPAQ self extracting archives

    I wrote a stub for making zpaq self extracting archives and added it to the zpaq/unzpaq distribution. http://mattmahoney.net/dc/zpaq103b.zip

    To create an archive, copy the stub (zpaqsfx.exe) and use zpaq to append to it ("a" command), for example:

    Code:
      copy zpaqsfx.exe calgary.exe
      zpaq a calgary.exe calgary\*
    The stub adds about 15 KB. To extract, run the program with the "x" argument, as if extracting with zpaq or unzpaq. If run with no arguments, the extractor lists its contents and gives instructions to extract. I figured this is more "polite" than just filling up your directory with files without any warning. The "x" command works like unzpaq.

    Code:
      calgary  (shows contents)
      calgary x  (extracts all 14 files, doesn't clobber)
      calgary x file1 file2  (extracts 2 files and renames them, clobbers)
    You can store filename paths ("zpaq ra"), but this isn't recommended because the self-extractor won't create directories. There is also no check for absolute paths and such because, well, you are running a .exe and I'm assuming you trust the source.

    (Maybe I'll write a separate program to extract from them, just in case you don't).

    The stub is a modified version of unzpaq 1.03 that looks for the archive in argv[0] instead of argv[2]. If that doesn't work, it will try adding ".exe" to the filename, but it won't search your PATH. To find the start of the archive, it searches for a 16 byte random string (actually a 128 bit hash of it). When you compile zpaqsfx.cpp to zpaqsfx.exe, you have to append this string, which is in the 16 byte file zpaqsfx.tag. The program doesn't need to know its own size. It should work with different compilers, compiler options, and .exe packers. It will probably work in Linux (I'd be surprised if it didn't), but I have only tested it in Windows (Vista) with upx. It was created with g++ 4.4.0:

    Code:
      g++ -DNDEBUG -O2 -s -fomit-frame-pointer -march=pentiumpro zpaqsfx.cpp
      upx a.exe
      copy/b a.exe+zpaqsfx.tag zpaqsfx.exe
    Enjoy.
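    The tag search described above (scan the executable for a 16-byte marker instead of relying on a known stub size) can be sketched as follows. This is an illustration only, not the actual zpaqsfx code: `findTag` and the in-memory buffer are stand-ins, and the real stub streams the file rather than loading it whole.

```cpp
#include <cstring>
#include <vector>

// Return the offset just past the first occurrence of a 16-byte tag
// in buf, or -1 if the tag is not present. The archive data would
// begin immediately after the tag.
long findTag(const std::vector<unsigned char>& buf,
             const unsigned char tag[16]) {
    if (buf.size() < 16) return -1;
    for (size_t i = 0; i + 16 <= buf.size(); ++i)
        if (std::memcmp(&buf[i], tag, 16) == 0)
            return (long)(i + 16);  // archive starts right after the tag
    return -1;
}
```

    Because the tag is effectively random, the chance of it occurring by accident in the compiled stub is negligible, which is why the program never needs to know its own size.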

  2. #2
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello Dr. Matt,

    Well, I think it would be much better if it had a GUI, and I think using pure WINAPI would result in a much smaller executable (uncompressed) plus the unzpaq binary size. Would you kindly allow others to give it a try (using the same unzpaq source)? Secondly, I would like to embed the compressed files and the file list in the executable's resources rather than appending them to the end of the file.

    Some quick test:

    zpaq a test.exe test\*

    test\!test.seq 3834740 -> 45161 -> 60537

    Memory utilization:
    0 icm 5: 24/2048 (1.17%)
    1 isse 13 0: 90/524288 (0.02%)
    2 isse 17 1: 316/8388608 (0.00%)
    3 isse 18 2: 1190/16777216 (0.01%)
    4 isse 18 3: 4656/16777216 (0.03%)
    5 isse 19 4: 18450/33554432 (0.05%)
    6 match 22 24: buffer=3834741/16777216 index=15519/4194304 (0.37%)
    7 mix 16 0 7 24 255: 503/458752 (0.11%)
    Used 15.77 seconds

    ZPAQSFX 1.03 self extracting archive. Contents:

    !test.seq 3834740 <- 60537

    To extract all files: test x
    To extract and rename: test x new_names...
    111.425 MB memory required to extract.

    regards

  3. #3
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote Originally Posted by Scientist View Post
    Would you kindly allow others to give it a try (using the same unzpaq source)?
    Matt released it under GPL, so there's no problem as long as you stay open-sourced.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I'd be interested in techniques to make the stub smaller. You can save about 2KB taking out the SHA1 code and a little more taking out some of the error handling. g++ compiles "int main(){return 0;}" to 3584 bytes after upx using -s (strip symbols) and usual speed optimizations. So there is some overhead in the runtime. There are better exe packers like upack but they trigger false alarms in virus detectors. A GUI might be nice but it wouldn't help with size.

  5. #5
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Post

    I'd be interested in techniques to make the stub smaller. You can save about 2KB taking out the SHA1 code and a little more taking out some of the error handling. g++ compiles "int main(){return 0;}" to 3584 bytes after upx using -s (strip symbols) and usual speed optimizations. So there is some overhead in the runtime.
    Will surely give a try to all possibilities. As far as I looked at the code, inline int ZPAQL::execute() seems to be the heart of the algorithm (I might be wrong), and it uses the SHA1 class. When you say to discard this class, what are the exact replacements? Otherwise the code won't compile.

    There are better exe packers like upack but they trigger false alarms in virus detectors.
    Exactly - keeping this in mind, I planned to use executable resource rather than just append.

    Secondly, would you also be interested if the entire ZPAQ were translated to C, so that there are better chances of reducing the final binary size as well as some speed improvements? Do you think porting ZPAQ to C would be a good step? I will start working on it if you think it can produce better results.

    A GUI might be nice but it wouldn't help with size.
    I think it might; it's worth a try.

    regards

  6. #6
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    As long as you don't use exceptions and virtual functions try:

    -fno-exceptions -fno-rtti

  7. #7
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts
    TRUE
    Yes, I tried it once and it reduces the size of g++ / mingw-g++ produced executable files. I am waiting for people to promote my idea of porting these best compressors / decompressors to C.

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by toffer View Post
    As long as you don't use exceptions and virtual functions try:

    -fno-exceptions -fno-rtti
    Thanks. That works. It reduces the stub size to 14864.

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by Scientist View Post
    Will surely give a try to all possibilities. As far as I looked at the code, inline int ZPAQL::execute() seems to be the heart of the algorithm (I might be wrong), and it uses the SHA1 class. When you say to discard this class, what are the exact replacements? Otherwise the code won't compile.
    SHA1 is only used in a few places, like the OUT instruction, to compute the checksum of the output. It is possible to remove this and the code in decompress() that verifies the checksum, and just skip over those 20 bytes. The only purpose of that code is to print a warning if the checksum doesn't verify.

    Also, converting to C should be possible and maybe not too hard. It uses only C library functions and headers. But I saw no reason to do it, because we have good, portable C++ compilers and I find C++ easier to write code in, mainly because I can organize the code into classes and declare variables where I first use them. I find the Array template more convenient than using pointer casts in C for each array. But you are welcome to do it.
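    A minimal bounds-checked array along these lines might look like the sketch below. This is an illustration of the idea (zero-initialized storage, a resize that discards contents, checked indexing), not the actual zpaq Array template, whose details differ.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Minimal dynamic array for POD element types (storage comes from
// calloc, so it is zero-initialized). resize() discards old contents.
template <class T>
class Array {
    T* p;
    size_t n;
public:
    explicit Array(size_t size = 0): p(0), n(0) { resize(size); }
    ~Array() { free(p); }
    void resize(size_t size) {
        free(p);
        p = size ? (T*)calloc(size, sizeof(T)) : 0;
        if (size && !p) { fprintf(stderr, "out of memory\n"); exit(1); }
        n = size;
    }
    size_t size() const { return n; }
    T& operator[](size_t i) { assert(i < n); return p[i]; }
private:
    Array(const Array&);             // no copying
    Array& operator=(const Array&);
};
```

    Note it aborts on allocation failure instead of throwing, so it stays compatible with -fno-exceptions as discussed above.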

  10. #10
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts
    SHA1 is only used in a few places, like the OUT instruction, to compute the checksum of the output. It is possible to remove this and the code in decompress() that verifies the checksum, and just skip over those 20 bytes. The only purpose of that code is to print a warning if the checksum doesn't verify.
    On it right now.

    Also, converting to C should be possible and maybe not too hard. It uses only C library functions and headers. But I saw no reason to do it, because we have good, portable C++ compilers and I find C++ easier to write code in, mainly because I can organize the code into classes and declare variables where I first use them. I find the Array template more convenient than using pointer casts in C for each array. But you are welcome to do it.
    Sensible suggestion, thanks. BTW, the template finally instantiates 3 classes, i.e. for the types U8, U16, and U32.



    Experimenting more with upx after finishing up with g++ and mingw32-g++

    upx a.exe --all-methods --all-filters

    NEW STUB SIZE
    without tag : 13824
    with 16b tag: 13840

    Attached is the stub executable.

    regards
    Attached Files
    Last edited by Scientist; 15th September 2009 at 01:24.

  11. #11
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Post

    Also: Choosing -O with g++ results in final:

    NEW STUB SIZE: 11792

    Attached is the stub executable.

    regards
    Attached Files

  12. #12
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts
    After removing SHA1 class and its usage

    NEW STUB SIZE: 11280

    Attached is the stub executable.

    regards
    Attached Files
    Last edited by Scientist; 15th September 2009 at 06:05.

  13. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Thanks for testing. Here are some stub sizes (without the 16 byte tag) and times to extract the Calgary corpus with default compression using MINGW g++ 3.4.5 and g++ 4.4.0 with various optimizations.

    Code:
    g++ zpaqsfx.cpp -O2 -s -fomit-frame-pointer -march=pentiumpro -DNDEBUG -fno-exceptions -fno-rtti -o zpaqsfx.exe
    upx --all-methods --all-filters zpaqsfx.exe
    
    g++ 3.4.5      12288  31.6s
    g++ 3.4.5 -O   11776  18.2s
    g++ 3.4.5 -O2  13824  15.9s
    g++ 3.4.5 -O3  16384  15.7s
    g++ 3.4.5 -Os  11776  17.4s
    g++ 4.4.0      13824  28.3s
    g++ 4.4.0 -O   13312  13.0s
    g++ 4.4.0 -O2  14848  11.3s (what I'm using)
    g++ 4.4.0 -O3  17408  12.1s
    g++ 4.4.0 -Os  12800  13.4s
    Test machine: Gateway M-7301U laptop with 2.0 GHz dual core Pentium T3200 (1MB L2 cache), 3 GB RAM, Vista, 32 bit, MINGW g++.

  14. #14
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts
    g++ 4.4.0 with -O2 seems the best option; of course the most important thing is the time required to decompress, not a few kilobytes of stub. But I have a doubt (I am not sure if I should ask): were all the stubs tested at a cold start?
    Last edited by Scientist; 17th September 2009 at 00:43.

  15. #15
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Some results with other compilers:

    Code:
      Borland 5.5.1       bcc32 -6 -O -d  44032  31.4s
      Digital Mars 8.42n  dm -6 -o        46080  25.2s

    These were the best results I could get in both size and speed. I was surprised how much g++ outperformed them.

    -6 means P6 or higher processor (as high as they go). -O in Borland means optimize jumps; neither -O1 (optimize for size) nor -O2 (for speed) helps. -d removes duplicate strings. In Mars, -o means optimize (helps both size and speed).

    Edit: Forgot -DNDEBUG. The difference was not as big as I thought.

    Code:
      bcc32 -6 -O -d -DNDEBUG  39424  16.9s
      dm -6 -o -DNDEBUG        42946  18.4s
    Last edited by Matt Mahoney; 17th September 2009 at 01:11.

  16. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Each stub was compiled, packed, appended, and run once, if that's what you mean by cold start. I used process times (Timer 3.01).

  17. #17
    Member
    Join Date
    Sep 2009
    Location
    APITRC
    Posts
    27
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I was surprised how much g++ outperformed them.
    GCC remains king, at least for this self-extractor.
    Dr. Matt, try Intel C++; they claim their compiler generates the best-performing programs. The last Intel C++ I tried was version 10.xx, I think, and it was a trial version.

    As for DMC, at least I do not use it; it has outdated libraries and is rarely updated these days. The WIN32 API libraries in this compiler haven't been updated in ages; I am not sure about everything else.

    Thanks for update.

  18. #18
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I guess I could get the free Intel C++ for Linux, but I am too lazy to install Linux on my PC. I understand Intel is faster, but it cripples optimization on AMD and I would need to hack the CPUID check. (I think Shelwien figured out how to do that).

    (Also see my last edit. Forgot -DNDEBUG)

  19. #19
    Member
    Join Date
    Jan 2010
    Location
    at home
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question SFX + run file

    Hi, I was wondering how I could use this ZPAQ SFX approach with an included run command for after extraction is complete. That would be a dream come true.

  20. #20
    Member
    Join Date
    Jan 2010
    Location
    at home
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Here is a little script I wrote to simplify the process of making the self-extracting ZPAQ archive with the use of the following included files:
    • ZPAQ 1.3 (CRC32: b06eafc6)
    • ZPAQSFX 1.3 stub build posted by Scientist [post #11] (CRC32: e54dc01e)




    README and full sourcecode are included in the archive.

    I hope this is useful for somebody.
    Attached Files

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    You can also make self extracting archives by appending to a copy of zpaqsfx.exe. Download zpaqsfx v1.06 from http://mattmahoney.net/dc/
    Example:
    Code:
      copy zpaqsfx.exe calgary.exe
      zpaq oamax.cfg calgary.exe calgary\*
    "o" means fast compress (requires C++ compiler installed), "a" means append, and "max.cfg" is compression model (must be in current directory). You can append multiple times. Then when you run "calgary.exe" it will list its contents and give instructions to extract. For example "calgary.exe x" will extract contents to current directory.

    Unfortunately extraction is not optimized, so it is slower than extracting with "zpaq ox calgary.exe" and same speed as "zpaq x calgary.exe". Maybe I will add the capability to produce optimized SFX to the next version of zpaq.

    EDIT: I guess run command would be useful too.

  22. #22
    Member
    Join Date
    Jan 2010
    Location
    at home
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Unfortunately extraction is not optimized, so it is slower than extracting with "zpaq ox calgary.exe" and same speed as "zpaq x calgary.exe". Maybe I will add the capability to produce optimized SFX to the next version of zpaq.

    EDIT: I guess run command would be useful too.
    Hi Matt, I was just wondering if any steps have been taken toward this goal...

  23. #23
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Actually, no. I have been wasting my time writing a book on data compression instead. But it is still something I want to do.

    Also, I am considering a parallel extractor. Each block is compressed independently, so they could be extracted in separate threads.
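    The one-thread-per-independent-block idea can be sketched as below. Here decodeBlock is a placeholder (it just copies its input); a real implementation would run a full ZPAQ block decoder there, which is safe precisely because no model state is shared between blocks.

```cpp
#include <string>
#include <thread>
#include <vector>

// Stand-in for a real ZPAQ block decoder.
static std::string decodeBlock(const std::string& block) {
    return block;  // placeholder: "decompress" by copying
}

// Decode every block concurrently, one thread per block. Each thread
// writes only its own output slot, so no locking is needed.
std::vector<std::string> extractAll(const std::vector<std::string>& blocks) {
    std::vector<std::string> out(blocks.size());
    std::vector<std::thread> threads;
    for (size_t i = 0; i < blocks.size(); ++i)
        threads.emplace_back([&out, &blocks, i] {
            out[i] = decodeBlock(blocks[i]);
        });
    for (size_t i = 0; i < threads.size(); ++i)
        threads[i].join();
    return out;
}
```

    In practice one would cap the thread count at the number of cores, since each decoder also carries its own model memory.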

  24. #24
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Also, I am considering a parallel extractor. Each block is compressed independently, so they could be extracted in separate threads.
    Does this mean that a model can't learn on a block that it doesn't encode?

    I was less asking a question I could look up in the spec, and more asking "why not?"
    Last edited by willvarfar; 19th March 2010 at 11:08.

  25. #25
    Member elektronika's Avatar
    Join Date
    Mar 2010
    Location
    Indonesia
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Memory Usage

    Dear Matt,

    How much memory does ZPAQ take? From what I read in the readme.txt, there is a line like this:

    For
    example:

    zpaq cmax.cfg calgary.zpaq calgary\*

    will compress the Calgary corpus (14 files) as follows
    in 45 seconds on a 2 GHz Pentium T3200. The file names are
    stored in the archive as given on the command line.

    278.474 MB memory required.
    So it uses huge memory, is that true? Can I use zpaq for an embedded application (where only a few MB of RAM exist)?

  26. #26
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Memory usage depends on choice of model. Section 7 of http://mattmahoney.net/dc/zpaq1.pdf shows how to calculate memory usage.

  27. #27
    Member elektronika's Avatar
    Join Date
    Mar 2010
    Location
    Indonesia
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Memory usage depends on choice of model. Section 7 of http://mattmahoney.net/dc/zpaq1.pdf shows how to calculate memory usage.
    Is it about the decompressor? How about the compressor?

  28. #28
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The model uses the same memory for compression and decompression. It is possible for an external preprocessor to use different memory from the postprocessor. If you list an archive's contents with "zpaq l", it will show you how much memory is needed to decompress each block.

    The config files mid.cfg and max.cfg take an argument to control memory usage; use a negative value to decrease it. Some results for the Calgary corpus compressed with "zpaq ocmid.cfg,N calgary\*" (14 files) for N=0,-2,-4,-6,-8:

    Code:
     0 699,474 111.425 MB
    -2 700,445  29.636 MB
    -4 708,711   9.189 MB
    -6 746,654   4.077 MB
    -8 841,521   2.799 MB
    Compression time is about 8 seconds on a 2 GHz T3200.

  29. #29
    Member elektronika's Avatar
    Join Date
    Mar 2010
    Location
    Indonesia
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    The model uses the same memory for compression and decompression. It is possible for an external preprocessor to use different memory from the postprocessor. If you list an archive's contents with "zpaq l", it will show you how much memory is needed to decompress each block.

    The config files mid.cfg and max.cfg take an argument to control memory usage; use a negative value to decrease it. Some results for the Calgary corpus compressed with "zpaq ocmid.cfg,N calgary\*" (14 files) for N=0,-2,-4,-6,-8:

    Code:
     0 699,474 111.425 MB
    -2 700,445  29.636 MB
    -4 708,711   9.189 MB
    -6 746,654   4.077 MB
    -8 841,521   2.799 MB
    Compression time is about 8 seconds on a 2 GHz T3200.
    Thank you, Matt. The min.cfg even uses only around 4 MB of memory.

  30. #30
    Member toi007's Avatar
    Join Date
    Jun 2011
    Location
    Lisbon
    Posts
    35
    Thanks
    0
    Thanked 0 Times in 0 Posts
    A ZPAQ self-extracting archive, that is an excellent idea, because zpaq does rule nowadays.
    I'm going to test it!!
    Thanks a lot, sir Matt


