
Thread: NEW - plzip - a massively parallel OPEN-SOURCE-compressor based on LZMA

  1. #1
    Member joerg
    Join Date
    May 2008
    Location
    Germany
    Posts
    408
    Thanks
    36
    Thanked 60 Times in 37 Posts

    plzip 1.4 - a massively parallel OPEN-SOURCE-compressor based on LZMA/lzlib

    Does anyone know this new program, plzip?

    I cannot find a Win32 binary for it, though.

    ---
    http://www.nongnu.org/lzip/plzip.html

    Plzip uses the lzip file format;
    the files produced by plzip are fully compatible with lzip-1.4 or newer.

    http://www.nongnu.org/lzip/lzip.html

    Plzip is intended for faster compression/decompression of big files on multiprocessor machines, which makes it specially well suited for distribution of big software files and large scale data archiving. On files big enough, plzip can use hundreds of processors.
    ---

    It seems interesting.

    Could someone here please build a Win32 binary?


    best regards

    Joerg

    CHANGELOG:

    2015-07-09 Antonio Diaz Diaz <antonio@gnu.org>

    * Version 1.4 released.
    * Option '-0' now uses the fast encoder of lzlib 1.7.

    2015-01-22 Antonio Diaz Diaz <antonio@gnu.org>

    * Version 1.3 released.
    * dec_stream.cc: Do not use output packets or muxer when testing.
    * Make '-dvvv' and '-tvvv' show dictionary size like lzip.
    * lzip.h: Added missing 'const' to the declaration of 'compress'.
    * Added chapters 'Memory requirements' and 'Minimum file sizes'
    to manual.
    * Makefile.in: Added new targets 'install*-compress'.

    2014-08-29 Antonio Diaz Diaz <antonio@gnu.org>

    * Version 1.2 released.
    * main.cc (close_and_set_permissions): Behave like 'cp -p'.
    * dec_stdout.cc dec_stream.cc: Make 'slot_av' a vector to limit
    the number of packets produced by each worker individually.
    * plzip.texinfo: Renamed to plzip.texi.
    * plzip.texi: Documented the approximate amount of memory required.
    * License changed to GPL version 2 or later.
    Attached Files
    Last edited by joerg; 28th October 2015 at 21:17.

  3. #2
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    515
    Thanks
    182
    Thanked 163 Times in 71 Posts
    Originally posted by joerg:
        can someone here please build a win32 binary ?
    There's a Win32 binary available for version 1.8 (link).

    I also tried to compile 1.10-rc2 using G++ 3.4.5 with this command line:

    Code:
    g++ -O2 -Os -s -DPROGVERSION=\"1.10-rc2\" decoder.cc encoder.cc lziprecover.cc arg_parser.cc main.cc
    It almost works, but some errors appear (e.g. for S_IRGRP, fchmod, fchown). These are POSIX file-permission and ownership interfaces, so I guess this can be fixed either by changing some of the code or by using a newer or different compiler.
    http://schnaader.info
    Damn kids. They're all alike.

  4. #3
    Member joerg
    @schnaader: thank you very much for your post

    (http://mirrors.zerg.biz/nongnu/lzip/....8-win32-1.zip)

    but that seems to be lzip.

    I am looking for a binary of plzip ("parallel lzip"),

    maybe one built with "pthreadGC2.dll"?

    best regards

    Joerg

  5. #4
    Programmer schnaader
    Originally posted by joerg:
        but it seems to be lzip ..

        i am searching for a binary of plzip = "parallel-lzip"
    Ah, I see, my fault. There are no binaries for plzip, just one for lzip. I'll try building plzip later.

  6. #5
    Programmer schnaader
    Sorry for the late reply, but I just checked my e-mails of the last month and saw that I missed one from Michael Ortmann who sent a win32 binary of plzip 0.5 (based on lzip 0.9). It includes the binary, pthreadGC2.dll and two source code diffs.

    So, thanks Michael! Joerg, I hope this helps
    Attached Files

  8. #6
    Member joerg
    There is a new version, 0.8 (18-JAN-2012):

    http://download.savannah.gnu.org/rel...zip-0.8.tar.gz

    A Windows binary does not exist yet.

    @schnaader:

    Thanks for publishing the Windows binary of version 0.5.
    Would it be possible to update the Windows binary to version 0.8?

    best regards
    ---
    Plzip is a massively parallel (multi-threaded), lossless data compressor
    based on the lzlib compression library, with very safe integrity
    checking and a user interface similar to the one of bzip2, gzip or lzip.
    Plzip uses the lzip file format; the files produced by plzip are fully
    compatible with lzip-1.4 or newer, and can be rescued with lziprecover.

    Plzip is intended for faster compression/decompression of big files
    on multiprocessor machines, which makes it specially well suited for
    distribution of big software files and large scale data archiving. On
    files big enough, plzip can use hundreds of processors.

    ---
    changelog plzip - Antonio Diaz Diaz <ant_diaz@teleline.es>
    ---
    2012-01-17 - Version 0.8

    * main.cc: Added new option '-F, --recompress'.
    * decompress.cc (decompress): Show compression ratio.
    * main.cc (close_and_set_permissions): Inability to change output
    file attributes has been downgraded from error to warning.
    * Small change in '--help' output and man page.
    * Changed quote characters in messages as advised by GNU Standards.
    * main.cc: Set stdin/stdout in binary mode on OS2.
    * compress.cc: Reduce memory use of compressed packets.
    * decompress.cc: Use Boyer-Moore algorithm to search for headers.

    2010-12-03 - Version 0.7

    * Match length limits set by options -1 to -9 have been changed
    to match those of lzip 1.11.
    * decompress.cc: A limit has been set on the number of packets
    produced by workers to limit the amount of memory used.
    * main.cc (open_instream): Do not show the message
    " and '--stdout' was not specified" for directories, etc.
    * main.cc: Fixed warning about fchown return value being ignored.
    * testsuite: 'test1' renamed to 'test.txt'. Added new tests.

    2010-03-20 - Version 0.6

    * Small portability fixes.
    * Added chapter 'Program Design' and description of option
    '--threads' to manual.
    * Debug stats have been fixed.
    Last edited by joerg; 27th January 2012 at 12:32. Reason: changelog added

  9. #7
    Programmer schnaader
    Originally posted by joerg:
        @schnaader:

        thanks for publishing the windows-binary of version 0.5
        would it be possible to do an update of the windows-binary to version 0.8 ?
    I tried applying the diffs from Michael (see my post above) to version 0.8 and compiling it together with the newest lzlib (1.3). The compiler errors go away, but I get a ton of linker errors and I don't know how to fix them. Perhaps someone else can have a look at it and try to compile it.

  10. #8
    Member joerg
    @schnaader:
    Thank you very much for trying to build a new binary.

    best regards

  11. #9
    Member joerg
    The new version 1.0 of plzip is now available at

    http://download.savannah.gnu.org/rel...zip-1.0.tar.lz

    but for now there is no Windows binary.

  12. #10
    Member joerg
    A Windows binary of version 1.1 can be downloaded from

    http://download.savannah.gnu.org/rel...lzip-1.1-w.zip
    Last edited by joerg; 28th October 2015 at 21:21.

  13. #11
    Member joerg
    there is a new release: plzip 1.5 (parallel lzip)
    ---
    released source code: http://download.savannah.gnu.org/rel...zip-1.5.tar.gz
    windows binary for testing: http://download.savannah.gnu.org/rel...c2.w32-w64.zip
    ---
    manual for plzip: http://www.nongnu.org/lzip/manual/plzip_manual.html
    ---
    what's new: 2016-05-14 Antonio Diaz Diaz <antonio@gnu.org>

    * Version 1.5 released.
    * main.cc: Added new option '-a, --trailing-error'.
    * main.cc (main): Delete '--output' file if infd is a terminal.
    * main.cc (main): Don't use stdin more than once.
    * lzip.texi: Added chapters 'Trailing data' and 'Examples'.
    * configure: Avoid warning on some shells when testing for g++.
    * Makefile.in: Detect the existence of install-info.
    * testsuite/check.sh: A POSIX shell is required to run the tests.
    * testsuite/check.sh: Don't check error messages.
    ---
    First run (best compression):

    plzip -9 -k ..\dta\backup.dat

    produces the file ..\dta\backup.dat.lz

    backup.dat has 64,442,763,264 bytes
    backup.dat.lz has 6,400,767,782 bytes (about 9.9% of the original)

    backup-7-old.7z has 6,410,398,144 bytes (for reference: the archive file produced by 7zip 15.x)

    plzip 1.5 ran for 270 minutes on a 4-core AMD processor


    @Matt Mahoney: could you please test this new program in your benchmark?

    best regards

  14. #12
    Expert Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts

  16. #13
    Member joerg
    @matt_mahoney: thank you very much for your wonderful test!
    -
    In the Silesia test, the new plzip 1.5 slightly beats 7zip in the size of the compressed archive (48335267 vs. 48792760 in total),
    and plzip compresses and decompresses faster ...

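    ---
      Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
    --------- ----- ----- ---- ----- ---- ----- ---- ----- ---- ----- ----- ---- -------------------
     48335267  2828 13327 2746  1441 2424  2843 1313  3727 4413  8361  4475  432 plzip 1.5 -9
     48792760  2830 13366 2749  1738 2426  2849 1317  3764 4423  8384  4486  454 7zip -mx=9
    ---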

    I think that if you test with "7zip -mx=7 -mmt1", 7zip may produce an even smaller file at the cost of more time,
    but I think plzip deals better with many cores ...

    best regards

  17. #14
    Expert Matt Mahoney
    Not a big difference. I didn't time it since the benchmark is size only.
    Code:
      Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
    --------- ----- ----- ---- ----- ---- ----- ---- ----- ---- ----- ----- ---- -------------------
     48792760  2830 13366 2749  1738 2426  2849 1317  3764 4423  8384  4486  454 7zip -mx=9
     48796550  2830 13371 2750  1739 2427  2850 1317  3764 4416  8386  4488  453 7zip -mx=7 -mmt1
     48797240  2830 13373 2750  1739 2427  2850 1317  3764 4416  8386  4488  453 7zip -mx=7

  18. #15
    Member przemoc
    Join Date
    Aug 2011
    Location
    Poland
    Posts
    44
    Thanks
    3
    Thanked 23 Times in 13 Posts
    I've built recent plzip 1.7 for Windows upon request by joerg:
    http://binaries.przemoc.net/#plzip

  20. #16
    Member joerg
    @przemoc: thank you very much for building a windows binary

    in a first test it works well

    best regards

  21. #17
    Member elit
    Join Date
    Mar 2018
    Location
    sun
    Posts
    28
    Thanks
    16
    Thanked 15 Times in 7 Posts
    This mod uses all cores when decompressing from/to <stdio>, similar to 4x4. The original plzip has big limitations here. You can read more about it here:
    http://fileforums.com/showthread.php?t=101534
    The exe is 64-bit, plzip is 1.7 and lzlib is 1.9. Also, as I mentioned there, I would love to implement pass-through of incompressible chunks like 4x4 does, but for now I lack knowledge of compression coding. A quick code example (or library implementation) of what "order 0" fast compression may look like would be very handy. I intend to make plzip a full 64-bit replacement for 4x4:lzma in FA, if I can, that is.

    EDIT: new attached file further below
    Last edited by elit; 22nd March 2018 at 21:46.

  23. #18
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,483
    Thanks
    719
    Thanked 653 Times in 349 Posts
    I don't understand what your problem is. I measure order-0 entropy to detect incompressible data. This works well enough for practical purposes, although you can go further using the quasi-o1 entropy tester from zpaq or a fast LZ engine (e.g. from lz4) limited to even positions.

    Each block in 4x4 output is preceded by one extra byte: 1 for compressed data, 0 for a stored block.
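
    To illustrate the order-0 check described in this post, here is a minimal C++ sketch. It is not the 4x4 or FreeArc code: the function names and the 7.9 bits-per-byte threshold are assumptions chosen for illustration. It computes the Shannon entropy of the byte histogram; a value close to 8 bits per byte suggests the block is probably not worth compressing.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0).
// "Order-0" means every byte is modelled on its own, without any context.
double order0_entropy(const std::uint8_t* data, std::size_t size) {
    if (size == 0) return 0.0;
    std::size_t counts[256] = {0};
    for (std::size_t i = 0; i < size; ++i) ++counts[data[i]];
    double bits = 0.0;
    for (std::size_t n : counts) {
        if (n == 0) continue;  // unused byte values contribute nothing
        const double p = static_cast<double>(n) / static_cast<double>(size);
        bits -= p * std::log2(p);
    }
    return bits;
}

// Hypothetical policy: treat the block as incompressible above the threshold.
// The default of 7.9 bits per byte is an assumed value, not taken from 4x4.
bool looks_incompressible(const std::uint8_t* data, std::size_t size,
                          double threshold_bits = 7.9) {
    return order0_entropy(data, size) > threshold_bits;
}
```

    A caller would run looks_incompressible() on each block and, following the 4x4 convention quoted above, write flag byte 0 plus the raw block when it returns true. The threshold would need tuning on real data.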

  25. #19
    Member elit
    Mr. Ziganshin, first of all, it's a great honor to finally speak with you personally here on the forum. Greetings to all the other honorable members as well.

    Now to the topic: my current problem is a lack of general understanding of the terms (names) and implementations used in compression, even though I can code in C/C++.
    Specifically in this case, I think I have an idea of what "entropy" may be, but no idea what "order 0-n" means, for example.

    For that reason, assuming the code itself would be a few lines at most, I thought a simple example in C would help me understand better and faster. That said, your info about lz4 having something called an "o1-entropy tester" is already *very* helpful, because I am slightly familiar with lz4 already (I modded its compressor to pack certain game files correctly, to a 1:1 CRC match with the original in the data pack, including moving the block's CRC from the end to the beginning). So I will definitely look at it again, try to find that entropy tester somewhere in the code and, if I can figure it out, implement it in plzip. It also seems from your words that o1 in lz4 may be even better than o0. So thank you for the tip.

    Also, when you say "each block in 4x4..", do you mean the full real block? For example, would 4x4:b32m mean you actually test the full 32m block to decide whether to compress it or not? I thought you would only test small chunks of, say, 64k *within* the full block and decide whether or not to compress each small piece individually.

  26. #20
    Programmer Bulat Ziganshin
    So, you need to learn how to detect incompressible data. First, I suggest you read some book on compression; in particular, http://mattmahoney.net/dc/dce.html is online & free.

    Second, the usual way to detect incompressible data is to check whether it is compressible by some quick-and-dirty algorithm. Order-0 compression, in particular, is compression of individual chars with a Huffman/arithmetic coder. DataSmoke was my attempt in this direction, which mostly failed.

    You can use the order-0 entropy checker from this project, however: https://github.com/Bulat-Ziganshin/D.../smoke.cpp#L76

    A quasi-order-1 entropy checker is in the zpaq code; I will look into extracting it for you. It may be more reliable than the order-0 approach.

    Finally, one can use an LZ model for the same purpose. And since LZ4 is the fastest LZ compressor, you can build such an algorithm based on the LZ4 sources. But I suggest you postpone that until you have tried the existing algorithms. There is nothing related to order-1 or datatype detection in LZ4.

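    UPDATE: this is the zpaq quasi-o1 code; you can find it in the zpaq sources by searching for the constant 314159265:
    Code:
    int c=0;   // current byte
    int c1=0;  // previous byte (the order-1 context)
    unsigned char o1[256]={0};  // o1[x] = byte that followed x the last time
    unsigned hits=0;            // number of correct predictions
    while (true) {
      c=in.get();
      if (c==EOF) break;
      if (c==o1[c1]) ++hits;    // same successor as last time
      o1[c1]=c;
      c1=c;
    }

    At the end of the loop, hits ~= insize/256 if the data is incompressible, because a random byte matches the predicted one with probability 1/256.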

  28. #21
    Member elit
    Somehow "hits ~= insize/256" did not match my results. The numbers I got were out of place and did not reflect the true entropy. So I did this:
    (100*(packet->size - hits)) / packet->size

    This seems to correctly reflect the gain of each packet (aka block) as a percentage between 0 and 100%, where lower is better. But just to be sure: is this the proper math for this, or am I missing something?
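
    As a sanity check of the formula above, here is a minimal C++ sketch that applies the quasi-order-1 predictor from the previous post to a memory buffer and reports the share of mispredicted bytes. The names o1_hits and miss_percent are hypothetical, it reads from a buffer instead of in.get(), and it uses floating point where the expression above looks like integer math. If hits ~= size/256 holds for incompressible data, the result should land near 100*(1 - 1/256), about 99.6%, while highly predictable data should score close to 0%.

```cpp
#include <cstddef>
#include <cstdint>

// Count how often the byte that followed the previous byte last time
// follows it again (the quasi-order-1 predictor quoted earlier in the thread).
std::size_t o1_hits(const std::uint8_t* data, std::size_t size) {
    std::uint8_t o1[256] = {0};  // o1[x] = byte last seen after context x
    std::uint8_t c1 = 0;         // previous byte (context)
    std::size_t hits = 0;
    for (std::size_t i = 0; i < size; ++i) {
        const std::uint8_t c = data[i];
        if (c == o1[c1]) ++hits;  // prediction was correct
        o1[c1] = c;
        c1 = c;
    }
    return hits;
}

// Percentage of bytes the predictor got wrong: 0 means fully predictable.
double miss_percent(const std::uint8_t* data, std::size_t size) {
    if (size == 0) return 0.0;
    const std::size_t misses = size - o1_hits(data, size);
    return 100.0 * static_cast<double>(misses) / static_cast<double>(size);
}
```

    Read this way, the expression is a miss rate rather than an entropy in bits, which would explain why lower values mean better compressibility.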

    [Image: entropy.png]

  29. #22
    Member elit
    OK, so finally, after weeks of researching the plzip code, I have a stable entropy check in place (with raw data skip/copy above a certain percentage limit):
    [Image: entropy.png]

    It seems stable now, but I need to be 100% sure before releasing it.
    I would also like to add more options for the user, at least LZMA's mc option, plus the entropy limit and a way to choose decompression memory limits/slots directly to control memory usage.

    The thing I want to ask is: since I changed the header and this is no longer compatible with plzip, is it appropriate to give it another name? Basically, to fork it? I don't think "plzip_mod" is right anymore, but I don't want to offend the original authors and I have no experience with publicly releasing work yet. The license and open-source availability would remain the same, but any reference to the plzip/lzip name would change. Of course I would refer to the original project, and the entropy check is from zpaq, which would be mentioned in some sort of readme at least.

    But I need to be sure whether this is OK practice or whether there is something I should know beforehand, as I want to respect the original contributors. So any insights on this are welcome. When released, special thanks would be given at least to the original authors of plzip, to Matt Mahoney for zpaq's entropy code, and to Bulat for all the help he provided. But still, let me know; I don't want to be perceived as someone stealing others' code and taking credit, as I saw something similar happen here before. Thanks everyone!

    PS: if you are curious about the naming, I was thinking about "xnlz" or "xnlzma", inspired by 4x4:lzma = NxN:lzma = (xnlzma or xnlz).

  30. #23
    Member elit
    OK guys, this is the one with the finished bad-entropy check/compression-skipping feature, currently set at 97.9%. Anything above that is copied. Raw data is still CRC checked. I consider this part stable now, so let me know if you find anything. When I release a new one, it should be a newly named compressor/fork; I will ask the original author about it.

    EDIT: the re-uploaded file now includes source code, to be in sync with the open-source license
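
    The skip rule described in this post can be sketched roughly as follows. This is a hypothetical illustration and not the mod's actual header layout, which is not shown in the thread: it borrows the one-byte flag that 4x4 uses according to the earlier post (1 for compressed data, 0 for a stored block), and pack_block, its parameters and the compress callback are assumed names. The CRC that the post mentions is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t kStored = 0;      // block copied verbatim
constexpr std::uint8_t kCompressed = 1;  // block went through the compressor

// Prefix the block with a flag byte and either copy or compress it.
// miss_pct is the entropy-check result for this block; limit_pct is the
// user limit (97.9 in the post). Both are assumed to be percentages.
Bytes pack_block(const Bytes& raw, double miss_pct, double limit_pct,
                 const std::function<Bytes(const Bytes&)>& compress) {
    Bytes out;
    if (miss_pct > limit_pct) {
        out.push_back(kStored);  // looks incompressible: store as-is
        out.insert(out.end(), raw.begin(), raw.end());
    } else {
        const Bytes packed = compress(raw);
        out.push_back(kCompressed);
        out.insert(out.end(), packed.begin(), packed.end());
    }
    return out;
}
```

    The decoder would read the flag byte first and either copy the payload or hand it to the decompressor, which is why a change like this breaks compatibility with the plain lzip format.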

    Attached Files
    Last edited by elit; 22nd March 2018 at 21:45.

  31. #24
    Member elit
    Bulat, if I may ask: how does FreeArc handle the <stdio> stream flow with external compressors when the user cancels the process? Many external archivers, including plzip, keep hanging around as processes and never quit properly unless you manually kill them through Task Manager. I would like to make it seamless, so that if the user cancels compression (through the FreeArc GUI), plzip also quits properly.

    Right now I am not sure whether this happens when the external compressor tries to read data from stdin or when it tries to write back to FA through stdout, but I suspect the latter. Is there any sort of communication, special byte, end-of-stream mark or pattern that FA sends to the external program if it detects that the user canceled the operation?

    EDIT: It may have been a Cygwin issue; compiled under MinGW, it has always quit so far (in the few tests I did)
    Last edited by elit; 25th March 2018 at 21:33.
