
Thread: pzpaq Parallel ZPAQ compressor released

  1. #1
    Matt Mahoney (Expert)

    http://mattmahoney.net/dc/zpaq.html

    I posted an initial, experimental release of pzpaq for Windows and Linux. It compresses files in parallel on multiple threads. If necessary it splits files into independent blocks, losing some compression and using more memory in exchange for better speed. It decompresses in parallel if given more than one file or a file with more than one block. It has options to select the block size, but by default it divides the input evenly among threads. The default is 2 threads, but you can select any number. Some calgary.tar benchmarks on a dual core 2 GHz T3200 under Linux (real times):

    Code:
        Program        Size     Compress Decompress  Memory (MB)
                         (bytes)   (sec)     (sec)    Comp  Decomp
        ------------ ---------  -------- ----------  ----  ------
        Uncompressed 3,152,896
        compress     1,325,806     0.2      0.05      <1     <1
        gzip -9      1,022,810     0.5      0.05      <1     <1
        bzip2          860,097     0.6      0.3        7      4
        pbzip2         857,292     0.4      0.3       25     16
        p7zip          824,573     1.5      0.1      192     18
        pzpaq -1 -t1   806,972     2.4      2.5       38     38
        pzpaq -1 -t2   818,007     1.5      1.6       76     76
        pzpaq -2 -t1   699,191     7.4      7.5      112    112
        pzpaq -2 -t2   708,526     4.6      4.8      224    224
        pzpaq -3 -t1   644,203    19.6     19.8      247    247
        pzpaq -3 -t2   653,100    13.2     13.5      494    494
    The interface is like compress, gzip, bzip2, pbzip2, etc.

    pzpaq * (compress; replaces each file with file.zpaq)
    pzpaq -d *.zpaq (decompress)

    With no filename arguments it reads from stdin to stdout. You can also create and extract archives:

    pzpaq -c * >archive.zpaq (compress)
    pzpaq -x archive.zpaq (extract to original directory)
    pzpaq -e archive.zpaq (extract to current directory)

    This will create a solid archive with 2 blocks (with the default -t2 threads), usually compressing better than separate single-file archives. You can also change the block size (or use no blocks) and the number of threads. Files and archives are compatible with zpipe, zpaq, unzpaq, etc. File compression produces a single-file archive with 1 or more blocks and saves the filename. You can concatenate .zpaq files as usual.
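
    For example, with ordinary shell commands (concatenation is a property of the block format, not a special pzpaq feature):

    cat a.zpaq b.zpaq >ab.zpaq (Linux)
    copy /b a.zpaq+b.zpaq ab.zpaq (Windows)
    pzpaq -x ab.zpaq (extracts the contents of both)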

    The Windows version won't work on files over 2 GB. The Linux version should, but I have not tested it yet. There are probably other bugs. Expect updates.

    I included a Windows exe and a dll from pthreads-win32 that you need to run it. To compile, see the instructions in the source. You will need libzpaq and, for Windows, other files from pthreads-win32.

    Don't run it with no arguments. Type "pzpaq -h" for quick help.

  2. #2
    Member
    Nice improvement up to 3 threads; at -3, results become worse above 2 threads.

    Same config as:
    http://encode.ru/threads/586-bsc-new...mpressor/page5

    Input file html/text 807,821,312 bytes

    zpaq c1 5:19.437 93,310,722 bytes

    pzpaq -1 -k -t1 05:19.281 93,310,738 bytes
    pzpaq -1 -k -t2 02:50.266 93,539,196 bytes
    pzpaq -1 -k -t3 01:56.297 93,797,309 bytes
    pzpaq -1 -k -t4 01:28.187 94,042,938 bytes


    zpaq c2 16:11.188 33,172,689 bytes

    pzpaq -2 -k -t1 16:00.750 33,172,705 bytes
    pzpaq -2 -k -t2 08:16.906 33,519,799 bytes
    pzpaq -2 -k -t3 05:34.125 33,878,441 bytes
    pzpaq -2 -k -t4 04:15.157 34,221,391 bytes


    zpaq c3 41:10.910 28,750,140 bytes

    pzpaq -3 -k -t1 41:21.296 28,750,156 bytes
    pzpaq -3 -k -t2 20:51.063 29,227,543 bytes
    pzpaq -3 -k -t3 27:36.203 29,700,782 bytes
    pzpaq -3 -k -t4 21:00.047 30,135,759 bytes

  3. #3
    Matt Mahoney (Expert)
    Thanks for testing. pzpaq limits memory by default to -m500 = 500 MB, which allows only 2 threads to run even if you specify -t4, because each thread at -3 uses 247 MB and 500/247 rounds down to 2. If you use -m1000 it will allow all 4 threads to run. I wasn't seeing linear speedup on an i7 M620; it is dual core with hyperthreading to allow 4 parallel threads. I think in the next version I will try to detect the number of cores and available memory.

  4. #4
    Member
    Much better now:

    pzpaq -1 -k -t1 -m1000 05:19.797 93,310,738 bytes
    pzpaq -1 -k -t2 -m1000 02:50.188 93,539,196 bytes
    pzpaq -1 -k -t3 -m1000 01:55.343 93,797,309 bytes
    pzpaq -1 -k -t4 -m1000 01:29.437 94,042,938 bytes

    pzpaq -2 -k -t1 -m1000 16:03.172 33,172,705 bytes
    pzpaq -2 -k -t2 -m1000 08:21.328 33,519,799 bytes
    pzpaq -2 -k -t3 -m1000 05:35.750 33,878,441 bytes
    pzpaq -2 -k -t4 -m1000 04:13.500 34,221,391 bytes

    pzpaq -3 -k -t1 -m1000 41:00.141 28,750,156 bytes
    pzpaq -3 -k -t2 -m1000 20:49.468 29,227,543 bytes
    pzpaq -3 -k -t3 -m1000 14:06.266 29,700,782 bytes
    pzpaq -3 -k -t4 -m1000 10:40.062 30,135,759 bytes

    Yes, hyper-threading is not always helpful, especially when running a single CPU-intensive multi-core application; I mostly switch it off in the BIOS.

    I'm happy you finally found a way to make PAQ compression multi-core, because it's the best replacement for ZIP in the future, when CPU power increases another 40-200 times. For now it hurts compressed size a little, but it speeds up compression so much that more CPU-intensive compression models can be applied and still be quicker, with better compression results.

  5. #5
    Matt Mahoney (Expert)
    I added some test results for enwik8/9. http://mattmahoney.net/dc/text.html#1659

    I listed pzpaq as a separate program from zpaq that just happens to compress to the same format, much as zip and kzip are different programs that share a format.

  6. #6
    Member
    I did the same test on a slower system but with 8 cores:

    pzpaq -3 -k -t1 -m1000 48:28.203 28,750,167
    pzpaq -3 -k -t2 -m1000 24:42.031 29,227,554
    pzpaq -3 -k -t3 -m1000 16:53.312 29,700,793
    pzpaq -3 -k -t4 -m1000 12:42.672 30,135,770

    pzpaq -3 -k -t5 -m1000 19:39.031 30,553,061
    pzpaq -3 -k -t6 -m1000 16:31.766 30,944,111
    pzpaq -3 -k -t7 -m1000 14:22.563 31,365,208
    pzpaq -3 -k -t8 -m1000 12:42.282 31,700,995

    pzpaq -3 -k -t5 -m2000 10:16.922 30,553,061
    pzpaq -3 -k -t6 -m2000 08:46.718 30,944,111
    pzpaq -3 -k -t7 -m2000 07:27.203 31,365,208
    pzpaq -3 -k -t8 -m2000 06:39.313 31,700,995

  7. #7
    Matt Mahoney (Expert)
    Nice, that's almost linear speedup.

    I added a few more results for enwik9 to http://mattmahoney.net/dc/text.html#1659 after an overnight run. There is not much improvement in speed over -t2 on either machine (hyperthreading doesn't help much). Also, using a block size more than a few times larger than the model's memory requirement doesn't help compression much.

  8. #8
    Matt Mahoney (Expert)
    pzpaq v0.02 should now handle large files over 2 GB in Windows (as well as Linux). http://mattmahoney.net/dc/zpaq.html#pzpaq

  9. #9
    Matt Mahoney (Expert)
    pzpaq v0.03 now speeds up decompression of archives compressed at non-default levels by other programs (like zpaq) by translating the block headers to C++ and recompiling itself (like zpaq). This is done automatically by a compile-time option, so there is no change to the command line options. To enable this option you need a g++ compiler installed (it does not work with others), so it is turned off in the pre-compiled Windows pzpaq.exe (you won't see any difference from v0.02). To enable it, recompile the source with -DOPT="..." where the string is a command to recompile itself; this is explained in the pzpaq.cpp source comments. There is no change for archives compressed with the built-in levels -1, -2, or -3, since the optimization code is already built in.

    This change should make pzpaq the fastest ZPAQ decompressor since, unlike zpaq, it also uses multiple cores if the input has multiple blocks.

    There is no corresponding acceleration for compression because all of the built-in compression levels are already optimized and pzpaq doesn't take config files.

  10. #10
    Matt Mahoney (Expert)
    pzpaq v0.04 now handles threads natively in Windows, so you no longer need pthreads-win32. This simplifies installation. The Windows version will now also compile with either g++ or Visual C++, but I recommend g++ if you want to use wildcards. Both support binary-mode stdin/stdout, however.

    The download includes pzpaq.exe compiled with g++ without optimized decompression of nonstandard compression levels. If you want that, you need to have (preferably) g++ installed and recompile according to the directions in the source code. Either way, you no longer need any .dll files. But if you really want to use pthreads-win32 you can compile with -DPTHREAD. The only reasons you might want to do that are if you find any bugs or you want to use more than 64 threads in Windows. Windows has a function that lets you wait on up to 64 threads at once, so I used that instead of condition variables to signal when a thread was done. Windows supports condition variables only starting with Vista, but the code I used goes back to Windows 2000 Pro.
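
    For illustration, here is a minimal sketch of that 64-handle wait pattern (my own toy code, not the pzpaq source; the worker body is a placeholder):

    Code:
    // Sketch: launch worker threads natively on Windows and wait for all of
    // them with one call. WaitForMultipleObjects accepts at most
    // MAXIMUM_WAIT_OBJECTS (64) handles, which is where the 64-thread limit
    // mentioned above comes from.
    #include <windows.h>
    #include <process.h>  // _beginthreadex
    #include <stdio.h>
    
    unsigned __stdcall worker(void* arg) {
      int id=*(int*)arg;
      printf("thread %d: compress or decompress one block here\n", id);
      return 0;
    }
    
    int main() {
      const int N=4;  // must be <= MAXIMUM_WAIT_OBJECTS (64)
      HANDLE h[N];
      int ids[N];
      for (int i=0; i<N; ++i) {
        ids[i]=i;
        h[i]=(HANDLE)_beginthreadex(0, 0, worker, &ids[i], 0, 0);
      }
      WaitForMultipleObjects(N, h, TRUE, INFINITE);  // TRUE = wait for all
      for (int i=0; i<N; ++i) CloseHandle(h[i]);
      return 0;
    }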

    Future plans:
    - Models for text, .exe, .bmp, .wav, .jpeg, etc. with auto detection. Speed probably around -2.
    - -r option to recurse directories.
    - Extraction will create directories as needed (like unzip, etc).
    - Maybe someday a JIT compiler instead of external g++.

  11. #11
    Matt Mahoney (Expert)
    http://mattmahoney.net/dc/zpaq.html#pzpaq

    pzpaq 0.05 puts all temporary files in the temporary directory ($TMPDIR, %TEMP%, or /tmp), so you no longer have to worry that a temporary file might clobber something you want to keep. There is no longer a -s option to name the temp file suffix. Files are named /tmp/pzpaqtmpPID_N (depending on the temp directory).

    I also updated the documentation (mostly the NOTES section) and added install scripts for Windows and Linux and a readme file on how to use them. The scripts set up pzpaq to optimize decompression of nonstandard compression levels produced by zpaq, using g++ to compile the ZPAQL code (like zpaq). I also updated it so that if the compile fails, it falls back to interpreted (slower) decompression instead of giving up.

    The included pzpaq.exe lacks this optimization, which should be good enough for most Windows users who don't want to install g++.

  12. #12
    Alexander Rhatushnyak (Member)
    Quote Originally Posted by Matt Mahoney
    Speed probably around -2
    Do you mean speed will be 2 times lower?
    Quote Originally Posted by Matt Mahoney View Post
    Code:
        Program        Size     Compress Decompress  Memory (MB)
                         (bytes)   (sec)     (sec)    Comp  Decomp
        ------------ ---------  -------- ----------  ----  ------
        Uncompressed 3,152,896
        ....
        p7zip          824,573     1.5      0.1      192     18
        pzpaq -1 -t2   818,007     1.5      1.6       76     76
        pzpaq -2 -t2   708,526     4.6      4.8      224    224
    If there's little gain in compression quality, compression speed is approximately the same, while decompression speed is only 16 times slower.
    If there's a significant gain in quality, compression speed is roughly 3 times slower, but decompression speed is 48 times worse...
    Looks like pzpaq should be used for data that will hardly ever be decompressed.
    Quote Originally Posted by Matt Mahoney
    - Models for text, .exe, .bmp, .wav, .jpeg, etc. with auto detection.
    Also .jp2k, mp3, pdf, avi, .DOCs with images. Good luck...

  13. #13
    Black_Fox (Tester)
    Quote Originally Posted by Matt Mahoney View Post
    Speed probably around -2.
    Quote Originally Posted by Alexander Rhatushnyak View Post
    Do you mean speed will be 2 times lower?
    That means speed around the current speed of "pzpaq -2".

  14. #14
    Alexander Rhatushnyak (Member)
    Quote Originally Posted by Black_Fox View Post
    That means speed around current speed of "pzpaq -2"
    Regardless of the memory option, for any -N?

  15. #15
    Matt Mahoney (Expert)
    Yes, by "-2" I meant moderate speed using 6-8 components. One purpose of pzpaq is to allow fast decompression of archives produced by other programs that might specialize in one file type. 7zip is fast, but you could not introduce new algorithms or improve compression without breaking compatibility with older versions. Max compression (level 3) combines several different models (order-n, text, sparse), but you could achieve the same compression at level 2 speed by using data-dependent models.

    Specialized models have the #1 spot on the Kodak .bmp set, the generic benchmark, and pi.txt. I know they aren't fast, but if you want fast then disk is cheap enough to not even bother with compression most of the time.

  16. #16
    Alexander Rhatushnyak (Member)
    Quote Originally Posted by Matt Mahoney View Post
    Specialized models have the #1 spot on the Kodak .bmp set, the generic benchmark, and pi.txt.
    I think all three have little in common with what people usually compress. Kodak images are ex-PCDs, i.e. like ex-JPEGs they were compressed with losses. Compressing pi is like compressing a file containing [0 1 2 0 1 2 0 1 2 0 1 2 ... ]. The model for the files used in the generic benchmark is quite special, as far as I remember.

    Where is pzpaq on the Pareto frontier for enwik9, MFC from www.maximumcompression.com, and images from www.imagecompression.info? That's what would be nice to see.

  17. #17
    Matt Mahoney (Expert)
    That would take work.

    The problem with many high-end formats is that each new version breaks compatibility with the last. There are over 160 versions of PAQ. Also, many formats are closed source, like durilca, ppmonstr, nanozip, WinRK, slim, hook, bcm, etc. Who knows if you will be able to find a decompressor to read your compressed files years from now. And if you need a Linux version, too bad. Some, like WinRK and compressia, can't be found any more. Some versions of nanozip had timers so that after a certain date you could not decompress your files with that version, and newer versions could not read the files either. Why would anyone use a compressor that warned you not to use it for anything important, even if it was top rated on compressionratings.com?

    Most people compress using zip or deflate, and to a lesser extent gzip, bzip2, and 7zip. The reason is that these formats have been around for a long time, run on most OS's, and have free, open source code. ppmd would be a good candidate, but like deflate it did not define an archive format. So you have zip and gzip using the same codec in incompatible formats, and likewise various archivers (WinZip, freearc, 7zip, rar, etc.) using ppmd in incompatible ways, even when they use the same ppmd variant (H, I, or J), which they don't.

    So that is the problem that I want ZPAQ to solve.

  18. #18
    Alexander Rhatushnyak (Member)
    Quote Originally Posted by Matt Mahoney View Post
    That would take work
    Something similar to a Pareto frontier: http://www.maximumcompression.com/data/summary_mf2.php
    ZPAQ is in line 147, WinZIP is in line 23. I guess pzpaq would be in line 100±20. Considering lines 1...11: five are occupied by open-source programs (including lines 1...3) and three others by a program with an open-source unpacker.

    There are over 160 versions of PAQ.
    The only additional disadvantage is that you have to bother giving your compressed files the proper extension each time, e.g. archive.paq8px instead of just archive.paq. Otherwise it's exactly as with zpaq: either keep the executable (if not moving to another platform), or find the source file, find a C/C++ compiler, and compile.
    I know it's a big inconvenience to remember switching among .paq8px, .paq8py, .paq8pz, etc., but if you want maximum convenience then disk is cheap enough to not even bother with compression.
    Also, I guess some of us believe that after 2021 it will be easier to find zpaq sources than paq8* sources.

  19. #19
    Matt Mahoney (Expert)
    The top-ranked programs in MFC all use auto-detection of file type and specialized models for .bmp, .tiff, .exe, text, etc. Jan Ondrus wrote some models that do better than the default models on SFC for various types, including #1 on rafale.bmp. What I still need to do is update pzpaq with these models (or ones like them) and add auto-detection.

    I could improve enwik9 from .1493 to .1461 by using dictionary preprocessing like DRT and writing the decoder as a postprocessor. This would move it up 1 spot, beating paq8px_v60 (the latest version tested) while being 25 times faster. I have that result listed but didn't include it in the main table because my policy is to list each program once. DRT improves most of the other compressors too.

    I don't expect zpaq to take the top spot because the format is restricted in order to maintain compatibility.

    Also, one could argue about the formula used for MFC efficiency. How important is size vs. compression speed vs. decompression speed vs. other features, like knowing you can decompress your data later? compressionratings.com also uses a formula you can adjust to make whatever compressor you want score higher.

  20. #20
    Member
    Hey guys, can you help me? I am searching for a way to compress my files without quality loss, and I need my own extension for the compressed file. Is that possible?

  21. #21
    Matt Mahoney (Expert)
    You could try one of these. http://mattmahoney.net/dc/text.html

  22. #22
    Matt Mahoney (Expert)
    I have been experimenting with a BWT based algorithm in ZPAQ format. Jan Ondrus wrote some BWT based models (available at http://mattmahoney.net/dc/zpaq.html#configurations ), based on my BBB program, for both fast mode (5x memory) and slow mode (1.25x memory). The forward BWT uses a simple std::stable_sort() for suffix sorting, which suffers from O(n^2 log n) sorting time on highly redundant data. The inverse BWT is written in ZPAQL. The BWT data is modeled using an adaptive order 1-2-4 ICM/ISSE chain with a mixer and an SSE stage, similar to BBB.
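
    For illustration, here is a minimal sketch of that kind of wraparound rotation sort (my own toy code, not the BBB or config source):

    Code:
    // Minimal sketch of a stable_sort() forward BWT over cyclic rotations
    // (wraparound, BBB style). Each comparison may scan up to n bytes, so
    // highly redundant input approaches the O(n^2 log n) worst case noted
    // above.
    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>
    
    std::string bwt(const std::string& s) {
      size_t n=s.size();
      std::vector<size_t> idx(n);
      for (size_t i=0; i<n; ++i) idx[i]=i;
      std::stable_sort(idx.begin(), idx.end(), [&](size_t a, size_t b) {
        for (size_t k=0; k<n; ++k) {  // compare rotations byte by byte
          unsigned char ca=s[(a+k)%n], cb=s[(b+k)%n];
          if (ca!=cb) return ca<cb;
        }
        return false;  // equal rotations; stable_sort keeps input order
      });
      std::string out(n, 0);
      for (size_t i=0; i<n; ++i)
        out[i]=s[(idx[i]+n-1)%n];  // last column of the sorted rotations
      return out;
    }
    
    int main() {
      std::cout << bwt("banana") << "\n";  // prints "nnbaaa"
    }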

    I have been experimenting with a faster model which I may make the default compression mode for pzpaq. It has speed similar to -1 and a compression ratio similar to -2, at least on text. It uses libdivsufsort for the forward transform and a simple order 0 ICM for modeling. The inverse BWT is a bit more complex (and slower) because libdivsufsort uses a virtual end-of-block symbol, unlike BBB, which sorts contexts with wraparound.

    Some tests on enwik8 on a 2 GHz T3200 under 64 bit Linux (using 1 of 2 cores):

    zpaq ocbwt_j4,17 enwik8 -> 20,756,878, 373 + 187 sec. (fast mode, 640 MB memory)
    zpaq ocbwt enwik8 -> 21,965,887, 90 + 69 sec. (new code, 640 MB memory)

    zpaq tests the pre/postprocessor before compression. Eliminating this test would improve compression time. The time to compute the transform and inverse (as a single block) is as follows:

    bwtpre, zpaq oprbwt_j4,17 -> 190 + 27.4 sec.
    bwt, zpaq opbwt -> 30 + 36 sec.

    bwt uses libdivsufsort by Yuta Mori. I wrote an inverse BWT in C++ which improves the time to 23.4 sec, which I may use in place of the automatically optimized code. libdivsufsort also has an inverse function which takes 20.0 sec but is slower than my code on small blocks. It is faster on large blocks because during the linked list traversal it uses a binary search on the array of counts to avoid indexing the input buffer to pick the output character. It's a lot more instructions but avoids one random memory access (cache miss). I didn't try to implement it in ZPAQL. I did see a similar improvement when I used an array of 5-byte elements, each containing one input byte and one linked list pointer. That would also be more complex to implement in ZPAQL than the H and M arrays, which are perfectly suited for this purpose.
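
    Here is a simplified sketch of that count-table idea (my own toy code using an explicit 0 sentinel byte, not libdivsufsort's actual implementation): the traversal index stays in sorted-rank space, and each output byte comes from a binary search of the 256-entry cumulative count table, which stays in cache, instead of a random access into the large BWT buffer.

    Code:
    // Inverse BWT that picks each output byte by binary search on the
    // cumulative counts instead of indexing the BWT buffer. Simplified:
    // assumes the text ends with a unique 0 sentinel byte (unlike bwt.cpp's
    // virtual EOF handling).
    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>
    
    std::string ibwt(const std::string& L) {
      size_t n=L.size();
      unsigned cnt[257]={0};
      for (size_t i=0; i<n; ++i) ++cnt[(unsigned char)L[i]+1];
      for (int c=0; c<256; ++c) cnt[c+1]+=cnt[c];  // cnt[c] = #bytes < c
    
      // Forward links in sorted-rank space: fl[rank of this occurrence of
      // L[i]] = i. Both sides are row numbers of the sorted rotation matrix.
      std::vector<unsigned> fl(n);
      unsigned tmp[256];
      std::copy(cnt, cnt+256, tmp);
      for (size_t i=0; i<n; ++i)
        fl[tmp[(unsigned char)L[i]]++]=(unsigned)i;
    
      // Row 0 starts with the sentinel. Follow links and recover each row's
      // first byte from the in-cache count table, never touching L.
      std::string out;
      for (unsigned p=fl[0]; p!=0; p=fl[p]) {
        unsigned c=std::upper_bound(cnt, cnt+257, p)-cnt-1;
        out+=(char)c;
      }
      return out;  // original text minus the sentinel
    }
    
    int main() {
      std::string L("annb\0aa", 7);  // BWT of "banana" plus a 0 sentinel
      std::cout << ibwt(L) << "\n";  // prints "banana"
    }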

    These times are for a single thread. pzpaq, of course, will be able to compress and decompress blocks in parallel depending on the number of cores. There would be the usual size/memory tradeoff because each thread requires 5 times the block size in memory. Blocks would have to be a power of 2 minus 256 bytes to best fit the ZPAQ model.

    Anyway, here is the preprocessor. It includes the inverse transform I wrote for testing. It will fail on files bigger than 1/5 of available memory. pzpaq will instead split the file into blocks.

    Code:
    /* bwt.cpp v0.01 - Program to compute Burrows-Wheeler transform and inverse.
    
    BWT transform: bwt c input [output]
    Inverse:       bwt d input [output]
    
    The forward transform creates a single block. It
    requires 5 times the file size in memory and fails if the
    file is too big. The output format is the BWT with the
    virtual EOF squeezed out. The 4 byte index pointing to
    its former location is appended to the end in little-endian
    (LSB first) format. The inverse transform assumes the
    same format. [output] defaults to standard output in Linux.
    
    To compile: g++ -O3 bwt.cpp divsufsort.c -o bwt
    
    You need divsufsort.c and divsufsort.h from
    libdivsufsort-lite from http://code.google.com/p/libdivsufsort/
    
    bwt.cpp is (C) 2011, Dell Inc. Written by Matt Mahoney.
    It is licensed under GPL v3. http://www.gnu.org/licenses/gpl.html
    
    */
    
    #include "divsufsort.h"
    #include <stdio.h>
    #include <stdlib.h>
    
    // Read n bytes of BWT and invert to out. buf should be allocated
    // to 5*n bytes with the BWT string in the first n bytes.
    // idx is original index of removed end of block symbol.
    void ibwt(unsigned char* buf, FILE* out, size_t n, size_t idx) {
      unsigned int count[256]={0};
      unsigned int* list=(unsigned int*)(buf+n);
      for (size_t i=0; i<n; ++i)
        ++count[(buf[i]+1)&255];
      count[0]=0;
      for (size_t i=1; i<256; ++i)
        count[i]+=count[i-1];
      for (size_t i=0; i<n; ++i) {
        unsigned int j=count[buf[i]]++;
        list[j]=i+(i>=idx);
      }
      for (size_t p=idx; p;) {
        p=list[p-1];
        putc(buf[p-(p>=idx)], out);
      }
    }
    
    int main(int argc, char** argv) {
    
      // check args
      if (argc<2) {
        fprintf(stderr, "BWT transform and inverse\n"
          "Forward: bwt c input [output]\n"
          "Inverse: bwt d input [output]\n");
        return 1;
      }
    
      // Open files
      FILE* in=stdin;
      FILE* out=stdout;
      if (argc>2 && (in=fopen(argv[2], "rb"))==0)
        return perror(argv[2]), 1;
      if (argc>3 && (out=fopen(argv[3], "wb"))==0)
        return perror(argv[3]), 1;
    
      // Read input into buf and get size n
      fseek(in, 0, SEEK_END);
      long n=ftell(in);  // input size
      rewind(in);
      unsigned char* buf=(unsigned char*)malloc(n*5);
      if (!buf) return fprintf(stderr, "Out of memory\n"), 1;
      if ((long)fread(buf, 1, n, in)!=n)
        return fprintf(stderr, "Read error: size %ld\n", n), 1;
    
      // Forward BWT
      if (argv[1][0]=='c') {
        int idx=divbwt(buf, buf, (saidx_t*)(buf+n), n);
        fwrite(buf, 1, n, out);
        for (int i=0; i<4; ++i) putc(idx>>(i*8), out);
      }
    
      // Inverse BWT
      else if (argv[1][0]=='d') {
        n-=4;
        if (n<0)
          return fprintf(stderr, "Input too small n=%ld\n", n), 1;
        int idx=buf[n]|buf[n+1]<<8|buf[n+2]<<16|buf[n+3]<<24;
        ibwt(buf, out, n, idx);
      }
    
      // Clean up
      free(buf);
      if (out!=stdout) fclose(out);
      if (in!=stdin) fclose(in);
      return 0;
    }
    And here is the config file. You can pass it an argument to control the memory usage. The default will accept files up to 128 MB - 256 bytes and use 640 MB: the H and M arrays each have 2^($1+27) entries, at 4 bytes and 1 byte per entry respectively, which is where the 5 times block size figure comes from. Arguments 1,2,3... will increase the max file size and memory usage by 2,4,8..., and you can use negative arguments.

    Code:
    (bwt.cfg v0.01 (C) 2011, Dell Inc. Written by Matt Mahoney.
    Licensed under GPL v3, http://www.gnu.org/licenses/gpl.html )
    
    (Requires 640 MB memory for files up to 128 MB - 256 bytes.
    Use argument 1 to double, -1 to halve)
    comp 0 0 $1+27 $1+27 1
      0 icm 5
    hcomp
      halt
    pcomp ./bwt c ;
    
      (read BWT, index into M, size in b)
      a> 255 ifnot
        *b=a b++
    
      (inverse BWT)
      else
    
        (index in last 4 bytes, put in c)
        b-- a=*b
        b-- a<<= 8 a+=*b
        b-- a<<= 8 a+=*b
        b-- a<<= 8 a+=*b c=a
    
        (save size in R2)
        a=b r=a 2
    
        (count bytes in H[~1..~255, ~0])
        do
          a=b a> 0 if
            b-- a=*b a++ a&= 255 d=a d! *d++
          forever
        endif
    
        (cumulative counts: H[~i=0..255] = count of bytes before i)
        d=0 d! *d=0 a=0
        do
          a+=*d *d=a d--
        d<>a a! a> 255 a! d<>a until
    
        (build linked list in H[0..size-1])
        b=0 do
          d=*b d! *d++ d=*d d-- *d=b
          a=c a>b ifnot *d++ endif
        b++ a=r 2 a>b while
    
        (traverse list and output)
        d=c do
          a=d a== 0 ifnot
            d-- d=*d
            a=d a<c ifnot a-- endif
            b=a a=*b out
          forever
        endif
      endif
      halt
    end
    The construct "do...if...forever...endif" works like a while-loop with the test at the top.

  23. #23
    Matt Mahoney (Expert)
    I made a small speed improvement on the inverse BWT. Instead of squeezing out the EOF symbol, I include it in the transform, increasing its size by 1 byte. EOF should have the value -1 but since byte values have to be 0..255, I use 255. This requires no changes to the first loop counting characters because the 255 count is discarded anyway. In the second big loop building the linked list, the EOF symbol has to be treated separately but that is possible by splitting the loop into 2 parts, avoiding an if-statement inside the loop. The final loop scanning the linked list also avoids an if-statement.

    The result is that the optimized ZPAQL inverse transform time is improved from 36 to 32 seconds on enwik8. The C++ version is still 24 seconds. Overall compression time improves from 90 to 86 seconds. Decompression improves from 69 to 67 seconds with optimization or 102 to 99 when the ZPAQL is interpreted. enwik8.zpaq is 20 bytes larger.

    Here is the new forward transform, which inserts the EOF symbol.

    Code:
      // Forward BWT
      if (argv[1][0]=='c') {
        int idx=divbwt(buf, buf, (int*)(buf+n), n);
        fwrite(buf, 1, idx, out);
        putc(255, out);
        fwrite(buf+idx, 1, n-idx, out);
        for (int i=0; i<4; ++i) putc(idx>>(i*8), out);
      }
    Here is the new inverse BWT in C++ (for testing)

    Code:
    // Read n bytes of BWT and invert to out. buf should be allocated
    // to 5*n bytes with the BWT string in the first n bytes.
    // idx is original index of NOT removed end of block symbol.
    void ibwt(unsigned char* buf, FILE* out, size_t n, size_t idx) {
      unsigned int count[256]={0};
      unsigned int* list=(unsigned int*)(buf+n);
    
      // Count bytes
      for (size_t i=0; i<n; ++i)
        ++count[(buf[i]+1)&255];
    
      // Cumulative counts including EOF
      count[0]=1;
      for (size_t i=1; i<256; ++i)
        count[i]+=count[i-1];
    
      // Build linked list including EOF at idx
      for (size_t i=0; i<idx; ++i)
        list[count[buf[i]]++]=i;
      list[0]=idx;
      for (size_t i=idx+1; i<n; ++i)
        list[count[buf[i]]++]=i;
    
      // Scan linked list and output
      for (size_t p=idx; p;) {
        p=list[p];
        putc(buf[p], out);
      }
    }
    Here is the new faster bwt.cfg

    Code:
    (bwt.cfg (C) 2011, Dell Inc. Written by Matt Mahoney.
    Licensed under GPL v3, http://www.gnu.org/licenses/gpl.html )
    
    (Requires 640 MB memory for files up to 128 MB - 256 bytes.
    Use argument 1 to double, -1 to halve)
    comp 0 0 $1+27 $1+27 1
      0 icm 5
    hcomp
      halt
    pcomp ./bwt c ;
    
      (read BWT, index into M, size in b)
      a> 255 ifnot
        *b=a b++
    
      (inverse BWT)
      else
    
        (index in last 4 bytes, put in c and R1)
        b-- a=*b
        b-- a<<= 8 a+=*b
        b-- a<<= 8 a+=*b
        b-- a<<= 8 a+=*b c=a r=a 1
    
        (save size in R2)
        a=b r=a 2
    
        (count bytes in H[~1..~255, ~0])
        do
          a=b a> 0 if
            b-- a=*b a++ a&= 255 d=a d! *d++
          forever
        endif
    
        (cumulative counts: H[~i=0..255] = count of bytes before i)
        d=0 d! *d= 1 a=0
        do
          a+=*d *d=a d--
        d<>a a! a> 255 a! d<>a until
    
        (build first part of linked list in H[0..idx-1])
        b=0 do
          a=c a>b if
            d=*b d! *d++ d=*d d-- *d=b
          b++ forever
        endif
    
        (middle of list)
        d=0 *d=b
        
        (rest of list in H[idx+1..n-1])
        b=c b++ c=r 2 do
          a=c a>b if
            d=*b d! *d++ d=*d d-- *d=b
          b++ forever
        endif
    
        (traverse list and output)
        d=r 1 do
          a=d a== 0 ifnot
            d=*d
            b=d a=*b out
          forever
        endif
      endif
      halt
    end
