
Thread: enwik9 benchmark nanozip, bliz, m99, dark

  1. #1
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post

    enwik9 benchmark nanozip, bliz, m99, dark

    I compiled a version of nanozip for the large text compression benchmark. It is nanozip 0.02 with about a minute's worth of hacks/tunings for LTCB. It is at http://nanozip.net/nanozipltcb.zip. It requires 1670 MB of free memory. I didn't bother to strip the rest of the nanozip code out of the executable, so it is bloated; the real size of the executable would probably be about 60 KB.

    Here is a quick benchmark with enwik9.

    Code:
    program               archive size / c.time
    nanozipltcb            166.251.135 / 140 s
    bliz 0.24b c 262144000 175.404.703 / 375 s
    bliz 0.24b f 262144000 177.936.934 / 317 s
    dark v0.51 p-b250mfi2  178.141.689 / 234 s
    m99 v2.2 -m 250m       178.945.536 / 364 s
    m99 v2.2 -f 250m       186.995.547 / 280 s
    (I didn't test decompression, except for nanozipltcb, which I verified.)

    I ran these under WinXP64 with 2 GB RAM and killed all background programs (including explorer.exe) from Task Manager; it then showed 1.8 GB of free memory. I wanted to run dark with a 334 MB block, but I couldn't because all it said was out of memory. m99 and bliz are relatively memory inefficient, requiring about 6n per block, so I ran them with a 250 MB block as well.
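    For anyone who wants to repeat the measurement, below is a minimal sketch of the bookkeeping, assuming generic placeholder command lines (substitute the real binaries and options); only the time/size reporting follows the table above.

    Code:
    # Minimal benchmark harness sketch. The command lines and archive names are
    # placeholders, not the invocations used above; only the timing and size
    # bookkeeping mirrors the table.
    import os
    import subprocess
    import time

    INPUT = "enwik9"

    RUNS = [
        # (label, command line, resulting archive) -- placeholders only
        ("compressor A", ["./compressorA", INPUT, "outA.bin"], "outA.bin"),
        ("compressor B", ["./compressorB", INPUT, "outB.bin"], "outB.bin"),
    ]

    for label, cmd, archive in RUNS:
        start = time.time()
        subprocess.run(cmd, check=True)        # compression step only
        elapsed = time.time() - start
        size = os.path.getsize(archive)
        print(f"{label:22s} {size:>11,d} / {elapsed:.0f} s")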

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Sami!

    Matt has tested nanozipltcb.

    http://www.cs.fit.edu/~mmahoney/comp...text.html#1664

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Yes, and nanozipltcb knocked cmm4, ccmx, dark, and 7zip/ppmd off the Pareto frontier, and also rings on the decompression side.

    I think maybe speed and size could both be improved even further using dictionary preprocessing. I believe it already uses a small dictionary internally.

    Compression is also very close to BBB, which uses a single 1GB BWT block but a much slower algorithm to allow blocks that almost fill memory.
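    For readers new to the term: a program sits on the Pareto frontier when no other program is at least as good on both size and time and strictly better on one. A quick sketch over just the compression numbers from post #1 (not the full LTCB table):

    Code:
    # Which entries from the table in post #1 sit on the (size, time) Pareto
    # frontier for compression.
    results = {
        "nanozipltcb":            (166_251_135, 140),
        "bliz 0.24b c 262144000": (175_404_703, 375),
        "bliz 0.24b f 262144000": (177_936_934, 317),
        "dark v0.51 p-b250mfi2":  (178_141_689, 234),
        "m99 v2.2 -m 250m":       (178_945_536, 364),
        "m99 v2.2 -f 250m":       (186_995_547, 280),
    }

    def dominated(a, b):
        """True if b is at least as good as a on both axes and better on one."""
        return b[0] <= a[0] and b[1] <= a[1] and b != a

    frontier = [name for name, r in results.items()
                if not any(dominated(r, other) for other in results.values())]
    print(frontier)   # with these numbers, only nanozipltcb remains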

  4. #4
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    There are two factors behind the decompression speed. First, the de/compression stage after the BWT in nanozip is very fast, almost free. Second, the dictionary reduces the size, so there are fewer bytes to un-BWT. A large XWRT/WRT-like dictionary with heavy filtering would speed up decompression even more and improve the compression ratio, but it might hurt compression speed a bit. The compression speed is mostly due to a fast sorting algorithm.
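    As a rough illustration of why a dictionary transform shrinks the un-BWT workload, here is a toy WRT-style word substitution. This is not nanozip's actual filter; the escape byte, code range, and dictionary size are arbitrary choices for the sketch.

    Code:
    # Toy WRT-style dictionary transform: replace frequent words with two-byte
    # escape codes so the BWT/unBWT stage sees fewer bytes. A real filter uses
    # a much larger dictionary and careful escaping of the escape byte itself.
    from collections import Counter

    ESCAPE = "\x01"   # assumed never to occur in the input text

    def build_dictionary(text, size=64):
        words = Counter(text.split())
        # Only words longer than the 2-byte code actually save space.
        return [w for w, _ in words.most_common(size) if len(w) > 2]

    def encode(text, dictionary):
        codes = {w: ESCAPE + chr(0x21 + i) for i, w in enumerate(dictionary)}
        return " ".join(codes.get(t, t) for t in text.split())

    def decode(text, dictionary):
        back = {ESCAPE + chr(0x21 + i): w for i, w in enumerate(dictionary)}
        return " ".join(back.get(t, t) for t in text.split())

    sample = "the quick brown fox jumps over the lazy dog " * 1000
    d = build_dictionary(sample)
    enc = encode(sample, d)
    assert decode(enc, d).split() == sample.split()
    print(len(sample), "->", len(enc), "bytes entering the (un)BWT stage")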

    If Matt or someone wants to speed up BBB (I mean something like 2x or more) without losing any compression, something like this will help: http://www.cs.helsinki.fi/u/tpkarkka...07-revised.pdf (paper forwarded to me by Szymon)

  5. #5
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by Sami View Post
    Large x/wrt like dictionary, heavy filtering would speed up decompression even more and increase compression ratio, but it may hurt compression speed a bit.
    And that's just the way I like it: better compression ratio, faster decompression, and to hell with compression speed.
    Last edited by SvenBent; 26th July 2008 at 19:47.

  6. #6
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts
    Quote Originally Posted by Sami View Post
    I compiled a version of nanozip for the large text compression benchmark. It is nanozip 0.02 with about a minute's worth of hacks/tunings for LTCB. It requires 1670 MB of free memory.
    ...
    m99 and bliz are relatively memory inefficient, requiring about 6n per block, so I ran them with a 250 MB block as well.
    What conclusions are you trying to reach by comparing a compressor that is customized for the specific test set and uses more memory than the competition?

    Write an app that randomly re-assigns the alphabet in the test file, and use the same amount of memory for each app. Then your results will be respectable. Until then, at least for me, the results are meaningless. Anyone can write filters to achieve better results on a known test set. I'm about as impressed with this as I am with an app's inability to compress pic when every 216th symbol is deleted. (^:
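    A minimal sketch of that kind of random alphabet re-assignment (a hypothetical helper, not an existing tool): remap every byte through a random permutation, which breaks any filter tuned to the literal bytes of enwik9 while preserving the file's statistical structure up to relabeling.

    Code:
    # Apply a random byte permutation to a test file (Python 3.8+).
    import random

    def permute_file(src, dst, seed=12345):
        random.seed(seed)                  # fixed seed so the mapping is reproducible
        perm = list(range(256))
        random.shuffle(perm)
        table = bytes(perm)
        with open(src, "rb") as f, open(dst, "wb") as g:
            while chunk := f.read(1 << 20):      # 1 MB at a time
                g.write(chunk.translate(table))  # remap every byte

    # permute_file("enwik9", "enwik9.permuted")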

    I'm not trying to attack your work. But let's be realistic. You're not comparing apples to apples.

    - Michael Maniscalco

  7. #7
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by michael maniscalco View Post
    What conclusions are you trying to reach
    ...
    I'm not trying to attack your work.
    A sharp observer of compression psychology will immediately notice the apparent similarities between Michael's writing and Christian Martelock's. Notice the question "what conclusions are you trying to reach". The egoist author knows that his work must outperform something else, and therefore he aims at, or "tries to reach", a certain conclusion before running a test. This is science in reverse, in contrast to running a test and then drawing a conclusion, but all this must be too obvious to the non-author readers of this forum.

    Michael's own work, m99, is fairly OK; in particular his sort is fairly fast, and in my tests it is often second only to the nz sort (even though nz needs less memory). But as nz shows, clearly superior methods exist, and they should inspire further work, not bitter and frustrated forum posts and desperate pleas not to compare compressors or not to look at the results (or at least to manipulate the data until we arrive at the predefined conclusion).

    My position is the same as Matt Mahoney's: if we wish to compress English text, then we choose an English text file, compress it with various compressors, and look at the results. I think this position is a conservative one, but readers know it is the most extreme radical doctrine on this forum, supported by only a very few authors.

