
Thread: HFCB: Huge Files Compression Benchmark

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732

    HFCB: Huge Files Compression Benchmark

    http://freearc.org/HFCB.aspx

    i plan to add tests on huge games (prototype, COD) later

  2. #2
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    China, Beijing
    Posts
    160
    Ha~! Can you give me results of my CSC3.1 with -m0/m1/m2 -d7?

  3. #3
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    382
    Since almost all utilities there are used in multi-threaded mode, how about pbzip2? (Win32 compile download)
    Original idea with the test BTW
    I am... Black_Fox... my discontinued benchmark

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Black_Fox, i can't download it

  5. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Quote Originally Posted by Fu Siyuan View Post
    Ha~! Can you give me results of my CSC3.1 with -m0/m1/m2 -d7?
    decompression cmdline?

  6. #6
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    320
    Quote Originally Posted by Bulat Ziganshin View Post
    Black_Fox, i can't download it
    Does this link work? From there, use the "Win32-Binaries" link. Otherwise, I could attach it, it's only 100 KB in size.

    BTW, since it's an Ubuntu image that perhaps contains some GZip/BZip2 streams, did you try FreeArc modes (Precomp + srep) or other compressors together with Precomp (Precomp + srep + 7-Zip)? This could be very slooow in compression (especially if slow mode is used), but could also reduce the compressed size (at least most Linux distribution ISOs can be reduced to 50-70% with it, don't know if it works for VM images as well).
    Last edited by schnaader; 1st December 2009 at 22:53.
    http://schnaader.info
    Damn kids. They're all alike.

  7. #7
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    438
    Quote Originally Posted by schnaader View Post
    BTW, since it's an Ubuntu image that perhaps contains some GZip/BZip2 streams, did you try FreeArc modes (Precomp + srep) or other compressors together with Precomp (Precomp + srep + 7-Zip)? This could be very slooow in compression (especially if slow mode is used), but could also reduce the compressed size (at least most Linux distribution ISOs can be reduced to 50-70% with it, don't know if it works for VM images as well).
    I tried it but unfortunately with no luck. Both 0.3.8 and 0.4.0 are crashing.

    EDIT: I'm now trying to PreComp that VM split into 10 TAR volumes and also got a crash on the 4th volume. It also doesn't seem to help much, because:
    Code:
    01.dat	424 673 280
    02.dat	424 673 280
    03.dat	424 673 280
    
    01.pcf	424 787 282
    02.pcf	438 558 203
    03.pcf	427 621 443
    Last edited by Skymmer; 1st December 2009 at 23:06.
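    To put the listing above in relative terms, here is a minimal sketch that computes how much each .pcf output grew over its .dat input. The sizes are copied from the post; the dictionary layout and the helper name `growth_percent` are only illustrative.

```python
# Sizes in bytes, copied from the listing above: (input .dat, Precomp output .pcf).
volumes = {
    "01": (424_673_280, 424_787_282),
    "02": (424_673_280, 438_558_203),
    "03": (424_673_280, 427_621_443),
}


def growth_percent(before: int, after: int) -> float:
    """Relative size change in percent; positive means the output grew."""
    return (after - before) / before * 100.0


for name, (dat, pcf) in volumes.items():
    print(f"{name}: {pcf - dat:+,d} bytes ({growth_percent(dat, pcf):+.2f}%)")
```

    Growth of only a few percent suggests that Precomp found few streams it could expand in these volumes, which matches the "not much help" remark.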

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,895
    wonder if 7-zip would be able to extract it as a disk image.
    otherwise, file data might not be in sequence there (due to clusters).

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Quote Originally Posted by Skymmer View Post
    I tried it but unfortunately with no luck. Both 0.3.8 and 0.4.0 are crashing
    did you try it with -t-j? most of the time it crashes due to packjpg.dll

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Quote Originally Posted by Shelwien View Post
    wonder if 7-zip would be able to extract it as a disk image.
    otherwise, file data might not be in sequence there (due to clusters).
    original: http://174.36.1.2/bagluxpe.7z

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Quote Originally Posted by schnaader View Post
    Does this link work? From there, use the "Win32-Binaries" link. Otherwise, I could attach it, it's only 100 KB in size
    thanks, will test both pigz and pbzip2

    i haven't tried precomp yet

  12. #12
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    320
    Quote Originally Posted by Skymmer View Post
    I tried it but unfortunately with no luck. Both 0.3.8 and 0.4.0 are crashing.

    EDIT: I'm now trying to PreComp that VM split into 10 TAR volumes and also got a crash on the 4th volume. It also doesn't seem to help much, because:
    Code:
    01.dat	424 673 280
    02.dat	424 673 280
    03.dat	424 673 280
    
    01.pcf	424 787 282
    02.pcf	438 558 203
    03.pcf	427 621 443
    Slow mode (for more matches) and -t-j (perhaps combined with -v, against the crashes) would be worth a (last) try. Ah, right, it's an open testset. Will make some experiments myself

    Quote Originally Posted by Shelwien View Post
    wonder if 7-zip would be able to extract it as a disk image.
    otherwise, file data might not be in sequence there (due to clusters).
    This would be a good idea, indeed. 7-Zip can open almost everything, so there should be a good chance. Another idea would be to check if there's special software (or a function inside the virtual machine) for converting the image file.
    http://schnaader.info
    Damn kids. They're all alike.

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Ah, right, it's an open testset. Will make some experiments myself
    if you do, please post the results here so i can use the best modes and know how much time it needs and how much compression it gives

  14. #14

  15. #15
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    China, Beijing
    Posts
    160
    Quote Originally Posted by Bulat Ziganshin View Post
    decompression cmdline?
    The same as for compression. The header of file1 decides the behavior.

  16. #16
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    438
    Quote Originally Posted by Bulat Ziganshin View Post
    i plan to add tests on huge games (prototype, COD) later
    Bulat, if by COD you mean the Call of Duty: Modern Warfare or Call of Duty: Modern Warfare 2 game, then I can give you some advice: don't waste your time. All content of these games is already compressed. For example, MW2 consists of: 1.96 GB of movies in BIK format, 4.73 GB of IWD files (actually ZIP files), 4.36 GB of FF files, which are ZLIB packed.
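    A quick way to check a claim like this before spending benchmark time is to compress a sample and look at the ratio. The sketch below uses Python's standard zlib module; the function name `sample_ratio` and the two synthetic samples are assumptions for illustration, and random bytes merely stand in for already-packed game data.

```python
import os
import zlib


def sample_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size for an in-memory sample."""
    if not data:
        raise ValueError("empty sample")
    return len(zlib.compress(data, level)) / len(data)


# Highly redundant text compresses well; random bytes behave like packed data.
text_like = b"HFCB: Huge Files Compression Benchmark\n" * 1000
random_like = os.urandom(len(text_like))

print(f"text-like sample:   {sample_ratio(text_like):.3f}")
print(f"random-like sample: {sample_ratio(random_like):.3f}")
```

    A ratio close to 1.0 on samples read from a real file would hint that the content is already compressed and that a general-purpose compressor will gain little without a recompression step such as Precomp.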

    EDIT: Also, how about adding PAQ8px to competitors?
    Last edited by Skymmer; 2nd December 2009 at 06:07.

  17. #17
    Member
    Join Date
    Jun 2008
    Location
    Berlin
    Posts
    12
    console games should be okay to test :-)

  18. #18
    Member
    Join Date
    May 2009
    Location
    France
    Posts
    20
    Hello,

    It seems you mixed testing time and extraction time when reporting decompression time in your short table. Or did I miss the point?

    Interesting benchmark starting here, for me at least!

    AiZ

  19. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    HFCB: added CSC and more FreeArc modes

    Seems you mixed testing time and extraction time in reporting decompression time for your short table
    decompression time = testing time. my HDD is slow and my CPU is fast, so extraction time has much more overhead than on an average system

    how about adding PAQ8px
    who will volunteer to test? i can do it myself later if it finishes overnight

    For example, MW2 consists of: 1.96 GB of movies in BIK format, 4.73 GB of IWD files (actually ZIP files), 4.36 GB of FF files, which are ZLIB packed.
    one more reason to add stdin-to-stdout mode to precomp+freearc

  20. #20
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    China, Beijing
    Posts
    160
    Quote Originally Posted by Bulat Ziganshin View Post
    HFCB: added CSC and more FreeArc modes

    Thanks all the same, though I originally meant "-m0/m1/m2" -d7

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    1,294
    Results for vm
    zpaq ocmid.cfg (111 MB) -> 982439831, 11251 sec.
    zpaq ocmax.cfg,3 (1861 MB) -> 888920458, 27373 sec.
    Decompression not tested yet. Will test tonight.

    Test machine: Dual core T3200, 2.0 GHz, 3 GB, Vista 32 bit,
    ZPAQL compiled with MinGW g++ 4.4.0 -O2 -s -fomit-frame-pointer -march=pentiumpro -DNDEBUG -DOPT

    Since zpaq runs on 1 core, I ran both compression programs at the same time overnight. Times are wall times.

    EDIT: decompression OK. mid.cfg = 10097 sec, max.cfg,3 = 25952 sec, wall times, done one at a time.
    Last edited by Matt Mahoney; 4th December 2009 at 19:11.
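    For comparison with the other entries, the zpaq figures above can be turned into ratios and throughput. This is a rough sketch: the input size is the 4 244 176 896 bytes reported for vm elsewhere in this thread, the helper names are illustrative, and the compression times were measured with both jobs running at once, so the speeds are only approximate.

```python
INPUT_BYTES = 4_244_176_896  # size of vm as reported elsewhere in this thread

# (config, compressed bytes, compression seconds, decompression seconds)
runs = [
    ("mid.cfg", 982_439_831, 11_251, 10_097),
    ("max.cfg,3", 888_920_458, 27_373, 25_952),
]


def ratio_percent(compressed: int, original: int) -> float:
    """Compressed size as a percentage of the original size."""
    return compressed / original * 100.0


def kib_per_second(n_bytes: int, seconds: float) -> float:
    """Throughput in KiB/s relative to the uncompressed input."""
    return n_bytes / 1024 / seconds


for cfg, size, c_sec, d_sec in runs:
    print(
        f"{cfg:>10}: {ratio_percent(size, INPUT_BYTES):5.2f}% of input, "
        f"compress {kib_per_second(INPUT_BYTES, c_sec):6.1f} KiB/s, "
        f"decompress {kib_per_second(INPUT_BYTES, d_sec):6.1f} KiB/s"
    )
```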

  22. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    HFCB: added more zip/bzip2 modes; 7-zip results updated, now these are much better since i've renamed vm to vm.dll

  23. #23
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    320
    Quote Originally Posted by Bulat Ziganshin View Post
    HFCB: added more zip/bzip2 modes; 7-zip results updated, now these are much better since i've renamed vm to vm.dll
    Interesting, didn't know 7-Zip had some detection based on the file extension. Doesn't feel right though, it would be better to depend on the actual content than on filenames or extensions... How big is the difference?
    http://schnaader.info
    Damn kids. They're all alike.

  24. #24
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    438
    I have an 806 MB result on HFCB (VM). I haven't measured the time of the whole process because it's useless anyway due to the completely different CPU, but the process is asymmetric and requires only 700 MB for compression and 64 MB for decompression. But let's go through the numbers.
    The idea of PreComp-ing that VM file hooked me completely. Turning off the JPEG recompression was not a solution for me and I didn't even try it. Furthermore, most of the problems were related to ZLIB and GIFs. So my strategy was to find all the problematic offsets and create an ignore list. After a whole night of testing I finally managed to make PreComp 0.3.8 work without crashes. So the final command line looks like this:

    precomp -v -slow -i619716256 -i620119742 -i733687954 -i733280138 -i733841552 -i734911416 -i1212229222 -i1319302591 -i1319303624 -i1325620736 -i1623902430 -i1637846002 -i2231172608 vm

    Then SREP and 7z -mx=9
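    Since the ignore list grows one offset per failed pass, it may be easier to keep the offsets in a list and generate the command line from it. The sketch below only reproduces the flags shown in the post (-v, -slow, -i followed by the offset); the function name `build_precomp_args` is hypothetical, and nothing here runs Precomp itself.

```python
import shlex

# Offsets to ignore, in the order they appear in the command line above.
ignore_offsets = [
    619716256, 620119742, 733687954, 733280138, 733841552,
    734911416, 1212229222, 1319302591, 1319303624, 1325620736,
    1623902430, 1637846002, 2231172608,
]


def build_precomp_args(input_name: str, offsets: list[int]) -> list[str]:
    """Assemble the argument list: fixed flags, one -i<offset> each, then the input."""
    args = ["precomp", "-v", "-slow"]
    args += [f"-i{offset}" for offset in offsets]
    args.append(input_name)
    return args


print(shlex.join(build_precomp_args("vm", ignore_offsets)))
```

    Appending a newly found bad offset to `ignore_offsets` and regenerating the line avoids hand-editing a long command after every crash.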

    Some stats:
    Code:
      Original: 4 244 176 896
     PreComped: 4 946 444 864
    After SREP: 3 391 459 732
      Final 7z:   845 818 619
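    The stage sizes above are easier to compare as percentages of the original. A small sketch with the four numbers copied from the listing; the helper name `percent_of` is illustrative.

```python
# Stage sizes in bytes, copied from the listing above.
stages = [
    ("Original", 4_244_176_896),
    ("PreComped", 4_946_444_864),
    ("After SREP", 3_391_459_732),
    ("Final 7z", 845_818_619),
]
original = stages[0][1]


def percent_of(size: int, base: int) -> float:
    """Size expressed as a percentage of a base size."""
    return size / base * 100.0


for name, size in stages:
    print(f"{name:>10}: {size / 2**20:8.1f} MiB  {percent_of(size, original):6.2f}% of original")
```

    The final size works out to roughly 806 MiB, which matches the figure quoted at the top of the post.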
    Some notes:
    - Results with Precomp 0.4.0 could be much better due to recursion, but I'm not going to repeat the same search scheme with it. But you have a clue now, so feel free to try.
    - Precomping took more than 2 hours on my AMD64 4000+. Noticeable slowdown on Multi-PNG files.
    - "Bad" offsets have been appearing very chaotically. For example, I found bad offset 734911416. In the next pass bad offset 733280138 appeared. It's a mystery to me why it didn't appear in the previous pass.

    Bulat, if you're going to include Precomp-based results, then I'm really curious whether the command line given above will work on your system.

  25. #25
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    320
    Really nice results, good work

    I made some attempts, too, and managed to get a Precomp 0.4.1 development version to run without a crash (there's no big difference from 0.4, so this should work for it, too). Using -t-j alone crashed, but using both -t-j and -v seemed to work. But halfway through the file (which took very long; blame recursion, slow drives and still using the PC for other things, I guess) I realized that the output drive I'm using is formatted as FAT32, so the output would be corrupted, as there's a 2 or 4 GB file size limit on this filesystem and the output will surely be more than 4 GB.

    By the way, recursion really should help a bit: the highest level I saw in the log file so far is level 2, which occurs quite often.

    I'm splitting the input file into 700 MB parts now and will try again.
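    Splitting a large file into fixed-size parts, as described above, can be sketched in a few lines. This is not the tool used in the post; the function name `split_file`, the part naming scheme (`vm.001`, `vm.002`, ...) and the buffer size are all assumptions.

```python
from pathlib import Path


def split_file(src, part_size: int, buf_size: int = 1 << 20) -> list[Path]:
    """Split src into numbered parts of at most part_size bytes; return the part paths."""
    src = Path(src)
    parts = []
    index = 1
    with src.open("rb") as fin:
        while True:
            chunk = fin.read(min(buf_size, part_size))
            if not chunk:
                break  # input exhausted, no further part needed
            part = src.with_name(f"{src.name}.{index:03d}")
            written = 0
            with part.open("wb") as fout:
                while chunk:
                    fout.write(chunk)
                    written += len(chunk)
                    remaining = part_size - written
                    if remaining == 0:
                        break  # this part is full
                    chunk = fin.read(min(buf_size, remaining))
            parts.append(part)
            index += 1
    return parts


# Example (not executed here): 700 MB parts, each well below the FAT32 limit.
# split_file("vm", 700 * 1024 * 1024)
```

    Reading through a small buffer keeps memory use constant regardless of the part size.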
    http://schnaader.info
    Damn kids. They're all alike.

  26. #26
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    Quote Originally Posted by schnaader View Post
    Interesting, didn't know 7-Zip had some detection based on the file extension. Doesn't feel right though, it would be better to depend on the actual content than on filenames or extensions... How big is the difference?
    Code:
    7-zip 9.07 [64]
     -mx                          987261165 1329.529 73.671 149.803
     -mx -md128m                  978210220 1446.626 72.579 151.746
     -mx -md256m                  971648644 1573.823 72.525 156.832
    7-zip 9.07 [64] (with BCJ)
     -mx                          960869320 1210.284 87.194 107.865
     -mx -md128m                  951249799 1313.346 86.421 105.929
     -mx -md256m                  945521722 1417.122 86.278 106.142
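    To answer "how big is the difference" in one number per mode, the first column of the two tables can be compared directly. A minimal sketch, assuming the first column is the compressed size in bytes and that the "(with BCJ)" table is the run after renaming vm to vm.dll; the helper name `saving` is illustrative.

```python
# Compressed sizes in bytes: (without BCJ, with BCJ), copied from the tables above.
results = {
    "-mx": (987_261_165, 960_869_320),
    "-mx -md128m": (978_210_220, 951_249_799),
    "-mx -md256m": (971_648_644, 945_521_722),
}


def saving(plain: int, bcj: int) -> tuple[int, float]:
    """Bytes saved by the BCJ run and the saving as a percentage of the plain run."""
    saved = plain - bcj
    return saved, saved / plain * 100.0


for mode, (plain, bcj) in results.items():
    saved, pct = saving(plain, bcj)
    print(f"{mode:<12} saves {saved:,d} bytes ({pct:.2f}%)")
```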

  27. #27
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,732
    HFCB: updated results of rar, csc, nz

  28. #28
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Kce, PL
    Posts
    1,037
    Bulat...I see the benchmark. But where are the huge files?
    Seriously, nowadays I wouldn't call a file <1 TB "huge".

  29. #29
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    382
    Since "large" files are 100 MB - 1 GB and the most-used benchmark corpora weigh in at the order of megabytes, then yes, this is "huge".
    I am... Black_Fox... my discontinued benchmark

  30. #30
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Kce, PL
    Posts
    1,037
    Quote Originally Posted by Black_Fox View Post
    Since "large" files are 100 MB - 1 GB and the most-used benchmark corpora weigh in at the order of megabytes, then yes, this is "huge".
    "Huge" suggests something unusual.
    There's nothing unusual about 4 GB files. Everybody has dealt with hundreds of them.
    And actually I'm surprised that you call a 100 MB file large. For me it's well within average. It seems our scales differ a lot.
