
Thread: An interesting test set

  1. #1
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts

    An interesting test set

I've been making a smaller distribution of Acronis Disk Director, and after I precomped all the files I found some strange compression statistics: NanoZip -cO performed better than -cc and, more amazingly, better than paq8p1 -8! I'd like you to test it as well. Here it is: http://narod.ru/disk/7607841000/mx9.7z.html

    Here are some of my results:
    Code:
    30 375 586	NanoZip 0.06a	BWT	-cO -m2g -forcemem -se -nm
    30 531 433	PAQ8p1-exp04	CM	-8
    30 849 931	NanoZip 0.06a	CM	-cc -m2g -forcemem -se -nm
    31 478 034	FreeArc 0.50	LZMA	-m9 -s -dsgerpn
    31 849 842	FreeArc 0.50	LZMA	-m9x -s -dsgerpn
    31 860 200	WinRK 3.1.2	ROLZ	ROLZ3
    32 611 335	7-Zip 4.66a	LZMA	-t7z -mx9 -mmt
    32 617 090	NanoZip 0.06a	BWT	-co -m2g -forcemem -se -nm
    33 184 531	WinRK 3.1.2	LZP+CM	FPW
    34 941 950	NanoZip 0.06a	LZ77	-cD -m2g -forcemem -se -nm
    36 967 497	NanoZip 0.06a	LZ77	-cd -m2g -forcemem -se -nm
40 764 052	UHARC 0.6b	ALZ	-m3 -md32768 -b32768
40 899 603	NanoZip 0.06a	LZP	-cF -m2g -forcemem -se -nm
40 915 624	WinRK 3.1.2	CM	PWCM
    44 903 076	BCM 0.07a	BWT	-b400000
    47 639 438	UHARC 0.6b	PPM	-mx -md32768 -b32768
    54 434 937	WinRAR 3.80	LZH	-m5 -s
    55 484 223	WinRAR 3.80	PPM	-mc35:128t+ -s
    63 869 098	BZip2		BWT	-9
    63 952 105	7-Zip 4.66a	Deflate	-tzip -mx9 -mfb=258 -mpass=15
    64 454 272	GZip		Deflate	-9
    127 746 048	TAR
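
If anyone wants to script the comparison, here is roughly the loop I'd use (a Python sketch; the executable names and the exact NanoZip argument order are my assumptions, so check them against your installs):
Code:
import os
import subprocess

SRC = "mx9.tar"  # placeholder: the precomped test set, tarred

# (name, command) pairs; the archive path is always second-to-last
runs = [
    ("NanoZip -cO", ["nz", "a", "-cO", "-m2g", "out_cO.nz", SRC]),
    ("NanoZip -cc", ["nz", "a", "-cc", "-m2g", "out_cc.nz", SRC]),
    ("7-Zip -mx9",  ["7z", "a", "-t7z", "-mx9", "-mmt", "out.7z", SRC]),
]

for name, cmd in runs:
    subprocess.run(cmd, check=True)
    print(f"{os.path.getsize(cmd[-2]):>12,}  {name}")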
    Last edited by nanoflooder; 12th April 2009 at 10:55.

  2. #2
Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
At first sight, there is a high probability that the data is highly fragmented (mixed data), so PAQ and the other codecs fail to adapt to it. Anyway, I should look at the real data before saying more about it.

BTW, there is a small typo again: FPW is exactly LZP+CM. But if you want to consider LZP = "ROLZ with only one offset", then it's OK

Edit: There is another typo: WinRAR uses RarVM+LZH+PPM in its best mode (-m5). Surely not LZW. Are you sure you didn't drink too much!?
    Last edited by osmanturan; 12th April 2009 at 01:42.
    BIT Archiver homepage: www.osmanturan.com

  3. #3
Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    What about splitting the file into fragments with something like
    http://shelwien.googlepages.com/seg_file.rar
    and repeating the nanozip/paq8 comparison for them?
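
Something like this, say (a Python sketch; the "mx9.tar.000"-style segment naming and the nz command line are my assumptions, and paq8p1 can be swapped in the same way):
Code:
import glob
import os
import subprocess

# split the tarball into (hopefully homogeneous) segments
subprocess.run(["seg_file", "mx9.tar"], check=True)

total = 0
# run in a clean directory so the glob only sees fresh segments
for seg in sorted(glob.glob("mx9.tar.*")):
    out = seg + ".nz"
    subprocess.run(["nz", "a", "-cO", out, seg], check=True)
    total += os.path.getsize(out)
print(f"sum of compressed segments: {total:,}")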

  4. #4
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by osmanturan View Post
    Are you sure you didn't drink too much!?
    Weeeeeeellllll...

Check out WinRK's PWCM failure! I added it today. I'm sure WinRK has a lot of codecs, but still...

  5. #5
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    What about splitting the file into fragments with something like
    http://shelwien.googlepages.com/seg_file.rar
    and repeating the nanozip/paq8 comparison for them?
Hmm... How about multiple tarballs?

  6. #6
Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
I meant: tar it, then process it with seg_file and compress the segments separately.
Though in fact it's better to use almost anything other than tar...
like rar -m0 maybe... long headers are long.

  7. #7
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
Could you please tell me the usage of seg_file?
And yes, anyone can download the package from the first post and try it.

  8. #8
    Member
    Join Date
    May 2008
    Location
Antwerp, Belgium, W. Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Shelwien View Post
    What about splitting the file into fragments with something like
    http://shelwien.googlepages.com/seg_file.rar
    and repeating the nanozip/paq8 comparison for them?
I tried your seg_file a while ago, but IIRC, it can't handle 120+ MB files like this one...
(error: "file too big" or similar)

Edit: back then I successfully used an older Durilca with -l to segment the file.
    Last edited by pat357; 12th April 2009 at 19:29.

  9. #9
    Member
    Join Date
    May 2008
    Location
Antwerp, Belgium, W. Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by nanoflooder View Post
    Could you please tell me the usage of seg_file?..
    What about
    Code:
     "seg_file <input>"
It only works for files up to 15-20 MB for me...
You have to chop the 120+ MB file into smaller chunks first and then run seg_file on each chunk (see the sketch below).
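
A rough sketch of the chopping step (Python; the 16 MB chunk size is just a value inside the range that worked for me):
Code:
CHUNK = 16 * 1024 * 1024  # 16 MB, inside the 15-20 MB window that worked

def split_file(path, chunk=CHUNK):
    """Chop `path` into chunk-sized pieces that seg_file can digest."""
    n = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            with open(f"{path}.part{n:03d}", "wb") as out:
                out.write(data)
            n += 1
    return n

print(split_file("mx9.tar"), "chunks written")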

  10. #10
Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
> I tried your seg_file a while ago, but IIRC,

Well, it's not mine... I just patched Shkarin's source a little,
because it can't be compiled as-is, probably intentionally.
    (http://compression.ru/ds/seg_file.rar)

> it can't handle 120+ MB files like this one...
> (error: "file too big" or similar)

OK, there was a somewhat arbitrary restriction, so I
patched it some more:
http://shelwien.googlepages.com/seg_file.rar
But it also does all the processing in memory, and requires 7N
of memory at that, so I guess it still won't work for files >~270M
(or ~437M with /3GB).
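
For reference, that ceiling is just the user address space divided by 7; a toy calculation (Python, with 32-bit Windows address-space figures assumed):
Code:
# user address space on 32-bit Windows, divided by seg_file's 7N requirement
for label, space in [("default 2 GB", 2 * 1024**3),
                     ("/3GB switch ", 3 * 1024**3)]:
    print(f"{label}: ~{space // 7 // 2**20} MB max input")
That prints ~292 MB and ~438 MB; code, stacks and other allocations pull the first figure down toward the ~270M above.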

  11. #11
Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
I packed all the files with 7z in store mode and took a quick look at the result. There are some parts which are highly redundant under short contexts (i.e. order 0 to 2); dozens of repeated patterns exist in these regions. We all know that PAQ8 fails on very redundant data (for example, even GZip becomes a competitor on a huge file consisting of a single repeated character). It seems PAQ8 fails in these regions. I also tried several combinations to figure out what else causes the redundancy. The E8 filter does not help (at least for BIT).

    Code:
    CCMX (5)              32,710,257
    BIT 0.7 (-p=4)        34,586,410
    BIT 0.7+E8 (-p=4)     34,638,948
    BBB+E8                41,980,834
    DURILCA (-t1 -m256)   44,249,620
    DURILCA (-t3 -m256)   50,639,088
    7z-Store             127,705,277
BBB was started without E8 four hours ago (still running!), though BBB+E8 finished within only ~1 hour. Anyway, I also started the craziest thing of all: M1 optimization! The score is 65,713,045 bytes so far on a Q6600 (4 threads). Judging by the current context mask, it seems I'm right. The next step is segmentation. Both DURILCA -l and seg_file failed. Now I'll use the new seg_file build provided by Shelwien.
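
To illustrate the degenerate-redundancy point above: even plain gzip squeezes a single-character file down to a tiny fraction, so PAQ8's heavy modelling has almost nothing left to win there (a quick Python sketch using the standard library, not an actual PAQ8 run):
Code:
import gzip

data = b"a" * (10 * 1024 * 1024)  # 10 MB of one repeated byte
packed = gzip.compress(data, compresslevel=9)
print(f"raw: {len(data):,}  gzip -9: {len(packed):,}"
      f" ({100 * len(packed) / len(data):.3f}%)")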
    BIT Archiver homepage: www.osmanturan.com

  12. #12
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thank you For doing all this

    I'm good only at making self-extracting packages

  13. #13
Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
After 6 hours and 10 minutes (exactly 22,169.74 seconds), BBB finally finished its job! The score is 42,826,997 bytes. It seems E8 really helps BBB.

Anyway, I have successfully fragmented the file with seg_file, and I've run NZ (-cO), BCM, BIT and 7z separately on these fragments. I made a report about the test (see attachment). There are considerable parts which are almost incompressible (e.g. the 168th part, 11,966,952 bytes). Also, the 1st part really makes a difference for each compressor:
    Code:
NZ       11,635,875
7z       11,807,235
BIT      12,661,375
BCM      16,969,926
Unpacked 26,900,549
According to my personal tests, 7z only wins over BIT when the data is highly redundant or when optimal parsing really makes a difference. Any comments?
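
If you want to redo the per-fragment table yourself, here is a minimal sketch (Python; the fragment naming and the single 7z column are simplified assumptions, and the other tools get columns the same way):
Code:
import csv
import glob
import os
import subprocess

# assumption: seg_file fragments are named mx9.tar.000, mx9.tar.001, ...
frags = [f for f in sorted(glob.glob("mx9.tar.*"))
         if not f.endswith(".7z")]

def compress_7z(frag):
    out = frag + ".7z"
    subprocess.run(["7z", "a", "-t7z", "-mx9", out, frag], check=True)
    return os.path.getsize(out)

with open("report.csv", "w", newline="") as rep:
    w = csv.writer(rep)
    w.writerow(["fragment", "raw", "7z"])
    for frag in frags:
        w.writerow([frag, os.path.getsize(frag), compress_7z(frag)])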

P.S.: The attachment is in XLS format, but I used Office 2007 to export it. If it doesn't display as it should, please let me know.
Attached Files
    BIT Archiver homepage: www.osmanturan.com

