
Thread: The 7-zip compression selection optimization thread

  1. #1
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts

    Since it seems a couple of us forum users brute-force 7-zip compression with m7zRepacker, I am starting this thread where we can post results for the best compression method found.
    Hopefully, once enough data is collected, we will see some patterns emerge for which compression method to use for a given file extension or type of data.

    https://encode.ru/threads/1201-m7zRepacker
    - Use only the -m1 or -m3 method for this
    - Only report a double extension if you have used a third-party preprocessor/filter
    - Report sizes approximately, in steps of about 100 MB. Report small files below 1 MB as "<1mb"
    - Report the total size of all files for an extension if they are in the same block (-m1 method)
    - Make one post per archive

    Optional extended info would be the compressed size compared to the original size (the ratio).
    This can help identify files that should not be compressed.


    Example post:
    Code:
    Version 7-zip: 16.04
    Settings: -m1 -d1536 -mem1024
    
    .iso (1.2GB)     : BCJ2 LZMA2:30 LZMA:19:lc0:lp2 LZMA:19:lc0:lp2
    .mds (<1mb)      : Delta4 LZMA:16:lc0:lp1
    .mdf.ecm (700MB) : LZMA:768m:lc9:pb0
    .exe (<1mb)      : BCJ2 LZMA:24m:lc7 LZMA:19:lc0:lp2 LZMA:19:lc0:lp2
    .ogg (<100mb)    : LZMA:24m:lc8:pb0 (Ratio: 98.5%)
    This is a somewhat poor example, since it contains .iso files, which are just a container format and so can hold a lot of different data.
    I hope people will join in with some info, and that it can help speed up m7zRepacker and/or help plain 7-zip improve its compression ratio.
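
Reports in the format above are regular enough to be collected by a script once many posts accumulate. The following is a minimal sketch, assuming one `.ext (size) : methods` entry per line with an optional trailing `(Ratio: N%)`. The function name `parse_report_line` and the dictionary keys are hypothetical and are not part of m7zRepacker or 7-zip.

```python
import re

# One report entry: ".ext (size) : METHOD METHOD ... (Ratio: 98.5%)"
# The layout is inferred from the example post; the ratio part is optional.
LINE_RE = re.compile(
    r"^\s*(?P<ext>\.\S+)\s*"          # extension, possibly double (.mdf.ecm)
    r"\(\s*(?P<size>[^)]*?)\s*\)\s*"  # approximate size, e.g. "<1mb" or "1.2GB"
    r":\s*(?P<methods>.*?)\s*"        # whitespace-separated coder chain
    r"(?:\(Ratio:\s*(?P<ratio>[\d.]+)%\))?\s*$"
)


def parse_report_line(line):
    """Split one report line into extension, size, coder list and ratio."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # not a report entry (e.g. the "Settings:" line)
    ratio = match.group("ratio")
    return {
        "ext": match.group("ext").lower(),
        "size": match.group("size"),
        "methods": match.group("methods").split(),
        "ratio": float(ratio) / 100 if ratio is not None else None,
    }
```

Lines such as "Version 7-zip: 16.04" or "Settings: ..." do not start with an extension, so the function returns `None` for them and they can be handled separately.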
    Last edited by SvenBent; 26th February 2017 at 23:40.

  2. #2
    Member
    Join Date
    Mar 2016
    Location
    Croatia
    Posts
    181
    Thanks
    74
    Thanked 10 Times in 10 Posts
    Sven, how do I test this? I mean, where do I put these settings... in the Parameters box?
    [Attached image: LTxsx3W.png]

  3. #3
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by dado023
    Sven, how do I test this? I mean, where do I put these settings... in the Parameters box?
    [Attached image: LTxsx3W.png]
    You don't. You run "m7repacker -m1 -d1024 -mem1024 Filename7z",
    and m7zRepacker will try out a bunch of different methods.

    https://encode.ru/threads/1201-m7zRepacker

    Once the .7z has been optimized, you just open the new .7z file and can see the method used in the GUI.

  4. #4
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Code:
    Version 7-zip: 16.04
    Settings: -m1 -d1536 -mem1024
    
    .cue (<1mb)      : PPMD:O14:mem16
    .bin.ecm (1.2GB) : Delta:4 LZMA:1536m:lc0:lp1

  5. #5
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Code:
    Version 7-zip: 16.04
    Settings: -m1 -d768 -mem1024
    
    .exe (<100MB) : BCJ2 LZMA:1536k:lc8:pb0 LZMA:20:lc0:lp2 LZMA:20:lc0:lp2
    .ISO ( 600MB) : LZMA2:768m:lc8:pb0

  6. #6
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    that's not the best possible result for bcj2:
    Code:
    26,604,497 10.7z
    26,482,726 11.7z
    26,367,414 12.7z
    
    7z.exe a -bb3 -mx=9 -myx=9 ^
    10.7z D:\tmp7\D\*
    
    7z.exe a -bb3 -mx=9 -myx=9 ^
    -mf=bcj2 ^
    -m0=lzma:mt2:d26:lc8:pb2:lp0:fb273:mc999 ^
    11.7z D:\tmp7\D\*
    
    7z.exe a -bb3 -mx=9 -myx=9 ^
    -m0=bcj2 ^
    -m1=lzma:mt2:d26:lc8:pb2:lp0:fb273:mc999 ^
    -m2=lzma:mt2:d22:lc0:pb2:lp2:fb273 ^
    -m3=delta:4 ^
    -m4=lzma:mt2:d22:lc0:pb2:lp2:fb273 ^
    -m5=lzma2:mt2:lc0:pb0:lp0:fb273 ^
    -mb00s0:1 -mb00s1:2 -mb00s2:3 -mb00s3:5 -mb03s0:4 ^
    12.7z D:\tmp7\D\*
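
For scale, the three archive sizes listed above differ by less than one percent. A small sketch of the arithmetic, using only the byte counts from the listing (the `saving` helper is just for illustration):

```python
# Archive sizes in bytes, copied from the listing above.
sizes = {
    "10.7z": 26_604_497,  # -mx=9 with default methods
    "11.7z": 26_482_726,  # -mf=bcj2 plus a tuned -m0 lzma
    "12.7z": 26_367_414,  # explicit bcj2 graph with delta:4
}

baseline = sizes["10.7z"]


def saving(name):
    """Return (bytes saved, percent saved) relative to 10.7z."""
    saved = baseline - sizes[name]
    return saved, 100.0 * saved / baseline


for name in ("11.7z", "12.7z"):
    saved, percent = saving(name)
    print(f"{name}: {saved:,} bytes smaller ({percent:.2f}%)")
```

This works out to 121,771 bytes (about 0.46%) for 11.7z and 237,083 bytes (about 0.89%) for 12.7z.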

  7. #7
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by Shelwien
    that's not the best possible result for bcj2:
    I'm not sure I understand.
    These are the best compression settings found by brute-forcing different methods on different data with m7zRepacker.

  8. #8
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Check the 3rd command line in my example.
    bcj2 is not a simple filter - it's a whole filter graph with multiple streams.
    So unless you (or the optimizer) already do what I suggested, it's not the best possible way.
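
To make the "filter graph" idea concrete, the third command line can be read as a list of coders plus a list of bindings between them. The sketch below renders that description back into the same switches. The coder specs and `-mb` strings are taken from the post; the `build_switches` helper and the tuple layout are hypothetical and only meant as an illustration.

```python
# Coders in the order used by the third command line (index = -mN number).
CODERS = [
    "bcj2",
    "lzma:mt2:d26:lc8:pb2:lp0:fb273:mc999",
    "lzma:mt2:d22:lc0:pb2:lp2:fb273",
    "delta:4",
    "lzma:mt2:d22:lc0:pb2:lp2:fb273",
    "lzma2:mt2:lc0:pb0:lp0:fb273",
]

# (source coder, source output stream, destination coder)
BINDINGS = [
    (0, 0, 1),  # bcj2 stream 0 -> coder 1 (lzma)
    (0, 1, 2),  # bcj2 stream 1 -> coder 2 (lzma)
    (0, 2, 3),  # bcj2 stream 2 -> coder 3 (delta:4)
    (0, 3, 5),  # bcj2 stream 3 -> coder 5 (lzma2)
    (3, 0, 4),  # delta:4 output -> coder 4 (lzma)
]


def build_switches(coders, bindings):
    """Render -mN= and -mb switches in the same spelling as the post."""
    switches = [f"-m{i}={spec}" for i, spec in enumerate(coders)]
    switches += [f"-mb{src:02d}s{stream}:{dst}" for src, stream, dst in bindings]
    return switches


print(" ".join(build_switches(CODERS, BINDINGS)))
```

Writing the graph as data makes it easier to see that every coder after bcj2 receives exactly one stream, and that only the stream routed through delta:4 passes through two coders.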

  9. #9
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by Shelwien
    Check the 3rd command line in my example.
    bcj2 is not a simple filter - it's a whole filter graph with multiple streams.
    So unless you (or the optimizer) already do what I suggested, it's not the best possible way.
    Are you saying that's the optimal way for all .exe files in the world?
    I don't have a list of the different methods m7zRepacker uses, but I believe it has 57 different presets that it runs through.

  10. #10
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    It's better than the default one at least, as you can see in my stats.
    As to presets, you only posted "BCJ2 LZMA:1536k:lc8:pb0 LZMA:20:lc0:lp2 LZMA:20:lc0:lp2", where the secondary lzma instances are more or less the defaults for bcj2,
    so I think that m7zRepacker doesn't have any optimizations for the bcj2 extra streams.

    The idea with bcj2 is that it outputs 3 extra streams, aside from the main stream with filtered data.
    They are address tables for calls and jumps, and a compressed flag stream.
    But experiments show that the jump table compresses better with an added delta:4, and the flag stream can also be compressed a little further.
    Unfortunately, the 7z filter tree system and the switches that control it are not well known...
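
Why a delta filter with distance 4 can help on a table of 32-bit values can be illustrated with Python's standard `lzma` module, which offers a comparable delta filter. This is only a sketch under stated assumptions: the table below is synthetic (slowly growing little-endian 32-bit values), it is not real BCJ2 output, and liblzma is not the 7-zip encoder, so the numbers say nothing about actual 7z archives.

```python
import lzma
import random
import struct

# Synthetic stand-in for an address table: 32-bit little-endian values
# that grow by small steps. This is NOT real BCJ2 output.
rng = random.Random(1)
address = 0x00400000
table = bytearray()
for _ in range(50_000):
    address += rng.choice((4, 8, 12, 16, 20))
    table += struct.pack("<I", address)
data = bytes(table)

lzma_only = [{"id": lzma.FILTER_LZMA2, "preset": 9}]
delta_then_lzma = [
    {"id": lzma.FILTER_DELTA, "dist": 4},  # subtract the byte 4 positions back
    {"id": lzma.FILTER_LZMA2, "preset": 9},
]

plain_size = len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=lzma_only))
delta_size = len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=delta_then_lzma))

print(f"raw table      : {len(data):,} bytes")
print(f"lzma2          : {plain_size:,} bytes")
print(f"delta:4 + lzma2: {delta_size:,} bytes")
```

With distance 4, each byte is replaced by its difference from the same byte position in the previous 32-bit value, so slowly changing values turn into small, repetitive numbers that the LZMA stage models more easily.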

  11. #11
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Ah, I see where you are going, and yes: I have only seen 3 streams from BCJ2 being handled with m7zRepacker. Or maybe the 7-zip GUI does not show more than that; I'm not sure.
    Can you confirm whether the GUI shows compression settings for more than 3 streams?

  12. #12
    Member
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    116
    Thanks
    18
    Thanked 32 Times in 11 Posts
    BCJ2 can do other magic things; for example, BCJ2+LZMA can compress some 8-bit PCM/WAV files much better than Delta+LZMA.
    Indeed, m7zRepacker has only a few BCJ2 presets, so Shelwien's example is definitely the better way.

    m7zRepacker was written in 2009 and released in 2011, so the presets and the strategy for compressing data
    are based on 7-8 year old knowledge. m7zRepacker also brute-forces only some presets, not all possible parameters.
    So brute-forcing one type of data with m7zRepacker does not give the best possible compression.
    It only gives the best-matching preset. But real brute-forcing is IMHO a waste of time for a few bytes more compression.

    I am currently rewriting m7zRepacker completely, with an analysis stage for the most common file types, so no more time is wasted on algorithms
    that don't fit, e.g. ppmd for mp3. That time can instead be spent on more presets that should suit this kind of data.
    On the other hand, this approach kills the detection of lucky combinations, e.g. BCJ2+LZMA on some WAVs, which is really uncommon.
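
The difference between "best-matching preset" and "best possible parameters" can be sketched as a loop over a fixed preset table. The example below uses Python's standard `lzma` module as a stand-in for 7-zip. The preset names and filter chains are hypothetical and are not the presets m7zRepacker ships; note also that liblzma requires lc + lp to be at most 4, so settings such as lc8 from the reports above cannot be reproduced here.

```python
import lzma

# Hypothetical preset table: name -> raw filter chain for Python's lzma module.
# These are illustrative and are not the presets m7zRepacker actually ships.
PRESETS = {
    "lzma2-default": [{"id": lzma.FILTER_LZMA2, "preset": 6}],
    "lzma2-lc0-lp2": [{"id": lzma.FILTER_LZMA2, "preset": 9, "lc": 0, "lp": 2}],
    "lzma2-lc4-pb0": [{"id": lzma.FILTER_LZMA2, "preset": 9, "lc": 4, "pb": 0}],
    "delta4-lzma2": [
        {"id": lzma.FILTER_DELTA, "dist": 4},
        {"id": lzma.FILTER_LZMA2, "preset": 9},
    ],
}


def best_preset(data, presets=PRESETS):
    """Compress data with every preset and return (name, size) of the smallest."""
    results = {
        name: len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=chain))
        for name, chain in presets.items()
    }
    winner = min(results, key=results.get)
    return winner, results[winner]


sample = bytes(range(256)) * 64
print(best_preset(sample))
```

The search only ever returns one of the table entries, so a parameter combination that is missing from the table can never be found, which is the limitation described above.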

  13. #13
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Here's the -m page from 7-zip.chm. Note also the :d param - http://nishi.dreamhosters.com/u/method.htm#BCJ2

  14. #14
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by Biozynotiker
    BCJ2 can do other magic things; for example, BCJ2+LZMA can compress some 8-bit PCM/WAV files much better than Delta+LZMA.
    Indeed, m7zRepacker has only a few BCJ2 presets, so Shelwien's example is definitely the better way.

    m7zRepacker was written in 2009 and released in 2011, so the presets and the strategy for compressing data
    are based on 7-8 year old knowledge. m7zRepacker also brute-forces only some presets, not all possible parameters.
    So brute-forcing one type of data with m7zRepacker does not give the best possible compression.
    It only gives the best-matching preset. But real brute-forcing is IMHO a waste of time for a few bytes more compression.

    I am currently rewriting m7zRepacker completely, with an analysis stage for the most common file types, so no more time is wasted on algorithms
    that don't fit, e.g. ppmd for mp3. That time can instead be spent on more presets that should suit this kind of data.
    On the other hand, this approach kills the detection of lucky combinations, e.g. BCJ2+LZMA on some WAVs, which is really uncommon.
    Would you implement "store" as one of the methods it goes through?

  15. #15
    Member
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    116
    Thanks
    18
    Thanked 32 Times in 11 Posts
    Quote Originally Posted by SvenBent
    Would you implement "store" as one of the methods it goes through?
    Yep, "store" will be added as a method. I thought that lzma2 should handle this, but it seems that the store method can help sometimes.
    I also plan a switch to skip methods, at least for unknown file types.
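
A "store" fallback amounts to keeping the original bytes whenever compression does not make them smaller. The following is a minimal sketch using Python's standard `lzma` module in place of 7-zip; the `pack_or_store` helper is hypothetical, and pseudo-random bytes are used as a stand-in for already-compressed data.

```python
import lzma
import random


def pack_or_store(data):
    """Return ("store", data) when compression does not shrink the input."""
    packed = lzma.compress(data, preset=9)
    if len(packed) >= len(data):
        return "store", data
    return "lzma", packed


# Pseudo-random bytes stand in for already-compressed data such as .ogg or .mp3.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(100_000))
text = b"7-zip compression selection " * 4_000

print(pack_or_store(noise)[0])
print(pack_or_store(text)[0])
```

Even when the compressor falls back to storing chunks uncompressed internally, its container and chunk headers still add some bytes, which is one reason an explicit store method can come out slightly ahead.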
