Thread: Practical compression benchmarks

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

    Practical compression benchmarks

    I wrote a tool to analyze the files that have accumulated on my computer over the last 4 years. It seems to me that most compression benchmarks don't reflect the kinds of files that we might actually compress for backups or distributions. For example, here is an analysis of the Calgary corpus, Maximum Compression corpus, Silesia, and LTCB.

    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
        14       3.141622  315  766    0 256  1=174  4=59   3=42  calgary/
         1       0.768771  148 1000    0  82  4=66   5=59   3=57  book1
         1       0.610856  165 1000    0  96  4=60   3=60   5=50  book2
         1       0.513216  852   18    0 159  1=852  2=18   3=11  pic
         1       0.377109  185 1000    0  98  1=62   4=47   3=45  news
         1       0.246814  414  439    2 256  4=68   8=65   1=54  obj2
         1       0.111261  194 1000    0  81  8=53   6=43   5=36  bib
         1       0.102400  366  338    4 256  4=402  1=41   3=15  geo
         1       0.093695  306  910    0  99  1=89   3=47   7=40  trans
         1       0.082199  152 1000    0  91  3=60   4=58   5=56  paper2
         1       0.071646  296 1000    0  87  1=156  3=44   5=37  progl
         1       0.053161  176 1000    0  95  3=55   4=54   5=49  paper1
         1       0.049379  370 1000    0  89  1=155  3=37   6=34  progp
         1       0.039611  261 1000    0  92  1=76   3=44   4=36  progc
         1       0.021504  341  309    2 256  1=207  2=37   3=30  obj1
    
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
        10      53.134726  279  670    5 256  3=73   1=63   2=46  maxcomp/
         1      20.617071  199 1000    0 106  1=54   2=48   4=34  fp.log
         1       4.526946   75  511    7 256  7=23   8=17   2=17  FlashMX.pdf
         1       4.168192  375  293    2 256  3=98  91=91   1=56  ohs.doc
         1       4.149414  509  208    8  73  3=493  1=59   6=58  rafale.bmp
         1       4.121418  361  153    0 256  4=234  1=104  2=82  vcfiu.hlp
         1       4.067439  551 1000    0  41 10=94  11=93   9=82  english.dic
         1       3.870784  350  364   22 256  1=172  2=59   4=45  acrord32.exe
         1       3.782416  258  310   24 256  1=101  2=43   4=34  mso97.dll
         1       2.988578  218 1000    0  97  6=43   5=41   2=40  world95.txt
         1       0.842468   11  377    9 256  1=6    2=5    3=5   a10.jpg
    
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
        12     211.938580  308  715    2 256  1=136  2=86   3=54  silesia/
         1      51.220480  390  357    4 256  1=183  8=55   4=51  mozilla
         1      41.458703  227 1000    0  98  4=57   6=51   5=46  webster
         1      33.553445  315 1000    0  62  1=333  2=192  3=188 nci
         1      21.606400  317  907    1 256  1=133 38=38   5=37  samba
         1      10.192446  174 1000    0 100  4=66   5=61   3=57  dickens
         1      10.085684  141  801    1 256  2=82   1=59   6=22  osdb
         1       9.970564  612   90    2 256  2=372  1=283  4=14  mr
         1       8.474240  207  274    4 256  2=455  4=21   6=13  x-ray
         1       7.251944  173  414    6 256 28=172 56=34  84=23  sao
         1       6.627202  329  996    0 256  7=87   5=77   8=52  reymont
         1       6.152192  302  355   31 256  1=105  4=43   2=27  ooffice
         1       5.345280  379  996    0 104  7=77   6=64   8=46  xml
    
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
         1    1000.000000  192  996    0 206  1=63   4=48   3=46  enwik9
    The first line of each table is the total for the corpus. Each row gives the number of files, the total size in MB, the predictability in an order-1 context (out of 1000), the fraction of ASCII text, the fraction of E8 and E9 bytes, the alphabet size, and the 3 highest points in an auto-correlation. Predictability ("Pred") is measured by counting the bytes that an order-1 context predicts correctly, where the prediction is the byte that last occurred in that context. It is about 4 (1000/256) for random data. The lowest value is 11 for a10.jpg, indicating that it is only slightly compressible, followed by 75 for FlashMX.pdf, which also contains mostly compressed images. The highest value is 852 for pic.

    "Text" gives the fraction of printable characters out of 1000 from the set {9, 10, 13, 32..126}. It would be 383 for random data and 1000 for pure ASCII text.

    "e8e9" gives the fraction of 0xE8 and 0xE9 bytes out of 1000. These are the opcodes of the CALL and JMP instructions in x86 code. Compression can be improved by a transform that converts the 4 byte address that follows from relative to absolute form. Typical values are 20..30 for x86 code (acrord32.exe, mso97.dll), 8 for random data, and 0 for text.

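    The relative-to-absolute transform mentioned above can be sketched as follows. This is a simplified illustration, not the filter used by any particular compressor: it rewrites the operand after every 0xE8 or 0xE9 byte unconditionally, whereas real filters usually add checks to avoid touching non-code data. The function name e8e9_filter and its interface are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified E8/E9 filter (illustrative only). After each 0xE8 (CALL) or
// 0xE9 (JMP) opcode, the next 4 bytes are treated as a little-endian 32-bit
// operand. Encoding adds the offset of the following instruction
// (relative -> absolute); decoding subtracts it again. The arithmetic wraps
// modulo 2^32 and both directions visit the same positions, so the filter is
// lossless even when the "opcode" byte is really data.
void e8e9_filter(std::vector<uint8_t>& buf, bool encode) {
  std::size_t i = 0;
  while (i + 5 <= buf.size()) {
    if (buf[i] == 0xE8 || buf[i] == 0xE9) {
      uint32_t v = uint32_t(buf[i + 1]) | (uint32_t(buf[i + 2]) << 8) |
                   (uint32_t(buf[i + 3]) << 16) | (uint32_t(buf[i + 4]) << 24);
      const uint32_t next = uint32_t(i + 5);  // offset of the next instruction
      v = encode ? v + next : v - next;
      buf[i + 1] = uint8_t(v);
      buf[i + 2] = uint8_t(v >> 8);
      buf[i + 3] = uint8_t(v >> 16);
      buf[i + 4] = uint8_t(v >> 24);
      i += 5;  // skip the operand so it is never scanned as an opcode
    } else {
      ++i;
    }
  }
}
```

    The point of the transform is that two calls to the same target from different places have different relative operands but the same absolute one, so after encoding they become identical byte strings that a compressor can match. Calling e8e9_filter(buf, true) and then e8e9_filter(buf, false) restores the original buffer.
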
    "Alp" is the alphabet size, counting any byte value that occurs at least once. It is normally 256 for large random files.

    The top 3 periods are determined by measuring the gap between successive occurrences of the same byte value and counting gaps in the range 1..99. Initially I counted gaps up to 2048, but even this failed to find large record sizes, such as scan lines in images like pic or rafale.bmp. PAQ uses more sophisticated heuristics, such as finding 3 successive identical gaps for a byte value. However, we do find record sizes such as 4 in geo and 3 in rafale.bmp as we might expect. In text, the most common gap size is about the length of a word. In random data, we would expect 1=4, 2=4, 3=4, following a geometric distribution that starts at 1/256 and shrinks by a factor of 255/256 with each step.
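
    The random-data baselines quoted in the column descriptions above can be checked with a few lines of arithmetic. The sketch below assumes uniformly random, independent bytes; the function names are made up for this example and are not part of the analyze program.

```cpp
#include <cmath>

// Expected per-1000 score when `matching` of the 256 byte values count as a
// hit, assuming uniformly random, independent bytes.
double baseline_per_mille(int matching) {
  return matching * 1000.0 / 256.0;
}

// Number of byte values in the "Text" set {9, 10, 13, 32..126}.
int text_set_size() {
  int n = 0;
  for (int c = 0; c < 256; ++c)
    if (c == 9 || c == 10 || c == 13 || (c >= 32 && c <= 126)) ++n;
  return n;
}

// Expected per-1000 count of gap k (k >= 1): the k-1 bytes in between differ
// from the current byte, and the byte k positions back matches it.
double gap_per_mille(int k) {
  return 1000.0 / 256.0 * std::pow(255.0 / 256.0, k - 1);
}
```

    Rounded to integers, baseline_per_mille(1) gives 4 for Pred, baseline_per_mille(text_set_size()) gives 383 for Text (98 of 256 values), baseline_per_mille(2) gives 8 for e8e9, and gap_per_mille(k) gives 4 for k = 1, 2, 3, in agreement with the figures above.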

    My point is that such benchmarks are not representative of the files we typically have on our computers. You might recall Microsoft's deduplication research. http://static.usenix.org/event/fast1...pers/Meyer.pdf

    According to this study, my computer is pretty typical. Earlier I found I could compress by 25% by deduplication alone. Also the file distribution is typical. The most common file type is .dll. For example, in the directory c:\Program Files I have 68 applications. Here are the top 100 files, directories, and file types, ordered by total size. For file types I consider both the extension and the extension combined with the first 4 bytes of the file. The 4 bytes are coded either as a printable character and a space or as 2 hex digits, for example "M Z 9000.DLL" or "89P N G .png".
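
    As a standalone illustration of the type-key encoding just described, here is a minimal sketch. The helper name header_key is made up for this example; the full program further down does the same thing inline while reading the file.

```cpp
#include <string>

// Encode up to 4 leading bytes of a file as an 8-character key:
// a printable byte (32..126) becomes the character followed by a space,
// any other byte becomes 2 lowercase hex digits.
std::string header_key(const unsigned char* p, int n) {
  static const char hex[] = "0123456789abcdef";
  std::string key;
  for (int i = 0; i < n && i < 4; ++i) {
    const unsigned c = p[i];
    if (c >= 32 && c <= 126) {
      key += char(c);
      key += ' ';
    } else {
      key += hex[c >> 4];
      key += hex[c & 15];
    }
  }
  return key;
}
```

    For the bytes 'M', 'Z', 0x90, 0x00 this returns "M Z 9000", and for 0x89, 'P', 'N', 'G' it returns "89P N G ", which with the extension appended gives the keys shown in the example.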

    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
     44097    5317.978880  281  394   10 256  1=130  2=59   4=41  
     44097    5317.978880  281  394   10 256  1=130  2=59   4=41  C:\
     44097    5317.978880  281  394   10 256  1=130  2=59   4=41  C:\Program Files\
      3856    1981.234791  329  344   15 256  1=153  2=65   4=48  .dll
      3839    1961.440359  329  344   15 256  1=153  2=65   4=48  M Z 9000.dll
      1345     769.367243  306  395   16 256  1=106  2=56   4=50  C:\Program Files\Google\
     20079     751.138800  141  356    8 256  1=87   2=42   4=17  C:\Program Files\Gateway Games\
       725     650.670335  307  395   16 256  1=106  4=52   2=52  C:\Program Files\Google\Chrome\
       725     650.670335  307  395   16 256  1=106  4=52   2=52  C:\Program Files\Google\Chrome\Application\
      1460     636.385851  270  332   10 256  1=118  2=70   4=57  C:\Program Files\Common Files\
       764     435.115228  254  329    8 256  1=105  2=69   4=61  C:\Program Files\Common Files\microsoft shared\
      2227     398.787183  303  480    6 256  1=147  2=71   8=33  C:\Program Files\Microsoft Visual Studio 10.0\
      3133     389.177374  326  500   13 256  1=128  4=68   2=43  C:\Program Files\OpenOffice.org 3\
       503     349.780098  387  288   16 256  1=230  4=59   2=43  .exe
       988     346.931017  271  351   14 256  1=117  2=65   4=56  C:\Program Files\Microsoft Office\
       490     332.829996  391  285   16 256  1=234  4=60   2=43  M Z 9000.exe
       406     322.176099  282  350   14 256  1=119  2=69   4=57  C:\Program Files\Microsoft Office\Office12\
      7122     316.896407  172  326    7 256  1=91   2=83   4=22  C:\Program Files\Gateway Games\FATE\
       259     252.254200  362  315   18 256  1=160  4=86   2=65  .DLL
       259     252.254200  362  315   18 256  1=160  4=86   2=65  M Z 9000.DLL
       144     211.036220  306  398   16 256  1=106  4=52   2=52  C:\Program Files\Google\Chrome\Application\22.0.1229.94\
      1857     206.373998  400  552    3 256  1=225  2=61   8=46  C:\Program Files\Microsoft Visual Studio 10.0\VC\
       359     203.680188  407  380   21 256  1=161  4=109  2=54  C:\Program Files\OpenOffice.org 3\program\
       146     192.908293  178  288    6 256  2=87   1=84   4=52  C:\Program Files\Common Files\microsoft shared\ink\
       201     172.953044  403  515    1 256  1=254  2=63   4=42  ! < a r .lib
       201     172.953044  403  515    1 256  1=254  2=63   4=42  .lib
       206     164.845162  221  369   17 256  1=101  2=38   4=30  C:\Program Files\Adobe\
       204     164.206186  221  369   17 256  1=101  2=38   4=30  C:\Program Files\Adobe\Reader 10.0\
      1051     163.596217  204  343    8 256  1=120  3=33   2=29  C:\Program Files\Microsoft Works\
        60     160.797703  402  502    1 256  1=257  2=63   3=43  C:\Program Files\Microsoft Visual Studio 10.0\VC\lib\
      5459     152.748223   11  376    7 256  1=10   2=6    3=6   .png
      5459     152.748223   11  376    7 256  1=10   2=6    3=6   89P N G .png
       751     141.019274  454  220   14 256  1=322  3=101  2=37  C:\Program Files\CyberLink\
        12     138.507946    7  384    8 256  1=6    2=5    5=5   .cab
      2351     136.058524  370  854    1 256  1=168  2=48   4=36  C:\Program Files\Microsoft SDKs\
      2351     136.058524  370  854    1 256  1=168  2=48   4=36  C:\Program Files\Microsoft SDKs\Windows\
      2351     136.058524  370  854    1 256  1=168  2=48   4=36  C:\Program Files\Microsoft SDKs\Windows\v7.0A\
       640     127.826043  346  496    7 256  1=111  2=71   3=50  C:\Program Files\Java\
       639     126.890193  349  496    7 256  1=111  2=72   3=51  C:\Program Files\Java\jre7\
       952     125.857657   51  372    8 256  1=42   4=6    2=6   C:\Program Files\Gateway Games\The Price is Right\
       323     123.559654  396  465    1 256  2=139  1=114  4=44  C:\Program Files\Reference Assemblies\
       323     123.559654  396  465    1 256  2=139  1=114  4=44  C:\Program Files\Reference Assemblies\Microsoft\
       323     123.559654  396  465    1 256  2=139  1=114  4=44  C:\Program Files\Reference Assemblies\Microsoft\Framework\
        35     118.501159  119  408    8 256  2=93   1=19   8=9   C:\Program Files\Microsoft Visual Studio 10.0\Microsoft Visual C++ 2010 Express - ENU\
         5     115.448311    7  385    8 256  1=6    2=5    5=5   M S C F .cab
       146     114.054215  307  389   16 256  1=104  4=52   2=50  C:\Program Files\Google\Chrome\Application\21.0.1180.89\
       146     113.393827  307  388   16 256  1=104  4=51   2=50  C:\Program Files\Google\Chrome\Application\21.0.1180.83\
       135     110.276172  309  363   22 256  1=139  2=50   4=37  C:\Program Files\Adobe\Reader 10.0\Reader\
         2     107.062718  307  397   16 256  1=107  2=53   4=52  C:\Program Files\Google\Chrome\Application\22.0.1229.94\Installer\
      2622     106.573271  245  562    4 256  1=132  2=32   3=28  C:\Program Files\OpenOffice.org 3\Basis\
         4     105.632493  306  397   16 256  1=107  4=52   2=51  .7z
         4     105.632493  306  397   16 256  1=107  4=52   2=51  7 z bcaf.7z
       143     105.552022  306  399   16 256  1=106  2=52   4=52  C:\Program Files\Google\Chrome\Application\22.0.1229.92\
         1     105.484198  306  397   16 256  1=107  4=52   2=51  C:\Program Files\Google\Chrome\Application\22.0.1229.94\Installer\chrome.7z
       143     105.394227  306  399   16 256  1=106  2=52   4=52  C:\Program Files\Google\Chrome\Application\22.0.1229.79\
       931     105.149118  220  300    6 256  2=119  1=106  3=38  C:\Program Files\Gateway Games\FATE\TOWN\
        78     103.515065  294  580    3 256  2=71   1=60   3=59  .jar
        78     103.515065  294  580    3 256  2=71   1=60   3=59  P K 0304.jar
       285     100.117386  293  577    2 256  3=118  6=81   2=48  .pak
       285     100.117386  293  577    2 256  3=118  6=81   2=48  04000000.pak
       452      99.401994  497  134    1 256  3=405  1=306  2=222 .bmp
       396      99.269230  178  344   11 256  1=74   2=45   3=30  C:\Program Files\Movie Maker\
       460      96.539951  302  398   17 256  1=105  2=74   4=41  C:\Program Files\Google\Google Earth\
       153      93.922319   83  395    7 256  1=37   4=17   2=16  C:\Program Files\Microsoft Office\Office12\1033\
        37      93.391609  124  391    7 256  1=67   2=27   4=18  C:\Program Files\Microsoft Games\
      1917      92.669902  357 1000    0 109  1=158  3=35   4=34  C:\Program Files\Microsoft SDKs\Windows\v7.0A\Include\
       233      90.357227  415   79    2 256  1=375  3=272  2=240 C:\Program Files\Camera Assistant Software for Gateway\
         1      88.543445    8  385    8 256  1=6    2=5    5=5   C:\Program Files\Microsoft Visual Studio 10.0\Microsoft Visual C++ 2010 Express - ENU\vs_setup.cab
       263      87.146469  401  488    1 256  2=158  1=111  4=45  C:\Program Files\Reference Assemblies\Microsoft\Framework\.NETFramework\
      1886      84.903155  355 1000    0 107  1=154  4=36   3=35  .h
       648      84.697601  313  390   13 256  1=130  4=53   2=43  C:\Program Files\VideoLAN\
       648      84.697601  313  390   13 256  1=130  4=53   2=43  C:\Program Files\VideoLAN\VLC\
       537      83.588359  326  577    3 256  2=80   3=67   1=60  C:\Program Files\Java\jre7\lib\
       272      80.758101  554  158   16 256  1=431  3=133  2=32  C:\Program Files\CyberLink\LabelPrint\
        30      76.343821  336  387    3 256  4=96   1=91   3=59  .LEX
       677      75.947372  362  346   10 256  1=210  2=50   4=45  C:\Program Files\QuickTime\
       272      73.228115  325  393   13 256  1=134  2=64   4=38  C:\Program Files\Microsoft Visual Studio 10.0\Common7\
       951      71.226461   72  389    8 256  1=49   2=9    4=9   C:\Program Files\Gateway Games\Build-a-lot 2\
        90      71.101411  209  767    3 256  6=35   9=33  10=27  C:\Program Files\OpenOffice.org 3\share\
        89      69.560038  354   26    1 256  3=327  1=322  2=304 C:\Program Files\Camera Assistant Software for Gateway\Effect\
        74      68.202544  351   26    1 256  3=329  1=318  2=305 B M 8 10.bmp
      1630      67.171770  119  274    8 256  2=165  4=20   1=14  C:\Program Files\Gateway Games\FATE\SOUNDS\
       207      67.112475  398  496    1 256  2=160  1=104  4=45  C:\Program Files\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0\
         1      66.854174    5  379    8 256  1=5    3=5    2=4   .u00
         1      66.854174    5  379    8 256  1=5    3=5    2=4   C:\Program Files\Gateway Games\The Price is Right\Update.u00
         1      66.854174    5  379    8 256  1=5    3=5    2=4   P K 0304.u00
        38      65.349860  367  390    6 256  1=143  4=78   2=52  C:\Program Files\Common Files\microsoft shared\Works Shared\
        96      65.142883  289  340   15 256  1=110  4=70   2=65  C:\Program Files\Common Files\microsoft shared\OFFICE12\
       361      64.572530   38  362    8 256  1=28   2=7    5=7   C:\Program Files\Movie Maker\Shared\
      2439      63.781908  201  362    8 256  1=145  6=19   4=17  C:\Program Files\Gateway Games\Polar Pool\
       706      63.646790  139  237    8 256  2=192  4=29   1=23  .wav
       706      63.646790  139  237    8 256  2=192  4=29   1=23  R I F F .wav
       442      63.206094  371  348   11 256  1=153  2=104  4=69  C:\Program Files\Common Files\Apple\
       442      63.206094  371  348   11 256  1=153  2=104  4=69  C:\Program Files\Common Files\Apple\Apple Application Support\
       350      63.025996   38  361    8 256  1=28   2=7    5=7   C:\Program Files\Movie Maker\Shared\DvdStyles\
       282      60.946008  441  315   14 256  1=221  3=91   2=74  C:\Program Files\Microsoft Money 2007\
        21      60.754605   36  396    7 256  1=21   2=9    6=6   .zip
        19      60.754605   36  396    7 256  1=21   2=9    6=6   P K 0304.zip
         1      58.734965   25  393    7 256  1=13   2=7    3=7   .rez
         1      58.734965   25  393    7 256  1=13   2=7    3=7   0d0aR e .rez
    We note that e8e9 for .exe and .dll is around 15, indicating that only about half of the contents is x86 code. We note that these are mostly compressible, thus not packed with UPX or such. The only uncompressible types seem to be .png, .cab, and .u00. Note that .jar, .7z, and even some .zip files are compressible.

    We also note that there is not much text. Most of the text is in Microsoft SDKs/, 92 MB out of 5.3 GB, or about 2%. This is quite different than most benchmarks.

    Here is the Windows directory (32 bit Vista), showing only file types. Again, not much text, and not all of the .exe, .dll, .sys files are x86 code.
    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
     86863   24096.793131  315  375    9 256  1=132  2=89   4=72  c:\windows\
     12517   10832.356881  385  334   11 256  1=141  4=110  2=102 .dll
       109    2302.864384  128  385   11 256  1=61   2=38   4=19  .msp
      1991    1129.190128  392  314   12 256  1=208  2=109  4=49  .DLL
      2464    1040.012296  327  322   16 256  1=194  2=50   4=36  .exe
      1027     634.430960  191  375    1 256  2=80   1=77   4=60  .ttf
     25348     541.580504  252 1000    0 139  1=74   5=37   8=36  .manifest
        36     507.592824  182  391    2 256  2=96   1=68   4=51  .ttc
      1770     478.745859  285  310   16 256  1=128  2=40   4=33  .sys
       163     393.866994  190  393    3 256  3=106  2=71   4=57  .bin
       113     372.503818   25  375    7 256  1=21   2=7    3=6   .wmv
        16     231.147376  140  417    4 256  4=185  8=78  12=27  .ngr
         2     220.217344  521  466    0 256  2=472  1=50   8=20  .edb
        69     210.294669  186  331    4 256  4=187  2=116  1=50  .IMD
        62     194.262365   10  384    8 256  1=6    2=5    4=5   .cab
      6346     186.627856  516  423    1 256  2=360  1=146  8=19  .mui
       141     174.744438  461  337    1 256  1=209  2=194  6=45  .sdb
       107     159.056222  310  297    5 256  1=230 12=83   2=51  .dat
         2     159.012280  381  395    7 256  1=122  2=112  4=46  .mzz
         7     152.466748   58  356    7 256  1=50   3=9    2=9   .dvr-ms
       244     149.295788  407  167    2 256  1=283  4=127  2=122 .tlb
       444     133.671739  127  260    4 256  1=520  5=143  3=13  .ICM
       127     130.613120  372  257    7 256  1=289  4=83   2=50  .cpl
      3632     126.788059  474 1000    0 111  1=276  7=26   4=25  .GPD
    Finally, the entire 74 GB c:\ partition, again by type. Some types like .fastq and .fa are probably not typical unless you are doing work in genomics. This and test data (a couple copies of enwik9) make up most of the text.

    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
    212825   79642.171698  228  535    6 256  1=143  2=67   4=45  c:\
     17600   13246.168342  377  335   11 256  1=145  4=99   2=95  .dll
     10223    9028.684659   25  377    8 256  1=11   2=6    4=5   .JPG
        11    8472.926426  381 1000    0  48  1=368  2=129  3=84  .fastq
       118    6341.950858  342 1000    0  40  1=338  2=176  3=114 .fa
       559    3126.352380   88  400    6 256  1=71  12=9    2=8   .dat
      4854    2996.950314  232  340   14 256  1=127  4=34   2=33  .exe
       930    2469.057808   17  385    8 256  1=11   3=7    4=7   .zip
       112    2332.042240  127  385   11 256  1=61   2=37   4=19  .msp
      2386    1431.297955  388  314   13 256  1=200  2=99   4=56  .DLL
       114    1010.711119    7  382    8 256  2=6    1=6    3=5   .cab
       106     959.471645   13  384    7 256  2=7    4=7    3=6   .zpaq
       650     786.228337  241  455    1 256  3=198  1=125  2=50  .bmp
         1     728.754176   17  373    8 256  1=7    3=6    2=6   .iso
       167     673.926152  544  999    0 256  1=525  2=82   3=70  .tmp
        22     669.479488  308  711    3 256  1=139  2=52   3=44  .tar
      4269     651.649169   30  379    7 256  1=17   2=7    4=6   .jpg
      1036     647.442995  192  373    1 256  2=79   1=77   4=60  .ttf
        40     587.230919    4  382    8 256  1=4    2=4    3=4   .pmd
    Here is the program. You can run it either like "dir/s/b | analyze" to recursively scan directories, or pass filenames as command line arguments.

    Code:
    // Analyze files: analyze * (or) dir/s/b | analyze
    // (C) 2012, Dell Inc. Written by Matt Mahoney, Oct. 25, 2012.
    // Free under GPL v3.
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <string>
    #include <vector>
    #include <map>
    #include <algorithm>
    #include <assert.h>
    using namespace std;
    
    // Statistics about a file or set of files
    struct ST {
      enum {MAXGAP=100};
      int64_t bytes;     // total size
      int64_t hits;      // number of correct order-1 predictions
      int64_t ascii;     // number of bytes in {9, 10, 13, 32..126}
      int64_t e8e9;      // number of bytes in {0xe8, 0xe9} (232..233)
      int64_t files;     // number of files
      unsigned chars[8]; // bitmap of which bytes occurred
      int64_t rep[MAXGAP+1]; // gap -> count
      ST(): bytes(0), hits(0), ascii(0), e8e9(0), files(0) {
        memset(chars, 0, sizeof(chars));
        memset(rep, 0, sizeof(rep));
      }
      void inc(int c, int pred, int64_t gap) {
        assert(c>=0 && c<256);
        ++bytes;
        if (c==9 || c==10 || c==13 || (c>=32 && c<=126)) ++ascii;
        if (c==0xe8 || c==0xe9) ++e8e9;
        if (pred) ++hits;
        if (gap>MAXGAP) gap=MAXGAP;
        chars[c>>5]|=1u<<(c&31);  // unsigned shift avoids shifting into the sign bit
        ++rep[gap];
      }
      void print();
    };
    
    void ST::print() {
      const double s=(bytes+(bytes==0))/1000.0;
      int alp=0;  // alphabet size
      for (int i=0; i<256; ++i)
        if (chars[i>>5]&(1u<<(i&31))) ++alp;
      printf("%6.0f %14.6f %4.0f %4.0f %4.0f %3d",
             double(files), bytes/1000000.0,
             hits/s, ascii/s, e8e9/s, alp);
    
      // Print top 3 periods
      int b1=0, b2=0, b3=0;
      for (int i=1; i<MAXGAP; ++i)
        if (!b1 || rep[i]>rep[b1]) b1=i;
      for (int i=1; i<MAXGAP; ++i)
        if (i!=b1 && (!b2 || rep[i]>rep[b2])) b2=i;
      for (int i=1; i<MAXGAP; ++i)
        if (i!=b1 && i!=b2 && (!b3 || rep[i]>rep[b3])) b3=i;
      printf(" %2d=%-3.0f %2d=%-3.0f %2d=%-3.0f",
              b1, rep[b1]/s, b2, rep[b2]/s, b3, rep[b3]/s);
    }
      
    map<string, ST> st;  // file,dir/,.ext -> statistics
    
    void scan(const char* filename) {
    
      // Open file
      FILE* in=fopen(filename, "rb");
      if (!in) return;
    
      // Read first 4 bytes as 8 char ASCII or hex string
      string hdr;
      for (int i=0, c; i<4 && (c=getc(in))!=EOF; ++i) {
        if (c>=32 && c<=126) {
          hdr+=char(c);
          hdr+=' ';
        }
        else {
          const char* hex="0123456789abcdef";
          hdr+=hex[c>>4&15];
          hdr+=hex[c&15];
        }
      }
    
      // Get st keys
      vector<string> keys;
      keys.push_back(filename);  // get stats for filename
      keys.push_back("");        // and total for all files
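      int j=-1;
      for (int i=0; filename[i]; ++i) {  // and by filename extension
        if (filename[i]=='/' || filename[i]=='\\') j=-1;  // a dot in a directory name is not an extension
        else if (filename[i]=='.') j=i;
      }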
      if (j>=0) {
        keys.push_back(filename+j);  // extension only
        keys.push_back(hdr+(filename+j));  // first 4 bytes and extension
      }
      for (int i=0; filename[i]; ++i)  // and for each directory
        if (filename[i]=='/' || filename[i]=='\\')
          keys.push_back(string(filename).substr(0, i+1));
      const int n=keys.size();
      vector<ST*> stp(n);
      for (int i=0; i<n; ++i)
        st[keys[i]].files++;
      for (int i=0; i<n; ++i)
        stp[i]=&st[keys[i]];
    
      // Count byte values and order-1 predictions.
      fseek(in, 0, SEEK_SET);  // offset 0 fits in a long, so the nonstandard fseeko64 is not needed
      int64_t i=0;  // offset
      vector<unsigned char> o1(256);  // c1 -> c which last followed it
      vector<int64_t> pos(256);  // c -> last offset
      int c, c1=0;  // current, previous byte
      for (i=1; (c=getc(in))!=EOF; ++i) {
        assert(c>=0 && c<256);
        for (int j=0; j<n; ++j)
          stp[j]->inc(c, c==o1[c1], pos[c] ? i-pos[c] : 0);
        pos[c]=i;
        o1[c1]=c;
        c1=c;
      }
      fclose(in);
    
      // uncomment to print while scanning
    //  stp[0]->print();
    //  printf(" %s %s\n", hdr.c_str(), filename);
    }
    
    // compare by size, then by key
    bool by_size(map<string, ST>::iterator a, map<string, ST>::iterator b) {
      if (a->second.bytes!=b->second.bytes)
        return a->second.bytes > b->second.bytes;
      return a->first < b->first;
    }
    
    int main(int argc, char** argv) {
    
      // Print table header
      printf(
      " Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext\n"
      "------ -------------- ---- ---- ---- --- ------ ------ ------ --------------\n");
    
      // Scan files from command line args
      for (int i=1; i<argc; ++i)
        scan(argv[i]);
    
      // Scan files from stdin (e.g. piped from "dir/s/b")
      if (argc==1) {
        string s;
        int c;
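        while ((c=getchar())!=EOF) {
          if (c<' ') {  // end of line (CR or LF)
            if (!s.empty()) scan(s.c_str());
            s="";
          }
          else s+=char(c);
        }
        if (!s.empty()) scan(s.c_str());  // last name without a trailing newline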
      }
    
      // Sort the map iterators by total size and print the top 100
      vector<map<string, ST>::iterator> v;
      for (map<string, ST>::iterator p=st.begin(); p!=st.end(); ++p)
        v.push_back(p);
      sort(v.begin(), v.end(), by_size);
      for (int i=0; i<100 && i<int(v.size()); ++i) {
        v[i]->second.print();
        printf(" %s\n",v[i]->first.c_str());
      }
      return 0;
    }

  2. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Interesting.
    I'll look into the code and see my results.

  3. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Matt, I think it has a bug on 64-bit versions of Windows from Vista on. C:\Windows\SysNative is a symlink (or something similar) and will probably make the code count some files twice.
    And on XP x64, when compiled as 32-bit, it is wrong because it calculates SysWOW64 twice and skips system32.
    Last edited by m^2; 25th October 2012 at 23:30.

  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Wouldn't that be a bug in the DIR command?

    Windows-64 uses redirection when you run a 32 bit application. For example, system("notepad") would call the 32 bit version instead of the 64 bit version. More commonly, it is used to load the right version of DLL files. There is a way to disable this when searching directories ( http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx ) which is something I should probably add to zpaq if it didn't cause other problems.
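
    A minimal sketch of how redirection could be switched off for a short, well-defined scope. The link above is truncated, so which API it documents is an assumption here; the sketch uses the Win32 calls Wow64DisableWow64FsRedirection and Wow64RevertWow64FsRedirection. The class name Wow64RedirectionOff is made up for this example, and outside Windows the guard compiles to a no-op.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// RAII guard (illustrative): disables WOW64 file system redirection for the
// calling thread while the object is alive and restores it on destruction.
// On non-Windows builds it does nothing and active() returns false.
class Wow64RedirectionOff {
 public:
  Wow64RedirectionOff() : old_(nullptr), active_(false) {
#ifdef _WIN32
    active_ = Wow64DisableWow64FsRedirection(&old_) != 0;
#endif
  }
  ~Wow64RedirectionOff() {
#ifdef _WIN32
    if (active_) Wow64RevertWow64FsRedirection(old_);
#endif
  }
  Wow64RedirectionOff(const Wow64RedirectionOff&) = delete;
  Wow64RedirectionOff& operator=(const Wow64RedirectionOff&) = delete;

  // True if redirection was actually disabled by this guard.
  bool active() const { return active_; }

 private:
  void* old_;    // opaque value handed back to the revert call
  bool active_;
};
```

    The intended use is to create the guard in the smallest block that opens or enumerates files, for example { Wow64RedirectionOff guard; /* open the file */ }, so that redirection is restored as soon as the block ends.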

  5. #5
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    My system drive, with the XP bug mentioned above:
    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
    126227   19176.825389  310  428    8 256  1=153  2=62   4=38
     11267    5114.003491  375  342   13 256  1=169  2=73   4=50  .dll
      3059    1652.155414  289  357   14 256  1=126  4=48   2=39  .exe
      1219    1276.966122  413  468    2 256  1=276  2=70   3=42  .lib
       126    1000.066176   51  378    7 256  1=18   2=15   4=12  .cab
        78     770.657280   27  382    8 256  1=11   4=11   2=9   .msp
       273     638.274048  417  409    1 256  1=252  2=54   3=54  .pdb
      2041     601.456290  431  459    4 256  1=264  2=61   3=50  .a
       139     405.430272  283  384   11 256  1=150  2=47   4=37  .msi
     14468     353.423857  349 1000    0 227  1=141  4=36   3=33  .h
      2862     346.016490  238  300    5 256  1=205 25=17   2=15  .dat
       522     337.435338  252  525    4 256  2=67   3=51   1=46  .jar
       453     318.799890  131  400    8 256  1=55   2=30   4=21  .zip
        22     288.871277    4  383    8 256  1=4    2=4    3=4   .arc
    Sadly, the code is too slow to be run on any of my other drives. I may need to add some suspend-resume code.
    I'll try to run it on my development system tomorrow.

    As to the dir command, I don't have Vista to try it on; it's just my suspicion. It might be considered a dir bug if it happens, but calling it one wouldn't make the script more usable...

    As to problems with disabling redirection: it causes plenty. For example, loading a system DLL can fail because the system tries to give you the x64 version. I wouldn't be surprised to see that happen in the middle of a system call because of delay-loading. As MS suggests, keep redirection disabled for as short a period as you can and then turn it back on. And test the code inside very carefully, on as many Windows versions as you can. And don't call any 3rd-party code during that period.
    Or, better, ask users to use an x64 version.
    Last edited by m^2; 25th October 2012 at 23:54.

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Process Explorer (sysinternals.com) lets you suspend/resume any process.

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The program is slow because it reads every file on the disk. When I scanned c:\* I let it run overnight. Anyway, it is interesting that you have compressible compressed files (zip, jar, cab).

  8. #8
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Here are builds for Windows 32 and 64 bit. I modified the source code to use file buffering and so that it can be built with Visual Studio. The 64-bit version runs around 2x faster than the 32-bit version.
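
    The attached source is not shown in this thread, so the following is only a sketch of what file buffering could look like, not the actual modification: the file is read in 64 KiB blocks with fread instead of one getc call per byte. The helper name for_each_byte is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Read a file in 64 KiB blocks and call fn(byte) for every byte in order.
// Returns the number of bytes read, or -1 if the file cannot be opened.
template <class F>
int64_t for_each_byte(const char* filename, F fn) {
  FILE* in = std::fopen(filename, "rb");
  if (!in) return -1;
  std::vector<unsigned char> buf(1 << 16);
  int64_t total = 0;
  std::size_t n;
  while ((n = std::fread(buf.data(), 1, buf.size(), in)) > 0) {
    for (std::size_t i = 0; i < n; ++i) fn(buf[i]);
    total += static_cast<int64_t>(n);
  }
  std::fclose(in);
  return total;
}
```

    In the analyze program, the per-byte work done inside the getc loop of scan() would move into the callback passed as fn.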

  9. #9
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
     47785   27518.086391  205  318    8 256  1=136  3=68   2=39  
       607    6396.141208  183  216   10 256  1=117  2=101  4=83  .psd
       355    3408.715264  474  285    3 256  1=338  3=180  6=30  .pub
      4292    1859.045158   45  374    8 256 17=14   1=12   4=11  .jpg
       143    1682.399161  497  233   11 256  1=314  3=265  2=45  .bmp
        10    1570.540604    4  383    8 256  1=4    2=4    3=4   .7z
       954    1455.477219   12  381    7 256  1=6    2=5    3=5   .JPG
       121    1190.643829  142  358    6 256  1=141  2=4    4=4   .dat
      1498    1047.043349   78  369    8 256  4=41   2=32   3=30  .pdf
         3     838.250656   12  377    8 256  1=10  16=6    7=5   .avi
      1486     803.129340  413  238    6 256  1=320  3=79   4=43  .doc
      1145     603.627946    6  380    8 256  1=6    2=5    3=5   .png
        66     553.422584   36  378    8 256  1=19   4=10   2=8   .exe
        88     451.721074  533  446    2 256  3=440  1=34   6=30  .tif
      1970     424.126101  555  202    5 256  1=461  3=69   4=50  .tmp
         5     281.247107   20  385    7 256  2=14   1=12   4=8   .map


    Is it normal that the output also produces some lines like RIFF.avi and ae4e43.jpg? Do I have to manually select only the lines with plain file extensions?

    More data will come later; the tool is still running on other PCs. I will update this post.
    Last edited by thometal; 26th October 2012 at 17:10.

  10. #10
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Quote Originally Posted by thometal View Post
    Is it normal that the output also produces some lines like RIFF.avi and ae4e43.jpg? Do I have to manually select only the lines with plain file extensions?
    That's intentional, from Matt's original post:

    Quote Originally Posted by Matt Mahoney View Post
    For file types I consider both the extension and the extension combined with the first 4 bytes of the file. The 4 bytes are coded either as a printable character and a space or as 2 hex digits, for example "M Z 9000.DLL" or "89P N G .png".
    If you can compile custom executables, you can disable this by commenting out the following line in the "scan" function:

    Code:
        keys.push_back(hdr+(filename+j));  // first 4 bytes and extension
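    For illustration, here is a minimal sketch of how the 4-byte header part of such a key could be encoded, reconstructed only from the two examples quoted above. `header_key` is a hypothetical name and is not taken from the tool's source; treating bytes 0x21..0x7E as "printable" is also an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the tool's source): encode each header byte
// as exactly 2 characters, either the character followed by a space or
// 2 uppercase hex digits.
// Assumption: "printable" means the range 0x21..0x7E.
std::string header_key(const unsigned char* hdr, int n) {
    assert(hdr != nullptr && n >= 0);
    std::string key;
    for (int i = 0; i < n; ++i) {
        unsigned char c = hdr[i];
        if (c > 0x20 && c < 0x7F) {
            key += static_cast<char>(c);
            key += ' ';
        } else {
            char hex[3];
            std::snprintf(hex, sizeof hex, "%02X", static_cast<unsigned>(c));
            key += hex;
        }
    }
    return key;
}
```

    With this encoding, the bytes 4D 5A 90 00 followed by ".DLL" give "M Z 9000.DLL", and 89 50 4E 47 followed by ".png" give "89P N G .png", matching the examples in the quote.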
    http://schnaader.info
    Damn kids. They're all alike.

  11. #11
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Quote Originally Posted by schnaader View Post
    That's intentional, from Matt's original post:



    If you can compile custom executables, you can disable this by commenting out the following line in the "scan" function:

    Code:
        keys.push_back(hdr+(filename+j));  // first 4 bytes and extension

    Thanks, I will test it on the next partition. Hm, I possibly read the first post too fast, and from the source it wasn't clear to me whether this was the right line.

  12. #12
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Here are additional builds for Windows 32 and 64 bit. I changed the code to use better file buffering, resulting in a 2x speedup.

    note: The builds marked with "Hdr" check for file headers, so are slower.
    Attached Files
    Last edited by david_werecat; 26th October 2012 at 18:39. Reason: updated builds

  13. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Thanks. I didn't think using fread() would make that big a difference. (getc() is buffered internally).

    I was curious about the contents of all these files and which types could be reliably detected from the first few bytes. It might be even faster if you took out that part of the code rather than reading 4 bytes and backing up.

  14. #14
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    The builds above are now updated to include versions with and without header checking.

    I think fread is faster because of function call overhead, although some compilers might inline fgetc in static builds to achieve performance similar to fread.
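    To make the two read styles concrete, here is a minimal sketch that builds the same byte histogram both ways. It is not the code from the attached builds; the function names and the 64 KiB block size are assumptions chosen for illustration, and any speed difference would depend on the compiler and C library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical illustration, not the code from the attached builds.
// Both functions return a 256-entry byte histogram of the file.

// One library call per byte; stdio still buffers the underlying reads.
std::vector<unsigned long long> histogram_getc(const char* path) {
    assert(path != nullptr);
    std::vector<unsigned long long> count(256, 0);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return count;
    int c;
    while ((c = std::getc(f)) != EOF) ++count[c];
    std::fclose(f);
    return count;
}

// One library call per 64 KiB block; the inner loop is plain array access.
std::vector<unsigned long long> histogram_fread(const char* path) {
    assert(path != nullptr);
    std::vector<unsigned long long> count(256, 0);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return count;
    std::vector<unsigned char> buf(1 << 16);
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
        for (std::size_t i = 0; i < n; ++i) ++count[buf[i]];
    std::fclose(f);
    return count;
}
```

    Both versions go through the same stdio buffer, so the number of disk reads is similar; what changes is how many times the library is entered per byte.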
    Last edited by david_werecat; 26th October 2012 at 19:06.

  15. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    Process Explorer (sysinternals.com) lets you suspend and resume any process.
    I guess not across reboots. I'm not going to leave my PC on overnight.

    After some thinking I noticed that the results I provided are for exactly the least important part of my computer's data. It is dispensable and the only such part.
    The rest would have very different stats. Size-wise, almost all that I care about is multimedia: already compressed and not duplicated. And most people I know have sizeable multimedia collections. That's different from what MS analysed (workstations). Something tells me that servers would be yet another kind.


    I ran the test on my Debian workstation today. First impression: "Oh, it compiles". And it ran too. However, Unix's "everything is a file" caused me trouble, because the process kept getting stuck on some weird things. In the end I wrote a shim that filtered out non-files, TTYs, and files that would block on open(), and then the process worked. Still, I had to exclude some things like /sys and /proc. I left the shim at work, sorry.
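    Since the shim itself was not posted, here is a hypothetical stand-in that shows one way such a filter could work on POSIX systems: check the file type with lstat() before opening anything. The names `is_regular_file` and `open_if_regular` are made up for this sketch.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/stat.h>

// Hypothetical stand-in for the shim described above (the original was not posted).
// Returns true only for regular files, so directories, TTYs, FIFOs, sockets
// and device nodes are rejected without ever being opened.
// lstat() does not follow symlinks, so symlinks are rejected as well.
bool is_regular_file(const char* path) {
    assert(path != nullptr);
    struct stat st;
    if (lstat(path, &st) != 0) return false;  // missing or inaccessible
    return S_ISREG(st.st_mode);
}

// Open for reading only if the path is a regular file; otherwise return nullptr.
std::FILE* open_if_regular(const char* path) {
    return is_regular_file(path) ? std::fopen(path, "rb") : nullptr;
}
```

    Entries under /proc and /sys generally report themselves as regular files, so a type check like this would not replace excluding those trees.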
    Results:
    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
    314634   11169.722986  256  479    7 256  1=151  4=36   2=24 
      1099    1461.975672   13  376    8 256  3=6    4=5    5=5   .deb 
     49285     763.892161  191  958    0 256  4=64   1=50  11=41  .svn-base 
      3575     756.406081  397  300   17 256  1=231  4=42   8=36  .so 
      1226     512.336330   90  440    6 256  1=31   2=28   3=16  .jar 
        74     492.667118   64  208    6 256  4=128  2=69   8=17  .wav 
      1063     308.277113  407  329   17 256  1=233  8=36   4=35  .0 
     13104     302.232254   15  379    8 256  5=6    2=6    3=5   .gz 
        19     268.934071  999    0    0 256  1=999  2=0    4=0   .data 
         3     268.797572    4  383    8 256  1=4    2=4    3=4   .rar 
       633     266.258786  441  313   13 256  1=258 24=42   2=39  .1 
      6271     238.622372  310  599    1 256  1=102  3=90   2=57  .mo 
     18835     194.948016   27  377    7 256  1=28   3=7    2=6   .png 
        39     150.083584  618  283    1 256  1=472  2=41  10=41  .sqlite 
        26     123.302590  566   99    7 256  1=455  4=258  2=73  .cache 
       211     120.734913  356  312   21 256  1=181  8=40   4=34  .4 
         7     120.603888  796   88    1 256  1=731  2=24   8=18  .gch
    It shows yet another problem with the tester on Unices: binaries are missing because they normally don't have extensions. I think that extensionless files are worth supporting; they would probably make the top 3. And that machine has compressible jars too.
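    As a sketch of what supporting extensionless files could look like, the function below maps a path to an extension key and puts files without an extension into their own bucket. The name `extension_key`, the "(none)" label, and the choice to treat dotfiles such as .bashrc as extensionless are all assumptions for illustration, not the tool's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the extension (including the dot) of the last
// path component, or "(none)" when that component has no extension.
// Dots in directory names are ignored, and a leading dot (".bashrc") is
// treated as part of the name rather than as an extension.
std::string extension_key(const std::string& path) {
    std::size_t slash = path.find_last_of("/\\");
    std::size_t start = (slash == std::string::npos) ? 0 : slash + 1;
    std::size_t dot = path.rfind('.');
    if (dot == std::string::npos || dot <= start) return "(none)";
    return path.substr(dot);
}
```

    With this rule, /usr/bin/gcc and /etc/init.d/cron both land in the "(none)" bucket, while a/b.tar.gz is counted under .gz.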

    I left work when it was parsing my company's NFS share. I might censor the results somewhat, but I think I can publish some data of the experiment. And I intend to run it on some server.

    Also, I think it would be very useful to get duplication info, especially at both the fixed-block and file levels.
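    A minimal sketch of the fixed-block variant of that idea, working on data already in memory. `duplicate_blocks` is a hypothetical name; storing whole blocks in a set keeps the count exact but memory-hungry, and a real scanner would more likely store a hash per block instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical sketch: count how many fixed-size blocks repeat an earlier
// block exactly. A trailing partial block is ignored.
std::size_t duplicate_blocks(const std::string& data, std::size_t block) {
    assert(block > 0);
    std::unordered_set<std::string> seen;
    std::size_t dups = 0;
    for (std::size_t i = 0; i + block <= data.size(); i += block) {
        // insert() reports false in .second when the block was already present
        if (!seen.insert(data.substr(i, block)).second) ++dups;
    }
    return dups;
}
```

    File-level duplication is the same check with one entry per file (its contents or a hash of them) instead of one per block.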
    Last edited by m^2; 26th October 2012 at 20:27.

  16. #16
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    OK, I decided to make a new post for better readability.

    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
     77406  316059.837911  115  366    5 256  1=93   4=47   3=31  
      3662  135861.722437   80  380    4 256  1=80   4=67   2=58  .psd
     42567   96638.094431   13  376    6 256  1=6    2=5    3=5   .JPG
     15974   19035.743831   22  375    8 256 17=9    1=7    2=6   .jpg
       142    9382.829899  535  589    1 256  4=508  8=62   1=58  .tif
       133    6928.513024  411  376    3 256  1=226  3=215  6=35  .pub
       952    6602.762040    9  381    7 256  1=8    2=7    3=7   .X3F
       205    3565.635520    5  382    8 256  1=5    2=4    3=4   .SRF
       934    3415.311658   46  378    7 256  3=24   1=23   4=9   .doc
    OK, this looks quite similar.

  17. #17
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Results for my laptop (excluding external drives):

    Code:
     Files      Size (MB) Pred Text e8e9 Alp    Top 3 Periods     dir/ file .ext
    ------ -------------- ---- ---- ---- --- ------ ------ ------ --------------
    859438  685558.788050   75  391    7 256  1=51   2=13   4=11  
     25167  170925.412326   49  383    7 256  1=50   2=5    3=5   .mp3
       181   83523.046765   15  379    8 256  1=15   2=5    4=4   .mkv
      1698   37521.209285   12  373    7 256  1=10   3=6    2=5   .mp4
      5542   34024.297456    7  382    8 256  1=6    2=5    3=5   .zip
       706   31189.686383    6  380    8 256  1=6    2=4    3=4   .rar
      2101   29163.256622   17  374    7 256  1=17   3=6    4=6   .flv
    114762   28978.843731   35  376    8 256 17=12   1=12   2=8   .jpg
       173   25750.193787   26  371    9 256  1=16   3=11   2=7   .avi
        85   17652.327812   13  368    7 256  1=11   3=6    2=5   .m4v
     35401   17471.221711  370  349   11 256  1=156  4=73   2=68  .dll
      9778   13110.390552  138  365   10 256  1=67   4=25   2=23  .exe
     64546    9022.939536   16  376    7 256  1=14   3=6    2=6   .png
       472    6445.276874  293  542    1 256  1=126  2=96  11=63  .mshi
      1630    5478.949678  140  356    6 256  1=81   4=19   2=18  .dat
       397    5337.626258   34  383    7 256  1=20   2=17   8=9   .mshc
     28806    5087.746149   14  388    7 256  3=12   9=11   6=8   .gif
       444    4814.902695    8  378    7 256  1=7    3=5    4=5   .webm
      5919    4629.545804  146 1000    0 232  4=35   1=30   8=26  .mht
         9    4613.145600   17  378    8 256  1=17   2=4    3=4   .hd
         4    4118.258357  459  376    6 256  1=332  4=48   2=32  .vdi
      2376    3683.120186   10  381    8 256  1=6    2=6    4=5   .swf

  18. #18
    Member
    Join Date
    Oct 2012
    Location
    Uzbekistan
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by m^2 View Post
    I'm not going to leave my PC on overnight.
    Just use hibernation

  19. #19
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    You mean "Stand by"? My computer doesn't have anything named hibernation in the shutdown menu.
    IIRC it shuts down only parts of the machine, in particular, fans don't stop.

  20. #20
    Member
    Join Date
    May 2007
    Location
    Poland
    Posts
    85
    Thanks
    8
    Thanked 3 Times in 3 Posts
    http://support.microsoft.com/kb/920730
    Also, a nice trick with a timer (puts the PC into hibernation after exactly 1 hour):
    Code:
    timeout /t 3600 /NOBREAK && Shutdown /h

  21. #21
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote Originally Posted by m^2 View Post
    IIRC it shuts down only parts of the machine, in particular, fans don't stop.
    Hibernation saves the RAM to the HDD (hiberfil.sys in the root of the system drive) and then shuts down the machine entirely. The Stand by (or Sleep) mode you describe works the same on desktops as it does on notebooks: as long as you don't cut the power, the machine wakes up almost instantly, but it consumes a bit of power the whole time to keep your session in RAM. Hibernation may be disabled by default (or by the power profile?).
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  22. #22
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by jethro View Post
    http://support.microsoft.com/kb/920730
    Also, a nice trick with a timer (puts the PC into hibernation after exactly 1 hour):
    Code:
    timeout /t 3600 /NOBREAK && Shutdown /h
    It's a Vista thing:
    https://support.microsoft.com/kb/920730#appliesto
    So thx for the suggestion, but it's not for me.

  23. #23
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Dr. Mahoney,

    I slightly disagree with you. You assume that data are mixed with programs and system files. But I think that most users separate their data files from other files. If you wish, I can attempt to analyze users' directories to learn more about their properties and send slightly modified results (for security reasons).

    Sincerely yours,
    Fatbit

  24. #24
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr. m^2,

    In WinXP the shutdown command works with slightly different parameters.

    Sincerely yours,
    Fatbit

    Shutdown
    Allows you to shut down or restart a local or remote computer. Used without parameters, shutdown will logoff the current user.
    Syntax
    shutdown [{-l|-s|-r|-a}] [-f] [-m [\\ComputerName]] [-t xx] [-c "message"] [-d[u][p]:xx:yy]
    Parameters
    -l : Logs off the current user, this is also the default. -m ComputerName takes precedence.
    -s : Shuts down the local computer.
    -r : Reboots after shutdown.
    -a : Aborts shutdown. Ignores other parameters, except -l and ComputerName. You can only use -a during the time-out period.
    -f : Forces running applications to close.
    -m [\\ComputerName] : Specifies the computer that you want to shut down.
    -t xx : Sets the timer for system shutdown in xx seconds. The default is 20 seconds.
    -c "message" : Specifies a message to be displayed in the Message area of the System Shutdown window. You can use a maximum of 127 characters. You must enclose the message in quotation marks.
    -d [u][p]:xx:yy : Lists the reason code for the shutdown. The following table lists the different values.
    Examples
    To shut down \\MyServer in 60 seconds, force running applications to close, restart the computer after shutdown, indicate a user code, indicate that the shutdown is planned, log major reason code 125, and log minor reason code 1, type:
    shutdown -r -f -m \\MyServer -t 60 -d up:125:1
    PsShutdown is a command-line utility similar to the shutdown utility from the Windows 2000 Resource Kit, but with the ability to do much more. In addition to supporting the same options for shutting down or rebooting the local or a remote computer, PsShutdown can logoff the console user or lock the console (locking requires Windows 2000 or higher). PsShutdown requires no manual installation of client software.
    usage: psshutdown [[\\computer[,computer[,..] | @file [-u user [-p psswd]]] -s|-r|-h|-d|-k|-a|-l|-o [-f] [-c] [-t nn|h:m] [-n s] [-v nn] [-e [u|p]:xx:yy] [-m "message"]
    computer Perform the command on the remote computer or computers specified. If you omit the computer name the command runs on the local system, and if you specify a wildcard (\\*), the command runs on all computers in the current domain.
    @file Run the command on each computer listed in the text file specified.
    -u Specifies optional user name for login to remote computer.
    -p Specifies optional password for user name. If you omit this you will be prompted to enter a hidden password.
    -a Aborts a shutdown (only possible while a countdown is in progress)
    -c Allow the shutdown to be aborted by the interactive user
    -d Suspend the computer
    -e Shutdown reason code.
    -f Forces all running applications to exit during the shutdown instead of giving them a chance to gracefully save their data
    -h Hibernate the computer
    -k Poweroff the computer (reboot if poweroff is not supported)
    -l Lock the computer
    -m This option lets you specify a message to display to logged-on users when a shutdown countdown commences
    -n Specifies timeout in seconds connecting to remote computers
    -o Logoff the console user
    -r Reboot after shutdown
    -s Shutdown without poweroff
    -t Specifies the countdown in seconds until the shutdown (default: 20 seconds) or the time of shutdown (in 24 hour notation)
    -v Display message for the specified number of seconds before the shutdown. If you omit this parameter the shutdown notification dialog displays and specifying a value of 0 results in no dialog.

  25. #25
    Member
    Join Date
    Oct 2012
    Location
    Uzbekistan
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by m^2 View Post
    My computer doesn't have anything named hibernation in the shutdown menu.
    All Windows versions after Win 2k are capable of hibernation. Maybe hibernation is disabled on your system; try "powercfg -h on". If that doesn't work, then I suspect that some of your hardware doesn't support hibernation...

  26. #26
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Thanks for the info. I searched the net and found that I can add it to the shutdown menu in Control Panel's power options.
    The problem is that I can't actually enable it, because it requires as much free space on the system disk as I have RAM. I guess I could find some obscure registry key that would let me use another disk, but the machine needs a cleanup anyway.
    Thank you all for the answers.

  27. #27
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by david_werecat View Post
    Here are additional builds for Windows 32 and 64 bit. I changed the code to use better file buffering, resulting in a 2x speedup.

    note: The builds marked with "Hdr" check for file headers, so are slower.
    FYI: the builds require the VC 2012 DLLs, and those DLLs don't work on XP.
    What a low move from MS...

  28. #28
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    As a memory conservation measure, I added options not to save stats for individual files and directories, only for extensions. Extensionless files get their own entry too.
    Included: source and a Windows x64 build that stats extensions only.
    Attached Files

  29. #29
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    A modified version:
    * instead of alphabet size it calculates Shannon entropy
    * uses megabytes instead of SI-megabytes
    * replaced tabs with spaces
    * requires C++11

    Entropy is in bits/byte
    Edit: bugfix
    Edit2: a tiny optimisation suddenly made the code faster by 1/3 on my PC. Overall, the version that only collects stats for file extensions is 2-3 times faster here than DW's no-header version, depending on the directory hierarchy.
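    For readers unfamiliar with the entropy column, here is a minimal sketch of order-0 Shannon entropy in bits per byte, H = -sum(p_i * log2(p_i)) over the byte frequencies p_i. The function name is hypothetical and this is not the code from the attachment; it only illustrates the quantity being reported.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the code from the attached version.
// Order-0 Shannon entropy of a byte sequence, in bits per byte (0..8).
double entropy_bits_per_byte(const std::vector<unsigned char>& data) {
    if (data.empty()) return 0.0;
    std::size_t count[256] = {0};
    for (unsigned char c : data) ++count[c];
    const double n = static_cast<double>(data.size());
    double h = 0.0;
    for (std::size_t k : count) {
        if (k == 0) continue;            // 0 * log2(0) is taken as 0
        const double p = k / n;
        h -= p * std::log2(p);
    }
    assert(h >= 0.0);
    return h;
}
```

    A file of one repeated byte gives 0, a file using two byte values equally often gives 1, and a file using all 256 values equally often gives 8. Unlike the alphabet size, which reaches 256 as soon as every byte value appears once, this measure also reflects how skewed the distribution is.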
    Attached Files
    Last edited by m^2; 28th October 2012 at 23:57.

  30. #30
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    There is only one megabyte. Perhaps you're referring to http://en.wikipedia.org/wiki/Mebibyte
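    To make the distinction concrete, a small sketch of the two conversions: 1 MB is 10^6 bytes and 1 MiB (mebibyte) is 2^20 bytes. The constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// 1 MB (SI megabyte) = 10^6 bytes; 1 MiB (mebibyte) = 2^20 bytes.
constexpr double kBytesPerMB  = 1000000.0;
constexpr double kBytesPerMiB = 1048576.0;

// Hypothetical helpers for illustration only.
double to_megabytes(std::uint64_t bytes) { return bytes / kBytesPerMB; }
double to_mebibytes(std::uint64_t bytes) { return bytes / kBytesPerMiB; }
```

    For example, the Calgary corpus total from the first post, 3,141,622 bytes, is 3.141622 MB but only about 2.996 MiB, a difference of roughly 4.6%.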
