
Thread: zpaq updates

  1. #2191
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    I released v7.08. I realize that multi-part archives are a useful feature so I am looking at restoring this feature at least partially in the next version. However, I seem to be spending a lot of time on it without success so far, so for now I am releasing without it. If you really need it then keep using v7.07 for now. All older versions (source and exe) are available at http://mattmahoney.net/dc/zpaq.html

    > - restructure the code, e.g. one class per file, to minimize touching points and give a better overview

    The best way to make the code easier to understand and update is to make it smaller. I removed about 650 lines of code. Removing multi-part archives allowed me to greatly simplify the Archive class and to eliminate the InputFile and OutputFile classes by writing Windows API versions of fopen(), fread(), fseeko(), etc to handle UTF-16 file names.

    Code:
    >wc zpaq706.cpp libzpaq706.*
      3962  13463 131731 zpaq706.cpp
      7732  34647 272835 libzpaq706.cpp
      1509   9365  61660 libzpaq706.h
     13203  57475 466226 total
    
    >wc zpaq708.cpp libzpaq708.*
      3308  11334 112834 zpaq708.cpp
      7732  34647 272835 libzpaq708.cpp
      1509   9365  61661 libzpaq708.h
     12549  55346 447330 total
    
    >wc zpaq70[68].pod
      751  5083 32101 zpaq706.pod
      709  4699 29786 zpaq708.pod
    
    875,567 zpaq706.zip
    606,183 zpaq708.zip
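
    As an aside, here is the kind of UTF-16-aware fopen() wrapper meant above, as a minimal sketch (illustrative only, not the code as it appears in zpaq; the name utf8_fopen is made up):
    Code:
    #include <cstdio>
    #ifdef _WIN32
    #include <windows.h>
    // Convert UTF-8 file name and mode to UTF-16 and call the wide API.
    FILE* utf8_fopen(const char* filename, const char* mode) {
      wchar_t wname[MAX_PATH], wmode[16];
      if (!MultiByteToWideChar(CP_UTF8, 0, filename, -1, wname, MAX_PATH)) return 0;
      if (!MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16)) return 0;
      return _wfopen(wname, wmode);
    }
    #else
    // POSIX file names are byte strings, so plain fopen() already works.
    FILE* utf8_fopen(const char* filename, const char* mode) {
      return fopen(filename, mode);
    }
    #endif
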
    > PS please re-add a "test" command.

    I could, but it would just be syntactic sugar for extract -test. If you want a more thorough (but slower) test, use unzpaq206. It checks for strict adherence to the standard, unlike zpaq which is designed to recover from some errors. But either one will probably do what you want, which is to decompress, verify against the stored hashes, and discard the output without writing anything to disk.

    > With closed communication it's much more difficult to build a community than with open communication, and a community will help make zpaq better and more popular.

    People contact me on this forum and by email. I usually respond quickly.

    Remember that zpaq is public domain open source. If there is a feature you want and you don't want to wait for me to implement it, then you can make your own fork and change it how you like.

    I realize I am going against the trend of most software projects to become more bloated with features. Look at the changelogs for rar, 7zip, freearc, etc. How much time do they spend fixing arcane bugs in features you probably never use?

  2. #2192
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    116
    Thanks
    35
    Thanked 28 Times in 23 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I realize I am going against the trend of most software projects to become more bloated with features. Look at the changelogs for rar, 7zip, freearc, etc. How much time do they spend fixing arcane bugs in features you probably never use?
    Ahhh. The luxury of not having a marketing department on your back!

    "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away". Antoine de Saint-Exupery

  3. #2193
    Member
    Join Date
    Mar 2015
    Location
    Bulgaria
    Posts
    47
    Thanks
    0
    Thanked 9 Times in 7 Posts
    +1 for multipart archives... they're invaluable for system admins. Matt, data deduplication + multipart is one of the smartest feature combinations in zpaq. I use them nonstop ;]

    The current implementation of multipart is OK, but the new workaround for extraction may not be the best in some scenarios.

    The idea of downloading all parts of the archive locally and then extracting does not seem good ;(
    For example, you sync two locations via Dropbox: one site constantly adding, the other constantly extracting. You would need to concatenate all parts for every extract, right?

  4. The Following User Says Thank You to MiroGeorg For This Useful Post:

    batchman61 (3rd April 2016)

  5. #2194
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    I do not agree on -test.
    I think that testing is a very common task, so it should be a command, not a switch:
    add, list, extract, test.
    You could see extract as a kind of list, or list as a kind of extract, etc.
    There is no "IEEE glossary" here, so in this case I suggest common sense.

    On avoiding code bloat: this is a very good thing, BUT not a good reason to strip out valuable functions like multipart, afaik.

    Finally: think about changing the terrible -to on extract.
    This is really the least understandable behaviour of any compressor I have ever seen.
    I know it is very easy to do a string replace, but almost all (or rather ALL) others work with relative paths,
    so zpaq e pippo.zpaq c:/extracthere would write into the c:/extracthere folder.


    PS sorry, I am using a smartphone without English support tonight.

  6. #2195
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    Even with the previous version, you already needed all parts to extract because newer parts can point to fragments in older parts. It looks like I will be able to re-implement remote multipart archives in a more limited way without adding back all 650 lines of code. (That is where most of the savings came from). The way I have in mind is like this:

    zpaq add arc???? files...

    This creates or updates arc0000.zidx (instead of arc0000.zpaq) as a small local index, and just like before, creates parts arc0001.zpaq, arc0002.zpaq, etc, which you can move to a remote location. To restore, you have to move them back. Then you have the extra step of concatenating them before you can extract.

    In Windows: copy/b arc*.zpaq new.zpaq
    Or in Linux: cat arc*.zpaq > new.zpaq

    Then you can extract as usual: zpaq extract new.zpaq

    For the typical case where you have a big initial backup and lots of small updates, you can concatenate without needing twice as much disk space:

    mv arc0001.zpaq new.zpaq
    cat arc*.zpaq >> new.zpaq
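
    If a front end (like an archiver plugin) needs to do the concatenation programmatically, here is a minimal C++ sketch of the same thing, assuming parts named arc0001.zpaq, arc0002.zpaq, etc. (illustrative only, not code from zpaq):
    Code:
    #include <cstdio>
    int main() {
      FILE* out = fopen("new.zpaq", "wb");
      if (!out) return 1;
      for (int part = 1; ; ++part) {
        char name[32];
        snprintf(name, sizeof(name), "arc%04d.zpaq", part);
        FILE* in = fopen(name, "rb");   // stop at the first missing part
        if (!in) break;
        char buf[1 << 16];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
          fwrite(buf, 1, n, out);
        fclose(in);
      }
      fclose(out);
      return 0;
    }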

    The other difference is that the index is required for updates, whether or not the other parts are present. I think with these restrictions it will be possible to add this feature back in probably 50 lines of code or less. More importantly, it will be conceptually easier to describe and understand in the documentation.

    > i do not agree on -test.

    In Windows, create test.bat so you can say "test archive.zpaq"
    Code:
    zpaq extract %1 -all -test
    I agree that -to is confusing. Try: zpaq e pippo.zpaq -to c:/extractthere

    I think that will do what you want whether the archive stores absolute or relative paths.

    foo -> c:/extractthere/foo
    c:/foo -> c:/extractthere/c/foo

  7. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    batchman61 (3rd April 2016)

  8. #2196
    Member
    Join Date
    Mar 2016
    Location
    Germany
    Posts
    4
    Thanks
    5
    Thanked 4 Times in 2 Posts
    Very happy to read "just like before, creates parts arc0001.zpaq, arc0002.zpaq".
    Does "small local index arc0000.zidx" mean the index file only has a new extension and the same content like before ?

    The extract prerequisite "custom concatenation of multipart files into an archive" seems a significant change / downgrade to me.
    First, it doesn't sound exactly like "less to read and easier to understand", and second, afaics it may make things more complex for addons like the Total Commander plugin.

  9. The Following User Says Thank You to batchman61 For This Useful Post:

    mhajicek (4th April 2016)

  10. #2197
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    The reason for changing the filename extension is to not confuse it with a regular archive. Also, "cat *.zpaq" or "copy/b *.zpaq" would include the index otherwise. I probably should change the extension for parts too, like ".zprt" or something, because they would not work as regular archives.

  11. #2198
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Dear Mr. Matt, what exactly is the issue with multipart extraction? Just more code, or not working properly?
    In the first case it is not a big deal, at least for me.
    Brutal (no extraction) multipart took me about 5 lines of code patching zpaq.
    How about an UNPAQ with extraction of multipart archives?
    Then you could have a little zpaq (to make multipart) and a bigger unpaq (if someone needs multipart extraction).
    I prefer a single zpaq, though.
    Another suggestion: make sure (check) that zpaq can work with an .exe extension instead of .zpaq, to become resilient to CryptoLocker.
    This is a very smart workaround for backups.

  12. #2199
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    370
    Thanks
    26
    Thanked 22 Times in 15 Posts
    @Matt, what happens if zpaq opens a level 2 archive with a block not of type c, d, h, or i?

  13. #2200
    Member
    Join Date
    Mar 2015
    Location
    Bulgaria
    Posts
    47
    Thanks
    0
    Thanked 9 Times in 7 Posts
    A multipart example, a REAL scenario...

    An example archive is ~500GB initially; the next parts are about 30-50GB/day. One day you decide to decompress; by then the total archive is about 1.5TB. So first you need 1.5TB of free HDD space plus a custom script to concatenate, plus 5 hours (if downloading at 100MB/s via LAN). Then you can begin extracting, for example for another 5 hours... and this is the BEST case.

    The worst case: at the remote location you don't have 1.5TB free. Downloading 1.5TB over the WAN isn't a good alternative either.
    So... zpaq multipart is the VERY BEST for transferring large amounts of data to remote locations... but then at the remote location you cannot extract or test easily...

    In the near future SSDs may replace HDDs. Extracting this way may wear out an SSD much faster.

    Is it really worth implementing multipart in this way? Who will use a custom script to concatenate before every extract? Let the other participants try to foresee according to their scenarios...

  14. #2201
    Member
    Join Date
    Jul 2014
    Location
    Russia
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    There is a problem!
    The example zpaq archive primes.zpaq became broken in versions after 7.05!
    I attached that file inside a zip archive.
    What happened? Why can't new versions generate prime numbers?
    Attached Files

  15. #2202
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    I released zpaq 7.09. It should fix the bug in extracting primes.zpaq. It failed when there was no stored filename in streaming format. The correct behavior is to drop the .zpaq extension.

    I am still working on adding back multi-part archives. It seems to be a harder problem. fcorbelli, do you have a 5-line patch to do it? I can't figure out how to do it without writing about 100-200 lines of code.

    > @Matt what happens if zpaq opens an level 2 archive with a block not from type d,c h, i?

    zpaq will ignore the block and continue with a warning. unzpaq206 will give an error and stop, since it tests for strict compliance. zpaqd will extract the raw data because it doesn't know anything about journaling format.

  16. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    mlogic (6th April 2016)

  17. #2203
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    I have a pre-release of zpaq v7.10 which restores multi-part archives. http://mattmahoney.net/dc/zpaq710.zip

    It includes extract (no need to manually concatenate). It works pretty much like earlier versions. You use * or ???? in the archive name, which are matched to part numbers 0 and higher. Part 0 is the index. Each update creates a new part (1, 2, 3...) and updates the index. For example:

    zpaq add "x??" files (creates index x00.zpaq and part x01.zpaq)
    zpaq add "x??" files (updates index x00.zpaq and creates part x02.zpaq)
    zpaq extract "x??" (extracts from parts 1 and 2, ignoring the index)
    zpaq list "x??" (lists the parts, ignoring the index)
    zpaq list x00 (lists the index, which should show the same result).

    In the Windows version you don't need quotes to prevent * and ?? from being expanded.

    Concatenating the parts not including the index gives you a regular archive. This works even if encrypted.

    You only need the index to update. Thus, you can move the other parts to a remote location and retrieve them when you want to extract. You only need the other parts (not the index) to extract.

    It is an error to update an index as a regular archive and vice versa. It is an error to extract from an index, or from any single part other than part 1 if it contains pointers to earlier parts. You can list individual parts.

    A few minor differences from v7.07:
    - The index is required to update even if the other parts are present.
    - It is an error to create a part that already exists.
    - add -until will truncate the index but not delete the future parts. You have to delete them first to avoid an error.
    - add, extract, and list don't check that the parts are consistent with the index.
    - -method i only works on a multi-part archive. It updates the index and discards the part that would have been created. This can cause the index to get out of sync with the other parts. You can fix with -until to roll back to before you did this.

    I need to do more testing before release. It probably has bugs.

  18. The Following 2 Users Say Thank You to Matt Mahoney For This Useful Post:

    batchman61 (6th April 2016),Eppesuig (8th April 2016)

  19. #2204
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I am still working on adding back multi-part archives. It seems to be a harder problem. fcorbelli, do you have a 5 line patch to do it? I can't figure out how to do it without writing about 100-200 lines of code
    Simply write back the split archive without mercy.
    I am not sure if I still have my own multipart zpaq; I'll check at home or search this forum.

    Briefly: where the file would be updated, write to the delta file instead (no naming scheme, no index, no nothing).

    Very ugly, but very effective.

    EDIT: http://encode.ru/threads/456-zpaq-up...6251#post36251

    AFAIK the problem is extracting without joining the parts back.
    Personally I do not need an advanced naming scheme and so on (if this can save code).

    Even a split index alone is not so important for me, if it is a big issue to maintain:
    http://encode.ru/threads/1955-FastBa...ll=1#post38113


    My priorities:

    1) split the archive: 100%
    2) a comfortable way to extract without cat: 99%
    3) index-only updates: 30% (good, but not so useful for rsync)

    No other bells and whistles, at least for me.

    EDIT:
    0) HERE is the real code-bloating, but useful, feature.
    (I think I will write my own patch... it's been about 20 years since I last wrote C, but...)
    A database-like INDEX file with text notes on versions.
    As previously stated, something like
    Code:
    add... -note "version707a"
    ...
    Code:
    add ... -note "version 710"
    to get
    Code:
    extract ... "version 710"
    is VERY welcome, if this can be "injected" into the index file without too many problems.

    PS sorry, but I'm a Delphi-guru developer... so I am not very quick at "decoding" C...

  20. #2205
    Member
    Join Date
    Mar 2016
    Location
    Germany
    Posts
    4
    Thanks
    5
    Thanked 4 Times in 2 Posts
    Another minor difference: the 7.10 pre-release no longer creates directories (e.g. add D:\tmp\rStore-Test\rStore_????.zpaq fails when D:\tmp\rStore-Test doesn't exist). A minor request as well: can you add a command ver to show the version (v7.10)? This would allow handling version-specific differences when needed.
    Last edited by batchman61; 6th April 2016 at 17:45.

  21. #2206
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    Omitting the separate index does simplify the problem. But if I do that, some people will complain that they can't move the archive parts offsite. The whole reason for even having an index is that rsync doesn't work right, even though it is supposed to only upload the new part of an appended file.

    Also, does your code work with encryption? I could drop encryption entirely, but again, somebody will complain if I do. Does it work with -until?

    Anyway, the new version is 200 lines longer, so still a 450 line savings. The big change is splitting class Archive into InputArchive and OutputArchive. I'm going to fix some minor things before release.

    Also one other change I forgot to mention. In a multi-part archive, only the part, and not the index, is transacted. In a normal archive, if you interrupt an update with Ctrl-C, then the appended part will start with an invalid C block (csize = -1) to mark the end of the archive. List and extract will ignore the appended part, and the next add will overwrite it. The C block is overwritten with a valid block (csize = size of all the compressed D blocks) as the last step in an update.

    In a multi-part archive, the new part is transacted as usual, but nothing is written to the index until the compression is complete and it is time to append the metadata. This step is usually fast. It writes a valid C block to the index (csize = 0 because there are no D blocks). Then it appends the metadata to both the index and the part, and finally goes back to rewrite the C block only in the part. If you interrupt zpaq, then most likely the index will not be updated at all and you will have a partial archive part that you will need to manually delete before the next update. If you don't delete it, then zpaq will remind you with an error. In the rare case where you manage to interrupt it while it is writing the metadata, then you will need -until to truncate the partially updated index.
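
    For illustration, a rough C++ outline of that transaction order (a hypothetical sketch; real C blocks carry more structure than a bare csize field, and append_update is a made-up name):
    Code:
    #include <cstdio>
    #include <cstdint>

    void append_update(FILE* f) {          // archive opened "rb+", seeked to end
      long cpos = ftell(f);                // remember where the C block starts
      int64_t csize = -1;                  // invalid marker: update in progress
      fwrite(&csize, sizeof csize, 1, f);
      int64_t dbytes = 0;
      // ... compress and append D blocks, adding their sizes to dbytes ...
      // ... append the H and I (metadata) blocks ...
      fflush(f);
      fseek(f, cpos, SEEK_SET);            // last step: make the C block valid
      csize = dbytes;                      // csize = size of all compressed D blocks
      fwrite(&csize, sizeof csize, 1, f);
      fflush(f);
    }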

  22. #2207
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Omitting the separate index does simplify the problem. But if I do that, some people will complain that they can't move the archive parts offsite. The whole reason for even having an index is that rsync doesn't work right, even though it is supposed to only upload the new part of an appended file.
    ???
    I do not have big issues with rsync-over-ssh.
    For me the only reason to use the index is legacy hardware (or a cheap leased VPS), where computing the rolling hash of big files (say 30-100GB) can be time consuming.

    So this is good when running rsync without the (maybe too) brutal --append switch (or if you do not run md5 remotely via ssh, just to be sure): huge archives, slow hardware and an unreliable network link.
    Code:
    rsync.exe --omit-dir-times --append --no-owner --no-perms --partial --progress -e "ssh -p %PORTA% -i %CHIAVE%" -rlt --delete "/cygdrive/%PERCORSOWIN%" "%UTENTE%@%SERVER%:/dati/synca/%UTENTE%/sicurezza"
    ssh -p %PORTA% -i %CHIAVE% %UTENTE%@%SERVER% 'md5deep /dati/synca/%UTENTE%/sicurezza/%NOMEZPAQ%'
    md5deep %NOMEBACKUP%
    Useless otherwise, as far as I know.


    =======
    Turning back to the index: is it possible (without much effort) to store something else, as a non-standard extension (aka an old-zpaq-format breaker),
    say a fixed-length string (32 chars?) for every version, taken from the command line?

    =======
    About my 2014 patch: I do not know; I dropped it just after you added this feature to zpaq, but I think it should work with encryption or whatever, because it takes the last part of the written (added) data.

  23. #2208
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    Minor update:
    - Bug fix in libzpaq.h. If Array threw an "array too big" error, it left an inconsistent state in which the destructor caused an assertion failure.
    - -method i works on single part archives only (not multi part) as before.

    As for index, I could just take it out and see if anyone complains. I'm sure somebody would. I have no use for it, or for multipart for that matter. But somebody does.

  24. #2209
    Member
    Join Date
    Mar 2015
    Location
    Bulgaria
    Posts
    47
    Thanks
    0
    Thanked 9 Times in 7 Posts
    +1 for current index implementation. It's very clever.

  25. #2210
    Member
    Join Date
    Mar 2016
    Location
    Germany
    Posts
    4
    Thanks
    5
    Thanked 4 Times in 2 Posts
    +1 for index, and yes, there is use for it. Afaik incremental archiving of (very) large files on limited storage by moving parts to another storage layer is an exclusive feature of zpaq. In the (hopefully rare) case that you need to restore a version, it can be done on the layer where the parts are stored, and the large file then copied to primary storage. This is an outstanding concept to me; I understood from others that they make use of it day by day, and it should attract more users (as long as it's stable).

  26. #2211
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by batchman61 View Post
    +1 for index and yes, there is use for it. Afaik incremental archiving of (very) large files on limited storage by moving parts to another storage layer is an exclusive feature of zpaq.
    Talking about compressors, yes.
    For Windows imaging software, no, though in effect that is a "differential" (not incremental) backup.

    The usefulness of the index is, for me, limited, because you need the initial big chunk "somewhere" in order to extract (you cannot throw it away, of course).

    And if this is big (say a terabyte or more), a lot of time is needed to put "all the pieces" together (if you ask Matt to extract multipart from different paths, I think a volcano will rise).

    So, maybe, this can be useful for a "backup every time, restore almost never" scenario.
    For someone this can be worthwhile, no doubt.

    > I have no use for it, or for multipart for that matter. But somebody does.

    Maybe that is because you do not think (or work) as a sysadmin, or a DBA, or, even worse, a virtual-sysadmin.

  27. #2212
    Member
    Join Date
    Apr 2016
    Location
    Poland
    Posts
    16
    Thanks
    3
    Thanked 1 Time in 1 Post
    +1 for ZPAQ multi-part support. I'm using this feature every day. Thank you, Mr Mahoney, for reintroducing it in 7.10.

    My second concern is about comparing the contents of two copies of the same archive in separate locations, during a process I call 'consolidation'. I have remote branch office servers and an HQ backup server. A scheduled task on the branch servers backs up user data (using the multi-part feature); the new archive part is then copied (not moved) to the HQ server and put on tapes using a GFS scheme.

    Once per month I run the mentioned consolidation to purge deleted user data. I don't need to keep those files forever, as retention is provided by the tape archives. Consolidation is a simple process of unpacking the last archive version and compressing it again into a new archive as the first version. This is done independently on the branch office server and the HQ server (that's why the new data part is copied, not moved, to the HQ server), to avoid resending the first big archive over the WAN connection after consolidation. And here is my concern: how do I check that those two consolidated archives are the same? Using an MD5 (or any other) checksum of the XXX-001.zpaq files is not the proper way, as zpaq files store the timestamp of the operation, so the checksums will differ. Currently I'm using

    zpaq l <remote consolidated backup> <local files to backup> -not =

    command to achieve my goal, but I'm not convinced this is enough (it runs late at night, so a change of user data during the comparison is unlikely) and I feel it is not reliable. What would be helpful? The ability to generate a single checksum of all data in an archive. If both archives (remote branch and HQ) have the same checksum, then I'm sure both archives are the same. Mr Mahoney: is it possible to add an option to return such a checksum? Or, perhaps, is there a different way which I missed to achieve my goal?

  28. #2213
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by starczek View Post
    My second concern is about comparing...
    I think you want something like "list filenames WITH hash".
    Then sort, awk & compare.

  29. #2214
    Member
    Join Date
    Apr 2016
    Location
    Poland
    Posts
    16
    Thanks
    3
    Thanked 1 Time in 1 Post
    Quote Originally Posted by fcorbelli View Post
    I think you want something like "list filenames WITH hash".
    Then sort, awk & compare.
    Rather, one checksum for the whole archive content, i.e. as if one concatenated all files in the zpaq archive sorted by name, extension or date/time and calculated a single checksum. I do not need a hash for every file, as I only need to make sure that the 'consolidated' *.zpaq files on both servers are the same.

    Another solution would be to implement a purge command, something that was once implemented. Then I could remove versions older than one month (for example) from the archive using a built-in zpaq feature.

    EDIT
    I think I found a solution. It is quite embarrassing, but it is the obvious one... It is using an index file! Since my transfer method over the WAN makes me 100% sure that files have been transferred without problems, I can just move (this time not copy but move) the 001, 002, ... archives from branch to HQ, leaving only the index file (000) on the branch server. Then, after consolidation on the HQ server, all I have to do is copy only the index (000) from the HQ consolidated version back to the branch server (every branch server has a separate backup set). As a result, the backup procedure on the branch server will know which files are in the consolidated backup on the HQ server, and during the next archiving only a small file containing the differences will be created. And that small file will be moved from branch to HQ. And the procedure repeats.

    I know my case is a bit off-topic from the main focus of this thread. Sorry for that.
    Last edited by starczek; 8th April 2016 at 00:57. Reason: enlightenment

  30. #2215
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    I updated v7.10 again. I separated the multi-part and indexing features. Sometimes it makes more sense to have a multi-part archive without an index or an indexed single part archive. There is no need to tie these features together. So if you use * or ? in an archive name then you get a sequence of parts starting at 1 which are equivalent to their concatenation. There is no part 0. The add command creates a new archive using the next available part number. Extract and list work as usual.

    The option "-index <filename>" creates or updates an index for a remote archive, which is assumed to be consistent with it. For example, each update could be the following steps:

    zpaq add arc files -index in.zpaq (create arc.zpaq with changes and append metadata to in.zpaq)
    cat arc.zpaq >> remote.zpaq
    rm arc.zpaq

    If arc is multipart, then zpaq will guess the part number from the index as before. So "arc??" would create arc01.zpaq, arc02.zpaq, etc. You would not have to concatenate the parts. In other words the following are equivalent:

    zpaq707 add "arc??" files
    zpaq710 add "arc??" files -index arc00.zpaq

    Both features work with encryption. The archive and index are encrypted with the same key but different AES-256 CTR keystreams by changing the first byte of the salt. zpaq calculates the archive size from the index so it can use the correct CTR offset.
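
    For illustration, the keystream-offset arithmetic could look like this (a hypothetical sketch, not zpaq's code; aes_encrypt_block is an assumed AES-256 block primitive and ctr_xor_at is a made-up name):
    Code:
    #include <cstdint>
    #include <cstring>
    #include <cstddef>

    // Assumed primitive: encrypt one 16-byte block with AES-256.
    void aes_encrypt_block(const uint8_t key[32], const uint8_t in[16],
                           uint8_t out[16]);

    // XOR data with the CTR keystream starting at a given byte offset,
    // e.g. the archive size computed from the index.
    void ctr_xor_at(const uint8_t key[32], const uint8_t iv[16],
                    uint64_t offset, uint8_t* data, size_t len) {
      uint8_t ctrblk[16], stream[16];
      size_t i = 0;
      while (i < len) {
        uint64_t ctr = (offset + i) / 16;      // which keystream block
        size_t skip = (offset + i) % 16;       // position inside that block
        memcpy(ctrblk, iv, 16);
        for (int b = 15; b >= 0 && ctr; --b) { // add counter, big-endian
          unsigned sum = ctrblk[b] + (unsigned)(ctr & 255);
          ctrblk[b] = (uint8_t)sum;
          ctr = (ctr >> 8) + (sum >> 8);       // carry folds into the counter
        }
        aes_encrypt_block(key, ctrblk, stream);
        for (; skip < 16 && i < len; ++skip, ++i)
          data[i] ^= stream[skip];
      }
    }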

    Neither feature works with add -until, which truncates a normal archive. You can't index a streaming archive (-method s...). -index fails if the archive exists. As usual you can list or compare an index but not add or extract, provided you give it a .zpaq extension. It doesn't add one automatically.

    I also fixed a bug in v7.07-7.09 that caused extract -force to give a segmentation fault in Linux if the output file does not already exist. (It was closing a NULL FILE* in equal(), which somehow works in Windows.) I still need more testing before release.

    Anyway, one way to compare two archives (or an archive and an index) is to list them (maybe with -all) and compare the output with diff. This will show differences in sizes or dates but not file contents. To compare contents, you would have to extract one and compare to the other. You can compare like this:

    zpaq list archive dir (compare external dir with internal dir by dates and attributes).
    zpaq list archive dir -force (ignore dates and compare contents with stored SHA-1 hashes, slower).
    zpaq list archive dir1 -to dir2 (compare external dir1 to internal dir2).

    It is theoretically possible to compare the fragment hashes between two archives to compare contents without extracting provided they were created with the same -fragment option. I don't have the code for it, however.

  31. #2216
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I updated v7.10 again... It is theoretically possible to compare the fragment hashes between two archives to compare contents without extracting, provided they were created with the same -fragment option. I don't have the code for it, however.
    Thanks Matt for your work.
    Just a question: are the hashes of the files stored, or are they "recomputed" from the fragments?
    Is it possible to list() with the hashes exposed?

  32. #2217
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by starczek View Post
    Rather, one checksum for the whole archive content, i.e. as if one concatenated all files in the zpaq archive sorted by name, extension or date/time and calculated a single checksum. I do not need a hash for every file, as I only need to make sure that the 'consolidated' *.zpaq files on both servers are the same.
    I know, but I do not think that a single "big hash" is present.
    And if you get a difference, you cannot know "where" (which file is different).

  33. #2218
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    zpaqd will list hashes by ID, SHA-1, and size. Then it lists files followed by a list of fragment IDs. For example:

    Code:
    >zpaq a x enwik5 enwik6
    zpaq v7.10 journaling archiver, compiled Apr  7 2016
    Creating x.zpaq at offset 0 + 0
    Adding 1.100000 MB in 2 files -method 14 -threads 2 at 2016-04-08 15:09:30.
    90.91% 0:00:00 + enwik6 1000000
    100.00% 0:00:00 + enwik5 100000 -> 2904
    100.00% 0:00:00 [1..13] 1002964 -method 14,98,1
    2 +added, 0 -removed.
    
    0.000000 + (1.100000 -> 1.002904 -> 0.380685) = 0.380685 MB
    0.250 seconds (all OK)
    
    >zpaqd l x
    Block 1 at 0: 0.000 MB
    comp 0 0 0 0 0
    hcomp
    post 0 end
      4e20ea61 jDC20160408150930c0000000001 8 jDC☺ -> 103
      csize = 379664
    
    Block 2 at 104: 1.116 MB
    comp 9 16 0 20 0
    hcomp
     c-- *c=a a+= 255 d=a *d=c halt
    pcomp ;
     a> 255 jf 13 (to 17) a=0 b=0 c=0 d=0 r=a 1 r=a 2 r=a 3
     r=a 4 halt
     (17) a<<=d a+=c c=a a= 8 a+=d d=a a=r 1 a== 0 jf 51 (to 81)
     a= 1 r=a 2 a=c a&= 3 a> 0 jf 30 (to 71) a-- a<<= 3 r=a 3
     a=c a>>= 2 c=a b=r 3 a&= 7 a+=b r=a 3 a=c a>>= 3
     c=a a=d a-= 5 d=a a= 1 r=a 1 jmp 10 (to 81)
     (71) a=c a>>= 2 c=a d-- d-- a= 3 r=a 1
     (81) a=r 1 a== 1 jf 61 (to 148) a=d a> 2 jf 56 (to 148) a=c a&= 1 a== 1
     jf 21 (to 120) a=c a>>= 1 c=a b=r 2 a=c a&= 1 a+=b a+=b
     r=a 2 a=c a>>= 1 c=a d-- d-- jmp 26 (to 146)
     (120) a=c a>>= 1 c=a a=r 2 a<<= 2 b=a a=c a&= 3 a+=b
     r=a 2 a=c a>>= 2 c=a d-- d-- d-- a= 2 r=a 1
     (146) jmp -67 (to 81)
     (148) a=r 1 a== 2 jf 57 (to 211) a=r 3 a>d jt 52 (to 211) a=c r=a 6 a=d
     r=a 7 b=r 3 a= 1 a<<=b d=a a-- a&=c a+=d d=a
     b=r 4 a=b a-=d c=a d=r 2
     (182) a=d a> 0 jf 8 (to 195) d-- a=*c *b=a c++ b++ out
     jmp -13 (to 182)
     (195) a=b r=a 4 a=r 6 b=r 3 a>>=b c=a a=r 7 a-=b d=a
     a=0 r=a 1
     (211) a=r 1 a== 3 jf 43 (to 260) a=d a> 1 jf 38 (to 260) a=c a&= 1 a== 1
     jf 20 (to 249) a=c a>>= 1 c=a b=r 2 a&= 1 a+=b a+=b r=a 2
     a=c a>>= 1 c=a d-- d-- jmp 9 (to 258)
     (249) a=c a>>= 1 c=a d-- a= 4 r=a 1
     (258) jmp -49 (to 211)
     (260) a=r 1 a== 4 jf 34 (to 300) a=d a> 7 jf 29 (to 300) b=r 4 a=c *b=a
     out b++ a=b r=a 4 a=c a>>= 8 c=a a=d a-= 8
     d=a a=r 2 a-- r=a 2 a== 0 jf 3 (to 300) a=0 r=a 1
     (300) halt
    end
      3c5a480e jDC20160408150930d0000000001 1002964 jDC☺ -> 379663
    
    Block 3 at 379768: 0.000 MB (same model as block 1)
      44989e48 jDC20160408150930h0000000001 316 jDC☺ -> 413
      bsize = 379664
             1 bfe74853c56ddfc1bf3679f17315e4aa8774f6a9      97096
             2 049d60374cca5a77668ab28bf5665bea95b28a75      40605
             3 4730d81e92ad89d5997e2c3f8644cc8749415a44       6362
             4 880ff2b06a081c33de729c7339467e3704e5fa9d     171032
             5 d25b093a38bf9451d1def494a8d3bb697adee90d      92679
             6 98a0390fef53fdbb89e0461f449014a620dabb64      50294
             7 aba8d5254416be15ce29d6952dad7cad85a09303      94069
             8 de3a4cd75d94f5a0b1477bf3905386f850f17090     248559
             9 57dd45b79ea912325664bfc09bc927ebcca1e145      13664
            10 5565a9a1baa044ec8907ae742e63129e7dbc6ce0      54554
            11 442d2a43d591a8d005b4df2063c2a3a5263ab952      52462
            12 bea8e863b31cd2ee7c565c45f973fa5ec6e9bbf2      78624
            13 5325c289d1874ed7e004715b8c05339bda7b7077       2904
    
    Block 4 at 380182: 1.116 MB (same model as block 2)
      aa42b6e3 jDC20160408150930i0000000001 112 jDC☺ -> 502
      20110123200045 enwik5 7720000000 1 13
      20111020142224 enwik6 7720000000 1-12
    
    0.26 seconds
    The contents of the I block are the date in YYYYMMDDHHMMSS format, file name, attributes in hex (first byte 77 = 'w' for Windows or 75 = 'u' for Linux) and the fragment IDs. A range like 2-5 means 2 3 4 5.
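
    For illustration, a small hypothetical helper (not part of zpaq) that expands such an ID list:
    Code:
    #include <sstream>
    #include <string>
    #include <vector>

    // Expand a fragment ID list like "1 5 2-4" into 1 5 2 3 4.
    std::vector<int> expand_ids(const std::string& s) {
      std::vector<int> out;
      std::istringstream in(s);
      std::string tok;
      while (in >> tok) {
        std::string::size_type dash = tok.find('-');
        if (dash == std::string::npos) {
          out.push_back(std::stoi(tok));
        } else {
          int lo = std::stoi(tok.substr(0, dash));
          int hi = std::stoi(tok.substr(dash + 1));
          for (int i = lo; i <= hi; ++i) out.push_back(i);
        }
      }
      return out;
    }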

    This will also work for indexes. To create or update an index in v7.10:

    zpaq add "" files -index index.zpaq -method 0

    -method i is no longer supported. -method 0 means no compression, which has no effect other than to speed things up, because the compressed data is discarded into the empty archive "".

  34. #2219
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    Another v7.10 unreleased update.

    - Fixed update to multi-part archive so that when nothing is updated, no new part is created.
    - Re-implemented check for intermittent disk access when updating. (It was in 7.09 but temporarily removed in 7.10 pre-releases).
    - Failure to add files (permission denied, etc) is a warning instead of an error. The file is reported and skipped and zpaq will return status 0 and report "all OK".
    - You can use options -f -mN -sN -tN as abbreviations for -force, -method N, -summary N, -threads N. Other options cannot be abbreviated. (Commands a, x, l still work).
    - -summary option has no default. Use -s20 to list 20 biggest files.
    - -detailed is removed. Use -s-1 to list fragment IDs.
    - New simpler help screen.
    - zpaq.pod updated to document changes.

    > Another suggestion: make sure (check) that zpaq can work with an .exe extension instead of .zpaq, to become resilient to CryptoLocker.

    You can already use any extension:

    zpaq a archive.exe files...
    zpaq x archive.exe

    It only adds .zpaq if there is no extension. In Windows you can also use archive. and there will be no extension.

    wc line, word, byte counts:
    Code:
    3881  13344  129154 zpaq705.cpp (last stable version Apr 2015-Mar 2016)
    3962  13463  131731 zpaq706.cpp (fixes fuzz testing crashes)
    3962  13463  131733 zpaq707.cpp (bug fix: encrypted multi-part index salt)
    3308  11334  112834 zpaq708.cpp (XP fix, removes multi-part, key prompt, -nodelete, add -test)
    3310  11341  112908 zpaq709.cpp (bug fix: extract unnamed streaming file)
    3491  11961  118137 zpaq710a.cpp (re-adds multi-part with part 0 index)
    3517  12052  119078 zpaq710b.cpp (-index replaces part 0)
    3496  12012  118560 zpaq710c.cpp (simpler command line parsing)
    Edit: released. http://mattmahoney.net/dc/zpaq.html

  35. #2220
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Thank you indeed. I'm validating 7.10 by extracting ~1TB / ~1M files created yesterday with 7.09; it will take a few hours (about 8-10 including verification).
