
Thread: Archiver and compression utility suggestions

  1. #1
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Archiver and compression utility suggestions

    Hello,
    I am a newbie to the compression world and to this forum as well.


    I am looking for ways to do fast and smart compression and decompression for one of my datasets so I can quickly transfer data between a computer and a network share. My dataset is a mix of large files (multiple files ranging between 50-100 MB each) and a larger number of small files; there are at least ~5 times as many small files as large ones. The data changes, but the delta of the changes is fairly small, although the file metadata (i.e., last modified/accessed times) for most of the files gets updated periodically.


    Efficient compression is not my goal, but I am looking for ways to:
    (1) create an archive of a given directory, compressing as quickly as possible and without recompressing already compressed data (i.e., files/data that were compressed before and haven't been modified since);
    (2) extract an archive to a given directory, decompressing only the files that were modified and deleting any additional files that exist in the target directory (i.e., a mirror mode).


    The following is my scenario:
    (1) A compressed archive, say foo.archive, gets created locally on computerA and is copied to the network share.
    (2) foo.archive gets copied from the network share to computerA and extracted to the target directory.


    When step (1) repeats, I am wondering if there is a utility that updates the existing foo.archive with the latest data in a given directory without compressing already compressed data again (in a way, deduplication). When step (2) repeats, the utility should not bother decompressing data that is already on disk and has not been modified, but it should additionally mirror the files/folders in the archive (i.e., delete any additional files in the target directory).
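    To make requirement (2) concrete, the mirror step I have in mind would behave roughly like the sketch below (plain Python, purely for illustration: it syncs a staging directory that some archiver has already extracted into the real target, using a size+mtime comparison; the paths and the change test are my own assumptions, not tied to any particular tool).

        # Illustrative sketch only: mirror a staging tree (freshly extracted from
        # an archive) into the target directory -- copy new/changed files, delete
        # extras. "staging"/"target" paths and the size+mtime change test are
        # assumptions for this example, not features of any specific archiver.
        import os
        import shutil

        def mirror(staging: str, target: str) -> None:
            # Pass 1: copy files that are new or look modified (size or mtime differs).
            for root, _dirs, files in os.walk(staging):
                rel = os.path.relpath(root, staging)
                dst_dir = target if rel == "." else os.path.join(target, rel)
                os.makedirs(dst_dir, exist_ok=True)
                for name in files:
                    src = os.path.join(root, name)
                    dst = os.path.join(dst_dir, name)
                    s = os.stat(src)
                    if (not os.path.exists(dst)
                            or os.path.getsize(dst) != s.st_size
                            or int(os.path.getmtime(dst)) != int(s.st_mtime)):
                        shutil.copy2(src, dst)  # copy2 preserves the timestamp
            # Pass 2: delete anything in the target that is not in the staging tree.
            for root, dirs, files in os.walk(target, topdown=False):
                rel = os.path.relpath(root, target)
                src_dir = staging if rel == "." else os.path.join(staging, rel)
                for name in files:
                    if not os.path.exists(os.path.join(src_dir, name)):
                        os.remove(os.path.join(root, name))
                for name in dirs:
                    if not os.path.exists(os.path.join(src_dir, name)):
                        shutil.rmtree(os.path.join(root, name), ignore_errors=True)

        if __name__ == "__main__":
            mirror(r"C:\local\staging", r"C:\local\data")  # hypothetical paths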


    Is there an existing utility that meets the above requirements, either fully or partially? I would greatly appreciate any suggestions.


    Thanks!
    Last edited by TheEmptyMind; 8th July 2013 at 04:39.

  2. #2
    Member
    Join Date
    Apr 2013
    Location
    IT
    Posts
    71
    Thanks
    22
    Thanked 13 Times in 12 Posts
    ZPAQ http://mattmahoney.net/dc/zpaq.html has lots of options, recognizes already compressed data, and performs deduplication; it can work in incremental mode and supports journaling (useful for backups).
    It supports both fast compression/decompression methods and slow/high-compression modes.
    Backward and forward compatibility are supported by embedding the archive parameters and algorithms as a compact description; the program is also very actively updated (it's open source: executables are released, but it's also easy to compile and rebuild the executable from the source).
    There is a dedicated thread in this forum with frequent updates and news about ZPAQ from its author.

  3. #3
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks. I briefly tested ZPAQ earlier and liked its speed and compression ratio; however, I do not have a requirement to keep old versions of changed data in the archive. I dropped a note to Matt (the ZPAQ author/maintainer) asking whether there is a way to turn off versioning, and he replied that there isn't one today but that he may consider adding the feature later. He suggested that periodically recreating the archive would avoid storing multiple versions. Unfortunately, copying changed data back and forth between the computers and the network share and periodically recreating the archive from scratch may negate the benefit I get by opting to compress in the first place.

  4. #4
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I just tested ZPAQ with two data sets generated on two consecutive days. Each data set contains ~1200 files and is ~185 MB in size. I created a ZPAQ archive from the older day's data with "zpaq64.exe a data.zpaq C:\local\data.old -threads 4 -method 0 -quiet", which produced an archive of about 142 MB in 15 s (Intel Core 2 Duo, 3 GB RAM). I then ran "zpaq64.exe a data.zpaq C:\local\data.new -threads 4 -method 0 -quiet", which took about 10 s and grew the .zpaq archive to 194 MB (more than the original size). This essentially means I would have to recreate the archive from scratch almost every time, which may add even more overhead. I realize that ZPAQ's goals, in its current form, are different from what I am after, but I am curious to see whether I should tinker with other options, if any.
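    For anyone who wants to repeat this kind of test, a rough harness like the sketch below would do (Python; it just shells out to zpaq64.exe with the same a / -threads / -method / -quiet syntax quoted above and reports elapsed time and archive size; the data path and the list of methods to compare are placeholders).

        # Rough sketch: run the same zpaq add command with different -method
        # values and report elapsed time and resulting archive size.
        # zpaq64.exe must be on PATH; the data path and method list are
        # placeholders taken from the commands quoted in this post.
        import os
        import subprocess
        import time

        ZPAQ = r"zpaq64.exe"
        DATA = r"C:\local\data.new"

        for method in ("0", "1"):                  # methods discussed in this thread
            archive = f"data_m{method}.zpaq"
            if os.path.exists(archive):
                os.remove(archive)                 # start each test from scratch
            start = time.time()
            subprocess.run([ZPAQ, "a", archive, DATA,
                            "-threads", "4", "-method", method, "-quiet"],
                           check=True)
            elapsed = time.time() - start
            size_mb = os.path.getsize(archive) / (1024 * 1024)
            print(f"-method {method}: {elapsed:.1f} s, {size_mb:.1f} MB")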

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I would suggest the default method (-method 1). Method 0 just deduplicates and stores with no further compression. Method 1 is usually faster than zip with better compression, and decompresses at the same speed as method 0 (faster than it can write to disk). zpaq internally detects data that is already compressed and just stores it, so if most of your data is already compressed it will probably be almost as fast.

    I am working on adding a feature to purge old versions from an archive. This will involve copying the archive, skipping over compressed data blocks that are no longer referenced by the latest version of the index. This is not the ideal solution, but zpaq uses an append-only format, which makes it easy to store multiple versions but hard to implement a more traditional archive where the old versions are replaced with each update.
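    Conceptually (ignoring zpaq's real on-disk format), the purge amounts to something like the toy sketch below; Block, Archive, and all the field names are invented purely for illustration.

        # Toy illustration of purging an append-only archive: keep only the data
        # blocks still referenced by the latest version of the index, then
        # rewrite the archive. This is NOT zpaq's actual format.
        from dataclasses import dataclass, field
        from typing import Dict, List

        @dataclass
        class Block:
            block_id: int
            data: bytes

        @dataclass
        class Archive:
            blocks: List[Block] = field(default_factory=list)
            # One index per version: filename -> list of block ids making up the file.
            versions: List[Dict[str, List[int]]] = field(default_factory=list)

        def purge_old_versions(arc: Archive) -> Archive:
            latest = arc.versions[-1]
            # Block ids still referenced by the latest version only.
            referenced = {bid for ids in latest.values() for bid in ids}
            # Copy the archive, skipping blocks that are no longer referenced.
            kept = [b for b in arc.blocks if b.block_id in referenced]
            return Archive(blocks=kept, versions=[latest])

        # Example: two versions; the update replaces file "a", so block 0 becomes garbage.
        arc = Archive(
            blocks=[Block(0, b"old a"), Block(1, b"b"), Block(2, b"new a")],
            versions=[{"a": [0], "b": [1]},
                      {"a": [2], "b": [1]}],
        )
        purged = purge_old_versions(arc)
        print([b.block_id for b in purged.blocks])  # [1, 2]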

    Incremental extract would also be a nice feature. Currently if you wanted to restore a directory tree to an earlier state you would have to delete the directory and then extract the version you wanted.

