Thread: Virtual Hard Disk Compress/Dedupe

  1. #1
    JayM (Member, ZA)

    Virtual Hard Disk Compress/Dedupe

    Greetings all!

    I'm looking for advice on which compression utility would be best for compressing a large set of .VHD (virtual hard disk) files: 10-20 TB with plenty of duplication due to (mostly Windows) OS files, plus lots of SQL data inside the .VHD files.

    We have tried qpress and eXdupe (which is pretty good, but see below).

    What we're looking for:

    - stable
    - reasonable compression
    - speed
    - can extract individual files
    - supports directory recursion
    - can benefit from lots of memory (64 GB+ free) and many CPU cores at low clock speed (quad-socket Xeon E5-4640)

    eXdupe usually works quite well, but we notice that it sometimes struggles in certain places inside the VHD files, where speed drops from 300-400 MB/s right down to 10-20 MB/s. The compression ratio on our files is between 5:1 and 10:1, which is acceptable.

    So, what should we be looking at?

    Thanks

    J

  2. #2
    Matt Mahoney (Expert, Melbourne, Florida, USA)
    You can try zpaq with the default compression level: http://mattmahoney.net/dc/zpaq.html
    In my tests it compresses better than eXdupe at almost the same speed.

  3. #3
    JayM (Member, ZA)
    Quote Originally Posted by Matt Mahoney:
    You can try zpaq with the default compression level: http://mattmahoney.net/dc/zpaq.html
    In my tests it compresses better than eXdupe at almost the same speed.
    Thanks Matt, will try it out.

    Does it do dedupe on the first run, or only on subsequent runs when updating an existing archive?

  4. #4
    Matt Mahoney (Expert, Melbourne, Florida, USA)
    zpaq does dedupe on all updates, including the first. It splits files into fragments with an average size of 64 KB (range 4 KB-508 KB), compares their SHA-1 hashes, and stores each fragment only once. It checks against fragments added in previous updates as well as fragments added earlier in the current update.

    zpaq is a journaling, append-only archive. When you do a backup, if a file's date has changed, zpaq appends the new version but keeps the old one. When you extract, you can give a date and zpaq will extract the latest version of each file from the archive as it existed on that date.

    zpaq v6.36 is the latest stable version. In 6.37 I am adding a purge command to remove old versions, but it needs more testing.
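
    To illustrate the fragment dedupe and journaling ideas described above, here is a minimal Python sketch. It is not zpaq's actual code: the rolling-hash scheme and its constants are made up for the example; only the 4 KB / 64 KB / 508 KB fragment sizes, the SHA-1 keying, and the append-only dated versions come from the post.

    Code:
    import hashlib

    # Toy fragment-level dedupe: split data into variable-size fragments,
    # key each fragment by its SHA-1 hash, and store each unique fragment once.
    MIN_FRAG = 4 * 1024       # 4 KB minimum fragment size
    MAX_FRAG = 508 * 1024     # 508 KB maximum fragment size
    AVG_BITS = 16             # cut when the low 16 hash bits are zero -> ~64 KB average

    def fragments(data):
        """Yield content-defined fragments using a toy rolling hash."""
        start, h = 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + b) & 0xFFFFFFFF        # toy rolling hash, not zpaq's
            size = i + 1 - start
            at_cut = (h & ((1 << AVG_BITS) - 1)) == 0
            if size >= MAX_FRAG or (size >= MIN_FRAG and at_cut):
                yield data[start:i + 1]
                start, h = i + 1, 0
        if start < len(data):
            yield data[start:]

    def dedupe(data, store):
        """Store each unique fragment once; return the ordered list of SHA-1 keys."""
        keys = []
        for frag in fragments(data):
            key = hashlib.sha1(frag).hexdigest()
            if key not in store:      # not seen in this update or any earlier one
                store[key] = frag     # only new fragments are kept (and compressed)
            keys.append(key)          # duplicates cost only a reference
        return keys

    def add_version(journal, store, name, date, data):
        """Append a dated version of a file; old versions stay in the journal."""
        journal.append((date, name, dedupe(data, store)))

    def extract_as_of(journal, store, name, date):
        """Return the latest version of 'name' as the archive existed on 'date'."""
        keys = None
        for d, n, k in journal:       # journal is in append (chronological) order
            if n == name and d <= date:
                keys = k
        return b"".join(store[key] for key in keys) if keys is not None else None

    Restoring a file is just concatenating its fragments by key, which is why many copies of the same Windows system files inside different VHDs mostly collapse into references instead of being stored again.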

  5. #5
    JayM (Member, ZA)
    Quote Originally Posted by Matt Mahoney:
    zpaq does dedupe on all updates, including the first. It splits files into fragments with an average size of 64 KB (range 4 KB-508 KB), compares their SHA-1 hashes, and stores each fragment only once. It checks against fragments added in previous updates as well as fragments added earlier in the current update.

    zpaq is a journaling, append-only archive. When you do a backup, if a file's date has changed, zpaq appends the new version but keeps the old one. When you extract, you can give a date and zpaq will extract the latest version of each file from the archive as it existed on that date.

    zpaq v6.36 is the latest stable version. In 6.37 I am adding a purge command to remove old versions, but it needs more testing.
    Thanks Matt. I have been trying it out, and it does seem to compress better than eXdupe. Unfortunately, what eXdupe took 1 hour to compress took zpaq over 2 hours. We are backing up to a network share, not local disk - do you think that could be why it is so much slower?

  6. #6
    Matt Mahoney (Expert, Melbourne, Florida, USA)
    Backing up to a network share will be limited by your network speed. For 1 Gb/s Ethernet, the practical limit is about 30 MB/s (roughly 25% of theoretical capacity due to packet collisions). That limit applies to the data after compression and deduplication for zpaq and, I assume, for eXdupe as well. eXdupe uses LZ4, which is faster than zpaq's default LZ77 but doesn't compress as well. What compression ratios did you get?

    Also, for archives smaller than available RAM, the first test will probably be slower than the second because the input is not yet cached in memory. But I assume if it took an hour we are talking about hundreds of GB.

    Also, I released zpaq v6.38 yesterday. It includes a compare function and a bug fix for extraction, but there should be no difference in compression speed or ratio.
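
    As a rough check on those numbers (a back-of-the-envelope sketch; the 500 GB input size and 7:1 ratio are hypothetical, only the ~30 MB/s figure and the 5:1-10:1 range come from this thread):

    Code:
    # Back-of-the-envelope estimate of the network-share bottleneck.
    theoretical_mb_s = 1000 / 8                 # 1 Gb/s Ethernet = 125 MB/s
    practical_mb_s = 0.25 * theoretical_mb_s    # ~31 MB/s, the ~30 MB/s cited above

    input_gb = 500                              # hypothetical backup size ("hundreds of GB")
    ratio = 7                                   # mid-range of the 5:1..10:1 reported earlier
    output_gb = input_gb / ratio                # data actually written to the share

    hours = output_gb * 1024 / practical_mb_s / 3600
    print(f"~{output_gb:.0f} GB written, ~{hours:.2f} h at {practical_mb_s:.0f} MB/s")

    At the 300-400 MB/s input rates and 5:1-10:1 ratios mentioned in post #1, the compressed output stream alone is roughly 30-80 MB/s, so a 1 Gb/s share can plausibly become the limiting factor rather than the compressor.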

  7. #7
    rrrlasse (Member, Denmark)
    Quote Originally Posted by JayM:
    eXdupe usually works quite well, but we notice that it sometimes struggles in certain places inside the VHD files, where speed drops from 300-400 MB/s right down to 10-20 MB/s. The compression ratio on our files is between 5:1 and 10:1, which is acceptable.
    Hi JayM,

    eXdupe 0.4.2 has just been released, which fixes that problem.


