
Thread: eXdupe 0.4.1 with complete source

  1. #1
    rrrlasse · Member · Denmark · joined Jun 2013

    eXdupe 0.4.1 with complete source

    I've released the complete source (that is, including the core deduplication library) on www.exdupe.com

    The problem with rsync and other rolling-hash schemes is that they scatter their hash-table accesses randomly, causing roughly one expensive L1 cache miss per input byte.

    eXdupe computes hash values in another way. For any 8 KB chunk it increments i in a loop and identifies the first position that fulfills

        char(SMALL_PRIME * (653 + src[i] + src[i + 33*percent] + src[i + 66*percent] + src[i + len - slide - 4])) == 0

    Since the mixed byte equals zero with probability 1/256, this is fulfilled after 256 bytes on average. From *that* i-position, a quick hash (let's call it Q) is computed from a handful of bytes.
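    To make the search concrete, here is a minimal C++ sketch of the loop. SMALL_PRIME's actual value and the exact meanings of "percent" and "slide" are in the released source; the placeholder value and bounds below are assumptions of mine:

        #include <cstddef>

        // Sketch of the anchor search described above. 709 is a placeholder
        // for SMALL_PRIME; the caller must guarantee that every sampled index
        // stays inside src.
        static const unsigned SMALL_PRIME = 709;

        static size_t find_anchor(const unsigned char* src, size_t len,
                                  size_t percent, size_t slide, size_t max_i) {
            for (size_t i = 0; i < max_i; i++) {
                // A byte-sized mix of four sampled bytes; it equals zero with
                // probability ~1/256, so the loop ends after ~256 steps on average.
                unsigned char mix = (unsigned char)(SMALL_PRIME *
                    (653u + src[i] + src[i + 33 * percent] + src[i + 66 * percent]
                          + src[i + len - slide - 4]));
                if (mix == 0)
                    return i;
            }
            return max_i; // no qualifying position in this range
        }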

    The Q value is used as the index for a stronger hash of the entire 8 KB block, starting from offset 0, so that hashtable[Q] = strong hash.

    eXdupe performs two passes over the input data: first it hashes each consecutive 8 KB block; then it uses an 8 KB sliding window to test for blocks seen in the first pass. The inner loop of the second pass is basically just the position-search loop above; it does no hash-table lookups at all until a qualifying position is found, i.e. once per 256 bytes on average.
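    The two-pass structure can be sketched as follows. The hash functions, the simplified one-byte anchor test, and the idea of storing the anchor's offset beside the strong hash are stand-ins of mine; the released source has the real scheme:

        #include <cstddef>
        #include <cstdint>
        #include <unordered_map>

        static const size_t BLOCK = 8 * 1024;

        // Simplified anchor test (one mixed byte instead of four): zero with
        // probability ~1/256; returns n if no anchor exists in [0, n - 8).
        static size_t anchor(const unsigned char* p, size_t n) {
            for (size_t i = 0; i + 8 <= n; i++)
                if ((unsigned char)(709u * (653u + p[i] + p[i + 7])) == 0)
                    return i;
            return n;
        }

        // Quick hash Q over a handful of bytes at the anchor (FNV-1a here).
        static uint32_t quick_hash(const unsigned char* p) {
            uint32_t h = 2166136261u;
            for (int k = 0; k < 8; k++) h = (h ^ p[k]) * 16777619u;
            return h;
        }

        // Strong hash of a whole block (FNV-1a standing in for a real one).
        static uint64_t strong_hash(const unsigned char* p, size_t n) {
            uint64_t h = 1469598103934665603ull;
            for (size_t k = 0; k < n; k++) h = (h ^ p[k]) * 1099511628211ull;
            return h;
        }

        struct Entry { uint64_t strong; size_t anchor_off; };

        // Pass 1: hash each consecutive 8 KB block, keyed by the quick hash
        // at the block's anchor position.
        static void pass1(const unsigned char* data, size_t n,
                          std::unordered_map<uint32_t, Entry>& table) {
            for (size_t off = 0; off + BLOCK <= n; off += BLOCK) {
                size_t a = anchor(data + off, BLOCK);
                if (a + 8 <= BLOCK)
                    table[quick_hash(data + off + a)] =
                        { strong_hash(data + off, BLOCK), a };
            }
        }

        // Pass 2: slide over the input. The inner loop is just the anchor
        // search, so a table lookup happens only ~once per 256 input bytes.
        static void pass2(const unsigned char* data, size_t n,
                          const std::unordered_map<uint32_t, Entry>& table) {
            size_t i = 0;
            while (i + BLOCK <= n) {
                size_t a = anchor(data + i, BLOCK);
                if (a + 8 > BLOCK) { i += BLOCK - 8; continue; } // no anchor: skip ahead
                auto it = table.find(quick_hash(data + i + a));
                if (it != table.end() && i + a >= it->second.anchor_off) {
                    size_t start = i + a - it->second.anchor_off; // candidate block start
                    if (start + BLOCK <= n &&
                        strong_hash(data + start, BLOCK) == it->second.strong) {
                        // Duplicate 8 KB block at 'start': emit a reference here.
                        i = start + BLOCK;
                        continue;
                    }
                }
                i += a + 1; // resume just past this anchor
            }
        }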

    Regards

    Lasse Reinhold
    Last edited by rrrlasse; 9th June 2013 at 23:46.

  2. #2
    Matt Mahoney · Expert · Melbourne, Florida, USA · joined May 2008
    Isn't that problem also solved by using the same rolling hash to decide fragment boundaries on the first pass?
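    For context, here is a minimal sketch of the idea Matt refers to: content-defined chunking, shown with a gear-style rolling hash (as used by e.g. FastCDC; the table seed and mask are illustrative, not taken from any particular program). Boundaries depend only on content, so identical data yields identical fragments regardless of alignment:

        #include <array>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // 256 pseudo-random 64-bit values (splitmix64), filled at startup.
        static const std::array<uint64_t, 256> GEAR = [] {
            std::array<uint64_t, 256> t{};
            uint64_t x = 0;
            for (auto& v : t) {
                x += 0x9e3779b97f4a7c15ull;
                uint64_t z = x;
                z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
                z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
                v = z ^ (z >> 31);
            }
            return t;
        }();

        // Cut a buffer into fragments averaging ~8 KB; a boundary is declared
        // wherever the rolling hash's low 13 bits are all zero.
        static std::vector<size_t> boundaries(const unsigned char* data, size_t n) {
            const uint64_t MASK = (1u << 13) - 1;
            std::vector<size_t> cuts;
            uint64_t h = 0;
            for (size_t i = 0; i < n; i++) {
                h = (h << 1) + GEAR[data[i]];    // old bytes age out as bits shift away
                if ((h & MASK) == 0)
                    cuts.push_back(i + 1);       // fragment ends after byte i
            }
            cuts.push_back(n);                   // final partial fragment
            return cuts;
        }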

  3. #3
    Bulat Ziganshin · Programmer · Uzbekistan · joined Mar 2007
    eXdupe was the first deduplication program to break the 1 GB/s barrier, and your algorithm looks very original, but now it seems outdated. The simple but ingenious zpaq algorithm, as implemented in srep 3.9, runs 5x faster and provides a better compression ratio.
