Results 1 to 2 of 2

Thread: Hashing

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts

    Hashing

    How to select proper hash algo

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts

    UMAC & VMAC

    UMAC and VMAC (site) are the fastest MAC algorithms available. They include UHASH/VHASH as part of their algorithms and, to the best of my knowledge, U/VHASH can be used as fast & reliable hash algorithms - more reliable than any non-cryptographic ones (crc,xxhash,cityhash) and even cryptograhically secure as far as attacker doesn't know the hash key. Moreover, since U/VHASH are keyed hashes, you can calculate hash of any K*N size, where N is the size of single hash, by calling them K times with different key values.

    Main loop of both hashes is based on very fast family of universal hashes:

    sum = (data[0]+key[0])*(data[1]+key[1]) + (data[2]+key[2])*(data[3]+key[3]) + ...
    result = sum >> (sizeof(*data)*8)

    UHASH is 32-bit hash, i.e. data[i], key[i] and result are 32-bit values. data[i]+key[i] is 32-bit operation, while multiplication is 32*32->64 and sum is 64-bit. VHASH is 64-bit hash, i.e. everything is twice as wide. Both have "double versions" that return double-wide result by performing the calculation twice (with different keys, of course). The "double version" reduces number of memory loads, though, so it's ~25% faster than two runs of ordinary version.

    By the definition of univeral hashing, every data[i] should have its own, independent key[i]. So U/VHASH employs this univeral hashing loop only as the base building block. Only 8..8192 (VMAC_NHBYTES) bytes are generated for key[] and therefore separate univeral hash value is computed for each VMAC_NHBYTES bytes of input data. Then those block hashes are combined using much slower but cryptographically-strong algorithms.

    Since VHASH performs 64*64->128 multiplication and requires register memory enough to storing two 128-bit values, it turns very slow on plain x86 register-poor architecture. With VMAC_NHBYTES=8192, i got 5 cpb for pure x86, 1 cpb for x86-SSE2 (hand-optimized algorithm storing data in MMX registers) and 0.4 cpb for plain x64 - all these for VHASH-128 algo. VHASH-64 was 0.25 cpb.

    UMAC should be much faster on x86, especially with SSE2/AVX2 - 2*UMAC-64 should be 0.5 cpb with SSE2, 0.25 cpb with AVX2. AVX512 (expected in Skylake CPUs) includes 64*64->128 SIMD multiplication, so with AVX512 VHASH should be faster than UHASH once again.

    VMAC is currently used by SREP as block checksum and for deduplication. Forthcoming FA'Next uses VMAC for deduplication if chunk checksums aren't stored to the archive. Crypto++ includes VMAC implementation.
    Last edited by Bulat Ziganshin; 19th May 2015 at 10:54.

  3. The Following 2 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    Matt Mahoney (19th May 2015),ne0n (19th May 2015)

Similar Threads

  1. Superfast hashing algorithm
    By gpnuma in forum Data Compression
    Replies: 22
    Last Post: 20th September 2015, 12:44
  2. hashing LZ
    By willvarfar in forum Data Compression
    Replies: 13
    Last Post: 24th August 2010, 20:29
  3. What type of hashing is this.
    By Earl Colby Pottinger in forum Data Compression
    Replies: 11
    Last Post: 22nd June 2010, 05:23
  4. Knuth's Multiplicative Hashing
    By encode in forum Data Compression
    Replies: 5
    Last Post: 28th May 2008, 22:18
  5. A nice article about hashing
    By encode in forum Forum Archive
    Replies: 19
    Last Post: 26th September 2007, 21:31

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •