
Thread: Extremely fast hash

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts

    Extremely fast hash

I want to develop one more hash for use in SREP. Typically it will be applied to 256/512-byte blocks, less often to blocks of 32..65536 bytes. It doesn't need to be especially fair - right now I use a plain polynomial hash (a[0]+K*a[1]+K*K*a[2]+...) since it supports rolling computation, so anything of the same or better quality will do, and CRC quality would be fine. crc32c is already used as the first, rolling hash, so it's not an option. Preferably the new hash would be rolling, but that's not 100% necessary - on my 100 GB test file I need to compute 80 million 512-byte hashes or 2 million 64 KB hashes, so a fast non-rolling hash would be no worse than a rolling one that is several times slower.

Right now I'm considering the following variants:

1) SIMD-tuned fast semi-rolling hash - since Intel CPUs can perform SSE (4*int32) multiplication with a throughput of 1 and a latency of (AFAIR) 5, it seems we can reach a speed of 16 bytes/cycle using a polynomial hash over 32-bit words with a stripe of 5*4:

    hash0 = w[0]+K^4*w[4]+K^8*w[8]+...
    hash1 = w[1]+K^4*w[5]+K^8*w[9]+...
    ...
    hash = hash0+K*hash1+K^2*hash2+K^3*hash3

where hash0, hash1, hash2, hash3 are the 32-bit words of a single SSE register; the actual scheme should use 5 SSE registers holding 20 32-bit words in order to hide the SSE MUL latency. The scheme is easily ported to AVX2, Phi or plain 32-bit CPUs with a given number of registers and MUL latency. I call it semi-rolling since you can only roll it by 4 bytes at a time, so it needs to keep the last 4 hash values instead of 1. So it's both fast and rolling, but I'm pretty sure its fairness would be even lower than that of the byte-grain polynomial hash.

2) something fairer, but non-rolling. The main idea is to keep exploiting the Intel SSE ability to perform a 4*32-bit MUL each cycle, so it would process 16 bytes/cycle with a MUL and a few other operations - like xxHash, but using something faster for SSE instead of the ROL operations.

3) semi-universal hashing. Each cycle loads 16 bytes of input data and 16 bytes of constants, performs a 4*32-bit MUL between them, and adds the 4*32-bit results to the accumulator register. So it would process the same 16 bytes/cycle, non-rolling, with good fairness, although the lower bits will be less fair.

4) almost universal hashing: hash = (data0+const0)*(data1+const1) + (data2+const2)*(data3+const3) + ... It is universal hashing if the MUL operation used is 32*32=64-bit; instead I plan to accumulate the high and low 32 bits of the products separately and then just add them together to produce a 32-bit hash. It needs 4 loads, 4 adds and 2 MULs per 32 bytes of input data, so it may also fit into the 16 bytes/cycle niche (a scalar sketch of this variant follows the list).
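
    A minimal scalar sketch of that fourth variant, with hypothetical names and placeholder constants (a real implementation would fill consts with random values and vectorize the loop with SSE):

    #include <stdint.h>
    #include <stddef.h>

    /* Variant 4 sketch: multiply (data+const) pairs, accumulate the high and low
       32 bits of the 64-bit products separately, then fold the two accumulators
       together into a 32-bit hash. */
    static uint32_t almost_universal_hash(const uint32_t *data, size_t nwords,
                                          const uint32_t *consts)
    {
        uint32_t acc_lo = 0, acc_hi = 0;
        for (size_t i = 0; i + 1 < nwords; i += 2) {
            uint64_t p = (uint64_t)(data[i]     + consts[i])
                       * (uint64_t)(data[i + 1] + consts[i + 1]);
            acc_lo += (uint32_t)p;          /* low 32 bits of the product  */
            acc_hi += (uint32_t)(p >> 32);  /* high 32 bits of the product */
        }
        return acc_lo + acc_hi;             /* fold both accumulators into 32 bits */
    }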

What do you think about these hashes? Can you propose other fast ones?

    Existing fast hashes:
    Last edited by Bulat Ziganshin; 10th July 2013 at 20:23.

  2. #2
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    This paper might be helpful, if you haven't seen it:

    http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf

  3. #3
    Member
    Join Date
    Apr 2012
    Location
    Denmark
    Posts
    65
    Thanks
    21
    Thanked 16 Times in 14 Posts
Just as a side note, all "pmul[..] xmm1, xmm2" instructions have a latency of only 3 on Intel, except for Atom where it is 5. Either way, memory speed/load latency will already be your limiting factor.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by sh0dan View Post
Just as a side note, all "pmul[..] xmm1, xmm2" instructions have a latency of only 3 on Intel, except for Atom where it is 5. Either way, memory speed/load latency will already be your limiting factor.
Thanks. I thought it was 3 for MUL and 5 for PMUL. A smaller latency allows the use of fewer registers, which are especially scarce in 32-bit code.

Hashing isn't an isolated task, so the data will already be in L1 cache, which can provide 2*16 bytes each cycle (and with AVX2 everything will be 2x faster). The real limiting factor is the limited throughput of the MUL/PMUL operations. BTW, do you know whether MUL and PMUL can be performed simultaneously, i.e. on different pipelines?

Is it correct to expect that 4 loads, 4 adds and 2 MULs (all SSE) can be processed in just 2 cycles on Sandy Bridge? With enough interleaving, of course.



    Quote Originally Posted by nburns View Post
    This paper might be helpful, if you haven't seen it:

    http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf
Thank you, it's interesting, but not for this case. I already have a potential match, and comparing hashes is more efficient than Bloom filtering.
    Last edited by Bulat Ziganshin; 10th July 2013 at 17:33.

  5. #5
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
I want to develop one more hash for use in SREP. Typically it will be applied to 256/512-byte blocks, less often to blocks of 32..65536 bytes. It doesn't need to be especially fair - right now I use a plain polynomial hash (a[0]+K*a[1]+K*K*a[2]+...) since it supports rolling computation, so anything of the same or better quality will do, and CRC quality would be fine. crc32c is already used as the first, rolling hash, so it's not an option. Preferably the new hash would be rolling, but that's not 100% necessary - on my 100 GB test file I need to compute 80 million 512-byte hashes or 2 million 64 KB hashes, so a fast non-rolling hash would be no worse than a rolling one that is several times slower.

Right now I'm considering the following variants:

1) SIMD-tuned fast semi-rolling hash - since Intel CPUs can perform SSE (4*int32) multiplication with a throughput of 1 and a latency of (AFAIR) 5, it seems we can reach a speed of 16 bytes/cycle using a polynomial hash over 32-bit words with a stripe of 5*4:

hash0 = w[0]+K^4*w[4]+K^8*w[8]+...
hash1 = w[1]+K^4*w[5]+K^8*w[9]+...
...
hash = hash0+K*hash1+K^2*hash2+K^3*hash3

where hash0, hash1, hash2, hash3 are the 32-bit words of a single SSE register; the actual scheme should use 5 SSE registers holding 20 32-bit words in order to hide the SSE MUL latency. The scheme is easily ported to AVX2, Phi or plain 32-bit CPUs with a given number of registers and MUL latency. I call it semi-rolling since you can only roll it by 4 bytes at a time, so it needs to keep the last 4 hash values instead of 1. So it's both fast and rolling, but I'm pretty sure its fairness would be even lower than that of the byte-grain polynomial hash.

2) something fairer, but non-rolling. The main idea is to keep exploiting the Intel SSE ability to perform a 4*32-bit MUL each cycle, so it would process 16 bytes/cycle with a MUL and a few other operations - like xxHash, but using something faster for SSE instead of the ROL operations.

3) semi-universal hashing. Each cycle loads 16 bytes of input data and 16 bytes of constants, performs a 4*32-bit MUL between them, and adds the 4*32-bit results to the accumulator register. So it would process the same 16 bytes/cycle, non-rolling, with good fairness, although the lower bits will be less fair.

4) almost universal hashing: hash = (data0+const0)*(data1+const1) + (data2+const2)*(data3+const3) + ... It is universal hashing if the MUL operation used is 32*32=64-bit; instead I plan to accumulate the high and low 32 bits of the products separately and then just add them together to produce a 32-bit hash. It needs 4 loads, 4 adds and 2 MULs per 32 bytes of input data, so it may also fit into the 16 bytes/cycle niche.

What do you think about these hashes? Can you propose other fast ones?

    Existing fast hashes:
1. The fastest published hashes for 64-bit platforms that I'm aware of are CityHash (the variant that uses hardware CRC), fnv1a-Penumbra, fnv1a-Tesla2 and xxhash256, depending on the platform.
2. fnv1a-YoushimitsuTRIADiiXMM and Penumbra (they are the same except that the latter is unrolled) are quite similar to what you propose: multiplication, XOR and rotation implemented in SSE2 (also AVX, but nobody has ever tried them on AVX-capable machines and shown results). A rotation instruction is not available, so it's actually 2 shifts and an OR (see the small SSE2 sketch after this list). They do 5.19 and 5.22 bytes/cycle on the author's platform (and less on mine). 16 bytes/cycle is going to be hard.
3. SSE2 offers 128 bits of processing capacity per cycle. AVX - 256. The multiple ALUs of AMD - 192; of Intel - 256. There's also an FPU. That's another, what, 64?
I'd like to see somebody try to interleave 3 hashes (1 on the vector engine, 1 on the ALUs and 1 on the FPU) and combine them at the end. Would that work? If yes, how would CPU power management react? ^_^
4. One interesting thing is that most modern CPUs have all 3: a vector engine, multiple ALUs and an FPU. If you relied on the compiler to generate the code for you, you could achieve fair performance portability (though the consumption ratios would have to be tweakable; actually, there would probably be many more tweaks needed).
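
    As a small illustration of point 2, here's how a per-lane 32-bit rotate is typically emulated in SSE2 (a generic sketch, not the actual fnv1a-Yoshimitsu code):

    #include <emmintrin.h>  /* SSE2 */

    /* SSE2 has no rotate instruction, so a rotate-left of each 32-bit lane is
       built from two shifts and an OR. */
    static inline __m128i rotl32_epi32(__m128i v, int r)
    {
        return _mm_or_si128(_mm_slli_epi32(v, r), _mm_srli_epi32(v, 32 - r));
    }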
    Last edited by m^2; 10th July 2013 at 19:07.

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    m^2,

1. Sandy Bridge still has only 3 ALUs, so it may be better to use them all for SSE operations rather than integer/FP ones. I suggest you look into Agner Fog's optimization manuals in order to count CPU cycles properly.
2. The ROL operation is pretty slow in SSE, which limits the speed of xxHash. Another problem is that an SSE xxHash implementation utilizes only one SSE register, so it needs 3+1+1+1+3 cycles per 16 bytes processed. By utilizing 3 SSE registers, it can fit into 2 cycles per 16 bytes (limited by the execution of 2 PMUL operations), and 2x more with AVX2. xxHash256 can't utilize SSE at all, making it a lame duck among really fast algorithms.
3. CityHashCrc is limited to 4/8 bytes per cycle - the throughput of the CRC32C operation.
    Last edited by Bulat Ziganshin; 10th July 2013 at 19:47.

  7. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
Thank you, it's interesting, but not for this case. I already have a potential match, and comparing hashes is more efficient than Bloom filtering.
That paper mostly deals with hash functions. Even though it talks about Bloom filters, I think it's still relevant to your case, because either way the goal is to minimize collisions.

Also, I tried some of the hashes on this page, and the code works. I didn't do any comparative testing. The Wikipedia page on rolling hashes has good references.

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
This paper is about using just 2 hash functions for a Bloom filter; it says nothing about the construction of fast hashes. Yes, it solves the general problem of fast is-member-of-set checking, just in a way that I see as inapplicable to SREP.
    Last edited by Bulat Ziganshin; 10th July 2013 at 20:04.

  9. #9
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
Your first version (the "almost" rolling hash, on 4-byte boundaries) looks fairly interesting.
    And quite achievable.

The only downside I can foresee is the recommendation for SSE loads to be 16-byte aligned.
There are also instructions for unaligned loads, but Intel states they incur a "large performance penalty".
How large is it really? It's not clear; maybe this cost is overstated.

    [edit] I've just been playing with "aligned load" vs "unaligned load" SSE instructions, and the differences are not large at all, at least on my CPU (i5-3340M).
When the data is in fact aligned, using the "unaligned" intrinsic has a negligible cost, somewhere in the 1-2% region.
When the data is not aligned, the "aligned" intrinsic crashes (obviously), and the "unaligned" intrinsic succeeds with a performance cost of about 15%.
    Not bad. The "large" performance warning seems overblown.
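
    For reference, these are the two intrinsics being compared (a bare sketch; they map to movdqa and movdqu):

    #include <emmintrin.h>  /* SSE2 */

    /* _mm_load_si128 (movdqa) requires p to be 16-byte aligned and faults otherwise;
       _mm_loadu_si128 (movdqu) accepts any address. */
    static __m128i load_aligned  (const void *p) { return _mm_load_si128 ((const __m128i *)p); }
    static __m128i load_unaligned(const void *p) { return _mm_loadu_si128((const __m128i *)p); }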
    Last edited by Cyan; 12th July 2013 at 09:57.

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by sh0dan View Post
Just as a side note, all "pmul[..] xmm1, xmm2" instructions have a latency of only 3 on Intel, except for Atom where it is 5. Either way, memory speed/load latency will already be your limiting factor.
I've consulted a CPU analyst, and he said that it's an error in the Intel manual; the correct value, as measured by Agner Fog and written in his manual, is 5. Anyway, we will check it.

    Quote Originally Posted by Cyan View Post
The only downside I can foresee is the recommendation for SSE loads to be 16-byte aligned.
That's very outdated, from around Pentium 4 times. AFAIR, on Sandy Bridge it should have no penalty at all.

Meanwhile, I've realized that even my usual byte-granular polynomial (rolling) hash may be computed in 4-byte steps - i.e. load 4 bytes into the dwords of an SSE register, add them to the accumulator SSE register, and then multiply every dword in the accumulator by K^4. It should allow processing 4 bytes per cycle (8 for AVX2), limited by the throughput of the PMUL operation.
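
    A rough sketch of that 4-byte-step idea with SSE4.1 intrinsics (K here is just a placeholder multiplier, and the final per-lane weights follow the hash0..hash3 combination from the first post):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <smmintrin.h>  /* SSE4.1: _mm_cvtepu8_epi32, _mm_mullo_epi32 */

    #define K 0x9E3779B1u   /* placeholder multiplier */

    static uint32_t poly_hash_4step(const uint8_t *p, size_t len /* multiple of 4 */)
    {
        const uint32_t K2 = K * K, K3 = K2 * K, K4 = K3 * K;
        const __m128i vK4 = _mm_set1_epi32((int)K4);
        __m128i acc = _mm_setzero_si128();

        for (size_t i = 0; i < len; i += 4) {
            int32_t w;
            memcpy(&w, p + i, 4);                                  /* next 4 input bytes  */
            __m128i b = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(w));   /* one byte per lane   */
            acc = _mm_mullo_epi32(_mm_add_epi32(acc, b), vK4);     /* (acc + bytes) * K^4 */
        }

        uint32_t lane[4];
        _mm_storeu_si128((__m128i *)lane, acc);
        /* combine the lane hashes: hash0 + K*hash1 + K^2*hash2 + K^3*hash3 */
        return lane[0] + K * lane[1] + K2 * lane[2] + K3 * lane[3];
    }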

Nevertheless, I decided to use universal hashing - it will give me another 32 bits of strong entropy, which means I can reduce the strong digest used for final checking from 160 to 128 bits and give the user the same reliability using less memory.

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
There is also an interesting analysis at http://locklessinc.com/articles/fast_hash/ - the author says that we use MUL to promote entropy toward the higher bits and then need ROL, SHR+XOR, BSWAP or SHUFFLE to propagate entropy in the opposite direction. This idea greatly helps in understanding how to construct good hashing functions.
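
    In that spirit, the smallest possible example of the pattern (my own illustration, not code from the article):

    #include <stdint.h>

    /* the multiply pushes entropy toward the high bits;
       the shift+XOR folds it back down into the low bits */
    static inline uint32_t mix32(uint32_t h)
    {
        h *= 0x9E3779B1u;
        h ^= h >> 16;
        return h;
    }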

  12. #12
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
Indeed, I feel shuffle is the right way to go for an SSE algorithm,

but unfortunately, it is more complex to emulate in standard x86 or x64 code.

It's quite the opposite for the ROL instruction.

This leads to different hash design decisions.

  13. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    What about shuffle by a multiple of 8 bits?

  14. #14
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I've been playing with hash functions of late, and I discovered some things. The polynomial rolling hash that you described, even though it's a mediocre hash on its own, can be re-hashed into a high-quality hash. In other words, you can use the rolling function to get a 64-bit number and then pass the number through something like murmurhash and it will redistribute the bits and produce a high-quality hash.

In fact, it's not necessary to use all of murmurhash. I took one function from murmurhash, combined it with the polynomial rolling hash, and created a high-quality hash. This is the function:

    //-----------------------------------------------------------------------------
    // Finalization mix - force all bits of a hash block to avalanche
    //...


FORCE_INLINE uint64_t fmix64 ( uint64_t k )
{
    k ^= k >> 33;
    k *= BIG_CONSTANT(0xff51afd7ed558ccd);
    k ^= k >> 33;
    k *= BIG_CONSTANT(0xc4ceb9fe1a85ec53);
    k ^= k >> 33;

    return k;
}


    As it says in the comment, this avalanches and mixes all the bits.

    Here's my "expandable hash," which can be used as a rolling hash or a non-rolling hash, or it can expand and contract:

    #include <stdint.h>


    #define exphashB 273
    #define exphashBinv 0xff0ff0ff0ff0ff1
    #define exphashConst1 0xff51afd7ed558ccd
    #define exphashConst2 0xc4ceb9fe1a85ec53


    #define HASHDATA_EMPTY { 0, exphashB }


struct hash_data_t {
    uint64_t value;
    uint64_t Bpow;
};


#define hash_add_right(hdata, cdata) \
    (hdata.Bpow *= exphashB, hdata.value = ((hdata.value + (unsigned char)(cdata)) * exphashB))


#define hash_remove_left(hdata, cdata) \
    (hdata.Bpow *= exphashBinv, hdata.value = (hdata.value - hdata.Bpow * (unsigned char)(cdata)))


#define avalanche_bits(hvalue, hmultiplier) (((hvalue) ^ ((hvalue) >> 33)) * (hmultiplier))


#define stir_hash(hdata) \
    avalanche_bits(avalanche_bits(hdata.value ^ hdata.Bpow, exphashConst1), exphashConst2)


#define reduce_hash_to_32bits(hdata) (stir_hash(hdata) + (stir_hash(hdata) >> 32))


#define hash_mod_2k(hdata, numbits) \
    (reduce_hash_to_32bits(hdata) & (0xffffffffu >> (32 - numbits)))

    // For example:
    //
    // struct hash_data_t hash = HASHDATA_EMPTY;
    // hash_add_right(hash, 'x');
    // hash_add_right(hash, 'y');
    // hash_add_right(hash, 'z');

    // remove from left to get rolling:
    // hash_remove_left(hash, 'x');

    // extract a 32-bit hash:
    // uint32_t h = reduce_hash_to_32bits(hash);


    Here are the SMHasher results. It passes every test (except the seed test, because it ignores the seed).

    Code:
    -------------------------------------------------------------------------------
    --- Testing ExpandableHash (Neal Burns's expandable hash)
    
    
    [[[ Sanity Tests ]]]
    
    
    Verification value 0xAABCD2F0 : Passed!
    Running sanity check 1..........PASS
    Running sanity check 2..........PASS
    
    
    [[[ Speed Tests ]]]
    
    
    Bulk speed test - 262144-byte keys
    Alignment  0 -  0.315 bytes/cycle -  900.65 MiB/sec @ 3 ghz
    Alignment  1 -  0.315 bytes/cycle -  900.65 MiB/sec @ 3 ghz
    Alignment  2 -  0.315 bytes/cycle -  900.65 MiB/sec @ 3 ghz
    Alignment  3 -  0.315 bytes/cycle -  900.65 MiB/sec @ 3 ghz
    Alignment  4 -  0.315 bytes/cycle -  900.64 MiB/sec @ 3 ghz
    Alignment  5 -  0.315 bytes/cycle -  900.63 MiB/sec @ 3 ghz
    Alignment  6 -  0.315 bytes/cycle -  900.63 MiB/sec @ 3 ghz
    Alignment  7 -  0.315 bytes/cycle -  900.63 MiB/sec @ 3 ghz
    
    
    Small key speed test -    1-byte keys -    21.60 cycles/hash
    Small key speed test -    2-byte keys -    22.29 cycles/hash
    Small key speed test -    3-byte keys -    20.81 cycles/hash
    Small key speed test -    4-byte keys -    22.30 cycles/hash
    Small key speed test -    5-byte keys -    25.75 cycles/hash
    Small key speed test -    6-byte keys -    27.40 cycles/hash
    Small key speed test -    7-byte keys -    27.49 cycles/hash
    Small key speed test -    8-byte keys -    30.06 cycles/hash
    Small key speed test -    9-byte keys -    31.76 cycles/hash
    Small key speed test -   10-byte keys -    34.03 cycles/hash
    Small key speed test -   11-byte keys -    36.79 cycles/hash
    Small key speed test -   12-byte keys -    38.23 cycles/hash
    Small key speed test -   13-byte keys -    40.41 cycles/hash
    Small key speed test -   14-byte keys -    45.15 cycles/hash
    Small key speed test -   15-byte keys -    47.37 cycles/hash
    Small key speed test -   16-byte keys -    47.91 cycles/hash
    Small key speed test -   17-byte keys -    52.99 cycles/hash
    Small key speed test -   18-byte keys -    54.98 cycles/hash
    Small key speed test -   19-byte keys -    56.96 cycles/hash
    Small key speed test -   20-byte keys -    60.41 cycles/hash
    Small key speed test -   21-byte keys -    62.95 cycles/hash
    Small key speed test -   22-byte keys -    65.18 cycles/hash
    Small key speed test -   23-byte keys -    66.76 cycles/hash
    Small key speed test -   24-byte keys -    69.23 cycles/hash
    Small key speed test -   25-byte keys -    71.90 cycles/hash
    Small key speed test -   26-byte keys -    73.87 cycles/hash
    Small key speed test -   27-byte keys -    75.83 cycles/hash
    Small key speed test -   28-byte keys -   110.96 cycles/hash
    Small key speed test -   29-byte keys -    81.14 cycles/hash
    Small key speed test -   30-byte keys -    83.48 cycles/hash
    Small key speed test -   31-byte keys -   121.29 cycles/hash
    
    
    [[[ Differential Tests ]]]
    
    
    Testing 8303632 up-to-5-bit differentials in 64-bit keys -> 32 bit hashes.
    1000 reps, 8303632000 total tests, expecting 1.93 random collisions..........
    4 total collisions, of which 4 single collisions were ignored
    
    
    Testing 11017632 up-to-4-bit differentials in 128-bit keys -> 32 bit hashes.
    1000 reps, 11017632000 total tests, expecting 2.57 random collisions..........
    2 total collisions, of which 2 single collisions were ignored
    
    
    Testing 2796416 up-to-3-bit differentials in 256-bit keys -> 32 bit hashes.
    1000 reps, 2796416000 total tests, expecting 0.65 random collisions..........
    0 total collisions, of which 0 single collisions were ignored
    
    
    
    
    [[[ Avalanche Tests ]]]
    
    
    Testing  32-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.597333%
    Testing  40-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.556667%
    Testing  48-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.692667%
    Testing  56-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.624000%
    Testing  64-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.638000%
    Testing  72-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.646000%
    Testing  80-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.736667%
    Testing  88-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.672000%
    Testing  96-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.708667%
    Testing 104-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.740000%
    Testing 112-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.592667%
    Testing 120-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.695333%
    Testing 128-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.684000%
    Testing 136-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.666667%
    Testing 144-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.662000%
    Testing 152-bit keys ->  32-bit hashes,   300000 reps.......... worst bias is 0.694667%
    
    
    [[[ Keyset 'Cyclic' Tests ]]]
    
    
    Keyset 'Cyclic' - 8 cycles of 4 bytes - 10000000 keys
    Testing collisions   - Expected 11641.53, actual 11747.00 ( 1.01x)
    Testing distribution - Worst bias is the  19-bit window at bit  25 - 0.022%
    
    
    Keyset 'Cyclic' - 8 cycles of 5 bytes - 10000000 keys
    Testing collisions   - Expected 11641.53, actual 11530.00 ( 0.99x)
    Testing distribution - Worst bias is the  20-bit window at bit  27 - 0.038%
    
    
    Keyset 'Cyclic' - 8 cycles of 6 bytes - 10000000 keys
    Testing collisions   - Expected 11641.53, actual 11828.00 ( 1.02x)
    Testing distribution - Worst bias is the  20-bit window at bit  26 - 0.033%
    
    
    Keyset 'Cyclic' - 8 cycles of 7 bytes - 10000000 keys
    Testing collisions   - Expected 11641.53, actual 11890.00 ( 1.02x)
    Testing distribution - Worst bias is the  20-bit window at bit  23 - 0.019%
    
    
    Keyset 'Cyclic' - 8 cycles of 8 bytes - 10000000 keys
    Testing collisions   - Expected 11641.53, actual 11636.00 ( 1.00x)
    Testing distribution - Worst bias is the  20-bit window at bit   1 - 0.029%
    
    
    
    
    [[[ Keyset 'TwoBytes' Tests ]]]
    
    
    Keyset 'TwoBytes' - up-to-4-byte keys, 652545 total keys
    Testing collisions   - Expected    49.57, actual    49.00 ( 0.99x)
    Testing distribution - Worst bias is the  16-bit window at bit   1 - 0.091%
    
    
    Keyset 'TwoBytes' - up-to-8-byte keys, 5471025 total keys
    Testing collisions   - Expected  3484.56, actual  3513.00 ( 1.01x)
    Testing distribution - Worst bias is the  20-bit window at bit  12 - 0.053%
    
    
    Keyset 'TwoBytes' - up-to-12-byte keys, 18616785 total keys
    Testing collisions   - Expected 40347.77, actual 40153.00 ( 1.00x)
    Testing distribution - Worst bias is the  20-bit window at bit   1 - 0.007%
    
    
    Keyset 'TwoBytes' - up-to-16-byte keys, 44251425 total keys
    Testing collisions   - Expected 227963.15, actual 226755.00 ( 0.99x)
    Testing distribution - Worst bias is the  20-bit window at bit  17 - 0.006%
    
    
    Keyset 'TwoBytes' - up-to-20-byte keys, 86536545 total keys
    Testing collisions   - Expected 871784.70, actual 864891.00 ( 0.99x)
    Testing distribution - Worst bias is the  20-bit window at bit   8 - 0.003%
    
    
    
    
    [[[ Keyset 'Sparse' Tests ]]]
    
    
    Keyset 'Sparse' - 32-bit keys with up to 6 bits set - 1149017 keys
    Testing collisions   - Expected   153.70, actual   161.00 ( 1.05x)
    Testing distribution - Worst bias is the  17-bit window at bit  31 - 0.132%
    
    
    Keyset 'Sparse' - 40-bit keys with up to 6 bits set - 4598479 keys
    Testing collisions   - Expected  2461.72, actual  2561.00 ( 1.04x)
    Testing distribution - Worst bias is the  19-bit window at bit   6 - 0.047%
    
    
    Keyset 'Sparse' - 48-bit keys with up to 5 bits set - 1925357 keys
    Testing collisions   - Expected   431.55, actual   453.00 ( 1.05x)
    Testing distribution - Worst bias is the  17-bit window at bit  29 - 0.053%
    
    
    Keyset 'Sparse' - 56-bit keys with up to 5 bits set - 4216423 keys
    Testing collisions   - Expected  2069.66, actual  2046.00 ( 0.99x)
    Testing distribution - Worst bias is the  19-bit window at bit   8 - 0.045%
    
    
    Keyset 'Sparse' - 64-bit keys with up to 5 bits set - 8303633 keys
    Testing collisions   - Expected  8026.87, actual  7993.00 ( 1.00x)
    Testing distribution - Worst bias is the  20-bit window at bit  23 - 0.032%
    
    
    Keyset 'Sparse' - 96-bit keys with up to 4 bits set - 3469497 keys
    Testing collisions   - Expected  1401.34, actual  1420.00 ( 1.01x)
    Testing distribution - Worst bias is the  19-bit window at bit  26 - 0.046%
    
    
    Keyset 'Sparse' - 256-bit keys with up to 3 bits set - 2796417 keys
    Testing collisions   - Expected   910.36, actual   924.00 ( 1.01x)
    Testing distribution - Worst bias is the  19-bit window at bit  23 - 0.052%
    
    
    Keyset 'Sparse' - 2048-bit keys with up to 2 bits set - 2098177 keys
    Testing collisions   - Expected   512.50, actual   450.00 ( 0.88x)
    Testing distribution - Worst bias is the  18-bit window at bit  23 - 0.069%
    
    
    
    
    [[[ Keyset 'Combination Lowbits' Tests ]]]
    
    
    Keyset 'Combination' - up to 8 blocks from a set of 8 - 19173960 keys
    Testing collisions   - Expected 42799.01, actual 42586.00 ( 1.00x)
    Testing distribution - Worst bias is the  20-bit window at bit  25 - 0.019%
    
    
    
    
    [[[ Keyset 'Combination Highbits' Tests ]]]
    
    
    Keyset 'Combination' - up to 8 blocks from a set of 8 - 19173960 keys
    Testing collisions   - Expected 42799.01, actual 42934.00 ( 1.00x)
    Testing distribution - Worst bias is the  20-bit window at bit  17 - 0.017%
    
    
    
    
    [[[ Keyset 'Combination 0x8000000' Tests ]]]
    
    
    Keyset 'Combination' - up to 20 blocks from a set of 2 - 2097150 keys
    Testing collisions   - Expected   512.00, actual   506.00 ( 0.99x)
    Testing distribution - Worst bias is the  18-bit window at bit  20 - 0.085%
    
    
    
    
    [[[ Keyset 'Combination 0x0000001' Tests ]]]
    
    
    Keyset 'Combination' - up to 20 blocks from a set of 2 - 2097150 keys
    Testing collisions   - Expected   512.00, actual   502.00 ( 0.98x)
    Testing distribution - Worst bias is the  18-bit window at bit  13 - 0.070%
    
    
    
    
    [[[ Keyset 'Combination Hi-Lo' Tests ]]]
    
    
    Keyset 'Combination' - up to 6 blocks from a set of 15 - 12204240 keys
    Testing collisions   - Expected 17339.30, actual 17205.00 ( 0.99x)
    Testing distribution - Worst bias is the  20-bit window at bit  21 - 0.018%
    
    
    
    
    [[[ Keyset 'Window' Tests ]]]
    
    
    Keyset 'Windowed' -  64-bit key,  20-bit window - 64 tests, 1048576 keys per test
    Window at   0 - Testing collisions   - Expected   128.00, actual   125.00 ( 0.98x)
    Window at   1 - Testing collisions   - Expected   128.00, actual   142.00 ( 1.11x)
    Window at   2 - Testing collisions   - Expected   128.00, actual   136.00 ( 1.06x)
    Window at   3 - Testing collisions   - Expected   128.00, actual   129.00 ( 1.01x)
    Window at   4 - Testing collisions   - Expected   128.00, actual   134.00 ( 1.05x)
    Window at   5 - Testing collisions   - Expected   128.00, actual   123.00 ( 0.96x)
    Window at   6 - Testing collisions   - Expected   128.00, actual   114.00 ( 0.89x)
    Window at   7 - Testing collisions   - Expected   128.00, actual   138.00 ( 1.08x)
    Window at   8 - Testing collisions   - Expected   128.00, actual   116.00 ( 0.91x)
    Window at   9 - Testing collisions   - Expected   128.00, actual   135.00 ( 1.05x)
    Window at  10 - Testing collisions   - Expected   128.00, actual   151.00 ( 1.18x)
    Window at  11 - Testing collisions   - Expected   128.00, actual   133.00 ( 1.04x)
    Window at  12 - Testing collisions   - Expected   128.00, actual   146.00 ( 1.14x)
    Window at  13 - Testing collisions   - Expected   128.00, actual   153.00 ( 1.20x)
    Window at  14 - Testing collisions   - Expected   128.00, actual   156.00 ( 1.22x)
    Window at  15 - Testing collisions   - Expected   128.00, actual   137.00 ( 1.07x)
    Window at  16 - Testing collisions   - Expected   128.00, actual   113.00 ( 0.88x)
    Window at  17 - Testing collisions   - Expected   128.00, actual   115.00 ( 0.90x)
    Window at  18 - Testing collisions   - Expected   128.00, actual   142.00 ( 1.11x)
    Window at  19 - Testing collisions   - Expected   128.00, actual   127.00 ( 0.99x)
    Window at  20 - Testing collisions   - Expected   128.00, actual   131.00 ( 1.02x)
    Window at  21 - Testing collisions   - Expected   128.00, actual   134.00 ( 1.05x)
    Window at  22 - Testing collisions   - Expected   128.00, actual   130.00 ( 1.02x)
    Window at  23 - Testing collisions   - Expected   128.00, actual   137.00 ( 1.07x)
    Window at  24 - Testing collisions   - Expected   128.00, actual   149.00 ( 1.16x)
    Window at  25 - Testing collisions   - Expected   128.00, actual   137.00 ( 1.07x)
    Window at  26 - Testing collisions   - Expected   128.00, actual   121.00 ( 0.95x)
    Window at  27 - Testing collisions   - Expected   128.00, actual   113.00 ( 0.88x)
    Window at  28 - Testing collisions   - Expected   128.00, actual   108.00 ( 0.84x)
    Window at  29 - Testing collisions   - Expected   128.00, actual   140.00 ( 1.09x)
    Window at  30 - Testing collisions   - Expected   128.00, actual   144.00 ( 1.13x)
    Window at  31 - Testing collisions   - Expected   128.00, actual   119.00 ( 0.93x)
    Window at  32 - Testing collisions   - Expected   128.00, actual   127.00 ( 0.99x)
    Window at  33 - Testing collisions   - Expected   128.00, actual   124.00 ( 0.97x)
    Window at  34 - Testing collisions   - Expected   128.00, actual   109.00 ( 0.85x)
    Window at  35 - Testing collisions   - Expected   128.00, actual   110.00 ( 0.86x)
    Window at  36 - Testing collisions   - Expected   128.00, actual   122.00 ( 0.95x)
    Window at  37 - Testing collisions   - Expected   128.00, actual   136.00 ( 1.06x)
    Window at  38 - Testing collisions   - Expected   128.00, actual   122.00 ( 0.95x)
    Window at  39 - Testing collisions   - Expected   128.00, actual   113.00 ( 0.88x)
    Window at  40 - Testing collisions   - Expected   128.00, actual   146.00 ( 1.14x)
    Window at  41 - Testing collisions   - Expected   128.00, actual   126.00 ( 0.98x)
    Window at  42 - Testing collisions   - Expected   128.00, actual   124.00 ( 0.97x)
    Window at  43 - Testing collisions   - Expected   128.00, actual   119.00 ( 0.93x)
    Window at  44 - Testing collisions   - Expected   128.00, actual   133.00 ( 1.04x)
    Window at  45 - Testing collisions   - Expected   128.00, actual   120.00 ( 0.94x)
    Window at  46 - Testing collisions   - Expected   128.00, actual   131.00 ( 1.02x)
    Window at  47 - Testing collisions   - Expected   128.00, actual   110.00 ( 0.86x)
    Window at  48 - Testing collisions   - Expected   128.00, actual   114.00 ( 0.89x)
    Window at  49 - Testing collisions   - Expected   128.00, actual   145.00 ( 1.13x)
    Window at  50 - Testing collisions   - Expected   128.00, actual   133.00 ( 1.04x)
    Window at  51 - Testing collisions   - Expected   128.00, actual   131.00 ( 1.02x)
    Window at  52 - Testing collisions   - Expected   128.00, actual   130.00 ( 1.02x)
    Window at  53 - Testing collisions   - Expected   128.00, actual   119.00 ( 0.93x)
    Window at  54 - Testing collisions   - Expected   128.00, actual   142.00 ( 1.11x)
    Window at  55 - Testing collisions   - Expected   128.00, actual   124.00 ( 0.97x)
    Window at  56 - Testing collisions   - Expected   128.00, actual   136.00 ( 1.06x)
    Window at  57 - Testing collisions   - Expected   128.00, actual   113.00 ( 0.88x)
    Window at  58 - Testing collisions   - Expected   128.00, actual   114.00 ( 0.89x)
    Window at  59 - Testing collisions   - Expected   128.00, actual   142.00 ( 1.11x)
    Window at  60 - Testing collisions   - Expected   128.00, actual   132.00 ( 1.03x)
    Window at  61 - Testing collisions   - Expected   128.00, actual   144.00 ( 1.13x)
    Window at  62 - Testing collisions   - Expected   128.00, actual   122.00 ( 0.95x)
    Window at  63 - Testing collisions   - Expected   128.00, actual   119.00 ( 0.93x)
    Window at  64 - Testing collisions   - Expected   128.00, actual   125.00 ( 0.98x)
    
    
    [[[ Keyset 'Text' Tests ]]]
    
    
    Keyset 'Text' - keys of form "Foo[XXXX]Bar" - 14776336 keys
    Testing collisions   - Expected 25418.13, actual 25101.00 ( 0.99x)
    Testing distribution - Worst bias is the  20-bit window at bit  21 - 0.032%
    
    
    Keyset 'Text' - keys of form "FooBar[XXXX]" - 14776336 keys
    Testing collisions   - Expected 25418.13, actual 25445.00 ( 1.00x)
    Testing distribution - Worst bias is the  19-bit window at bit  10 - 0.019%
    
    
    Keyset 'Text' - keys of form "[XXXX]FooBar" - 14776336 keys
    Testing collisions   - Expected 25418.13, actual 25350.00 ( 1.00x)
    Testing distribution - Worst bias is the  20-bit window at bit  24 - 0.017%
    
    
    
    
    [[[ Keyset 'Zeroes' Tests ]]]
    
    
    Keyset 'Zeroes' - 65536 keys
    Testing collisions   - Expected     0.50, actual     0.00 ( 0.00x)
    Testing distribution - Worst bias is the  13-bit window at bit   6 - 0.393%
    
    
    
    
    [[[ Keyset 'Seed' Tests ]]]
    
    
    Keyset 'Seed' - 1000000 keys
    Testing collisions   - Expected   116.42, actual 999999.00 (8589.93x) !!!!!
    Testing distribution - Worst bias is the  17-bit window at bit   0 - 99.999% !!!!!
    
    
    *********FAIL*********
    
    
    
    
    Input vcode 0x73334fdc, Output vcode 0xe1f9123e, Result vcode 0x00000001
    Verification value is 0x00000001 - Testing took 1866.872407 seconds
    -------------------------------------------------------------------------------
    I hope this is useful.

    Edit: I looked over the Wikipedia page on universal hashing. Even though I came up with this hash through mixing and matching from other hash functions and playing with SMHasher, it resembles the hashes on that page and I think it could be justified by similar theory.

    With some modifications, I think you could make it a universal hash that has some guarantees against attack. Or, there could be ways to make it faster.
    Last edited by nburns; 7th August 2013 at 20:19.

  15. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
nburns, it looks like an excellent idea, especially if one needs a small hash.

Universal hashing is about choosing a random hash function on each run, i.e. you can multiply by a random number. You also need to choose the construction algorithm carefully in order to allow a mathematical proof of its properties.
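
    For example, the classic multiply-shift construction on 64-bit words needs only one random odd multiplier chosen per run (a generic sketch, not SREP code):

    #include <stdint.h>

    /* multiply-shift universal hashing: uh_a is a random odd 64-bit number chosen
       once per run; the top 32 bits of uh_a*x form the hash */
    static uint64_t uh_a;   /* set from a random source at startup, must be odd */

    static uint32_t universal_hash64(uint64_t x)
    {
        return (uint32_t)((uh_a * x) >> 32);
    }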
    Last edited by Bulat Ziganshin; 8th August 2013 at 21:26.

  16. #16
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by nburns View Post
    I've been playing with hash functions of late, and I discovered some things. The polynomial rolling hash that you described, even though it's a mediocre hash on its own, can be re-hashed into a high-quality hash. In other words, you can use the rolling function to get a 64-bit number and then pass the number through something like murmurhash and it will redistribute the bits and produce a high-quality hash.

[...]

Here are the SMHasher results. It passes every test (except the seed test, because it ignores the seed).
  16. #16
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    One huge weakness of SMHasher is that it works at the wrong level.
    A hash is only as strong as its weakest component, and the components should be analysed independently.
    If the main loop has issues, good postprocessing can often hide this fact from SMHasher. Good preprocessing does it even better.
    It would be more useful if you posted results of the main loop and the postprocessing independently. Each may get some failures, because neither needs to be a perfect hash alone (the main loop needs diffusion, the postprocessing needs distribution), but bundling them together can hide weaknesses in the important areas.

  17. #17
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    nburns, it looks like an excellent idea, especially if one needs a small hash

    universal hashing is about choosing a random hash on each run, i.e. you can multiply by a random number. also you need to carefully choose the construction algorithm in order to make a mathematical proof of its properties possible
    On the wikipedia page for universal hashing, they give a universal hash on integers and vectors. In my algorithm, you could compute the 64-bit polynomial residue as now, but replace the single fmix64() with a universal hash on integers to get the mixed 32-bit result. The universal string hash on that page isn't much different than what I'm doing now, anyway, and you could possibly tweak that part, but it's probably more efficient to add universality at the end.

    I imagine that you are going to say that you could potentially lose some universality by using a non-universal function first. I think that the criterion for universality relates to there being a random-looking relationship between hashes and inputs, i.e. like applying a random permutation to the hashes that come out of some hash function. I don't think it specifies a special type of function to compress the entropy. The hash that's described for strings in the wikipedia article appears to be basically an ordinary polynomial hash function that's been slightly modified. My algorithm uses the same fundamental approach, i.e. convert the string to a number in some base and take the residue mod a number that's co-prime to the base (a polynomial). Its output is not universal, but it has an empirically good distribution with respect to collisions. In theory, you could extend the technique to generate as many significant bits as you like, and guarantee unique outputs for all strings under some arbitrary length (using multiple residues). The output in the form of a single integer, or a constant number of integers, could then be turned into a universal hash.
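
    For concreteness, here is a minimal sketch of that final step, assuming the main loop already yields a 64-bit residue. It uses the classic multiply-shift construction (an almost-universal family) with a random odd multiplier drawn once per run; the names are illustrative, not actual code from this thread.

    Code:
    #include <cstdint>
    #include <random>

    // Compress a 64-bit polynomial residue to 32 bits with a per-run random
    // odd multiplier (multiply-shift).  Collisions between distinct residues
    // then depend on the random choice of 'a', not only on the input data.
    struct MulShift32 {
        uint64_t a;                                   // random odd 64-bit key
        explicit MulShift32(uint64_t seed) { a = std::mt19937_64(seed)() | 1; }
        uint32_t operator()(uint64_t residue) const {
            return uint32_t((a * residue) >> 32);     // keep the high 32 bits
        }
    };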
    Last edited by nburns; 9th August 2013 at 07:36.

  18. #18
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by m^2 View Post
    One huge weakness of SMHasher is that it works at wrong level.
    Hash is as strong as its weakest component and components should be analysed independently.
    If the main loop has issues, good postprocessing can often hide this fact from SMHasher. Good preprocessing does it even better.
    It would be more useful if you posted results of the main loop and postprocessing independently. Each may get some failures because neither needs to be a perfect hash alone (main loop needs diffusion, postprocessing - distribution), but bundling them together can hide weaknesses in the important areas.
    Thanks for taking a look at my results.

    Do you know of any specific kinds of bad hash functions that SMHasher fails to catch? It seems to find abundant flaws even in popular hashes that are considered good, like FNV. When I run the unprocessed intermediate values through it, it finds a large number of correlations. This information is not especially useful, because it's well known that a polynomial residue computation like I'm using will have patterns in its lower few bits, and this seems to trigger every kind of failure in SMHasher tests. The noise from that obscures any useful information. The patterns would be important, except that I'm using 64 bits for the residue and mixing it down to 32 bits for the hash. So there is room for some wasted bits in the unprocessed intermediate values. I'm not especially accomplished in number theory, but I think I have an okay enough grasp of how the main loop works to trust it somewhat, especially in combination with good results from SMHasher and field testing. What I'm not totally sure about is, for instance, how the magic number chosen for the polynomial/number base interacts with the other numbers, like the size of the field and the distribution of the input data. Consequently, I'm not sure how efficiently I'm using the field. As well, I have an intuitive grasp of what the mixing function I took from Murmurhash is doing, but I don't understand the particulars well enough that, say, I can predict how well it will work without even needing to do empirical testing.

    SMHasher seems like a fairly rigorous set of tests, to me (granted that I'm a relative newbie to hashing). I haven't examined its tests in great detail, but it seems to divide its efforts between feeding the hash function difficult data and counting collisions, and looking at pairs of bits in the output to detect correlations. Considering that a non-cryptographic, general-purpose hash function is judged by how often it causes collisions, this seems like a fundamentally sound approach. It needs to treat the functions as black boxes to some extent, because they are basically Turing machines.

    I take it that you have experience creating hash functions. I'm interested to hear what you have to say.

    Edit: I had an idea that seems to simplify a useful class of hash functions. Rather than think of an entire hash as an element over a field size 2^32 or a large prime, think of the individual bits as elements over a field size 2^5 or 2^6. If you repeatedly move each bit by a number coprime to this small field, it will iterate through the elements in a random-looking pattern. When all the bits move and are XORed together, they will produce complex and unpredictable patterns.
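
    A minimal sketch of one reading of that idea (word size and rotation amount chosen arbitrarily); it is only meant to make the construction concrete, not to claim it is strong - see m^2's comment further down.

    Code:
    #include <cstdint>
    #include <cstddef>

    // Each input word is XORed into the state, and the state is then rotated
    // by an amount coprime to the word width, so every bit cycles through all
    // 32 positions over successive steps.
    static inline uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

    uint32_t rot_xor_hash(const uint32_t* words, size_t n) {
        uint32_t h = 0;
        for (size_t i = 0; i < n; ++i)
            h = rotl32(h, 13) ^ words[i];             // 13 is coprime to 32
        return h;
    }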
    Last edited by nburns; 9th August 2013 at 08:08.

  19. #19
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by nburns View Post
    Thanks for taking a look at my results.

    Do you know of any specific kinds of bad hash functions that SMHasher fails to catch?
    Yes. A couple of mine that I haven't released because I found the issues early enough. And there's one that made some splash that I suspect is bad, but it's just a suspicion that I haven't verified (because I don't care enough), so I won't spread FUD.

    Quote Originally Posted by nburns View Post
    SMHasher seems like a fairly rigorous set of tests, to me (granted that I'm a relative newbie to hashing). I haven't examined its tests in great detail, but it seems to divide its efforts between feeding the hash function difficult data and counting collisions, and looking at pairs of bits in the output to detect correlations. Considering that a non-cryptographic, general-purpose hash function is judged by how often it causes collisions, this seems like a fundamentally sound approach. It needs to treat the functions as black boxes to some extent, because they are basically Turing machines.
    Fully automated testing with SMHasher is very easy and finds a number of common problems, but it is not sufficient.
    One problem with SMHasher is that interaction between components can confuse it.
    There's another that I know of: SMHasher tests almost only on very small strings, which is nigh useless on hashes with a medium-large state. Spooky has 96 bytes of state. There are 2 tests with keys larger than that; a xor hash with state this large followed by good finalisation would nearly pass. And crypto hashes can have multiple KB of state. A 256-byte xor hash would pass SMHasher. And this is a failure of the black box model; for a set of tests of any length there's a hash that will just accumulate all the input and blow up just a moment later.
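
    To make that blind spot concrete, a deliberately weak sketch (assumed layout, not any real hash): the "main loop" just XORs 32-bit words round-robin into 256 bytes of state, so any reordering of words within a 64-word stripe collides, yet with a strong finalizer over the whole state (omitted here) it would look fine on SMHasher's short keysets.

    Code:
    #include <cstdint>
    #include <cstring>
    #include <cstddef>

    // Deliberately weak "main loop": a 256-byte XOR accumulator.  Word order
    // inside each 256-byte stripe is completely lost.
    struct WideXorState {
        uint32_t s[64] = {};                          // 256 bytes of state
        size_t   pos = 0;
        void update(const uint8_t* p, size_t len) {
            while (len >= 4) {
                uint32_t w;
                std::memcpy(&w, p, 4);
                s[pos++ & 63] ^= w;
                p += 4; len -= 4;
            }
        }
        // a good finalizer over s[0..63] would go here and mask the weakness
    };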


    Quote Originally Posted by nburns View Post
    It seems to find abundant flaws even in popular hashes that are considered good, like FNV. When I run the unprocessed intermediate values through it, it finds a large number of correlations. This information is not especially useful, because it's well known that a polynomial residue computation like I'm using will have patterns in its lower few bits, and this seems to trigger every kind of failure in SMHasher tests. The noise from that obscures any useful information. The patterns would be important, except that I'm using 64 bits for the residue and mixing it down to 32 bits for the hash. So there is room for some wasted bits in the unprocessed intermediate values. I'm not especially accomplished in number theory, but I think I have an okay enough grasp of how the main loop works to trust it somewhat, especially in combination with good results from SMHasher and field testing. What I'm not totally sure about is, for instance, how the magic number chosen for the polynomial/number base interacts with the other numbers, like the size of the field and the distribution of the input data. Consequently, I'm not sure how efficiently I'm using the field. As well, I have an intuitive grasp of what the mixing function I took from Murmurhash is doing, but I don't understand the particulars well enough that, say, I can predict how well it will work without even needing to do empirical testing.
    Because FNV is simple and has small state, it doesn't trigger any of the SMHasher issues above. Your hash has small state too, but it is not simple.
    Test the avalanching of the main loop alone. For a 32-bit hash, it is sufficient that in any achievable state any input word causes at least 32 bits of entropy. That is, it should flip no less than 32 and no more than (state_size - 32) bits. Note that your state is actually 128 bits, not 64.
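
    A sketch of one way to run that check, assuming a 128-bit state and some main-loop iteration 'step' (placeholder names, nothing here is from the hash under discussion): flip each bit of the input word and count how many state bits change.

    Code:
    #include <cstdint>
    #include <bitset>
    #include <random>

    struct State { uint64_t a, b; };                  // assumed 128-bit state

    static int bits_flipped(State x, State y) {
        return std::bitset<64>(x.a ^ y.a).count() + std::bitset<64>(x.b ^ y.b).count();
    }

    // Step: State(State, uint64_t) - one main-loop iteration under test.
    // Checks that flipping any single bit of the input word changes between
    // 32 and (128 - 32) bits of state, over random states and words.
    template <class Step>
    bool avalanche_ok(Step step, int trials) {
        std::mt19937_64 rng(1);
        for (int t = 0; t < trials; ++t) {
            State s{rng(), rng()};
            uint64_t w = rng();
            State base = step(s, w);
            for (int bit = 0; bit < 64; ++bit) {
                int d = bits_flipped(base, step(s, w ^ (1ULL << bit)));
                if (d < 32 || d > 128 - 32) return false;
            }
        }
        return true;
    }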

    Quote Originally Posted by nburns View Post
    I take it that you have experience creating hash functions. I'm interested to hear what you have to say.
    I have some experience. But I still have a lot to learn.

    Quote Originally Posted by nburns View Post
    Edit: I had an idea that seems to simplify a useful class of hash functions. Rather than think of an entire hash as an element over a field size 2^32 or a large prime, think of the individual bits as elements over a field size 2^5 or 2^6. If you repeatedly move each bit by a number coprime to this small field, it will iterate through the elements in a random-looking pattern. When all the bits move and are XORed together, they will produce complex and unpredictable patterns.
    There are 2 main properties of a hash, confusion and diffusion. Non-crypto hashes need only the latter. Though the former can help in SMHasher. And if I understand the proposal correctly, it's just a xor hash with permutation of input - therefore as weak as xor hash itself.

  20. #20
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by nburns View Post
    On the wikipedia page for universal hashing
    i don't really understand the theory, i just think that if it were possible to produce a universal hash this way, it would have been done many years ago: just sum up all the bits/words in the input string and then apply a mixing function

  21. #21
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by m^2 View Post
    Yes. A couple of mine that I haven't released because I found the issues early enough. And there's one that made some splash that I suspect is bad, but it's just a suspicion that I haven't verified (because I don't care enough), so I won't spread FUD.


    Fully automated testing with SMHasher is very easy and finds a number of common problems, but it is not sufficient.
    One problem with SMHasher is that interaction between components can confuse it.
    There's another that I know of: SMHasher tests almost only on very small strings, which is nigh useless on hashes with a medium-large state. Spooky has 96 bytes of state. There are 2 tests with keys larger than that; a xor hash with state this large followed by good finalisation would nearly pass. And crypto hashes can have multiple KB of state. A 256-byte xor hash would pass SMHasher. And this is a failure of the black box model; for a set of tests of any length there's a hash that will just accumulate all the input and blow up just a moment later.
    SMHasher uses heuristics to guess what kinds of inputs will trigger collisions and what kinds of collisions to look for (in some cases, at least), so its results will naturally be only as good as these heuristics. Running the tests already takes a substantial amount of time, so making the test sequences much longer is probably not practical. The space of possible inputs obviously grows exponentially with increasing length, so a "try everything" type approach won't work. SMHasher seems to be general enough to catch a substantial portion of flaws, though.

    Even without looking at the details of a particular function, you could assume some things about hashing and hash functions that would make the problem of catching them easier. For instance, that flaws can be deduced from bit correlations. If SMHasher finds that some input bit is highly correlated with some particular bit of output, that is a strong indication that the function is weak, and will have excess collisions over specific types of input. A large number of common kinds of flaws will produce correlated bits, and it isn't necessary to know which was the cause. If you suppose something like an 80-20 rule, most flaws will produce simple patterns and be relatively easy to detect.

    Quote Originally Posted by m^2 View Post
    Because FNV is simple and has small state, it doesn't trigger any of the SMHasher issues above. Your hash has small state too, but it is not simple.
    Test the avalanching of the main loop alone. For a 32-bit hash, it is sufficient that in any achievable state any input word causes at least 32 bits of entropy. That is, it should flip no less than 32 and no more than (state_size - 32) bits. Note that your state is actually 128 bits, not 64.

    The other number that's calculated is actually part of the rolling version, and it doesn't serve much of a purpose otherwise. I just used it instead of the length to add entropy, because its value depends on the length and it's available, whereas the basic interface to the hash adds bytes one at a time and doesn't take a length parameter. AFAIK, the length works equally well.


    Quote Originally Posted by m^2 View Post
    I have some experience. But I still have a lot to learn.


    Quote Originally Posted by m^2 View Post
    There are 2 main properties of a hash, confusion and diffusion. Non-crypto hashes need only the latter. Though the former can help in SMHasher. And if I understand the proposal correctly, it's just a xor hash with permutation of input - therefore as weak as xor hash itself.
    The idea is to consider the bits of an integer to be on a finite field, with computations done using modular arithmetic. Thinking of bits over a small field is simpler than thinking of integers over a large field, but you'd be solving a similar problem, only once for each bit. The most obvious difference when using this technique is that bit shifts would wrap around, i.e. they'd become rotations.

    Last night I thought this through all the way to its logical conclusion. I think the most convenient implementation would place the bits on a field mod a prime, rather than a power of two. The whole scheme seems to work out quite neatly with the use of a Mersenne prime, such as 2^31-1. I have already encountered mentions of using Mersenne primes as fields, so it's not an original idea.
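
    One practical reason 2^31-1 keeps showing up: reduction modulo a Mersenne prime needs no division, only shifts and adds. A small sketch of the standard trick (not code from this thread):

    Code:
    #include <cstdint>

    static const uint32_t MP31 = 0x7fffffffu;         // 2^31 - 1

    // Reduce a 64-bit value mod 2^31 - 1: since 2^31 = 1 (mod MP31), the high
    // bits can simply be folded back in; two folds plus one conditional
    // subtract are enough for any 64-bit input.
    uint32_t mod_mersenne31(uint64_t x) {
        x = (x & MP31) + (x >> 31);
        x = (x & MP31) + (x >> 31);
        return x >= MP31 ? uint32_t(x - MP31) : uint32_t(x);
    }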

  22. #22
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The problem with strengthening a weak hash is that any collisions in the weak hash will still be there after strengthening.

    For example, suppose you use a Rabin hash over a window of size N like H := H + c0*M - cn*(M^N) (mod 2^64) where c0 is inserted into the window and cn is removed. You get a collision whenever c1 is increased by k and c0 is decreased by k*M. If M is small, like 3, then you get many collisions in text like "go", "fr", "eu" etc. among others. If you use a large random odd M, then you probably already have a good hash that doesn't need to be strengthened.
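
    For example, with a toy 2-character window hashed as older*M + newer (layout assumed here just to make the point concrete):

    Code:
    #include <cstdio>

    // With M = 3, raising the older character by k while lowering the newer
    // one by k*M leaves the hash unchanged, so "go", "fr" and "eu" collide.
    int main() {
        const unsigned M = 3;
        const char* keys[] = { "go", "fr", "eu" };
        for (const char* k : keys)
            std::printf("%s -> %u\n", k, (unsigned)k[0] * M + (unsigned)k[1]);
        return 0;                                     // prints 420 three times
    }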

  23. #23
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    i don't really understand the theory, i just think that if it were possible to produce a universal hash this way, it would have been done many years ago: just sum up all the bits/words in the input string and then apply a mixing function
    For the first hashing step, you'd have to use a real hash function, not junk, because the universal hashing step only has this input to work with. It can't undo collisions.

    I think the universal properties would be fully intact as long as the first step kept producing a unique hash for each input, but any collision that happened in the first function would be preserved as an artifact in the final result, and I don't know enough about universal hashing to say how consequential that would be.

    Actually, I'm not sure I understand your use case for hashing very well. What I think you're doing is generating hashes from user files in the course of running some algorithm, and the hashes will only be used for the duration of the algorithm and not stored. So the same data need not generate identical hashes across multiple runs, and a hash function can be chosen randomly for each run. In addition, you want some protection in case an attacker has placed something in the data that would try to cause trouble by manipulating the hash function. A cryptographic hash function isn't necessary, so you want to use a faster hash function that is random and universal as a lightweight way to prevent attacks.

    I'm not sure if what I've stated is correct, partly because it doesn't seem to me to make complete sense.

    I never previously thought that universality in hash functions was related in any way to being resistant to attack. I thought that the sole purpose was to be able to choose good hash functions using randomized methods, the way randomized methods are used by quicksort, etc.
    Last edited by nburns; 10th August 2013 at 07:26.

  24. #24
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    The problem with strengthening a weak hash is that any collisions in the weak hash will still be there after strengthening.

    For example, suppose you use a Rabin hash over a window of size N like H := H + c0*M - cn*(M^N) (mod 2^64) where c0 is inserted into the window and cn is removed. You get a collision whenever c1 is increased by k and c0 is decreased by k*M. If M is small, like 3, then you get many collisions in text like "go", "fr", "eu" etc. among others. If you use a large random odd M, then you probably already have a good hash that doesn't need to be strengthened.
    That's why I chose 273 to multiply with. Conceptually, it's supposed to be the base of a number that you're constructing mod 2^k, so it seems like it ought to be larger than any digit, but not too much larger. If it was too large the number would contain room for extra digits that would never be used, which seems like a bad thing. Later, I tried 257, which, since it's 2^8 + 1, should be possible to compute extremely fast, while hopefully still mixing up the number by just enough.
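
    For what it's worth, multiplying by 257 reduces to a shift and an add, which is presumably what makes it attractive here (sketch only; any decent compiler does this strength reduction on its own):

    Code:
    #include <cstdint>

    // x * 257 = x * (2^8 + 1) = (x << 8) + x
    static inline uint64_t mul257(uint64_t x) { return (x << 8) + x; }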

    It's interesting that the particular number chosen for this parameter seems to matter much less than you might think. 257 seems like it could be a poor choice for many reasons. For one thing, there's only one nonzero bit other than 1, and it's shifted by exactly the width of one byte. If you multiplied by 256 instead, it would basically destroy the field and the hashes would be predictably terrible.

    But I haven't seen dramatic differences among pretty much any odd numbers I've tried, and I've tried a handful. At least, not when there is a step at the end that carefully mixes all the bits, such as the one I took from murmurhash and use here. If you aren't going to use such a function, and you multiply by a small number, then you'll find that the last one or two characters you added will remain stuck in the low bits, because M^1 or M^2 or so are still a lot smaller than 2^32 or 2^64. I'm not sure whether I've tried this, but it should be about the same if you just multiply by the small number a few more times. If you want to see what that does to the bits, it's easy. You can look at successive powers of this number in binary using the calculators built into both Windows and Mac.

    Using only multiplications and/or additions will never do much to the lower bits, though. They will always remain a simple combination of the inputs. I think the problem is that addition and multiplication are asymmetrical. They move bits left very efficiently, but only left.
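
    For reference, the fmix64 finalizer from MurmurHash3 (public domain, by Austin Appleby) - the xor-with-right-shift steps are exactly what feed entropy back into the low bits that multiplication and addition alone never reach:

    Code:
    #include <cstdint>

    // MurmurHash3 64-bit finalizer.
    static inline uint64_t fmix64(uint64_t k) {
        k ^= k >> 33;
        k *= 0xff51afd7ed558ccdULL;
        k ^= k >> 33;
        k *= 0xc4ceb9fe1a85ec53ULL;
        k ^= k >> 33;
        return k;
    }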

  25. #25
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    nburns,

    1. i need a hash function that prevents 1) occasional collisions and 2) deliberate collisions. for the first it's enough to use any good hash with enough size; the second requires either using a cryptographic hash, or i can use a MAC - as long as the MAC key is computed AFTER all input data are established (so that an attacker can't modify the input data in order to break the MAC with this particular key value). any universal hashing may be used as a MAC (see the sketch at the end of this post)

    2. afaiu, universal hashing needs to use a key that's at least as large as the input data

    btw, look at http://emboss.github.io/blog/2012/12...-dos-reloaded/
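
    The sketch promised above for point 1 (an assumed, illustrative construction, not anything from SREP): the standard almost-universal polynomial-evaluation hash over the prime field 2^61-1, with the evaluation point drawn at random only after the input data is fixed.

    Code:
    #include <cstdint>
    #include <random>
    #include <vector>

    static const uint64_t P61 = (1ULL << 61) - 1;       // Mersenne prime 2^61 - 1

    static uint64_t mulmod61(uint64_t a, uint64_t b) {  // (a*b) mod P61, GCC/Clang
        __uint128_t t = (__uint128_t)a * b;
        uint64_t r = ((uint64_t)t & P61) + (uint64_t)(t >> 61);
        return r >= P61 ? r - P61 : r;
    }

    // Evaluate the message polynomial at the secret point k (Horner's rule).
    // For distinct messages of n words, Pr[collision] <= n / P61 over the
    // random choice of k, so picking k after the data is fixed thwarts
    // precomputed collisions.
    uint64_t poly_mac(const std::vector<uint64_t>& msg, uint64_t k) {
        uint64_t h = 0;
        for (uint64_t w : msg)
            h = (mulmod61(h, k) + w % P61) % P61;
        return h;
    }
    // per run:  uint64_t k = std::mt19937_64(std::random_device{}())() % P61;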

  26. #26
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Bulat,

    Thanks for the link to that article.

    What would be the purpose of the attack? Denial of service?

  27. #27
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by nburns View Post
    SMHasher uses heuristics to guess what kinds of inputs will trigger collisions and what kinds of collisions to look for (in some cases, at least), so its results will naturally be only as good as these heuristics. Running the tests already takes a substantial amount of time, so making the test sequences much longer is probably not practical. The space of possible inputs obviously grows exponentially with increasing length, so a "try everything" type approach won't work. SMHasher seems to be general enough to catch a substantial portion of flaws, though.
    I don't agree with the last statement. It's fairly easy to trick, and some hashes help themselves, e.g. by making alphabet transformations. The right way is not to use a one-size-fits-all evaluation tool, but to make ones specific to each hash. Some generalisation is possible, but black box testing can only give you assurance of weakness, not of strength.

    Quote Originally Posted by nburns View Post
    Even without looking at the details of a particular function, you could assume some things about hashing and hash functions that would make the problem of catching them easier. For instance, that flaws can be deduced from bit correlations. If SMHasher finds that some input bit is highly correlated with some particular bit of output, that is a strong indication that the function is weak, and will have excess collisions over specific types of input. A large number of common kinds of flaws will produce correlated bits, and it isn't necessary to know which was the cause. If you suppose something like an 80-20 rule, most flaws will produce simple patterns and be relatively easy to detect.
    I guess my statement above fits here well.


    Quote Originally Posted by nburns View Post
    The other number that's calculated is actually part of the rolling version, and it doesn't serve much of a purpose otherwise. I just used it instead of the length to add entropy, because its value depends on the length and it's available, whereas the basic interface to the hash adds bytes one at a time and doesn't take a length parameter. AFAIK, the length works equally well.
    Yeah, you're right. I guess your hash doesn't have 32-bit diffusion then.


    Quote Originally Posted by nburns View Post
    The idea is to consider the bits of an integer to be on a finite field, with computations done using modular arithmetic. Thinking of bits over a small field is simpler than thinking of integers over a large field, but you'd be solving a similar problem, only once for each bit. The most obvious difference when using this technique is that bit shifts would wrap around, i.e. they'd become rotations.

    Last night I thought this through all the way to its logical conclusion. I think the most convenient implementation would place the bits on a field mod a prime, rather than a power of two. The whole scheme seems to work out quite neatly with the use of a Mersenne prime, such as 2^31-1. I have already encountered mentions of using Mersenne primes as fields, so it's not an original idea.
    Yeah, just be wary not to step into the Adler trap of failing to distinguish between input words.

  28. #28
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by nburns View Post
    What would be the purpose of the attack? Denial of service?
    we've already discussed that in the Cyan hash topic. an attacker can provide a trojan.avi that has the same hash as antivirus.exe, so deduplication of the entire HDD and the following extraction will replace antivirus.exe with the trojan's contents

  29. #29
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    2. afaiu, universal hashing needs to use the key that's at least as large as input data
    If the attacker could reverse engineer any deterministic (non-cryptographic) hash, then I think you're right. You'd have to use the universal hash on the original input.

  30. #30
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by m^2 View Post
    Fully automated testing with SMHasher is very easy and finds a number of common problems, but it is not sufficient.
    One problem with SMHasher is that interaction between components can confuse it.
    There's another that I know of: SMHasher tests almost only on very small strings, which is nigh useless on hashes with a medium-large state. Spooky has 96 bytes of state. There are 2 tests with keys larger than that; a xor hash with state this large followed by good finalisation would nearly pass. And crypto hashes can have multiple KB of state. A 256-byte xor hash would pass SMHasher. And this is a failure of the black box model; for a set of tests of any length there's a hash that will just accumulate all the input and blow up just a moment later.
    I don't agree with the last statement. It's fairly easy to trick, and some hashes help themselves, e.g. by making alphabet transformations. The right way is not to use a one-size-fits-all evaluation tool, but to make ones specific to each hash. Some generalisation is possible, but black box testing can only give you assurance of weakness, not of strength.


    It would probably be useful if you coded up and published some demonstrably poor functions that beat SMHasher to demonstrate these weaknesses. Even if they're impossible to solve in principle, demonstrating them would be valuable.

    Getting people confused tends to be pretty easy. Actually, that's why automatic testing with few assumptions is important. If you had to write specific tests for a particular function, it would be easy to get caught up in the same error of thinking that fooled the inventor of the hash.

    Quote Originally Posted by m^2 View Post
    Because FNV is simple and has small state, it doesn't trigger any of the SMHasher issues above. Your hash has small state too, but it is not simple.
    Test the avalanching of the main loop alone. For a 32-bit hash, it is sufficient that in any achievable state any input word causes at least 32 bits of entropy. That is, it should flip no less than 32 and no more than (state_size - 32) bits. Note that your state is actually 128 bits, not 64.


    The difference between FNV and my main loop is insignificant, I think. FNV fails a bunch of tests and mine passes, but I believe the only difference is the final mixing function and possibly the fact that I do 64-bit computations. FNV ought to pass all the tests with these modifications.
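
    A sketch of that modification as described - plain 64-bit FNV-1a in the main loop, followed by the same kind of finalizer discussed above, folded to 32 bits (no claim here about how it actually scores in SMHasher):

    Code:
    #include <cstdint>
    #include <cstddef>

    uint32_t fnv1a64_mixed(const uint8_t* p, size_t len) {
        uint64_t h = 0xcbf29ce484222325ULL;           // FNV-1a 64-bit offset basis
        for (size_t i = 0; i < len; ++i) {
            h ^= p[i];
            h *= 0x100000001b3ULL;                    // FNV-1a 64-bit prime
        }
        h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;     // fmix64-style finalizer
        h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
        h ^= h >> 33;
        return uint32_t(h >> 32);                     // fold to 32 bits
    }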

    Quote Originally Posted by m^2 View Post
    Yeah, just be wary not to step into the Adler trap of failing to distinguish between input words.


    You're referring to Adler-32? I just looked it up on wikipedia. As a hash function, it's way too simple. You would not have to even run SMHasher; the issues are really obvious.
    Last edited by nburns; 20th August 2013 at 02:11.


