Results 1 to 23 of 23

Thread: Superfast hashing algorithm

  1. #1
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts

    Superfast hashing algorithm

    Greetings to all,
    I'm currently looking for an extremely fast hashing function which doesn't require cryptographic quality, exhibits good distribution and more importantly extreme speed, producing 32-bit hashes.
    I know XXHash, MurmurHash, FNV etc. I'm more interested in something even simpler than those hash functions, that still provides a fair distribution and has other qualities such as those tested in SMHasher.
    The purpose of it would be a superfast hashsum integrity check similar to CRC32 (speaking of CRC32 it is apparently now an integrated assembly instruction on some recent processor, does anyone have a performance benchmark for that ?).
    If I can't find anything I'll probably design my own, but before that I'd like to know if anyone is aware of such an algorithm ? Thanks for your ideas.

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    There's no need to reinvent the wheel!

    CRC32 (Slice-by-8 implementation) for fast processing, good all-rounder, might be not the best idea for short strings
    Custom implementation of Adler32 (modulo delay count optimization trick), is the fastest, extremely poor performance on short strings
    Custom FNV-like stuff - fast enough, good for short strings

    In other words, it also depends on what kind of stuff you about to hash!

  3. The Following User Says Thank You to encode For This Useful Post:

    gpnuma (20th January 2015)

  4. #3
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    If you are asking about data compression purposes, ANS entropy coding allows to add checksum for free: you just fix the initial state and test if it you got this value after finishing decoding.
    After an error, the state performs a complex random walk e.g. in 32 bit state space for rANS.

  5. The Following User Says Thank You to Jarek For This Useful Post:

    gpnuma (20th January 2015)

  6. #4
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts
    Well the "stuff to hash" will be chunks of 64-bytes random data.
    The hash function should be able to have series of chunks of 64 bytes as an input, and output a 32 byte unsigned integer which is "unique" (least collisions possible) depending on the data provided.

    For what I want CRC32 and Adler32 are way too slow...
    I know I can always revert to the famous (value * big_multiplier) >> (32 - necessary_bytes) hash function, but I was wondering if there was anything with comparable or slightly slower speed and better characteristics, as it will be used to check data integrity.
    Last edited by gpnuma; 20th January 2015 at 16:42.

  7. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    typical data size (dozens/hundreds/millions bytes), computing environment (x86/x64/arm/...), required speed (1/10/100 GB/sec)? purpose - integrity checking?

  8. #6
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts
    Typical data size : million bytes
    Computing environment : all (compilers have to be able to optimize the code depending on platform so no ASM).
    Required speed : maximum, *but* the algorithm has to ensure data integrity with a fair statistical certainty, i.e. it should be *extremely* rare that changing a few bytes of input would give the same hash as output.

    Yes it's for integrity checking as you have said.

  9. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    umac-32: up to 256 GB/sec employing all 4 cores on i7-4790K. it has cryptograhic quality as far as seed is unknown to attacker. otherwise, it just has better quality than any other non-crypto hash. drawbacks is large codebase, large startup time, and simd code required for fastest implementation. on modern intel cpus, it runs at 16 bytes/cycle using AVX2, 8 bytes/cycle with SSE2 and 4-8 bytes/cycle with scalar x86/x64

  10. The Following User Says Thank You to Bulat Ziganshin For This Useful Post:

    gpnuma (20th January 2015)

  11. #8
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    umac-32: up to 256 GB/sec employing all 4 cores on i7-4790K. it has cryptograhic quality as far as seed is unknown to attacker. otherwise, it just has better quality than any other non-crypto hash. drawbacks is large codebase, large startup time, and simd code required for fastest implementation. on modern intel cpus, it runs at 16 bytes/cycle using AVX2, 8 bytes/cycle with SSE2 and 4-8 bytes/cycle with scalar x86/x64
    Thanks a lot this sounds very interesting. Large codebase ? that's not really a problem.
    When you say "large startup time" you mean it takes a lot of cycles for processors to optimize (branching etc.) the algorithm ? Or that it takes a lot of parameters setting initially ?
    I'm looking for something which is fast on a single thread by the way.
    Would you have a link to the C source code for UMAC32 ?

  12. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    http://en.wikipedia.org/wiki/UMAC
    http://fastcrypto.org/umac/

    it needs a lot of time to generate tables, and then a lot of time to "finalize" hash for each file being processed. it makes it slower than xxhash for files less than 1 KB

    btw, i want to use it too, so we can cooperate in polishing it for modern compilers. it was developed ~10 years ago and still lacks f.e. avx2 version

  13. #10
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    xxhash looks good for your purpose: https://code.google.com/p/xxhash/

  14. #11
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    umac-32: up to 256 GB/sec employing all 4 cores on i7-4790K. it has cryptograhic quality as far as seed is unknown to attacker. otherwise, it just has better quality than any other non-crypto hash. drawbacks is large codebase, large startup time, and simd code required for fastest implementation. on modern intel cpus, it runs at 16 bytes/cycle using AVX2, 8 bytes/cycle with SSE2 and 4-8 bytes/cycle with scalar x86/x64
    I have tried to reproduce your speed results, but it's slower here. 1573 MB/s @ 3.2 ghz or 0.5 bytes/cycle. Juggling compilers doesn't help. I remember you proposed to tweak some parameter in another thread, it doesn't help.

    Still, uhash is a great choice and I second Bulat in this recommendation.


    As for others proposing xxhash, I have lots of reserve for its quality.
    The reason is an extremely week main loop. Try to extract it and test for statistical quality. It's very bad.
    I believe that it passes a sanity test called smhasher largely because its 4 parallel loops and finalisation phase manage to confuse it AND because that was the specific design goal, the author has kept tweaking until it passed. If one is into hashes this fast, I recommend Spooky, for 3 reasons:
    * it has some semi-theoretical background, though the author uses some hand waving in his proofs
    * the author is a hashing veteran
    * the QA the author has done is well documented

  15. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    now i have time to check it myself, so i will report my results tomorrow

  16. #13
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Maciej,
    I remember we had this discussion one or two years ago.

    If the point is that it's possible to create an input specifically designed to confuse the inner loop and maximize collision, the answer is yes, you are right, it's possible.
    This is true for any non-cryptographic hash function, including Spooky hash.

    Being non-cryptographic is not the same as being totally useless.
    There are many situations where the hash is just required to produce a reasonably random output,
    and where there is no reason to believe it will have to deal with external attacks.
    Or where such attack is rather pointless.


    Someone expecting to deal with untrusted input data, which could be used as an attack vector to deliberately increase collisions, typically in order to slow down the target,
    must turn its attention to cryptographic quality hashes. And shall pay an appropriate performance price for it.

  17. #14
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts
    First of all thanks to everyone for all the ideas. I'm going to check out Uhash32 and spookyhash as well, I did not know those two.
    Something I was wondering, did anyone benchmark google's farmhash (https://code.google.com/p/farmhash/) ? It's not really what I want because their philosophy is apparently to multiply the codebase to suit every platform, whereas I'd like to use the same codebase which could be optimized by any compiler on any platform, but nevertheless it sounds interesting.

  18. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    uhash is the internal part of umac that you really need. i refer to it as umac mainly because it should be easier to google

  19. #16
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Cyan View Post
    Maciej,
    I remember we had this discussion one or two years ago.

    If the point is that it's possible to create an input specifically designed to confuse the inner loop and maximize collision, the answer is yes, you are right, it's possible.
    This is true for any non-cryptographic hash function, including Spooky hash.

    Being non-cryptographic is not the same as being totally useless.
    There are many situations where the hash is just required to produce a reasonably random output,
    and where there is no reason to believe it will have to deal with external attacks.
    Or where such attack is rather pointless.


    Someone expecting to deal with untrusted input data, which could be used as an attack vector to deliberately increase collisions, typically in order to slow down the target,
    must turn its attention to cryptographic quality hashes. And shall pay an appropriate performance price for it.
    Since that talk I have revised my position. Regardless of how weak an algorithm is, if it's a pareto frontier in strength-speed, there may be use for it.
    Nevertheless IMO xxhash is a good choice for a very narrow set of applications because I believe in Spooky's strength much more and the speed difference is small enough to be important only for very few.

  20. #17
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Code:
    m% ./fsbench -b32768 farmhash32 farmhash64 farmhash128 spookyhash murmur3_x64_128 murmur3_x86_128 murmur3_x86_32 fsbench
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    FarmHash32                              SVN r4       
        2710268 (x 1.000)     2372 MB/s 2378 MB/s       32e15 5994e12
    FarmHash64                              SVN r4       
        2710600 (x 1.000)     6539 MB/s 6611 MB/s       32e15 2254e12
    FarmHash128                             SVN r4       
        2711264 (x 1.000)     7004 MB/s 7092 MB/s       32e15  17e152
    SpookyHash                              V2 2012-08-05 
        2711264 (x 1.000)     1109 MB/s 1116 MB/s       32e15  21e15
    murmur3_x64_128                         2012-02-29   
        2711264 (x 1.000)     2248 MB/s 2254 MB/s       32e15   9e15
    murmur3_x86_128                         2012-02-29   
        2711264 (x 1.000)     1702 MB/s 1706 MB/s       32e15  32e15
    murmur3_x86_32                          2012-02-29   
        2710268 (x 1.000)     1701 MB/s 1711 MB/s       32e15 2651e12
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    done... (4*X*1) iteration(s)).
    If farmhash is reasonably strong, call me impressed.

    ADDED: BTW something's wrong with Spooky. It should have been ~3 times faster. No time to check it now though.
    Last edited by m^2; 22nd January 2015 at 17:55.

  21. The Following User Says Thank You to m^2 For This Useful Post:

    gpnuma (22nd January 2015)

  22. #18
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I updated xxhash in fsbench, so for completeness:

    Code:
    m% ./fsbench -b32768 xxhash xxhash64 fsbench
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    xxhash                                  r39          
        2713266 (x 1.000)     2774 MB/s 2774 MB/s       32e15  32e15
    xxhash64                                r39          
        2713266 (x 1.000)     5097 MB/s 5092 MB/s       32e15  32e15
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    done... (4*X*1) iteration(s)).

  23. #19
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts
    Quote Originally Posted by m^2 View Post
    I updated xxhash in fsbench, so for completeness:

    Code:
    m% ./fsbench -b32768 xxhash xxhash64 fsbench
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    xxhash                                  r39          
        2713266 (x 1.000)     2774 MB/s 2774 MB/s       32e15  32e15
    xxhash64                                r39          
        2713266 (x 1.000)     5097 MB/s 5092 MB/s       32e15  32e15
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    done... (4*X*1) iteration(s)).
    Hey m^2 thanks for your benchmarks.
    I've changed my requirements since density could take huge data sets as input (the faster, the more interesting on bigger data sets), and I'm now looking for a 64 bit hashsum function instead of 32 bit.
    Here's where I'm at right now :

    MurmurHash 64

    It's fast (2.7 bytes/cycle here equating to 8GB/s at 3GHz) but it is not the fastest and does not exhibit one if the best distributions either (although it's excellent don't get me wrong).
    It has been well tested and there is a Hash DOS attack known for it.

    XXHash 64

    Very fast but it's a derivative of the Murmurhash era (series of MUL/ROT on the same word). It is very good on SMHasher but there are concerns about the main loop's statistical quality, and the fact that it was optimized for SMHasher makes me uneasy.
    The fact that it's recent also means it hasn't really been battle-tested so there might be some yet-to-be discovered flaws, especially as it's a derivative of Murmur.

    FarmHash 64

    The successor of CityHash 64.
    The fastest of all ! It does great on SMHasher again, but as the author states in a very laudable manner :
    We may have gone slightly overboard in trying to please SMHasher and othersimilar tests, but we don't want anyone to choose a different hash function
    because of some minor issue reported by a quality test."
    Again, it was optimized against SMHasher and therefore has an edge over its competitors on the best known benchmark which makes me a bit uneasy.
    It is also a derivative of CityHash which had a simple Hash DOS demonstrated on it.

    SpookyHash 64 <---- good choice IMO but requires big endian coding

    Very fast as well (5 bytes/cycle here equating to 14 GB/s at 3GHz). It is based on ADD/ROT/XOR on 12 words. It has been under development since 2010 and integrated in SMHasher in 2012 so I'm pretty sure it has not been optimized at all for SMHasher but still reaches the maximum score which is impressive !
    It doesn't exhibit any obvious flaws such as a simple Hash DOS possibility (http://crypto.stackexchange.com/ques...ble-to-hashdos).
    It seems like a great choice for what I want, however the existing codebase is only little endian and I need both little and big endian, I'd need to develop big endian conversions and looking at the source code it's probably not that simple.

    SipHash 64 <---- good choice IMO but requires optimization

    Now this one is one-of-a-kind as it is stated as cryptographic. It exhibits speeds close to other non-cryptographic hashes (0.9 bytes/cycle here equating to 2.5 GB/s at 3GHz).
    And it adds another function which is very interesting and that is security : it is virtually impossible to forge a document to obtain the same hash.
    So not only could it be used to check data integrity, but also it could be used to check data validity or data origin (MD5 style - bad example I know).
    That's actually worth a small speed penalty I think, so if it could be optimized to run a little bit faster this algorithm would be absolutely perfect !

  24. #20
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by gpnuma View Post
    the existing codebase is only little endian and I need both little and big endian, I'd need to develop big endian conversions and looking at the source code it's probably not that simple.
    Such conversions are usually trivial. Simply, every time you load from the buffer, byteswap and before you save - byteswap again.
    The performance cost is tiny, all CPUs that I tried did byteswaps in practically no time.
    Quote Originally Posted by gpnuma View Post
    SipHash 64 <---- good choice IMO but requires optimization

    Now this one is one-of-a-kind as it is stated as cryptographic. It exhibits speeds close to other non-cryptographic hashes (0.9 bytes/cycle here equating to 2.5 GB/s at 3GHz).
    0.9 cycle/byte? What CPU and what implementation?
    I played *a lot* with this hash, implemented it at least 5 times in 3 languages, benchmarked a dozen of other implementations, tweaked several. The fastest on my CPU was < 2.4 GB/s at 3.2 ghz or 1.3 cycle/byte; all other fast implementations did 1.9-2.2 GB/s with only that one sticking out on 1 compiler. To be precise, Majkowski's code on gcc 4.7 did this; on clang and gcc 4.2 it was sub-par.
    0.9 c/b is possible, but don't assume that all your users will see such speeds.

  25. #21
    Member
    Join Date
    Aug 2013
    Location
    France
    Posts
    77
    Thanks
    27
    Thanked 26 Times in 11 Posts
    Such conversions are usually trivial. Simply, every time you load from the buffer, byteswap and before you save - byteswap again.
    The performance cost is tiny, all CPUs that I tried did byteswaps in practically no time.
    I've got to admit I haven't looked in detail. Maybe it's trivial. If that's the case this algo is a clear winner.

    0.9 cycle/byte? What CPU and what implementation?
    It's 0.9 bytes/cycle
    It's on a Core i7, source derived from csiphash (https://github.com/majek/csiphash.git). The compiler is gcc 4.7.2.
    Again I haven't looked the source in detail, maybe it's optimizable maybe it's not.
    But I find it already very fast for a cryptographic hash, if it can get slightly faster I think it is a serious threat to most non-crypto hashes.
    Last edited by gpnuma; 23rd January 2015 at 00:52.

  26. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    afair, i got 1.8 cycle/bytes speed with siphash-128. but siphash, like umac and vmac are not cryptograhic hashes - they are MACs i.e. they can generate unforgeable hash value as far as their secret key is unknown to attacker. cryptohash doesn't need secret key for that. afair, siphash has faster startup times so it's more useful for hashing short strings, while umac/vmac has better stream speed (you may look at my FARSH speeds since it employs umac's inner loop; vmac-128 is 0.4/1/5 cycles/byte for x64/x86-sse2/x86 codebase)

    all my measurements were done on sandy bridge or haswell cpus. when users posted farsh speed measurements on their cpus, i realized that simd instructions get a huge boost over last 10 years, so newer cpus can be 10x faster on SIMD code (while integer code got only 2x boost)

    and again - every non-crypto hash can be forged, otherwise these fast hashes will be used everywhere instead of SHA

    PS: core i7 isn't an architecture but marketing name for high-end intel cpus since 2008. there are 7 possible architectures under this moniker
    Last edited by Bulat Ziganshin; 20th September 2015 at 03:08.

  27. #23
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    114
    Thanks
    91
    Thanked 69 Times in 49 Posts
    Quote Originally Posted by gpnuma View Post
    SpookyHash 64 <---- good choice IMO but requires big endian coding
    It seems like a great choice for what I want, however the existing codebase is only little endian and I need both little and big endian, I'd need to develop big endian conversions and looking at the source code it's probably not that simple.
    I believe you wrote the conversion at https://github.com/centaurean/spookyhash, right?

    I did something similar for my own use (https://github.com/jibsen/spooky), also trying to make it endian neutral -- but haven't been able to test it since I only have access to x86 at the moment.

    I think your code is nearly there, except for having to read the 64- and 32-bit values in the switch at the end of spookyhash_short using a endian macro as well.

    The question is what the performance is going to be like ...

Similar Threads

  1. 64 GB/s aka 16 bytes/cycle universal hashing
    By Bulat Ziganshin in forum Data Compression
    Replies: 2
    Last Post: 10th April 2014, 01:33
  2. hashing LZ
    By willvarfar in forum Data Compression
    Replies: 13
    Last Post: 24th August 2010, 20:29
  3. What type of hashing is this.
    By Earl Colby Pottinger in forum Data Compression
    Replies: 11
    Last Post: 22nd June 2010, 05:23
  4. Knuth's Multiplicative Hashing
    By encode in forum Data Compression
    Replies: 5
    Last Post: 28th May 2008, 22:18
  5. A nice article about hashing
    By encode in forum Forum Archive
    Replies: 19
    Last Post: 26th September 2007, 21:31

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •