
Thread: Hash / Checksum algorithm considerations

  1. #1
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts

    Hash / Checksum algorithm considerations

    Hi


    I'm currently trying to evaluate a few hash algorithms to be used as "checksums" for error detection.
    While CRC32 is supposed to be a good choice, it is not in this specific case, because it is too slow;
    in fact, it is much slower than the decoding process itself.

    I've been testing MurmurHash for this usage, and been experimenting with a few examples of my own.

    Thanks to SMHasher, a few desirable properties have been verified,
    but I was wondering if there is any other tool to check the strength of an error-detection algorithm.

    Quick thought:
    while checksums and hash algorithms are very close relatives,
    it seems hashes look for maximum "spreading", while checksums try to ensure "zero collisions" as long as the error distance is below a certain level.
    SMHasher seems mostly meant to test the first case.

    [Edit] Link to article : http://fastcompression.blogspot.fr/2...algorithm.html

    Regards

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    So what don't you like about my idea of relying on the LZ structure for integrity checking?
    In that case it's not necessary to store or compute any hashes at all.

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Shelwien:
    Would your method detect, e.g., single-bit errors in uncompressed literal streams?

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Quote Originally Posted by comment on Cyan's blog
    I still think that there's no need to add weak CRCs to ensure
    block integrity. For a format where there's a risk of literals
    getting modified without affecting the match structure, I'd suggest
    adding some fast encryption instead
    (like xor with the previous encrypted byte, or RC4);
    then accidental changes would mess up the whole block and you'd
    notice it.
    Some options which can be used for integrity checks:
    - strict block size. The encoder can make sure that each block
    is compressed separately (no matches overlapping the block boundary).
    - unused distance (or length) range. Some interval of distance
    values (eg. 64k-256..64k) can be intentionally left unused, and its
    appearance can be treated as an error to detect broken blocks.
    - end-of-block code.
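
    For concreteness, here is a minimal sketch of the "xor with previous encrypted byte" variant (an illustration of the idea only, not code from any existing implementation):

    #include <stddef.h>
    #include <stdint.h>

    /* Whiten a compressed block in place before storing it:
       e[i] = p[i] ^ e[i-1].  The receiver reverses it with p[i] = e[i] ^ e[i-1]. */
    static void xor_chain(uint8_t *buf, size_t len)        /* encode */
    {
        uint8_t prev = 0;
        for (size_t i = 0; i < len; i++) {
            buf[i] ^= prev;
            prev = buf[i];
        }
    }

    static void xor_unchain(uint8_t *buf, size_t len)      /* decode */
    {
        uint8_t prev = 0;
        for (size_t i = 0; i < len; i++) {
            uint8_t e = buf[i];
            buf[i] ^= prev;
            prev = e;
        }
    }

    /* Note: a single corrupted stored byte changes only two bytes after
       un-chaining; how visible that is then depends on how much it disturbs
       the LZ match structure. */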

  5. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    .
    Last edited by Piotr Tarsa; 30th April 2012 at 23:20. Reason: double post

  6. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Hmm, assuming it's a XOR with the previous byte, then bit flips at the same position in two consecutive literals would go unnoticed. Likewise for match/distance lengths with different even parity. So that would be very weak.
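
    A tiny self-contained check of that cancellation (assuming the byte-wise xor chain sketched in the quote above; purely illustrative):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Whiten a buffer, flip the same bit in two adjacent stored bytes,
       and count how many bytes differ after de-whitening. */
    int main(void)
    {
        uint8_t stored[8] = { 'l','i','t','e','r','a','l','s' };
        uint8_t copy[8];
        uint8_t prev = 0;

        for (int i = 0; i < 8; i++) {            /* whiten: e[i] = p[i] ^ e[i-1] */
            stored[i] ^= prev;
            prev = stored[i];
        }
        memcpy(copy, stored, 8);
        copy[3] ^= 0x04;                          /* same bit flipped ...        */
        copy[4] ^= 0x04;                          /* ... in two adjacent bytes   */

        int diff = 0;
        uint8_t pa = 0, pb = 0;
        for (int i = 0; i < 8; i++) {             /* de-whiten both copies       */
            uint8_t a = (uint8_t)(stored[i] ^ pa);
            uint8_t b = (uint8_t)(copy[i]   ^ pb);
            pa = stored[i];
            pb = copy[i];
            diff += (a != b);
        }
        printf("decoded bytes that differ: %d of 8\n", diff);   /* prints 2 */
        return 0;
    }

    Only two decoded bytes change; the rest of the block decodes exactly as before, so nothing based on the LZ structure would ever notice.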

    RC4 is two times slower on 32-bit systems, and on 64-bit ones the difference is probably bigger.

    Checksumming has the added advantage that it can be completely separated from (de)compression code and that checksum computation can be deferred without rewriting the whole stream.

    Overall, your method isn't strictly superior, and reliable checksumming would be much easier to add and maintain.


    it seems hashes look for maximum "spreading", while checksum try to ensure "zero collision"
    IIRC Sanmayce once created a thread with a comparison of hash algorithms. He compared both performance and the number of collisions: http://encode.ru/threads/1160-Fastes...-hash-function! (too bad he didn't compare the third version of MurmurHash). The differences in the number of collisions were usually pretty small, so I think you can safely use MurmurHash.

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    My approach is similar to hashing, but
    1) it's not a hash, so the encryption can be simpler (or absent altogether, when entropy coding is used)
    2) there's no redundancy from storing a hash per block

    RC4 is just an example; I didn't know about Cyan's 1 GB/s+ speed requirement at that time.
    But it would be really easy to implement encryption based on any hash function
    (the gain being that there's no hash value to store), or just some 64-bit shift+add.

  8. #8
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Yes, indeed, speed is a strong requirement in this specific case.

    If Shelwien's encryption scheme can be fast enough, then it's definitely a candidate.
    At first look though, I tend to agree that checksumming looks simpler to implement and maintain.
    The cost of block hash is probably a concern for excellent-compression programs, but for fast ones, it is likely negligible.

    If you look at the code of xxHash, you may notice that it's very light:
    essentially some bit rotations and simple additions, alongside a few multiplications.
    So it's not much more costly than XOR'ing.
    I guess it could even be vectorized, since it works on 4x32 = 128 bits per loop iteration.
    From a speed perspective, it satisfies my current need.
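
    To give an idea of the shape of that inner loop, here is a rough 4-lane sketch (constants and rotation amount are illustrative, not the actual xxHash source):

    #include <stdint.h>

    #define ROTL32(x, r) (((x) << (r)) | ((x) >> (32 - (r))))

    /* One 16-byte stripe processed as four independent 32-bit lanes:
       multiply, add, rotate, multiply, and nothing heavier than that. */
    static void round4(uint32_t v[4], const uint32_t in[4])
    {
        for (int lane = 0; lane < 4; lane++) {
            v[lane] += in[lane] * 2246822519u;   /* mul + add    */
            v[lane]  = ROTL32(v[lane], 13);      /* bit rotation */
            v[lane] *= 2654435761u;              /* mul          */
        }
    }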

    Now, from an error-detection perspective, I'm not sure it is good enough.
    I would like to establish that with a broader testbed than "only" SMHasher,
    which is indeed very good, but does not focus much on error-detection scenarios.

    I also remember the long posts from Sanmayce on this.
    Thanks for the link, Piotr.
    At some point, another "benchmark" suite for hashes is mentioned, which could be useful for improved evaluation.

  9. #9
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    The following is a post from Maciek Adamczyk (m^2)

    ********************************************


    I have some thoughts about hashing that I'd like to share. Though I
    can't use my private email or encode.ru account because my PC keeps the
    passwords and it is not functional ATM...
    So it goes:

    -Would be nice if you compared xxhash to CityHash and the latest Jenkins
    creation as well as Sanmayce's functions. They are supposed to be faster
    than murmur too.

    -64-bit version of xxhash? Wouldn't it be even faster, especially the
    strong one?

    -Shelwien's idea of using encryption... My first-day reaction was: "It's
    wrong. It's based on added redundancy, which is the opposite of the goal of
    compression algos." Second: "Hmm, but LZ is redundant anyway. It may
    offer even more redundancy than you get by slapping a couple of bytes at
    the end of a file. And the idea is actually cool." Third: "There is a
    problem. Stream ciphers won't work well near the end of a stream;
    there'll be too little redundancy left to do some meaningful
    checksumming. So block ciphers with large enough blocks. But what about
    the tail block? Maybe the block size should be adjusted, so all blocks are
    equally sized, and for too-small ones a regular checksum should be used?
    Or padding? And what's large enough anyway? You'd have to calculate how
    much redundancy you add with the LZ4 structure. Possibly, like Shelwien
    suggested, add more of it in some way. Either way it looks complicated."

    -I think that whatever checksumming scheme you'll choose, it should be
    integrated with LZ4. With the current xxhash fast you'll get ~15%
    slowdown on decompression. Isn't that a lot? Doing the hashing while the
    data is still in cache or registers is going to be much faster. So IMO
    you should either do the hashing calculations together with compression
    or provide hooks that call the hash function frequently enough for the
    data to still be in L1. The first option looks to be the faster one. The
    second is more flexible and less intrusive, and wouldn't significantly
    clutter the code. Either should be possible to make removable with #ifdefs.

    -AFAIK no fast compression library offers integrated checksumming
    capabilities. Many users need it. I think that offering an integrated
    solution is a good idea.

    -cbloom did some checksum testing. You may want to take a look if you
    haven't already

    -As to benchmarks of hash functions... this is one of my goals with
    fsbench: compression + encryption + checksumming, or any subset of it.
    Actually some of my recent refactoring was supposed to enable it. But
    either way, it's unlikely to be ready fast enough for you. I'd love to
    see how LZ4 with integrated checksumming would stack up against LZO with
    external checksumming.

    One more thought on Murmur:
    did you compare your hash to the 128-bit x64 variant of MurmurHash?
    It's supposed to be much faster...

    Would be cool if you posted this mail to encode.ru

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I would opt for a decoupled scheme. I would go further and support multiple hash algorithms. The newest processors have built-in CRC32C support that achieves superb performance. I wonder how it would work with HyperThreading: one virtual core doing (de)compression and one virtual core doing CRC32C computation.

    Also, I would add space reservation for multiple hashes; e.g. when finishing a compressed stream (flushing remaining data to disk), I would add a small space for a few hash records. One hash record would consist of a hash identifier and a hash value. That would allow deferring the computation of hashes.
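
    Something like the following record layout would make that possible (a sketch with hypothetical field names and sizes, not an actual format):

    #include <stdint.h>

    /* One reserved "hash record": which algorithm, how many bytes of digest,
       and room for the value, filled in whenever the hash finally gets computed. */
    typedef struct {
        uint8_t hash_id;    /* e.g. 0 = none, 1 = xxHash32, 2 = CRC32C */
        uint8_t hash_len;   /* number of valid bytes in value[]        */
        uint8_t value[16];  /* enough room for a 128-bit digest        */
    } hash_record_t;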

    Here's a comparison of CRC32C computing algorithms: http://www.strchr.com/crc32_popcnt
    Here's something fast (as far as I understand): http://code.google.com/p/crcutil/

  11. #11
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    A few answers :


    compared xxhash to CityHash and the latest Jenkins creation as well as Sanmayce's functions. They are supposed to be faster than murmur too.
    Lookup3 is supposed to be Bob Jenkins's latest version. It reached 1.2 GB/s on my test machine. Not bad, but less than MurmurHash.
    The tested version is provided directly within the SMHasher source code; I could also try the version on Bob's website, just in case...

    CityHash is also provided within SMHasher.
    I did not provide the speed result because there are only 64-bit & 128-bit versions.
    But well, obviously one can truncate the result to 32 bits.
    Just for information, the speed of CityHash64 is about 1 GB/s.

    As for Sanmayce, the latest version I'm aware of is this:
    http://encode.ru/threads/1160-Fastes...ll=1#post23022

    Since it's based on FNV, and FNV doesn't really shine speed-wise, I guess there is a limit to how much Sanmayce can improve it.



    64-bit version of xxhash? Wouldn't it be even faster, especially the strong one?
    Good point.
    Yes, it would be faster.

    The problem is that the 32-bit / 64-bit / 128-bit variants are not "the same version" with different optimizations:
    they produce different hashes and are not compatible.
    So, for data transmission, whenever the sender uses the 64-bit variant, the receiver must do the same.

    It happens that my initial objective was merely to find a CRC32 replacement, for speed.
    I was not looking for 64 bits because:
    1) It increases the cost of checksums
    2) 64-bit arithmetic is fast on modern strong CPUs, such as Intel Core, but there are many other target systems which will not like it.
    So, for this use case, since LZ4 is used on a broad range of devices, I wanted to limit calculations to 32-bit registers.

    Now, outside of this use case,
    a 64-bit version could have some value...


    Shelwien's idea of using encryption... (...) Either way it looks complicated.
    I agree.
    First, it directly affects the compression code, resulting (obviously) in an incompatible output.
    Second, as a consequence, it has an impact on code maintenance.
    And third, more crucially, there is no guarantee it's going to be faster. In fact, I would bet the contrary. (Shelwien, please feel free to prove me wrong !)



    I think that whatever checksumming scheme you'll choose, it should be integrated with LZ4.
    With the current xxhash fast you'll get ~15% slowdown on decompression. Isn't that a lot?
    (...)
    So IMO you should either do the hashing calculations together with compression
    or provide hooks that call the hash function frequently enough for the data to still be in L1.
    Mmmh, okay, I had not thought of integrating it that much.
    The first option would have about the same consequences on the code as Shelwien's proposition.
    The second one looks better, but even there, the modification is not trivial, and the impact on code maintenance is significant.
    Also, I simply challenge the assumption that it is going to be really faster than a simpler, straightforward 2-pass scheme.
    I mean, I would not like to make the code a lot heavier to maintain just to gain 1%. It must be much better than that.



    AFAIK no fast compression library offers integrated checksumming capabilities.
    Many users need it. I think that offering an integrated solution is a good idea.
    Noted. I wasn't aware there was such a need. If that's the case, there is indeed a scarcity of solutions around.



    cbloom did some checksum testing. You may want to take a look if you haven't already
    I did, and that was very pleasant reading as usual.
    I nonetheless had the feeling that it was an "amateurish" evaluation. Not a bad one, but still a kind of empirical evaluation.
    SMHasher looks much cleaner, with a well-defined and complete list of tests.

    The real issue is that it does not target error detection, and therefore provides no test for "guaranteed detection distance".
    Which, by the way, is understandable given that MurmurHash provides no such guarantee.


    As to benchmarks of hash functions....this is one of my goals with fsbench.
    Good news!
    This will indeed help assess whether an "internal hash" is really better.


    Did you compare your hash to 128 bit x64 variant of murmur? It's supposed to be much faster...
    I did not, for the reasons mentioned above.
    But if there is any interest in using xxHash in such modes, then obviously I would.



    Here's a comparison of CRC32C computing algorithms: http://www.strchr.com/crc32_popcnt
    Yep, I agree. Hardware CRC is likely to get excellent speed results.
    The real issue is that it ties the calculation to a specific architecture, which then has to be "emulated" on other systems.
    It's okay for "local-only" data, but for transported data visiting different systems, it does not look like a great fit.

  12. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    CRC32C is the same as CRC32, only with a different polynomial constant. So, lacking hardware support, it would have exactly the same speed as CRC32.
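
    For illustration, a bit-by-bit (reflected) CRC sketch where only the polynomial constant differs between the two; real implementations are table-driven or use the hardware instruction:

    #include <stddef.h>
    #include <stdint.h>

    static uint32_t crc32_bitwise(const uint8_t *p, size_t len, uint32_t poly)
    {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= p[i];
            for (int b = 0; b < 8; b++)
                crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
        }
        return ~crc;
    }

    /* crc32_bitwise(data, n, 0xEDB88320u) -> CRC32  (zip/zlib polynomial)        */
    /* crc32_bitwise(data, n, 0x82F63B78u) -> CRC32C (Castagnoli, the SSE4.2 one) */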

    Cyan, have you studied the requirements of your potential users regarding the checksum algorithm? For an archiver, I know that people don't consider 32-bit hashes strong anymore. I also think that they will prefer a well-known and widely tested algorithm such as CRC over new, unknown algorithms. So, ignoring speed, CRC64/MD5/Tiger and so on would be the best choice.

  13. #13
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Hi Bulat

    Yes, i agree on all points.

    But LZ4 is not used as an archiver.
    It's more useful for "short-lived" data, typically in real-time I/O scenarios, and that's where I understand there is a need for checksums.

  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    I think that user needs for such a scenario would be essentially the same, just less strict (both because the data is short-lived and because it has to be very fast).

  15. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I took a look at the xxHash code and I have a feeling that it's kinda inconsistent. You avoid 64-bit maths, so it works fast on weak machines. OTOH, as far as I understand it, the main thing that makes xxHash so fast is that you split the data into 4 independent streams that can be processed in parallel by an OoO core with multiple ALUs. Weak CPUs rarely have them. So while it doesn't hurt small CPUs, it makes your benchmark results irrelevant for them.
    Maybe I missed something and it has some other purpose?

    Also, a question:
    Why 4 streams? Did you choose the number experimentally? Or is it just tweaked for new Intel CPUs?

  16. #16
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Good point.

    The independent stream strategy comes from experience with Huff0.
    The objective was not to target OoO cores, although it's true that it does help.
    The most important objective was to avoid writing repetitively to the same memory address.
    Even when data is in cache, there is a "commit delay" between 2 successive writes to the same memory address. By spreading the result over 4 memory addresses, the commit-delay issue is avoided.
    I merely "guess" that the commit-delay issue is universal, for any CPU with a write-back cache (which is quite common nowadays).

    4 streams was selected "on feeling". It simply looked good. And proved to be.
    When implementing the algorithm, I also noted that 4x32 bits = 128 bits,
    which opens the way to potential optimisations with 128-bit SIMD registers, such as SSE2+.
    So it strengthened the choice.

    Since then, I've seen other algorithms which try to specifically target OoO optimisations with multiple ALUs.
    SpookyHash, for example, empirically determined a different (higher) optimal number of streams. The problem is that such an approach targets a specific CPU type. The assumptions tend to be less valid for different CPU types.

    xxHash is still very young, and can benefit from useful comments and suggestions. I guess the algorithm can still accommodate a few changes at this stage, before becoming "frozen". So don't hesitate if you have any questions or contributions.

    Regards

  17. #17
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Hmmm... where do you have two writes to the same part of memory in the loop? What I see there are memory reads and register writes.

  18. #18
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    I guess you are right.
    The comparison with Huff0 is flawed.

    Huff0 was writing into (small) tables, therefore the compiler had no choice but to actually write the result to a memory address.
    In contrast, within the hash formula, the result can stay in a local register.
    So much for the write-back performance issue.

    So the gain may come mainly from OoO execution on multiple-ALU cores.

  19. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    What about using Murmur-like information exchange between the accumulators (look at the line ends):

    v1 += XXH_LE32(p) * PRIME32_2; v1 = XXH_rotl32(v1, 13); v1 *= PRIME32_1; p+=4; v1+=v2;
    v2 += XXH_LE32(p) * PRIME32_2; v2 = XXH_rotl32(v2, 13); v2 *= PRIME32_1; p+=4; v2+=v3;
    v3 += XXH_LE32(p) * PRIME32_2; v3 = XXH_rotl32(v3, 13); v3 *= PRIME32_1; p+=4; v3+=v4;
    v4 += XXH_LE32(p) * PRIME32_2; v4 = XXH_rotl32(v4, 13); v4 *= PRIME32_1; p+=4; v4+=v1;

    In particular, it may allow returning a 128-bit hash result. A 32-bit file hash is considered not secure enough by many users nowadays.

  20. #20
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts

    I have unpublished versions which do just that.
    They are meant to produce 64-bit and 128-bit outputs from 32-bit arithmetic.

    I've not yet published them due to limited demand, but that can change if there is some.

    Note that there are 2 issues to check properly when entangling inputs:
    1) There are dependencies which can slow down parallel execution
    2) More importantly, it can create some "resonance" effects, making the hash formula weak against some specific forms of input.

    Fortunately, all this can be properly tested.
    Last edited by Cyan; 11th February 2013 at 21:21.

  21. #21
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    moinakg's code forced me to finally learn those MMX-style things with respect to integer operations; we have the following:
    • MMX (Pentium MMX, 1997) handles 64-bit integer vectors
    • SSE2 (Pentium 4, 2000) handles 128-bit integer vectors
    • SSE4 (Core 2 Wolfdale, 2007) implemented 4*32 multiply
    • AVX2 (Haswell, 2013) will handle 256-bit integer vectors, including 8*32 multiply

    More details:
    • 64/128/256-bit vector support means only the simplest operations, such as add/xor/logical shift
    • Bit rotations aren't implemented at all, so they may be emulated with lshift+rshift+add
    • SSE2 has 2x(32*32->64) multiplication, SSE4 has 4x(32*32->32) and AVX2 has 8x(32*32->32)

    So, the speed of xxHash-like computation:
    • x86/x64 handles 4 bytes using 5 operations: load, mul, add, rot, mul
    • SSE2 handles 16 bytes in 13 ops: load, shuffle+mul+mul+masked move, add, lshift+rshift+add, shuffle+mul+mul+masked move
    • SSE4/AVX2 handle 16/32 bytes in 7 ops: load, mul, add, lshift+rshift+add, mul
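
    To make the SSE4 count concrete, here is a rough intrinsics sketch of one 16-byte step (the constants are illustrative, not taken from any particular xxHash release):

    #include <smmintrin.h>   /* SSE4.1 for _mm_mullo_epi32; the rest is plain SSE2 */

    /* rotation emulated as lshift+rshift+or (or equals add here: the shifted halves don't overlap) */
    #define ROTL32x4(v, r) \
        _mm_or_si128(_mm_slli_epi32((v), (r)), _mm_srli_epi32((v), 32 - (r)))

    /* load, mul, add, lshift+rshift+or, mul = the 7 operations counted above */
    static inline __m128i hash_step(__m128i acc, const void *p)
    {
        const __m128i PRIME_A = _mm_set1_epi32((int)0x9E3779B1u);
        const __m128i PRIME_B = _mm_set1_epi32((int)0x85EBCA77u);
        __m128i in = _mm_loadu_si128((const __m128i *)p);
        acc = _mm_add_epi32(acc, _mm_mullo_epi32(in, PRIME_B));
        acc = ROTL32x4(acc, 13);
        return _mm_mullo_epi32(acc, PRIME_A);
    }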


    I also took a look into Bulldozer's XOP:
    • 128- and 256-bit operations
    • Rotations!
    • Integer multiply-accumulate (IMAC)

    So, it seems that Bulldozer handles 256 bits in 4 operations: load, imac, rot, mul. Of course, to make room for superscalar execution, we may need to further slice the input data.

    Finally, Intel MIC has 512-bit MM registers, but I haven't investigated its instruction set.
    Last edited by Bulat Ziganshin; 9th April 2013 at 21:02. Reason: AVX2 details was wrong

  22. #22
    Member
    Join Date
    Nov 2012
    Location
    Bangalore
    Posts
    114
    Thanks
    9
    Thanked 37 Times in 22 Posts
    Nice summary. Haswell will add RORX and MULX, useful for hashing: http://software.intel.com/en-us/blog...-now-available
    Haswell will also provide FMA3, but that is only for FP. There is no integer equivalent of AMD's IMAC. Intel has lately been focusing on providing FP features first in its SIMD instructions, with integer ones following later. AMD tends to provide both when introducing new features.
    Last edited by moinakg; 16th February 2013 at 17:41.

  23. #23
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    RORX/MULX are the same as ROR/MUL, only they don't touch EFLAGS, so they can't change anything for xxHash. But I studied this Intel document and found that my understanding was wrong - AVX2 includes the following operation:

    VPMULLD or __m256i _mm256_mullo_epi32(__m256i a, __m256i b)

    which means that we can do 8x 32-bit multiplications, and therefore processing 256 bits for xxHash requires only 7 operations, i.e. your SSE4 code, just working with 256-bit registers.

    PS: thank you, I edited the AVX2 details in the post above.
    Last edited by Bulat Ziganshin; 16th February 2013 at 21:46.

  24. #24
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    I've found this in someone's list of the "10 most important algorithms of last year": Strongly universal string hashing is fast

    We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte. Maybe surprisingly, we find that these families---though they require a large buffer of random numbers---are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.

    Sources: http://code.google.com/p/variablelengthstringhashing/
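
    For a rough idea of what such a family looks like (a simplified sketch in the spirit of the paper, not the authors' code):

    #include <stddef.h>
    #include <stdint.h>

    /* Multilinear-style hashing: one 64-bit multiply-accumulate per 32-bit
       input word against a precomputed buffer of random keys (the "large
       buffer of random numbers" from the abstract). */
    static uint32_t multilinear32(const uint32_t *x, size_t n,
                                  const uint64_t *key /* n + 1 random 64-bit values */)
    {
        uint64_t acc = key[0];
        for (size_t i = 0; i < n; i++)
            acc += key[i + 1] * (uint64_t)x[i];  /* independent mul-accs pipeline well */
        return (uint32_t)(acc >> 32);            /* keep the high 32 bits */
    }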

  25. #25
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Here it does 3.4 cycles per 'element' in their benchmark on long strings. It is faster than FNV1/1a in their benchmark, though. I'll try to integrate it with fsbench...

    ADDED: And they assume an infinitely fast random number generator. That is, they store a random string 6x larger than the hashed word and pass it to the function.
    ADDED2: Hmm... how about a compile-time-generated constant array? When security is not important, it should work OK.
    Last edited by m^2; 20th February 2013 at 00:26.

  26. #26
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    AFAIR, we will have a built-in PRNG in Haswell and it will probably be very fast.

    Since it's an academic paper, I'm interested in their guarantees and "strongness". Maybe they have some way to check the perfectness of a hash? I haven't read the paper yet.

  27. #27
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    • paper describing the general ideas behind extremely fast CRC computation
    • article and paper describing how to compute CRC32C at a speed of 1.2 ticks/16 bytes (i.e. 50 GB/s per core)
    • paper describing how to compute CRC32/CRC64 at a speed of ~3-4 ticks/16 bytes using the PCLMULQDQ instruction
    • paper describing how to compute CRC32/CRC64 at a speed of 1.2 ticks/byte and CRC128 at 1.7 ticks/byte
    • repository with the code from the last paper
    Last edited by Bulat Ziganshin; 3rd April 2013 at 15:29.

  28. #28
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    [deleted.]
    Last edited by Bulat Ziganshin; 9th April 2013 at 21:03.

  29. #29
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by Cyan View Post

    I have unpublished versions which do just that.
    They are meant to produce 64-bit and 128-bit outputs from 32-bit arithmetic.
    Can you please share them? I need an extremely fast hash for the new SREP compression algorithm.

  30. #30
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Sure. Let me have a look into that codebase.
    I remember having done a few iterations, testing and mixing 64-bit and 32-bit arithmetic,
    and ending up with the result that 64-bit arithmetic is definitely too slow on 32-bit CPUs/OSes.

    Are you more interested in 128-bit or 64-bit output?


