
Thread: Modern fast hashes with CRC-like ability to get desired output?

  1. #1
    Member
    Join Date
    Apr 2019
    Location
    Home
    Posts
    7
    Thanks
    4
    Thanked 0 Times in 0 Posts

    Modern fast hashes with CRC-like ability to get desired output?

    With CRC you can change a few bytes or bits in a stream to get any desired output value (ref 1, ref 2).


    Are there any modern fast hashes with a similar ability?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
It's called a hash collision (strictly speaking, a preimage): you can get a "desired output" from any hash function, simply because it has a limited output size but unlimited input.
Of course, for cryptographic hashes it's quite expensive. (See http://shattered.io/)
But any of the "fast hashes" can be cracked.
As for the method, it's like this:
    1) See how https://github.com/Shelwien/sudoku-cbmc works
    2) Replace condition which has to be satisfied to reach "match" label with hash function of your choice
    3) Run CBMC

  3. #3
    Member
I don't mean brute-forcing. The CRC refs above are about directly calculating the needed data modification.


  5. #5
    Administrator Shelwien's Avatar
    As a demo, here I did xxhash64: http://nishi.dreamhosters.com/u/xxhash64_0.rar
    Code:
    Z:\xxhash64_0>test.bat
    aaaaaaaaaaaaaaaa  sample1   8 bytes               0.828s
    deadbeef0badf00d  sample2   8 bytes               0.813s
    deadbeef0badf00d  sample3  12 bytes charset=[A-z] 10.515s
    0123456789abcdef  sample4  16 bytes charset=[0-Z] 8670.453s
It's tricky because CBMC is not very stable, but it's possible.
For example, finding 8 bytes of data for a given hash value without a charset limit takes 0.8s here.

  6. The Following User Says Thank You to Shelwien For This Useful Post:

    DernmIl (4th April 2019)

  7. #6
    Member
    Interesting, thanks! I'll have to study this.

How would the real data stream be fed in? (In the middle of which the calculated bytes should be inserted.)
    If a given hash algorithm can be solved, does this imply it's solvable for any input data and target hash?
    Will solve time remain similar regardless of inputs?

Found this example that also uses an SMT solver:
    https://blog.lse.epita.fr/articles/2...g-algorit.html

  8. #7
    Administrator Shelwien's Avatar
> How would the real data stream be fed in?
    > (In the middle of which the calculated bytes should be inserted.)

    CBMC's intended use is something like a fuzzer.
    You write whatever algorithm, add some assert directives,
    and it finds conditions where these asserts fail.

So in theory you can simply calculate whatever you want
(you have to include all the known data in the source, though),
use one of CBMC's ways to leave some variables unknown,
then let it work... or even just generate a CNF system from the C/C++ source
and feed it to an external SAT solver.

But in practice CBMC is quite unstable... for example,
I had to get rid of memset, memcpy and _rotl64 in xxhash64: CBMC
accepts them, but ends up working incorrectly for some reason.

On the other hand, it's still easier than manually generating
input for a solver.

    > If a given hash algorithm can be solved,
    > does this imply it's solvable for any input data and target hash?

That depends on the specific hash implementation.
Some fast hashes are lossy, so it might not be possible to find data
for a given hash value.

But in general any hash is solvable; only the complexity differs.
So for fast hashes with relatively low complexity it's much easier
to do than for cryptographic hashes.

    > Will solve time remain similar regardless of inputs?

I tried a few xxhash values with 8-byte unconstrained input, and the time seems the same.
But it goes much slower with a limited charset.

Still, in general, the answer is no.
For example, you could make a composite hash which uses either crc64 or xxhash64
based on input parity.
Hash values that are mapped via crc64 would then be faster to crack.

  9. #8
    Member
Quote Originally Posted by Shelwien View Post
That depends on specific hash implementation.
By "target hash" I didn't mean different hash functions. I meant the same hash function, just different desired output values.

Quote Originally Posted by Shelwien View Post
Some fast hashes are lossy
Lossy in what sense? Aren't they all?

    Quote Originally Posted by Shelwien View Post
    (have to include all the known data in the source though)
    That's the problem.

    Assuming solvers run through the whole input multiple times, that's not suitable for large inputs, no? Say 1GB of data.

    Suppose the source data is ordered as follows:
    [X: a specific piece of data]
    [Y: n bytes reserved, fixed length, for modification by the solver]
    [Z: a specific piece of data]

In the case of xxhash, I think it should be possible to precalculate X by hashing the bytes and keeping the internal state instead of the final mixed hash. Then that could serve as the initial state for the solver. Though you might have to ensure the length of X is a multiple of the stripe size, which is doable, but less desirable.

    But what about Z? If you precalculate the hash for it, wouldn't the input to the solver need to include a way to go from xxh(XY) and xxh(Z) to xxh(XY+Z)?
    (related: https://github.com/Cyan4973/xxHash/issues/125)

  10. #9
    Administrator Shelwien's Avatar
    >> That depends on specific hash implementation.
    > By "target hash" I didn't mean different hash functions.
    > I meant the same hash function, just different desired output values.

I meant that too.
Sure, in the cases that I actually tested, solver time was the same for different hash values.
But specifically for fast hashes it's quite possible that some values would be
less frequent (or outright impossible).

    >> Some fast hashes are lossy
    > Lossy in what sense? Aren't they all?

For datasize <= hashsize a good hash should be lossless.
But suppose, for example, a hash contains a step that looks like this: "x-=x>>5;"
It maps 0..0xFFFFFFFF to 0..0xF8000000, so it's lossy.

    > Suppose the source data is ordered as follows:
    > [X: a specific piece of data]
    > [Y: n bytes reserved, fixed length, for modification by the solver]
    > [Z: a specific piece of data]

    > In the case of xxhash, I think it should be possible to precalculate X by
    > hashing the bytes, and keeping the internal state instead of the final
    > mixed hash. Then, that could serve as the initial state for solver.

    Yes

    > Though you might have to assure the length of X is a multiple of the stripe
    > size, which is doable, but less desirable.

You can just move the unaligned part of X into Y.
It would still have to be processed by the hash function,
but we can declare it as known data for the SAT solver.

    > But what about Z?
    > If you precalculate the hash for it, wouldn't the input
    > to the solver need to include a way to go from xxh(XY) and xxh(Z) to xxh(XY+Z)?

Yes.
It's possible for CRC, but not for most other hashes.
So in general, processing of the Y and Z parts has to be included in the SAT script.

There are two known optimizations, though:

1) Only steps depending on data from Y have to be performed in the SAT script.
So if the hash function independently computes a block hash first, then combines
it with the accumulated hash value, we can skip this part in the SAT script
and just provide the precomputed block hash value.
This method should be applicable to xxhash, for example.
SAT tools are usually able to optimize away independent parts on their own,
but it makes sense as a speed optimization too.

2) Sometimes reverse calculation is possible.
That is, since the hash value and data are both known, we might be able to
process the hash function's steps in reverse order.
This is usually possible for at least a small number of steps, even with cryptohashes.
And for some hash functions it may well be possible to reach Y from the other side.

In particular, it's probably possible with xxhash, since steps like these are reversible:
    Solve[Mod[11400714785074694791*x,2^64]==1,Integers] -> x=614540362697595703+(2^64)*n
    Mod[11400714785074694791*614540362697595703, 2^64] == 1
    f[x_] := Mod[x*11400714785074694791 + 9650029242287828579, 2^64]
    g[x_] := Mod[x*614540362697595703 + 8565294228366677947, 2^64]
    g[f[123]]==123

To avoid manual work, it might make sense to resolve this with multiple runs of the same SAT problem.
That is, we can solve hash(state,data)==value for "state" multiple times
(to avoid feeding GBs of data to the SAT solver at once), until we get the "state" for a point close to Y.

  11. #10
    Member
    Join Date
    Apr 2019
    Location
    Home
    Posts
    7
    Thanks
    4
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    But for example, suppose a hash contains a step that looks like this: "x-=x>>5;"
    It maps 0..0xFFFFFFFF to 0..0xF8000000, so its lossy.
    If a hash gets 10 in SMHasher, isn't it safe to rule out some output bits never getting used?

Quote Originally Posted by Shelwien View Post
1) Only steps depending on data from Y have to be performed in SAT script.
But in the current case the problem isn't that the bulk hashing is called through the SAT solver (it's not going to insert itself in the middle of the tight loop); it's that there's a whole lot of bulk hashing to do.

Quote Originally Posted by Shelwien View Post
In particular, its probably possible with xxhash, since steps like these are reversible:
    Solve[Mod[11400714785074694791*x,2^64]==1,Integers] -> x=614540362697595703+(2^64)*n
    Mod[11400714785074694791*614540362697595703, 2^64] == 1
    f[x_] := Mod[x*11400714785074694791 + 9650029242287828579, 2^64]
    g[x_] := Mod[x*614540362697595703 + 8565294228366677947, 2^64]
    g[f[123]]==123
    What tool did you use for this Solve[]?

Quote Originally Posted by Shelwien View Post
To avoid manual work it might make sense to resolve this by multiple runs of the same SAT.
    That is, we can solve hash(state,data)==value for "state" multiple times
    (to avoid feeding GBs of data at once to SAT), until we get "state" for a point close to Y.
    Could you elaborate?

  12. #11
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    31
    Thanked 22 Times in 15 Posts
Quote Originally Posted by DernmIl View Post
What tool did you use for this Solve[]?
It is probably Mathematica. You can also use the Z3 prover.

  13. The Following User Says Thank You to algorithm For This Useful Post:

    DernmIl (19th April 2019)

  14. #12
    Administrator Shelwien's Avatar
    > If a hash gets 10 in SMHasher, isn't it safe to rule out some output bits never getting used?

Not really.
For example, what would happen if we fed a modified cryptohash to SMHasher?
Something like "md5a(x) = (md5(x)==0) ? 1 : md5(x);"
I don't think it's possible to notice that without actually checking the whole input range,
and the statistics would look good otherwise.

    > But in the current case the problem isn't that the bulk hashing is called thru the SAT solver

It actually is a problem, because CBMC is not optimized for this kind of task.
In theory, adding more operations on constants shouldn't affect the SAT system's complexity,
but in practice CBMC is not that efficient.

    > What tool did you use for this Solve[]?

    Wolfram Mathematica: https://www.wolframalpha.com/input/?...D1,Integers%5D

But you can actually just as well use CBMC for this (see attach).
The trick with the byte array is necessary because CBMC won't print the solution otherwise.

    > Could you elaborate?

Well, I'm just proposing to split the task.
Suppose we have a hash function:
state' = hash(state,block);
And the system we're trying to solve is something like this:

sX=hash(init_state,X);
sY=hash(sX,Y);
sZ=hash(sY,Z);

Here sX can simply be precalculated.
sZ and Z are also known, but we normally don't have an inverse function for the hash.
Still, we can solve "sZ=hash(sY,Z);" with SAT separately, just to find the sY value,
and then solve "sY=hash(sX,Y);", where Y is unknown, which makes the dependency much more complex.

Compared to that, solving "sZ=hash(sY,Z);" for sY should be mostly immediate.
We may have to split it further, depending on CBMC's capability,
but in any case the complexity should be linear in the size of Z.
Attached Files

  15. The Following User Says Thank You to Shelwien For This Useful Post:

    DernmIl (19th April 2019)

