
Thread: Question for Cyan

  1. #1
    Member
    Join Date
    Aug 2014
    Location
    Overland Park, KS
    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question for Cyan

    Cyan, I have a question: what is the LZ4 equivalent of Fast Bytes and Memory in Deflate?
    Last edited by calthax; 3rd April 2015 at 02:06.

  2. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Hi Calthax

    Which API are you referring to?

    Fast Bytes is not part of the Deflate API (http://www.zlib.net/manual.html).
    I found one such argument in SevenZip, but there it is a command-line argument.
    I suspect it gets transformed into something like "good_length", from deflateTune, but there is no guarantee.
    My guess is that it is a kind of early-out, triggered when a certain match length has been reached.
    There is no equivalent within LZ4 (it would not make sense there).
    Maybe one could be added to LZ4 HC later on.

    For memory, I found multiple references, with multiple meanings, so it's hard to tell which one you are talking about.
    Within deflateInit2(), there is a parameter memLevel. It might be this one. (it could also be windowBits).
    I suspect memLevel directly decides the amount of memory for the hash table in front of the hash chain table.
    An equivalent would be LZ4_MEMORY_USAGE, within lz4.h.
    But it's a macro constant, not a function argument, so it is fixed at compile time.

    An equivalent for LZ4 HC is DICTIONARY_LOGSIZE, within lz4hc.c.
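
    To put rough numbers on the memory knobs mentioned above, here is a minimal sketch in Python (chosen only for brevity). It relies on two documented formulas: zlib's zconf.h gives deflate memory as (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, and lz4.h describes the hash table as 2^LZ4_MEMORY_USAGE bytes. The helper names are hypothetical, and the value 14 for LZ4_MEMORY_USAGE is assumed to be the default of the lz4.h in use.

```python
import zlib

def deflate_memory_bytes(window_bits=15, mem_level=8):
    """Approximate deflate state size, per the formula in zlib's zconf.h."""
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))

def lz4_table_bytes(memory_usage=14):
    """LZ4 hash table size: 2**LZ4_MEMORY_USAGE bytes (per lz4.h)."""
    return 1 << memory_usage

def deflate_with_mem_level(data, mem_level):
    """Compress to the zlib format with an explicit memLevel (1..9)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, 15, mem_level)
    return comp.compress(data) + comp.flush()

if __name__ == "__main__":
    sample = b"what is the LZ4 equivalent of memory in deflate? " * 200
    for level in (1, 8, 9):
        packed = deflate_with_mem_level(sample, level)
        assert zlib.decompress(packed) == sample  # round trip still works
        print(level, deflate_memory_bytes(15, level), len(packed))
    print("LZ4 table:", lz4_table_bytes(14), "bytes")
```

    Unlike memLevel, which is chosen per stream at run time, the LZ4 value would have to be changed by recompiling the library.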

  3. #3
    Member
    Join Date
    Aug 2014
    Location
    Overland Park, KS
    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Also, I have another question: what is the LZ4 equivalent of Distance?

  4. #4
    Member
    Join Date
    Nov 2015
    Location
    Mumbai
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Cyan, I want a fast hashing algorithm and I came across your XXHASH. When I run it, it gives this output: 3212befb226fb138 'filename'

    What does it mean? I tried looking into the code and couldn't understand it. My assumption is that something went wrong and it is showing an error code. Am I correct? If yes, where did it go wrong? If not, please tell me what that value is.

    I also want to know where the hash values get stored after a successful hash operation, because I want to use this to hash string columns in my database and see whether hashing can speed up select query performance.

    Please shed some light.

  5. #5
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    `3212befb226fb138` is the 64-bit output of `XXH64()` for `filename`.

    If you give it a list of files, it will output one line per file.
    ```
    3212befb226fb138 filename1
    abcdef1234567890 filename2
    ...
    ```

    If you want to test speed on a given file, you can run:
    `./xxhsum -b filename`

    If you want the 32-bit checksum of the file (instead of the default 64-bit one):
    `./xxhsum -H0 filename`

    For more help, type:
    `./xxhsum -h`

  6. #6
    Member
    Join Date
    Nov 2015
    Location
    Mumbai
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    My use case is that I want to hash a column of values and store only the hash keys. I am trying to see whether select queries get faster when we search hash keys instead of the actual values. So I need a program that takes an array of strings, hashes them and gives me the array of hash keys. Will this be possible with your xxhash? If yes, can you tell me which function from your code I should use?

    TIA.

  7. #7
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    > Will this be possible with your xxhash?

    Yes

    > If yes, can you tell which function should I use from your code.

    `XXH64()`

    The logic to go through the array of strings and build an array of hashes will be in your program.
    `XXH64()` only provides a one-way transform from one string to one hash.
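
    As an illustration of that division of labour, here is a minimal sketch of the loop, written in Python for brevity. It is only an assumption about how the calling program might look: `hash_column` and `stand_in_hash64` are hypothetical names, and the stand-in uses truncated BLAKE2b from the standard library, not xxHash, so that the example runs without extra dependencies. In a C program the loop body would call `XXH64()` on each string instead; with the third-party Python `xxhash` package installed, one could pass `lambda b: xxhash.xxh64(b).intdigest()` as `hash_fn`.

```python
import hashlib

def stand_in_hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (NOT xxHash): BLAKE2b truncated to 8 bytes."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def hash_column(values, hash_fn=stand_in_hash64):
    """Return one 64-bit hash key per string, in the same order."""
    return [hash_fn(v.encode("utf-8")) for v in values]

if __name__ == "__main__":
    column = ["alice", "bob", "alice"]
    for value, key in zip(column, hash_column(column)):
        # Same layout as the xxhsum output: hex hash, then the input
        print(f"{key:016x} {value}")
```

    Equal strings always map to the same key, so the array of keys can be stored and searched in place of the original values, subject to the collision caveat that applies to any hash.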

  8. #8
    Member
    Join Date
    Nov 2015
    Location
    Mumbai
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Will XXHASH guarantee that any two distinct strings given to it produce two distinct hash values?

    What is the chance of collision in xxhash?

  9. #9
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    The pigeonhole principle ensures that no hash in the world, including cryptographically secure ones, can guarantee to always give 2 different hashes to 2 different inputs.

    That being said, the probability that 2 different files accidentally generate the same hash value can be made extremely small.

    This table gives a good idea of what "extremely small" means:
    [Attached image: proba_comparison.png]

    Bottom line: as long as the number of files you want to compare is at most about a million, a 64-bit hash (XXH64) is good enough, as the chance of even a single random collision over the entire set is very low.
    If you target much more than that, 128 bits will be better.
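
    The attached table is not reproduced here, but the order of magnitude can be estimated with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and a b-bit hash. The sketch below is in Python; it assumes the hash output is uniformly distributed, and the function name is hypothetical.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate: 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-pairs / 2.0 ** hash_bits)

if __name__ == "__main__":
    for n in (10**6, 10**9, 2**32):
        print(n, collision_probability(n, 64), collision_probability(n, 128))
```

    With a million items and 64 bits the estimate is on the order of 1e-8, while around 2^32 items it approaches 40%, which is why a wider hash is preferable for much larger sets.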
