Why don't you tune your hash to the data, instead of using some random values, prime or not?
Look at this source:
http://cm.bell-labs.com/sources/plan...late/deflate.c
Especially, I was interested in a hash function. This implementation uses multiplicative hashing:
<div class="jscript"><pre>
#define hashit(c) (((ulong)(c)&0xffffff)*0x6b43a9b5)>>(32-HashLog)) </pre></div>
The value 0x6b43a9b5 (1799596469) is not even a prime, but I tested such constant with BALZ for 3-byte hashing and it is one of the best so far.![]()
Why don't you tune your hash to the data, instead of using some random values, prime or not?
I do. I carefully test the multiplier on the real files, collecting various performance data!Originally Posted by Shelwien
![]()
the value shouldnt be a prime. primeness is just some sign of rather good multipliers i think that this multiplier will be not much different to your one. actually, once ive tested your prime against ideal hashing and found a very little differenceOriginally Posted by encode
multiplication scheme has only one serious cabeat - highest bits of hashed value affects only highest bits of resulting hash value. smth like this:
(x+(x>>16))*123456791
may improve hashing quality
> highest bits of hashed value
> affects only highest bits of resulting hash value. smth like this:
> (x+(x>>16))*123456791
> may improve hashing quality
(x+(x>>16)) ~= (x+x/65536) = x*(65537/65536)
x*(65537/65536)*123456791 = x*123458674.801132
So I'm not so sure about that.
>> Why don't you tune your hash to the data
> I do. I carefully test the multiplier on the real files,
> collecting various performance data!
Somehow I think that you just manually try out and compare some
selected hash functions/parameters.
And I was talking about
1. Log the memory access pattern for some dataset
with unhashed contexts for addresses
(reads/writes; also other parameters like value size
and offset in hash cell)
2. Measure the processing speed for all 2^32 multiplier values
(or including other hash functions too), keeping some number
of best results.
3. Test the real compressor performance with these stored
best hash configurations.
Actually, in my case the hash function is not so extremely important. I hash 24-bit value to 1M hash HEAD. Its more than enough number of hash buckets. Since I use hash chains, hash function should do good enough separation and spread all 16M values to 1M hashes - to minimize hash chain branches. I kept multiplicative hashing. I tested the multiplier in next manner - I keep only HEAD for string searching (i.e. no further traverse on PREV) and measure the compression ratio on various files.Originally Posted by Shelwien
![]()
Well, I was talking about hash speed.
And that depends mostly not on the distribution of indexes,
but on the alignment and clustering of accesses.
Your explanation didn't contain anything about that.
why? it solves problem of distributing influence of higher value bits among all hash bitsOriginally Posted by Shelwien
I mean it doesn't look any different from simply changing the multiplier value.
Here i really like Matt's solution, since it obviously does what you want.
you may cosider this as an idea of using non-integer multipliers![]()