Sign in to follow this  
taz0010

Need a good hash function

Recommended Posts

I want to map sets of 3 integers into the range 0-n with as few collisions as possible. The function I'm currently using is:
return (0x60000005*x + 0xd8163841*y +0x6b4a8be7*z) % numBuckets;
The constants are prime numbers, which is what the Realtime Collision Detection textbook suggests. Now if I hash every permutation of x,y,z in the range 0-1000 (so 1 billion different combinations), I end up with a very even distribution. But the values of x,y,z typically come in multiples of 10 or 100. e.g. {700,200,-800}, etc. This hash function exhibits really awful bias in this case. When the multiples of 100 up to 1000 are chosen, and hashed to the range 0-47, only multiples of 4 are chosen! Although the distribution among the buckets which were chosen is very even. How can I go about creating a hash function that produces even distributions, especially in the case where the 3 input keys are multiples of a power of ten? I'd also like to avoid the use of modulo if it's at all possible.

Share this post


Link to post
Share on other sites
Well, have you tried different prime numbers? These are the ones I like:

return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) % numBuckets;

You can get rid of the modulo if the number of buckets is a power of two, in that case:

return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) & (numBuckets-1);

Share this post


Link to post
Share on other sites
The problem with your algorithm is that it gives a disproportionate amount of weight to the lower bits of your inputs. To see what I mean, consider this question: if you have two numbers a and b and a is even, is there any value for b that when you multiply a and b you get an odd number? This is what's going on in your function. The lower bits of x, y and z can affect the upper bits of the hash before you mod it, but the upper bits of x, y and z can't affect the lower bits of the hash. This is compounded by the fact that a modulus also gives disproportionate weight to the lower bits if its input as well. A quick band-aid is to add in some terms that right shift some of your values. Ex: add in (x >> 2), (y >> 4) or (z >> 9).

Share this post


Link to post
Share on other sites
SiCrane is right. What I think is hapening


(P1 * x + P2 * y + P3 * z) % numBuckets

but, x ant y and z is multiples of (in this example) 100, so:

x = a * 100, y = b * 100, z = c * 100

(P1 * a * 100 + P2 * b * 100 + P3 * c * 100) % numBuckets

((P1 * a + P2 * b + P3 * c) * 100) % numBuckets

So lets say numBuckets is 48, which has primes 2^4 * 3 as primes
and 100 which has 5^2 * 2^2

((P1 * a + P2 * b + P3 * c) * (5^2 * 2^2)) % (2^4 * 3)

(((P1 * a + P2 * b + P3 * c) * 5^2) % (2^2 * 3)) * 2^2

Which always has a multiple of 4!


Is the multiple fixed for each hash map? If possible you should generate x, y and z coordinates for being multiple of 1 instead.

I assume you have original coordinates in floating point, so you should be able to convert them like this:


int x = static_cast<int>(fx * grid_ratio);


where float grid_ratio = 1 / grid_size.

Share this post


Link to post
Share on other sites
I'm getting better results with this:

return (((0xcb1ab31f*x) >> 16)^x + ((0xd8163841*y) >> 16)^y + ((0x6b4a8be7*z) >> 16)^z) % numBuckets;


Also making the bucket size a prime number results in a much more even distribution in most cases. Although patterns still sometimes occur.

Share this post


Link to post
Share on other sites
You could also use 64-bit arithmetic, like this (gcc):
  return (x*860953607428367533ull
+y*6504631217750812297ull
+z*14401978177910355779ull)%numBuckets;





[Edited by - alvaro on March 16, 2010 1:09:04 PM]

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this