Need a good hash function

Started by
7 comments, last by taz0010 14 years, 1 month ago
I want to map sets of 3 integers into the range 0-n with as few collisions as possible. The function I'm currently using is:

return (0x60000005*x + 0xd8163841*y +0x6b4a8be7*z) % numBuckets;
The constants are prime numbers, which is what the Realtime Collision Detection textbook suggests. Now if I hash every permutation of x,y,z in the range 0-1000 (so 1 billion different combinations), I end up with a very even distribution. But the values of x,y,z typically come in multiples of 10 or 100. e.g. {700,200,-800}, etc. This hash function exhibits really awful bias in this case. When the multiples of 100 up to 1000 are chosen, and hashed to the range 0-47, only multiples of 4 are chosen! Although the distribution among the buckets which were chosen is very even. How can I go about creating a hash function that produces even distributions, especially in the case where the 3 input keys are multiples of a power of ten? I'd also like to avoid the use of modulo if it's at all possible.
Advertisement
Well, have you tried different prime numbers? These are the ones I like:
return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) % numBuckets;

You can get rid of the modulo if the number of buckets is a power of two, in that case:
return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) & (numBuckets-1);

The problem with your algorithm is that it gives a disproportionate amount of weight to the lower bits of your inputs. To see what I mean, consider this question: if you have two numbers a and b and a is even, is there any value for b that when you multiply a and b you get an odd number? This is what's going on in your function. The lower bits of x, y and z can affect the upper bits of the hash before you mod it, but the upper bits of x, y and z can't affect the lower bits of the hash. This is compounded by the fact that a modulus also gives disproportionate weight to the lower bits if its input as well. A quick band-aid is to add in some terms that right shift some of your values. Ex: add in (x >> 2), (y >> 4) or (z >> 9).
CRCs are great hash functions, if you can stomach the CPU time (which isn't that bad). If you don't mind using Boost, it has an implementation (lookup `boost::crc_32_type').
SiCrane is right. What I think is hapening

(P1 * x + P2 * y + P3 * z) % numBucketsbut, x ant y and z is multiples of (in this example) 100, so:x = a * 100, y = b * 100, z = c * 100(P1 * a * 100 + P2 * b * 100 + P3 * c * 100) % numBuckets((P1 * a + P2 * b + P3 * c) * 100) % numBucketsSo lets say numBuckets is 48, which has primes 2^4 * 3 as primesand 100 which has 5^2 * 2^2((P1 * a + P2 * b + P3 * c) * (5^2 * 2^2)) % (2^4 * 3)(((P1 * a + P2 * b + P3 * c) * 5^2) % (2^2 * 3)) * 2^2Which always has a multiple of 4!


Is the multiple fixed for each hash map? If possible you should generate x, y and z coordinates for being multiple of 1 instead.

I assume you have original coordinates in floating point, so you should be able to convert them like this:

int x = static_cast<int>(fx * grid_ratio);


where float grid_ratio = 1 / grid_size.
I'm getting better results with this:

return (((0xcb1ab31f*x) >> 16)^x + ((0xd8163841*y) >> 16)^y + ((0x6b4a8be7*z) >> 16)^z) % numBuckets;


Also making the bucket size a prime number results in a much more even distribution in most cases. Although patterns still sometimes occur.
You could also use 64-bit arithmetic, like this (gcc):
  return (x*860953607428367533ull          +y*6504631217750812297ull          +z*14401978177910355779ull)%numBuckets;




[Edited by - alvaro on March 16, 2010 1:09:04 PM]
Have you tried the FNV-1a hash?

I also agree with alvaro in that a good table-driven CRC implementation often works wonders.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Have you tried the FNV-1a [isthe.com] hash?

Yes I'm using it now in fact. It seems to be the best one so far

This topic is closed to new replies.

Advertisement