Need a good hash function

I want to map sets of 3 integers into the range 0-n with as few collisions as possible. The function I'm currently using is:

    return (0x60000005*x + 0xd8163841*y + 0x6b4a8be7*z) % numBuckets;

The constants are prime numbers, which is what the Real-Time Collision Detection textbook suggests.

Now if I hash every combination of x, y and z in the range 0-1000 (so about 1 billion combinations), I end up with a very even distribution. But the values of x, y and z typically come in multiples of 10 or 100, e.g. {700, 200, -800}. This hash function exhibits really awful bias in that case. When the inputs are multiples of 100 up to 1000, hashed into the range 0-47, only buckets whose index is a multiple of 4 are ever used! (The distribution among the buckets that do get used is very even, though.)

How can I go about creating a hash function that produces an even distribution, especially in the case where the 3 input keys are multiples of a power of ten? I'd also like to avoid the use of modulo if it's at all possible.
Well, have you tried different prime numbers? These are the ones I like:

    return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) % numBuckets;

You can get rid of the modulo if the number of buckets is a power of two; in that case:

    return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) & (numBuckets-1);
The problem with your algorithm is that it gives a disproportionate amount of weight to the lower bits of your inputs. To see what I mean, consider this question: if a is even, is there any value of b such that a*b is odd? There isn't, and the same effect is at work in your function: the lower bits of x, y and z can affect the upper bits of the hash before you mod it, but the upper bits of x, y and z can't affect the lower bits of the hash. This is compounded by the fact that a modulus also gives disproportionate weight to the lower bits of its input. A quick band-aid is to add in some terms that right-shift your values, e.g. (x >> 2), (y >> 4) or (z >> 9).
CRCs are great hash functions, if you can stomach the CPU time (which isn't that bad). If you don't mind using Boost, it has an implementation (look up `boost::crc_32_type`).
SiCrane is right. What I think is happening:

    (P1 * x + P2 * y + P3 * z) % numBuckets

but x, y and z are multiples of (in this example) 100, so:

    x = a * 100, y = b * 100, z = c * 100

    (P1 * a * 100 + P2 * b * 100 + P3 * c * 100) % numBuckets
    = ((P1 * a + P2 * b + P3 * c) * 100) % numBuckets

So let's say numBuckets is 48, which factors as 2^4 * 3, and 100 factors as 5^2 * 2^2:

    ((P1 * a + P2 * b + P3 * c) * 5^2 * 2^2) % (2^4 * 3)
    = (((P1 * a + P2 * b + P3 * c) * 5^2) % (2^2 * 3)) * 2^2

which is always a multiple of 4!
Is the multiple fixed for each hash map? If possible, you should generate x, y and z coordinates as multiples of 1 instead.

I assume you have the original coordinates in floating point, so you should be able to convert them like this:

    int x = static_cast<int>(fx * grid_ratio);

where float grid_ratio = 1 / grid_size.
I'm getting better results with this:

    return ((((0xcb1ab31f*x) >> 16) ^ x) + (((0xd8163841*y) >> 16) ^ y) + (((0x6b4a8be7*z) >> 16) ^ z)) % numBuckets;

Also, making the bucket size a prime number results in a much more even distribution in most cases, although patterns still sometimes occur.
You could also use 64-bit arithmetic, like this (gcc):

    return (x*860953607428367533ull + y*6504631217750812297ull + z*14401978177910355779ull) % numBuckets;

[Edited by - alvaro on March 16, 2010 1:09:04 PM]
Have you tried the FNV-1a hash?
I also agree with alvaro in that a good table-driven CRC implementation often works wonders.
This topic is closed to new replies.