# Need a good hash function

This topic is 3049 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I want to map sets of 3 integers into the range 0-n with as few collisions as possible. The function I'm currently using is:
return (0x60000005*x + 0xd8163841*y +0x6b4a8be7*z) % numBuckets;

The constants are prime numbers, which is what the Realtime Collision Detection textbook suggests. Now if I hash every permutation of x,y,z in the range 0-1000 (so 1 billion different combinations), I end up with a very even distribution. But the values of x,y,z typically come in multiples of 10 or 100. e.g. {700,200,-800}, etc. This hash function exhibits really awful bias in this case. When the multiples of 100 up to 1000 are chosen, and hashed to the range 0-47, only multiples of 4 are chosen! Although the distribution among the buckets which were chosen is very even. How can I go about creating a hash function that produces even distributions, especially in the case where the 3 input keys are multiples of a power of ten? I'd also like to avoid the use of modulo if it's at all possible.

##### Share on other sites
Well, have you tried different prime numbers? These are the ones I like:
return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) % numBuckets;

You can get rid of the modulo if the number of buckets is a power of two, in that case:
return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) & (numBuckets-1);

##### Share on other sites
The problem with your algorithm is that it gives a disproportionate amount of weight to the lower bits of your inputs. To see what I mean, consider this question: if you have two numbers a and b and a is even, is there any value for b that when you multiply a and b you get an odd number? This is what's going on in your function. The lower bits of x, y and z can affect the upper bits of the hash before you mod it, but the upper bits of x, y and z can't affect the lower bits of the hash. This is compounded by the fact that a modulus also gives disproportionate weight to the lower bits if its input as well. A quick band-aid is to add in some terms that right shift some of your values. Ex: add in (x >> 2), (y >> 4) or (z >> 9).

##### Share on other sites
CRCs are great hash functions, if you can stomach the CPU time (which isn't that bad). If you don't mind using Boost, it has an implementation (lookup boost::crc_32_type').

##### Share on other sites
SiCrane is right. What I think is hapening

(P1 * x + P2 * y + P3 * z) % numBucketsbut, x ant y and z is multiples of (in this example) 100, so:x = a * 100, y = b * 100, z = c * 100(P1 * a * 100 + P2 * b * 100 + P3 * c * 100) % numBuckets((P1 * a + P2 * b + P3 * c) * 100) % numBucketsSo lets say numBuckets is 48, which has primes 2^4 * 3 as primesand 100 which has 5^2 * 2^2((P1 * a + P2 * b + P3 * c) * (5^2 * 2^2)) % (2^4 * 3)(((P1 * a + P2 * b + P3 * c) * 5^2) % (2^2 * 3)) * 2^2Which always has a multiple of 4!

Is the multiple fixed for each hash map? If possible you should generate x, y and z coordinates for being multiple of 1 instead.

I assume you have original coordinates in floating point, so you should be able to convert them like this:

int x = static_cast<int>(fx * grid_ratio);

where float grid_ratio = 1 / grid_size.

##### Share on other sites
I'm getting better results with this:

return (((0xcb1ab31f*x) >> 16)^x + ((0xd8163841*y) >> 16)^y + ((0x6b4a8be7*z) >> 16)^z) % numBuckets;

Also making the bucket size a prime number results in a much more even distribution in most cases. Although patterns still sometimes occur.

##### Share on other sites
You could also use 64-bit arithmetic, like this (gcc):
  return (x*860953607428367533ull          +y*6504631217750812297ull          +z*14401978177910355779ull)%numBuckets;`

[Edited by - alvaro on March 16, 2010 1:09:04 PM]

##### Share on other sites
Have you tried the FNV-1a hash?

I also agree with alvaro in that a good table-driven CRC implementation often works wonders.

##### Share on other sites
Quote:
 Have you tried the FNV-1a [isthe.com] hash?

Yes I'm using it now in fact. It seems to be the best one so far

1. 1
2. 2
Rutin
22
3. 3
4. 4
JoeJ
15
5. 5

• 14
• 30
• 13
• 11
• 11
• ### Forum Statistics

• Total Topics
631776
• Total Posts
3002300
×