Back to General and Gameplay Programming

Need a good hash function

General and Gameplay Programming Programming

Started by taz0010 March 16, 2010 08:59 AM

7 comments, last by taz0010 14 years, 1 month ago

taz0010

277

Author

March 16, 2010 08:59 AM

I want to map sets of 3 integers into the range 0-n with as few collisions as possible. The function I'm currently using is:


return (0x60000005*x + 0xd8163841*y +0x6b4a8be7*z) % numBuckets;

The constants are prime numbers, which is what the Realtime Collision Detection textbook suggests. Now if I hash every permutation of x,y,z in the range 0-1000 (so 1 billion different combinations), I end up with a very even distribution. But the values of x,y,z typically come in multiples of 10 or 100. e.g. {700,200,-800}, etc. This hash function exhibits really awful bias in this case. When the multiples of 100 up to 1000 are chosen, and hashed to the range 0-47, only multiples of 4 are chosen! Although the distribution among the buckets which were chosen is very even. How can I go about creating a hash function that produces even distributions, especially in the case where the 3 input keys are multiples of a power of ten? I'd also like to avoid the use of modulo if it's at all possible.

DevFred

840

March 16, 2010 09:25 AM

Well, have you tried different prime numbers? These are the ones I like:

return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) % numBuckets;

You can get rid of the modulo if the number of buckets is a power of two, in that case:

return (0x84ed5f09*x + 0x8dc6152f*y + 0xca4e8c2f*z) & (numBuckets-1);

SiCrane

11,840

March 16, 2010 10:04 AM

The problem with your algorithm is that it gives a disproportionate amount of weight to the lower bits of your inputs. To see what I mean, consider this question: if you have two numbers a and b and a is even, is there any value for b that when you multiply a and b you get an odd number? This is what's going on in your function. The lower bits of x, y and z can affect the upper bits of the hash before you mod it, but the upper bits of x, y and z can't affect the lower bits of the hash. This is compounded by the fact that a modulus also gives disproportionate weight to the lower bits if its input as well. A quick band-aid is to add in some terms that right shift some of your values. Ex: add in (x >> 2), (y >> 4) or (z >> 9).

alvaro

21,604

March 16, 2010 10:23 AM

CRCs are great hash functions, if you can stomach the CPU time (which isn't that bad). If you don't mind using Boost, it has an implementation (lookup `boost::crc_32_type').

mzeo77

168

March 16, 2010 10:55 AM

SiCrane is right. What I think is hapening

(P1 * x + P2 * y + P3 * z) % numBucketsbut, x ant y and z is multiples of (in this example) 100, so:x = a * 100, y = b * 100, z = c * 100(P1 * a * 100 + P2 * b * 100 + P3 * c * 100) % numBuckets((P1 * a + P2 * b + P3 * c) * 100) % numBucketsSo lets say numBuckets is 48, which has primes 2^4 * 3 as primesand 100 which has 5^2 * 2^2((P1 * a + P2 * b + P3 * c) * (5^2 * 2^2)) % (2^4 * 3)(((P1 * a + P2 * b + P3 * c) * 5^2) % (2^2 * 3)) * 2^2Which always has a multiple of 4!

Is the multiple fixed for each hash map? If possible you should generate x, y and z coordinates for being multiple of 1 instead.

I assume you have original coordinates in floating point, so you should be able to convert them like this:

int x = static_cast<int>(fx * grid_ratio);

where float grid_ratio = 1 / grid_size.

taz0010

277

Author

March 16, 2010 11:06 AM

I'm getting better results with this:

return (((0xcb1ab31f*x) >> 16)^x + ((0xd8163841*y) >> 16)^y + ((0x6b4a8be7*z) >> 16)^z) % numBuckets;

Also making the bucket size a prime number results in a much more even distribution in most cases. Although patterns still sometimes occur.

alvaro

21,604

March 16, 2010 12:09 PM

You could also use 64-bit arithmetic, like this (gcc):

  return (x*860953607428367533ull          +y*6504631217750812297ull          +z*14401978177910355779ull)%numBuckets;

[Edited by - alvaro on March 16, 2010 1:09:04 PM]

iMalc

2,466

March 16, 2010 01:02 PM

Have you tried the FNV-1a hash?

I also agree with alvaro in that a good table-driven CRC implementation often works wonders.

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

taz0010

277

Author

March 17, 2010 05:48 AM

Quote:Have you tried the FNV-1a [isthe.com] hash?

Yes I'm using it now in fact. It seems to be the best one so far

Need a good hash function

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Need a good hash function

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines