Sign in to follow this  
luca-deltodesco

Looking for a good hash for 2 integers

Recommended Posts

I have two integers (incremented from 0, so not uniformly distributed in general) which I want to combine into a single integer in a small range (0 -> 0xffff) to best avoid hash collisions.

I've tried different things like: ((~a)-b)&0xffff, (a*106039+b)&0xffff, etc. but I can't seem to get any better than an 80% collision rate, even if both integers are well below 0xffff

I do want this hash to be as lightweight as possible also as it's in a performance critical area

--- edit:

seems i made a big error in determining the collision rate, it's actually more like 1.3%

Share this post


Link to post
Share on other sites
I don't think it's practical.
Hash functions requires the interesting items count is rather smaller than all possible hash values.
So MD5 is still strong enough because it can hold 2^128 values.

What you are doing is to put 2^64 (or 2^32 values if A and B are in 0~0xffff) to 2^16 values, it's quite reasonable that you get a lot of conflicting.

You are 100% guaranteed to get conflicting to put N+1 elements to N values, no matter what hash algorithm you use.

So I suggest you change your underlying algorithm to avoid the need of the hash.

Share this post


Link to post
Share on other sites
Thanks to the [url="http://en.wikipedia.org/wiki/Birthday_problem"]Birthday Paradox[/url], if I've done my maths right, on average with an ideal hash you'll get a collision once you have hashed about 300 values. That means your 1.3% isn't too bad.

If both values tend to be small you might do better with a function that looks something like: a ^ ((b << 8) | (b >> 8)). That will mean that if a and b are both under 255 you'll never get a collision. Reversing the order of the bits in b instead of just rotating by 8 might work even better, but it'll be slower too.

Share this post


Link to post
Share on other sites
wqking, i can do without the hash, but using the hash has shown to be reliably faster.

I'm now using a hash: (a*106039+b)&0xffff

and i get about 0.3% to 0.5% of insertions resulting in a collision now. even with the id's growing towards 100K (id's above 100K would be very rare, and havn't tested beyond that) The test case has both id's growing with about 1000 objects in the hash at any time

Adam_42 your suggestion gives roughly an 8% collision rate.

Share this post


Link to post
Share on other sites
You could try FNV:[code]u16 a, b;
u32 data = a<<16 | b;

u8* in=(u8*)&data;
u32 h = 0x811c9dc5;
h = (h ^ in[0]) * 0x1000193;
h = (h ^ in[1]) * 0x1000193;
h = (h ^ in[2]) * 0x1000193;
h = (h ^ in[3]) * 0x1000193;
return h;[/code]
Or murmur:[code]u32 h1 = seed;
u32 k1 = data;
k *= 0xcc9e2d51;
k = _rotl(k,15);
k *= 0x1b873593;
h ^= k;
h = _rotl(h,13);
h = h*5+0xe6546b64;
h ^= 4;
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;[/code][hr]oops, i missed that you wanted a 16bit output; these are 32 bit hashes [img]http://public.gamedev.net/public/style_emoticons/default/unsure.gif[/img]
You could still perform the above hashes and then fold them over themselves though:[code]u32 h = ....;
u16 final = (u16)( (h>>16)^(h&0xffff) );[/code]

Share this post


Link to post
Share on other sites
[quote name='Adam_42' timestamp='1314014072' post='4852239']might do better with a function that looks something like: a ^ ((b << 8) | (b >> 8))[/quote]
That's what I'd suggest too, though I would use different constants and [i]three [/i]shifts (for example +11, -19, +8 or +6, -3, +17) . That's equivalent to Marsaglia's xorshift generator, which will pipeline to hash a and b in parallel on all modern CPUs, and will give a distribution as good as you can reasonably get with fewer than ten thousand instructions.

Of course you will inevitably still have collisions, but that's just something you can't help. Mapping 64 to 16 just doesn't work otherwise.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this