Back to General and Gameplay Programming

Looking for a good hash for 2 integers

General and Gameplay Programming Programming

Started by luca-deltodesco August 21, 2011 11:08 PM

3 comments, last by Hodgman 12 years, 7 months ago

luca-deltodesco

638

Author

August 21, 2011 11:08 PM

I have two integers (incremented from 0, so not uniformly distributed in general) which I want to combine into a single integer in a small range (0 -> 0xffff) to best avoid hash collisions.

I've tried different things like: ((~a)-b)&0xffff, (a*106039+b)&0xffff, etc. but I can't seem to get any better than an 80% collision rate, even if both integers are well below 0xffff

I do want this hash to be as lightweight as possible also as it's in a performance critical area

--- edit:

seems i made a big error in determining the collision rate, it's actually more like 1.3%

wqking

778

August 22, 2011 06:20 AM

I don't think it's practical.
Hash functions requires the interesting items count is rather smaller than all possible hash values.
So MD5 is still strong enough because it can hold 2^128 values.

What you are doing is to put 2^64 (or 2^32 values if A and B are in 0~0xffff) to 2^16 values, it's quite reasonable that you get a lot of conflicting.

You are 100% guaranteed to get conflicting to put N+1 elements to N values, no matter what hash algorithm you use.

So I suggest you change your underlying algorithm to avoid the need of the hash.

https://www.kbasm.com -- My personal website

https://github.com/wqking/eventpp eventpp -- C++ library for event dispatcher and callback list

https://github.com/cpgf/cpgf cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL Box2D, SFML and Irrlicht.

Adam_42

3,664

August 22, 2011 11:54 AM

Thanks to the Birthday Paradox, if I've done my maths right, on average with an ideal hash you'll get a collision once you have hashed about 300 values. That means your 1.3% isn't too bad.

If both values tend to be small you might do better with a function that looks something like: a ^ ((b << 8) | (b >> 8)). That will mean that if a and b are both under 255 you'll never get a collision. Reversing the order of the bits in b instead of just rotating by 8 might work even better, but it'll be slower too.

luca-deltodesco

638

Author

August 22, 2011 12:53 PM

wqking, i can do without the hash, but using the hash has shown to be reliably faster.

I'm now using a hash: (a*106039+b)&0xffff

and i get about 0.3% to 0.5% of insertions resulting in a collision now. even with the id's growing towards 100K (id's above 100K would be very rare, and havn't tested beyond that) The test case has both id's growing with about 1000 objects in the hash at any time

Adam_42 your suggestion gives roughly an 8% collision rate.

Hodgman

52,717

August 22, 2011 02:58 PM

You could try FNV:

u16 a, b;

u32 data = a<<16 | b;



u8* in=(u8*)&data;

u32 h = 0x811c9dc5;

h = (h ^ in[0]) * 0x1000193;

h = (h ^ in[1]) * 0x1000193;

h = (h ^ in[2]) * 0x1000193;

h = (h ^ in[3]) * 0x1000193;

return h;

Or murmur:

u32 h1 = seed;

u32 k1 = data;

k *= 0xcc9e2d51;

k = _rotl(k,15);

k *= 0x1b873593;

h ^= k;

h = _rotl(h,13); 

h = h*5+0xe6546b64;

h ^= 4;

h ^= h >> 16;

h *= 0x85ebca6b;

h ^= h >> 13;

h *= 0xc2b2ae35;

h ^= h >> 16;

return h;

[hr]oops, i missed that you wanted a 16bit output; these are 32 bit hashes

You could still perform the above hashes and then fold them over themselves though:

u32 h = ....;

u16 final = (u16)( (h>>16)^(h&0xffff) );

. 22 Racing Series .

samoth

9,833

August 22, 2011 03:02 PM

might do better with a function that looks something like: a ^ ((b << 8) | (b >> 8))

That's what I'd suggest too, though I would use different constants and three shifts (for example +11, -19, +8 or +6, -3, +17) . That's equivalent to Marsaglia's xorshift generator, which will pipeline to hash a and b in parallel on all modern CPUs, and will give a distribution as good as you can reasonably get with fewer than ten thousand instructions.

Of course you will inevitably still have collisions, but that's just something you can't help. Mapping 64 to 16 just doesn't work otherwise.

Looking for a good hash for 2 integers

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Looking for a good hash for 2 integers

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines