# Looking for a good hash for 2 integers

This topic is 2674 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I have two integers (incremented from 0, so not uniformly distributed in general) which I want to combine into a single integer in a small range (0 -> 0xffff) to best avoid hash collisions.

I've tried different things like: ((~a)-b)&0xffff, (a*106039+b)&0xffff, etc. but I can't seem to get any better than an 80% collision rate, even if both integers are well below 0xffff

I do want this hash to be as lightweight as possible also as it's in a performance critical area

--- edit:

seems i made a big error in determining the collision rate, it's actually more like 1.3%

##### Share on other sites
I don't think it's practical.
Hash functions requires the interesting items count is rather smaller than all possible hash values.
So MD5 is still strong enough because it can hold 2^128 values.

What you are doing is to put 2^64 (or 2^32 values if A and B are in 0~0xffff) to 2^16 values, it's quite reasonable that you get a lot of conflicting.

You are 100% guaranteed to get conflicting to put N+1 elements to N values, no matter what hash algorithm you use.

So I suggest you change your underlying algorithm to avoid the need of the hash.

##### Share on other sites
Thanks to the Birthday Paradox, if I've done my maths right, on average with an ideal hash you'll get a collision once you have hashed about 300 values. That means your 1.3% isn't too bad.

If both values tend to be small you might do better with a function that looks something like: a ^ ((b << 8) | (b >> 8)). That will mean that if a and b are both under 255 you'll never get a collision. Reversing the order of the bits in b instead of just rotating by 8 might work even better, but it'll be slower too.

##### Share on other sites
wqking, i can do without the hash, but using the hash has shown to be reliably faster.

I'm now using a hash: (a*106039+b)&0xffff

and i get about 0.3% to 0.5% of insertions resulting in a collision now. even with the id's growing towards 100K (id's above 100K would be very rare, and havn't tested beyond that) The test case has both id's growing with about 1000 objects in the hash at any time

##### Share on other sites
You could try FNV:u16 a, b; u32 data = a<<16 | b; u8* in=(u8*)&data; u32 h = 0x811c9dc5; h = (h ^ in[0]) * 0x1000193; h = (h ^ in[1]) * 0x1000193; h = (h ^ in[2]) * 0x1000193; h = (h ^ in[3]) * 0x1000193; return h;
Or murmur:u32 h1 = seed; u32 k1 = data; k *= 0xcc9e2d51; k = _rotl(k,15); k *= 0x1b873593; h ^= k; h = _rotl(h,13); h = h*5+0xe6546b64; h ^= 4; h ^= h >> 16; h *= 0x85ebca6b; h ^= h >> 13; h *= 0xc2b2ae35; h ^= h >> 16; return h;[hr]oops, i missed that you wanted a 16bit output; these are 32 bit hashes
You could still perform the above hashes and then fold them over themselves though:u32 h = ....; u16 final = (u16)( (h>>16)^(h&0xffff) );

##### Share on other sites
might do better with a function that looks something like: a ^ ((b << 8) | (b >> 8))

That's what I'd suggest too, though I would use different constants and three shifts (for example +11, -19, +8 or +6, -3, +17) . That's equivalent to Marsaglia's xorshift generator, which will pipeline to hash a and b in parallel on all modern CPUs, and will give a distribution as good as you can reasonably get with fewer than ten thousand instructions.

Of course you will inevitably still have collisions, but that's just something you can't help. Mapping 64 to 16 just doesn't work otherwise.

1. 1
2. 2
Rutin
19
3. 3
khawk
15
4. 4
5. 5
A4L
13

• 13
• 26
• 10
• 11
• 44
• ### Forum Statistics

• Total Topics
633744
• Total Posts
3013654
×