Vertex hashing

Started by
5 comments, last by swiftcoder 8 years, 4 months ago
I've been doing some digging on this and the following piece of code seems to be the baseline off of which everyone is building their hash:


const unsigned int * h = (const unsigned int *)(&v);
unsigned int f = (h[0] + h[1]*11 - (h[2]*17))&0x7fffffff;     // avoid problems with +-0
return (f>>22)^(f>>12)^(f);
In short, it doesn't work and frankly I can't quite understand how it's supposed to work.

I mean, it is almost 5 AM, so I might be missing something grossly obvious, but even if the vertices are normalized, f can overflow (which would be fine, but has nothing to do with the sign bit). Which is to say, I don't even see how masking off the sign bit of the hash would somehow filter out only +/- zero.
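For anyone reading along: +0.0f and -0.0f compare equal as floats, but their bit patterns differ in exactly one place, the sign bit (bit 31), which is why a bitwise hash treats them as distinct. A quick way to inspect that (using memcpy rather than the reference snippet's pointer cast, which technically violates strict aliasing):

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit unsigned integer.
// memcpy is the well-defined way to do this; the (unsigned int*)
// cast in the reference snippet relies on type punning that the
// standard doesn't guarantee.
uint32_t float_bits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

float_bits(0.0f) is 0x00000000 while float_bits(-0.0f) is 0x80000000, even though 0.0f == -0.0f holds.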

Here's my naive test that explicitly handles +/- zero. I've removed the bit mangling on the last line and changed one of the primes.

__int64 vtxhash(IN const math::vec3& v)
{
    const unsigned int* h = (const unsigned int*)(&v);
    __int64 h0 = ((h[0] & 0x7fffffff) == 0) ? 0 : h[0];
    __int64 h1 = ((h[1] & 0x7fffffff) == 0) ? 0 : h[1];
    __int64 h2 = ((h[2] & 0x7fffffff) == 0) ? 0 : h[2];
    return h0 + h1 * 23 - (h2 * 17);
}
Anyone dealt with this before?

The reason I'm asking is mostly for peer review and because I'd like to make heads or tails of the reference snippet.

For reference - I'm not really dealing with millions of vertices, although no collisions would be a nice thing to have.

I mean, it is almost 5 AM, so I might be missing something grossly obvious, but even if the vertices are normalized, f can overflow (which would be fine, but has nothing to do with the sign bit). Which is to say, I don't even see how masking off the sign bit of the hash would somehow filter out only +/- zero.

I don't think you are missing anything. It doesn't seem to work...
#include <iostream>

int hash(float v[3]) {
    const unsigned int * h = (const unsigned int *)(&v);
    unsigned int f = (h[0] + h[1]*11 - (h[2]*17))&0x7fffffff;     // avoid problems with +-0
    return (f>>22)^(f>>12)^(f);
}

int main() {
    float v1[] = {-0.0f, 0.0f, -0.0f};   // note: the literal -0 is integer zero
    float v2[] = {0.0f, -0.0f, -0.0f};   // and converts to +0.0f, not -0.0f
    std::cout << hash(v1) << std::endl;
    std::cout << hash(v2) << std::endl;
}
Outputs 2004002401 and 2004002389, kind of disproving the +/- 0 ignoring thing.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

It seems to work fine for me when I pass in a pointer to an array of 3 floats.

It doesn't work if I make the function take an array (it just produces nondeterministic garbage), but I don't at the moment see why. (Is it taking the address of a pointer? Never really use arrays this way.)

edit: yup, the "&" needs to go...

o3o

edit: yup, the "&" needs to go...

Oh goddamn it. Yes, don't know how I missed that.

So, thinking about this more objectively, neither multiplying by an odd constant nor adding is going to affect the bit pattern of -0.0 at all (unsigned integer wrapping will cause the same bit to be set after the multiplication). This should mean that its contribution can only be to the value of that exact same bit, and the mask ensures that it has no effect on the final value.
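A quick sanity check of that claim: under 32-bit unsigned wraparound, 0x80000000 * k equals (k % 2) << 31, because every carry out of bit 31 is discarded. So any odd multiplier (11, 17, 23, ...) maps the -0.0 bit pattern back onto itself, while an even one erases it entirely:

```cpp
#include <cstdint>

// Multiply a bit pattern by a constant with 32-bit wraparound
// semantics. Unsigned overflow is well-defined in C++: the result
// is taken modulo 2^32, so carries past bit 31 simply vanish.
uint32_t mul_wrap(uint32_t bits, uint32_t k) {
    return bits * k;
}
```

mul_wrap(0x80000000u, 11u) and mul_wrap(0x80000000u, 17u) both give back 0x80000000, while mul_wrap(0x80000000u, 2u) gives 0.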

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

This should mean that its contribution can only be to the value of that exact same bit, and the mask ensures that it has no effect on the final value.

and it does, I think.

-0.0 == 0x80000000 as bit pattern. Also, sizeof(unsigned int) == 4 and sizeof(float) == 4. Then


unsigned int f = (h[0] + h[1]*11 - (h[2]*17)) & 0x7fffffff; // avoid problems with +-0

is equivalent to


unsigned int f = (h[0] + h[1] + h[1]*2 + h[1]*8 - h[2] - h[2]*16) & 0x7fffffff;

As you can see, each of the three values gets added/subtracted into f exactly once without a multiplication factor. (In the terms with a factor of 2 or 8, bit 31 gets shifted out of the 32-bit integer and has no influence, as swiftcoder already said.) The high bit of h[0], h[1], and h[2] thus ends up in the unmasked term (e.g. for inputs (-0.0, +0.0, +0.0), (+0.0, -0.0, +0.0), or (+0.0, +0.0, -0.0)). Masking the high bit solves that. However, it looks to me like you are effectively mapping a large (or the entire?) part of the negative space onto the positive space with this masking. Hashing results might be better if you only explicitly zero out the -0.0 values instead.
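Putting that together: with the stray "&" from the test program removed (and memcpy instead of the aliasing cast), the masked reference hash really does collapse every sign-of-zero combination to the same value. A sketch:

```cpp
#include <cstring>

// The reference hash, corrected to operate on the float data
// itself (no stray & on the array parameter) and using memcpy
// to sidestep the strict-aliasing cast.
unsigned int hash(const float* v) {
    unsigned int h[3];
    std::memcpy(h, v, sizeof h);
    unsigned int f = (h[0] + h[1]*11 - (h[2]*17)) & 0x7fffffff; // avoid problems with +-0
    return (f >> 22) ^ (f >> 12) ^ f;
}
```

Every triple built from +0.0f/-0.0f now hashes identically (in fact to 0: the sum is either 0x00000000 or 0x80000000 before the mask strips bit 31).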


... it looks to me that you are effectively mapping a large (or the entire?) part of the negative space onto the positive space with this masking

Indeed, this is what I'm getting from it. That effectively makes the algorithm moot unless you temporarily translate your entire data set into the positive range. Which might not be the worst idea if you're not limited by precision. I still feel like my take on it is more robust and hardly any more expensive.

The bit mangling on the last line flies past me as well - I'm not really sure how it enhances the stability of the hash. I feel more confident using larger primes and mapping the hash to a 64 bit range instead.
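For completeness, here's my variant made self-contained; the vec3 below is just a stand-in for math::vec3 (assumed to be three contiguous floats), and the botched return line is fixed:

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for math::vec3; assumes three contiguous floats.
struct vec3 { float x, y, z; };

// Canonicalize +/-0 per component (any value whose magnitude bits
// are all zero becomes plain 0), then mix with odd primes into a
// 64-bit result.
int64_t vtxhash(const vec3& v) {
    uint32_t h[3];
    std::memcpy(h, &v, sizeof h);
    int64_t h0 = ((h[0] & 0x7fffffff) == 0) ? 0 : h[0];
    int64_t h1 = ((h[1] & 0x7fffffff) == 0) ? 0 : h[1];
    int64_t h2 = ((h[2] & 0x7fffffff) == 0) ? 0 : h[2];
    return h0 + h1 * 23 - h2 * 17;
}
```

Unlike the masked version, this keeps the sign bit of nonzero components, so v and -v still hash differently.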

It all depends on what you need out of a hash function. Even with positive and negative numbers colliding, it's most likely good enough™ for most purposes.


Also, a 64 bit hash for a 96 bit vector seems a bit extreme for most uses.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

This topic is closed to new replies.
