Sign in to follow this  
Prune

Good hash function for integer triples? (.OBJ reindexing)

Recommended Posts

Prune    224
OBJ files use separate index arrays for separate vertex attributes. However, the majority of programs export OBJ files with primarily unified vertices where an occurence of index specifying vertex/texture_coord/normal of 14/98/6 would usually appear as 14/98/6 and only would in some instances one of the indexes be different, so 14/98/20 would be a rare thing (occurring in such cases as where there is a crease so the normals don't match). So it makes sense to reindex the array for VBO rendering with a single element buffer and clone the occasional vertices that have some different attributes. So obviously I need an unordered_map<tuple<size_t, size_t, size_t>, size_t> Since this is not a standard key type, the compiler insists I create my own hash function. I have no idea how to choose an appropriate one, especially one that would work best with the information that can be assumed a priori, which is that in almost all cases index numbers of the .obj will densely fill a range [1..N]--they're not sparsely dispersed among all possible 32-bit ints, for example. Since usually the range would be a fraction of the possible integer size, I could maybe do some shifting since some of the bits at least won't overlap... [Edited by - Prune on May 1, 2010 8:43:23 PM]

Share this post


Link to post
Share on other sites
Zahlman    1682
Quote:
Original post by Prune
one that would work best with the information that can be assumed a priori, which is that in almost all cases index numbers of the .obj will densely fill a range [1..N]


Do you know, or at least have a good estimate for, N ahead of time?

Share this post


Link to post
Share on other sites
Prune    224
In most cases, would be between on the order of ten and a million, and my best guess is a bell-curve distribution.

I looked at tuple_hash() in http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-006Spring-2008/2373DA3E-7A4F-48A0-9201-FDA9ADC702F2/0/recitation05.pdf which is for Python. I'm not sure about the -1 checks so I just omitted them in my adaptation of this code as size_t is unsigned, to get

typedef tuple<size_t, size_t, size_t> Index3;

struct Index3Hash : public unary_function<Index3, size_t >
{
public:
size_t operator()(Index3 const &p) const;
private:
static hash<size_t> hst;
};

size_t Index3Hash::operator()(Index3 const &p) const
{
size_t m(1000003), x(0x345678), y(hst(get<0>(p)));
x = (x ^ y) * m;
m += 82524;
y = hst(get<1>(p));
x = (x ^ y) * m;
m += 82522;
y = hst(get<2>(p));
x = (x ^ y) * m;
return x + 97531;
}

hash<size_t> Index3Hash::hst;


I think this is thread-safe even though hst is static since std::hash doesn't seem to have any data members or its operator() any static locals.

Well I guess another way is to put the tuple in a union with a string and pass the key as a string, but I don't know how that would compare to the above, given that digits are a very limited subset of string element (char).

And then there's the monster hash at http://burtleburtle.net/bob/hash/evahash.html
Maybe I can just use the mix()

Share this post


Link to post
Share on other sites
Prune    224
OK I'm really confused...

This compiles:

typedef unordered_map<Index3, size_t, Index3Hash> IndexMap;
IndexMap test;
IndexMap::value_type val(Index3(1, 2, 3), 1);
test.insert(val);

This doesn't:

typedef unordered_map<Index3, size_t, Index3Hash> IndexMap;
IndexMap test;
test.insert(IndexMap::value_type(Index3(1, 2, 3), 1));

There should be no difference!

On the second, I get:

error C2664: 'std::pair<_Ty1,_Ty2>::pair<const _Kty,_Ty>(std::pair<_Ty1,_Ty2> &)' : cannot convert parameter 1 from 'std::pair<_Ty1,_Ty2> ' to 'std::pair<_Ty1,_Ty2> &'
1> with
1> [
1> _Ty1=const Index3,
1> _Ty2=size_t,
1> _Kty=Index3,
1> _Ty=size_t
1> ]
1> and
1> [
1> _Ty1=const Index3,
1> _Ty2=size_t
1> ]
1> and
1> [
1> _Ty1=const Index3,
1> _Ty2=size_t
1> ]

Share this post


Link to post
Share on other sites
Prune    224
I can't even assign... makes me create new variable every time I want to insert:
this won't compile:
IndexMap::value_type val(make_tuple(1, 2, 3), 1);
test.insert(val);
val = IndexMap::value_type(Index3(1, 2, 3), 1);
test.insert(val);

The assignment fails, even though for tuples operator=() is defined...

This works test.insert(make_pair(make_tuple(1, 2, 3), 1)); but I don't see why the previous should not work, unless MSVC is buggy

Share this post


Link to post
Share on other sites
socky    135
The insert method takes a reference as parameter, so you need to pass an L-value (things that you can have on the left side of the assignment operator)

Correct me if I'm wrong but I guess you will also need to keep the inserted references around (or create them on the heap) as they will be destroyed as soon as they go out of scope....

[edit]
ok, now I'm confused too.. I tested your two insert methods and both work on g++ compiler.

[Edited by - socky on May 2, 2010 2:52:24 AM]

Share this post


Link to post
Share on other sites
Prune    224
On MSVC at least it has an overload with r-values &&
If I define val in an inner scope { ... val(...); ....insert(val); } then do a ...find(...) it still works even though val doesn't exist anymore, so probably like other STL containers in general it's doing a copy or move.
So I still think MSVC's implementation of tuple may be buggy.

BTW, any way to test how well a given hash works? I don't know how to get collision statistics from unordered_map.

insert() has another version that supplies a starting iterator, and MSDN says:
"The second member function returns insert(val).first, using where as a starting place within the controlled sequence to search for the insertion point. (Insertion can possibly occur somewhat faster, if the insertion point immediately precedes or follows where.)"
I don't see how that is useful for a hash table-based map, since a good hashing function would normally produce more or less randomly distributed values and the chance of the next insertion being right by the previous one is very small.

[Edited by - Prune on May 2, 2010 7:41:59 PM]

Share this post


Link to post
Share on other sites
Prune    224
As for the reindexing, given with the above declarations for Index3, I came up with


_v.reserve(v.size());
_n.reserve(n.size());
_t.reserve(t.size());
_f.reserve(f.size());
typedef unordered_map<Index3, size_t, Index3Hash> IndexMap;
IndexMap iMap;
int ind(0);
for (std::vector<Index3>::iterator itf = f.begin(); itf < f.end(); ++itf)
{
pair<IndexMap::iterator, bool> itm(iMap.insert(make_pair(*itf, ind)));
if (itm.second)
{
_v.push_back(v[get<0>(*itf)]);
_n.push_back(n[get<2>(*itf)]);
_t.push_back(t[get<1>(*itf)]);
_f.push_back(ind);
++ind;
}
else _f.push_back(itm.first->second);
}


where v, n, t, and f hold the as-read vertex positions, normals, texture coordinates, and face index triplets of triplets. My vectors are of types vector3f, vector3f, vector2f (see cml math library) and vector<Index3> so the latter is x3 the number of f lines in the OBJ file.

It seems to work when I dump the corresponding arrays into attribute and element VBOs. In several Maya-exported OBJs I tried, most (but not all) triplets were unique, so the "else" path was taken a small fraction of the time.

I figure there must be a way to do this in-place instead of using extra temporary vectors, but at first look I didn't think of an efficient approach...

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this