[C++] resolving collisions with hash_map

Started by
21 comments, last by Sneftel 14 years, 11 months ago
This is my first time working with stdext::hash_map, so please be gentle. :) Basically, I'm writing some code to remove duplicate vertices from a vertex buffer. Before adding a vertex to the buffer, I first check to see if it exists in the hash_map or not:
int MyIndexer::locateVertex(const MyVec &vec)
{
	// locate the vertex in our hash map
	stdext::hash_map<MyVec, int>::iterator iter = mVertexMap.find(vec);

	// if iterator points to end of map, then the vertex was not found
	if (iter == mVertexMap.end())
		return INDEX_NOT_FOUND;

	// check if returned index matches the data we are looking for
	if (mVtxBuf[iter->second] != vec)
	{
		LOG("hash map collision");

		// ...add resolve code here...
	}

	// return index to vertex
	return iter->second;
}
The hash function I'm using to process the actual vertex data uses stdext::_Hash_value() -- the same function used for hashing strings. It seems to do quite a good job, considering one data set I'm dealing with which has 5.9 million vertices only came up with 137 collisions. Now, to my actual question: how do I actually resolve a collision? I thought that perhaps the iterator returned by find() would give me a set of pairs that all had the same key, but that didn't seem to be it. At worst case, I could simply do a linear search through the entire vertex buffer from start to end -- and with that only happening 137 times, it doesn't seem like it would be too bad -- but I'm sure there has to a proper way of doing it. I've scoured Google, read through the (poor) MSDN documentation, and searched GameDev's forums as much as I could, but I simply cannot figure out what to do from here. Any help would be greatly appreciated!
Advertisement
find() gives you iterator to exactly your element (or end() if none is found). Collisions are already resolved inside find.
hash_map requires you to define not only hashing operator, but also == operator. That's why find() can find exactly your element - first with hashing, and then with == operator to resolve collisions.
If what you say is true, then the if() check on the vertex data should never fail -- but it does, and that's how I managed to count (what I thought was) the number of collisions.

As a test, I stuck a breakpoint on MyVec::operator== before calling find() and stepped over it in the debugger, but my operator==() function seemed to never be called from hash_map::find().

So... am I doing something completely wrong here?
I added some more debug info to my function, and now I'm absolutely sure hash_map::find() is not returning an iterator to the proper data on a collision:
	// check if returned index matches the data we are looking for	const MyVec &vbufvec = mVtxBuf[iter->second];	if (vbufvec != vec)	{		LOG("hash map collision: {%.2f,%.2f,%.2f [%08Xh]} / {%.2f,%.2f,%.2f [%08Xh]}", vbufvec.x, vbufvec.y, vbufvec.z, (size_t)(vbufvec), vec.x, vec.y, vec.z, (size_t)(vec));	}


This is what I'm getting from the logger:
[MyIndexer::locateVertex]: hash map collision: {4784.40,0.00,-3639.60 [D9CFAF33h]} / {4720.20,0.00,-3102.60 [D9CFAF33h]}


The vertex data is different, but the hash value for them is exactly the same -- this is definitely a collision.

I'm using MSVC2005's implementation. Is there a bug here, perhaps?
Sorry, my mistake. It needs < operator, not == operator.
http://msdn.microsoft.com/en-us/library/1s1byw77.aspx

Do you have correctly implemented < operator for MyVec type?
Quote:
Basically, I'm writing some code to remove duplicate vertices from a vertex buffer. Before adding a vertex to the buffer, I first check to see if it exists in the hash_map or not:

There is a flaw in this that you may not have taken into consideration, which is floating point comparison. The compare operation is probably done using something like
if(!(a < b) && !(b <a)) found
but floating point comparison is not exact, in addition it is considered bad practice for a type to have a comparison/less than operation just for the sake of a container. Is there a meaningful less than operation for a vector?
Quote:Original post by dmail
in addition it is considered bad practice for a type to have a comparison/less than operation just for the sake of a container.
Maybe. If he cares about that, he can define his own Traits type, but for purposes of discussion, operator< is probably clearest.
Quote:Is there a meaningful less than operation for a vector?
If you're using it simply to define a partial order over values, yes. It looks like:
if(a.x < b.x) return true;if(a.x > b.x) return false;if(a.y < b.y) return true;if(a.y > b.y) return false;if(a.z < b.z) return true;if(a.z > b.z) return false;return false;
Quote:Original post by Sneftel
Quote:Original post by dmail
in addition it is considered bad practice for a type to have a comparison/less than operation just for the sake of a container.
Maybe. If he cares about that, he can define his own Traits type, but for purposes of discussion, operator< is probably clearest.
Quote:Is there a meaningful less than operation for a vector?
If you're using it simply to define a partial order over values, yes. It looks like:
if(a.x < b.x) return true;if(a.x > b.x) return false;if(a.y < b.y) return true;if(a.y > b.y) return false;if(a.z < b.z) return true;if(a.z > b.z) return false;return false;


Sneftel, yes I know how to create a sorting based on the individual components of the vector and if I were to write one it would not look like what you suggest for the point which you have skipped; floating point comparison i.e. the precision of floating points.
There is nothing wrong with comparing floating-point numbers, except perhaps if you run into NaNs.

Maybe the OP can try to prune the program down to a 10 or 20 line version that reproduce the problem, so we can be sure we have seen all the code we need...
Quote:Original post by dmail
Quote:Original post by Sneftel
if(a.x < b.x) return true;if(a.x > b.x) return false;if(a.y < b.y) return true;if(a.y > b.y) return false;if(a.z < b.z) return true;if(a.z > b.z) return false;return false;


Sneftel, yes I know how to create a sorting based on the individual components of the vector and if I were to write one it would not look like what you suggest for the point which you have skipped; floating point comparison i.e. the precision of floating points.
The code that Sneftel posted is fine. What is required in this case is a strict weak ordering, which is exactly what the code implements.

This topic is closed to new replies.

Advertisement