[C++] resolving collisions with hash_map

Quote:Original post by jyk
Quote:Original post by dmail
Sneftel, yes, I know how to create an ordering based on the individual components of the vector, and if I were to write one it would not look like what you suggest, because of the point you have skipped: floating point comparison, i.e. the precision of floating point values.
The code that Sneftel posted is fine. What is required in this case is a strict weak ordering, which is exactly what the code implements.
Unfortunately you can't rely on anything like that when dealing with floating point numbers.
The classic counterexample is the x87 FPU which internally performs calculations on 80-bit registers, and typically only stores truncated 32-bit floats or 64-bit doubles in memory. This means that a < b and b < a can both be true, if the optimizer decides to spill the a or b register to memory between the comparisons.
And then there's NaNs of course, which always compare false. Not that you're likely to encounter any in this example but you can design some pretty nifty exploits by forcing one into, say, an algorithm which uses sentinels.
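
To make the NaN point concrete, a minimal sketch (the values are illustrative):

    #include <cmath>
    #include <cstdio>

    int main()
    {
        float qnan = std::sqrt(-1.0f);   // quiet NaN
        float x = 1.0f;
        // Every ordered comparison involving a NaN is false, so under a
        // strict weak ordering a NaN appears "equivalent" to everything,
        // which quietly breaks sorted containers and algorithms.
        std::printf("%d %d %d\n", (int)(qnan < x), (int)(x < qnan), (int)(qnan == qnan));   // prints 0 0 0
        return 0;
    }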
Quote:Original post by jyk
Quote:Original post by dmail
Quote:Original post by Sneftel
if(a.x < b.x) return true;
if(a.x > b.x) return false;
if(a.y < b.y) return true;
if(a.y > b.y) return false;
if(a.z < b.z) return true;
if(a.z > b.z) return false;
return false;


Sneftel, yes, I know how to create an ordering based on the individual components of the vector, and if I were to write one it would not look like what you suggest, because of the point you have skipped: floating point comparison, i.e. the precision of floating point values.
The code that Sneftel posted is fine. What is required in this case is a strict weak ordering, which is exactly what the code implements.


Am I missing something here?
When the hash map looks up a key and there is a collision in the hash, it compares the stored values to see whether they are equal. Deciding that two values are the same requires an equality test, and this can be done using just the less-than operator, with logic like what I posted earlier.

Quote:msdn
For a hash_map m, if two elements e1(k1, d1) and e2(k2, d2) are objects of type value_type, where k1 and k2 are their keys of type key_type and d1 and d2 are their data of type mapped_type, then m.value_comp()(e1, e2) is equivalent to m.key_comp()(k1, k2). A stored object defines the member function

bool operator()(const value_type& _Left, const value_type& _Right);

which returns true if the key value of _Left precedes and is not equal to the key value of _Right in the sort order.
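
Worth spelling out: a comparator-based container never calls operator==; it derives equivalence from the strict weak ordering. A minimal sketch of that test, assuming a less-than-style comparator cmp:

    template <typename T, typename Less>
    bool equivalent(const T& a, const T& b, Less cmp)
    {
        // Two keys are equivalent exactly when neither precedes the other.
        return !cmp(a, b) && !cmp(b, a);
    }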

Quote:Original post by implicit
Quote:Original post by jyk
Quote:Original post by dmail
Sneftel, yes, I know how to create an ordering based on the individual components of the vector, and if I were to write one it would not look like what you suggest, because of the point you have skipped: floating point comparison, i.e. the precision of floating point values.
The code that Sneftel posted is fine. What is required in this case is a strict weak ordering, which is exactly what the code implements.
Unfortunately you can't rely on anything like that when dealing with floating point numbers.
The classic counterexample is the x87 FPU which internally performs calculations on 80-bit registers, and typically only stores truncated 32-bit floats or 64-bit doubles in memory. This means that a < b and b < a can both be true, if the optimizer decides to spill the a or b register to memory between the comparisons.
And then there's NaNs of course, which always compare false. Not that you're likely to encounter any in this example but you can design some pretty nifty exploits by forcing one into, say, an algorithm which uses sentinels.
That's an interesting point about the less-than comparison. I'd assumed dmail was talking about using tolerances in the comparison (which isn't necessary or desirable in this case), but I may have been wrong about that.
Quote:Original post by dmail
Am I missing something here?


Quoting your quote:
Quote:returns true if the key value of _Left precedes and is not equal to the key value of _Right in the sort order.
Quote:Original post by dmail
Sneftel, yes, I know how to create an ordering based on the individual components of the vector, and if I were to write one it would not look like what you suggest, because of the point you have skipped: floating point comparison, i.e. the precision of floating point values.
Read the original post. He's looking to remove duplicated vertices, not to weld vertices together based on proximity. Floating point precision is not a relevant topic.
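
For exact-duplicate removal, the only requirement is that the hash and the ordering agree with each other. A minimal sketch of the kind of traits class stdext::hash_map accepts, assuming a MyVec struct with float x, y, z members (the struct name comes from the thread; everything else is illustrative):

    #include <cstring>
    #include <hash_map>   // stdext::hash_map, an MSVC extension

    struct MyVec { float x, y, z; };

    struct MyVecHashCompare
    {
        // Tuning constants required by the Dinkumware hash_compare interface.
        static const size_t bucket_size = 4;
        static const size_t min_buckets = 8;

        static size_t hashComponent(float f)
        {
            if (f == 0.0f) f = 0.0f;   // fold -0.0f into +0.0f so keys that
                                       // compare equal also hash equally
            unsigned int bits;
            std::memcpy(&bits, &f, sizeof bits);   // exact bit pattern
            return bits;
        }

        // Hash functor: combine the exact bit patterns of the components.
        size_t operator()(const MyVec& v) const
        {
            size_t h = hashComponent(v.x);
            h = h * 31 + hashComponent(v.y);
            h = h * 31 + hashComponent(v.z);
            return h;
        }

        // Ordering functor: the strict weak ordering from earlier in the
        // thread. (NaN components would break this ordering, as noted above.)
        bool operator()(const MyVec& a, const MyVec& b) const
        {
            if (a.x < b.x) return true;
            if (a.x > b.x) return false;
            if (a.y < b.y) return true;
            if (a.y > b.y) return false;
            return a.z < b.z;
        }
    };

    stdext::hash_map<MyVec, int, MyVecHashCompare> vertexIndex;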
Quote:Original post by implicit
The classic counterexample is the x87 FPU which internally performs calculations on 80-bit registers, and typically only stores truncated 32-bit floats or 64-bit doubles in memory. This means that a < b and b < a can both be true, if the optimizer decides to spill the a or b register to memory between the comparisons.
IEEE-754 defines rounding in a way that doesn't allow this to happen... two values can go from unequal to equal when one or both of them is truncated, but they can't switch from a<b to a>b. In any case, this isn't an issue that arises in the context of containers, since the data is truncated for writing to memory.
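
A quick illustration of that rounding behaviour (the values are illustrative):

    #include <cstdio>

    int main()
    {
        // Rounding to a narrower type is monotonic: it can collapse an
        // inequality into an equality, but it cannot reverse one.
        double a = 1.0;
        double b = 1.0 + 1e-12;        // a < b as doubles
        float fa = (float)a;
        float fb = (float)b;           // both round to 1.0f
        std::printf("%d %d\n", (int)(a < b), (int)(fa == fb));   // prints 1 1
        return 0;
    }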
Quote:Original post by Sneftel
Quote:Original post by implicit
The classic counterexample is the x87 FPU which internally performs calculations on 80-bit registers, and typically only stores truncated 32-bit floats or 64-bit doubles in memory. This means that a < b and b < a can both be true, if the optimizer decides to spill the a or b register to memory between the comparisons.
IEEE-754 defines rounding in a way that doesn't allow this to happen... two values can go from unequal to equal when one or both of them is truncated, but they can't switch from a<b to a>b. In any case, this isn't an issue that arises in the context of containers, since the data is truncated for writing to memory.
Are you sure? I'm not saying you're wrong, it just seems like an effect that would be hard to achieve when only one of the variables is truncated.
Say you've got two 80-bit floats differing by only a small amount (so they'll both be truncated down to the same 32-bit value), then spill the higher of the pair to memory, reload it into an 80-bit register at a later stage, and finally perform a comparison against the original version of the other value.
At any rate, only the data actually inside the container has to be stored to memory; the search keys might well survive intact inside registers.
Quote:Original post by implicit
Say you've got two 80-bit floats differing by only a small amount (so they'll both be truncated down to the same 32-bit value), then spill the higher of the pair to memory, reload it into an 80-bit register at a later stage, and finally perform a comparison against the original version of the other value.

Oh, I see what you are saying. I was thinking of both values ending up as the truncated length, when one or both of them had started longer.
Quote:At any rate only the data actually inside of the container will have to be stored to memory, the search keys might well survive intact inside of registers.
The data in question is already only 32 bits. It came out of a float variable. There is no floating point arithmetic going on here.

[Edited by - Sneftel on April 30, 2009 3:03:43 PM]

Floating point values are traditionally not regarded as good candidates for hash table keys. What is the source of your vertex data? If it is the result of a computation, then you need to be very careful that the computation always gives a deterministic result, or else use another data structure that lets you control for round-off and imprecision in the least significant bits. For example, an intersection test on two segments that produces a new vertex will yield a slightly different binary result (and therefore a different hash value) depending on the order of calculation. The order of calculation here is determined by the ordering of the vertices within each segment and the order in which the segments are passed to the intersection function.
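
A tiny example of the order-of-calculation issue (the values are illustrative, but the effect is real):

    #include <cstdio>

    int main()
    {
        // Floating point addition is not associative, so the same
        // mathematical expression can produce different bit patterns
        // (and therefore different hash values) depending on
        // evaluation order.
        float a = 1e20f, b = -1e20f, c = 1.0f;
        float left  = (a + b) + c;   // 1.0f
        float right = a + (b + c);   // 0.0f -- c is absorbed by b
        std::printf("%g %g\n", left, right);   // prints 1 0
        return 0;
    }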

By contrast, if the vertex source is coming straight from deserializing a file, then you should be OK. I recently wrote a polygon-to-polyline routine that eliminates duplicate adjacent segments in vectorial data using unordered_map, and it works fine - but I limit it to unprocessed data with exactly represented values.

As a side note, I don't believe a less<T> comparator is a good choice for resolving collisions in a hash data structure. I expect the designers of hash_map were thinking about leaving open the possibility of using some type of balanced tree for improved bucket searching, but the focus should really be on reducing collisions and then using simple traversal and matching. The tr1 unordered hash maps use equality-testing predicates (i.e. equal_to<T>) rather than comparators, which imho is a better choice.
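
For comparison, a minimal sketch of the tr1 interface being described, with an explicit equality predicate (the MyVec layout is assumed, as before):

    #include <cstring>
    #include <unordered_map>   // <tr1/unordered_map> on some compilers

    struct MyVec { float x, y, z; };

    struct MyVecHash
    {
        size_t operator()(const MyVec& v) const
        {
            // Hash the exact bit patterns, folding -0.0f into +0.0f so
            // that keys comparing equal under == also hash equally.
            size_t h = 0;
            const float comps[3] = { v.x, v.y, v.z };
            for (int i = 0; i < 3; ++i)
            {
                float f = (comps[i] == 0.0f) ? 0.0f : comps[i];
                unsigned int bits;
                std::memcpy(&bits, &f, sizeof bits);
                h = h * 31 + bits;
            }
            return h;
        }
    };

    struct MyVecEqual
    {
        bool operator()(const MyVec& a, const MyVec& b) const
        {
            // Exact equality -- duplicate removal, not proximity welding.
            return a.x == b.x && a.y == b.y && a.z == b.z;
        }
    };

    std::tr1::unordered_map<MyVec, int, MyVecHash, MyVecEqual> vertexIndex;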
Quote:Original post by bubu LV
Sorry, my mistake. It needs < operator, not == operator.

Aaah, that was it -- I didn't have an operator< for MyVec. I added it, and hash_map::find() always returns a proper result now.

Sneftel, I hope you don't mind that I swiped your code. ;)
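
For anyone reading along later, the fix amounts to something like this (a sketch; the member names are assumed):

    struct MyVec
    {
        float x, y, z;

        // Strict weak ordering over the components -- essentially the
        // comparison Sneftel posted, wrapped as a member operator so the
        // container's default less<MyVec> can find it.
        bool operator<(const MyVec& rhs) const
        {
            if (x < rhs.x) return true;
            if (x > rhs.x) return false;
            if (y < rhs.y) return true;
            if (y > rhs.y) return false;
            return z < rhs.z;
        }
    };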
Quote: "This is my first time working with stdext::hash_map, so please be gentle. :)"

Just out of curiosity, why did you decide to use a hash map?

(I usually use a std::map to do the same kind of search)
I know what a hash map is (buckets and all that...), but I'm just curious; maybe there's something I could learn from your decision-making on this choice.

Cheers.
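
For reference, the std::map alternative needs only the ordering, no hash function at all (same assumed MyVec as above):

    #include <map>

    struct MyVec
    {
        float x, y, z;
        bool operator<(const MyVec& rhs) const
        {
            if (x < rhs.x) return true;
            if (x > rhs.x) return false;
            if (y < rhs.y) return true;
            if (y > rhs.y) return false;
            return z < rhs.z;
        }
    };

    // Lookups are O(log n) through the balanced tree, versus expected O(1)
    // for the hash map; with modest vertex counts either choice is fine.
    std::map<MyVec, int> vertexIndex;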
