using hash_map vs vector....

6 comments, last by phil_t 10 years, 2 months ago

Hello. Maybe it would be better to state the real problem here, because I can't formulate a better topic title @_@ (sorry guys!)

I'm coding my GUI system right now, and so far it's been pretty simple:

- the user creates a GUI object via sys->createGUIobj(), which returns a pointer to the created GUI object (which could be a window, button, or label)

- since the user has a direct pointer, he can send input by passing messages to that object

The GUI objects are stored inside a hierarchy tree with a root->children relationship. There's no concept of an ID here: each GUI object has its own pointer, and it is the user's responsibility to store that pointer and use it to communicate with the object.

Now I'd rather have the user set the GUI object's ID at creation, and make it so there's no need to store its pointer. He would just call

sys->settext(player_name_id, "FUUUUU") or whatever, sending a message that supplies the GUI object's ID. But then I need a way to get at an object simply by using its ID.

I can't think of a way to do that other than using hash_map. I might use something like


hash_map<unsigned int, GUIObj*> obj_tables;

Access is very simple: just obj_tables[ID], and see if it's NULL, which means it doesn't exist.
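
Roughly, I imagine the message call working like this (a simplified sketch of the idea, not real code; hash_map stands for whatever the compiler provides, e.g. stdext::hash_map, __gnu_cxx::hash_map, or std::unordered_map):

struct GUIObj { virtual void settext(const char* text) = 0; };

hash_map<unsigned int, GUIObj*> obj_tables;   // ID -> object, IDs chosen by the user

void settext(unsigned int id, const char* text)
{
    GUIObj* obj = obj_tables[id];   // NULL if this ID was never registered
    if (obj)
        obj->settext(text);         // forward the "message" to the object
}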

However, I did a simple test using vector, hash_map, and a plain array. It's just a naive benchmark of the insertion and access times for each container, and of course the plain array wins. The main concern for me is that the average access time for the hash_map is the highest, at 0.415 ms on my notebook (which means that 1000 accesses would take 415 ms!). This is unacceptable to me; however, I can't find a better approach.

btw, here's the source code of the simple sampler app I wrote

http://pastebin.com/kuuATHSy

So I wonder how I should solve this problem.

TL;DR:

I want to refer to an object just by supplying its ID (which is quite arbitrary, since it's supplied by the user).


Access is very simple: just obj_tables[ID], and see if it's NULL, which means it doesn't exist.


That does not sound healthy. Using std::unordered_map, the code would be the same as you would use for a normal std::map:

auto it = myMap.find(key);
if (it != myMap.end())
   return it->second;
else
   return nullptr;
Just adding nullptr elements to the map (which happens on every operator[] call for a key that does not have a (key, value) pair yet) just because the element does not exist sounds like a bad idea.
Whether a hash map is worth the bother at all will also depend both on the number of elements you expect to deal with and on the expected access patterns.
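
To make the warning concrete, here is the side effect in question (a minimal sketch reusing the obj_tables name from the original post):

#include <unordered_map>

struct GUIObj;

std::unordered_map<unsigned int, GUIObj*> obj_tables;

// Key 42 is absent, so operator[] INSERTS (42, nullptr) and returns it.
GUIObj* p = obj_tables[42];
// obj_tables.size() is now 1, even though no object with ID 42 exists.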

I made a set of unordered flat set/map containers, available here: http://sourceforge.net/projects/unorderedflatse/files/? They are like vectors with a set/map interface, and I've found them quite useful for my own purposes. I haven't looked at your code, but 415 ms seems a bit long. That said, unordered_maps/unordered_sets can be slow since they are associative containers, which can cause a lot of cache thrashing. 415 ms still seems like a lot, though...
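
(To illustrate the general idea rather than that library's actual interface: a flat map is essentially a sorted vector searched by binary search, so lookups stay in contiguous, cache-friendly memory. A minimal sketch:)

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct GUIObj;

typedef std::pair<unsigned int, GUIObj*> Entry;

// Comparator for lower_bound: orders entries by key.
static bool keyLess(const Entry& e, unsigned int key) { return e.first < key; }

// Lookup in a vector of (key, value) pairs kept sorted by key.
GUIObj* flatFind(const std::vector<Entry>& v, unsigned int key)
{
    std::vector<Entry>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), key, keyLess);
    return (it != v.end() && it->first == key) ? it->second : NULL;
}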

OK, a few things on closer inspection. The resolution of <time.h>'s clock() is not high enough to properly profile the inserts/deletes. I personally use the high-performance counter on Windows; other OSes would have to use something else. The C++11 chrono library can get pretty high resolution too, AFAIK. My suggestion is to use a better timer, a MUCH larger sample count, and to start the timer once at the very beginning and stop it at the end; don't start and stop it between every 'sample'. I imagine most of the time in the code you posted is spent in the timer functions. Also, assigning the looked-up value to a volatile variable will often prevent the compiler from optimizing it out completely. If you're not sure what's going on, a quick look at the assembly will usually tell you whether the code is being executed or whether a clever optimizer has simply optimized it all away on you; you'd be surprised how clever compilers can be...
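
Something along these lines (a minimal Windows-only sketch with arbitrary counts, not a drop-in replacement for your program):

#include <windows.h>
#include <unordered_map>
#include <cstdio>

int main()
{
    const unsigned int N = 1000000;            // a MUCH larger sample count
    std::unordered_map<unsigned int, unsigned int> table;

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);              // start once, before the loop
    for (unsigned int i = 0; i < N; ++i)
        table[i] = i;
    QueryPerformanceCounter(&t1);              // stop once, after the loop
    double insertMs = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;

    volatile unsigned int sink = 0;            // volatile sink keeps the
    QueryPerformanceCounter(&t0);              // optimizer from deleting
    for (unsigned int i = 0; i < N; ++i)       // the lookup loop entirely
        sink += table.find(i)->second;
    QueryPerformanceCounter(&t1);
    double lookupMs = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;

    std::printf("insert: %.3f ms total, %.1f ns/op\n", insertMs, insertMs * 1e6 / N);
    std::printf("lookup: %.3f ms total, %.1f ns/op\n", lookupMs, lookupMs * 1e6 / N);
    return 0;
}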

You will not have 1000 accesses of the hash_map in one frame; if you have a screen with 1000 interactable objects on it, start rethinking that screen, because you did it wrong. Besides, the code to test this stuff has overhead in it that you don't need. Remember that clock() also has overhead when called; it's better to do the insertion for, say, 1 million elements and take the clock readings outside of your loop. Also, insertion into the hash map has the overhead of hashing the key; if you are just using numbers, you may be better off using a normal std::map, as mentioned.

Choice of container here all comes down to how you want to access it. In a previous project we split the render data completely from the update data; the render data was stored as a normal scene graph, in which each scene contained a normal map from a hash value of a node name to a pointer to the actual object. This allowed efficient dispatch of messages to the render nodes in big scenes, in the neighbourhood of a few ns for a full address dispatch through deep trees. Message dispatch worked along these lines (a code sketch follows below):

1. Can we find the object in the tree?

   - No: flag an error and stop dispatch.

   - Yes: is this the last element in the address?

     - Yes: dispatch the message to the actual node.

     - No: is this a new scene node?

       - No: start at 1 again in this scene's map with the next token in the address.

       - Yes: start at 1, but use the map of the scene node you just found to look for the next token.

This was fast when used on big scenes, with hundreds of nodes active at the same time; the address was a list of hash tokens to pass into a normal std::map::find function, with minor RTTI checks to verify node types.
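
Very roughly, the loop looked something like this (a from-memory sketch; Node, Message, and dispatch are illustrative names, not the actual engine code):

#include <cstddef>
#include <map>
#include <vector>

struct Message { int type; };

struct Node
{
    std::map<unsigned int, Node*> children;     // hashed name token -> child
    virtual void onMessage(const Message& msg) {}
    virtual ~Node() {}
};

// Walk an address (a list of hashed name tokens) down the tree; the node
// matching the last token receives the message.
bool dispatch(Node* scene, const std::vector<unsigned int>& address,
              const Message& msg)
{
    Node* current = scene;
    for (std::size_t i = 0; i < address.size(); ++i)
    {
        std::map<unsigned int, Node*>::const_iterator it =
            current->children.find(address[i]);
        if (it == current->children.end())
            return false;                       // not found: flag error, stop
        current = it->second;                   // descend; if this is a scene
    }                                           // node, we search its own map
    current->onMessage(msg);                    // last token: deliver message
    return true;
}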

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

That does not sound healthy. Using std::unordered_map, the code would be the same as you would use for a normal std::map: [...] Just adding nullptr elements to the map (which happens on every operator[] call for a key that does not have a (key, value) pair yet) just because the element does not exist sounds like a bad idea.

I was initially going to do that, but I wondered if it was fast enough, since... umm, I thought find() would do a binary search like it does on std::map. Anyway, heheh, I thought the [] operator in hash_map didn't create new elements, because I read somewhere that hash_map would internally create many elements up front and double its pool once it's about 75% full. Anyway, thanks for finding the flaws in my code! :D

The resolution of <time.h>'s clock() is not high enough to properly profile the inserts/deletes. I personally use the high-performance counter on Windows; other OSes would have to use something else. The C++11 chrono library can get pretty high resolution too, AFAIK. My suggestion is to use a better timer, a MUCH larger sample count, and to start the timer once at the very beginning and stop it at the end. [...]

I resorted to clock() because I can't use C++11 yet, and I don't know the equivalent of QueryPerformanceCounter on Linux :p I think clock() can't measure down to nanoseconds, which is my suspicion for the high ms figure. Anyway, my timing algorithm apparently sucks and needs to be rewritten. Thanks for the suggestions; I will move the timer out of the loop and increase the sample count now :)
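
(For reference, the usual Linux counterpart of QueryPerformanceCounter is clock_gettime with CLOCK_MONOTONIC, which has nanosecond resolution; a minimal sketch, assuming POSIX, where older glibc versions need linking with -lrt:)

#include <time.h>
#include <stdio.h>

// Milliseconds between two timespec readings.
static double elapsedMs(const timespec& a, const timespec& b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_nsec - a.tv_nsec) / 1000000.0;
}

int main()
{
    timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    // ... the loop being measured goes here ...
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("%.6f ms\n", elapsedMs(start, end));
    return 0;
}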

You will not have 1000 accesses of the hash_map in one frame; if you have a screen with 1000 interactable objects on it, start rethinking that screen, because you did it wrong. [...] Choice of container here all comes down to how you want to access it. [...]

Of course in my GUI system there won't be a lot of message passing happening in user interaction. I want to test the strength of the hash_map because I wanted to use it for my game objects too; I need message passing between objects, scriptable, and thus some way to refer to an object by its ID. Hence the hash_map test :D

Although the test itself was technically full of inefficiencies :p Thanks to you I will try to improve it. I think it works for me now, and I'll start profiling on a bigger scheme later, when I have a "real-life" scenario working. Many thanks to you guys!!


Of course in my GUI system there won't be a lot of message passing happening in user interaction. [...]

OK, just as a real metric: message dispatch in DiRT 3 for the HUD was taking about 5 ms of the frame, which graphics did not like at all. UI systems are generally the worst when it comes to message or event handling.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion


The main concern for me is that the average access time for the hash_map is the highest, at 0.415 ms on my notebook (which means that 1000 accesses would take 415 ms!). This is unacceptable to me; however, I can't find a better approach.

As other folks have mentioned, your performance measurements must be off. There's no way it's that slow. A hash_map is designed for efficient retrieval. I don't imagine 1000 accesses would be an issue within the context of a frame (but use .find() to retrieve the items, as someone else mentioned).

So just use whatever is easiest.

Your timing code assumes CLOCKS_PER_SEC == 1000, which is evidently not the case. To use clock() correctly, divide by CLOCKS_PER_SEC (and multiply by 1000 to get milliseconds). See here: CLOCKS_PER_SEC
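
In other words, the fix is just (a minimal sketch):

#include <ctime>
#include <cstdio>

int main()
{
    std::clock_t start = std::clock();
    // ... work being measured ...
    std::clock_t end = std::clock();

    // clock() counts ticks, not milliseconds; CLOCKS_PER_SEC converts.
    double ms = 1000.0 * (end - start) / CLOCKS_PER_SEC;
    std::printf("%.3f ms\n", ms);
    return 0;
}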

I wouldn't bother with the manual timing, however, when it's so much easier to crank up the number of iterations and see how many seconds the program takes to run in total.

When I run your program (after modifying it slightly so it will compile, because you use hash_map, which is an implementation-detail class; you probably meant unordered_map) on my computer, it executes in "no time at all".

Changing the sample count to one million makes the whole program (insert and lookup, everything together) finish in 0.9 seconds. In other words, each iteration takes roughly a millionth of a second, or about one microsecond, for both tests together (insert and lookup).

Fast enough?

EDIT:

Now, this is a little embarrassing... I just wondered why the "one million" looked so weird, and indeed that million has 7 zeroes. :D

It was thus really 10 million iterations that ran in 0.9 seconds. The time for one iteration (of both the insert and the lookup test) is therefore only 0.1 microseconds (ca. 100 ns, or 266 cycles @ 2.66 GHz).

This topic is closed to new replies.
