Fastest map-like collection for <GUID, BaseClass*> lookup

3 comments, last by frob 10 years, 2 months ago

I currently have

std::map<const guid, CommWrapper >* comm_table;

where guid

 struct guid
 {
     unsigned char d[16];

     // Less-comparer for std::map with GUID keys
     bool operator<(const guid& rhs) const
     {
         for (unsigned c = 0; 16 != c; ++c)
         {
             if (d[c] == rhs.d[c])
             {
                 continue;
             }
             return d[c] < rhs.d[c];
         }
         return false;
     }
 };

But it seems to be way slower than the approach I took before, where I just based the (gu)id on the index the pointer had in a std::vector.

For instance, truck would be in comm_table[5], and thus its id would be 5.

But I kinda like the idea of using some big, fancy guids. It's just really slow for autonomous inter-instance communication,
and I'm wondering if I'm doing it right. Perhaps there's no reason to go this way at all.

Or am I just using the wrong collection?


The comparison method isn't the fastest one. For lookup-heavy maps I would suggest using a hash map instead.


But I kinda like the idea of using some big, fancy guids.

Do you really need a globally unique id here? What's wrong with a 32-bit or 64-bit application-unique id? It would be a lot faster to compare and hash, which makes it an even better fit for a hash map. Looks a little bit like over-engineering to me :)
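To make that suggestion concrete, here is a minimal sketch of a table keyed by a 64-bit application-unique id. The names CommTable, ObjectId, add and find are made up for illustration, CommWrapper is only a stand-in for the original poster's type, and handing out ids from a counter is an assumption rather than anything stated in the thread.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the poster's CommWrapper type.
struct CommWrapper {
    int payload = 0;
};

// Application-unique id: a plain 64-bit integer handed out by a counter.
using ObjectId = std::uint64_t;

class CommTable {
public:
    // Registers a wrapper and returns the freshly allocated id.
    ObjectId add(CommWrapper* wrapper) {
        const ObjectId id = next_id_++;
        table_.emplace(id, wrapper);
        return id;
    }

    // Returns the wrapper for id, or nullptr if the id is unknown.
    CommWrapper* find(ObjectId id) const {
        const auto it = table_.find(id);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    ObjectId next_id_ = 1;  // 0 is kept free as an "invalid id" in this sketch
    std::unordered_map<ObjectId, CommWrapper*> table_;
};
```

The standard library already provides std::hash for integer types, so no custom hasher or comparison operator is needed with a key like this.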

Thanks for your input. Yeah, I have been over-engineering. It's probably because I've become so accustomed to these 128bit guids at work that I feel like I should be using those too. But you're absolutely right - less will definitely suffice. I'll moderate this, and overhaul this communicator thing with hashmaps and shorter uids.

I only want them to be unique application-wide, after all.

I've found 64bit unsigned ints are more than enough for any GUID I've had. I tend to use them as bit fields as well to help identify type and which machine they come from, so I could imagine I could get away with 32 bits in a simpler game. The nice thing is that the comparison is simply an integer comparison, which is really fast.
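As a rough illustration of the bit-field idea, the sketch below packs a type tag, a machine number and a serial number into one 64-bit id. The 8/8/48 split and all of the function names are assumptions picked for the example, not the layout the poster actually uses.

```cpp
#include <cstdint>

// Assumed layout (not from the post): [ type:8 | machine:8 | serial:48 ].
constexpr std::uint64_t kSerialBits = 48;
constexpr std::uint64_t kMachineBits = 8;
constexpr std::uint64_t kSerialMask = (std::uint64_t{1} << kSerialBits) - 1;
constexpr std::uint64_t kMachineMask = (std::uint64_t{1} << kMachineBits) - 1;

// Packs the three fields into one id; serial bits above 48 are dropped.
constexpr std::uint64_t make_id(std::uint8_t type, std::uint8_t machine,
                                std::uint64_t serial) {
    return (std::uint64_t{type} << (kSerialBits + kMachineBits)) |
           (std::uint64_t{machine} << kSerialBits) |
           (serial & kSerialMask);
}

// Extracts the type tag from the top 8 bits.
constexpr std::uint8_t id_type(std::uint64_t id) {
    return static_cast<std::uint8_t>(id >> (kSerialBits + kMachineBits));
}

// Extracts the machine number from the next 8 bits.
constexpr std::uint8_t id_machine(std::uint64_t id) {
    return static_cast<std::uint8_t>((id >> kSerialBits) & kMachineMask);
}

// Extracts the serial number from the low 48 bits.
constexpr std::uint64_t id_serial(std::uint64_t id) {
    return id & kSerialMask;
}
```

Because the result is still a plain integer, it compares with a single instruction and works directly as a key in std::map or std::unordered_map.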

As for quick look-up, you need to think about how many objects you have in total. If you have fewer than 100 objects, a simple array of objects may actually be faster than using std::map (which, if I recall, is typically implemented as a red-black tree). However, if you have many objects (10000-1000000) you may need to test which is better, a std::map or some kind of hash table.

Cheers,

Bob


As Scourge mentioned, the number of items stored is important.


std::map is a tree sorted by the key value. It takes O(log N) time to find the value. It makes comparisons as it walks down the tree in a binary search.
std::unordered_map (C++11) is a hash table; its elements are not kept in key order. It computes a hash of the key, then looks up the matching bucket directly in the table.

And finally, storing them sorted in a std::vector or a non-dynamic array allows you to do a binary search if it is sorted, or a linear search if it is not.
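The vector option could look roughly like the sketch below, assuming 64-bit integer keys. Entry, find_sorted and find_linear are names invented for the example, and BaseClass is only a placeholder for the type in the thread title.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the base class from the thread title.
struct BaseClass {
    virtual ~BaseClass() = default;
};

using Entry = std::pair<std::uint64_t, BaseClass*>;

// Binary search: `entries` must already be sorted by key (first member).
inline BaseClass* find_sorted(const std::vector<Entry>& entries,
                              std::uint64_t key) {
    const auto it = std::lower_bound(
        entries.begin(), entries.end(), key,
        [](const Entry& e, std::uint64_t k) { return e.first < k; });
    return (it != entries.end() && it->first == key) ? it->second : nullptr;
}

// Linear search: works on unsorted data and walks memory front to back.
inline BaseClass* find_linear(const std::vector<Entry>& entries,
                              std::uint64_t key) {
    const auto it = std::find_if(
        entries.begin(), entries.end(),
        [key](const Entry& e) { return e.first == key; });
    return it != entries.end() ? it->second : nullptr;
}
```

Which of the two wins for a given N is something to measure rather than assume.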


So to figure out which is fastest, you need to figure out how lookups are going to happen. How many items (N) are you storing? A linear search takes N/2 comparisons on average, but benefits from cache effects as you run through the data. Doing 16 or 32 compares may cost about the same as doing 1, thanks to the cache helping out a linear access pattern. A binary search takes about log N comparisons, but they don't get a cache bonus because the accesses jump around rather than running linearly. Finally, a hash table has a near-constant time cost, but it must go through the effort of computing a hash, then looking in the bucket for that hash; if there are multiple items in the bucket it must search through those few items.

Which of those is fastest depends on the cost to compare the keys (a single machine word is much faster than a string based comparison) and on the cache friendliness of the data (a machine word is smaller than a string, and a string can possibly require more memory lookups) and finally on the cost to compute a hash.
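If the 16-byte guid from the original post were kept as the key, a hash table would need both an equality test and a hash function for it, and that is where the extra per-lookup cost comes from. The sketch below is one possible way to write them; guid_hash, GuidTable and the mixing constant are assumptions for illustration, not a tuned hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

struct guid {
    unsigned char d[16];

    // Hash tables need equality rather than operator<.
    bool operator==(const guid& rhs) const {
        return std::memcmp(d, rhs.d, sizeof d) == 0;
    }
};

// Assumed hash: load the 16 bytes as two 64-bit words and mix them.
struct guid_hash {
    std::size_t operator()(const guid& g) const {
        std::uint64_t lo = 0;
        std::uint64_t hi = 0;
        std::memcpy(&lo, g.d, 8);
        std::memcpy(&hi, g.d + 8, 8);
        return static_cast<std::size_t>(lo ^ (hi * 0x9E3779B97F4A7C15ULL));
    }
};

// Hypothetical stand-in for the poster's CommWrapper type.
struct CommWrapper {};

using GuidTable = std::unordered_map<guid, CommWrapper*, guid_hash>;
```

With a single machine word as the key, both of these steps collapse to almost nothing, which is the point being made above.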

I will add my vote to using a machine word size for the key as well. It is the preferred efficient size for processing on the machine.



Incidentally, many games use a layered approach to resource lookup so that it happens as infrequently as possible. They have a long name, such as a file name or a resource key or some other value that is used outside the system. When a request for the named object is made, the name gets looked up (hash table) to see if it is in the resource list already. If not, the item is added to the resource list and the key and list index are added to the hash table; the program now has the resource index and can use that for the rest of the running of the application. The resource list is not actually the data, but a proxy to the data that can be loaded and unloaded as needed based on game-specific factors. The net result is that the game gets either a fixed index (or even a fixed pointer) in the resource list, so hopefully lookup needs to happen only once, ever.
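A bare-bones sketch of that layered scheme might look like the following. ResourceList, ResourceProxy and acquire are hypothetical names, and the actual loading and unloading is left out because it is game-specific.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Proxy entry: describes a resource that may or may not be loaded right now.
struct ResourceProxy {
    std::string name;
    bool loaded;
};

class ResourceList {
public:
    // Returns a stable index for `name`, adding a proxy on the first request.
    std::size_t acquire(const std::string& name) {
        const auto it = index_by_name_.find(name);
        if (it != index_by_name_.end()) {
            return it->second;  // already known: reuse the existing index
        }
        const std::size_t index = proxies_.size();
        proxies_.push_back(ResourceProxy{name, false});
        index_by_name_.emplace(name, index);
        return index;
    }

    // Direct access by index: no hashing or string comparison involved.
    ResourceProxy& at(std::size_t index) { return proxies_[index]; }

    std::size_t size() const { return proxies_.size(); }

private:
    std::vector<ResourceProxy> proxies_;                          // the resource list
    std::unordered_map<std::string, std::size_t> index_by_name_;  // name -> index
};
```

A caller would do the string lookup once, for example when a level loads, keep the returned index, and use at(index) from then on. Indices stay valid as long as entries are never removed from the vector, which this sketch assumes.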
