Jump to content

  • Log In with Google      Sign In   
  • Create Account

Awesome job so far everyone! Please give us your feedback on how our article efforts are going. We still need more finished articles for our May contest theme: Remake the Classics

String IDs for objects, events etc.


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
18 replies to this topic

#1 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 21 March 2011 - 05:38 AM

Hello,

it's about unique IDs in game engines again...

I've been using enums for some IDs (GUI controls, events) and std::string IDs for game objects/entities and resources (textures etc.). Now I would like to get rid of enums (because it's hard to add new events etc with them) and for performance reasons also get rid of strings; and everyone seems to be recommending hashed strings for identifiers.
And here comes my problem - I'm at the end of my abilities, I've spent days (well, weeks in fact) searching for solutions, reading books, threads here etc., but I still don't understand how does it work. Everyone always just says "hashed strings" or "hash table", but nobody mentions how is it implemented in fact, not even in the (otherwse great) book Game Engine Architecture by Jason Gregory.

It will be maybe stupid questions for most of you, but I'm probably missing some basic point of the whole hashing stuff :)
Please note that I don't need a whole code, I just need you to point me to the right direction, in few specific implementation details.

1. Storing the elements.

Now I have a simple
std::map<std::string, BlaBlaClass*>
.
What would I use with hashed string ids instead of this?

2. Accessing the elements.

Besides iterating (of course), I can access any element by directly using its string id, something like (pseudocode)
resourceManager.GetTexture("bricks.tga")->DoSomething();
The key "bricks.tga" is found simply by map::Find() or map::Operator[]
How would that be done with hashed ids?

3. Identifying elements.

For example with event types as an enum, I can decide what to do in an event handling callback method by simply making a switch statement:
switch (e.eventType) {
  case EVENT_SOMETHING:
	break;
  case EVENT_SOMETHINGELSE:
	break;
}
(Would be similar with strings, just with if-else instead of switch.)
How could that be done with hashed strings?


That you very much very any input.

(Btw, as far as theory goes, I know what a hash function is and I know how to calculate a hash value of lets say a string.)

Ad:

#2 haegarr   Members   -  Reputation: 1771

Like
0Likes
Like

Posted 21 March 2011 - 06:13 AM

Some thoughts:


A hash value is an integer number, so theoretically you can replace each occurrence of std:string with a, say, 32-bit uint. However, the problem with hash values is that they are "condensed" strings, i.e.they have less information than the original string. A consequence is that the backward mapping hash -> string is not unique for a given hash value. Hence it may not be sufficient to just use the hash value.

Computing one and the same hash value from 2 or more originals in the same context is called a collision. You have at least 2 ways to deal with collisions.

The one way is to group all objects with the same hash value. Hence passing the hash value only addresses the group but not an individual object. Because you need an individual, you need to pass both the hash value and the original string, using the hash value just to reduce the amount of full string comparisons to the candidates in the addresses group.

The other way is to react on collisions by applying a strategy to get another hash, e.g. the next free hash value. Such a strategy requires that enough different hash values are available to represent all possible original strings, and often is also required that the order of computing hash values (when applied more than once) is the same. The advantage is that each computed hash value is unique and hence addresses exactly 1 object.

In your case it is probably so that the amount of original strings is much smaller than, say, the 4 billions possibilities of 32-bit integers (probably also 16-bits would be sufficient). So you may go the way to register all strings in a initialization phase, and then query them once for each occurrence and cache the result. So the strings are of interest mainly for you as the programmer, while later (after the initialization) only the hash value play a role.

Well ... in the above scenario the hash value has lost some of its sense. You could yield in something similar without computing the hash values by registering the strings and returning a pointer to the registered string for each query in the initialization phase (hence using a kind of string pool). Then a pointer comparison would do the job. However, a pointer is 4 or 8 bytes long compared to perhaps 2 bytes of a hash value, if this is an issue for you.

#3 buggypixels   Members   -  Reputation: 103

Like
0Likes
Like

Posted 21 March 2011 - 07:00 AM

You might take a look here: http://gamesfromwithin.com/managing-data-relationships.
Since your example is about resource management this might fit perfectly.
Overall using strings and std::map is not the best solution here.
If you still need to support strings (better to say to support name lookups) you can easily
extend the system described above to include a name/handler mapping.

#4 Hodgman   Moderators   -  Reputation: 13458

Like
0Likes
Like

Posted 21 March 2011 - 07:26 AM

Here's the code I use for fast string IDs. Instead of a hashing algorithm, I copy the strings into a set, and the use pointers to the set's elements as IDs.

This has the advantage of having no collisions and being able to convert from a "hash" back into the original string data if needed. However, the downside is that they can't be saved to disk like hashes can, as the pointer values could be different each time you run the program. You can use them in pretty much the same way that you'd use either a normal string, but comparisons between them are extremely fast.
class CStringTable;
class CFixedString
{
public:
	CFixedString();
	CFixedString( const CFixedString& );
	CFixedString& operator=( const CFixedString& );
	bool operator!=( const CFixedString& ) const;
	bool operator==( const CFixedString& ) const;
	bool operator< ( const CFixedString& ) const;//ordered randomly, may differ from run-to-run, but consistent within each run.
	operator bool() const { return m_pString && !Str().empty(); }
	const std::string& Str() const { return *m_pString; }
private:
	friend class CStringTable;
	CFixedString( const std::string& s );//can only be created by a string table
	const std::string* m_pString;
};

class CStringTable
{
public:
	CFixedString FixedString( const std::string& );//use this function to turn a std::string into a "hash"
private:
	typedef std::set<std::string>                StringSet;
	typedef std::pair<StringSet::iterator, bool> StringSetInsertResult;
	StringSet m_Table;
};

inline CFixedString::CFixedString( const std::string& s ) : m_pString(&s) {}
inline CFixedString::CFixedString( const CFixedString& s ) : m_pString(s.m_pString) {}
inline CFixedString::CFixedString() : m_pString() {}
inline CFixedString& CFixedString::operator=( const CFixedString& s ){ m_pString = s.m_pString; return *this; }
inline bool CFixedString::operator!=( const CFixedString& o ) const { return m_pString != o.m_pString; }
inline bool CFixedString::operator==( const CFixedString& o ) const { return m_pString == o.m_pString; }
inline bool CFixedString::operator< ( const CFixedString& o ) const { return m_pString <  o.m_pString; }//ordered randomly, may differ from run-to-run, but consistent within each run.
inline CFixedString CStringTable::FixedString( const std::string& str )
{
	StringSetInsertResult r = m_Table.insert( str );
	return CFixedString( *r.first );
}


#5 NEXUSKill   Members   -  Reputation: 291

Like
0Likes
Like

Posted 21 March 2011 - 09:11 AM

I would recommend you to take a look at the Game Programming Gems series of books, I've read them a couple of years ago and I remember there was a very nifty gem on the topic of unique object IDs, although I cant recall it right now.
I would seriously advice against string names, they may be user friendly for end-user editors and such, but I've work with engines that used them and as soon as you start pushing the performance of the target platforms that feature will come to bite you in places you wont like.
Game making is godlike

LinkedIn profile: http://ar.linkedin.com/pub/andres-ricardo-chamarra/2a/28a/272



#6 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 21 March 2011 - 09:18 AM

A hash value is an integer number, so theoretically you can replace each occurrence of std:string with a, say, 32-bit uint.
....
So you may go the way to register all strings in a initialization phase, and then query them once for each occurrence and cache the result.
...
Well ... in the above scenario the hash value has lost some of its sense. You could yield in something similar without computing the hash values by registering the strings and returning a pointer to the registered string for each query in the initialization phase (hence using a kind of string pool). Then a pointer comparison would do the job. However, a pointer is 4 or 8 bytes long compared to perhaps 2 bytes of a hash value, if this is an issue for you.

So, I would implement some way how to "turn" a string into an UINT, either as a hash function or some table, I understand that. Let's call that function
UINT MakeID(const string& str);
My game objects or resources can be still held in a map, but now it would be map<UINT, BlaBlaClass*> which should be faster than when the key was string.

Now, when I need for example in the game logic code to do something with a particular object, i can either get the UINT ID "on the fly", if I need it just occasionally (pseudocode):
EntityManager.GetEntity(MakeID("TheBigRedDoor"))->DoSomething();
Or, if I need that object every frame, I would either cache directly the pointer to that object, or at least cache the UINT ID for example this way:
static UINT bigRedDoorId = MakeID("TheBigRedDoor");
// then it's just:
EntityManager.GetEntity(bigRedDoorId)->DoSomething();
Similarly it could be done in the event handling method, I would be comparing the type of the actual event (UINT ID) with static precomputed UINT IDs.

Did I get it all right? :D

#7 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 21 March 2011 - 09:23 AM

You might take a look here: http://gamesfromwith...a-relationships.
Since your example is about resource management this might fit perfectly.
Overall using strings and std::map is not the best solution here.
If you still need to support strings (better to say to support name lookups) you can easily
extend the system described above to include a name/handler mapping.


One of the examples is about resource management, but I'm looking for and ID system for multiple subsystems, including event types, resources and game entities.
But the article is indeed interesting, I've already heard about handles and I'll certainly give the article more attention, looks usefull.


Here's the code I use for fast string IDs. Instead of a hashing algorithm, I copy the strings into a set, and the use pointers to the set's elements as IDs.

Thanks for that example, that looks like one viable way how to implement that "UINT MakeId(const string& str)" function I mentioned in my first reaction in this thread.

#8 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 21 March 2011 - 09:27 AM

I would recommend you to take a look at the Game Programming Gems series of books, I've read them a couple of years ago and I remember there was a very nifty gem on the topic of unique object IDs, although I cant recall it right now.
I would seriously advice against string names, they may be user friendly for end-user editors and such, but I've work with engines that used them and as soon as you start pushing the performance of the target platforms that feature will come to bite you in places you wont like.


Thx for your reaction, too.

Yea, that's why I'm trying to get rid of strings. It was OK in my previous project where the number of elements in the map was really low, something around 20 or so max.

I know about Game Programming Gems, but there are already at least 7 of them and I cannot get them all to find where is the part about IDs :(

#9 haegarr   Members   -  Reputation: 1771

Like
0Likes
Like

Posted 21 March 2011 - 09:43 AM

...
Now, when I need for example in the game logic code to do something with a particular object, i can either get the UINT ID "on the fly", if I need it just occasionally (pseudocode):

EntityManager.GetEntity(MakeID("TheBigRedDoor"))->DoSomething();
Or, if I need that object every frame, I would either cache directly the pointer to that object, or at least cache the UINT ID for example this way:
static UINT bigRedDoorId = MakeID("TheBigRedDoor");
// then it's just:
EntityManager.GetEntity(bigRedDoorId)->DoSomething();
Similarly it could be done in the event handling method, I would be comparing the type of the actual event (UINT ID) with static precomputed UINT IDs.

Did I get it all right? :D

In principle, yes, that's one of the ways. However, as mentioned in the previous post, you have of course to make sure that MakeID(...) returns a unique ID for every and each string. You must take extra work to guarantee this uniqueness in the case of hash values. The other way (that is also mentioned by Hodgman if I understood it correctly) is to use a string pool (a.k.a. "string interning").

#10 NEXUSKill   Members   -  Reputation: 291

Like
0Likes
Like

Posted 21 March 2011 - 09:45 AM

Aren't the chapter index publicly available? I'll see if I can find where I left those and point you to the specific book / gem
Game making is godlike

LinkedIn profile: http://ar.linkedin.com/pub/andres-ricardo-chamarra/2a/28a/272



#11 haegarr   Members   -  Reputation: 1771

Like
0Likes
Like

Posted 21 March 2011 - 09:50 AM

...
I know about Game Programming Gems, but there are already at least 7 of them and I cannot get them all to find where is the part about IDs :(

At this site there is an overview about all the Gems and AI Wisdom books. The overviews are a bit scarce for the early Gem books, but perhaps it may still help you out.

#12 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 22 March 2011 - 04:53 AM

Seems there could be something in Game Programming Gems 5, but I cannot buy it for its price (plus shipping) just for one article, which even isn't guaranteed to be helpful to me.

I'll try to implement the ideas from this thread, or maybe somebody will come with something else?

What about using unordered_map (hash_map)?

#13 Gorbstein   Members   -  Reputation: 116

Like
1Likes
Like

Posted 22 March 2011 - 06:39 AM

I found hash tables to be the way forward for this particular thing.
I use a custom hashtable and StringHandle class, that avoids re-calculating any hashes at runtime, except for say when you need to load in new resources.

Here's roughly how it works

HashTable table(1000);
GameObject* myObject = new GameObject(blah blah blah);

StringHandle handle = table.add("some kind of object", myObject);


StringHandle then contains the original string plus the original full hash value. You could store that inside the GameObject so you can quickly grab it back at any point.

then later somewhere else I can do...

table.get(handle);

The get function uses the original hash value as an index straight into an array so it's nice and quick.

I also found it useful to store the hash function I use inside the StringHandle class so that I can do:

handle = "name of some object";

Which automatically calculates the hash value for that string and stores it, so that it can be generated before it is added to any tables and held for the whole lifetime of the object. The hash value should be compatible with any table in the game of any size since I use the same hash function everywhere.

I also added the possibility to do

handle = anotherHandle

using operator overloading, which again avoids calculating a hash since it just copies the existing one over.

D

#14 typedef struct   Members   -  Reputation: 230

Like
0Likes
Like

Posted 22 March 2011 - 08:25 AM

I'm curious if anyone has run into performance problems using GUIDs (128-bit random numbers). I use them all the time without issue.
Anthony Umfer

#15 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 22 March 2011 - 10:13 AM

I found hash tables to be the way forward for this particular thing.
I use a custom hashtable


How is your custom hashtable implemented? I mean just the basic principle of it. You say you use the hash directly as an array index? You cannot have such a huge almost empty array if using 4 bytes long hash values.

#16 Antheus   Members   -  Reputation: 2369

Like
1Likes
Like

Posted 22 March 2011 - 10:27 AM

I mean just the basic principle of it. You say you use the hash directly as an array index? You cannot have such a huge almost empty array if using 4 bytes long hash values.


Of course it's possible - it's what hash tables are.

int hash = ...

string hash_table[200];  // 200 buckets

string item = hash_table[ hash % 200 ];

Obviously, more than one hash can now map to same bucket. See various strategies on dealing with this.

#17 kdmiller3   Members   -  Reputation: 176

Like
1Likes
Like

Posted 22 March 2011 - 11:13 AM

If you don't want to make your own custom hash-table implementation, you could always use the STL hash_map container...

As for hash algorithms, I happen to like FNV-1a since it's small, simple, fast, performs no table lookups, and has good dispersion for short strings and small data blocks like a single integer:
inline unsigned int Hash(const unsigned char byte, unsigned int hash = 2166136261u)
{
    	hash ^= byte;
    	hash *= 16777619u;
    	return hash;
}
inline unsigned int Hash(const void *data, size_t len, unsigned int hash = 2166136261u)
{
    	const unsigned char * d = static_cast<const unsigned char *>(data);
    	const unsigned char * const e = d + len;
    	while (d < e)
            	hash = Hash(*d++, hash);
    	return hash;
}
inline unsigned int Hash(const char *string, unsigned int hash = 2166136261u)
{
    	if (string == 0)
            	return 0;
    	for (const char *s = string; *s != 0; ++s)
            	hash = Hash(unsigned char(tolower(*s)), hash);
    	return hash;
}
Here's a little command-line utility that returns the hash of its command line argument along with a comment containing the original string:
#include <stdio.h>
#include "Hash.h"

int main(int argc, const char* argv[])
{
    	for (int i = 1; i < argc; i++)
    	{
            	printf("%s0x%08x /* \"%s\" */", i > 1 ? "\n" : "", Hash(argv[i]), argv[i]);
    	}
    	return 0;
}
(This assumes the hashing functions I listed above are in the header Hash.h)

I use it as an external tool in Visual C++ with Arguments set to $(CurText) and the Use Output Window option checked, bound to Shift-Ctrl-Alt-H. I select whatever string I want to hash in the editor, press Shift-Ctrl-Alt-H, copy the hashed string from the output window, and paste it into my code. Very convenient! :)

#18 Gorbstein   Members   -  Reputation: 116

Like
1Likes
Like

Posted 22 March 2011 - 11:20 AM

How is your custom hashtable implemented? I mean just the basic principle of it. You say you use the hash directly as an array index? You cannot have such a huge almost empty array if using 4 bytes long hash values.


1. you allocate an array of whatever size you think would suit the situation. It can be 10 (ie: guns in an entity) or 1000 (ie: list of everything in the game) entries depending on where you use it.

2. when passed a string, you can generate its hash, which gives you a 4 byte int. You then use hash%sizeoftable to get the array index of this table. Of course if you have already generated the hash at program load, you can just skip the potentially expensive hashing step and use hash%sizeoftable.

3. Sometimes hash%sizeoftable will map two different hashes to the same index (a collision). I store both objects at the same location by linking them with a next pointer like a linked list. When a collision happens it slows down lookups and storage slightly but the trick is in choosing a hash function that distributes everything evenly to limit this situation. Using a bigger table than necessary also helps.

Your choice of hash function determines how well it works. Usually it's a balance of speed of generating vs avoiding collisions. I use MurmurHash2 (link) but you can find plenty of algorithms and comparisons kicking about.

I recommend dumping table contents often at runtime to see how well the distribution is working, then you can tweak table size and hash function until you get the right balance.

D

#19 Tom KQT   Members   -  Reputation: 636

Like
0Likes
Like

Posted 22 March 2011 - 01:10 PM

Thank you very much for your input, people!

It seems that I was at a dead end because I thought it was all about using std::unordered_map (which is like a hash_map), but "problem" with it is that when you want to find some entry, it always computes the hash from the given key and then performs the bucket lookup. That made me really confused, but when I think now about what Antheus and Gorbstein said, when I make my own table, I can search it even with already precomputed hash value. Then it's name "hash table" is a bit misleading, btw, because there's no hasing anymore (at that level) :)

I'll be off for one day and then I'll return to this and try to implement something finally.




Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS