Unique ID per type

Started by
8 comments, last by Ryan_001 8 years, 3 months ago
Hi,
I have a scene graph of SceneNodes. Every SceneNode has a vector of Component*.
Component is the base class of all the component types in my scene, like CameraComponent, LightComponent, CharacterController, etc. (Like Unity.)
I am working on serializing the data so I can store it in something like an XML or JSON file.
The only issue I have right now is creating a unique ID for every type of component that is cross-platform and is the same on every run.
I know there is the hacky way of having a big enum, in which you create an ID for every class you create. However, I don't like this at all. Is there a better way to do this, one that generates IDs automatically?
Thanks in advance!
EDIT:
I think I have found a way :D
I simply generate a hash based on the class name. I can also replace the RTTI with a static function for every type if I really want to. (however I don't really see a point ATM.)
It also allows me to change to a different hashing algorithm, should this one fail me.

struct TypeHash
{
  //check if types are equal
  friend inline bool operator==(const TypeHash& l, const TypeHash& r)
  { return l.id == r.id; }
  //check if types are different
  friend inline bool operator!=(const TypeHash& l, const TypeHash& r)
  { return !operator==(l, r); }


  //for sorting
  friend inline bool operator<(const TypeHash& l, const TypeHash& r)
  { return l.id < r.id; }


  template<typename T> friend TypeHash HashType();
private:
  TypeHash(uint64_t id) : id(id) {}
  uint64_t id;
};


//create a hash based on a type name
inline uint64_t HashTypeName(const char* str)
{
  //djb2 algorithm from here:
  //http://www.cse.yorku.ca/~oz/hash.html
  uint64_t hash = 5381;
  int c;
  while((c = *str++) != 0)
    hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
  return hash;
}


template<typename T> inline TypeHash HashType()
{
  //a way of generating a hash that is cross platform
  static TypeHash hash = HashTypeName(typeid(T).raw_name());
  return hash;
}

That should work (except there is no such thing as raw_name in std::type_info, it's just name).

A few notes:

  • As it stands, your approach will generally "work fine" but the program is ill-formed because you don't include <typeinfo> prior to evaluating a typeid expression (your compiler should warn about that?).
  • You are caching the hash in a static variable. Keep it that way. The temptation exists to instead make the hash function constexpr. Don't do that because it will not work on the stricter compilers as the name is not a constant expression (it sure is, how could it possibly not be... but the standard and the compiler think differently) and will be vastly inferior on the more forgiving compilers (which will compile fine, but evaluate at runtime).
  • std::type_info has a member function hash_code which just looks like what you want, only better, faster, and standard. Yeah right. Don't fall for that. The hash provided by the standard not only isn't portable (that would be quite hard to do, admittedly), but it does not even guarantee that the value stays the same between different runs of the same executable. Which, frankly, is total shit.
  • The standard also provides std::type_index, which suggests by its name and its description that it could be very useful (index? as in unique number? great!), but it is really just a copyable wrapper around std::type_info which adds the comparison operators (ordering in terms of std::type_info::before()) and a std::hash specialization. Other than for using it as a key in standard containers, it's pretty useless.
  • Instead of using std::type_info::name(), you could use the well-known hack with a helper class that contains a static function which evaluates some flavour of __func__ or __PRETTY_FUNCTION__ or whatever you want to use, optionally adding an offset into the string constant to strip off the ugly mangling. These string constants are not constexpr either (although I think they should be, unless the name of a type can change during a program's execution which would be a big WTF, they are pretty darn constant expressions), but it is less bloat than using std::type_info (especially with GCC which has a very poor implementation), and you save yourself from including another header. I seem to remember someone even posted a complete, usable implementation of the __func__ hack on this site not long ago.
  • From the most pedantic point of view, using the __func__ hack even makes your program a little more robust. The typeid operator does not guarantee that the same type_info object is returned for different invocations in the same program with the same type. This sounds like something you could take for granted, and this is probably what happens anyway, but in the strictest sense, that's not the case. The standard merely says that some type_info (or derived) object with static storage duration is returned (leaving unspecified whether destructors are called), and that the objects from different typeid expressions with the same type compare equal. That doesn't mean that they are the same object, or that name() returns the same value (or even the same pointer).
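
A minimal sketch of the original idea with the first two notes applied: it includes <typeinfo> and calls the standard name() rather than raw_name(). TypeNameHash is a hypothetical name, not something from the posts above. The value still depends on the implementation-defined type name, so treat it as stable only for one compiler and build, not across platforms.

```cpp
#include <cassert>   // assert, used in the usage note below
#include <cstdint>
#include <typeinfo>  // required before any typeid expression

// djb2 over a null-terminated string (hash * 33 + c), as in the original post.
inline std::uint64_t HashTypeName(const char* str)
{
    std::uint64_t hash = 5381;
    while (*str != 0)
        hash = hash * 33 + static_cast<unsigned char>(*str++);
    return hash;
}

// Hypothetical helper: hashes the implementation-defined type name once
// and caches the result in a function-local static.
template <typename T>
inline std::uint64_t TypeNameHash()
{
    static const std::uint64_t hash = HashTypeName(typeid(T).name());
    return hash;
}
```

On common compilers, assert(TypeNameHash<int>() != TypeNameHash<float>()) should hold, since the two names differ, but nothing in the standard promises what the names look like.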


I wouldn't use type info for this; you can do this through the preprocessor with something like:


#include <cstdint>

//create a hash based on a type name
inline uint64_t HashTypeName(const char* str)
{
  //djb2 algorithm from here:
  //http://www.cse.yorku.ca/~oz/hash.html
  uint64_t hash = 5381;
  int c;
  while((c = *str++) != 0)
  {
    hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
  }
  return hash;
}

//a // comment inside the macro would swallow the continued lines, so use /* */
//(this also assumes the TypeHash constructor is accessible from the class)
#define HASHTYPE(type)\
    inline TypeHash HashType()\
    {\
      /* a way of generating a hash that is cross platform */\
      static TypeHash hash = HashTypeName(#type);\
      return hash;\
    }

class MyType
{
public:
    HASHTYPE(MyType);

};

This needs less template magic than your solution and is generally easier to use, because typeid names contain namespaces and such, which can make it hard to figure out what a hash belongs to when you are debugging the code. In my case you can just run the HashTypeName("MyType") function in the watch window to find the hash of your type when on a breakpoint.

A lot of game engines have things like runtime type info turned off in their compile settings, which makes typeid not work for dynamic types.

If you use a constexpr hash function, the hash of the stringized name can be computed at compile time, so neither the preprocessor version nor a typeid lookup has to do any work at runtime.
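
The compile-time variant mentioned above could look roughly like this. It is a sketch under assumptions: HashTypeNameCx and TYPE_NAME_HASH are made-up names, and the function is written recursively so that it stays a valid constexpr function in C++11. It hashes the stringized name from the macro argument, not a typeid name.

```cpp
#include <cassert>   // assert, for runtime checks alongside the static_asserts
#include <cstdint>

// Recursive djb2 (hash * 33 + c) so the function is constexpr even in C++11.
constexpr std::uint64_t HashTypeNameCx(const char* str, std::uint64_t hash = 5381)
{
    return *str == 0
        ? hash
        : HashTypeNameCx(str + 1, hash * 33 + static_cast<unsigned char>(*str));
}

// Hypothetical macro: stringizes the type name and hashes it at compile time.
#define TYPE_NAME_HASH(type) (HashTypeNameCx(#type))

// Evaluated by the compiler: 5381 * 33 + 'a' (97) == 177670.
static_assert(HashTypeNameCx("a") == 177670, "djb2 of \"a\"");
static_assert(TYPE_NAME_HASH(MyType) == HashTypeNameCx("MyType"), "stringized name");
```

Because the result is a constant expression, it can be used as a case label or template argument, which the cached static version cannot.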


You could use something like this if you don't need a compile-time ID:


template<typename T>
struct quick_type_id{
    static void functionAsID() {}
};

and use the address of the function as the ID:


&quick_type_id<int>::functionAsID
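
A small sketch of how such an address-based ID might be wrapped. TypeId and GetTypeId are hypothetical names. Two caveats: the value is only meaningful within one run of one build, and some linkers can fold identical functions (for example MSVC's /OPT:ICF), which could make two IDs compare equal.

```cpp
#include <cassert>   // assert, used in the usage note below

template <typename T>
struct quick_type_id
{
    static void functionAsID() {}
};

// Hypothetical alias: the ID is simply a pointer to the per-type function.
using TypeId = void (*)();

// Each instantiation of quick_type_id<T> has its own functionAsID,
// so its address distinguishes the types at runtime.
template <typename T>
inline TypeId GetTypeId()
{
    return &quick_type_id<T>::functionAsID;
}
```

For example, assert(GetTypeId<int>() != GetTypeId<float>()) is expected to hold when identical-function folding is not in play.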

@samoth What you are describing is kind of what I am finding on the internet. I completely agree with your points.

@NightCreature83 I had to turn RTTI on yeah, and would prefer to have it off. I might actually go with a solution like that.

@imoogiBG I can't use that since it's not guaranteed that I get the same ID for a class on every run.

If you're talking about file I/O and format compatibility, you absolutely want manual IDs. Automatic IDs might rearrange or reallocate when you add a new component type, or after upgrading to a newer version of C++, or so on, and break compatibility with all your files.

In general, for file I/O and serialization purposes, you want as much versioning and explicitness as you're comfortable with. A good choice then is to manually manage a map from string names (or UUIDs or the like) to class factories. That lets you register components under a stable explicitly-chosen name. It lets you register the same component under multiple names so you can maintain backwards compatibility even if you decide that you simply must rename some component.

Avoid RTTI and typeid. The names are implementation-defined and may change after compiler upgrades. Avoid using the class name directly without having some way to easily override it, since you may rename classes down the line during code maintenance. Avoid static custom type-id tricks, as they are only stable for a specific build of the game.

Many of the common tricks you find online - or that have been brought up in this thread - for handling component type ids are focused on runtime ids and are not meant for on-disk stability and change resilience.
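
A minimal sketch of the name-to-factory map described above, assuming a Component base class like the one in the original question. ComponentRegistry, Register, Create and MakeLight are hypothetical names, and a real engine would likely add versioning on top.

```cpp
#include <cassert>   // assert, used in the usage note below
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct Component
{
    virtual ~Component() = default;
};

struct LightComponent : Component {};

class ComponentRegistry
{
public:
    using Factory = std::function<std::unique_ptr<Component>()>;

    // Returns false if the name is already taken, so clashes surface early.
    bool Register(const std::string& name, Factory factory)
    {
        return factories_.emplace(name, std::move(factory)).second;
    }

    // Returns nullptr for names that were never registered.
    std::unique_ptr<Component> Create(const std::string& name) const
    {
        auto it = factories_.find(name);
        if (it == factories_.end())
            return nullptr;
        return it->second();
    }

private:
    std::unordered_map<std::string, Factory> factories_;
};

inline std::unique_ptr<Component> MakeLight()
{
    return std::unique_ptr<Component>(new LightComponent());
}
```

Registering MakeLight under both "light" and an older name such as "LightComponent" lets files written before a rename keep loading, because the on-disk name is chosen explicitly and not derived from the class.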

Sean Middleditch – Game Systems Engineer – Join my team!

  • From the most pedantic point of view, using the __func__ hack even makes your program a little more robust. The typeid operator does not guarantee that the same type_info object is returned for different invocations in the same program with the same type. This sounds like something you could take for granted, and this is probably what happens anyway, but in the strictest sense, that's not the case. The standard merely says that some type_info (or derived) object with static storage duration is returned (leaving unspecified whether destructors are called), and that the objects from different typeid expressions with the same type compare equal. That doesn't mean that they are the same object, or that name() returns the same value (or even the same pointer).

Isn't this the entire point of type_index, so that type_index objects made from different typeid expressions for the same type do compare equal? From my understanding, the real issue with type_info objects differing only becomes relevant when we're talking about the same types used from different DLLs.

For type checking in my Lua bindings, I use hashes generated by parsing __FUNCTION__


template<class T>
struct TypeName
{
	static void Get(const char*& begin, const char*& end)
	{
		begin = __FUNCTION__;
		for(++begin; *begin && *(begin-1) != '<'; ++ begin);
		for(end = begin; *end; ++ end);
		for(; end > begin && *end != '>'; -- end);
	}
	static const char* Get(Scope& a)
	{
		const char* begin=0, *end=0;
		Get(begin, end);
		uint length = end-begin;
		char* buf = (char*)a.Alloc(length+1);
		memcpy(buf, begin, length);
		buf[length] = 0;
		return buf;
	}
	static void Get(char* buf, uint bufLen)
	{
		const char* begin=0, *end=0;
		Get(begin, end);
		uint length = end-begin;
		eiASSERT( length+1 <= bufLen );
		memcpy(buf, begin, length);
		buf[length] = 0;
	}
	static const char* Get()
	{
		static const char* value = 0;
		if( !value )
		{
			static char buffer[256];
			Get(buffer, 256);
			//todo - memory fence
			value = buffer;
		}
		return value;
	}
};

template<class T>
struct TypeHash
{
	static u32 Get_()
	{
		const char* begin;
		const char* end;
		TypeName<T>::Get(begin, end);
		return Fnv32a(begin, end);
	}
	static u32 Get()
	{
		static const u32 value = Get_();
		return value;
	}
};
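
Fnv32a is not shown in the post. Assuming it is the standard 32-bit FNV-1a hash over the byte range [begin, end), a stand-in could look like the sketch below. The offset basis and prime are the published FNV constants; the signature is only inferred from the call above.

```cpp
#include <cassert>   // assert, used in the usage note below
#include <cstdint>

// Assumed stand-in for the post's Fnv32a: 32-bit FNV-1a over [begin, end).
inline std::uint32_t Fnv32a(const char* begin, const char* end)
{
    std::uint32_t hash = 2166136261u;            // FNV offset basis
    for (const char* p = begin; p != end; ++p)
    {
        hash ^= static_cast<unsigned char>(*p);  // xor in the next byte first...
        hash *= 16777619u;                       // ...then multiply by the FNV prime
    }
    return hash;
}
```

An empty range returns the offset basis, so for any pointer s, assert(Fnv32a(s, s) == 0x811c9dc5u) holds.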

If you're talking about file I/O and format compatibility, you absolutely want manual IDs.

^^ This. Everything based on __func__, typeid, etc., is compiler-dependent. You upgrade your compiler, or port to another platform, and all your type IDs can change.

SeanMiddleditch is 100% correct. If you want to be able to load saved game state (e.g. save games), you need to define the IDs in such a way that they can never change. The only way to do this is to assign them yourself.

Generating a hash from the name of the type seems like a good idea, but what if you ever decide to change the name of your class? For example if you decide that "LightComponent" should be "LightingComponent"?

Another thing you will need to consider is hash collisions. They are rare, but they can happen. Check out the hash collision tests done in the following stackexchange post: http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

They found 7 collisions in the English dictionary using the DJB2 algorithm you're considering:

DJB2 collisions

  • hetairas collides with mentioner
  • heliotropes collides with neurospora
  • depravement collides with serafins
  • stylist collides with subgenera
  • joyful collides with synaphea
  • redescribed collides with urites
  • dram collides with vivency
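
A sketch of how a collision like the last one in the list could be caught at registration time instead of at load time. Djb2_32 and HashedNameTable are hypothetical names. Note the assumption: the pair dram/vivency collides when djb2 is computed in 32 bits; the 64-bit variant from the first post does not wrap for strings this short, so it gives different values for this particular pair.

```cpp
#include <cassert>   // assert, used in the usage note below
#include <cstdint>
#include <map>
#include <string>

// 32-bit djb2 (hash * 33 + c); unsigned arithmetic wraps modulo 2^32.
inline std::uint32_t Djb2_32(const std::string& s)
{
    std::uint32_t hash = 5381;
    for (unsigned char c : s)
        hash = hash * 33u + c;
    return hash;
}

// Hypothetical table: remembers which name owns each hash value.
class HashedNameTable
{
public:
    // Returns false when a different name already produced the same hash.
    bool Add(const std::string& name)
    {
        auto result = owners_.emplace(Djb2_32(name), name);
        return result.second || result.first->second == name;
    }

private:
    std::map<std::uint32_t, std::string> owners_;
};
```

With this in place, adding "dram" and then "vivency" fails on the second call, which is the point where a build or startup check could stop and ask for a different name.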

Isn't this the entire point of type_index, so that type_index objects made from different typeid expressions for the same type do compare equal? From my understanding, the real issue with type_info objects differing only becomes relevant when we're talking about the same types used from different DLLs.

That's right, the entire point of type_index is to make it work when it "shouldn't" work but that is what is intended. It means that two things that are the same aren't the same, but the container still sees them as equal, so it somehow "works".

DLLs are not something C++ cares about (and many argue that they're "broken" because of mangling anyway). Though of course they might be the exact reason why typeid is deliberately specified in such a needlessly obnoxious way. I wouldn't know. But even so, that would be "mostly harmless", since nobody would expect types in any haphazard DLL, possibly written by someone else, to be identical to the ones in your program. How would that be possible?

This topic is closed to new replies.
