Fast dynamic_cast

Started by
10 comments, last by SiCrane 11 years, 2 months ago

I'm working on a project that requires dynamic_cast. I understand it's quite slow and should generally be avoided; that said, I see no other way. To that end, I thought I could speed up dynamic_cast by caching the results. So I put together a small class that seems to work for the limited test scenarios I threw at it. Nonetheless, it's on the borderline of kosher, and I was wondering what other people think.


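#include <cstddef>
#include <limits>
#include <map>
#include <tuple>
#include <type_traits>
#include <typeindex>
#include <typeinfo>

class FastDynamicCast {

	private:
		// std::type_index gives a well-defined ordering; raw type_info addresses are not
		// guaranteed to be unique per type (e.g. across shared-library boundaries).
		typedef std::type_index Type;
		typedef std::tuple<Type,Type,Type> InterfaceId;		// actual object type, src interface, dest interface
		typedef std::map<InterfaceId,std::ptrdiff_t> Table;	// cached byte offset, or 'flag' if the cast fails
		Table table;

	public:
		template <typename TR, typename T> TR Cast (T* ptr) {
			static_assert(std::is_pointer<TR>::value,"TR must be a pointer type.");
			static_assert(std::is_polymorphic<T>::value,"T must be a polymorphic type.");
			static const std::ptrdiff_t flag = std::numeric_limits<std::ptrdiff_t>::max();

			if (!ptr) return nullptr;	// typeid(*ptr) on a null pointer throws std::bad_typeid

			// Caveat: if T occurs more than once as a non-virtual base of the actual type,
			// each T subobject needs its own offset, and this key cannot tell them apart.
			InterfaceId id(Type(typeid(*ptr)),Type(typeid(T)),Type(typeid(TR)));
			auto i = table.find(id);
			if (i == table.end()) {
				TR r = dynamic_cast<TR>(ptr);
				std::ptrdiff_t diff = flag;
				if (r) {
					// Byte distance from the source subobject to the destination subobject.
					auto t0 = reinterpret_cast<const volatile char*>(ptr);
					auto t1 = reinterpret_cast<const volatile char*>(r);
					diff = t1 - t0;
					}
				table.insert(Table::value_type(id,diff));
				return r;
				}
			else if (i->second == flag) return nullptr;
			else {
				// Reapply the cached offset. This works on common ABIs, but the standard does not guarantee it.
				auto t = reinterpret_cast<const volatile char*>(ptr) + i->second;
				return reinterpret_cast<TR>(const_cast<char*>(t));
				}
			}
	};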

Despite the fact that the underlying implementation isn't specified in the standard, for a given object type, source interface, and destination interface, the pointer transformation needs to be the same... right? I mean, you can't have objects magically changing their structure at runtime. Anyway, I just thought I'd throw it out here to see what people think and/or get a critique.
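The "same triple, same offset" assumption can be checked directly. The sketch below uses a hypothetical hierarchy (A, B, C, D, byteOffset, and repeatedBaseDemo are illustrative names, not from the original) in which A appears twice as a non-virtual base of D. Both A subobjects report the same dynamic type, the source type is A in both cases, and the target is D*, so both casts map to one cache key; yet the byte offsets dynamic_cast produces differ, because the two A subobjects live at different addresses inside the same D.

```cpp
#include <cstddef>
#include <typeinfo>

// Hypothetical hierarchy: A appears twice in D (once via B, once via C), non-virtually.
struct A { virtual ~A() {} };
struct B : A {};
struct C : A {};
struct D : B, C {};

// Byte distance from 'from' to 'to': the same quantity the cache stores.
inline std::ptrdiff_t byteOffset(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

struct RepeatedBaseDemo {
    bool sameDynamicType;       // typeid(*viaB) == typeid(*viaC)
    std::ptrdiff_t offsetViaB;  // A-inside-B -> D
    std::ptrdiff_t offsetViaC;  // A-inside-C -> D
};

inline RepeatedBaseDemo repeatedBaseDemo(D& obj) {
    A* viaB = static_cast<B*>(&obj);  // the A subobject inside B
    A* viaC = static_cast<C*>(&obj);  // the A subobject inside C
    return { typeid(*viaB) == typeid(*viaC),
             byteOffset(viaB, dynamic_cast<D*>(viaB)),
             byteOffset(viaC, dynamic_cast<D*>(viaC)) };
}
```

Both calls share the key (D, A, D*), but a cache keyed that way would return the wrong pointer for one of them.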


Have you profiled this? I highly doubt that this would outperform the compiler's implementation. In particular, what makes you think that typeid() will be significantly faster than dynamic_cast?

When people say that dynamic_cast is slow, they mean that it is slower than doing no work at all, which is essentially what a static_cast does.
By that comparison it's infinitely slower, but that doesn't mean the dynamic_cast will matter in your project.

If you do any reasonable amount of work with the pointer afterwards (it doesn't even have to be that much), the cost of the dynamic_cast quickly disappears.
It's not that slow; it's just much slower than doing no work :)

That said, dynamic casts are often avoidable, and the need to use them usually points to something fishy in your design.
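Rather than guessing, the claim can be measured with a small timing loop such as the sketch below. Base, Derived, timeCasts, viaDynamic, and viaStatic are hypothetical names; the static_cast variant is only correct because every object in the test really is a Derived, and the absolute numbers will depend heavily on compiler, optimization level, and hierarchy depth.

```cpp
#include <chrono>
#include <utility>
#include <vector>

struct Base { virtual ~Base() {} };
struct Derived : Base { int value = 1; };

// Runs 'cast' over every object, sums Derived::value so the optimizer cannot
// drop the loop, and returns {sum, elapsed nanoseconds}.
template <typename Cast>
std::pair<long, long long> timeCasts(const std::vector<Base*>& objs, Cast cast) {
    auto start = std::chrono::steady_clock::now();
    long sum = 0;
    for (Base* b : objs)
        if (Derived* d = cast(b)) sum += d->value;
    auto stop = std::chrono::steady_clock::now();
    return { sum, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count() };
}

inline Derived* viaDynamic(Base* b) { return dynamic_cast<Derived*>(b); }
// Only valid here because every object in the test is known to be a Derived.
inline Derived* viaStatic(Base* b) { return static_cast<Derived*>(b); }
```

Comparing `timeCasts(objs, viaDynamic).second` against `timeCasts(objs, viaStatic).second` on your own build gives a rough idea; profiling the real game loop is still the better evidence.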


Not just typeid(): the cost, in terms of cache performance, of using a map to hold the lookups is going to bite you pretty quickly.

dynamic_cast<> on "slow" compilers is really just a strcmp. If you're really hurting for performance, rolling your own RTTI is the only way to really win.


But 99% of the time, you're not hurting that bad :-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I've always wondered about something related. Suppose at some point you need to know which class an object actually is. As far as I understand, I could try dynamic_cast<> to every possible class, and that's O(n), where n is the number of possible classes. So a map or hash table that just looks up the type should, in theory, be faster for large enough n. Is this correct, or am I missing a way to use dynamic_cast<> efficiently for this?
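The map-based lookup described here could be sketched with std::type_index as the key of an unordered_map. Entity, Enemy, Pickup, and TypeDispatcher are hypothetical names. Note that it matches only the exact dynamic type, so a class derived from Enemy would not find the Enemy handler; that is the exact-match versus hierarchy trade-off raised further down the thread.

```cpp
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct Entity { virtual ~Entity() {} };
struct Enemy : Entity {};
struct Pickup : Entity {};

// Maps the exact dynamic type to a handler: one hash lookup instead of trying
// dynamic_cast against every candidate class in turn.
class TypeDispatcher {
public:
    template <typename T>
    void on(std::function<std::string(T&)> handler) {
        table_[std::type_index(typeid(T))] = [handler](Entity& e) {
            return handler(static_cast<T&>(e));  // safe: key matched the exact dynamic type
        };
    }
    // Returns an empty string when no handler is registered for the exact type.
    std::string dispatch(Entity& e) const {
        auto it = table_.find(std::type_index(typeid(e)));
        return it == table_.end() ? std::string() : it->second(e);
    }
private:
    std::unordered_map<std::type_index, std::function<std::string(Entity&)>> table_;
};
```

Registration looks like `d.on<Enemy>([](Enemy&) { return std::string("enemy"); });`, after which `d.dispatch(someEntity)` costs one typeid plus one hash lookup.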

Confrontation Unlimited - MMORTS - http://www.confrontation-unlimited.net


Or have the object tell you its type directly.
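Having the object report its own type is often done with an enum tag returned from a virtual function. A minimal sketch with hypothetical names (EntityKind, Entity, Enemy, Pickup, describe): the check is one virtual call plus a switch, and the static_cast afterwards is safe only because each class reports its own kind correctly.

```cpp
// Hypothetical type tag: each class states what it is, no RTTI involved.
enum class EntityKind { Enemy, Pickup };

struct Entity {
    virtual ~Entity() {}
    virtual EntityKind kind() const = 0;
};
struct Enemy : Entity {
    EntityKind kind() const override { return EntityKind::Enemy; }
    int health = 100;
};
struct Pickup : Entity {
    EntityKind kind() const override { return EntityKind::Pickup; }
};

inline int describe(const Entity& e) {
    switch (e.kind()) {
        case EntityKind::Enemy:  return static_cast<const Enemy&>(e).health;  // tag guarantees the type
        case EntityKind::Pickup: return 0;
    }
    return -1;  // unreachable while every kind is handled above
}
```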


If you are really hurting from this, roll your own; type checking then boils down to a pointer address comparison, an approach you can find in Game Programming Gems 2. It relies on statically declared pointers in your RTTI-capable classes, which you have to register yourself. That is what all the DECLARE_CLASS(CEnemy, CMoveableObject) macros you see in the Half-Life codebase are about (https://developer.valvesoftware.com/wiki/Authoring_a_Logical_Entity).
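A rough sketch of that pointer-compare scheme follows. It is not the Game Programming Gems 2 code or Valve's DECLARE_CLASS macro; ClassInfo, isA, rttiCast, and CGameObject are hypothetical names, and it assumes single inheritance. Each class owns one static descriptor whose address serves as its type identity, and registering a class means pointing its descriptor at its parent's.

```cpp
// One static descriptor per class; its address is the type's identity.
struct ClassInfo {
    const char* name;          // for debugging only, never compared
    const ClassInfo* parent;   // nullptr for the root class
};

// Walks the parent chain comparing descriptor addresses (no string compares).
inline bool isA(const ClassInfo* type, const ClassInfo* target) {
    for (; type; type = type->parent)
        if (type == target) return true;
    return false;
}

struct CGameObject {
    static const ClassInfo s_info;
    virtual ~CGameObject() {}
    virtual const ClassInfo* classInfo() const { return &s_info; }
};
struct CMoveableObject : CGameObject {
    static const ClassInfo s_info;
    const ClassInfo* classInfo() const override { return &s_info; }
};
struct CEnemy : CMoveableObject {
    static const ClassInfo s_info;
    const ClassInfo* classInfo() const override { return &s_info; }
};

// The "registration" step: each class names its parent's descriptor.
const ClassInfo CGameObject::s_info     = { "CGameObject", nullptr };
const ClassInfo CMoveableObject::s_info = { "CMoveableObject", &CGameObject::s_info };
const ClassInfo CEnemy::s_info          = { "CEnemy", &CMoveableObject::s_info };

// dynamic_cast stand-in: pointer compares up the chain, then a plain static_cast.
template <typename T>
T* rttiCast(CGameObject* obj) {
    return (obj && isA(obj->classInfo(), &T::s_info)) ? static_cast<T*>(obj) : nullptr;
}
```

A macro could generate the static member and the classInfo() override so that each class only has to name itself and its parent.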

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

dynamic_cast<> on "slow" compilers is really just a strcmp.

Well, it's more like a bunch of strcmps in a loop. The worst case I know of is a linear search for the target type through a list of all the legal class names of the dynamic type.

I'm working on a project that requires dynamic_cast. I understand it's quite slow and should generally be avoided; that said, I see no other way.

There is almost certainly another way. In my experience, 'unavoidable' use of dynamic_cast is due to design flaws elsewhere in the system.

In your usage, do you only need exact type matches, or do you need to take inheritance hierarchies into account?

e.g.

Given the inheritance hierarchy:

C is a B
B is an A

And the code:

C derived;
A* basePtr = &derived;
B* cast = dynamic_cast<B*>( basePtr );
With regular dynamic_cast (or a custom implementation that also takes inheritance hierarchies into account), this cast will succeed, because the object is an A, a B, and a C.

With a custom implementation based on exact matches only, this cast would fail, because the object only identifies as a C; the advantage of these kinds of RTTI systems is that they are a lot faster.
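The difference can be reproduced with typeid, which compares exact dynamic types, next to dynamic_cast, which respects the hierarchy. A minimal sketch using the A/B/C hierarchy from the example (isBHierarchy and isBExact are hypothetical helper names):

```cpp
#include <typeinfo>

// The hierarchy from the example: C is a B, B is an A.
struct A { virtual ~A() {} };
struct B : A {};
struct C : B {};

// Hierarchy-aware: true for a B or anything derived from B.
inline bool isBHierarchy(A* p) { return dynamic_cast<B*>(p) != nullptr; }

// Exact match only: true only if the dynamic type is exactly B,
// which is what a simple tag- or enum-based RTTI system gives you.
inline bool isBExact(A* p) { return p && typeid(*p) == typeid(B); }
```

For `C derived; A* basePtr = &derived;`, `isBHierarchy(basePtr)` is true while `isBExact(basePtr)` is false.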

Well it's more like a bunch of strcmps in a loop. Worst case that I know of is a linear search of the target type in a list of all the legal class names of the dynamic type.

And this is why I ban dynamic_cast in my coding guidelines...

Imagine the language didn't have RTTI at all; instead, there was a free, cross-platform, but closed-source library that gave you easy RTTI. Despite it being closed source, people had stepped through the asm to see how it worked, and found that on Windows it often resulted in a long loop of string comparisons.
How many users would that library have, compared to a simpler library, or compared to people just using their own extremely simple home-made RTTI systems based on enums, etc.?
The only reason this horrible library does have any users at all, is because it's shipped with and integrated into the language. I believe in Keep-It-Simple-Stupid as a strong guideline, but in this case, keeping it simple for me would be to reinvent the wheel, to replace an over-complicated and bloated system with a simple one.

I think it is worth pointing out one more thing regarding the suggestions that using dynamic_cast is not so bad because it won't be a notable performance problem later. This is unfortunately a very bad way to look at things, especially with this particular bit of functionality. You need to approach this from the usage side to see why accepting dynamic_cast's performance is a bad idea. How do you usually use dynamic_cast? Usually you have a list of objects, and you process them in order, using dynamic_cast to perform various functionality on each object. So even if dynamic_cast appears only once in a while in your code, the number of calls to it becomes huge and the performance cost adds up. This pattern is so common in games that I simply don't bother with standard RTTI anymore, at least in any core processing systems.
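The usage pattern described above looks roughly like the following sketch (Entity, Renderable, Collidable, and updateFrame are hypothetical names). The source contains only two dynamic_cast expressions, but every frame executes one per entity per capability, so with 1,000 entities at 60 frames per second this loop alone performs 120,000 casts per second.

```cpp
#include <memory>
#include <vector>

struct Entity { virtual ~Entity() {} };
struct Renderable : Entity { int drawCalls = 0; void draw() { ++drawCalls; } };
struct Collidable : Entity { int checks = 0; void collide() { ++checks; } };

// Every entity is probed once per capability, every frame.
// Returns the number of dynamic_casts executed this frame.
inline long updateFrame(const std::vector<std::unique_ptr<Entity>>& entities) {
    long casts = 0;
    for (const auto& e : entities) {
        ++casts;
        if (auto* r = dynamic_cast<Renderable*>(e.get())) r->draw();
        ++casts;
        if (auto* c = dynamic_cast<Collidable*>(e.get())) c->collide();
    }
    return casts;
}
```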

This is a case of massive gains in the long run if you correct the problem early on. Knuth's rule says "premature optimization is the root of all evil"; well, ignoring the obvious is beyond evil in this case, especially if you consider that in games the most likely usage is in entities, which will be called thousands of times per game loop.

.02

This topic is closed to new replies.
