[OOP] Object information gets lost (dynamic_cast / instanceof / is)

Started by
18 comments, last by Alberth 5 years, 7 months ago

Imagine a manufacturer producing candy. He puts the candy in a package and sends it to his customer through the local postal service. After some days the package arrives and the customer has to decide what to do with the arbitrary contents of the package without knowing what is inside. This is exactly my problem:

I have this OOP problem where a "delivery service" delivers objects from a source to a consumer - multithreaded with a delay. The objects are buffered.
The delivery service has no class information about the stuff being delivered, so the consumer has no idea what he is getting. This information is crucial, though, for deciding what to do with the delivered items, i.e. where to store them.

I was told that stuff like dynamic_cast (C++), instanceof (Java) or "is" (C#) is in general very bad OOP but I don't know how to solve this problem another way. The first thing that came to my mind was using generics, but I think this is impossible: giving the package generic information is one thing, but retaining that information while the package is processed in the central delivery system is another. Dynamic casts seem to be the best solution but I am hesitating because of what I was taught from day one.

Any thoughts?

5 minutes ago, IceCave said:

I was told that stuff like dynamic_cast (C++), instanceof (Java) or "is" (C#) is in general very bad OOP but I don't know how to solve this problem another way. Dynamic casts seem to be the best solution but I am hesitating because of what I was taught from day one.

It's not that you should be looking for a different solution... If the best solution is those tools, then it's the problem that's wrong.

Tell us more about the problem. Why are you in this situation where you have one system telling another "Here is that [UNKNOWN] that you needed!"?

To go back to your candy example though... Orders have context. I order a package, I get an order number. The warehouse gets an order number. I keep it in my email. The warehouse prints it and puts it on the package. When I get the package, I can check it against my emails to tell which order it is before I've opened it.

This is just a callback. Typically, when a user requests a callback, they also provide a bit of "userdata" along with the request, which you return to them in your eventual response, so they can identify the response and/or carry along any kind of contextual data that they will require when processing the response.
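A minimal sketch of that "order number" idea in C++. Everything here (DeliveryService, Package, OrderId, runOrderExample) is a hypothetical name made up for illustration, not part of the framework discussed in this thread. The service never inspects the package; it only stores the caller's userdata next to the callback and hands both back later.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Package { std::string contents; };

// The "userdata" is an order number chosen by the consumer.
using OrderId = int;
using DeliveryCallback = std::function<void(OrderId, Package)>;

class DeliveryService {
public:
    // Record the request; the userdata is stored untouched next to the callback.
    void order(OrderId userdata, Package pkg, DeliveryCallback onDelivered) {
        assert(onDelivered && "a callback is required");
        pending_.push(Pending{userdata, std::move(pkg), std::move(onDelivered)});
    }

    // Later (possibly after a delay) hand each package back with its userdata.
    void deliverAll() {
        while (!pending_.empty()) {
            Pending p = std::move(pending_.front());
            pending_.pop();
            p.callback(p.userdata, std::move(p.pkg));
        }
    }

private:
    struct Pending {
        OrderId userdata;
        Package pkg;
        DeliveryCallback callback;
    };
    std::queue<Pending> pending_;
};

// Usage: the consumer recognizes each delivery by the order number it chose.
std::vector<std::string> runOrderExample() {
    DeliveryService service;
    std::vector<std::string> log;
    auto onDelivered = [&log](OrderId id, Package pkg) {
        log.push_back(std::to_string(id) + ":" + pkg.contents);
    };
    service.order(1001, Package{"candy"}, onDelivered);
    service.order(1002, Package{"books"}, onDelivered);
    service.deliverAll();
    return log;
}
```

The consumer decides what each delivery means when it places the order, so nothing has to be rediscovered with a cast on arrival.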

Thanks for your answer, to your question: that is complicated...

I am programming a framework to help work with a multithreaded environment, and that includes sending data (objects) between threads. Normally those objects are then put in a container for further execution on this specific thread and their "execute()" method is called on them in order to process whatever they want to do on this specific thread.

Basically it goes like this: Object O#42 wants to execute stuff on Thread T#42. The framework then manages the whole process of introducing O#42 to T#42 and T#42 then calls execute() - "Hey you are now on this thread, now do what you wanted to do here".

So far so good, but I would like to have the option to program custom containers in order to do more advanced stuff than just "execute()". Right now it works like this: in the execute() method the object removes itself from the standard container and re-adds itself to a known target container (candy adds itself to candyshop_container).
This works! Now why change a running system? Security!

I noticed that this system is very unstable from a static coding perspective. The developer might very well accidentally (or intentionally) tell the framework to call an object's "execute()" on the thread candyshop_container is located on but then - in the execute() method - add the candy to another container located on another thread. The framework was developed in order to avoid such concurrency problems.
I figured it is best to hide the whole "delivery" process from the developer and not give him any control over the threads' data flow. Which brings us back to the mentioned problem:

1 hour ago, Hodgman said:

"Here is that [UNKNOWN] that you needed!"?

My framework only knows to deliver [object] to [container] on the thread that is associated with [container].

The solution could be to let the container make a dynamic cast in order to find out what object was delivered.

1 hour ago, IceCave said:

I was told that stuff like dynamic_cast (C++), instanceof (Java) or "is" (C#) is in general very bad OOP

Well, it's generally considered bad practice. It means that the consumer of the object has to know what kind of object it receives, which generally translates to a bad design of the system.

However, if it is really necessary then I don't think there's another way. For example, if what you get from the producer is an instance of Object in Java, then there is no useful way to use this instance other than to cast it to something else first. Still, you should be able to let the consumer know beforehand what type it is, rather than just keep casting until you get a non-null value.

Anyway, it is good to reconsider the design first. Maybe there's a good way to pass the instance around together with its type information instead of passing it as an Object instance.
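One possible way to keep the type information attached while an item sits in a buffer, sketched in C++17 with std::variant. This assumes the set of deliverable types is known up front, which may not hold for an open framework. Candy, Toy, Shop and runVariantExample are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical payload types; the closed list of alternatives is an assumption.
struct Candy { std::string flavor; };
struct Toy { std::string name; };

// The buffered item remembers which alternative it holds.
using Delivery = std::variant<Candy, Toy>;

class Shop {
public:
    // std::visit picks the store() overload that matches the held type.
    void receive(Delivery delivery) {
        std::visit([this](auto& item) { this->store(std::move(item)); }, delivery);
    }
    std::size_t candyCount() const { return candyShelf_.size(); }
    std::size_t toyCount() const { return toyShelf_.size(); }

private:
    void store(Candy candy) { candyShelf_.push_back(std::move(candy)); }
    void store(Toy toy) { toyShelf_.push_back(std::move(toy)); }

    std::vector<Candy> candyShelf_;
    std::vector<Toy> toyShelf_;
};

// Usage: items pass through an untyped-looking buffer and still land on the right shelf.
std::size_t runVariantExample() {
    Shop shop;
    std::vector<Delivery> buffer;
    buffer.push_back(Candy{"mint"});
    buffer.push_back(Toy{"robot"});
    buffer.push_back(Candy{"lemon"});
    for (auto& delivery : buffer) shop.receive(std::move(delivery));
    assert(shop.toyCount() == 1);
    return shop.candyCount();
}
```

Adding a new alternative to Delivery without a matching store() overload fails at compile time, which is the static safety a chain of casts cannot give.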

http://9tawan.net/en/

1 hour ago, IceCave said:

Basically it goes like this: Object O#42 wants to execute stuff on Thread T#42. The framework then manages the whole process of introducing O#42 to T#42 and T#42 then calls execute() - "Hey you are now on this thread, now do what you wanted to do here".

So far so good, but I would like to have the option to program custom containers in order to do more advanced stuff than just "execute()".

Instead of sending O to T, with the hard coded reaction of T calling O->execute, you can send a function-object/function-pointer/lambda/delegate along too. 

e.g. In C++: Send( object, 42, [](Widget& w){ w.Execute2(); } );

Or: Send( object, 42, &Widget::Execute2 );

That's saying that this object would like to transfer ownership to this thread, and then execute this bit of code. 
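A rough sketch of how such a Send could be built, assuming C++17. ThreadInbox, send, drain and runSendExample are hypothetical names; the real framework's API is not shown in this thread. The object and the action are bound together at the call site, where the concrete type is still known, so the receiving thread only ever sees a type-erased "run this" task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for one thread's inbox inside the framework.
class ThreadInbox {
public:
    // Callable from any thread: bind object and action while T is still known.
    template <typename T, typename Action>
    void send(std::shared_ptr<T> object, Action action) {
        assert(object && "cannot send a null object");
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(
            [object = std::move(object), action = std::move(action)]() mutable {
                // std::invoke accepts lambdas and member function pointers alike.
                std::invoke(action, *object);
            });
    }

    // Called only by the owning thread: run everything that has arrived so far.
    std::size_t drain() {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(tasks_);
        }
        for (auto& task : ready) task();
        return ready.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> tasks_;
};

struct Widget {
    int executed2 = 0;
    void Execute2() { ++executed2; }
};

// Usage: both call styles from the post above end up in the same queue.
int runSendExample() {
    ThreadInbox inbox;  // in the real framework, owned by the target thread
    auto widget = std::make_shared<Widget>();
    inbox.send(widget, [](Widget& w) { w.Execute2(); });
    inbox.send(widget, &Widget::Execute2);
    inbox.drain();      // the target thread would call this in its loop
    return widget->executed2;
}
```

A shared_ptr is used here only because std::function requires copyable callables; a design with real ownership transfer would want a move-only task type instead.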

A lot of people tell you not to use dynamic_cast. I'm not such a stickler for these OOPish rules and I use it on rare occasions, however supposedly dynamic_cast is slow if you care about performance. I've tested it versus some simple cases where I could either use dynamic_cast or use a virtual function call to get the same functionality. I found that even in fairly simple cases dynamic_cast is in fact slower than the function call (which somewhat surprised me)

@Gnollrunner Considering speed before you found the proper design is typically the wrong order. A bad design is often much worse in performance than any low-level profit you gain with such trade-offs. There is a place for such low-level considerations, but it's after all the bigger things have been addressed.

 

@IceCave You could add a recipe to the object that contains instructions for the framework on what to do with the object. In that way, you can do special things while you still know what you have. The object doesn't have to know about containers, and the container doesn't have to be smart about the objects it gets.

 

8 hours ago, Gnollrunner said:

I've tested it versus some simple cases where I could either use dynamic_cast or use a virtual function call to get the same functionality. I found that even in fairly simple cases dynamic_cast is in fact slower than the function call (which somewhat surprised me)

When you need to use dynamic cast in a design it is a code smell.  It isn't necessarily wrong, but there's a high probability a better solution exists.  In the general sense every derived object should be interchangeable for the intended purpose.  Also in the general sense, there is a chance your code will never know about the actual derived class because code completely outside your control is able to create new derived classes; code should always assume they are a completely unknown concrete type rather than one of the types you know is in the code right now today.

As an example, let's say you're working on a D3D graphics system. They follow the principles rather well.  Let's say you've got the handle to your device with D3D12CreateDevice(). You get back an ID3D12Device pointer.  From that point on everything in the code you can do with an ID3D12Device pointer you can do with your object. Any device is interchangeable with any other device.  Your code should never need to use a dynamic cast to see if the underlying type is actually a GeForce 970, or an AMD R9 250, or some other specific card. If you attempted to individually handle all the cards on the market today your code would break in the future when the next generation of graphics cards is released. If you have a handle to a base class then that should be all you need, they should be interchangeable.

As for speed, virtual calls are very fast because of their indirection design. Virtual calls were implemented by following the best practice of high performance indirections, then CPU designers improved the hardware to handle them nearly for free. There is one indirection through the object's vtable pointer to the vtable, which in turn points to the function. On most modern chips the vtable entry will usually already be in the CPU's cache since virtual functions are called so frequently. On modern chips with an out-of-order core the indirection has close to zero cost when it is already warmed in the CPU's cache, or only the cost of a single cache lookup (perhaps about 7 ns) the first time it is encountered.

On the other hand, looking up the type for conversion for dynamic_cast is a much more involved operation: Functions to look up the type must be called, and the type information must be loaded; the exact concrete type is tested first, followed by a test for the primary base type, both tests are relatively fast; if neither of those work, a series of operations and lookups take place along the entire inheritance tree until the result is found, or none is found, and none of that information will already be warmed up in the CPU's cache.

 

Design the base classes so you don't need to know what concrete type they are. Issue commands or queries on the interface and use the virtual functions to drive behavior. 
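As a small illustration of that advice in C++, here is a sketch where the container asks the object a question through the base interface instead of testing its concrete type. Deliverable, shelf(), Container and runShelfExample are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base interface: every deliverable can say where it belongs.
class Deliverable {
public:
    virtual ~Deliverable() = default;
    virtual std::string shelf() const = 0;
};

class Candy : public Deliverable {
public:
    std::string shelf() const override { return "sweets"; }
};

class Toy : public Deliverable {
public:
    std::string shelf() const override { return "toys"; }
};

// The container never names a concrete type; it only queries the interface.
class Container {
public:
    void receive(std::unique_ptr<Deliverable> item) {
        assert(item && "cannot store a null item");
        std::string key = item->shelf();  // virtual call drives the behavior
        shelves_[key].push_back(std::move(item));
    }

    std::size_t countOn(const std::string& shelf) const {
        auto it = shelves_.find(shelf);
        return it == shelves_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::vector<std::unique_ptr<Deliverable>>> shelves_;
};

// Usage: new derived classes work without touching Container.
std::size_t runShelfExample() {
    Container container;
    container.receive(std::make_unique<Candy>());
    container.receive(std::make_unique<Candy>());
    container.receive(std::make_unique<Toy>());
    return container.countOn("sweets");
}
```

A derived class written later by someone else only has to implement shelf(); the container code stays unchanged.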

 

Trying to move back on topic, Hodgman's got the standard solution.  Include both the object and the function to call. The appropriate source can then call the function in the correct context.

31 minutes ago, frob said:

On the other hand, looking up the type for conversion for dynamic_cast is a much more involved operation: Functions to look up the type must be called, and the type information must be loaded; the exact concrete type is tested first, followed by a test for the primary base type, both tests are relatively fast; if neither of those work, a series of operations and lookups take place along the entire inheritance tree until the result is found, or none is found, and none of that information will already be warmed up in the CPU's cache.

One of the standard implementations of this in C++ involves doing string comparisons on the mangled type names, too. So every dynamic_cast is a tree-traversing recursive string comparing loop. That's so bad that you really should pretend that it was never added to the language. If you ever do have a legit need for using RTTI, it's often worth reinventing the wheel rather than using C++'s version! 

To be fair, those were implementations from about two decades ago. After much open mocking and derision they were fixed, and I'm not aware of any compilers in the last decade that have used string compares for dynamic casting or other RTTI operations, except for those where the standard explicitly requires the name.  All of it is implementation defined details.

Typical implementations these days have vtable entries for their own class and the first base class, so those tests are a single pointer comparison (plus the cache miss since they're likely not in the cache already). If it isn't either of those, a series of tests against parent classes follows, going through each node in the hierarchy. There are compile-time optimizations using prime number sequences that can reduce the processing time as well, and most major compilers have implemented them. Often you're talking on the order of 50-100 nanoseconds. That is a cost, but it's the kind of cost paid for any cache miss rather than the cost of enormous string operations.

Usually there are better solutions, but sometimes dynamic casts are the right solution. In those cases use the right tool for the job rather than re-inventing what the language provides.
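For completeness, one case where a dynamic cast is arguably a reasonable fit: asking whether an object also supports an optional, unrelated interface. This is only a sketch with made-up names (Item, Perishable, Milk, daysLeftOrMinusOne), not something taken from the framework in this thread.

```cpp
#include <cassert>

// Hypothetical types: Perishable is an optional capability, unrelated to Item.
struct Item {
    virtual ~Item() = default;
};

struct Perishable {
    virtual ~Perishable() = default;
    virtual int daysLeft() const = 0;
};

struct Candy : Item {};

struct Milk : Item, Perishable {
    int daysLeft() const override { return 5; }
};

// Cross-cast from Item to Perishable; the pointer form yields nullptr on mismatch.
int daysLeftOrMinusOne(const Item& item) {
    if (const auto* perishable = dynamic_cast<const Perishable*>(&item)) {
        return perishable->daysLeft();
    }
    return -1;  // the item does not implement the optional interface
}

// Usage: the caller never names Candy or Milk, only the capability.
void runCastExample() {
    Candy candy;
    Milk milk;
    assert(daysLeftOrMinusOne(candy) == -1);
    assert(daysLeftOrMinusOne(milk) == 5);
}
```

The cast targets an interface rather than a concrete class, so it keeps working when new Item types are added later.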

 
