Estraven

Virtual function overhead. How bad is it?


Hi all, I just wanted to ask some questions about virtual functions and polymorphism. How much overhead is it to look up a function in the vtable when I make a call to a virtual function? I know this is a majorly debated issue when it comes to speed, but how bad is it really? If I have one virtual function call in my main game loop, is it really going to degrade performance that much? I'm of the opinion that I should use them, but not overuse them. What do you all think? Thanks, Dave

Quote:
Original post by Estraven
How much overhead is it to look up a function in the vtable when I make a call to a virtual function?

Do you mean conceptually or quantitatively? Conceptually it's the same as getting a value from an array before you make the function call. Quantitatively, I don't know, you might want to measure it yourself.
Quote:
Original post by Estraven
I know this is a majorly debated issue when it comes to speed, but how bad is it really?

It depends entirely on the context in which they're being used.
Quote:
Original post by Estraven
If I have one virtual function call in my main game loop, is it really going to degrade performance that much?

You can have a thousand virtual function calls in your main loop and you won't notice any speed difference whatsoever (on anything later than a 60 MHz Pentium).
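
To make the "getting a value from an array before you make the function call" picture concrete, here is a rough hand-rolled sketch of what a compiler might generate. All names (`Shape`, `AreaFn`, `call_area`, and so on) are made up for illustration, and real compilers lay their tables out in their own ways; this only shows the shape of the work involved.

```cpp
// Hypothetical names throughout; this imitates the idea of a vtable,
// not the layout used by any particular compiler.
struct Shape;

// A "vtable" is just an array of function pointers shared by every
// object of the same type.
using AreaFn = int (*)(const Shape*);

struct Shape {
    const AreaFn* vtable;  // the hidden pointer a compiler would add for us
    int w;
    int h;
};

int rect_area(const Shape* s) { return s->w * s->h; }
int tri_area(const Shape* s)  { return s->w * s->h / 2; }

const AreaFn rect_vtable[] = { rect_area };
const AreaFn tri_vtable[]  = { tri_area };

// The "virtual call": load the table pointer from the object, index the
// array, then call through the function pointer found there.
int call_area(const Shape* s) {
    return s->vtable[0](s);
}
```

Building a `Shape` with `rect_vtable` or `tri_vtable` and passing it to `call_area` picks the matching function at run time, which is the one extra array read described above.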

Quote:
Original post by Estraven
I'm of the opinion that I should use them, but not overuse them. What do you all think?


I agree with everything that has been said. I feel that the overhead is really not a problem. The same goes for using DLLs: there's overhead, but look at all of the benefits, which make it worth it. As for your opinion of not overusing them, that should be applied to anything, so I agree. So in short, it's not bad at all. In my experience, using virtual functions can even speed up your code.

- Drew

Your app's memory and usage footprint has a significant effect as well. When your app is small, or uses code in a very localized way (for example, it runs a tight AI algorithm for 1000 loops, then a rendering algorithm, and so on, in a nice chunked way), the vtables it's using most of the time will always be in cache, and therefore very efficient (just a few clock cycles lost). However, like any cache miss in a performance-critical section, when they are not cached, the cost is great. I have often wanted to see how modern compilers place the vtables in the EXE image, to see if they attempt to improve cache coherence or if they just go where they go with no adjustment.

Unless you're compiling for a 386, don't bother worrying about it. Seriously, people get all uptight about how virtual functions are so slow, but I've never had a situation where I could even tell! In my code I use virtual functions all the time. Each entity uses them for its update and render calls (so that can be up to 100 calls each tick), each GUI window type uses them, as well as lots of other things. I once hard-coded my game to only use a certain type of entity so that there were no virtual function calls and then timed it ... there was no difference that I could detect. I'm sure if I added maybe 1,000 to 1,000,000 entities to my game world then I'd see a difference, but it saves me so much time that it's definitely worth it. I'm sure I'd speed up my game way more by fixing the physics or the AI code rather than getting rid of virtual functions. How often do you worry about accessing an array item? That's about the same cost (or so I believe) :P

Quote:
Original post by Estraven
I know this is a majorly debated issue when it comes to speed...
Always take a good look at who is debating. Most of the people who speak of "optimization" and "speed" write slow code because of their attempts at silly optimizations.

Use them where they make sense.

Another "me too!" post, but here goes:

If the object is predominantly of one particular type, branch prediction in modern (Athlon, P3 and newer) CPUs will actually take care of it to the extent that it's practically free. Even if you often call different objects, you shouldn't have any trouble; any workarounds are going to be more expensive.

~phil

Hey,

As stated, the effect is negligible. But do take note that there are situations where silly virtual inheritance just complicates the whole issue: it's sometimes difficult to determine where a virtual function comes from if you're inheriting from many classes which declare the same virtual function, and AFAIK most compilers won't give you warnings about ambiguous virtual functions; they just replace the VTable entry with the last constructed object which overrides it.

CJM
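
One case from the post above that is easy to try out, as a small sketch with hypothetical class names: when two base classes declare a virtual function with the same signature, a single override in the derived class replaces both, and a call through either base reaches it. Marking it `override` (C++11 and later) makes the compiler reject the function if it does not actually override anything.

```cpp
#include <string>

// Two unrelated interfaces (hypothetical) that happen to declare the
// same virtual function.
struct Updatable {
    virtual ~Updatable() {}
    virtual std::string Name() const { return "Updatable"; }
};

struct Drawable {
    virtual ~Drawable() {}
    virtual std::string Name() const { return "Drawable"; }
};

// A single Name() here overrides the function inherited from both bases.
struct Sprite : Updatable, Drawable {
    std::string Name() const override { return "Sprite"; }
};

// Each helper only knows about one of the base interfaces.
std::string NameViaUpdatable(const Updatable& u) { return u.Name(); }
std::string NameViaDrawable(const Drawable& d)   { return d.Name(); }
```

If `Sprite` did not define `Name()` at all, calling `Name()` directly on a `Sprite` would be an ambiguity error at compile time rather than a silent pick.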

Quote:
Original post by JonnyQuest
If the object is predominantly of one particular type, branch prediction in modern (Athlon, P3 and newer) CPUs will actually take care of it to the extent that it's practically free. Even if you often call different objects, you shouldn't have any trouble; any workarounds are going to be more expensive.
Not to mention, workarounds might actually be more expensive with new CPUs. This goes for pretty much every workaround optimization. Before you start rolling your own, remember that compilers and CPUs will continue to be developed with speeding up the standard method in mind. By avoiding that, you may actually impede optimizations you would have received for free.

Unless you're trying to make a real-time 3D rendering library, an operating system or something like that, I wouldn't worry about optimizing at that low a level. On a game there are far more important things to spend your time on than being able to render one extra polygon per frame.

Hi all, since we're talking about virtual methods here, I was wondering: if I have an interface that defines a virtual destructor, and I do this:

class cMyClass : public iMyInterface
{
    ...
};

iMyInterface *instance = new cMyClass();

delete(instance);

What's the impact of the delete statement? Does it also call the destructor of the inheriting child class first? Will there be an effect on the stack or heap?

A pointer only holds an address; until you assign it, a local pointer isn't guaranteed to be null or to point at anything valid.
The new keyword finds some free memory on the heap, reserves it for the object's exclusive use, runs the constructor, and hands back a pointer to it.
The delete keyword does the opposite: it runs the destructor and then frees the memory for other things to use.

Calling virtual functions in a tight loop rather than normal functions can lead to a drop in performance of as much as 25% or more.

I tested this not too long ago. I'm fairly sure that's about the right number. I do recall it being ridiculously severe.
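
For anyone who wants to repeat a test like this, here is a minimal timing sketch. The class and function names (`Counter`, `PlainCounter`, `TimeVirtual`, `TimeDirect`) are invented for the example, and the numbers it produces depend heavily on compiler, flags and CPU; an optimizer may also inline or devirtualize either loop, so treat the result as a rough indication only.

```cpp
#include <chrono>
#include <cstdint>

struct Counter {
    virtual ~Counter() {}
    virtual void AddVirtual(std::uint64_t x) = 0;
};

struct PlainCounter : Counter {
    std::uint64_t total = 0;
    void AddVirtual(std::uint64_t x) override { total += x; }  // virtual path
    void AddDirect(std::uint64_t x) { total += x; }            // ordinary call
};

// Elapsed nanoseconds for n calls made through the base-class interface.
long long TimeVirtual(Counter& c, std::uint64_t n) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) c.AddVirtual(i);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}

// Elapsed nanoseconds for n calls to the non-virtual member function.
long long TimeDirect(PlainCounter& c, std::uint64_t n) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) c.AddDirect(i);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

Both loops do the same trivial work, so any gap between the two timings is call overhead; with a function body of realistic size the gap shrinks accordingly.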

Quote:
Original post by Thrawn80
Hi all, since we're talking about virtual methods here, I was wondering: if I have an interface that defines a virtual destructor, and I do this:

class cMyClass : public iMyInterface
{
    ...
};

iMyInterface *instance = new cMyClass();

delete(instance);

What's the impact of the delete statement? Does it also call the destructor of the inheriting child class first? Will there be an effect on the stack or heap?


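As long as you declare the destructor of the base class as virtual, the subclass destructor will be called as well. (If you forget it, deleting through a base pointer is undefined behavior; in practice usually only the superclass destructor is called.) I wrote an example of this: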


#include <stdio.h>

class Interface {
public:
    Interface() { }
    virtual ~Interface() {
        printf("Delete from Interface\n");
    }
};

class Subclass : public Interface {
public:
    Subclass() { }
    ~Subclass() {
        printf("Delete from Subclass\n");
    }
};

int main() {
    Interface* object = new Subclass;
    delete object;

    return 0;
}

And here is the output:


thec@5[temp]$ ./a.out
Delete from Subclass
Delete from Interface

(By the way, that order is logical, since the subclass destructor might still use some variables from the superclass, but that's easy to say after the test, hehe.)

Albert

Just want to add that using virtual functions defeats inlining. The equivalent non-virtual function may be appropriate to inline and hence be faster.

If you're making appropriate use of virtual functions, then there really aren't any faster ways to get the same results, though.

Just something to be aware of in the design process. And as others have said, the performance hit is negligible to the point where it will make no difference unless they're being called in long, tight loops.

Quote:
Original post by Ravyne
Just want to add that using virtual functions defeats inlining. The equivalent non-virtual function may be appropriate to inline and hence be faster.


Just wanted to clarify that virtual functions can be inlined if the compiler can see the type it is operating on, or if the function is called explicitly:


#include <iostream>

struct Base {
    inline virtual void Call() {
        std::cout << "Base::Call" << std::endl;
    }
};

struct Thing : Base {
    inline virtual void Call() {
        std::cout << "Thing::Call" << std::endl;
    }
};

struct AnotherThing : Base {
    inline virtual void Call() {
        Base::Call(); // could be inlined as it is calling a specific function
    }
};

int main() {
    Thing thing;
    thing.Call(); // virtual function call which could be inlined
}




I've been thinking about this a bit recently, though, as I've been looking into templates and generative programming techniques. In the past, in my quest to be object oriented, I've made everything have a virtual interface. So my renderer is virtual, my input devices are virtual, my network connection is virtual, my game world is virtual, my map loader is virtual, etc. I'm writing a general-purpose engine which really doesn't need to be so dynamically interchangeable when it comes down to actually making an application.

I'm starting to think that a lot of stuff should be done with templates so that at compile time those objects and types will be tied down to specific things. Of course there are some things which are better off being dynamic but most of the things I mentioned above probably could do without.

Just a thought!
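
As a rough illustration of the template idea in the post above, here is a sketch in which the renderer is chosen at compile time. The names `GLRenderer`, `SoftwareRenderer` and `Engine` are hypothetical and stand in for whatever the real engine would use; this is one possible shape, not a recommendation for any particular codebase.

```cpp
#include <string>

// Hypothetical renderer types. Neither derives from a common base class
// and neither has any virtual functions.
struct GLRenderer {
    std::string Draw() const { return "GL"; }
};

struct SoftwareRenderer {
    std::string Draw() const { return "Software"; }
};

// The renderer type is fixed when the template is instantiated, so Draw()
// is an ordinary call that the compiler is free to inline.
template <typename Renderer>
class Engine {
public:
    explicit Engine(Renderer r) : renderer_(r) {}

    std::string RenderFrame() const { return "frame:" + renderer_.Draw(); }

private:
    Renderer renderer_;
};
```

The trade-off is that `Engine<GLRenderer>` and `Engine<SoftwareRenderer>` are different types, so swapping renderers while the program is running would still need some form of dynamic dispatch.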

Hello,

Consider this code:

class A
{
public:
virtual void B1();
void B2();
};

int main()
{
A a;
a.B1();
a.B2();
}


VC7 will generate something which looks like this:

B1:
mov eax, [ecx+4] // get the address of the virtual B1 - ecx is this
jmp [eax] // jump to this address

main:
mov ecx, &a // set ecx to this
call B1

mov ecx, &a // set ecx to this
call B2


As you can see, the difference between the two function calls is one mov and one jump. While this can be seen as overhead, the number of wasted cycles is still very low. The difference may be bigger if B2 is inlined (of course).

(edit: quote, edit, quote... these buttons are just too close).

HTH,

Regarding the "virtual functions are 25% slower" message: this number tells us nothing without knowing what you tested. If you wrote a test like this:

struct Tester {
void normalFunction() {}
virtual void virtualFunction() = 0;
};
struct DerivedTester : Tester {
void virtualFunction() {}
};

void benchmark(Tester &the_tester) {
// do your test here, if the compiler isn't very clever it
// will not generate optimized versions of this function
}



it will only tell you the relative performance of a plain function call and a virtual function call. Even if it is 100% slower, the entire thing comes down to such a low number of clock cycles compared to any function body you would normally write that it doesn't matter at all.

For most C++ compilers on x86, a virtual call adds an overhead of roughly two instructions: a load of the vtable pointer plus an indirect call through the table. As has been said, most CPUs are optimized for the typical code flow in a virtual call because it is shared by most current software. Granted, you shouldn't use virtual functions nested three levels deep if you're blitting pixels, but I've never seen a virtual function be the root of all evil, or incur any noticeable impact at all, when making profiler runs on my projects.

-Markus-

If you want to know if making a function virtual will hurt your performance, look at how many times you call it per game loop. If it is a low-level operation that you call tens of thousands of times per game loop, then you might see a performance hit. Also, look at how big your function is. If it is a large function, any lost performance will be overshadowed by the slowness of the function itself. Frankly, any function that is so critical that it can't be virtual should probably be inlined if at all possible.

On the other hand, for long functions that get called once per game loop, you just won't see a difference. I have virtual functions all over my code, and there has been no significant impact on performance.

Also, use virtual where it makes sense. If you need a function to be virtual, make it virtual. Use regular functions everywhere else.

It just depends on how many times a particular function is being called. For example, in a program that I am working on, I had a "core" virtual function that was being called thousands of times while I was parsing a massive amount of text. After profiling, I saw that this was the bottleneck. When I simply changed the function to no longer be virtual (I had to rework my class design a little bit), I saw a huge difference in performance.
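
The post doesn't say how the class design was reworked, so the following is only one guess at the kind of change that helps in a text-parsing hot loop: keep the interface virtual, but call it once per chunk of text instead of once per character. `TokenSink`, `SpaceCounter` and `Parse` are hypothetical names used for the sketch.

```cpp
#include <cstddef>
#include <string>

struct TokenSink {
    virtual ~TokenSink() {}
    // One virtual call per chunk of text rather than one per character.
    virtual void Consume(const char* data, std::size_t len) = 0;
};

struct SpaceCounter : TokenSink {
    std::size_t spaces = 0;

    void Consume(const char* data, std::size_t len) override {
        // The per-character work is a plain loop with no dispatch inside it.
        for (std::size_t i = 0; i < len; ++i) {
            if (data[i] == ' ') ++spaces;
        }
    }
};

// The parser pays for a single virtual call no matter how long the text is.
void Parse(const std::string& text, TokenSink& sink) {
    sink.Consume(text.data(), text.size());
}
```

Whether this beats dropping the virtual function altogether is something only a profiler run on the real program can answer.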

Quote:
Original post by JohnBolton
Quote:
Original post by Promit
In terms of concrete costs:

A vtable read costs one pointer dereference. Just one.


Don't forget that it also costs a cache flush/fill.


Why does it take a cache flush/fill?
