Sign in to follow this  
Followers 0
Astrof

When to use Virtual Functions?

22 posts in this topic

Ok, so a simple Game Design Question (I thought this fit more in "general programming" than game programming): When should I use virtual functions in C++? I moved from Java (where all methods were virtual) to C++ about 6months ago and am trying to design a game. In this game the player has different weapons. Now, each weapon is a different class (extending the base Weapon class) as each weapon has different attributes, etc. So, should I use virtual functions for the GetAttack function or should I use typeid and/or dynamic casting (possibly use some sort of enum to keep track of what type of weapon it is and cast it based on that) and not use virtual functions for these methods? I've been warned to only use virtual functions in the direst of scenarios.
0

Share this post


Link to post
Share on other sites
First, you should prefer a shallow inheritance tree, or not inheritance at all in this case -- instead, you should think about weapons as a whole in terms of their traits, and then implement these commonalities with a more data-driven design.

The typical deep-hierarchy approach might be to have a base Weapon class, and from it a Sword, Bow, Flail, Spear, Crossbow, and Club classes are derived, and from these classes specific flavors of each might be further derived (ShortSword, LongSword, SwordOfPwning, etc) and this deep derivation can go on add-nauseum (SwordOfRedPwning, etc).

If we make a list of the traits that weapons have, it might look something like this:
- Power
- Range
- Defense
- Accuracy
- Damage Type
- Recover Time
- Success Rate
- Etc.

So we can see now that, at the least, classes of weapons, for instance, Swords, are all just swords with different traits -- The long sword is perhaps just a short sword which is longer and more powerful, but less accurate and with a longer recover time. Perhaps the Sword of Pwning is just a longsword with 3x Power.

We can extend this line of thinking further -- perhaps we like to think that a spear is not so much unlike a sword, after all, in reality its pretty much a short sword at the end of a long stick. Perhaps Flails and Clubs are much like Spears and Swords, except that they do Smashing damage, rather than Cutting/Stabbing damage.

Taking this line of reasoning to its logical conclusion, we might like to notice that even ranged weapons like bows and crossbows can be thought of in terms analogous to swords and spears -- Range becomes the range of the projectile (rather than of the weapon itself), Defense drops to little or nothing (as a bow isn't able to act in a defensive manner), Recover Time becomes reload time, Damage Type becomes Piercing.

If you adopt this final version, no inheritance is needed at all. If you adopt some middle-ground, only one layer of inheritance is necessary -- Thinking in broad categories, the primary weapon classes might be Sword, Melee and Ranged. Which version works for you and your game is up to you, but certainly a shallow inheritance tree is far less unwieldy than one which is very deep. Further, if you rely on inheritance to make your weapons distinct, you will likely come across situations in which multiple inheritance could be applied -- and then you will be faced with duplicating functionality, or opening up the can of worms that is multiple inheritance.


I think that addresses the "design issue" aspect of your question -- as to the question of when to use virtual functions, the answer is easy: You use them when they are the correct tool to implement the design. Virtual functions are used to delegate responsibility to a concrete implementation through a base class pointer. The base class defines an interface which derived classes may (or must, depending on whether or not a default implementation is defined in the base) implement to provide behavior specific to the derived class.


Quote:
Original post by Astrof
So, should I use virtual functions for the GetAttack function or should I use typeid and/or dynamic casting (possibly use some sort of enum to keep track of what type of weapon it is and cast it based on that) and not use virtual functions for these methods? I've been warned to only use virtual functions in the direst of scenarios.


I made the top of this post, and then came back and edited to address this point.

Certainly, you should avoid virtual functions where possible because they incur a performance penalty and also can make debugging, and simply getting your head around your code, more difficult. To this extent, the advice to avoid virtual functions is sound.

However, the avoidance of virtual functions is no excuse to implement your own system for ad-hoc polymorphism. Virtual functions are the standard and built-in method for accomplishing what you would be doing with all your casting.

If the shoe fits, wear it -- There's no reason to role your own when a nice pair of sneakers has already been provided.
0

Share this post


Link to post
Share on other sites
Note that implementing a faster/leaner virtual function object system in C++ is possible, but (A) it isn't worth it in most every case, and (B) it isn't easy to get right.

Quote:
typeid and/or dynamic casting (possibly use some sort of enum to keep track of what type of weapon it is and cast it based on that)

Don't do that. Dynamic casting and the like is something you should do when you have to do doubly virtual dispatch, and other annoying corner-cases.

Either go with data-driven, or if you prefer, make GetAttack() virtual.

The costs to make a method virtual are not that high, unless you are doing something on a per-pixel and per-frame basis.
0

Share this post


Link to post
Share on other sites
Wow, thanks, that explains the situation a lot better. Let me clarify my post though, the game I'm making is a 2d scrolling shooter thing (like 1945 or Raiden, etc, first games are usually some 2d cliche game like spaceshooters ;) ). So I'll have a few weapon types (Laser, etc) and will not know at runtime what the player will use.


However, if I understand your suggestion, should I have like, one Weapon class and maybe different functions that set the attributes of the weapon? (Like:
Weapon laser=Weapon::Laser(//attributes go here);

This actually does make a bit more sense. However, what happens if I need to specify some function behavior of each weapon? (IE lasers act differently than missiles or homing shots) so would I make just these functions virtual (if I really needed the function)?

The only reason I ask about virtual functions is in Java (more like in the AP classes) they shoved polymorphism down our throats (everything extended everything, even though some classes weren't even really used on their own; they were just one of many abstract classes used to show class hierarchies).


And hmm, so virtual functions are used with pointers...damn. I try to use pointers only when needed (as I frequently forget to use delete).



EDIT: just saw your post NotAYakk, I think the function could be called on a per frame basis (I mean I (will) check for collision every frame, and if it did collide then I return the attack).

I think I might have to use a virtual function as the attack of the weapon really depends on the type and level of the weapon. I can either use a virtual function of have an epic switch statement and use enums.
0

Share this post


Link to post
Share on other sites
Quote:
Don't do that. Dynamic casting and the like is something you should do when you have to do doubly virtual dispatch, and other annoying corner-cases.

Doing double dispatch with dynamic_cast is just plain silly.
The visitor pattern exists for a reason.

As for when to use virtual functions, the answer should be obvious when you're using OOP.
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Astrof
I [...] will not know at runtime what the player will use.

You mean compile time, right?

Quote:
Original post by Astrof
And hmm, so virtual functions are used with pointers...damn.

Just as they are in Java. The difference is that pointers are implicit in Java, and they are called "references" instead.

Weapon x = new Sword(); // x is a pointer to a Weapon, here it points to a Sword



Quote:
Original post by Astrof
I try to use pointers only when needed (as I frequently forget to use delete).

boost::shared_ptr to the rescue!
0

Share this post


Link to post
Share on other sites
yes sorry I did mean compile time.

hmm I'm a little bit weary of using too many libraries, but I'll look into the boost shared ptr.
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Astrof
EDIT: just saw your post NotAYakk, I think the function could be called on a per frame basis (I mean I (will) check for collision every frame, and if it did collide then I return the attack).

Something that gets called once per frame is generally not a problem, actually. It's the stuff in your inner loop that gets called hundreds or thousands of times per frame that will most affect your performance (typically, but profiling always has the final say on this). It's extremely unlikely that this particular function is going to have any significant effect on your performance, so worry about doing it correctly, not doing it the way that saves you a couple of cycles of CPU. The overhead of using virtual functions is actually quite small (basically a couple of pointer dereferences per call) so in the vast majority of cases it's better to use it than to reimplement it (probably poorly) yourself. Of course, that's if you need it at all as Ravyne pointed out, but if you do need virtual behavior then virtual functions should be your first stop, not your last.

Quote:
I can either use a virtual function of have an epic switch statement and use enums.

The latter is generally considered poor design and it will bite you at some point in the future when (for example) you add a weapon type and forget to add it to the switch. That way lies madness.:-)

Note that I speak from experience on this point, unfortunately.;-)
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Astrof
hmm I'm a little bit weary of using too many libraries, but I'll look into the boost shared ptr.

1) You shouldn't be.

2) boost is something like the playground for the next official C++ libraries, shared_ptr will be part of C++0x.
0

Share this post


Link to post
Share on other sites
shared_ptr is already available in visual studio 2008 SP1.You include <memory> and use it.It's in tr1 namespace if I'm not mistaken.
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Astrof

However, if I understand your suggestion, should I have like, one Weapon class and maybe different functions that set the attributes of the weapon? (Like:
Weapon laser=Weapon::Laser(//attributes go here);

This actually does make a bit more sense. However, what happens if I need to specify some function behavior of each weapon? (IE lasers act differently than missiles or homing shots) so would I make just these functions virtual (if I really needed the function)?

I'm not sure if this was explicitly addressed or not, but your attribute... attribution (?) should go on at a higher level, i.e. you decide that in your game, laser type X does n damage and has a cool-down rate of k seconds, and so on. Game logic-wise, you might have a file or something that contains weapon definitions (and I really would advise doing it that way) and at runtime you basically just do something like
playerWeapon.Damage = n;
playerWeapon.cooldown = k;

Then just look that stuff up when you need it. Effects might be a slightly different beast, (I would argue that polymorphism is actually the right tool for the job in this case) but only if different effect types *behave* radically different. A good example of doing this right would probably be having different base classes set up for particle emitters, such as spray/cones, bursts/clouds, lines and the like. Then you just store the number of particles and do particle position calculations, etc. in a virtual function. You theoretically could have a non-virtual draw function that just sends the particle positions to your renderer, too. The moral of the story is that if data gets handled similarly, it's usually worth it to keep things concrete and give polymorphism/inheritance a miss.

This is also a case study for why I strongly dislike Java... or at least the way it's commonly taught. As you yourself mentioned, they really inflate the importance of what should really only be corner-case solutions to problems in the guise of "make sure you know how this works!"
0

Share this post


Link to post
Share on other sites
I think the most attractive place to use virtual member functions is when using an interface concept to write more general code. This can reduce the amount of code that needs to be written, and consequentially can reduce bugs and increase efficiency.

However, there's also some loss of performance when using virtual functions and inheritance, so I would avoid them for doing anything that's computationally intensive and could be a bottleneck.

For example, you should not make a pure virtual Matrix class with different implementations for sparse, static, dynamic, symmetric, etc.

However, it might be a reasonable decision to make a virtual class to interface a component with different sources of input that all have completely different implementations, if it's not performance critical.

Any time you can use virtual methods, there are alternatives you could also use which are probably more efficient which you should also consider (eg, templates). The advantage you get with virtual methods is that in some cases the code can be simpler to write and the compiler can give more intuitive error messages to help in debugging.
0

Share this post


Link to post
Share on other sites
Note that the "and" was meant logically. Virtual function overhead requires something done on a per-frame and per-pixel basis. Ie, if on every frame, you call the virtual function once per pixel, then you are starting to look at the the point where you should worry about virtual function overhead.

Before that (or other similar cases), it isn't a concern.

...

No, don't do epic switch statements. Virtual functions, virtual factories (storing a pointer to/reference to/name of the weapon type factory so you can do weapon upgrades), etc -- both are far better options.

...

The standard visitor pattern pulls off reverse single-dispatch, not double dispatch, barring manually writing a compile-time switch in the base class via function overriding.

Ie:

struct B;
struct C;
struct D;
struct E;
struct A {
virtual void StartIt( A &other ) = 0; // child must call { other.DoIt(*this); }
virtual void DoIt( A & ) = 0;
virtual void DoIt( B & b ); // calls DoIt( A & );
virtual void DoIt( C & c ); // calls DoIt( B & );
virtual void DoIt( D & d ); // calls DoIt( A & );
virtual void DoIt( E & e ); // calls DoIt( A & ); // oops, a bug
};

struct B: A {
virtual void StartIt( A & other ) { other.DoIt( *this ); }
virtual void DoIt( A & ) { cout << "B does it with A\n"; }
virtual void DoIt( D & d ) { cout << "B does it with D\n"; }
};

struct C: B {
...
};

struct D: A {
...
};

// note: bug in class A, where it forwards DoIt(E &) to DoIt(A &) instead of
// DoIt(D &). The downside of this pattern is that you have to maintain the
// class hierarchy in two different spots, and an error in either causes random
// behavior...
struct E: D {
...
};




Note that the above source block is not anything anyone in "For Beginners" should be using. (And anyone else should be careful about using -- the above example is pretty pathological).

[Edited by - NotAYakk on February 18, 2009 10:53:39 AM]
0

Share this post


Link to post
Share on other sites
Quote:
Original post by NotAYakk
Note that implementing a faster/leaner virtual function object system in C++ is possible, but (A) it isn't worth it in most every case, and (B) it isn't easy to get right.

Do you now mean a virtual-function object system or a virtual function-object system? Anyways, I can guess what you mean.

Just out of curiosity, do you have any evidence for that? Because I have a hard time to believe that one can program a system for virtual functions that is faster and leaner and at the same time equally mighty. At least not inside valid C++. But see the following.

Quote:
Original post by NotAYakk
Virtual function overhead requires something done on a per-frame and per-pixel basis. Ie, if on every frame, you call the virtual function once per pixel, then you are starting to look at the the point where you should worry about virtual function overhead.

Before that (or other similar cases), it isn't a concern.

...


Calling virtual functions on per pixel per frame basis doesn't mean anything. Actually, virtual functions are not that bad compared to ordinary, non-inlined calls, when they actually contain code. For example, if you do some dot-products, square roots, Reinhard operators, and the usual stuff one does at per-pixel level, then the cost gets neglibigle.


Let's ask Dr.GCC about the cost of virtual function calls


Preparation
#include <cstdio>

#ifdef __GNUC__
#define MARK(x) do{asm volatile ("# "#x);}while(0)
#define NO_OPTIMIZE_AWAY() do{asm("");}while(0)
#else
#error "dies"
#endif

struct Interface {
virtual int fun() const = 0;
virtual int fun2 (int, int) const = 0;
virtual float dot3 (const float *, const float *) const = 0;
};

struct Foo : public Interface {
int const ff;
Foo (int ff) : ff (ff) {}

int fun() const {
return ff;
}

int fun2 (int, int) const {
return ff;
}

float dot3 (const float *lhs, const float *rhs) const {
return lhs[0]*rhs[0] + lhs[1]*rhs[1] + lhs[2]*rhs[2];
}
};

struct Oof : public Foo {
int const ff;
Oof (int ff) : Foo (0), ff (ff) {}

int fun() const {
return ff;
}
};

struct Bar {
int const ff;
Bar (int ff) : ff (ff) {}
int fun() const { return ff; }
int fun2 (int, int) const {
return ff;
}
float dot3 (const float *lhs, const float *rhs) const {
return lhs[0]*rhs[0] + lhs[1]*rhs[1] + lhs[2]*rhs[2];
}
};


Tests
int main () {
int entropy;
scanf ("", &entropy);
Interface &foo = *new Foo(entropy);
MARK("Foo starts here ... ");
entropy = foo.fun();
MARK("... and here it ends.");
printf ("", entropy);
delete &foo;

scanf ("", &entropy);
Interface &oof = *new Oof(entropy);
MARK("Oof starts here ... ");
entropy = oof.fun();
MARK("... and here it ends.");
printf ("", entropy);
delete &oof;

scanf ("", &entropy);
Bar &bar = *new Bar(entropy);
MARK("Bar starts here ... ");
entropy = bar.fun();
MARK("... and here it ends.");
printf ("", entropy);
delete &bar;
}



Results



# call to Foo::fun()
movl -12(%ebp), %eax
movl (%eax), %edx
movl -12(%ebp), %eax
movl %eax, (%esp)
movl (%edx), %eax
call *%eax
movl %eax, -8(%ebp)

# call to Oof::fun()
movl -16(%ebp), %eax
movl (%eax), %edx
movl -16(%ebp), %eax
movl %eax, (%esp)
movl (%edx), %eax
call *%eax
movl %eax, -8(%ebp)

# call to Bar::fun()
movl -20(%ebp), %eax
movl %eax, (%esp)
call __ZNK3Bar3funEv
movl %eax, -8(%ebp)


We see that the op-count grew from 4 to 7 (increased by a factor of 1.75). The costs for the pure call will partially vanish with an increasing number of operands.


Adding Parameters


Let's add another test-case (you can copy this into the function main() if you're trying yourself):
*snip*
int main () {
*snip*
int entropy2, entropy3;
scanf ("", &entropy, &entropy2, &entropy3);
Interface &foo2 = *new Foo(entropy);
MARK("Foo2 starts here ... ");
entropy = foo2.fun2 (entropy2,entropy3);
MARK("... and here it ends.");
printf ("", entropy, entropy2, entropy3);
delete &foo2;

scanf ("", &entropy, &entropy2, &entropy3);
Bar &bar2 = *new Bar(entropy);
MARK("Bar2 starts here ... ");
entropy = bar2.fun2 (entropy2,entropy3);
MARK("... and here it ends.");
printf ("", entropy, entropy2, entropy3);
delete &bar2;
}

The results for this:

# Foo::fun2(int,int)
movl -32(%ebp), %eax
movl (%eax), %edx
addl $4, %edx
movl -28(%ebp), %eax
movl %eax, 8(%esp)
movl -24(%ebp), %eax
movl %eax, 4(%esp)
movl -32(%ebp), %eax
movl %eax, (%esp)
movl (%edx), %eax
call *%eax
movl %eax, -8(%ebp)
# Bar::fun2(int,int)
movl -28(%ebp), %eax
movl %eax, 8(%esp)
movl -24(%ebp), %eax
movl %eax, 4(%esp)
movl -36(%ebp), %eax
movl %eax, (%esp)
call __ZNK3Bar4fun2Eii
movl %eax, -8(%ebp)

This time, it increased by 1.5 (12 ops virtual, 8 ops non-virtual).


Adding code at the target site


This was just the call! Now we'll add upp the code of the functions themselves. The function definitions look all the same, so I just show one of them:

__ZNK3Bar4fun2Eii:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl (%eax), %eax
popl %ebp
ret


We see that to the call-ops, we have to add 6 more ops, so our previous growage of 1.75 decreases to (7+6) / (4+6) = 1.3, and 1.5 decreaes to (12+6) / (8+6) = approx. 1.29.


Adding actual code to the target site


Now let's have a look at the assembly produced for our dot-product-functions. Remember, they looked as simple as:
float dot3 (const float *lhs, const float *rhs) const {
return lhs[0]*rhs[0] + lhs[1]*rhs[1] + lhs[2]*rhs[2];
}


The assembly for all versions again look the same, so here is just one of them:

pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
movl 16(%ebp), %edx
flds (%eax)
fmuls (%edx)
movl 12(%ebp), %eax
addl $4, %eax
movl 16(%ebp), %edx
addl $4, %edx
flds (%eax)
fmuls (%edx)
faddp %st, %st(1)
movl 12(%ebp), %eax
addl $8, %eax
movl 16(%ebp), %edx
addl $8, %edx
flds (%eax)
fmuls (%edx)
faddp %st, %st(1)
popl %ebp
ret


And we find 22 instructions, for a simple dot-product. Our additional cost is now:
  • (12+22) / (8+22) = approx. 1.1333

So instead of 40 frames per second, you get "only" 36, but you get many benefits. Also, just add some more operations, and the relative overhead further decreases. And the blanket statement "virtual functions at per pixel per frame level considered harmful" (not sic!) vanishes.

We learn today (how pathetic :D) that the cost of calling a virtual function compared to a non-inlined non-virtual function vanishes with the number of arguments you pass to the function, the complexity of what is inside the function and with the number of return values (in C/C++ either 0 or 1)



Quote:
No, don't do epic switch statements.

But sometimes, only sometimes, epic switch statements are the right thing. For example if you write a performant software based virtual machine, with many trivial micro operations (e.g. add or mul), then you want jump tables. And jump tables you generally only get with epic switch statements, or by relying on pointers-to-labels (GCC has support for them, but they are not standard).


[Edited by - phresnel on February 19, 2009 6:53:32 AM]
0

Share this post


Link to post
Share on other sites
call *%eax

<snip>

call __ZNK3Bar4fun2Eii


The first causes (often) a pipeline stall. The second never does. Please try to *profile* your speed tests before making any statements. You're not working on 386's, 486's or Pentiums, your CPU is not trivial anymore.
0

Share this post


Link to post
Share on other sites
Quote:
So instead of 40 frames per second, you get "only" 36, but you get many benefits. Also, just add some more operations, and the relative overhead further decreases. And the blanket statement "virtual functions at per pixel per frame level considered harmful" (not sic!) vanishes.


I'd say it's still a bad plan to call virtual functions per pixel. The main reason for getting a big performance gain by replacing a virtual function with a non-virtual one that the compiler can inline the non-virtual function call. Inlining can give huge performance improvements, especially for short functions that are called from inside loops. In the best case it could work out that the function result is actually loop invariant and hoist it outside the loop.

In most cases virtual functions won't ever be inlined, although some compilers may do it in limited circumstances (e.g. when there's no inheritance actually happening).

For those reasons I'd say that for a big function being virtual or not is unlikely to make a noticeable performance difference, but for small ones avoid it where you can, especially where the function is called thousands of times a frame.
0

Share this post


Link to post
Share on other sites
so...as long as I don't spam virtual functions everywhere I should be fine?
(Sorry, I did not mean to start a war, I was just wondering). I guess I'll have to take more care in planning my classes.
0

Share this post


Link to post
Share on other sites
Quote:
Original post by dascandy
call *%eax

<snip>

call __ZNK3Bar4fun2Eii


The first causes (often) a pipeline stall. The second never does. Please try to *profile* your speed tests before making any statements. You're not working on 386's, 486's or Pentiums, your CPU is not trivial anymore.


They key concepts here are the Branch History Table (as in "your CPU is not trivial anymore" [sic!]) on modern CPU and the fact that the advantage of non-virtual functions decreases with increasing complexity of the function itself.

You are right, I should have used a profiler, but unfortunately, I don't have one handy on this box. Hence I wrote the following benchmark (mostly Std-C++ compatible, except for the asm() macro and the __attributes() for the sake of benchmarking).


Benchmark

I tried to be as careful as possible, if I have done something bogus, please let me know. So, here it is:

#ifdef __GNUC__
#define MARK(x) do{asm volatile ("# "#x);}while(0)
#define NO_OPTIMIZE_AWAY() do{asm("");}while(0)
#else
#error "dies"
#endif

#include <ctime>
#include <iostream>
#include <cmath>

struct StaticNoInline {
float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((noinline)) {
float const diff [] = {
rhs [0] - lhs [0],
rhs [1] - lhs [1],
rhs [2] - lhs [2],
};
float const lenSq = diff [0]*diff [0] + diff [1]*diff [1] + diff [2]*diff [2];
return sqrt (lenSq);
}
};

struct StaticForceInline {
float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((forceinline)) {
float const diff [] = {
rhs [0] - lhs [0],
rhs [1] - lhs [1],
rhs [2] - lhs [2],
};
float const lenSq = diff [0]*diff [0] + diff [1]*diff [1] + diff [2]*diff [2];
return sqrt (lenSq);
}
};

struct IVirtual {
virtual float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((noinline)) = 0;
};

struct Virtual : public IVirtual {
float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((noinline)) {
float const diff [] = {
rhs [0] - lhs [0],
rhs [1] - lhs [1],
rhs [2] - lhs [2],
};
float const lenSq = diff [0]*diff [0] + diff [1]*diff [1] + diff [2]*diff [2];
return sqrt (lenSq);
}
};

template <typename T, typename I, uint32_t count>
void test () {
static volatile float dist;
static volatile float lhs [3];
static volatile float rhs [3];

::std::cout << "Beginning test for " << typeid (T).name() << ", count is " << count << '\n';
I &subject = *new T ();
clock_t const beginT = clock();
MARK("entering loop");
for (uint32_t i=0; i<count; ++i) {
dist = subject.dist (lhs, rhs);
}
MARK("left loop");
clock_t const endT = clock();
delete &subject;
::std::cout << "Test ended, total time " << endT - beginT << "msec.\n";
}

int main () {
const uint32_t count = 1<<27;
test<Virtual, IVirtual, count> ();
test<StaticNoInline, StaticNoInline, count> ();
test<StaticForceInline, StaticForceInline, count> ();
// Re-execute to take care of CPU heat and priority switches
test<Virtual, IVirtual, count> ();
test<StaticNoInline, StaticNoInline, count> ();
test<StaticForceInline, StaticForceInline, count> ();
::std::cout << ::std::flush;
}



Results

I get the following results:

  Beginning test for 7Virtual, count is 134217728
Test ended, total time 5828msec.
Beginning test for 14StaticNoInline, count is 134217728
Test ended, total time 5839msec.
Beginning test for 17StaticForceInline, count is 134217728
Test ended, total time 5118msec.
Beginning test for 7Virtual, count is 134217728
Test ended, total time 5798msec.
Beginning test for 14StaticNoInline, count is 134217728
Test ended, total time 5809msec.
Beginning test for 17StaticForceInline, count is 134217728
Test ended, total time 5097msec.
clock() is not too exact (especially on Windows(R)), so I ran it for over 5 seconds (134,217,728 loops). In more readable form, the results are:
  • virtual: 5.828 sec, 5.798 sec = 5.813 sec on average
  • static, not inlined: 5.839 sec, 5.809 sec = 5.824 sec on average
  • static, inlined: 5.118 sec, 5.097 sec = 5.1075 sec on average
The static and the virtual variant run at nearly the same speed (roughly 5.8185 sec on average), the inlined version runs in only roughly 87.78 % of the time.


Upshot

Certainly, inlining helps with a bunch of other optimizations, like auto-vectorisation or re-use of data, plus it increases locality, but has the disadvantage of potential code-bloat. As this is a game-dev site, the latter might be negligible :D .

At the time where the compiler decides to not inline a function, the performance difference between virtual and non-virtual decreases (w.r.t. several parameters and configurations), especially in tight loops. So if it would be a great benefit for your application to use virtual functions, then you might decide to use them.


Example

Just to give an example (blatant self promotion, sorry), my ray tracer picogen uses virtual functions for one of its shader systems (basically an executable abstract syntax tree; the performance drop is not too big as the shader is only called for definitive correct intersections, where in the meanwhile many other operations happened), plus it uses virtual functions for its intersectable objects (which include a quadtree, a BIH, a kd-tree oops that was the ray tracer I wrote before that, sorry, spheres of course, a cube, a simple DDA based heightmap, and some hacks where I tried things out). The general BIH is an aggregate that can house any other type of Intersectable, including specific and general BIH. Also, I implemented a Whitted-style Ray Tracer and a Path Tracer (called "Surface Integrators", like pbrt calls it), which definitely are called for each pixel, even if there is no intersection at all.

Without virtual functions, this all would have been far clumsier, and not necessarily faster. Had I organized those objects in multiple lists, code could have even run slower than with virtual functions.


Quote:
Astrof
so...as long as I don't spam virtual functions everywhere I should be fine?
(Sorry, I did not mean to start a war, I was just wondering). I guess I'll have to take more care in planning my classes.


No war at all :D
Maybe my above post could help you. IMHO, the answer is yes, as long as you don't use them excessively, it should be fine.

[Edited by - phresnel on February 19, 2009 10:38:18 AM]
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Adam_42

I'd say it's still a bad plan to call virtual functions per pixel. The main reason for getting a big performance gain by replacing a virtual function with a non-virtual one that the compiler can inline the non-virtual function call. Inlining can give huge performance improvements, especially for short functions that are called from inside loops. In the best case it could work out that the function result is actually loop invariant and hoist it outside the loop.


No, none of that is relevant, since rule of a thumb applies elsewhere. Virtual functions add some overhead in some cases, and none in others. But it's not the virtual function mechanisms that cause the real performance impact.

Run-time polymorphism is about choice of code path being unknown at compile-time.

If your code looks like this:
for (...) {
switch (actions[i]) {
case 0: pixels[i] = 0; break;
case 1: pixels[i] = 1; break;
}
}
or
for (...) {
pixels[i]->doAction();
}
Makes minor difference. Obviously the virtual function implementation will determine the exact overhead, but in either case, the choice needs to be made at run time.

Switch may be slightly more efficient simply because it carries more contextual information. Compiler is hence given more information on how to optimize the code, which may not be possible when using polymorphic function. It would know, for example, that there are only 2 values vs. arbitrary pointer in case of polymorphism.

If it talks like a duck, walks like a duck, .... then it's a duck. Both of the above examples are run-time polymorphic behavior.


Consider non-polymorphic behavior:
// in some handler:
void onClick(int x, int y) {
color_to_set[y*h+x] = 1;
}

// in loop
for (...) {
pixels[i] = color_to_set[i];
}
memset(color_to_set, 0, n);
or even just
memcpy(pixels, color_to_set, n);
memset(color_to_set, 0, n);


Here, we move the choice outside of main loop, and simply presume that color_to_set can be completely pre-calculated elsewhere. Although it may seem no choice is involved, it is actually performed by the UI handler and OS, which chooses receives mouse input, finds the component, translates mouse coordinates into screen coordinates, etc....


There is *no* criteria that would allow to generalize whether virtual function has performance impact. Implementations of virtual dispatch can be compared, but as soon as you need polymorphic behavior you pay the cost of run-time decision.

The only criteria is: Can I make the decision at compile-time?.
Both, switch statement and virtual function cannot.

Switch statements trade off control over branching for ease of use. They are a perfectly valid alternative to language provided virtual call.

Alternative implementation of virtual dispatch that is useful if decision space is small (common networking idiom):
struct Foo {
Foo(bool as_local) : is_local(as_local) {}
void bar() {
is_local ? local_bar() : remote_bar();
}
private:
void local_bar();
void remote_bar();

bool is_local;
};
In this case we provide compiler with several very string hints/guarantees.
- all types are concrete
- compiler knows that it's an either/or choice.
- if is_local is provided at compile-time (Foo foo(true);), then compiler can often deduce entire code-path.

But again - this is not because virtual is slow. It is because we told the compiler exactly what can happen, how and why, rather than simply declaring a virtual, which is basically: "yea, whatever".
0

Share this post


Link to post
Share on other sites
Please note: "start to worry" doesn't mean "ban it and refuse to use it".

It means that _short of_ doing something on a per-pixel and per-frame basis (with reasonable assumptions about the ratio between processor power and display resolution), worrying about virtual function overhead is probably silly.

Similarly, worrying about branch overhead before you are at a per-pixel and per-frame basis is also pretty likely to be silly.

Virtual functions are run-time polymorphic behaviour. And most other solutions that cause run-time polymorphic behaviour are equally costly. The issue is that the decision about which run-time polymorphic behaviour you will be using might be decided at a higher level of abstraction than what your virtual function call allows, and it is difficult for your compiler to detect that.

So if you are doing something on a per-pixel per-frame basis, then you _might_ find that virtual function overhead (or branch overhead) is causing a problem. If that is the case, then you can look into the alternatives to using a virtual function.

...

A manually implemented virtual dispatch system for objects can be leaner/faster and not as mighty, simply because you are working on a far more limited problem domain. As an easy example, throwing out multiple and virtual inheritance from the object model could simplify your problems.

You might be able to build a singleton virtual class system based off of shared libraries. Depending on your runtime environment (and shared library loader), this might result in a lower at-call overhead. (This is a very limited set of circumstances -- it works in theory, but I cannot think of a concrete example. The idea is that your singleton class is described as set of functions to a library. When the library is loaded, the placeholder calls into the library in the loading binary are edited at run-time to point to the actual methods. This is virtual, because which shared library you load is not determined until run time. And yes, this is pretty pathological. But I think most shared library systems involve changing a jump table instead of every call into the shared library in the loading binary. And then the overhead reduction is pretty minimal (but it can help with pipeline stalls)).

I've heard that MS's MFC dynamic dispatch system was partially due to virtual function overhead. I suspect that this, if true, was an artifact of C++ being a young language at the time.

On the other side of the coin, C++ is missing run-time generated classes and/or per-instance classes. If you implement your own object system, this can fall out pretty cheaply. (Being able to dynamically decide that one particular method in a particular class and/or instance changes from one mode to another can be powerful, and efficient, yet very hairy.)

As noted, these cases are hard, often difficult to implement in any sane way. You don't want to do these things usually.

:)

And yes, huge switch statements can be used in some corner cases. Often it can be a good idea to write a tool that writes the switch statement for you, and have the tool read in input that is less pathological. Last I looked at them, bison/flex (or yacc/lex) used this strategy.
0

Share this post


Link to post
Share on other sites
Quote:
Original post by NotAYakk
On the other side of the coin, C++ is missing run-time generated classes and/or per-instance classes. If you implement your own object system, this can fall out pretty cheaply. (Being able to dynamically decide that one particular method in a particular class and/or instance changes from one mode to another can be powerful, and efficient, yet very hairy.)


The easy way is using the State pattern, or the Strategy pattern. Both are found in GoF.

The middle way is by replacing some functions with std::function objects and to compose them into new functions (std::function is like boost::function).

The hard way is using techniques like mocking frameworks, that can replace a function altogether without any residual overhead over the original virtual function. Try Hippo Mocks.
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
If your code looks like this:
for (...) {
switch (actions[i]) {
case 0: pixels[i] = 0; break;
case 1: pixels[i] = 1; break;
}
}
....


But you should also consider if you can simplify the logic to eliminate branching, such as


pixels[i] = actions[i];


Of course, I assume you didn't mean for your example to reduce so trivially. But real-world cases often reduce more easily than initially expected, with only a little additional thought. :)
0

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0