When to use Virtual Functions?

Started by
21 comments, last by Zahlman 15 years, 1 month ago
but if I wanted to make it cross platform I'd have to use boost right?
Quote:Original post by Astrof

However, if I understand your suggestion, should I have like, one Weapon class and maybe different functions that set the attributes of the weapon? (Like:
Weapon laser = Weapon::Laser(/* attributes go here */);

This actually does make a bit more sense. However, what happens if I need to specify some function behavior of each weapon? (IE lasers act differently than missiles or homing shots) so would I make just these functions virtual (if I really needed the function)?

I'm not sure if this was explicitly addressed or not, but your attribute... attribution (?) should go on at a higher level, i.e. you decide that in your game, laser type X does n damage and has a cool-down rate of k seconds, and so on. Game logic-wise, you might have a file or something that contains weapon definitions (and I really would advise doing it that way) and at runtime you basically just do something like
playerWeapon.Damage = n;
playerWeapon.cooldown = k;
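For instance, the data-driven setup might look like this minimal sketch, where an inline string stands in for the weapon-definition file; the format and all names (WeaponDef, LoadWeapons) are invented for illustration:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// One record per weapon type, filled in from the definition data at runtime.
struct WeaponDef { float damage; float cooldown; };

// Reads "name damage cooldown" triples; a real game would read from a file.
std::map<std::string, WeaponDef> LoadWeapons(std::istream& in) {
    std::map<std::string, WeaponDef> defs;
    std::string name;
    float dmg, cd;
    while (in >> name >> dmg >> cd)
        defs[name] = WeaponDef{dmg, cd};
    return defs;
}
```

Then `playerWeapon.Damage = defs["laser"].damage;` is a plain lookup, with no per-weapon class needed.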

Then just look that stuff up when you need it. Effects might be a slightly different beast (I would argue that polymorphism is actually the right tool for the job in this case), but only if different effect types *behave* radically differently. A good example of doing this right would probably be having different base classes set up for particle emitters, such as spray/cones, bursts/clouds, lines and the like. Then you just store the number of particles and do particle position calculations, etc. in a virtual function. You could theoretically have a non-virtual draw function that just sends the particle positions to your renderer, too. The moral of the story is that if data gets handled similarly, it's usually worth keeping things concrete and giving polymorphism/inheritance a miss.
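A hedged sketch of that emitter idea, with invented names (Emitter, ConeEmitter) and a toy update rule: only the *behaviour* (position updates) is virtual, while the data layout and the draw-side access stay concrete and shared:

```cpp
#include <cstddef>
#include <vector>

struct Particle { float x, y, z; };

class Emitter {
public:
    explicit Emitter(std::size_t count) : particles_(count) {}
    virtual ~Emitter() = default;

    // Behaviour differs radically per emitter type -> virtual.
    virtual void Update(float dt) = 0;

    // Data is handled the same way everywhere -> non-virtual;
    // a renderer could read positions from here directly.
    const std::vector<Particle>& Positions() const { return particles_; }

protected:
    std::vector<Particle> particles_;
};

class ConeEmitter : public Emitter {
public:
    using Emitter::Emitter;
    void Update(float dt) override {
        for (Particle& p : particles_)
            p.y += 2.0f * dt;   // toy stand-in for a cone-spray rule
    }
};
```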

This is also a case study for why I strongly dislike Java... or at least the way it's commonly taught. As you yourself mentioned, they really inflate the importance of what should really only be corner-case solutions to problems in the guise of "make sure you know how this works!"
clb: At the end of 2012, the positions of jupiter, saturn, mercury, and deimos are aligned so as to cause a denormalized flush-to-zero bug when computing earth's gravitational force, slinging it to the sun.
I think the most attractive place to use virtual member functions is when using an interface concept to write more general code. This can reduce the amount of code that needs to be written, and consequently can reduce bugs and increase efficiency.
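For instance, a minimal sketch of that interface idea, with invented names (Logger, ReportStartup): the general code is written once against the abstract type, and every implementation reuses it unchanged:

```cpp
#include <string>

// The interface: anything that can write a message.
struct Logger {
    virtual ~Logger() = default;
    virtual void Write(const std::string& msg) = 0;
};

// General code written once, against the interface only.
void ReportStartup(Logger& log) {
    log.Write("engine started");
    log.Write("assets loaded");
}

// One possible implementation; console, file, or network loggers
// would plug into ReportStartup the same way.
struct CountingLogger : Logger {
    int lines = 0;
    void Write(const std::string&) override { ++lines; }
};
```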

However, there's also some loss of performance when using virtual functions and inheritance, so I would avoid them for doing anything that's computationally intensive and could be a bottleneck.

For example, you should not make a pure virtual Matrix class with different implementations for sparse, static, dynamic, symmetric, etc.

However, it might be a reasonable decision to make a virtual class to interface a component with different sources of input that all have completely different implementations, if it's not performance critical.

Any time you could use virtual methods, there are alternatives that are probably more efficient and also worth considering (e.g., templates). The advantage of virtual methods is that in some cases the code can be simpler to write, and the compiler can give more intuitive error messages to help in debugging.
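A minimal sketch of the template alternative, with invented names (TotalDamage, Laser, Rocket): the "interface" becomes a compile-time parameter, so every call binds statically and can be inlined, at the cost of template-style error messages:

```cpp
// The same duck-typed "interface" (a Damage() method), resolved at
// compile time instead of through a vtable.
template <typename Weapon>
int TotalDamage(const Weapon& w, int shots) {
    int total = 0;
    for (int i = 0; i < shots; ++i)
        total += w.Damage();   // statically bound, inlinable
    return total;
}

struct Laser  { int Damage() const { return 3; } };
struct Rocket { int Damage() const { return 10; } };
```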
Note that the "and" was meant logically. Virtual function overhead requires something done on a per-frame *and* per-pixel basis. I.e., if on every frame you call the virtual function once per pixel, then you are starting to reach the point where you should worry about virtual function overhead.

Before that (or other similar cases), it isn't a concern.

...

No, don't do epic switch statements. Virtual functions, virtual factories (storing a pointer to/reference to/name of the weapon type factory so you can do weapon upgrades), etc. -- these are far better options.
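A hedged sketch of the virtual-factory idea, all names invented (WeaponFactory, Player): the player holds a pointer to the factory for their current weapon type, and an upgrade just swaps that pointer rather than touching the firing code:

```cpp
struct Shot { int damage; };

// Factory interface: one concrete factory per weapon type.
struct WeaponFactory {
    virtual ~WeaponFactory() = default;
    virtual Shot Fire() const = 0;
};

struct LaserFactory : WeaponFactory {
    Shot Fire() const override { return Shot{3}; }
};
struct MissileFactory : WeaponFactory {
    Shot Fire() const override { return Shot{12}; }
};

struct Player {
    const WeaponFactory* weapon;            // swap this to upgrade
    Shot Fire() const { return weapon->Fire(); }
};
```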

...

The standard visitor pattern pulls off reverse single-dispatch, not double dispatch, barring manually writing a compile-time switch in the base class via function overriding.

Ie:
struct B;
struct C;
struct D;
struct E;

struct A {
    virtual void StartIt( A &other ) = 0; // child must call { other.DoIt(*this); }
    virtual void DoIt( A & ) = 0;
    virtual void DoIt( B & b ); // calls DoIt( A & );
    virtual void DoIt( C & c ); // calls DoIt( B & );
    virtual void DoIt( D & d ); // calls DoIt( A & );
    virtual void DoIt( E & e ); // calls DoIt( A & ); // oops, a bug
};

struct B : A {
    virtual void StartIt( A &other ) { other.DoIt( *this ); }
    virtual void DoIt( A & ) { cout << "B does it with A\n"; }
    virtual void DoIt( D & d ) { cout << "B does it with D\n"; }
};

struct C : B { ... };
struct D : A { ... };

// note: bug in class A, where it forwards DoIt(E &) to DoIt(A &) instead of
// DoIt(D &).  The downside of this pattern is that you have to maintain the
// class hierarchy in two different spots, and an error in either causes random
// behavior...
struct E : D { ... };


Note that the above source block is not anything anyone in "For Beginners" should be using. (And anyone else should be careful about using -- the above example is pretty pathological).

[Edited by - NotAYakk on February 18, 2009 10:53:39 AM]
Quote:Original post by NotAYakk
Note that implementing a faster/leaner virtual function object system in C++ is possible, but (A) it isn't worth it in most every case, and (B) it isn't easy to get right.

Do you now mean a virtual-function object system or a virtual function-object system? Anyways, I can guess what you mean.

Just out of curiosity, do you have any evidence for that? Because I have a hard time believing that one can program a system for virtual functions that is faster and leaner and at the same time equally mighty. At least not inside valid C++. But see the following.

Quote:Original post by NotAYakk
Virtual function overhead requires something done on a per-frame and per-pixel basis. I.e., if on every frame you call the virtual function once per pixel, then you are starting to reach the point where you should worry about virtual function overhead.

Before that (or other similar cases), it isn't a concern.

...


Calling virtual functions on a per-pixel, per-frame basis doesn't mean anything by itself. Actually, virtual functions are not that bad compared to ordinary, non-inlined calls, when they actually contain code. For example, if you do some dot products, square roots, Reinhard operators, and the usual stuff one does at the per-pixel level, then the cost becomes negligible.


Let's ask Dr.GCC about the cost of virtual function calls


Preparation
#include <cstdio>

#ifdef __GNUC__
#define MARK(x) do{asm volatile ("# "#x);}while(0)
#define NO_OPTIMIZE_AWAY() do{asm("");}while(0)
#else
#error "dies"
#endif

struct Interface {
    virtual int fun() const = 0;
    virtual int fun2 (int, int) const = 0;
    virtual float dot3 (const float *, const float *) const = 0;
};

struct Foo : public Interface {
    int const ff;
    Foo (int ff) : ff (ff) {}
    int fun() const {
        return ff;
    }
    int fun2 (int, int) const {
        return ff;
    }
    float dot3 (const float *lhs, const float *rhs) const {
        return lhs[0]*rhs[0] + lhs[1]*rhs[1] + lhs[2]*rhs[2];
    }
};

struct Oof : public Foo {
    int const ff;
    Oof (int ff) : Foo (0), ff (ff) {}
    int fun() const {
        return ff;
    }
};

struct Bar {
    int const ff;
    Bar (int ff) : ff (ff) {}
    int fun() const { return ff; }
    int fun2 (int, int) const {
        return ff;
    }
    float dot3 (const float *lhs, const float *rhs) const {
        return lhs[0]*rhs[0] + lhs[1]*rhs[1] + lhs[2]*rhs[2];
    }
};


Tests
int main () {
    int entropy;

    scanf ("%d", &entropy);
    Interface &foo = *new Foo(entropy);
    MARK("Foo starts here ... ");
    entropy = foo.fun();
    MARK("... and here it ends.");
    printf ("%d", entropy);
    delete &foo;

    scanf ("%d", &entropy);
    Interface &oof = *new Oof(entropy);
    MARK("Oof starts here ... ");
    entropy = oof.fun();
    MARK("... and here it ends.");
    printf ("%d", entropy);
    delete &oof;

    scanf ("%d", &entropy);
    Bar &bar = *new Bar(entropy);
    MARK("Bar starts here ... ");
    entropy = bar.fun();
    MARK("... and here it ends.");
    printf ("%d", entropy);
    delete &bar;
}



Results


# call to Foo::fun()
	movl	-12(%ebp), %eax
	movl	(%eax), %edx
	movl	-12(%ebp), %eax
	movl	%eax, (%esp)
	movl	(%edx), %eax
	call	*%eax
	movl	%eax, -8(%ebp)

# call to Oof::fun()
	movl	-16(%ebp), %eax
	movl	(%eax), %edx
	movl	-16(%ebp), %eax
	movl	%eax, (%esp)
	movl	(%edx), %eax
	call	*%eax
	movl	%eax, -8(%ebp)

# call to Bar::fun()
	movl	-20(%ebp), %eax
	movl	%eax, (%esp)
	call	__ZNK3Bar3funEv
	movl	%eax, -8(%ebp)


We see that the op count grew from 4 to 7 (an increase by a factor of 1.75). The relative cost of the bare call will partially vanish as the number of operands increases.


Adding Parameters


Let's add another test-case (you can copy this into the function main() if you're trying yourself):
// *snip*

int main () {
    // *snip*
    int entropy2, entropy3;

    scanf ("%d %d %d", &entropy, &entropy2, &entropy3);
    Interface &foo2 = *new Foo(entropy);
    MARK("Foo2 starts here ... ");
    entropy = foo2.fun2 (entropy2,entropy3);
    MARK("... and here it ends.");
    printf ("%d %d %d", entropy, entropy2, entropy3);
    delete &foo2;

    scanf ("%d %d %d", &entropy, &entropy2, &entropy3);
    Bar &bar2 = *new Bar(entropy);
    MARK("Bar2 starts here ... ");
    entropy = bar2.fun2 (entropy2,entropy3);
    MARK("... and here it ends.");
    printf ("%d %d %d", entropy, entropy2, entropy3);
    delete &bar2;
}

The results for this:
# Foo::fun2(int,int)
	movl	-32(%ebp), %eax
	movl	(%eax), %edx
	addl	$4, %edx
	movl	-28(%ebp), %eax
	movl	%eax, 8(%esp)
	movl	-24(%ebp), %eax
	movl	%eax, 4(%esp)
	movl	-32(%ebp), %eax
	movl	%eax, (%esp)
	movl	(%edx), %eax
	call	*%eax
	movl	%eax, -8(%ebp)

# Bar::fun2(int,int)
	movl	-28(%ebp), %eax
	movl	%eax, 8(%esp)
	movl	-24(%ebp), %eax
	movl	%eax, 4(%esp)
	movl	-36(%ebp), %eax
	movl	%eax, (%esp)
	call	__ZNK3Bar4fun2Eii
	movl	%eax, -8(%ebp)

This time, the factor is 1.5 (12 ops virtual, 8 ops non-virtual).


Adding code at the target site


This was just the call! Now we'll add up the code of the functions themselves. The function definitions all look the same, so I'll just show one of them:
__ZNK3Bar4fun2Eii:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	(%eax), %eax
	popl	%ebp
	ret


We see that we have to add 6 more ops to the call ops, so our previous growth factor of 1.75 decreases to (7+6) / (4+6) = 1.3, and 1.5 decreases to (12+6) / (8+6) = approx. 1.29.


Adding actual code to the target site


Now let's have a look at the assembly produced for our dot-product-functions. Remember, they looked as simple as:
float dot3 (const float *lhs, const float *rhs) const {
    return lhs[0]*rhs[0] + lhs[1]*rhs[1] + lhs[2]*rhs[2];
}


The assembly for all versions again looks the same, so here is just one of them:
	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %eax
	movl	16(%ebp), %edx
	flds	(%eax)
	fmuls	(%edx)
	movl	12(%ebp), %eax
	addl	$4, %eax
	movl	16(%ebp), %edx
	addl	$4, %edx
	flds	(%eax)
	fmuls	(%edx)
	faddp	%st, %st(1)
	movl	12(%ebp), %eax
	addl	$8, %eax
	movl	16(%ebp), %edx
	addl	$8, %edx
	flds	(%eax)
	fmuls	(%edx)
	faddp	%st, %st(1)
	popl	%ebp
	ret


And we find 22 instructions for a simple dot product. Our additional cost is now:
  • (12+22) / (8+22) = approx. 1.133

So instead of 40 frames per second you get "only" 36, but you get many benefits. Also, just add some more operations and the relative overhead decreases further. And the blanket statement "virtual functions at the per-pixel, per-frame level considered harmful" (not sic!) vanishes.

We learn today (how pathetic :D) that the cost of calling a virtual function, compared to a non-inlined non-virtual function, diminishes with the number of arguments you pass to the function, the complexity of what is inside the function, and the number of return values (in C/C++ either 0 or 1).



Quote:No, don't do epic switch statements.

But sometimes, only sometimes, epic switch statements are the right thing. For example if you write a performant software based virtual machine, with many trivial micro operations (e.g. add or mul), then you want jump tables. And jump tables you generally only get with epic switch statements, or by relying on pointers-to-labels (GCC has support for them, but they are not standard).
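A toy sketch of such a dispatch loop, with an invented four-instruction set; a dense switch over small contiguous opcodes like this is exactly the shape compilers tend to turn into a jump table:

```cpp
#include <cstddef>
#include <vector>

// Invented micro-op set for a tiny stack machine.
enum Op { PUSH, ADD, MUL, HALT };

// Interpreter main loop: one dense switch per fetched opcode.
int Run(const std::vector<int>& code) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (code[pc]) {
        case PUSH: stack.push_back(code[++pc]); break;      // operand follows opcode
        case ADD:  { int b = stack.back(); stack.pop_back();
                     stack.back() += b; break; }
        case MUL:  { int b = stack.back(); stack.pop_back();
                     stack.back() *= b; break; }
        case HALT: return stack.back();
        }
    }
    return stack.empty() ? 0 : stack.back();
}
```

With virtual functions instead, each trivial op would pay an indirect call plus function-call framing; the switch keeps dispatch inside one loop body.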


[Edited by - phresnel on February 19, 2009 6:53:32 AM]
call *%eax

<snip>

call __ZNK3Bar4fun2Eii


The first causes (often) a pipeline stall. The second never does. Please try to *profile* your speed tests before making any statements. You're not working on 386's, 486's or Pentiums, your CPU is not trivial anymore.
Quote:So instead of 40 frames per second, you get "only" 36, but you get many benefits. Also, just add some more operations, and the relative overhead further decreases. And the blanket statement "virtual functions at per pixel per frame level considered harmful" (not sic!) vanishes.


I'd say it's still a bad plan to call virtual functions per pixel. The main reason for getting a big performance gain by replacing a virtual function with a non-virtual one is that the compiler can inline the non-virtual call. Inlining can give huge performance improvements, especially for short functions that are called from inside loops. In the best case it could turn out that the function result is actually loop-invariant, and the compiler can hoist it outside the loop.

In most cases virtual functions won't ever be inlined, although some compilers may do it in limited circumstances (e.g. when there's no inheritance actually happening).
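For completeness, a sketch of the "no inheritance actually happening" case using the later C++11 `final` keyword (which did not exist when this thread was written): marking the class final tells the compiler the dynamic type is fully known at such call sites, so it may devirtualize and then inline the call. Names here are invented:

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int Area() const = 0;
};

// `final`: no class can derive from Square, so Square::Area can never
// be overridden further.
struct Square final : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int Area() const override { return side * side; }
};

int AreaOf(const Square& s) {
    // Static type is a final class: the compiler is free to bind this
    // call directly to Square::Area and inline it.
    return s.Area();
}
```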

For those reasons I'd say that for a big function being virtual or not is unlikely to make a noticeable performance difference, but for small ones avoid it where you can, especially where the function is called thousands of times a frame.
so...as long as I don't spam virtual functions everywhere I should be fine?
(Sorry, I did not mean to start a war, I was just wondering). I guess I'll have to take more care in planning my classes.
Quote:Original post by dascandy
call *%eax

<snip>

call __ZNK3Bar4fun2Eii


The first causes (often) a pipeline stall. The second never does. Please try to *profile* your speed tests before making any statements. You're not working on 386's, 486's or Pentiums, your CPU is not trivial anymore.


The key concepts here are the Branch History Table (as in "your CPU is not trivial anymore" [sic!]) on modern CPUs, and the fact that the advantage of non-virtual functions decreases with the increasing complexity of the function itself.

You are right, I should have used a profiler, but unfortunately I don't have one handy on this box. Hence I wrote the following benchmark (mostly standard C++, except for the asm() macro and the __attribute__s, for the sake of benchmarking).


Benchmark

I tried to be as careful as possible, if I have done something bogus, please let me know. So, here it is:

#ifdef __GNUC__
#define MARK(x) do{asm volatile ("# "#x);}while(0)
#define NO_OPTIMIZE_AWAY() do{asm("");}while(0)
#else
#error "dies"
#endif

#include <ctime>
#include <iostream>
#include <cmath>

struct StaticNoInline {
    float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((noinline)) {
        float const diff [] = {
            rhs [0] - lhs [0],
            rhs [1] - lhs [1],
            rhs [2] - lhs [2],
        };
        float const lenSq = diff [0]*diff [0] + diff [1]*diff [1] + diff [2]*diff [2];
        return sqrt (lenSq);
    }
};

struct StaticForceInline {
    float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((forceinline)) {
        float const diff [] = {
            rhs [0] - lhs [0],
            rhs [1] - lhs [1],
            rhs [2] - lhs [2],
        };
        float const lenSq = diff [0]*diff [0] + diff [1]*diff [1] + diff [2]*diff [2];
        return sqrt (lenSq);
    }
};

struct IVirtual {
    virtual float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((noinline)) = 0;
};

struct Virtual : public IVirtual {
    float dist (float const volatile *lhs, float const volatile *rhs) __attribute__((noinline)) {
        float const diff [] = {
            rhs [0] - lhs [0],
            rhs [1] - lhs [1],
            rhs [2] - lhs [2],
        };
        float const lenSq = diff [0]*diff [0] + diff [1]*diff [1] + diff [2]*diff [2];
        return sqrt (lenSq);
    }
};

template <typename T, typename I, uint32_t count>
void test () {
    static volatile float dist;
    static volatile float lhs [3];
    static volatile float rhs [3];

    ::std::cout << "Beginning test for " << typeid (T).name() << ", count is " << count << '\n';

    I &subject = *new T ();
    clock_t const beginT = clock();
    MARK("entering loop");
    for (uint32_t i=0; i<count; ++i) {
        dist = subject.dist (lhs, rhs);
    }
    MARK("left loop");
    clock_t const endT = clock();
    delete &subject;

    ::std::cout << "Test ended, total time " << endT - beginT << "msec.\n";
}

int main () {
    const uint32_t count = 1<<27;

    test<Virtual, IVirtual, count> ();
    test<StaticNoInline, StaticNoInline, count> ();
    test<StaticForceInline, StaticForceInline, count> ();

    // Re-execute to take care of CPU heat and priority switches
    test<Virtual, IVirtual, count> ();
    test<StaticNoInline, StaticNoInline, count> ();
    test<StaticForceInline, StaticForceInline, count> ();

    ::std::cout << ::std::flush;
}



Results

I get the following results:

  Beginning test for 7Virtual, count is 134217728
  Test ended, total time 5828msec.
  Beginning test for 14StaticNoInline, count is 134217728
  Test ended, total time 5839msec.
  Beginning test for 17StaticForceInline, count is 134217728
  Test ended, total time 5118msec.
  Beginning test for 7Virtual, count is 134217728
  Test ended, total time 5798msec.
  Beginning test for 14StaticNoInline, count is 134217728
  Test ended, total time 5809msec.
  Beginning test for 17StaticForceInline, count is 134217728
  Test ended, total time 5097msec.
clock() is not too exact (especially on Windows(R)), so I ran it for over 5 seconds (134,217,728 iterations). In more readable form, the results are:
  • virtual: 5.828 sec, 5.798 sec = 5.813 sec on average
  • static, not inlined: 5.839 sec, 5.809 sec = 5.824 sec on average
  • static, inlined: 5.118 sec, 5.097 sec = 5.1075 sec on average
The static and the virtual variant run at nearly the same speed (roughly 5.8185 sec on average), the inlined version runs in only roughly 87.78 % of the time.


Upshot

Certainly, inlining helps with a bunch of other optimizations, like auto-vectorization or re-use of data, plus it increases locality, but it has the disadvantage of potential code bloat. As this is a game-dev site, the latter might be negligible :D .

Once the compiler decides not to inline a function, the performance difference between virtual and non-virtual shrinks (depending on several parameters and configurations), especially in tight loops. So if using virtual functions would be a great benefit for your application, you might well decide to use them.


Example

Just to give an example (blatant self-promotion, sorry): my ray tracer picogen uses virtual functions for one of its shader systems (basically an executable abstract syntax tree; the performance drop is not too big, as the shader is only called for definitively correct intersections, by which point many other operations have already happened). It also uses virtual functions for its intersectable objects, which include a quadtree, a BIH, a kd-tree (oops, that was the ray tracer I wrote before that, sorry), spheres of course, a cube, a simple DDA-based heightmap, and some hacks where I tried things out. The general BIH is an aggregate that can house any other type of Intersectable, including specific and general BIHs. Also, I implemented a Whitted-style ray tracer and a path tracer (called "Surface Integrators", as pbrt calls them), which are definitely called for each pixel, even if there is no intersection at all.

Without virtual functions, this all would have been far clumsier, and not necessarily faster. Had I organized those objects in multiple lists, code could have even run slower than with virtual functions.


Quote:Astrof
so...as long as I don't spam virtual functions everywhere I should be fine?
(Sorry, I did not mean to start a war, I was just wondering). I guess I'll have to take more care in planning my classes.


No war at all :D
Maybe my above post could help you. IMHO, the answer is yes, as long as you don't use them excessively, it should be fine.

[Edited by - phresnel on February 19, 2009 10:38:18 AM]

Quote:Original post by Adam_42

I'd say it's still a bad plan to call virtual functions per pixel. The main reason for getting a big performance gain by replacing a virtual function with a non-virtual one that the compiler can inline the non-virtual function call. Inlining can give huge performance improvements, especially for short functions that are called from inside loops. In the best case it could work out that the function result is actually loop invariant and hoist it outside the loop.


No, none of that is relevant, since the rule of thumb applies elsewhere. Virtual functions add some overhead in some cases, and none in others. But it's not the virtual function mechanism that causes the real performance impact.

Run-time polymorphism is about choice of code path being unknown at compile-time.

If your code looks like this:
for (...) {
  switch (actions) {
    case 0: pixels = 0; break;
    case 1: pixels = 1; break;
  }
}
or
for (...) {
  pixels->doAction();
}
makes little difference. Obviously the virtual function implementation will determine the exact overhead, but in either case the choice needs to be made at run time.

A switch may be slightly more efficient simply because it carries more contextual information. The compiler is hence given more information on how to optimize the code, which may not be possible with a polymorphic function call. It would know, for example, that there are only 2 possible values, versus an arbitrary pointer in the case of polymorphism.

If it talks like a duck, walks like a duck, .... then it's a duck. Both of the above examples are run-time polymorphic behavior.


Consider non-polymorphic behavior:
// in some handler:
void onClick(int x, int y) {
  color_to_set[y*h+x] = 1;
}

// in loop
for (...) {
  pixels = color_to_set;
}
memset(color_to_set, 0, n);
or even just
memcpy(pixels, color_to_set, n);
memset(color_to_set, 0, n);


Here, we move the choice outside of the main loop and simply presume that color_to_set can be completely pre-calculated elsewhere. Although it may seem no choice is involved, it is actually performed by the UI handler and OS, which receive mouse input, find the component, translate mouse coordinates into screen coordinates, etc.


There is *no* criterion that would let one generalize whether a virtual function has a performance impact. Implementations of virtual dispatch can be compared, but as soon as you need polymorphic behavior, you pay the cost of a run-time decision.

The only criterion is: can I make the decision at compile time?
With both a switch statement and a virtual function, you cannot.

Switch statements trade off control over branching for ease of use. They are a perfectly valid alternative to the language-provided virtual call.

An alternative implementation of virtual dispatch that is useful if the decision space is small (a common networking idiom):
struct Foo {
  Foo(bool as_local) : is_local(as_local) {}

  void bar() {
    is_local ? local_bar() : remote_bar();
  }

private:
  void local_bar();
  void remote_bar();

  bool is_local;
};
In this case we provide the compiler with several very strong hints/guarantees:
- all types are concrete
- the compiler knows that it's an either/or choice
- if is_local is provided at compile time (Foo foo(true);), then the compiler can often deduce the entire code path.

But again - this is not because virtual is slow. It is because we told the compiler exactly what can happen, how, and why, rather than simply declaring something virtual, which is basically: "yea, whatever".

