avoiding virtual functions

Started by Cedric Perthuis
7 comments, last by Cedric Perthuis 17 years, 2 months ago
I was surfing the web and ended up on this thread about virtual functions: http://www.gamedev.net/community/forums/topic.asp?topic_id=419087 The original poster was trying to avoid the usage of virtual function calls. Half of the people who replied obviously know only a little about the cost of virtual function calls and what it represents on a game console, for instance, so in the end no one provided any solution. I am not sure it's unsolvable; I am actually pretty sure there are a few things a game developer can try in this case. This isn't for C++ evangelists, just for pragmatic people :)

Here's the problem: it would be nice to be able to give such a header to the rest of the world:

class Renderer
{
public:
	void render();
	int resourceA;
};
while internally you implement the same class with this fuller header:

class Renderer // the class keeps the same name, so the mangled symbols match the public header
{
public:
	void render();

private:
	void internalRender();
	int resourceA;
	int resourceB;
};
Since this is a compile-time-only thing, it is completely unjustified to use virtual functions, which are good for choosing the implementation at runtime.

So what does a virtual function really cost? A virtual function call is "just" one more indirection: the "this" pointer contains the address of the virtual function table specific to the derived class. The problem is all about that "one" more indirection. First, it costs one more assembly instruction, which increases your code size and slows down your engine. Then, on game consoles, where cost reduction has always been important, the instruction caches are usually pretty small, and hitting an instruction which is not in the cache takes about 50x to 100x longer than if it were already in the cache. We usually have on the order of 128 KB of fast instruction cache. It's easy to conceive that a significant percentage of virtual function calls will miss the instruction cache, just because there are two indirections, a far bigger percentage than if there were no virtual functions, because in that case the compiler or the user could rearrange the functions.

OK, now does it matter that much? How many function calls do we have in a frame? Let's do a quick computation with some real "next gen" game figures. 1 frame = 40,000 shader program parameter changes. We must have a scene graph too, right? Probably about 5,000 nodes, so we easily need 15,000 virtual function calls to recurse into that tree, etc. 100 cycles times 50,000 calls = 5 million cycles = 1.6 ms at 3 GHz, out of a 16.6 ms frame at 60 Hz, so we would have lost roughly 5 FPS if the processor was the bottleneck, which is really often the case on PC. So the virtual function call cost is a big problem for game developers; it is something to try to avoid for all the high-frequency functions.

On big games, compile times can easily exceed an hour, so there are great advantages to separating the internals from the interfaces: you avoid forcing the 20 programmers to recompile their entire tree every time they update it, and the developer of the internal library doesn't have to rebuild his entire source tree each time he makes a modification in a header. So using interfaces is a big advantage too, but it conflicts with our virtual function call problem.

So I was thinking that an idea to try, with extreme care and in some very specific cases, would be to create those two versions of the headers: a public-only one, and the regular one (public and private). The lib user would compile with the public one only. The link step will still be correct. In most cases I think it will just work. It won't work if you have data that you want to keep only in the private header and the user is able to create or copy the C++ objects. For a renderer class, this is easy to solve: the C++ object can be created by a static factory provided by the lib. If the user can really copy and call new/delete on those objects, there's no other solution than having in the public header a "char reserved[size_of_private_data];" field.

C++ is just a tool. In the end the game is executed by a microprocessor which won't judge your style, but which will reflect your organisation, coordination and programming skills :)
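To make the two-header idea concrete, here is a minimal sketch with a static factory. The header names are illustrative, and the scheme assumes the two headers never meet in a single translation unit:

// Renderer_public.h - the only header shipped to lib users
class Renderer
{
public:
	static Renderer* create();        // factory provided by the lib
	static void destroy(Renderer* r);
	void render();
private:
	Renderer();                       // users cannot create or copy the object,
	Renderer(const Renderer&);        // so the missing private data never matters
};

// Renderer_internal.h - used only inside the library; the class keeps the
// same name, so the symbols the user links against are the ones we define
class Renderer
{
public:
	static Renderer* create();
	static void destroy(Renderer* r);
	void render();
private:
	Renderer();
	Renderer(const Renderer&);
	void internalRender();
	int resourceA;
	int resourceB;
};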
Quote:
Much has been made about the overhead of virtual functions. A single virtual function in a class requires that every class object and derived objects contain an extra pointer. For simple classes, this can as much as double the size of the object. Creating virtual objects costs more than creating non-virtual objects, because the virtual function table must be initialized. And it takes slightly longer to call virtual functions, because of the additional level of indirection.

However, when you really need virtual functions, you're unlikely to develop a faster mechanism than the virtual function dispatching built into C++. Think about it. If you needed to track an object's type, you'd have to store a type flag in the class, adding to the size of the object. To duplicate virtual functions, you'd probably have a switch statement on the flag, and the virtual function table is faster - not to mention less error prone and much more flexible - than a switch statement.


It is just like anything else in programming: there is nothing wrong with virtual functions, just the way some people use them. You can misuse anything if you try hard enough. There are times when virtual functions are the only way to go and other times when they would be a really bad idea. The trick is to learn which is which.

Everyone keeps talking about cost reduction, but forgets there is more than one type of cost reduction. There is computer resource cost reduction (CPU time and memory) and then there is coding resource cost reduction. It does not matter if you write the most atomic functions ever if you never finish the program you are making.

Most of a function's time cost is not going to be in the virtual dispatch but in the function itself. Good algorithms will save a lot more computer resources than getting rid of virtual functions will.

theTroll
Quote:I am not sure it's unsolvable; I am actually pretty sure there are a few things a game developer can try in this case. This isn't for C++ evangelists, just for pragmatic people :)


The pragmatic people would just use the wonderful private keyword and provide the latter definition, as they realize that trying to fight the language to implement this is a waste of time. Differing sizes alter the most basic property used in pointer arithmetic (and thus, by extension, anything having to do with arrays), which makes this insanely difficult whilst providing no benefit of note (since any change in size would require recompiling all said pointer math, allocation, etc. anyway). It boils down to a repetition of what ToohrVyk was saying, coupled with the fact that you will introduce bugs and maintenance overhead keeping this DRY violation in sync, which is exactly what the stated goal was to get away from (since long compile times are annoying).

If you must, you can hide the entire class declaration and just have free functions operating on the class/structure in question as the public interface:

//Interface
class Object;   // forward declaration, added so the interface stands alone
class Renderer;
void render( Renderer* , Object* );
Renderer* create( ... );
void destroy( Renderer* );

//Implementation
class Renderer {
    ...
};
void render( Renderer* r , Object* o ) {
    ...
}


Client code can only interact via pointers, and cannot do direct array-based addressing, but it solves the stated "problem". Of course, add a nice wrapper with forwarding functions and you've realized you just manually arrived at the PIMPL pattern described in the thread you linked (which does not necessitate virtual functions, if you only have one implementation per compiled target), as sketched below.
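A minimal sketch of that manual pimpl wrapper, assuming C++03-era code and illustrative names:

// renderer.h - public header: the only data member is an opaque pointer
class Object;
class RendererImpl;

class Renderer
{
public:
    Renderer();
    ~Renderer();
    void render( Object* o );        // forwarding function
private:
    RendererImpl* impl;
    Renderer( const Renderer& );     // copying disabled, C++03 style
    Renderer& operator=( const Renderer& );
};

// renderer.cpp - private implementation, invisible to clients
class RendererImpl
{
public:
    void render( Object* o ) { /* real work here */ }
    int resourceA;
    int resourceB;
};

Renderer::Renderer() : impl( new RendererImpl ) {}
Renderer::~Renderer() { delete impl; }
void Renderer::render( Object* o ) { impl->render( o ); }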

If you only want to hide the functions (thus allowing stack instantiation and other things like that), you can be a bastard about it like so:
//Interface
class Renderer {
    struct Helpers;
    friend struct Helpers;
public:
    ...
private:
    ...
};

//Implementation
struct Renderer::Helpers {
    static void foo( Renderer* , ... );
    static void bar( Renderer* , ... );
    static void baz( Renderer* , ... );
};
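To sketch how that plays out (foo comes from the snippet above; render is a hypothetical public member): the implementation file reaches the private members through Helpers, while clients keep plain stack instantiation because the full data layout is still in the header.

// renderer.cpp - only the implementation ever defines or calls the helpers
void Renderer::Helpers::foo( Renderer* r , ... )
{
    // the friend declaration grants full access to r's private members here
}

// client code - Helpers is private, so clients cannot even name it
Renderer r;    // stack instantiation still works
r.render();    // hypothetical public member function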
Quote:If you really must for reasons not listed (tm), you can hide the entire class declaration and just have free functions operating on the class/structure in question as the public interface


It's a good idea. The only little disadvantage is that this is one additional function call, which is difficult to inline if you keep the class definition in a .h and the implementation in a .cpp (if I am correct, only Visual C++ can do it at link time; not sure). But it's probably not as bad as the virtual function for the instruction cache.

The problem really was not whether we like disclosing the private functions or not; the problem was: given that we don't want to show the private functions, is it possible to deal with that?

Quote:Original post by Cedric Perthuis
It's easy to conceive that a significant percentage of virtual function calls will miss the instruction cache, just because there are two indirections, a far bigger percentage than if there were no virtual functions, because in that case the compiler or the user could rearrange the functions.


It's easy to conceive, but wrong in this situation. There is one additional indirection here, which consists of fetching an address from the vtable and jumping to it. In themselves, vtables are pretty small and only a few different ones exist.

In short, if you call a small group of virtual functions 5000 times in a row, the processor would probably cache-miss on the first access to each vtable, cache it, and hit the cache for the others. So, the more you call your virtual functions, the more they behave like normal ones. Of course, if you flush the cache often enough for vtable access to be performance-adverse, you've got bigger problems on your hands than virtual function overhead (that is, the cache flushing itself).

On the other hand, if you do need runtime polymorphism, you'll have to pay for it (either as virtual functions, or as your own in-house approach, which is usually a bad starting point), and if you don't need runtime polymorphism, you shouldn't be using virtual functions at all.
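For intuition about that single extra indirection, here is a hand-rolled equivalent of what a compiler emits for a virtual call. This is a sketch; real vtable layouts are ABI-specific.

#include <cstdio>

// the table of function pointers the compiler would generate per class
struct VTable { void (*render)(void* self); };

struct Object {
    const VTable* vptr;  // the hidden pointer a polymorphic class carries
    int data;
};

void renderImpl(void* self) {
    std::printf("%d\n", static_cast<Object*>(self)->data);
}

const VTable objectVTable = { &renderImpl };

int main() {
    Object o = { &objectVTable, 42 };
    o.vptr->render(&o);  // load vptr, load the function pointer, indirect call
    return 0;
}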

Quote:Original post by Cedric Perthuis
It's a good idea. The only little disadvantage is that this is one additional function call, which is difficult to inline if you keep the class definition in a .h and the implementation in a .cpp (if I am correct, only Visual C++ can do it at link time; not sure). But it's probably not as bad as the virtual function for the instruction cache.


If you choose not to provide implementation details in the header file, then you will always have this problem: the definition will not be available for inlining. So, if your intent is to hide a high-performance tool this way, too bad.

Note that additional function calls made inside the implementation can be inlined: it's only the interface functions that cannot be.

And now, for something completely different: a variation on the pimpl idiom, with exactly zero strategy-specific implementation details in the interface.

// renderer_facade.hpp
// No private strategy details
#include <string>

template <class RenderingStrategy> class TextureFacade;  // forward declaration

template <class RenderingStrategy>
class RendererFacade
{
  typename RenderingStrategy::RendererImpl impl;
public:
  void Render()
  {
    impl.Render();
  }
  TextureFacade<RenderingStrategy> Load(const std::string& file)
  {
    return impl.Load(file);
  }
};

template <class RenderingStrategy>
class TextureFacade
{
  typename RenderingStrategy::TextureImpl impl;
public:
  // And so on
};

// GLRenderer.hpp
// Private details for GL strategy only
#include "renderer_facade.hpp"

class GLStrategy {
public:
  // Renderer implementation
  class RendererImpl
  {
    // Implementation data here
  public:
    void Render();
    TextureFacade<GLStrategy> Load(const std::string& file);
  };

  // Texture implementation
  class TextureImpl
  {
    // Other members
  };
};

// renderer.hpp
#if defined(OPENGL)
  #include "GLRenderer.hpp"
  typedef GLStrategy RendererStrategy;
#elif defined(DIRECTX)
  #include "DXRenderer.hpp"
  typedef DXStrategy RendererStrategy;
#endif

typedef RendererFacade<RendererStrategy> Renderer;
typedef TextureFacade<RendererStrategy> Texture;


The benefits of this are:
  1. Anything that would have been inlined in a straightforward implementation will be inlined here (including all the wrapper calls) on any decent optimizer.
  2. The interface is split into two files: a strategy-agnostic main file, which merely describes the interface (your public header file), and a strategy-specific helper file that discusses actual data members (your private header file).
  3. It's automatic: the size of the Renderer class is adjusted by the compiler, its constructors, destructors and assignment operators are safely forwarded, and you can alter the private implementation without having to refer to the standard every few seconds.
  4. You can add new renderers without having to modify the facade class: you only have to alter the compiler switch in renderer.hpp to accommodate a new typedef.
  5. You can ensure consistency: an OpenGL Renderer cannot be mistakenly given a DirectX Texture to work on.
  6. You have an interface-implementation separation for each strategy, so changing a strategy doesn't force you to recompile everything.
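Client code then only ever names the strategy-agnostic typedefs. A hypothetical usage sketch (the file name and "stone.png" are illustrative):

// client.cpp - built with -DOPENGL or -DDIRECTX; no strategy is named here
#include "renderer.hpp"

void drawFrame()
{
  Renderer renderer;                          // concrete type: stack allocation works
  Texture tex = renderer.Load("stone.png");   // returns the matching strategy's texture
  renderer.Render();                          // forwarding call, inlined by the optimizer
}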


I usually refer to C++ purists as C++ victims: people who've been bitten in the ass by C++ so many times that they can actually smell a bad idea when they see one. Your quote:

Quote:So I was thinking that an idea to try, with extreme care and in some very specific cases, would be to create those two versions of the headers: a public-only one, and the regular one (public and private). The lib user would compile with the public one only. The link step will still be correct. In most cases I think it will just work.


is enough to send shivers down my spine, my chair, the floor in my room, plus one or two additional floors in my apartment building all down to the basement, and keep'em there for an hour or two. Knowing that when this kind of contraption blows up (and it always does), you'll be out in the cold, naked (without a compiler/debugger/linker/standard to help you out).
Quote:Original post by ToohrVyk
And now, for something completely different: a variation on the pimpl idiom, with exactly zero strategy-specific implementation details in the interface.

*** Source Snippet Removed ***

The benefits of this are:
  1. Anything that would have been inlined in a straightforward implementation will be inlined here (including all the wrapper calls) on any decent optimizer.
  2. The interface is split into two files: a strategy-agnostic main file, which merely describes the interface (your public header file), and a strategy-specific helper file that discusses actual data members (your private header file).
  3. It's automatic: the size of the Renderer class is adjusted by the compiler, its constructors, destructors and assignment operators are safely forwarded, and you can alter the private implementation without having to refer to the standard every few seconds.
  4. You can add new renderers without having to modify the facade class: you only have to alter the compiler switch in renderer.hpp to accommodate a new typedef.
  5. You can ensure consistency: an OpenGL Renderer cannot be mistakenly given a DirectX Texture to work on.
  6. You have an interface-implementation separation for each strategy, so changing a strategy doesn't force you to recompile everything.



That, sir, is brilliant. :D I love these simple yet powerful ideas. I'd also like to point out the way that renderer.hpp is set up to isolate the conditional compilation logic. This is encapsulation of a kind not normally discussed, and is also where the "ensuring consistency" property comes from. :)
Quote:Original post by ToohrVyk
Quote:Original post by Cedric Perthuis
It's easy to conceive that a significant percentage of virtual function calls will miss the instruction cache, just because there are two indirections, a far bigger percentage than if there were no virtual functions, because in that case the compiler or the user could rearrange the functions.


It's easy to conceive, but wrong in this situation. There is one additional indirection here, which consists of fetching an address from the vtable and jumping to it. In themselves, vtables are pretty small and only a few different ones exist.

In short, if you call a small group of virtual functions 5000 times in a row, the processor would probably cache-miss on the first access to each vtable, cache it, and hit the cache for the others. So, the more you call your virtual functions, the more they behave like normal ones. Of course, if you flush the cache often enough for vtable access to be performance-adverse, you've got bigger problems on your hands than virtual function overhead (that is, the cache flushing itself).


:) Of course the same functions are not called 5000 times in a row; in the case of rendering objects there can easily be a few thousand instructions in between. Every time you jump somewhere else in the code, you increase the risk of evicting a cache line used by the caller's code. Sure, instruction cache miss penalties aren't as big an issue on PCs, which have huge and extremely performant caches, but on gaming consoles this is a problem, and it's greatly aggravated by virtual function calls. Having linear code greatly helps, in the range of a few ms per frame. It's difficult to quantify without an example, for sure.

Quote:Original post by ToohrVyk
I usually refer to C++ purists as C++ victims: people who've been bitten in the ass by C++ so many times that they can actually smell a bad idea when they see one. Your quote is enough to send shivers down my spine, my chair, the floor in my room, plus one or two additional floors in my apartment building all down to the basement, and keep'em there for an hour or two. Knowing that when this kind of contraption blows up (and it always does), you'll be out in the cold, naked (without a compiler/debugger/linker/standard to help you out).


Very true. As I said, it depends on the case; with a private constructor and a factory, there's no way to get it wrong.

Still, your suggestion is nice; at least there is one true solution, thanks to a loophole of the language :). It reminds me of a hack used to call private functions legally.

[Edited by - Cedric Perthuis on January 30, 2007 2:42:48 AM]
Hi ToohrVyk,
You should understand that while "last-gen" consoles had a piddly, measly I-cache of 16 KB, after 6 years of technological breakthroughs our monster-sized "next-gen" I-cache is a whopping 32 KB (shared between 2 hardware threads, just to show what a poor improvement it is). And the cache line has quadrupled from 32 bytes to 128. So grabbing a few bytes here and there randomly, like accesses into vtables, or even when the vtable pointer itself is unfortunately positioned... it can be really bad.

There just isn't enough associativity in the cache to cover all the tendrils thrown out by virtual functions.

Still, ToohrVyk, I have myself written almost exactly your code into games that have shipped on three platforms! I felt really guilty at the time; it looks like a huge hack... But the client just includes your header, and doesn't even realize it's getting the platform-specific version rather than just an interface. And just for debugging purposes, a flip of the typedef gives the "reference version": platform-agnostic C code to test against the 3000 lines of vector assembly. I'm glad someone else thought of the same solution to a weird problem.

[Edited by - ajas95 on January 30, 2007 2:22:15 AM]
On PS2 we used to relocate the vtables of our scene graph objects to a fast uncached memory segment. It saved us several precious ms on our scene graph traversal, just because we were avoiding cache misses later. We shipped this code in several dozen games... Unfortunately, there's no such fast uncached memory on "next-gen" machines...
I was a little bit shocked to see people giving the initial poster a hard time in the other thread just for bringing up the issue :)
I'll definitely use the template suggestion if I have the occasion.

