
Unity avoiding virtual functions


I was surfing the web and ended up on this thread about virtual functions: http://www.gamedev.net/community/forums/topic.asp?topic_id=419087 The original poster was trying to avoid virtual function calls. Half of the people who replied obviously know only a little about what a virtual function call costs, on a game console for instance, so in the end no one provided a solution. I'm not sure it's unsolvable; I'm actually pretty sure there are a few things for a game developer to try in this case. This isn't for C++ evangelists, just for pragmatic people :)

Here's the problem: it would be nice to be able to give a header like this to the rest of the world:
class Renderer
{
public:
	void render();
	int resourceA;
};
while internally you implement your class with this header:
class Win32Renderer
{
public: 
	void render();

private:
	void internalRender();
	int resourceA;
	int resourceB;
};
Since the choice of implementation is made at compile time only, it is completely unjustified to use virtual functions here: they are good for choosing an implementation at runtime.

So what does a virtual function really cost? A virtual function call is "just" one more indirection: the object holds a pointer to the virtual function table of its derived class, and the call jumps through that table. The problem is all in that one extra indirection. First, it costs an extra instruction, which increases your code size and slows down your engine. Then, on game consoles, where cost reduction has always been important, the instruction caches are usually pretty small: an instruction that misses the cache takes about 50x to 100x longer than one that hits it, and we typically have on the order of 128kb of fast instruction cache. It's easy to see that a significant percentage of virtual function calls will miss the instruction cache just because there are two indirections, a far bigger percentage than with non-virtual calls, where the compiler or the programmer could rearrange the functions.

OK, but does it matter that much? How many function calls do we have per frame? Let's do a quick computation with some real "next gen" game figures: 40,000 shader program parameter changes per frame, plus a scene graph of about 5,000 nodes that easily needs 15,000 virtual calls to recurse into the tree, and so on. Call it roughly 50,000 calls: 100 cycles x 50,000 = 5 million cycles = about 1.7ms at 3GHz, out of a 16ms frame at 60Hz. We would have lost about 3 FPS if the CPU was the bottleneck, which is very often the case on PC. So virtual function call cost is a real problem for game developers, something to try to avoid for all high-frequency functions.
On big games, compile times can easily exceed an hour, so there are great advantages to separating the internals from the interfaces: you avoid forcing 20 programmers to recompile their entire tree every time they update it, and the developer of the internal library doesn't have to rebuild his entire source tree each time he modifies a header. So using interfaces is a big advantage too, but it conflicts with our virtual function call problem.

So I was thinking that an idea to try, with extreme care and only in some very specific cases, would be to create two versions of the headers: a public-only one, and the regular one (public and private). The library user would compile against the public one only; the link step would still be correct. In most cases I think it would just work. It won't work if you have data you want to keep only in the private header and the user is able to create or copy the C++ objects. For a renderer class this is easy to solve: the object can be created by a static factory provided by the library. If the user really must copy or call new/delete on those objects, there's no other solution than putting a "char reserved[size_of_private_data];" field in the public header.

C++ is just a tool. In the end, the game is executed by a microprocessor which won't judge your style, but will reflect your organisation, coordination and programming skills :)

Quote:

Much has been made about the overhead of virtual functions. A single virtual function in a class requires that every class object and derived objects contain an extra pointer. For simple classes, this can as much as double the size of the object. Creating virtual objects costs more than creating non-virtual objects, because the virtual function table must be initialized. And it takes slightly longer to call virtual functions, because of the additional level of indirection.

However, when you really need virtual functions, you're unlikely to develop a faster mechanism than the virtual function dispatching built into C++. Think about it. If you needed to track an object's type, you'd have to store a type flag in the class, adding to the size of the object. To duplicate virtual functions, you'd probably have a switch statement on the flag, and the virtual function table is faster - not to mention less error prone and much more flexible - than a switch statement.


It is just like anything else in programming: there is nothing wrong with virtual functions, just with the way some people use them. You can misuse anything if you try hard enough. There are times when virtual functions are the only way to go, and other times when they would be a really bad idea. The trick is to learn which is which.

Everyone keeps talking about cost reduction, but forgets there is more than one type of cost reduction. There is computer resource cost reduction (CPU time and memory), and then there is coding resource cost reduction. It does not matter if you write the most atomic functions ever if you never finish the program you are making.

Most of a function's time cost is not going to be in the virtual call overhead but in the function itself. Good algorithms will save a lot more computer resources than getting rid of virtual functions will.

theTroll

Quote:
I am not sure it's unsolvable, I am actually pretty sure they are few things to try for a game developer for the function case, it's not for C++ evangelists, just for pragmatic people :)


The pragmatic people would just use the wonderful private keyword and provide the definition later, as they realize that trying to fight the language to implement this is a waste of time. Differing sizes alter the most basic of properties used in pointer arithmetic (and thus, by extension, anything having to do with arrays), which makes this insanely difficult whilst providing no benefit of note (since changes in size would require recompiling all that pointer math, allocation, etc. anyway). It boils down to a repetition of what ToohrVyk was saying, coupled with the fact that you will introduce bugs and coder maintenance overhead keeping this DRY violation in sync, which is exactly what the stated goal is to get away from (long compile times being annoying).

If you must, you can hide the entire class declaration and just have free functions operating on the class/structure in question as the public interface:

//Interface
class Renderer;
void render( Renderer* , Object* );
Renderer* create( ... );
void destroy( Renderer* );

//Implementation
class Renderer {
...
};
void render( Renderer* r , Object* o ) {
...
}


Client code can only interact via pointers, and cannot do direct array-based addressing, but it solves the stated "problem". Of course, add a nice wrapper with forwarding functions and you'll realize you've just manually arrived at the PIMPL pattern described in the thread you linked (which does not necessitate virtual functions, if you only have one implementation per compiled target).

If you only want to hide the functions (thus allowing stack instantiation and other things like that), you can be a bastard about it like so:
//Interface
class Renderer {
struct Helpers;
friend struct Helpers;
public:
...
private:
...
};

//Implementation
struct Renderer::Helpers {
static void foo( Renderer* , ... );
static void bar( Renderer* , ... );
static void baz( Renderer* , ... );
};

Quote:
If you really must for reasons not listed (tm), you can hide the entire class declaration and just have free functions operating on the class/structure in question as the public interface


It's a good idea. The only little disadvantage is that it adds one more function call, which is difficult to inline if you keep the class definition in a .h and the implementation in a .cpp (if I'm correct, only Visual C++ can inline across translation units at link time, but I'm not sure). Still, it's probably not as bad for the instruction cache as the virtual function.

The problem was never whether we like disclosing the private functions or not. The problem was: given that we don't want to show the private functions, is it possible to deal with that?

Quote:
Original post by Cedric Perthuis
it's easy to conceive a significant percentage of virtual function call will miss the instruction cache, just because there are 2 indirections, a far bigger percentage than if there was no virtual function, because in that case the compiler or the user could rearrange the functions.


It's easy to conceive, but wrong in this situation. There is one additional indirection here, which consists in jumping into a vtable and fetching an address there. In themselves, vtables are pretty small and only a few different ones exist.

In short, if you call a small group of virtual functions 5000 times in a row, the processor would probably cache-miss on the first access to each vtable, cache it, and hit the cache for the others. So, the more you call your virtual functions, the more they behave like normal ones. Of course, if you flush the cache often enough for vtable access to be performance-adverse, you've got bigger problems on your hands than virtual function overhead (that is, the cache flushing itself).

On the other hand, if you do need runtime polymorphism, you'll have to pay for it (either as virtual functions, or as your own in-house approach, which is usually a bad starting point), and if you don't need runtime polymorphism, you shouldn't be using virtual functions at all.

Quote:
Original post by Cedric Perthuis
it's a good idea. the only little disadvantage, is that this is one additional function call, which is difficult to inline if you keep the class definition in a .h and the implementation in a .cpp ( if I am correct only visual can do it at link time, not sure ). but it's probably not as bad as the virtual funtion for the instuction cache.


If you choose not to provide implementation details in the header file, then you will always have this problem: the definition will not be available for inlining. So, if your intent is to hide a high-performance tool this way, too bad.

Note that additional function calls made inside the implementation can be inlined: it's only the interface functions that cannot be.

And now, for something completely different: a variation on the pimpl idiom, with exactly zero strategy-specific implementation details in the interface.


// renderer_facade.hpp
// No private strategy details

#include <string>

template <class RenderingStrategy>
class TextureFacade
{
	typename RenderingStrategy::TextureImpl impl;

public:

	// And so on
};

template <class RenderingStrategy>
class RendererFacade
{
	typename RenderingStrategy::RendererImpl impl;

public:

	void Render()
	{
		impl.Render();
	}

	TextureFacade<RenderingStrategy> Load(const std::string& file)
	{
		return impl.Load(file);
	}
};

// GLRenderer.hpp
// Private details for GL strategy only

#include "renderer_facade.hpp"

class GLStrategy
{
public:

	// Texture implementation
	class TextureImpl
	{
		// Other members
	};

	// Renderer implementation
	class RendererImpl
	{
		// Implementation data here
	public:
		void Render();
		TextureFacade<GLStrategy> Load(const std::string& file);
	};
};

// renderer.hpp
#if defined(OPENGL)
#include "GLRenderer.hpp"
typedef GLStrategy RendererStrategy;
#elif defined(DIRECTX)
#include "DXRenderer.hpp"
typedef DXStrategy RendererStrategy;
#endif

typedef RendererFacade<RendererStrategy> Renderer;
typedef TextureFacade<RendererStrategy> Texture;




The benefits of this are:
  1. Anything that would have been inlined in a straightforward implementation will be inlined here (including all the wrapper calls) on any decent optimizer.
  2. The interface is split into two files: a strategy-agnostic main file, which merely describes the interface (your public header file), and a strategy-specific helper file that discusses actual data members (your private header file).
  3. It's automatic: the size of the Renderer class is adjusted by the compiler, its constructors, destructors and assignment operators are safely forwarded, and you can alter the private implementation without having to refer to the standard every few seconds.
  4. You can add new renderers without having to modify the facade class: you only have to alter the compiler switch in renderer.hpp to accommodate a new typedef.
  5. You can ensure consistency: an OpenGL Renderer cannot be mistakenly given a DirectX Texture to work on.
  6. You have an interface-implementation separation for each strategy, so changing a strategy doesn't force you to recompile everything.


I usually refer to C++ purists as C++ victims: people who've been bitten in the ass by C++ so many times that they can actually smell a bad idea when they see one. Your quote:

Quote:
So I was thinking that an idea to try with extreme care in some very specific cases, would be to create those 2 versions of the headers, a public only one, and the regular one ( public and private ). the lib user could compile with the public one only. The link step will still be correct. In most cases I think it will just work.


is enough to send shivers down my spine, my chair, the floor in my room, plus one or two additional floors in my apartment building all down to the basement, and keep'em there for an hour or two. Knowing that when this kind of contraption blows up (and it always does), you'll be out in the cold, naked (without a compiler/debugger/linker/standard to help you out).

Quote:
Original post by ToohrVyk
And now, for something completely different: a variation on the pimpl idiom, with exactly zero strategy-specific implementation details in the interface.

*** Source Snippet Removed ***

The benefits of this are:
  1. Anything that would have been inlined in a straightforward implementation will be inlined here (including all the wrapper calls) on any decent optimizer.
  2. The interface is split into two files: a strategy-agnostic main file, which merely describes the interface (your public header file), and a strategy-specific helper file that discusses actual data members (your private header file).
  3. It's automatic: the size of the Renderer class is adjusted by the compiler, its constructors, destructors and assignment operators are safely forwarded, and you can alter the private implementation without having to refer to the standard every few seconds.
  4. You can add new renderers without having to modify the facade class: you only have to alter the compiler switch in renderer.hpp to accommodate a new typedef.
  5. You can ensure consistency: an OpenGL Renderer cannot be mistakenly given a DirectX Texture to work on.
  6. You have an interface-implementation separation for each strategy, so changing a strategy doesn't force you to recompile everything.



That, sir, is brilliant. :D I love these simple yet powerful ideas. I'd also like to point out the way that renderer.hpp is set up to isolate the conditional compilation logic. This is encapsulation of a kind not normally discussed, and is also where the "ensuring consistency" property comes from. :)

Quote:
Original post by ToohrVyk
Quote:
Original post by Cedric Perthuis
it's easy to conceive a significant percentage of virtual function call will miss the instruction cache, just because there are 2 indirections, a far bigger percentage than if there was no virtual function, because in that case the compiler or the user could rearrange the functions.


It's easy to conceive, but wrong in this situation. There is one additional indirection here, which consists in jumping into a vtable and fetching an address there. In themselves, vtables are pretty small and only a few different ones exist.

In short, if you call a small group of virtual functions 5000 times in a row, the processor would probably cache-miss on the first access to each vtable, cache it, and hit the cache for the others. So, the more you call your virtual functions, the more they behave like normal ones. Of course, if you flush the cache often enough for vtable access to be performance-adverse, you've got bigger problems on your hands than virtual function overhead (that is, the cache flushing itself).


:) Of course the same functions are not called 5000 times in a row; when rendering objects there can easily be a few thousand instructions in between. Every time you jump somewhere else in the code, you increase the risk of evicting a cache line used by the caller's code. Sure, instruction cache misses aren't as big an issue on PCs, which have huge and very fast caches, but on game consoles this is a real problem, and it's greatly aggravated by virtual function calls. Keeping the code linear helps a lot, in the range of a few ms per frame. It's hard to quantify without an example, for sure.

Quote:
Original post by ToohrVyk
I usually refer to C++ purists as C++ victims: people who've been bitten in the ass by C++ so many times that they can actually smell a bad idea when they see one. Your quote is enough to send shivers down my spine, my chair, the floor in my room, plus one or two additional floors in my apartment building all down to the basement, and keep'em there for an hour or two. Knowing that when this kind of contraption blows up (and it always does), you'll be out in the cold, naked (without a compiler/debugger/linker/standard to help you out).


Very true. As I said, it depends on the case; with a private constructor and a factory, there's no way to get it wrong.

Still, your suggestion is nice; at least there is one true solution, thanks to that loophole in the language :). It reminds me of a hack used to legally call private functions.

[Edited by - Cedric Perthuis on January 30, 2007 2:42:48 AM]

Hi ToohrVyk,
You should understand that while "last-gen" consoles had a piddly, measly I-Cache size of 16kb, after 6 years of technological breakthroughs our monster-sized "Next-Gen" I-cache is a whopping 32kb (shared between 2 hardware threads, just to show it is a really poor improvement). And the cache-line is quadrupled from 32 bytes to 128. So grabbing a few bytes here and there randomly, like accesses into vtables, or even when the vtable pointer itself may be unfortunately positioned... it can be really bad.

There just isn't enough wayism in the cache to cover the wayivity of tendrils thrown out from virtual functions.

Still, toohrvyk, I have myself written almost exactly your code into games that have shipped on three platforms! I felt really guilty at the time; it looks like a huge hack... But the client just includes your header, and doesn't even realize it's getting the platform-specific version rather than just an interface. And just for debugging purposes, a flip in the typedef gives the "reference version": platform-agnostic C code to test against the 3000-line vector assembly version. I'm glad someone else thought of the same solution to a weird problem.

[Edited by - ajas95 on January 30, 2007 2:22:15 AM]

On PS2 we used to relocate the vtables of our scenegraph objects to a fast uncached memory segment. It saved us several precious ms on our scenegraph traversal, just because we avoided cache misses later. We shipped this code in several dozen games... Unfortunately there's no such uncached fast memory on "next-gen" machines.
I was a little shocked to see people giving the initial poster a hard time in the other thread just for bringing up the issue :)
I'll definitely use the template suggestion if I get the occasion.
