pro_optimizer

What are good estimates of renderer (UE3, CryEngine1/2, ...) overhead?

Hello folks, do you have any ballpark estimations of what the overhead of the renderer architecture of a typical state-of-the-art engine (such as the mentioned ones) is? I'm thinking about shader-heavy scenes, with many different materials and objects. Also, what overhead levels would be acceptable to be able to compete with such a renderer? 20% more than the driver time? 50? Thanks in advance, Christian

There is no way to answer your question. There is no fixed overhead of "renderer architecture", because every renderer is unique. It all depends on how you build it, and you balance everything so that it runs the way you want. So nothing is fixed.
It also depends heavily on the underlying target platform. If you target Xbox 360 or PS3, the hardware is a constant; otherwise it is not.
Even if you remove all the variables of the underlying hardware, it all depends on what you render. Let's say you want to measure what it takes to render into a shadow map with the double-speed write. There is not much you can do differently, so it all depends on how much geometry you render, how much foliage goes in there, how big the foliage is ...

So no there is no general answer possible ... it all depends on how long you want it to take :-).

Okay, so I'll try to bring in a few more details.

I wrote a little hobby renderer which, among other things, uses an abstraction layer on top of Cg.
This provides a convenient language to parameterize shaders for the upper layers, instantiates & caches shaders on the fly, does API calls only when necessary and is less than 1500 LoC. It combines a few tricks like copy-on-write, cached hashes and union-find to stay efficient. But still, I'm paying approx. 15-30% of the frame time for this convenience, depending on how complex & dynamic the scene is. So, what I'd like to decide is whether I can roll with it and build a full-blown renderer on top of it, or whether that's just too much overhead to be acceptable. The thing is, I don't know if other (successful) engines are designed this way, or whether there's some kind of mantra to do this kind of tracking at the upper layers, where more information is available and the overhead can be mitigated.

Here are a few excerpts from a demo app (the "backend" doesn't know anything about the layout of shaders, parameters, passes, materials etc.)


// done by the shadow mapping framework
cg_value lookup = backend->create_instance("SingleMapLookup");
lookup["viewproj"] = viewproj;
lookup["map"] = tex;

// done by the variance shadow mapper
shadowing = backend->create_instance("VarianceShadowing");
shadowing["lookup"] = lookup;
shadowing["depthScale"] = 1.0f/target.range;

// done by some light sources
model = renderer->create_instance("PointLight");
model["shadowing"] = backend->create_instance("NullShadowing");
model["color"] = node->color();
// when position changes
model["position_w"] = node->global()[3];
if (shadows) {
    mapper->setup_point(node->global()[3], 100);
    mapper->calculate_shadowing(node->sector());
    model["shadowing"] = mapper->get_shadowing();
}

// done by the render loop ...
...
backend["brightness"] = cBrightness;
...
vector<cg_value> sources;
sources.push_back(ambient.value());
foreach(spatial_node *cur, visibles) {
    ...
    sources.push_back(light->model());
}
backend["lights"] = sources;
...
backend.render("ZPrepass",visibles);
backend.render("AmbientIllumination",visibles);
...





As you see, many distant parts of the engine are nicely decoupled by this use of a common parameter type, so it obviously has some merits...
In the end, I would like to do one or more graphics demos in the style of Crysis (although not so foliage-centric), running at no less than 20 fps on a modern PC (GF8800GT-ish).

[Edited by - pro_optimizer on May 16, 2008 6:13:01 PM]

Profile your code. You look heavily dependent on string compares (e.g. lookup["map"]), which can be "non-cheap". But honestly, in my experience this should be more like 5%. Without seeing the code behind it, though, it's impossible to say.
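
A minimal sketch of what "profile your code" could look like when no external profiler is at hand, assuming you only want to know what share of the frame a section such as the shader backend takes. The names Profiler and ScopedTimer are hypothetical helpers, not part of any engine or library mentioned in this thread, and the string-keyed map adds a little overhead of its own, so treat the numbers as rough.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: accumulates wall-clock seconds per label.
class Profiler {
public:
    void add(const std::string& label, double seconds) {
        assert(seconds >= 0.0);
        totals_[label] += seconds;
    }
    double total(const std::string& label) const {
        auto it = totals_.find(label);
        return it == totals_.end() ? 0.0 : it->second;
    }
    // Share of `label` relative to `whole`, e.g. backend bookkeeping vs. the frame.
    double share(const std::string& label, const std::string& whole) const {
        double w = total(whole);
        return w > 0.0 ? total(label) / w : 0.0;
    }
    void report() const {
        for (const auto& kv : totals_)
            std::printf("%-20s %.3f ms\n", kv.first.c_str(), kv.second * 1e3);
    }
private:
    std::map<std::string, double> totals_;
};

// RAII timer: measures from construction to destruction of the object.
class ScopedTimer {
public:
    ScopedTimer(Profiler& p, std::string label)
        : profiler_(p), label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - start_;
        profiler_.add(label_, dt.count());
    }
private:
    Profiler& profiler_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping the whole frame in one ScopedTimer and the parameter-tracking code in another gives a ratio via share("backend", "frame") that can be compared against the 15-30% figure quoted above.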

As mike2343 says, profile your code. What you can also do is remove all the speed bumps like STL (can't use it on consoles anyway) and Boost and all the other things you do not want to use in game development. If you use C++ (it looks like it), reduce the number of virtual functions, use as much C as possible, and don't forget that C is the language you will use most to program your renderer, especially on SPUs. Try to keep your code fast.
Most modern engines deal mostly with data management. Depending on your underlying platform and the number of cores, you will want to structure your C code following the data and not the other way around. So in essence, do not think about designing classes; think about what you want to do with the data and how you can stream it to the numerous stream processors on your hardware platform (GPU, SPUs or multi-core CPUs). This is easier than one would think. If you know how to program a GPU, you just build the same kind of model for some of the cores of a CPU. So you define a data input structure and a data output structure, and you write small C functions that manipulate the data in between. Then one of your cores works on this data ... I think you get the picture.
So the idea is to distribute the work that the renderer does over several processors.
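
A rough sketch of the "input structure, output structure, small function in between" model described above, assuming sphere-against-plane culling as the workload. All type and function names here are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Input stream: one entry per object. Plain data, no virtual functions.
struct CullInput {
    float x, y, z;   // bounding-sphere center
    float radius;    // bounding-sphere radius
};

// Output stream: one visibility flag per object.
struct CullOutput {
    unsigned char visible;
};

// Hypothetical plane; points with dot(n, p) + d >= 0 count as inside.
struct Plane {
    float nx, ny, nz, d;
};

// Small kernel: reads in[begin, end), writes out[begin, end), touches nothing else.
// Disjoint ranges can therefore be handed to different cores without locking.
void cull_spheres(const CullInput* in, CullOutput* out,
                  std::size_t begin, std::size_t end, const Plane& plane) {
    assert(in != nullptr && out != nullptr && begin <= end);
    for (std::size_t i = begin; i < end; ++i) {
        float dist = plane.nx * in[i].x + plane.ny * in[i].y
                   + plane.nz * in[i].z + plane.d;
        // A sphere is kept if any part of it reaches the inside half-space.
        out[i].visible = (dist >= -in[i].radius) ? 1 : 0;
    }
}
```

Because the kernel only depends on its arguments, the same function could run on one core over the whole array or on several cores over sub-ranges; how the ranges are scheduled is left out here on purpose.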
When you program the GPU, you have to be careful not to rebuild what the driver on the PC is already doing for you. Shadowing constant data might be a good thing to do. So might managing render states so that you can cache them. Other than this, I am not quite sure you want to do anything special here.
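
One way to read "managing render states so that you can cache them" is a thin filter that drops redundant state changes before they reach the API. The sketch below assumes a tiny, made-up set of states; StateCache, RenderState and the submit callback are hypothetical stand-ins, not calls from any real graphics API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical state slots; a real renderer would mirror whatever its API exposes.
enum RenderState : std::size_t { kBlendEnable, kDepthWrite, kCullMode, kStateCount };

class StateCache {
public:
    // `submit` stands in for the real API call (assumed to be comparatively expensive).
    // Returns true if the call was forwarded, false if it was filtered as redundant.
    template <typename Submit>
    bool set(RenderState s, std::uint32_t value, Submit submit) {
        assert(s < kStateCount);
        if (valid_[s] && current_[s] == value) return false;
        current_[s] = value;
        valid_[s] = true;
        submit(s, value);
        return true;
    }
    // Call after anything that changes device state behind the cache's back.
    void invalidate() { valid_.fill(false); }
private:
    std::array<std::uint32_t, kStateCount> current_{};
    std::array<bool, kStateCount> valid_{};
};
```

Whether this pays off depends on the driver: if the driver already filters redundant calls cheaply, the cache mostly duplicates its work, which is the caveat raised above.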
Because the GPU is the most powerful chip in your console or PC, you want to spend lots of time figuring out how to squeeze the last cycle out of it. On the C level you have the challenge of feeding the GPU with data as fast as possible without involving shader switching, shader constant waterfalling, render state switches, texture switches and all the other terrible things that can happen here. Whatever you design on the C level is built in a way that lets you feed the GPU in the best possible way. So your design is driven by the way you balance shader switching and all this other stuff while rendering.
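
A common way to keep shader and texture switches down is to sort the visible draw calls by a packed key before submitting them. The sketch below assumes shader changes cost more than texture changes, which is why the shader id sits in the highest bits; the DrawCall layout and the ids are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawCall {
    std::uint16_t shader;   // hypothetical shader id
    std::uint16_t texture;  // hypothetical texture id
    std::uint32_t mesh;     // hypothetical mesh id
};

// Shader in the high bits, then texture, then mesh.
inline std::uint64_t sort_key(const DrawCall& d) {
    return (std::uint64_t(d.shader) << 48)
         | (std::uint64_t(d.texture) << 32)
         | std::uint64_t(d.mesh);
}

// Orders the list so that calls sharing a shader (and then a texture) are adjacent.
inline void sort_for_submission(std::vector<DrawCall>& calls) {
    std::sort(calls.begin(), calls.end(),
              [](const DrawCall& a, const DrawCall& b) {
                  return sort_key(a) < sort_key(b);
              });
}

// Counts how often the shader changes when the list is submitted in order.
inline std::size_t count_shader_switches(const std::vector<DrawCall>& calls) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < calls.size(); ++i)
        if (calls[i].shader != calls[i - 1].shader) ++switches;
    return switches;
}
```

The bit layout of the key is the balancing knob: moving the texture or a depth value into the high bits changes which kind of switch the sort minimizes.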

Quote:
Original post by wolf
As mike2343 says profile your code. What you also can do is remove all the speed bumps like STL (can't use it on consoles anyway)and BOOST and all the other things you do not want to use in game development. If you use C++ -it looks like it- reduce the number of virtual functions, use as much C as possible and don't forget that C is the language you will use most to program your renderer and especially SPU's and try to keep your code fast.

I must have stumbled into a time machine because I thought it was 2008 and not 1988.

Quote:
Original post by wolf
As mike2343 says profile your code. What you also can do is remove all the speed bumps like STL (can't use it on consoles anyway)and BOOST and all the other things you do not want to use in game development. If you use C++ -it looks like it- reduce the number of virtual functions, use as much C as possible and don't forget that C is the language you will use most to program your renderer and especially SPU's and try to keep your code fast.


Uh, what? I've used both STL and Boost libraries on SPUs. There's nothing inherently slow about the code at all, and I doubt you could rewrite their algorithms to be faster until you vectorize them. Templates (or rather "generic programming") in general reduce the use of virtual functions, increase type safety, and make it much easier to write tight, fast code. Sounds like you're just dismissing tools without actually understanding what's slow about them. I can write slow code using any language, with any library.

Plus, almost every studio nowadays writes most of its code in C++, with decent competence.

Quote:
Original post by wolf
As mike2343 says profile your code. What you also can do is remove all the speed bumps like STL (can't use it on consoles anyway)and BOOST and all the other things you do not want to use in game development. If you use C++ -it looks like it- reduce the number of virtual functions, use as much C as possible and don't forget that C is the language you will use most to program your renderer and especially SPU's and try to keep your code fast.


Is this really worth the extra development time you'll put into developing home-brewed containers if, say, your title is PC only*? OK, maybe yes, but consider just how much time and manpower you'll have to put into debugging a full-blown set of containers. More importantly, you'll lose most of the benefits of using a newer object-oriented language, like modularity, code reuse, encapsulation and all the other stuff that the world has been shouting about for the last decade or two. You'd basically have to start all over again for the next title.

Hell, even John Carmack, who's been known for his notorious use of C, moved to C++ when working on Doom3; considering that the game's development started 4 years prior to its release, that must have been around 2000.

Let's bury C deep down in the jungles with a stick in its heart! It's really time to move on.

EDIT:
* : The above poster noted that it can also be used on consoles.

Quote:
Original post by AnAss
Quote:
Original post by wolf
As mike2343 says profile your code. What you also can do is remove all the speed bumps like STL (can't use it on consoles anyway)and BOOST and all the other things you do not want to use in game development. If you use C++ -it looks like it- reduce the number of virtual functions, use as much C as possible and don't forget that C is the language you will use most to program your renderer and especially SPU's and try to keep your code fast.

I must have stumbled into a time machine because I thought it was 2008 and not 1988.


Much as I fear agreeing with AnAss might make me one, I have to say that declaring the STL, Boost and C++ (by virtue of only addressing C) to be "all the other things you do not want to use in game development" is a level of madness I hear far too often in game coding circles.

Far too often it leads to the idiocy of writing every single thing yourself only to find that what has been implemented is... the STL, or Boost::AutoPtr but with none of the benefits of using the STL or Boost!

Sure, there are plenty of times you don't want to be using the STL or many other libraries with wild abandon, but they're a damned useful tool and nothing LESS.

Weird to still see these opinions going around. For reference, I have used the STL (or STLport when needed) on every platform that I've got listed in my profile. It's a useful tool. When your profiling shows that it's the slow thing, you rework the code, but to date I've never found it to even show up in most of our profiling as a cause of poor performance, and when it HAS shown up in profiling it's because the algorithm calling it all of the time is wrong.

Andy

Quote:
Original post by wolf
Because the GPU is the most powerful chip in your console or PC, you want to spend lots of time to figure out how to squeeze the last cycle out of it. On the C level you have the challenge to feed the GPU as fast as possible with data without involving shader switching, shader constant waterfalling, render state switches, texture switches and all the other terrible things that can happen here.


That's funny, I thought many more games were simply GPU bound than CPU bound. Though it does pack a lot of good power, it's still often the bottleneck.

Quote:
Original post by wolf
As mike2343 says profile your code. What you also can do is remove all the speed bumps like STL (can't use it on consoles anyway)and BOOST and all the other things you do not want to use in game development. If you use C++ -it looks like it- reduce the number of virtual functions, use as much C as possible and don't forget that C is the language you will use most to program your renderer and especially SPU's and try to keep your code fast.
Most modern engines are dealing mostly with data management. Depending on your underlying platform and the number of cores you will want to structure your C code following the data and not the other way around. So in essence do not think about designing classes but think about what you want to do with the data and see how you can stream data to the numerous stream processors on your hardware platform (GPU, SPU or multi-core CPUs). This is easier than one would think. If you know how to program a GPU you just build the same kind of model for some of the cores of a CPU. So you define a data input structure, a data output structure and you write small C functions that manipulate the data in-between. Than one of your cores works on this data ... I think you got the picture.
So the idea is to distribute the work that the renderer does over several processors.
When you program the GPU you will have to be careful not to re-build what the driver on the PC is already doing for you. Shadowing constant data might be a good thing to do. Managing render states so that you can set cache them too. Other than this I am not quite sure if you want to do anything special here.
Because the GPU is the most powerful chip in your console or PC, you want to spend lots of time to figure out how to squeeze the last cycle out of it. On the C level you have the challenge to feed the GPU as fast as possible with data without involving shader switching, shader constant waterfalling, render state switches, texture switches and all the other terrible things that can happen here. Whatever you design on the C level is build in a way that you can feed the GPU in the best possible way. So your design is driven by the way you balance shader switching and all this other stuff while rendering.


In short, while you're busy rewriting mechanisms to do virtual functions, while you're busy debugging invalid void* to T* casts, while you're busy rewriting standard containers and algorithms, my code will be easier to verify as correct, and thus the designers can iterate on gameplay much earlier, and product risks can be evaluated much earlier, giving me more time to sit back and optimize intelligently, instead of debugging under the wire until ship.

Apologies for us going completely off-topic, so back to your original post.

It's been pointed out that you really should profile your code to find out what is actually costing you the most time; that string compares are bad (what's wrong with using enums?); and that you're just fine using the STL and Boost.
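
To make the enum suggestion concrete while keeping string names for convenience, one option is to resolve each name to an integer handle once and use only the handle in the render loop. This is a sketch under the assumption that parameters are plain floats; ParamTable and its methods are hypothetical and say nothing about how the original backend is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical parameter table: names map to small integer handles,
// so per-frame updates index a vector instead of hashing or comparing strings.
class ParamTable {
public:
    using Handle = std::size_t;

    // Slow path, intended for load time: returns the existing handle or creates one.
    Handle resolve(const std::string& name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        Handle h = values_.size();
        values_.push_back(0.0f);
        index_.emplace(name, h);
        return h;
    }
    // Fast path, intended for the render loop: no string work at all.
    void set(Handle h, float v) {
        assert(h < values_.size());
        values_[h] = v;
    }
    float get(Handle h) const {
        assert(h < values_.size());
        return values_[h];
    }
private:
    std::unordered_map<std::string, Handle> index_;
    std::vector<float> values_;
};
```

A caller would store the result of resolve("depthScale") next to the object that owns the parameter and call set(handle, value) every frame; a fixed enum does the same job when the full parameter list is known at compile time.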

However, your question was about how much overhead is considered normal, which is a tricky thing to measure. If you're doing nothing particularly complex in terms of gameplay (no physics, little sound, etc.), then the renderer could take a large percentage of the available CPU time. Conversely, if you're doing a tonne of complex calculations (networking, physics, lots of 3D audio, etc.), then you're going to be bound by some hard CPU time limits, and this will impinge upon the cycles you've got spare for rendering code.

Again it all comes down to profiling and eliminating your bottlenecks.

Andy

Thanks for your answers, so far.

I'll see how much I can squeeze out by more aggressive optimizations -
there are still a few heavyweights in the code, which is originally a prototype study.

I also wouldn't reject C++ at this point, since some of its abstraction facilities, esp. templates, are invaluable for me, being a single person who plans to implement tons of functionality quickly, bug-free, and with high run-time efficiency. Still, I think that this is not the right place to discuss these points.

Quote:
Original post by RDragon1
In short, while you're busy rewriting mechanisms to do virtual functions, while you're busy debugging invalid void* to T* casts, while you're busy rewriting standard containers and algorithms, my code will be easier to verify as correct, and thus the designers can iterate on gameplay much earlier, and product risks can be evaluated much earlier, giving me more time to sit back and optimize intelligently, instead of debugging under the wire until ship.


I don't think wolf suggested at any point that you try to roll your own virtual functions. I think he was suggesting that you simply try refactoring your renderer so that it requires fewer virtual functions in the back end. This is more important when developing on the current generation of console hardware than you might think if you're only used to PC games programming.

To address your second point, I don't know of any large games company that hasn't rolled its own version of the standard containers and algorithms. There are more savings to be made there than you might think (provided you know what you're doing). For example: EASTL.

Regarding the usage of C++, I always recommend Pete Isensee's talk at GDC. He is a bit generic, though. As someone who writes a renderer, you want to be more cautious.

Regarding data-driven design, you should read Mike Acton's GDC 2008 talk. He is a bit extreme in not using C++ at all, but I can agree with most of what he says otherwise.

Regarding balancing a game engine: if you are pixel shader limited and your game runs with 30 or 60fps you did a good job in using all your resources evenly. This is the ultimate goal :-) ... being CPU limited is not that good :-) but obviously easier to achieve :-)
