Nairou

Batching draw calls for multiple object instances?


Recommended Posts

I'm trying to figure out how to avoid doing a separate draw call for each object in the scene, and instead batch them so that there are fewer draw calls per frame.

I keep hearing about how you need to increase the number of triangles rendered per batch to increase performance. However, let's say your scene is filled with hundreds of small (few-triangle) objects, all moving around independently. Since each one has a different world transform, I always assumed each one would need its own draw call, in order to pass that transform to the shaders. But then how do people batch their draw calls? Can batching only be applied to static, unmovable geometry? Is there another way to accomplish this?

[quote]I keep hearing about how you need to increase the number of triangles rendered per batch to increase performance.[/quote]
This isn't quite true; it's more about reducing the number of draw calls per frame. If you've got a fixed number of triangles that you want to draw, then increasing triangles per draw call has the effect of reducing draw calls per frame.
[quote]However, let's say your scene is filled with hundreds of small (few-triangle) objects, all moving around independently. Since each one has a different world transform, I always assumed each one would need to be a separate draw call, in order to pass the transform to the shaders.[/quote]
You can pass an array of transforms to the shader, and have each vertex contain an index into that array.
There's also stream instancing, where you can render the same object many times, with different per-instance data (such as a world transform).
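A minimal CPU-side sketch of the transform-array idea, with all names hypothetical: each vertex in the batch carries an index into a shared array of per-object transforms, and the vertex shader would look its matrix up by that index. Here the shader's lookup is emulated in Python just to show the data layout:

```python
# Sketch of the transform-array batching idea (hypothetical names):
# each vertex carries an index into a shared array of per-object
# world transforms, so many objects can share one draw call.

def make_translation(tx, ty, tz):
    """A 4x4 row-major translation matrix as a flat 16-float list."""
    return [1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            tx, ty, tz, 1]

def transform_point(matrix, point):
    """Apply a row-major 4x4 matrix to a point (w = 1)."""
    x, y, z = point
    return (
        x * matrix[0] + y * matrix[4] + z * matrix[8]  + matrix[12],
        x * matrix[1] + y * matrix[5] + z * matrix[9]  + matrix[13],
        x * matrix[2] + y * matrix[6] + z * matrix[10] + matrix[14],
    )

# One shared transform array for the whole batch...
transforms = [make_translation(10, 0, 0), make_translation(0, 5, 0)]

# ...and vertices that each carry (position, transform_index).
batched_vertices = [((0, 0, 0), 0), ((1, 0, 0), 0), ((0, 0, 0), 1)]

# What the vertex shader would do per vertex: fetch by index, transform.
world_positions = [transform_point(transforms[i], p)
                   for p, i in batched_vertices]
# -> [(10, 0, 0), (11, 0, 0), (0, 5, 0)]
```

In a real renderer the `transforms` array would be uploaded once per batch (as a uniform/constant array), and the index would live in a vertex attribute.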


[quote]This isn't quite true, it's more about reducing the number of draw-calls per frame. (...) There's also stream instancing, where you can render the same object many times, with different per-instance data (such as a world transform).[/quote]
NVidia presented a formula for render calls per frame a few years back:

rendercalls = 25000 * g * f / fps

with
g = GHz of the CPU
f = fraction of CPU time spent issuing the render calls
fps = target frames per second


For example, when you target a 2.5 GHz CPU, with an average of 30 FPS and 40% of CPU time spent on render calls, you get

r = 25000 * 2.5 * 0.4 / 30 = 833

That is, you can spend ~800 render calls per frame in this example. The bottleneck is the CPU, not the GPU, so each render call should stress the GPU (lots of tris or expensive/large shaders). Of course this is just a rule of thumb, and the numbers may change with better APIs, drivers, etc.
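The rule of thumb above is easy to turn into a small budget helper (function name is my own, not from the NVidia material):

```python
# Rough draw-call budget from the NVidia rule of thumb quoted above:
# rendercalls = 25000 * g * f / fps

def draw_call_budget(cpu_ghz, cpu_fraction, target_fps):
    """Estimated draw calls per frame for a given CPU and frame budget."""
    return 25000 * cpu_ghz * cpu_fraction / target_fps

# The example from the post: 2.5 GHz CPU, 40% of CPU time, 30 FPS.
budget = draw_call_budget(2.5, 0.4, 30)  # ~833 draw calls per frame
```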

The easiest sort of batching is static batching: you merge static primitives with the same material into one big mesh. This lets you draw them all with one call (if they all fit in one mesh, that is); the tradeoff is that it pretty much nullifies your culling algorithms.
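The merge step itself is mostly bookkeeping: concatenate the vertex data and offset each mesh's indices by the number of vertices already in the batch. A minimal sketch (hypothetical data layout, vertices simplified to 2D tuples):

```python
# Sketch of static batching: meshes that share a material are merged
# into one vertex/index pair, with each mesh's indices rebased onto
# the combined vertex buffer.

def merge_static_meshes(meshes):
    """meshes: list of (vertices, indices) pairs. Returns one merged pair."""
    merged_vertices, merged_indices = [], []
    for vertices, indices in meshes:
        base = len(merged_vertices)  # offset for this mesh's indices
        merged_vertices.extend(vertices)
        merged_indices.extend(base + i for i in indices)
    return merged_vertices, merged_indices

# Two single-triangle meshes become one 6-vertex, 6-index batch.
tri_a = ([(0, 0), (1, 0), (0, 1)], [0, 1, 2])
tri_b = ([(1, 0), (1, 1), (0, 1)], [0, 1, 2])
vertices, indices = merge_static_meshes([tri_a, tri_b])
# -> indices [0, 1, 2, 3, 4, 5] over a 6-vertex buffer
```

A real implementation would also bake each mesh's world transform into its vertices during the merge, since the batch shares a single transform afterwards.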
You can also do dynamic batching; for this, look up "render buffers". The idea is to have a certain number of buffers, each containing geometry for a single material. When you want to render an object, you transform it to world space on the CPU and put it in the buffer instead. When a buffer is full, or you need it for another material, you send the data to the GPU and draw all the buffered objects with a single draw call. While multithreading and an SSE-optimized matrix library can help with performance, you will have to find out whether what you gain from fewer draw calls outweighs what you lose by using dynamic buffers.
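The buffer-and-flush flow described above can be sketched like this (all names hypothetical, the GPU submit replaced by a counter, and the CPU-side transform reduced to a translation for brevity):

```python
# Sketch of dynamic batching: objects are transformed to world space
# on the CPU and appended to a buffer, which is flushed as ONE draw
# call when it fills up (or when the material changes).

class DynamicBatch:
    def __init__(self, capacity):
        self.capacity = capacity
        self.vertices = []
        self.draw_calls = 0  # stands in for the real submit-to-GPU call

    def add(self, world_offset, local_vertices):
        if len(self.vertices) + len(local_vertices) > self.capacity:
            self.flush()
        # CPU-side transform to world space (just a 2D translation here).
        tx, ty = world_offset
        self.vertices.extend((x + tx, y + ty) for x, y in local_vertices)

    def flush(self):
        if self.vertices:
            self.draw_calls += 1  # everything buffered goes in one call
            self.vertices = []

batch = DynamicBatch(capacity=6)
triangle = [(0, 0), (1, 0), (0, 1)]
for i in range(4):            # four objects submitted...
    batch.add((i, 0), triangle)
batch.flush()
# ...but only two draw calls, since the buffer holds two triangles.
```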


[quote]There's also stream instancing, where you can render the same object many times, with different per-instance data (such as a world transform).[/quote]

Instancing definitely sounds like it will do the trick. I had heard of instancing, but wasn't familiar with how it worked. Though out of curiosity, I wonder how this sort of thing was done (if it even was) before instancing existed. Was every object in the scene always just rendered in its own draw call until instancing came around?

As far as passing an array to the shaders to provide per-instance positioning, I'm guessing there isn't actually a way to pass an array, and that I'll have to pass it as a texture, correct? Does this mean I need to continually edit and re-upload an actual texture to the GPU in order to pull this off?




[quote]NVidia presented a formula for render calls per frame a few years back: (...) That is, you can spend ~800 render calls per frame in this example.[/quote]

Thanks for the formula! From the look of it, I probably won't need to put instancing at the top of my to-do list. But I would eventually like to consolidate my draw calls, as I will quickly get to the point where I'm making several hundred draw calls per frame, but only rendering 15-20 vertices per call.

[quote]Though as a curiosity, I wonder how this sort of thing was done (or if it even was) before instancing existed. Was every object in the scene always just rendered in its own draw call until instancing came around?[/quote]
Simply using lots of draw calls is often a valid solution. The transform-array method is pretty old, and is still being used as well.
[quote]As far as passing an array to the shaders to provide per-instance positioning, I'm guessing there isn't actually a way to pass an array, and that I'll have to pass it as a texture, correct?[/quote]
Both HLSL and GLSL support arrays in shaders, though you could use a texture instead if you wanted to.
Actually, you need the transforms in the vertex shader, and older cards do not support reading from textures in vertex shaders, so on older cards you'd have to use an array.
[quote]Does this mean I need to continually edit and re-upload an actual texture to the GPU in order to pull this off?[/quote]
If you used a texture, yes. N.B. it would also have to be a floating-point texture in order to hold the transforms.
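Whether the destination is a uniform array or a floating-point texture, the transforms end up as the same flat run of floats. A common layout (sketched here with hypothetical names) is one 4x4 matrix per four RGBA texels, so instance i reads texels 4*i through 4*i+3:

```python
# Sketch of packing per-instance 4x4 transforms for upload, whether
# as a uniform array of float4s or as rows of a float RGBA texture:
# each matrix row becomes one 4-float texel/element.

def pack_transforms(matrices):
    """matrices: list of 4x4 matrices (four rows of 4 floats) -> flat floats."""
    packed = []
    for matrix in matrices:
        for row in matrix:          # one row == one RGBA texel
            packed.extend(row)
    return packed

identity = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]
packed = pack_transforms([identity, identity])
# 2 matrices * 16 floats = 32 floats, i.e. 8 RGBA float texels.
```

Updating transforms each frame then means rewriting this flat buffer and re-uploading it, which is exactly the per-frame texture (or constant buffer) update discussed above.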

[quote]That is, you can spend ~800 render calls per frame in this example.[/quote]
In a real-world test, though, I got away with something like 100K calls at over 60 fps. For 100 or 1,000 objects you really won't need to optimize unless you are pushing photorealism, and even then I don't know how much benefit you will get.
