How do you implement batching in DirectX?


Started by ronan.thibaudau October 10, 2013 11:45 PM

7 comments, last by Hodgman 10 years, 6 months ago

ronan.thibaudau

1,038

Author

October 10, 2013 11:45 PM

Hello there,

I'm trying to work directly with DX (SharpDX to be precise) coming from higher level engines and am left wondering how batching is managed.

From what I noticed in my Unity3D experience, you can "batch" items that share a material (same shader, same parameters) into a single draw call. However, I'm confused about how you do that, since a draw call takes a single set of vertices. I did hear about DrawInstanced, but that's for drawing the same geometry multiple times, if I understand correctly. So my question is: how do I draw multiple different meshes (that share the same shader / textures) in a single Draw() call?

Thanks.

Jason Z

6,437

October 11, 2013 01:39 AM

The mesh data all has to be in a single Input Assembler configuration, and then the draw call can span multiple meshes. This typically means that you need to allocate one large buffer and fill it with pre-transformed vertex data, or create a transformation matrix array and pass that as a constant parameter. The performance will depend on what else you are doing, so the best way is to try both implementations and see which one works best in your configuration. It may also be faster to just use multiple draw calls too!

Jason Zink :: DirectX MVP

Direct3D 11 engine on CodePlex: Hieroglyph 3

Direct3D Books: Practical Rendering and Computation with Direct3D 11, Programming Vertex, Geometry, and Pixel Shaders
Articles: Dual-Paraboloid Mapping Article :: Parallax Occlusion Mapping Article (original):: Fast Silhouettes Article

Games: Lunar Rift

ronan.thibaudau

1,038

Author

October 11, 2013 01:53 AM

The mesh data all has to be in a single Input Assembler configuration, and then the draw call can span multiple meshes. This typically means that you need to allocate one large buffer and fill it with pre-transformed vertex data, or create a transformation matrix array and pass that as a constant parameter. The performance will depend on what else you are doing, so the best way is to try both implementations and see which one works best in your configuration. It may also be faster to just use multiple draw calls too!

But what I'm confused about is how to do this (technically speaking; I get the concept).

I mean, I get the part about stuffing multiple meshes in a vertex buffer and pre-transforming them, but what do I actually call (in code) to tell DirectX, "here, take this buffer, use this shader with it, but do understand it's multiple different meshes and not a single poly soup"? Currently all I would know how to do would be to treat the whole thing as poly soup (which would mean both meshes would be linked by a triangle).

So what method do i call to draw a single time & render multiple (pre transformed is fine) different meshes?

ronan.thibaudau

1,038

Author

October 11, 2013 01:54 AM

Oh, and another thing, because something you said piqued my interest: it can be faster to do multiple draw calls!

How come? I mean i always understood it was a design limitation (sharing material) vs performance, how could it possibly be faster to call draw multiple times instead of once?

Nik02

4,359

October 11, 2013 05:49 AM

If the batching setup takes more time than the draw calls, then multiple draw calls are cheaper.

Multiple meshes in one VB/IB pair do not need to share any vertices. DrawIndexed takes a triangle list, and it doesn't care about geometry adjacency (though adjacency info has its uses in some GS stuff). For all D3D knows, all the vertices in the vertex buffer (and triangles in index buffer) are separate.
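To make the "what do I actually call" part concrete, here is a minimal CPU-side sketch of packing several meshes into one vertex array and one index array. The `Vertex`, `Mesh`, and `Batch` types and the function names are hypothetical, and a simple translation stands in for a full world transform. The key step is rebasing each mesh's indices by the number of vertices already in the shared buffer, so every triangle still refers to its own mesh's vertices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side types for illustration; not part of D3D or SharpDX.
struct Vertex { float x, y, z; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // triangle list: 3 indices per triangle
    float offsetX, offsetY, offsetZ;     // stand-in for a full world transform
};

struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Appends one mesh to the batch: pre-transforms its vertices and rebases its
// indices so they point at this mesh's vertices inside the shared array.
void appendMesh(Batch& batch, const Mesh& mesh) {
    assert(mesh.indices.size() % 3 == 0 && "triangle list expected");
    const std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());
    for (const Vertex& v : mesh.vertices) {
        batch.vertices.push_back(
            Vertex{v.x + mesh.offsetX, v.y + mesh.offsetY, v.z + mesh.offsetZ});
    }
    for (std::uint32_t i : mesh.indices) {
        batch.indices.push_back(base + i);
    }
}

// Builds a batch from two made-up meshes (a triangle and a quad).
Batch buildExampleBatch() {
    Mesh triangle{{{0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
                  {0, 1, 2},
                  0.0f, 0.0f, 0.0f};
    Mesh quad{{{0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 0.0f}},
              {0, 1, 2, 2, 1, 3},
              5.0f, 0.0f, 0.0f};
    Batch batch;
    appendMesh(batch, triangle);
    appendMesh(batch, quad);
    return batch;
}
```

The two arrays would then be uploaded into one vertex buffer and one index buffer and drawn with a single indexed draw using a triangle-list topology; in native D3D11 that is ID3D11DeviceContext::DrawIndexed(IndexCount, StartIndexLocation, BaseVertexLocation) with IndexCount set to the size of the merged index array.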

Niko Suni

Hodgman

52,717

October 11, 2013 07:01 AM

Currently all I would know how to do would be to treat the whole thing as poly soup (which would mean both meshes would be linked by a triangle).

It depends on the primitive type that you're drawing with.
Often games use triangle lists, where 3 values are read from the index buffer at a time to define a triangle. Triangles can be disconnected from each other / there's no need for them to all be linked. If you use triangle lists, there's no problem at all.

If you're using a triangle strip / triangle fan, then yes, all the triangles are joined together, as after the first triangle, only one index is read at a time to 'attach' the next vertex in the strip/fan. This causes a problem when you want to draw different parts in one draw-call, without them being linked.
In this situation you can use a trick known as a "degenerate triangle" to link two physically separate meshes together. The GPU will immediately discard any triangle with zero area without drawing anything - these triangles are called degenerates.
e.g. say you've got two quads, defined with these indices:

1-2  5-6
|/|  |/|
3-4  7-8

For a tri-strip draw-call, you can use the index buffer values of 1234455678, instead of 12345678. The duplicate indices in the middle create the degenerate linking triangles.
Those 10 indices interpreted as a triangle strip will result in 8 triangles using these indices:
123 234 344 445 455 556 567 678
...but, with 4 of those triangles being degenerate (zero area, because the same vertex is used twice per triangle) and ignored by the GPU.
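The stitching trick can be sketched on the CPU as below. The `Strip` alias and both function names are made up for illustration. One detail the two-quad example does not exercise: a strip alternates winding on every triangle, so the second strip should start at an even position in the combined index list. The sketch adds one extra duplicate when the first strip has an odd number of indices; that parity fix is my addition, not something stated in the post above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Strip = std::vector<unsigned>;

// Joins two triangle strips with degenerate triangles by repeating the last
// index of `a` and the first index of `b`.
Strip stitchStrips(const Strip& a, const Strip& b) {
    assert(!a.empty() && !b.empty());
    Strip out = a;
    out.push_back(a.back());
    // Keep `b` starting at an even position so its winding order is unchanged.
    if (a.size() % 2 != 0) {
        out.push_back(a.back());
    }
    out.push_back(b.front());
    out.insert(out.end(), b.begin(), b.end());
    return out;
}

// Counts the strip triangles that have three distinct indices. The others
// repeat an index, have zero area, and are the degenerate linking triangles.
std::size_t countDrawnTriangles(const Strip& s) {
    std::size_t drawn = 0;
    for (std::size_t i = 0; i + 2 < s.size(); ++i) {
        if (s[i] != s[i + 1] && s[i + 1] != s[i + 2] && s[i] != s[i + 2]) {
            ++drawn;
        }
    }
    return drawn;
}
```

Calling stitchStrips({1, 2, 3, 4}, {5, 6, 7, 8}) reproduces the 1234455678 sequence, and countDrawnTriangles on the result reports the four real triangles of the two quads.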

Oh, and another thing, because something you said piqued my interest: it can be faster to do multiple draw calls!

How come? I mean i always understood it was a design limitation (sharing material) vs performance, how could it possibly be faster to call draw multiple times instead of once?

Whenever you call any D3D/GL function, it takes time on the CPU. Reducing the number of state-changes (GL/D3D calls other than 'draw' ones, such as changing textures/shaders/etc) is just as important as reducing the number of draw-calls.

However, in D3D11, calls are much cheaper than they were in D3D9. In D3D9, it's quite important to keep your D3D-call count low, but in D3D11 you can make many more calls in the same amount of CPU time... so it's not quite as much of a concern as it used to be.

In any case, reducing draw-calls is a CPU-side optimization.
The GPU on the other hand doesn't really care how many draw-calls you make, as long as between each group of state-changes (i.e. changing shaders, textures, etc) it is given a decent amount of work.

Usually you'll sort your objects so that all the ones that share a material are drawn one after the other, so that you can make all the state-changes once, then execute all the draw-calls together without state-changes in between. This reduces your CPU overheads, and it also gives the GPU more work to do between groups of state changes, so the GPU will stall less.
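As a rough illustration of that sorting step, the sketch below orders a list of draw records by material and counts how many material binds the resulting order needs. `DrawItem`, `materialId`, and `meshId` are hypothetical names; in a real renderer the material would stand for a shader plus its textures and constants.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record; materialId stands for a shader + texture set.
struct DrawItem {
    int materialId;
    int meshId;
};

// Orders items so that all draws sharing a material are adjacent.
// stable_sort keeps the original submission order within each material.
void sortByMaterial(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.materialId < b.materialId;
                     });
}

// Counts how many times the material has to be bound when the items are
// drawn in order: once at the start, then once per change of material.
std::size_t countMaterialBinds(const std::vector<DrawItem>& items) {
    std::size_t binds = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].materialId != items[i - 1].materialId) {
            ++binds;
        }
    }
    return binds;
}
```

With five items whose materials arrive as 2, 1, 2, 1, 3, the unsorted order needs five binds and the sorted order needs three, while the number of draw calls stays the same.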

On the GPU side, if you set up your material (set your shader, textures, etc) once and then make 10 small draw-calls, or just make 1 big draw-call, the GPU acts the same.
On the CPU side though, the version with 10 draw-calls has to do up to 10 times more work.

If you have a fast CPU and a slow GPU, then CPU-side optimisations will have no effect on your framerate.

Also, if your 'batching' strategy requires you to copy vertex data around, merge buffers, pre-transform stuff on the CPU, etc, etc, then all that work might be more expensive than the savings you get from 'batching'. It may be faster to just do nothing ;)

. 22 Racing Series .

Shane C

1,369

October 11, 2013 07:46 AM

If you have a fast CPU and a slow GPU, then CPU-side optimisations will have no effect on your framerate.

There is something I wanted to let you know that isn't really related to the subject, but since there aren't too many avenues to tell you about this, I am posting here.

I think you are correct about this. However, I have done computer benchmarking in the past, and some commercial games improved greatly from adding more CPU power even when the GPU was the bottleneck. Upgrading the GPU was what I really needed to do for a great performance increase, but improving CPU performance still mattered a lot.

In the original poster's case, though, there is probably so much CPU headroom that it makes no difference in speed. :)

ronan.thibaudau

1,038

Author

October 11, 2013 05:05 PM

Thanks for the clarification, for some reason i was (mistakenly) thinking dx was drawing tris out of verts 1/2/3 then 2/3/4 then 3/4/5 etc while it actually does 1/2/3 ; 4/5/6 etc, this is where i was confused but it all makes sense now, thanks :)

Hodgman

52,717

October 12, 2013 03:39 AM

Thanks for the clarification, for some reason i was (mistakenly) thinking dx was drawing tris out of verts 1/2/3 then 2/3/4 then 3/4/5 etc while it actually does 1/2/3 ; 4/5/6 etc, this is where i was confused but it all makes sense now, thanks

It can do either, as mentioned above.

If you tell it to draw triangle lists, it interprets 123456 as 1/2/3, 4/5/6.

If you tell it to draw triangle strips, it interprets 123456 as 1/2/3, 2/3/4, 3/4/5, 4/5/6.
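The two interpretations can be written out as a small CPU-side decoder. This is only a sketch of how the index stream is grouped: the `Triangle` alias and function names are made up, and it lists each triangle's indices in buffer order, ignoring the winding swap that hardware applies to every other strip triangle.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Triangle = std::array<unsigned, 3>;

// Triangle list: every group of three indices is an independent triangle.
std::vector<Triangle> decodeTriangleList(const std::vector<unsigned>& idx) {
    std::vector<Triangle> tris;
    for (std::size_t i = 0; i + 2 < idx.size(); i += 3) {
        tris.push_back(Triangle{idx[i], idx[i + 1], idx[i + 2]});
    }
    return tris;
}

// Triangle strip: after the first triangle, each new index forms a triangle
// with the two indices before it (winding alternation is ignored here).
std::vector<Triangle> decodeTriangleStrip(const std::vector<unsigned>& idx) {
    std::vector<Triangle> tris;
    for (std::size_t i = 0; i + 2 < idx.size(); ++i) {
        tris.push_back(Triangle{idx[i], idx[i + 1], idx[i + 2]});
    }
    return tris;
}
```

For the indices 1 2 3 4 5 6, the list decoder yields two triangles and the strip decoder yields four. In native D3D11 the interpretation is chosen with IASetPrimitiveTopology, using D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST or D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP.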

. 22 Racing Series .
