[DX 11] Best practices with rendering large numbers of objects

magicstix:

Hi all,
I've been dissecting the DX11 tutorials, but I'm not sure how efficient they are, or whether the way they do things is how one would do it in a "real" 3D engine. Basically, I don't want to be doing the D3D equivalent of calling glVertex3f over and over when I should be setting up a vertex buffer, etc...

So I'm wondering what are the best practices for rendering a large number of diverse objects as one would find in a game?

My first instinct would be to arrange all of my objects in a scene graph and traverse the graph, calling IASetVertexBuffers, IASetIndexBuffer, VSSetShader, PSSetShader, *SetConstantBuffers, etc, finally followed by a DrawIndexed or Draw call to render the object. I'd repeat this for each rendering pass until the final image is complete.
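Concretely, the sort of loop I have in mind would look roughly like this (the Object struct and resource setup are just placeholders; only the context calls are real D3D11 API):
[code]
// Rough sketch of the per-object traversal described above. The Object
// struct and how its resources get created are hypothetical.
#include <d3d11.h>
#include <vector>

struct Object
{
    ID3D11Buffer*       vb;
    ID3D11Buffer*       ib;
    UINT                stride;
    UINT                indexCount;
    ID3D11VertexShader* vs;
    ID3D11PixelShader*  ps;
    ID3D11Buffer*       constants; // world matrix, material params, etc.
};

void DrawScene(ID3D11DeviceContext* ctx, const std::vector<Object>& objects)
{
    for (const Object& o : objects)
    {
        UINT offset = 0;
        ctx->IASetVertexBuffers(0, 1, &o.vb, &o.stride, &offset);
        ctx->IASetIndexBuffer(o.ib, DXGI_FORMAT_R32_UINT, 0);
        ctx->VSSetShader(o.vs, nullptr, 0);
        ctx->PSSetShader(o.ps, nullptr, 0);
        ctx->VSSetConstantBuffers(0, 1, &o.constants);
        ctx->DrawIndexed(o.indexCount, 0, 0);
    }
}
[/code]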

Does this sound normal, or would I want to do something different, like trying to combine all of the scene vertices into a giant buffer of "vertex soup" to cut down on IASetVertexBuffers calls, or avoiding ConstantBuffer changes?

Are any of the calls I've listed particularly expensive, and so should be avoided?

[quote]My first instinct would be to arrange all of my objects in a scene graph and traverse the graph[/quote]
"Scene graph" means a lot of different things to different people, but most of those things are bad for rendering.
For opaque objects, you either want your rendering order to be from closest-to-furthest (for maximum z-buffer rejection), or by state-changes (to reduce the CPU overhead of calling D3D functions), or some combination of both.
For transparent objects, you need the rendering order to be from furthest-to-closest (for correct blending).
Traversing a scene graph doesn't give you any of these traversal orders.
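One common alternative (just a sketch, not the only way) is a flat list of draw items with a packed sort key -- the field widths and key layout below are arbitrary choices:
[code]
// Hypothetical sort-key sketch: pack the state you want to group by into
// the high bits and quantised depth into the low bits, then sort a flat
// draw list once per frame.
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawItem
{
    uint64_t key;
    uint32_t objectIndex; // index back into the scene's renderables
};

uint64_t MakeOpaqueKey(uint32_t shaderId, float viewDepth, float maxDepth)
{
    // 24 bits of front-to-back depth; shader id above it, so draws group
    // by shader first and sort near-to-far within each shader.
    uint64_t depthBits = uint64_t((viewDepth / maxDepth) * 0xFFFFFF) & 0xFFFFFF;
    return (uint64_t(shaderId) << 24) | depthBits;
}

void SortDraws(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.key < b.key; });
}
[/code]
For transparent objects you'd flip the scheme: put (inverted) depth in the high bits, so back-to-front ordering wins over state grouping.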

[quote]calling IASetVertexBuffers, IASetIndexBuffer, VSSetShader, PSSetShader, *SetConstantBuffers, etc, finally followed by a DrawIndexed or Draw call to render the object.[/quote]
To reduce CPU overhead (and internal command-buffer usage), you should only call these when necessary (e.g. if the previous draw call used the same pixel shader, then PSSetShader wouldn't be required).
However, I've seen some engines with state-caching systems so complicated that the time spent determining whether a state change is redundant is actually greater than the time saved by not performing the state change... so keep it simple ;)
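For illustration, a filter can be as trivial as remembering the last pointer set (this wrapper type is made up):
[code]
// A deliberately simple redundant-state filter: remember what was bound
// last and skip the call if nothing changed. StateCache is hypothetical.
#include <d3d11.h>

struct StateCache
{
    ID3D11VertexShader* lastVS = nullptr;
    ID3D11PixelShader*  lastPS = nullptr;

    void SetVS(ID3D11DeviceContext* ctx, ID3D11VertexShader* vs)
    {
        if (vs != lastVS) { ctx->VSSetShader(vs, nullptr, 0); lastVS = vs; }
    }
    void SetPS(ID3D11DeviceContext* ctx, ID3D11PixelShader* ps)
    {
        if (ps != lastPS) { ctx->PSSetShader(ps, nullptr, 0); lastPS = ps; }
    }
};
[/code]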
[quote]combine all of the scene vertices into a giant buffer of "vertex soup" to cut down on IASetVertexBuffers calls, or avoiding ConstantBuffer changes?[/quote]
If you can combine buffers together, it's probably a good thing. Usually for a level's static meshes, this is a trivial task.
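As a sketch, each static mesh then just records a range into the shared buffers, so the buffers are bound once and every draw is a pair of offsets (MeshRange is a made-up bookkeeping struct):
[code]
// "One big buffer" for static level geometry: every mesh lives in a
// shared VB/IB and records where its range starts.
#include <d3d11.h>
#include <vector>

struct MeshRange
{
    UINT indexCount;
    UINT firstIndex; // offset into the shared index buffer
    INT  baseVertex; // offset into the shared vertex buffer
};

void DrawStaticLevel(ID3D11DeviceContext* ctx,
                     ID3D11Buffer* sharedVB, ID3D11Buffer* sharedIB,
                     UINT stride, const std::vector<MeshRange>& meshes)
{
    UINT offset = 0;
    ctx->IASetVertexBuffers(0, 1, &sharedVB, &stride, &offset); // bound once
    ctx->IASetIndexBuffer(sharedIB, DXGI_FORMAT_R32_UINT, 0);
    for (const MeshRange& m : meshes)
        ctx->DrawIndexed(m.indexCount, m.firstIndex, m.baseVertex);
}
[/code]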
You should also reduce cbuffer changes where appropriate. Usually shaders are split up so that inputs from different sources are put into different cbuffers - e.g. a camera-params cbuffer, a material-params cbuffer, a model-instance-params cbuffer etc...
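As a sketch of how that split looks on the C++ side (slot numbers and layouts are illustrative; they'd mirror HLSL declarations like cbuffer CameraParams : register(b0) { ... }):
[code]
// One constant buffer per update frequency, so changing a per-object
// transform never re-uploads camera or material data. All names and
// slot assignments here are hypothetical.
#include <d3d11.h>
#include <DirectXMath.h>

struct CameraParams   { DirectX::XMFLOAT4X4 viewProj; };  // b0: per frame
struct MaterialParams { DirectX::XMFLOAT4   baseColor; }; // b1: per material
struct ObjectParams   { DirectX::XMFLOAT4X4 world; };     // b2: per object

void BeginFrame(ID3D11DeviceContext* ctx, ID3D11Buffer* cameraCB,
                ID3D11Buffer* materialCB, ID3D11Buffer* objectCB,
                const CameraParams& cam)
{
    // Camera data changes once per frame; upload it once here.
    ctx->UpdateSubresource(cameraCB, 0, nullptr, &cam, 0, 0);
    ID3D11Buffer* cbs[3] = { cameraCB, materialCB, objectCB };
    ctx->VSSetConstantBuffers(0, 3, cbs); // bind b0..b2 once
}

void SetObjectTransform(ID3D11DeviceContext* ctx, ID3D11Buffer* objectCB,
                        const ObjectParams& obj)
{
    // Between draws, only the small per-object buffer is touched.
    ctx->UpdateSubresource(objectCB, 0, nullptr, &obj, 0, 0);
}
[/code]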

CPU-wise, none of those calls should be very expensive by themselves. GPU-wise, their performance is hard to quantify -- individually, each call has basically no impact, but the total difference in state from one batch of triangles to the next, and the number of triangles/pixels submitted between each state change, is what matters. The GPU is heavily pipelined: expensive operations are done in parallel with large numbers of cheap operations, so that under good conditions the expensive operations appear cheap. If there aren't enough cheap operations to cover the latencies of the expensive ones, a "pipeline bubble" forms, where the GPU is effectively idle for some time.

For example, let's say that while switching shader programs the GPU can process 500 triangles, and let's look at two situations:
1) Set shader A, draw 1 triangle, set shader B, draw 3 triangles.
2) Set shader A, draw 500 triangles, set shader B, draw 500 triangles.
Using the above assumptions, both of these situations would take the same amount of time to render!
In the first case, while setting shader B, the GPU only has a single triangle to keep itself busy with. After it finishes with that triangle, it has to sit around idle until the binding of shader B has completed.
In the second case, the GPU has enough work that it can stay busy drawing triangles the whole time shader B is being configured.

Now that's obviously a huge simplification, but the general rule of thumb is that whenever you change states, you want to draw as many pixels as you can before the next time you change states (to cover the internal latency caused by the change of internal states).

the old "vertex soup" method works, as far as I know. But remember with dx10 and dx11 you get the nifty geometry shader, and that can actually draw 10 things every time you draw one thing, its great for "instancing"


the old "vertex soup" method works, as far as I know. But remember with dx10 and dx11 you get the nifty geometry shader, and that can actually draw 10 things every time you draw one thing, its great for "instancing"


As far as I know, the geometry shader isn't that efficient for instancing. The problem with the geometry shader is that it works on primitives, and for that reason you'll end up with a huge number of extra vector-matrix multiplications per object per instance compared to the built-in instancing API. There are draw commands that perform geometry instancing without the geometry shader.
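For reference, the built-in path puts per-instance data in a second vertex stream and submits every copy with one call, no GS involved (strides and formats below are made up for illustration):
[code]
// Instancing via the input assembler: slot 0 holds the mesh, slot 1 holds
// one entry per instance, and a single DrawIndexedInstanced submits all
// copies.
#include <d3d11.h>

// The matching input layout marks slot 1 as per-instance, e.g.:
//   { "INSTANCE_ROW", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 0,
//     D3D11_INPUT_PER_INSTANCE_DATA, 1 }, // ...one element per matrix row
void DrawMeshInstanced(ID3D11DeviceContext* ctx,
                       ID3D11Buffer* meshVB, ID3D11Buffer* instanceVB,
                       ID3D11Buffer* ib, UINT indexCount, UINT instanceCount)
{
    ID3D11Buffer* vbs[2]     = { meshVB, instanceVB };
    UINT          strides[2] = { sizeof(float) * 8,    // position + normal + uv
                                 sizeof(float) * 16 }; // 4x4 world matrix
    UINT          offsets[2] = { 0, 0 };
    ctx->IASetVertexBuffers(0, 2, vbs, strides, offsets);
    ctx->IASetIndexBuffer(ib, DXGI_FORMAT_R32_UINT, 0);
    // The vertex shader reads the instance transform from the second stream.
    ctx->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);
}
[/code]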

Cheers!

Whether or not you use the GS depends on how you need your instancing to perform. There is also an inherent bottleneck from the requirement that output vertices per primitive must be kept together, which can reduce the amount of work the GPU can do.

However, instancing itself IS a good idea. I believe Frostbite 2 (the BF3 engine) treats every draw call as an instanced draw call; the only thing that varies is the number of instances. This nets them a large reduction in the per-frame draw calls required, which is important because, CPU-side, that will be your limiting factor.
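A sketch of what that single submit path might look like (Batch is a made-up struct; a count of 1 is just an ordinary draw):
[code]
// Every draw goes through one instanced code path; "non-instanced"
// objects are simply batches with instances == 1.
#include <d3d11.h>

struct Batch
{
    UINT indexCount;
    UINT firstIndex;
    INT  baseVertex;
    UINT instances; // 1 for an ordinary draw
};

void Submit(ID3D11DeviceContext* ctx, const Batch& b)
{
    ctx->DrawIndexedInstanced(b.indexCount, b.instances,
                              b.firstIndex, b.baseVertex, 0);
}
[/code]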

So is it best then to group objects by materials (i.e. shaders) to cut down on the amount of switching between shader programs?

The brute-force approach would seem to be giving each game object a rendering callback that the renderer runs, making the object draw itself. But if a game object is made up of lots of different materials that all have different shaders, that seems inefficient, since I'd be doing a lot of shader swapping. It also seems I'd want to split like materials out by transparency requirements. The only problem with separating by material is that it then becomes more complex to keep game objects associated with their renderable objects, and I'd have to make a pass over my list of game objects to sort their renderables by material before I start rendering.
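To make that concrete, I'm picturing something like this (all names made up): game objects don't draw themselves, they push lightweight items into a queue that gets sorted by material before submission, with a back-reference so the association isn't lost:
[code]
// Hypothetical render queue: game objects push (material, mesh, owner)
// tuples each frame; the queue sorts by material so each shader/material
// only needs to be bound once per frame.
#include <algorithm>
#include <cstdint>
#include <vector>

struct RenderItem
{
    uint32_t materialId;   // sort key: groups draws by shader/material
    uint32_t meshId;
    uint32_t gameObjectId; // back-reference into the game-object list
};

struct RenderQueue
{
    std::vector<RenderItem> items;

    void Push(const RenderItem& item) { items.push_back(item); }

    void SortAndDraw()
    {
        std::sort(items.begin(), items.end(),
                  [](const RenderItem& a, const RenderItem& b)
                  { return a.materialId < b.materialId; });
        // ...bind each material once, draw all of its meshes, repeat...
        items.clear();
    }
};
[/code]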
