Current Wasteful DirectX Calls - DX11

2 comments, last by ErnieDingo 6 years, 10 months ago

Before you read, apologies for the wall of text!  

I'm looking to leverage efficiencies in DirectX 11 calls to improve the performance and throughput of my game.  I have a number of bad decisions I am going to remedy, but before I do, I'm just wanting to get input on whether I should put effort into these.

I've been running for a while with a high frame rate in my game, but as I add assets, it's obviously dipping (it's a bit of an n-squared issue).  I'm fully aware of the current architecture, and I'm looking to take care of some severe vertex buffer thrashing I'm doing at the moment.

Keep in mind, the game engine has evolved over the past year so some decisions made at that time in hindsight are considered bad, but were logical at the time.

The scenarios:

Current: my game world is broken up by a quad tree.  I'm rendering the terrain geometry and water geometry separately and in different vertex buffers.   Currently I am using raw (non-indexed) Draw calls, which means I am very wasteful of computational power.

Goal: Use index buffers to reduce vertices by ~80%, and compress my index buffers and vertex buffers into one index buffer and one vertex buffer.  I can't reduce the number of draw calls as it's one per leaf.

Current: Static assets such as trees etc. are bound to each leaf of my quad tree; as I traverse the tree to see what's in view/out of view, I trim the leaf, which in turn trims all the static assets.  This means there is an instance buffer for each node AND for each mesh.

Goal: Compress the instance buffers into one instance buffer per mesh (i.e., even if 10 meshes are in 1 vertex buffer, I need 10 instance buffers), and for all meshes, compress the meshes into 1 index buffer and 1 vertex buffer.  I cannot reduce the number of draw calls.

Current: My unlimited sea function reuses the same tile mesh and just remaps with a constant buffer.  This means, if there are 10 tiles, there are 10 draw calls and 10 constant buffer updates.

Goal: Simple: use an instance buffer and remove the constant buffer updates (I was lazy and wanted to do this quickly :)).  Reduces it to 1 draw call, 1 instance buffer bind and 1 vertex buffer bind.

Current: For each shader, I'm rebinding the same constant buffers, even though these buffers only change at the start of a new scene (shadow AND rendered).

Goal: Create a map of buffers to be bound once per context, using consistent registers.   Combine wasteful buffer structures into 1 buffer.  Reduce the number of constant buffer changes.  Less of a win for deferred contexts, but still worth it.

None of these changes are difficult, as I have layered my graphics engine in such a way that it doesn't disturb the lower levels.  I.e., instance management is not bound directly to meshes, and mesh management allows for compression easily.    All static buffers are set immutable in my game, so vertex, index and most other static buffers are immutable.

So the questions: 

- Are some or all of these changes worth it?  Or am I just going to suffer from draw call overhead regardless?

- I am assuming at the moment that setting vertex buffers, index buffers and instance buffers goes through the command buffer?  Is this correct?  I'm looking to reduce the number of calls pushed through it.

- I assume that in a deferred context world, constant buffers, once set, are not persistent across contexts when I execute command lists.

- Lastly, should I look into DrawIndexedInstancedIndirect to accumulate draw calls?  And would I get any benefit on the GPU side from doing this?


Indie game developer - Game WIP

Strafe (Working Title) - Currently in need of another developer and modeler/graphic artist (professional & amateur's artists welcome)

Insane Software Facebook


I think, at least for D3D11, the biggest thing you can do is limiting the amount of state changes you make to the pipeline. This is usually accomplished by grouping meshes that have similar pipeline parameters. As for the QuadTree, is there perhaps a priming read you can do of the tree, get the information that you need from each node, and try to dispatch that to the GPU in one go? Obviously this will result in more app side processing of the scene, but you can then profile, and see if the end result is better.

The goal is to keep the GPU busy; sending small piecemeal workloads while you continually prepare the next list of commands on the CPU may result in the GPU being under-utilized.

The raw draw call in D3D11 is where a lot of things are resolved. From what I understand, most of the deferred context calls are in fact deferred up to this point. So cutting those down will be a huge benefit (and of course other calls to the deferred context).

Yeah, I've read that state changes are a killer.  I am using texture atlases to reduce texture swapping.

I will probably reduce the number of constant buffer changes to sub 10 over the entire rendering cycle; at the moment it's around 30+.

I'm steering clear of deferred contexts at the moment since I'm using an AMD card; I just test that pipeline for the Nvidia users.

The issue I have at the moment is profiling the GPU.  I'm using C# and SharpDX, and the external GPU tools I currently have can't detect that DirectX 11 is running.  I regret in some ways my decision to choose these technologies.  The Visual Studio graphics tools at least work, and I need to look more deeply into them.

I've broken my game into 2 parts: the game logic component and the rendering component.  What I should do is use the GPU downtime during the logic component to do more prep/rendering; I'm sure profiling will show this up.  I haven't really built a jobs component; this might be something I need to consider.

Will let you know how the state change reductions go and impact on performance.


