D3D11: Deferred contexts, command lists and state propagation problems and questions

4 comments, last by Corefanatic 12 years, 6 months ago
Hi all,
I am writing a demo that is using D3D11 with deferred contexts and command lists. Here is the setup I have got so far:

Tasking system using Intel Threading Building Blocks.

Each thread that the scheduler uses to execute tasks has its own deferred context that tasks can use.

My render work is split up into "layers", which can be thought of as "views" or rendering steps, i.e. opaque geometry is its own layer, transparent geometry is another, and each post-processing effect has its own.

Within a layer, objects are sorted by the vertex shader, pixel shader, mesh and texture they use, in that order. This is done to minimize state changes and avoid repeating them.

Objects are then separated into batches with a specified number of objects per batch, 10 for example.

Each batch is then rendered into a deferred context in its own task, and a command list is created.

The command list is put in a queue at its correct execution position.

The command lists are then executed sequentially on the immediate context.
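
The sorting and batching steps above can be sketched in portable C++. This is only an illustration under assumptions: DrawItem, sortByState and splitIntoBatches are hypothetical names, and the integer IDs stand in for real shader, mesh and texture handles. No D3D11 calls are made.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical per-object record; the integer IDs stand in for real
// shader, mesh and texture handles.
struct DrawItem {
    int vertexShader;
    int pixelShader;
    int mesh;
    int texture;
};

// Sort so that objects sharing state end up adjacent:
// vertex shader first, then pixel shader, mesh and texture.
inline void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.vertexShader, a.pixelShader, a.mesh, a.texture)
                       < std::tie(b.vertexShader, b.pixelShader, b.mesh, b.texture);
              });
}

// Split the sorted items into consecutive batches of at most batchSize.
// Each batch would be recorded by one task; its index in the result is
// its execution position in the command-list queue.
inline std::vector<std::vector<DrawItem>>
splitIntoBatches(const std::vector<DrawItem>& items, std::size_t batchSize) {
    assert(batchSize > 0);
    std::vector<std::vector<DrawItem>> batches;
    for (std::size_t i = 0; i < items.size(); i += batchSize) {
        const std::size_t end = std::min(i + batchSize, items.size());
        batches.emplace_back(items.begin() + static_cast<std::ptrdiff_t>(i),
                             items.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

Because the batch index doubles as the queue position, tasks can finish in any order and the lists can still be executed in the intended sequence.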



I have had limited success with this setup because of how state propagates with deferred contexts. Here are the issues:

Since different batches belonging to the same layer may be rendered on different deferred contexts, it does not make sense to save the state between batches.
When executing command lists in the right order, pipeline state does not propagate into the next command list executed (Why, Microsoft, WHY?!)

To illustrate what I am talking about, here are two simple consecutive batches:



Command list 0:

ClearBackBuffer();
ClearDepthBuffer();
SetRenderAndDepth();
SetViewPort();
SetVS();
SetVSConstantBuffers(); //View matrix and projection same for all objects
SetPS();
SetInputLayout();
SetTopology();
SetVertexBuffer();
SetIndexBuffer();
SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

Command list 1:

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();

SetVSConstantBuffers(); //world matrix per object
DrawIndexed();


As you can see above, it would be great if the state propagated from list 0 to list 1 when they are executed in order, but it does not (at least that is how I understand the workings of D3D11).

So, what can be done?

I have a small idea, but need to bounce it off you D3D gurus ;).

Since a deferred context can itself execute a command list, two lists could be combined into one, which should remove the state propagation problem. Would that work? Any ideas?

Thx for any suggestions.
In short: nothing can be done.

Each command list resets the state of the device to 'default' and is self-contained.

While this might seem daft, it isn't really; the amount of hassle required to track state and ensure things propagate properly between command lists, on both the driver and the client app side, simply isn't worth it.

I'm pretty sure you aren't meant to execute command lists and then record that into a command list, but even if you are allowed to, it wouldn't matter; all that would happen is that you'd get the following execution:

[reset device]
[execute command list 0]
[reset device]
[execute command list 1]


You'd still get no state propagation.
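
To make the reset behaviour concrete, here is a toy model in plain C++. It is not the D3D11 API: PipelineState, Command, CommandList and execute are hypothetical names, and the only assumption modelled is the one described above, that every command list starts from default state.

```cpp
#include <cassert>
#include <vector>

// Toy model only: these types are hypothetical and are not the D3D11 API.
struct PipelineState {
    int vertexShader = 0;  // 0 means "nothing bound" (the default state)
};

struct Command {
    enum Kind { SetVS, Draw } kind;
    int value;  // shader ID for SetVS, ignored for Draw
};

using CommandList = std::vector<Command>;

// Plays back one command list. The list starts from default state, so
// nothing bound by a previously executed list is visible here.
// Returns the vertex shader ID that was bound at each Draw.
inline std::vector<int> execute(const CommandList& list) {
    PipelineState state;  // fresh default state for every list
    std::vector<int> boundAtDraw;
    for (const Command& c : list) {
        if (c.kind == Command::SetVS) {
            state.vertexShader = c.value;
        } else {
            boundAtDraw.push_back(state.vertexShader);
        }
    }
    return boundAtDraw;
}
```

In this model, a list that sets shader 7 and draws sees shader 7 at its draw, while a following list that only draws sees the default value 0, which mirrors what happens to command list 1 in the example from the first post.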

I think you are not completely right. By your description, command lists would have a very narrow use and would not bring much benefit to multi-threaded rendering. In my tests so far I have been able to draw multiple objects, as described in my example, using two command lists. I still have a few things to try out to be completely sure, but I am certain command lists have a much greater use than you imply.
It seems you are right after all. I did some more tests and it really seems command lists are useless for the way I imagined I could use them.

So now I have two options:

1) Each layer gets written into a single command list, which keeps state intact. Since each layer is in a way independent of the previous one, I should get some speed-up from multi-threading, though not with the granularity I was hoping for.

2) Split layers that have a lot of draw commands into two command lists, set the proper state at the beginning of the second, and live with the extra API calls in exchange for better granularity and better use of the worker threads.
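
Option 2 can be sketched as follows, under assumptions: the strings stand in for the calls in the example batches from the first post, and recordSharedState and recordBatch are hypothetical helpers, not D3D11 functions. The sketch only shows the recording pattern, where every batch replays the shared state before its per-object work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; the strings stand in for the real
// state-setting and draw calls from the example batches.
using RecordedList = std::vector<std::string>;

// State shared by every batch in a layer. It is replayed at the start of
// each command list because nothing carries over from the previous one.
// Clears are left out: only the first list of a frame would issue them.
inline void recordSharedState(RecordedList& out) {
    static const char* const kSharedState[] = {
        "SetRenderAndDepth", "SetViewPort", "SetVS",
        "SetVSConstantBuffers(view/projection)", "SetPS",
        "SetInputLayout", "SetTopology", "SetVertexBuffer", "SetIndexBuffer"};
    for (const char* call : kSharedState) {
        out.push_back(call);
    }
}

// Records one self-contained batch: shared state first, then one
// constant-buffer update and one draw per object.
inline RecordedList recordBatch(std::size_t objectCount) {
    RecordedList list;
    recordSharedState(list);
    for (std::size_t i = 0; i < objectCount; ++i) {
        list.push_back("SetVSConstantBuffers(world)");
        list.push_back("DrawIndexed");
    }
    return list;
}
```

In this sketch the extra cost is nine state calls per additional batch, which is the price paid for finer granularity.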

I will probably try both, but can you help me pick the better one to go with first?

Thx
It very much depends on the design of your game/engine/renderer.

'Scenes' are a common way to split up work; by this I mean you use a command list to render out a shadow map, or do a normal pass, or a depth pass, etc. You could also split per camera looking into the world, so you might have a reflection map generated on a deferred context.

Another option is to sort all your draw calls by material and spread the groups around the various threads.

Command lists certainly aren't "limited"; they just require that you do some state setup in advance per command list and don't try to make them too small. If you find you are rendering one object per command list then you are doing it wrong ;)

Guidelines from GDC 2011: ~12 command lists generated per core, with ~1ms to generate each CL, is a good target (less is, of course, better, as that would eat 12ms of a 16.6ms frame).
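
The arithmetic behind that guideline, as a small sketch. The function names are hypothetical, a 60 Hz frame is assumed to match the 16.6ms figure, and the lists on one core are assumed to be recorded one after another.

```cpp
#include <cassert>

// Frame budget in milliseconds at a given frame rate.
inline double frameBudgetMs(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return 1000.0 / framesPerSecond;
}

// Time one core spends recording, assuming its lists are recorded serially.
inline double recordingTimeMs(int listsPerCore, double msPerList) {
    return listsPerCore * msPerList;
}
```

With 12 lists at 1ms each, recording takes 12ms of a roughly 16.67ms frame at 60 frames per second, leaving under 5ms of headroom on that core.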

Jon Jansen - DX11 Performance Gems from GDC2011 is probably worth a look.
Thank you for the GDC recommendation, will look it up on GDC Vault now.

My next step is to render each layer into its own command list. Once I get that going, I will look into splitting the draw calls in each layer into separate batches, if there are enough of them to make the split worth it, and of course set the proper state at the start of each batch.

I am not parallelizing an existing engine; I am researching how to write a parallel one, so there will be a bit of hit and miss before I get it right :).

Anyway, the next step is clear. I will post back in a few days after it is done.

Once again thx for your help.
