D3D11 Deferred Contexts

Posted by Jason Z, 23 July 2010 · 1,209 views




So, all the major surgery has been completed, and now that things are back up and running nicely, I wanted to discuss deferred contexts a little bit. Between the DXSDK docs, conference presentations, and various other posts that Google churned up, there is surprisingly little information on how to actually use deferred contexts in D3D11. As it turns out, the best reference I could find was the multithreading sample in the DXSDK itself...

From my small amount of research, there are generally two ways to benefit from multithreading with D3D11: multithreaded resource creation, and parallel submission of rendering operations. The first is fairly simple, since the D3D11 device is free-threaded: you more or less just create whatever threads you need, and they can create resources without you having to guard the device with a mutex or anything like that. The second is a bit more tricky, and is where I think the real treasure is buried.

A deferred context is basically identical to an immediate context (apart from the return value of its GetType() method), except that it doesn't actually submit state changes and execution requests to the GPU. Instead, it builds up a command list, which can then be executed on the immediate context. At first this sounds almost like double work, but in theory executing a command list should be faster than making the individual function calls, since everything is already formatted properly. As long as your rendering operations are parallelizable, you just fire up a few threads, have each one use a deferred context to generate its command lists, and then have the immediate context execute the command lists in the required order. Sounds great so far...
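To make the record-then-replay flow concrete, here is a simplified C++ model. It does not touch the D3D11 API: DeferredContext, ImmediateContext, CommandList and RenderInParallel are hypothetical stand-ins, and a "command" is just a closure that appends a label to a log. In real D3D11 code the corresponding pieces would be ID3D11Device::CreateDeferredContext, ID3D11DeviceContext::FinishCommandList on each worker, and ExecuteCommandList on the immediate context.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Simplified model, not the D3D11 API: a "command" is a closure that the
// immediate context runs later; here it just appends a label to a log.
using Log = std::vector<std::string>;
using Command = std::function<void(Log&)>;

// An ordered batch of recorded commands, playable on the immediate context.
struct CommandList {
    std::vector<Command> commands;
};

// Stand-in for a deferred context: calls are recorded, not executed.
class DeferredContext {
public:
    void Draw(const std::string& what) {
        pending_.push_back([what](Log& log) { log.push_back("draw " + what); });
    }
    // Plays the role of FinishCommandList: hand over what was recorded
    // and leave the context empty so it can record again.
    CommandList Finish() {
        CommandList list{std::move(pending_)};
        pending_.clear();
        return list;
    }
private:
    std::vector<Command> pending_;
};

// Stand-in for the immediate context: the only place work is "submitted".
class ImmediateContext {
public:
    void Execute(const CommandList& list) {
        for (const auto& cmd : list.commands) cmd(log_);
    }
    const Log& log() const { return log_; }
private:
    Log log_;
};

// One worker thread per view records its own command list; the main thread
// then executes the lists in view order, regardless of which finished first.
Log RenderInParallel(const std::vector<std::string>& views) {
    std::vector<CommandList> lists(views.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < views.size(); ++i) {
        workers.emplace_back([&lists, &views, i] {
            DeferredContext ctx;                 // one deferred context per thread
            ctx.Draw(views[i] + "/opaque");
            ctx.Draw(views[i] + "/transparent");
            lists[i] = ctx.Finish();             // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();

    ImmediateContext immediate;
    for (const auto& list : lists) immediate.Execute(list);
    return immediate.log();
}
```

The key design point is that the recording order across threads is irrelevant; only the order in which the immediate context executes the finished lists determines what reaches the GPU.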

I have modified Hieroglyph 3 so that a 'PipelineManagerDX11' object is passed to any method that requires access to the pipeline. This object cleverly remains incognito as to which type of context it houses; the object itself doesn't even know (except through the return value of the GetType() method). The beauty of this abstraction is that the rendering system can very easily decide at startup whether it will use deferred contexts (e.g. on a 3- or 4-core processor) or not (e.g. on a 1- or 2-core processor). A render view is handed a pipeline object and executes its rendering pass on that pipeline: either the pass takes effect immediately, or it is cached into a command list and fired off later.
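The abstraction can be sketched roughly like this. Pipeline, ImmediatePipeline, DeferredPipeline, RenderViewPass and UseDeferredContexts are hypothetical names, not Hieroglyph 3's classes; the two behaviours are modelled as subclasses purely for illustration, whereas in the post a single PipelineManagerDX11 holds whichever context it is given. Strings stand in for real draw calls.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not Hieroglyph 3's code: a render view only sees this
// interface and never asks which kind of context sits behind it.
class Pipeline {
public:
    virtual ~Pipeline() = default;
    virtual void Draw(const std::string& what) = 0;
};

// Immediate behaviour: each call reaches the "GPU" queue right away.
class ImmediatePipeline : public Pipeline {
public:
    explicit ImmediatePipeline(std::vector<std::string>& gpu) : gpu_(gpu) {}
    void Draw(const std::string& what) override { gpu_.push_back(what); }
private:
    std::vector<std::string>& gpu_;
};

// Deferred behaviour: calls are cached and only reach the GPU queue on Flush.
class DeferredPipeline : public Pipeline {
public:
    void Draw(const std::string& what) override { recorded_.push_back(what); }
    void Flush(std::vector<std::string>& gpu) {
        gpu.insert(gpu.end(), recorded_.begin(), recorded_.end());
        recorded_.clear();
    }
    std::size_t pending() const { return recorded_.size(); }
private:
    std::vector<std::string> recorded_;
};

// The render pass is written once, against the abstract pipeline.
void RenderViewPass(Pipeline& pipeline) {
    pipeline.Draw("sky");
    pipeline.Draw("terrain");
}

// Startup policy from the post: deferred on 3+ cores, immediate otherwise.
// In practice `cores` might come from std::thread::hardware_concurrency().
bool UseDeferredContexts(unsigned cores) { return cores >= 3; }
```

Because RenderViewPass takes a Pipeline&, the same pass code runs unchanged whichever policy UseDeferredContexts picks at startup.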

With this change completed, I have reached the point where I am successfully using a deferred context to generate one RenderView's worth of commands into a command list. This is all single-threaded, but it proves that the machinery works before I take the step into multithreaded rendering. It also involved queuing all of the render views needed in a scene instead of processing them recursively. They are still processed in depth-first order, but they are now queued in the renderer before execution; this is the precursor to handing these packages off to worker threads to build the command lists.
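A minimal sketch of queuing views instead of rendering them during the recursion. It assumes a post-order depth-first walk, in which the views a view depends on (for example a shadow map used by the main view) are queued before the view itself; RenderView and QueueViews are hypothetical names, and the post does not spell out the exact traversal.

```cpp
#include <string>
#include <vector>

// Hypothetical view node: `children` are views whose output this view uses
// (e.g. a shadow map consumed by the main camera view).
struct RenderView {
    std::string name;
    std::vector<RenderView> children;
};

// Walk the tree depth first, queuing each view after its dependencies,
// instead of rendering it immediately inside the recursion.
void QueueViews(const RenderView& view, std::vector<const RenderView*>& queue) {
    for (const auto& child : view.children) QueueViews(child, queue);
    queue.push_back(&view);
}
```

Once the queue exists, its entries can be handed to worker threads to record command lists, while the queue order still fixes the order in which the immediate context executes them.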

So now I must make the leap to multithreading, and then perform some testing on various scene types to find out what types of parallelization make the most sense and what is effective in speeding things up... It should be an interesting series of tests, so keep your eyes open for some further updates shortly!




A slight 'gotcha' that I missed when I started using deferred contexts: between each command list activation, the device state is cleared.

This means you have no bound render target, for example, which stumped me for a while.

So remember: when setting up a command list, you need to set up EVERYTHING you need to render, from the render target down. You still need to batch things up as best you can for each command list you generate.

Thanks for the tip [grin]!
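The gotcha from the comment above can be modelled in a few lines. Nothing here calls D3D11: State, Command, CommandList and ExecuteAll are hypothetical types that mimic the described behaviour, where each command list is replayed from cleared state, so a list that relies on a render target bound by an earlier list draws nothing.

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified model of the gotcha, not the D3D11 API: every command list
// starts from default (cleared) state, so nothing carries over between lists.
struct State {
    std::optional<std::string> renderTarget;  // empty = nothing bound
};

struct Command {
    enum Kind { SetRenderTarget, Draw } kind;
    std::string arg;
};

using CommandList = std::vector<Command>;

// Replays each list from cleared state; a draw with no bound target is lost.
std::vector<std::string> ExecuteAll(const std::vector<CommandList>& lists) {
    std::vector<std::string> drawn;
    for (const auto& list : lists) {
        State state;  // cleared before each command list
        for (const auto& cmd : list) {
            if (cmd.kind == Command::SetRenderTarget) {
                state.renderTarget = cmd.arg;
            } else if (state.renderTarget) {
                drawn.push_back(cmd.arg + " -> " + *state.renderTarget);
            }
        }
    }
    return drawn;
}
```

A list that binds its own render target before drawing works no matter which lists ran before it, which is exactly the "set up EVERYTHING" rule.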

Yes, I found that somewhat strange at first too. However, it worked out fairly nicely, since my render views already set most of the state anyway (or have the facilities to do so if a particular state isn't already set). For example, I was originally using a single viewport state for all of my samples, set once in the SetupEngineComponents() method. I have now moved it into the render views, where it gets set every frame so that both deferred and immediate contexts can use it.

Microsoft's design choice of resetting the state for deferred contexts seems intended to simplify the driver implementation (that's just speculation on my part, but it seems like the intent). However, it also seems to push toward using each context for larger payloads: the larger the payload, the better the extra state calls (for state that would normally already be set) are amortized. At least that is my impression, as I haven't done much testing just yet...

It would probably be worth doing a good write-up on deferred contexts to collect all the peculiarities of their use in one place...

Yes, it's strange at first, but it makes sense once you consider that you can't depend on one command list being executed before another.

I think the choice is partly about ease of use for the driver and partly about ease of use for the application as well; if you could rely on previously set state, you'd end up with these wonderful dependency graphs where certain command lists must exist and be executed before others to ensure the proper state. A nightmare.

The self-contained nature, while giving a bit of overhead for small batches, does make working with them much easier.

My current ideas for using command lists revolve around (at most) one command list per material batch, once the draw calls have been bucket-sorted by material. That might change, of course; maybe keep everything with the same shader together but split on other material details, or group by batch size or viewport. Still, it seems like a good solution, and it lets you work with the separate nature of command lists nicely.
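A sketch of the per-material-batch idea, with hypothetical DrawCall, BucketByMaterial and RecordBucket names and strings standing in for real commands. std::map is used only so that bucket order is deterministic; the comment doesn't say how buckets would be ordered or assigned to threads.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical draw record; a real engine would also carry mesh data, transforms, etc.
struct DrawCall {
    std::string material;
    std::string mesh;
};

// Bucket draw calls by material; each bucket would become (at most) one
// command list, recorded by whichever worker thread picks it up.
std::map<std::string, std::vector<DrawCall>> BucketByMaterial(const std::vector<DrawCall>& draws) {
    std::map<std::string, std::vector<DrawCall>> buckets;
    for (const auto& d : draws) buckets[d.material].push_back(d);
    return buckets;
}

// Each list must be self-contained, so the material (and any other state the
// draws need) is set once at the top of the list rather than inherited.
std::vector<std::string> RecordBucket(const std::string& material, const std::vector<DrawCall>& draws) {
    std::vector<std::string> list{"set material " + material};
    for (const auto& d : draws) list.push_back("draw " + d.mesh);
    return list;
}
```

Setting state once per bucket is what amortizes the per-list setup cost discussed above: the bigger the bucket, the smaller that fixed cost is per draw.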

At the end of the day, if you can get 8 threads (on an i7, for example) rattling through nicely batched draw calls, then the small overhead of setting up the context for each command list is unlikely to matter anyway, as you have 8x more processing power than in a single-threaded solution [grin]

That's a good point, actually: the state setup is more or less just CPU time spent creating a command list, and should incur minimal GPU cost since it is already in the proper format to feed in. Hopefully I can get a good feel for how much multithreading helps in which situations...

Well, I would hope that, for certain things, the driver can track details and spot when a command sets the same state that is already set on the GPU. I seem to recall that render target changes are not the most lightweight of operations, so if the driver can spot that the current command from the command list is binding the same object as the render target again, it should be able to avoid resending the command, which might well avoid pipeline and cache flushes.
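Whether a driver actually filters redundant state out of command lists is speculation in the comment above; the sketch below only illustrates what such filtering means, applied to a simple command stream. Cmd and FilterRedundantTargets are hypothetical names and nothing here reflects how any real driver works.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration of redundant-state filtering, not driver code.
struct Cmd {
    bool isSetTarget;  // true: bind render target `arg`; false: draw `arg`
    std::string arg;
};

// Drop any render-target bind that matches what is already bound.
std::vector<Cmd> FilterRedundantTargets(const std::vector<Cmd>& in) {
    std::vector<Cmd> out;
    std::string bound;  // empty string = nothing bound yet
    for (const auto& c : in) {
        if (c.isSetTarget) {
            if (c.arg == bound) continue;  // same target again: skip the bind
            bound = c.arg;
        }
        out.push_back(c);
    }
    return out;
}
```

Only genuine target changes survive, so any flush a bind might trigger would happen only when the target really changes.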
