[D3D12] Resource Barriers in Multiple Command Lists

9 comments, last by SoldierOfLight 8 years ago

Let's say I'm going to submit Command List 'A' and Command List 'B'. I'm guaranteed to submit 'A' before 'B'. I put 100 commands in 'A', and then a Resource Barrier to transition a resource from a pixel shader resource to a render target. The only thing I put in 'B' is a Resource Barrier to transition that same resource from a render target to a pixel shader resource.

Do I have any assurance that 'A' will transition the resource before 'B' does?


That will be OK. The command lists are executed in the order in which you put them in the array you pass to the command queue.

From MSDN:

"Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order."

https://msdn.microsoft.com/en-us/library/windows/desktop/dn899114(v=vs.85).aspx


"Do I have any assurance that 'A' will transition the resource before 'B' does?"

Yes, since command lists are executed in the order they are submitted to the command queue, and each command list is executed from start to finish in order.

Also, you might find this video interesting; it's on state tracking and resource barriers. It includes a multithreaded example, which is an alternative way to accomplish what I think you're trying to do.
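To make the ordering argument concrete, here is a minimal CPU-side sketch. It is only a model: `State`, `Transition`, `CommandList` and `executeInOrder` are hypothetical names invented for this illustration and are not D3D12 types or calls. It replays command lists in array order, the way a single queue would, and checks that each barrier's "before" state matches the state the resource is actually in.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for this sketch; these are not D3D12 types.
enum class State { PixelShaderResource, RenderTarget };

struct Transition {
    State before;
    State after;
};

using CommandList = std::vector<Transition>;

// Replays the lists in array order, as a single queue would, and checks that
// every barrier's "before" state matches the state the resource is really in.
// Returns false at the first mismatch.
bool executeInOrder(const std::vector<CommandList>& lists, State& resource) {
    for (const CommandList& list : lists) {
        for (const Transition& t : list) {
            if (t.before != resource) return false;
            resource = t.after;
        }
    }
    return true;
}
```

With 'A' holding the pixel shader resource to render target transition and 'B' holding the reverse, submitting {A, B} validates, while {B, A} fails on the first barrier. That is the situation the in-order guarantee rules out.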

-potential energy is easily made kinetic-

How about if they're in different queues?

I have Command List 'A' and Command List 'B'. I put 100 commands in 'A' and a Resource Barrier from pixel shader resource to render target. 'B' is a command list for a Compute Queue, and only has a Resource Barrier from render target to shader resource before the Compute Dispatch.

How do I now ensure that 'A' transitions the resource before 'B'?

Two things:

1. You can't actually transition out of the render target state on a compute command queue, because that may require a GPU command which is not executable by a compute queue.

2. In order to ensure ordering between the two queues, you use a fence. After you submit 'A', you submit a fence signal on the direct queue where you submitted 'A', and then you issue a fence wait on the compute queue where you will submit 'B'. You can use the same fence object each time you need to do this same sequence (an 'A' followed by a 'B'), just using a higher value for signal/wait each time.
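The signal/wait pattern in point 2 can be sketched with a small single-threaded model. This is an illustration under assumptions, not D3D12 code: `Op`, `Command` and `run` are hypothetical names made up for the sketch, and the fence is modelled as a plain counter. In real D3D12 the corresponding calls would be `ID3D12CommandQueue::Signal` on the direct queue and `ID3D12CommandQueue::Wait` on the compute queue. The model deliberately tries the compute queue first, so any "A before B" ordering in the result comes from the fence alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model types for this sketch; none of these are D3D12 types.
enum class Op { Execute, Signal, Wait };

struct Command {
    Op op;
    char name;            // label recorded when op == Execute
    std::uint64_t value;  // fence value used by Signal / Wait
};

// Steps two queues against one shared fence counter. The compute queue is
// always tried first, so 'A' only runs before 'B' when the fence forces it.
std::string run(const std::vector<Command>& direct,
                const std::vector<Command>& compute) {
    std::uint64_t fence = 0;
    std::size_t d = 0, c = 0;
    std::string order;

    // Advances one queue by a single command.
    // Returns false if the queue is empty or blocked on a Wait.
    auto step = [&](const std::vector<Command>& q, std::size_t& i) {
        if (i == q.size()) return false;
        const Command& cmd = q[i];
        if (cmd.op == Op::Wait && fence < cmd.value) return false;
        if (cmd.op == Op::Execute) order += cmd.name;
        if (cmd.op == Op::Signal) fence = cmd.value;
        ++i;
        return true;
    };

    // Keep going until neither queue can make progress.
    while (step(compute, c) || step(direct, d)) {}
    return order;
}
```

Two rounds of "execute A, signal n" on the direct queue against "wait n, execute B" on the compute queue, with n = 1 and then n = 2, reuse the same counter with a higher value each time, as described above. Without the signal/wait pair, this model runs 'B' first.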

For the sake of completeness, number 2 is a bit more complicated than that when A writes to a UAV and B reads from it (which is the more widespread way to use async compute).

First things first: a fence does not by itself guarantee that UAV writes have completed. In essence, a fence just waits for a counter to reach a certain value. That counter is set on the execution queue, while UAV writes happen elsewhere. So if A runs on the compute queue, it most likely writes to a UAV, and a fence alone cannot synchronize it.

The second thing to get out of the way is that while the dependency A->B is quite clear, the dependency B->A is much less obvious, but it is there (a WAR hazard). So you need to synchronize two things, A->B and B->A.

Starting with the easy one, B->A. If all B does is read (SRV), then a fence is enough to synchronize, since all reads happen during or before execution. When B finishes, the counter increments, and that means you're free to write to the buffer.

A->B is a bit more complicated, since you need to ensure the UAV writes have completed. This can be done explicitly by putting a UAV resource barrier on the same queue A executes on, before signaling the fence. The actual resource transition, meanwhile, happens on the queue that B executes on (due to the limitation mentioned in "1").

Remember that buffers are implicitly created with D3D12_RESOURCE_FLAG_ALLOW_SIMULTANEOUS_ACCESS, which means the hardware will not prevent access to the same buffer scheduled from two different queues. You need to ensure correctness on your own.

A UAV barrier is only required to prevent parallelism between draws/dispatches that are submitted to the same queue as part of the same "execution group" (the command list(s) submitted in a single call to ExecuteCommandLists). It does not serve any other purpose, and can be omitted completely if all operations on a UAV from those commands are really unordered with respect to each other. It has no impact on memory coherency; that is all managed by transition barriers.

A fence is sufficient to synchronize UAV access from different queues.

"A fence is sufficient to synchronize UAV access from different queues."

This contradicts what some NVIDIA engineers have been telling me. The procedure I described is not one I invented on my own; it was given to me by NVIDIA, with all the remarks.

With the current API design of D3D12, the hardware needs to flush all caches and drain the pipeline at the end of a group of command lists, because the CPU might read/write resources at that granularity. A UAV barrier cannot have any additional effect on top of this. A transition barrier, however, can affect additional state beyond these, and so they still need to be properly tracked/managed.

Feel free to have the NVIDIA engineers follow up with me offline, I'd like to understand where the misunderstandings lie.

Some reference that backs up what I'm saying (though not entirely relevant to the usage at hand):

Notes on the aliasing barrier
The aliasing barrier may set NULL for both pResourceAfter and pResourceBefore. The memory coherence definition of ExecuteCommandLists and an aliasing barrier are the same, such that two aliased accesses to the same physical memory need no aliasing barrier when the accesses are in two different ExecuteCommandLists invocations.

"With the current API design of D3D12, the hardware needs to flush all caches and drain the pipeline at the end of a group of command lists, because the CPU might read/write resources at that granularity."

Where did you find this in the documentation? I've never seen this written anywhere, and I need to know if I missed something this important.

-potential energy is easily made kinetic-
