[D3D12] Command list submission ordering

4 comments, last by ZachBethel 7 years, 10 months ago

When you submit command lists to a command queue, what ordering guarantees / expectations do you have?

According to MSDN:

GPU work submission

To execute work on the GPU, an app must explicitly submit a command list to a command queue associated with the Direct3D device. A direct command list can be submitted for execution multiple times, but the app is responsible for ensuring that the direct command list has finished executing on the GPU before submitting it again. Bundles have no concurrent-use restrictions and can be executed multiple times in multiple command lists, but bundles cannot be directly submitted to a command queue for execution.

Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order.

That last sentence is where I'm confused.

If I build N command lists and call ExecuteCommandLists(...) with an array of those N command lists, are they processed in order? That much seems to be true.

The fuzzier part for me is how transition barriers and fences play into the submission order. Say I have a Z-prepass and a shadow pass, and then some G-buffer pass. Assuming I set up all my transition barriers correctly, am I expected to submit my Z-prepass / shadow pass command lists before the G-buffer command lists?

That would basically mean I have to schedule my submission thread to wait for all the precursor work to come in from the job system before it can submit. This is what I am expecting that I have to do, but it's pretty unclear to me. It doesn't help that none of the samples online actually do a job-system based multithreaded demo :)

I would love an elaboration on how the driver actually schedules the command list work.

Thanks!

Every command in a command list is executed sequentially from first to last. The array you pass to ExecuteCommandLists is processed in order, first to last. Finally, each ExecuteCommandLists call is processed in submission order.

Yes, you are expected to submit your Z-prepass lists before your G-buffer command lists.

Look for the document Practical_DX12_Programming_Model_and_Hardware_Capabilities.pdf and read pages 6-9.

Edit: as pointed out below, I should've used the word "submitted".

-potential energy is easily made kinetic-

I wouldn't go so far as to say every command in a command list is processed in order. The GPU can go wide or pipeline work by default; what really guarantees ordering within the same ExecuteCommandLists call is the barriers, which ensure data read/write coherency.

The last sentence on MSDN is just trying to say that submitting command lists to a queue from two threads at the same time will serialize on the CPU via a mutex, and the same ordering is respected on the GPU.
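
To make the CPU-side serialization concrete, here is a minimal sketch in plain C++. It does not touch D3D12 at all: MockCommandQueue, SubmitFromTwoThreads, and IsContiguousRun are hypothetical names, and the mutex is only an assumption about how a runtime might serialize callers, following the description above. It shows two threads submitting at once, with each batch still landing in the order its thread gave.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a D3D12 command queue; it only records names.
class MockCommandQueue {
public:
    // Mirrors the shape of ExecuteCommandLists: one call, one ordered batch.
    void ExecuteCommandLists(const std::vector<std::string>& lists) {
        std::lock_guard<std::mutex> lock(mutex_);  // serialize concurrent callers
        for (const std::string& name : lists) {
            submitted_.push_back(name);  // array order is preserved
        }
    }

    // Returns a copy of everything submitted so far, in submission order.
    std::vector<std::string> Submitted() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return submitted_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> submitted_;
};

// Two threads each submit one batch; returns the recorded order.
std::vector<std::string> SubmitFromTwoThreads() {
    MockCommandQueue queue;
    std::thread a([&] { queue.ExecuteCommandLists({"a0", "a1", "a2"}); });
    std::thread b([&] { queue.ExecuteCommandLists({"b0", "b1", "b2"}); });
    a.join();
    b.join();
    return queue.Submitted();
}

// True if `batch` appears as one contiguous, in-order run inside `all`.
bool IsContiguousRun(const std::vector<std::string>& all,
                     const std::vector<std::string>& batch) {
    if (batch.empty()) return true;
    for (std::size_t start = 0; start + batch.size() <= all.size(); ++start) {
        bool match = true;
        for (std::size_t i = 0; i < batch.size(); ++i) {
            if (all[start + i] != batch[i]) { match = false; break; }
        }
        if (match) return true;
    }
    return false;
}
```

Which of the two batches lands first depends on thread timing. If one pass has to be on the queue before another, something on the CPU side has to order the two calls.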

I used the wrong word... I used "executed" instead of "submitted". Barriers prevent overlapping execution, but submission still occurs in order; it is command completion that can vary without barriers. But again, submission order is the order given by the programmer, correct?

-potential energy is easily made kinetic-

Yep, that's correct. The command processor should process the commands in order, but the rest of the GPU hardware is largely unsynchronized without barriers to enforce ordering.

Great, that's what I expected. I feel like a lot of documents assume you know that and gloss over it.

On that note, when building a task graph, it seems wise to statically bake out your high-level render passes on your queue submission thread, batch up all the command lists in those passes (e.g. wait until all your Z-prepass lists come in), and then submit in dependency-sorted order (wait to submit the G-buffer group until the Z-prepass group has been submitted).
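
A rough sketch of that batching idea, in plain C++ with no D3D12 calls. Pass and PassSubmitter are hypothetical names, the pass list is assumed to be already sorted by dependency, and each pass is assumed to receive exactly the number of lists it declares. Where the comment says so, real code would call ExecuteCommandLists; here the helper only logs what it would have submitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one render pass in the frame.
struct Pass {
    std::string name;
    std::size_t expectedLists;          // how many command lists jobs will deliver
    std::vector<std::string> received;  // lists delivered so far
};

// Hypothetical submission-side helper. `passes` must already be in
// dependency-sorted order (e.g. Z-prepass, shadow, G-buffer).
class PassSubmitter {
public:
    explicit PassSubmitter(std::vector<Pass> passes)
        : passes_(std::move(passes)) {}

    // Called when a job finishes recording a list for pass `passIndex`.
    void Deliver(std::size_t passIndex, const std::string& listName) {
        passes_.at(passIndex).received.push_back(listName);
        Flush();
    }

    // Everything "submitted" so far, in submission order.
    const std::vector<std::string>& SubmissionLog() const { return log_; }

private:
    // Submit every leading pass whose lists have all arrived, then stop at
    // the first incomplete pass so later passes cannot jump ahead of it.
    void Flush() {
        while (next_ < passes_.size() &&
               passes_[next_].received.size() == passes_[next_].expectedLists) {
            // Real code would call ExecuteCommandLists here with this pass's lists.
            for (const std::string& list : passes_[next_].received) {
                log_.push_back(list);
            }
            ++next_;
        }
    }

    std::vector<Pass> passes_;
    std::size_t next_ = 0;          // index of the first pass not yet submitted
    std::vector<std::string> log_;
};
```

With a two-list Z-prepass at index 0 and a one-list G-buffer pass at index 1, delivering the G-buffer list first leaves the log empty. Nothing is submitted until both Z-prepass lists arrive, and then both passes go out in order. Within a pass this sketch keeps delivery order; sort the lists there if the order inside a pass matters to you.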
