When you submit command lists to a command queue, what ordering guarantees / expectations do you have?
According to MSDN:
GPU work submission
To execute work on the GPU, an app must explicitly submit a command list to a command queue associated with the Direct3D device. A direct command list can be submitted for execution multiple times, but the app is responsible for ensuring that the direct command list has finished executing on the GPU before submitting it again. Bundles have no concurrent-use restrictions and can be executed multiple times in multiple command lists, but bundles cannot be directly submitted to a command queue for execution.
Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order.
That last sentence is where I'm confused.
Is it the case that if I build N command lists and call ExecuteCommandLists(...) with an array of those N command lists, that they are processed in order? That much seems to be true.
The fuzzier part for me is how transition barriers and fences play into the submission order. Say I have a Z-Prepass and a shadow pass, and then some G-Buffer pass. Assuming I transition barrier everything correctly, am I expected to submit my Z-Prepass / Shadow pass command lists before the g-buffer command lists?
That would basically mean I have to schedule my submission thread to wait for all the precursor work to come in from the job system before it can submit. This is what I am expecting that I have to do, but it's pretty unclear to me. It doesn't help that none of the samples online actually do a job-system based multithreaded demo :)
I would love an elaboration on how the driver actually schedules the command list work.
Thanks!