DirectX 12 command queues

6 comments, last by Obliique 5 years, 4 months ago

Hi, I'm currently going through Microsoft's online documentation and came across something I'm not sure I've grasped, particularly concerning command queues. At one point the documentation says that more than one command queue can write to the same resource simultaneously if the appropriate flag is set on the resource.

My question is: when submitting work to command queues, is it a requirement that these queues belong to one GPU adapter, in cases where I define two of them, that is? If yes, does the GPU process both queues in parallel? My other question is: does a GPU have to finish processing commands from a compute queue before processing commands from a graphics queue? I understand that a queue stores commands submitted by the application and that the GPU executes them in first-in, first-out order.


I talk about Vulkan here, but I assume it's the same for DX12...

2 hours ago, John321 said:

Does the GPU process both queues in parallel?

Yes. I have used 3 queues to test async compute, and all 3 of them process in parallel.

I've tested this only with compute shaders (no rendering) and with AMD GPU.

It seems that while one queue is stalled on a memory barrier, the others keep working as expected. Also, if the workloads are small, this is a good way to keep the GPU saturated. The downside is the need for expensive synchronization across queues, and for splitting the work into multiple command lists so that all queues can be fed. Both have a noticeable cost.

Keep in mind the case of 'automatic async compute', which happens even with a single queue if you have multiple dispatches with no barriers (and so no dependencies) between them. This is the preferred way where possible (but it also gives confusing profiling results!).

Be sure to check performance differences across queues! (On AMD the graphics queue has the best compute performance; the compute queues are probably intended for async work, and for me they were two times slower. Not sure if that's similar in DX too.)

 

2 hours ago, John321 said:

Does a GPU have to finish processing commands from a compute queue before processing commands from a graphics queue?

No, but it depends on the hardware. I think Intel recommends using only one queue, and NV has traditionally been limited with async compute too, but I don't know the details. I'd be curious how RTX cards behave here. (If anyone knows...?)

In any case you have to do manual sync across the queues to handle dependencies.
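To make the "manual sync" point concrete, here is a minimal CPU-side sketch of how a fence orders work across two queues. It is only a model: `Fence`, `Command`, `Queue`, `run` and `example` are hypothetical names, not D3D12 or Vulkan types, and the round-robin scheduler is an assumption that says nothing about how a real GPU interleaves queues. In D3D12 the corresponding calls would be `ID3D12CommandQueue::Signal` and `ID3D12CommandQueue::Wait`; in Vulkan you would use semaphores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model types; not part of any graphics API.
struct Fence { std::uint64_t value = 0; };

struct Command {
    enum Kind { Work, Signal, Wait } kind;
    std::string name;      // label, used by Work commands only
    Fence* fence;          // used by Signal / Wait only
    std::uint64_t target;  // fence value to signal or to wait for
};

struct Queue {
    std::vector<Command> commands;  // FIFO order within one queue
    std::size_t next = 0;
    bool done() const { return next == commands.size(); }
};

// Round-robin over the queues, one command per queue per pass.
// A queue whose next command is an unsatisfied Wait is skipped for that
// pass, while the other queues keep making progress.
std::vector<std::string> run(std::vector<Queue>& queues) {
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (Queue& q : queues) {
            if (q.done()) continue;
            const Command& c = q.commands[q.next];
            if (c.kind != Command::Work) assert(c.fence != nullptr);
            if (c.kind == Command::Wait && c.fence->value < c.target) continue;
            if (c.kind == Command::Signal) c.fence->value = c.target;
            if (c.kind == Command::Work) order.push_back(c.name);
            ++q.next;
            progressed = true;
        }
    }
    return order;
}

// The graphics queue must not draw the particles before the compute
// queue has simulated them, so it waits on the fence value 1.
std::vector<std::string> example() {
    Fence fence;
    Queue graphics, compute;
    graphics.commands = {
        {Command::Work, "graphics: shadow pass", nullptr, 0},
        {Command::Wait, "", &fence, 1},
        {Command::Work, "graphics: draw particles", nullptr, 0},
    };
    compute.commands = {
        {Command::Work, "compute: simulate", nullptr, 0},
        {Command::Signal, "", &fence, 1},
        {Command::Work, "compute: next frame", nullptr, 0},
    };
    std::vector<Queue> queues = {graphics, compute};
    return run(queues);
}
```

In this model the graphics queue stalls at its wait for one pass while the compute queue keeps going, and "graphics: draw particles" only runs after "compute: simulate" has been signalled as complete.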

Thanks for the reply. This is very helpful. I've only been working with a single command queue, and I'm using Intel integrated graphics. I'm interested in how profiling can be done on multiple queues, if it's not too much to ask. Should I measure the time based on when a fence point is reached on the command queue, or are there better ways to profile when a GPU finishes processing a set of commands?

1 hour ago, John321 said:

I'm interested in how profiling can be done on multiple queues

I use timestamps before and after each dispatch, and it was not necessary to change anything here when using multiple queues (using VK not DX).

You can use a really large number of timestamps without affecting performance, so you can measure at a much finer granularity than the level of fences without worry. (Of course, make it all optional with #ifdef.)

Thanks JoeJ, I will definitely look into timestamps.
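For reference, a small sketch of the arithmetic that turns a pair of GPU timestamps into milliseconds. The function names are made up for illustration. The D3D12-style variant assumes a frequency in ticks per second, which is what `ID3D12CommandQueue::GetTimestampFrequency` reports; the Vulkan-style variant assumes a period in nanoseconds per tick, as in `VkPhysicalDeviceLimits::timestampPeriod`. Reading the timestamps back from a query heap or query pool is left out.

```cpp
#include <cassert>
#include <cstdint>

// D3D12-style conversion: frequencyHz is in ticks per second.
double ticksToMsFromFrequency(std::uint64_t begin, std::uint64_t end,
                              std::uint64_t frequencyHz) {
    assert(frequencyHz != 0 && end >= begin);
    return static_cast<double>(end - begin) * 1000.0 /
           static_cast<double>(frequencyHz);
}

// Vulkan-style conversion: periodNs is in nanoseconds per tick.
double ticksToMsFromPeriod(std::uint64_t begin, std::uint64_t end,
                           float periodNs) {
    assert(end >= begin);
    return static_cast<double>(end - begin) *
           static_cast<double>(periodNs) * 1e-6;
}
```

With one timestamp written before a dispatch and one after it, the difference passed through either function gives the time spent on that dispatch.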

D3D12 is not the same as Vulkan when it comes to queues. In Vulkan you can query how many queues (and of which type) are supported by the device, and then you bind to those queues. The idea is that if the actual hardware supports N extra compute queues, then they'll be exposed in Vulkan and you can submit to any of those to have the work execute concurrently. In D3D12 you can create as many queues as you want regardless of what the hardware supports. The queues are "virtualized" in D3D12, which means that the OS/scheduler can do things like "flattening" multiple submissions into a single hardware queue (this is possible because the queue types in D3D12 are subsets of each other's functionality). I talked about this a bit in this article (scroll down to the section called "The Present: Windows 10, D3D12, and WDDM 2.0"). If you want to know for sure that your command lists are executing in parallel, you'll need to use a tool like GPUView or AMD's Radeon GPU Profiler.
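As a rough illustration of the Vulkan side of that difference, the sketch below picks a compute-capable queue family that has no graphics capability from a list of families. `QueueFamily` is a hypothetical stand-in for the `queueFlags` and `queueCount` fields of `VkQueueFamilyProperties` (as filled in by `vkGetPhysicalDeviceQueueFamilyProperties`), and the two constants mirror the values of `VK_QUEUE_GRAPHICS_BIT` and `VK_QUEUE_COMPUTE_BIT`, so the code builds without the Vulkan headers. There is no D3D12 equivalent of this step, since you simply create the queues you want.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same values as VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT.
constexpr std::uint32_t kGraphicsBit = 0x1;
constexpr std::uint32_t kComputeBit  = 0x2;

// Hypothetical stand-in for the relevant VkQueueFamilyProperties fields.
struct QueueFamily {
    std::uint32_t flags;
    std::uint32_t queueCount;
};

// Returns the index of the first family that supports compute but not
// graphics, or -1 if the device exposes no such family.
int findComputeOnlyFamily(const std::vector<QueueFamily>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const QueueFamily& f = families[i];
        const bool compute  = (f.flags & kComputeBit) != 0;
        const bool graphics = (f.flags & kGraphicsBit) != 0;
        if (compute && !graphics && f.queueCount > 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

If the function returns -1, the device only offers compute on its graphics-capable family, and any "async" compute would have to share that family's queues.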
 

Thanks for taking the time to put up such a great article, MJP. I've already read a good chunk of it. :-)

