John321

DX12 DirectX 12 command queues


Hi, I'm currently going through Microsoft's online documentation and I came across information that I'm not sure I have a grasp on, particularly concerning command queues. At some point the documentation says that command queues can write to the same resource simultaneously if the appropriate flag is set on the resource.

My question is: when submitting work to command queues, is it a requirement that these command queues represent one GPU adapter, in cases where I define two of them? If yes, does the GPU process both queues in parallel? My other question: does a GPU have to finish processing commands from a compute queue before processing commands from a graphics queue? I understand that a queue stores commands submitted by the application and that the GPU executes them in first-in, first-out order.


I talk about Vulkan here, but I assume it's the same for DX12...

2 hours ago, John321 said:

Does the gpu process both queues in parallel?

Yes. I have used 3 queues to test async compute, and all 3 of them process in parallel.

I've tested this only with compute shaders (no rendering) and on an AMD GPU.

It seems that while one queue is stalled processing a memory barrier, the others keep working as expected. Also, if the workloads are small, this is a good way to keep the GPU saturated. The downside is the need for expensive sync across queues, and for splitting the work into multiple command lists so all queues can be fed. Both have a noticeable cost.

Keep in mind the case of 'automatic async compute', which happens even with a single queue if you have multiple dispatches with no barriers (i.e. no dependencies) in between. This is the preferred way, if possible (but it also gives confusing profiling results!).

Be sure to check performance differences across queues! (On AMD the graphics queue has the best compute performance; the compute queues are probably intended for async work, and for me they were two times slower. Not sure if that's similar in DX too.)

 

2 hours ago, John321 said:

Does a GPU have to finish processing commands from a compute queue before processing commands from a graphics queue?

No, but it depends on the hardware. I think Intel recommends using only one queue, and NV is traditionally limited with async compute too, but I don't know the details. I'd be curious how RTX cards behave here. (If anyone knows...?)

In any case you have to sync manually across the queues to handle dependencies.
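To make the "manual sync" point concrete, here is a toy CPU-side model of two queues and a fence. It is only a sketch and not real D3D12 or Vulkan code: `Fence`, `Queue`, `Op`, `run` and `demo` are hypothetical names invented for illustration. In D3D12 the corresponding mechanism would be `ID3D12CommandQueue::Signal` and `ID3D12CommandQueue::Wait` on an `ID3D12Fence`; in Vulkan, semaphores between queue submissions. The model only shows the ordering guarantee: each queue is FIFO on its own, and a wait stalls one queue while the other keeps going.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy model only: every type and function here is hypothetical.
struct Fence { std::uint64_t value = 0; };

struct Op {
    enum Kind { Execute, Signal, Wait } kind;
    std::string label;    // used by Execute
    Fence* fence;         // used by Signal / Wait
    std::uint64_t value;  // used by Signal / Wait
};

struct Queue { std::deque<Op> ops; };  // FIFO within one queue

Op execute(std::string label) { return Op{Op::Execute, std::move(label), nullptr, 0}; }
Op signal(Fence& f, std::uint64_t v) { return Op{Op::Signal, "", &f, v}; }
Op wait(Fence& f, std::uint64_t v) { return Op{Op::Wait, "", &f, v}; }

// Round-robin "GPU": each pass tries to advance every queue by one op.
// A queue whose front op waits on an unsignaled fence is skipped (stalled)
// while the other queues keep making progress.
std::vector<std::string> run(const std::vector<Queue*>& queues) {
    std::vector<std::string> log;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (Queue* q : queues) {
            if (q->ops.empty()) continue;
            Op& op = q->ops.front();
            if (op.kind == Op::Wait && op.fence->value < op.value) continue;
            if (op.kind == Op::Execute) log.push_back(op.label);
            if (op.kind == Op::Signal) op.fence->value = op.value;
            q->ops.pop_front();
            progressed = true;
        }
    }
    return log;
}

// Graphics does independent work, then waits for the compute result.
std::vector<std::string> demo() {
    Fence fence;
    Queue graphics, compute;
    graphics.ops = {execute("graphics: independent pass"), wait(fence, 1),
                    execute("graphics: consume data")};
    compute.ops = {execute("compute: build data"), signal(fence, 1)};
    return run({&graphics, &compute});
}
```

Calling `demo()` yields a log in which "graphics: consume data" always comes after "compute: build data", even though the graphics queue is visited first on every pass. Without the wait/signal pair, nothing in the model (or on real hardware) would enforce that order.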

Edited by JoeJ


Thanks for the reply. This is very helpful. I've only been working with a single command queue, and I'm using Intel integrated graphics. If it's not too much to ask, I'm interested in how profiling can be done on multiple queues. Should I measure the time based on when a fence point is reached on the command queue, or are there better ways to tell when the GPU finishes processing a set of commands?

1 hour ago, John321 said:

I'm interested in how profiling can be done on multiple queues

I use timestamps before and after each dispatch, and it was not necessary to change anything here when using multiple queues (using VK, not DX).

You can use a really large number of timestamps and they still do not affect performance, so you can measure at a much finer grain than just at the level of fences without worries. (Of course, make it all optional with #ifdef.)
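Timestamps come back as raw tick counts, so they have to be converted before they can be compared. Below is a minimal sketch of that conversion, assuming the begin/end values have already been read back from a query. The function names are hypothetical. In D3D12 the tick rate would come from `ID3D12CommandQueue::GetTimestampFrequency` (queried per queue); in Vulkan the equivalent is `VkPhysicalDeviceLimits::timestampPeriod`, expressed as nanoseconds per tick.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a pair of raw GPU timestamps to milliseconds
// when the API reports ticks per second (the D3D12-style convention).
double elapsed_ms(std::uint64_t begin_ticks, std::uint64_t end_ticks,
                  std::uint64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    assert(end_ticks >= begin_ticks);
    return static_cast<double>(end_ticks - begin_ticks) * 1000.0 /
           static_cast<double>(ticks_per_second);
}

// Hypothetical helper: same conversion when the API reports
// nanoseconds per tick (the Vulkan-style convention).
double elapsed_ms_from_period(std::uint64_t begin_ticks, std::uint64_t end_ticks,
                              double ns_per_tick) {
    assert(end_ticks >= begin_ticks);
    return static_cast<double>(end_ticks - begin_ticks) * ns_per_tick / 1.0e6;
}
```

For example, `elapsed_ms(1000, 26000, 25000000)` gives 1.0, i.e. 25,000 ticks at 25 MHz is one millisecond. With several queues, the overlap between two dispatches can be estimated by comparing their begin/end intervals after conversion, which is more informative than only looking at when a fence was reached.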


D3D12 is not the same as Vulkan when it comes to queues. In Vulkan you can query how many queues (and of which type) are supported by the device, and then you bind to those queues. The idea is that if the actual hardware supports N extra compute queues, they'll be exposed in Vulkan and you can submit to any of them to have the work execute concurrently. In D3D12 you can create as many queues as you want, regardless of what the hardware supports. The queues are "virtualized" in D3D12, which means that the OS/scheduler can do things like "flattening" multiple submissions into a single hardware queue (this is possible because the queue types in D3D12 are subsets of each other's functionality). I talked about this a bit in this article (scroll down to the section called "The Present: Windows 10, D3D12, and WDDM 2.0"). If you want to know for sure that your command lists are executing in parallel, you'll need to use a tool like GPUView or AMD's Radeon GPU Profiler.
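As an illustration of the Vulkan side of that difference, here is a rough sketch of picking a compute-only queue family from the list the device reports. To stay self-contained it does not include the Vulkan headers: `QueueFamily` and `find_async_compute_family` are hypothetical, and the two constants simply mirror the values of `VK_QUEUE_GRAPHICS_BIT` and `VK_QUEUE_COMPUTE_BIT`. Real code would fill the list from `vkGetPhysicalDeviceQueueFamilyProperties`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Mirror the values of VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT.
constexpr std::uint32_t kGraphicsBit = 0x1;
constexpr std::uint32_t kComputeBit = 0x2;

// Hypothetical stand-in for the fields of VkQueueFamilyProperties used here.
struct QueueFamily {
    std::uint32_t flags;        // capability bits of the family
    std::uint32_t queue_count;  // how many queues the family exposes
};

// Returns the index of the first family that supports compute but not
// graphics, i.e. a candidate for async compute. Returns std::nullopt if the
// device exposes no such family, in which case only the graphics family
// can run compute work.
std::optional<std::size_t> find_async_compute_family(
    const std::vector<QueueFamily>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool compute = (families[i].flags & kComputeBit) != 0;
        const bool graphics = (families[i].flags & kGraphicsBit) != 0;
        if (compute && !graphics && families[i].queue_count > 0) return i;
    }
    return std::nullopt;
}
```

D3D12 has no equivalent enumeration step: the application just asks for a queue of a given type, and whether it maps to separate hardware is decided below the API, which is why a profiler is needed to confirm real overlap.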
 


Thanks for taking the time to put up such a great article, MJP. I've already read a good chunk of it.  :-)

