[d3d12] command queues vs hardware queues (ACEs)

This topic is 823 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi there,

Do command queues (https://msdn.microsoft.com/en-us/library/windows/desktop/dn788627(v=vs.85).aspx) correspond directly to hardware queues, aka ACEs on GCN?

I.e. should I create the same number of compute queues as there are ACEs on the GPU?

I suppose there should be only one graphics queue, as the hardware (GCN) can only use one.

Is it the same with DMA copy engines (i.e. the same number of copy queues as engines)?

Or should there be one command queue per async submission thread (i.e. one graphics/compute/copy queue per thread)?

AFAIK it is advised to use one command allocator, one command list and one fence per thread. Is this true?

Best regards,

Yours3!f

As far as I've seen, it's better to create only the copy and compute queues your render logic really needs (from a data-oriented point of view). Of course you need to profile your implementation, preferably on hardware from different IHVs.

If you are asking how many compute queues to create to run compute work concurrently with graphics, the best number is probably one (with background priority).

Note also that you cannot query any information about the adapter's engine configuration or about how work queues map onto hardware engines. Moreover, nothing guarantees that the number of "hardware engines" on a particular graphics architecture stays the same across devices of different performance/cost tiers.

EDIT: since background/low priority queues are not available in the current version of D3D12, just assign a high priority to the graphics queue.

Edited by Alessio1989

Thank you :)

Seems like one should suffice for now... MSDN vs. the graphics samples is confusing: in the MSDN example code they have as many queues as threads, but in the samples they have only one. They populate command lists on separate threads and submit them on the main graphics thread after syncing.
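For reference, here is a minimal sketch of the pattern the samples use: record on several threads, submit once on one queue. It is plain portable C++, not real D3D12 code. `CommandList`, `CommandQueue` and `record_and_submit` are hypothetical stand-ins I made up for illustration; in a real renderer the recording would go through one command allocator and command list per thread, and the submission would be a single `ID3D12CommandQueue::ExecuteCommandLists` call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list (not a D3D12 type).
struct CommandList {
    std::vector<std::string> commands;
};

// Hypothetical stand-in for ONE command queue (not a D3D12 type).
struct CommandQueue {
    std::vector<std::string> executed;

    // Models a single submission: lists are consumed in the order given.
    void execute(const std::vector<CommandList>& lists) {
        for (const CommandList& list : lists) {
            executed.insert(executed.end(), list.commands.begin(), list.commands.end());
        }
    }
};

// Records on `worker_count` threads, then submits everything on the calling
// thread through a single queue. Returns the order the queue "executed" in.
inline std::vector<std::string> record_and_submit(std::size_t worker_count,
                                                  std::size_t draws_per_worker) {
    // One list per worker: each thread only touches its own slot, so the
    // recording phase itself needs no locking.
    std::vector<CommandList> lists(worker_count);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&lists, w, draws_per_worker] {
            for (std::size_t d = 0; d < draws_per_worker; ++d) {
                lists[w].commands.push_back("w" + std::to_string(w) + ":draw" + std::to_string(d));
            }
        });
    }

    // The "syncing" step: wait until every worker has finished recording.
    for (std::thread& t : workers) {
        t.join();
    }

    // One queue regardless of how many CPU threads recorded.
    CommandQueue queue;
    queue.execute(lists);
    assert(queue.executed.size() == worker_count * draws_per_worker);
    return queue.executed;
}
```

Calling `record_and_submit(3, 2)` gives six commands, grouped per worker in submission order. The point is only that the number of recording threads and the number of queues are independent choices.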

There is only one graphics queue per adapter node, but as far as I remember the same restriction does not apply to compute and especially copy queues. There is no restriction on the number of threads submitting command lists to a single queue; of course, you need some kind of synchronization between the threads.

Yeah, I know that. I guess I'll have to measure whether multiple command queues get me any additional perf.

I have not tested this, but I'd guess two copy queues with different priorities could be a good example of where more than one queue is useful: a higher-priority queue for things you need to load immediately before presentation, and a "normal"-priority queue for background copy operations.

Edited by Alessio1989

Threads on the CPU are used to gain access to multi-core and hyperthreading resources.

Queues on the GPU are used to gain access to multi-GPU and async-compute ("GPU shader hyperthreading") resources.

Don't make one queue per CPU thread just to make your life easier. Make them only where you explicitly intend to create GPU-side command concurrency, e.g. computing while rasterizing, or copying while computing.

Re: two copy queues with different priorities: yeah, of course that makes sense :)

So what do you advise if I want to, say, do compute work while rendering shadow maps (i.e. depth-only passes)?

One graphics queue + one compute queue?

Yes, plus all of the necessary events/fences to synchronize the resources that are shared between the two queues (just like you would for code that was split across two threads on a CPU).
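To make the fence part concrete, here is a rough single-threaded model of two queues synchronized by one fence. It is a sketch under assumptions, not D3D12 code: `Fence`, `Op`, `Queue`, `run` and `shadow_and_compute_frame` are hypothetical names, and the round-robin stepping merely imitates two engines advancing side by side. In real D3D12 the corresponding calls would be `ID3D12CommandQueue::Signal` on the graphics queue and `ID3D12CommandQueue::Wait` on the compute queue, both against the same `ID3D12Fence`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a fence: a counter that only moves forward.
struct Fence {
    std::uint64_t value = 0;
};

struct Op {
    enum Kind { Execute, Signal, Wait } kind;
    std::string name;      // used by Execute
    Fence* fence;          // used by Signal / Wait
    std::uint64_t target;  // used by Signal / Wait
};

inline Op make_execute(std::string name) { return {Op::Execute, std::move(name), nullptr, 0}; }
inline Op make_signal(Fence& f, std::uint64_t v) { return {Op::Signal, "", &f, v}; }
inline Op make_wait(Fence& f, std::uint64_t v) { return {Op::Wait, "", &f, v}; }

// Hypothetical model of a queue: ops run strictly in order within one queue.
struct Queue {
    std::vector<Op> ops;
    std::size_t next = 0;
    bool done() const { return next == ops.size(); }
};

// Steps all queues round-robin, one op per queue per round. A queue whose
// next op is a Wait on an unreached fence value stalls; the others keep going.
// Stops when no queue can make progress (all done, or a wait never satisfied).
inline std::vector<std::string> run(std::vector<Queue>& queues) {
    std::vector<std::string> log;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (Queue& q : queues) {
            if (q.done()) continue;
            const Op& op = q.ops[q.next];
            if (op.kind != Op::Execute) assert(op.fence != nullptr);
            if (op.kind == Op::Wait && op.fence->value < op.target) continue;  // stalled
            if (op.kind == Op::Execute) log.push_back(op.name);
            if (op.kind == Op::Signal) op.fence->value = op.target;
            ++q.next;
            progressed = true;
        }
    }
    return log;
}

// One frame: shadow depth pass on graphics, overlapping compute work, and a
// compute job that must not start until the shadow map has been written.
inline std::vector<std::string> shadow_and_compute_frame() {
    Fence shadows_done;
    Queue graphics;
    graphics.ops.push_back(make_execute("gfx:shadow_depth_pass"));
    graphics.ops.push_back(make_signal(shadows_done, 1));
    graphics.ops.push_back(make_execute("gfx:main_pass"));

    Queue compute;
    compute.ops.push_back(make_execute("cmp:independent_work"));
    compute.ops.push_back(make_wait(shadows_done, 1));
    compute.ops.push_back(make_execute("cmp:reads_shadow_map"));

    std::vector<Queue> queues;
    queues.push_back(compute);
    queues.push_back(graphics);
    return run(queues);
}
```

In this model the independent compute work is logged alongside the shadow pass, while `cmp:reads_shadow_map` only appears after the graphics queue has signalled the fence. Only the work that actually touches the shared resource is placed behind the wait, so the rest stays free to overlap.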

Alright, thank you! :)
