[d3d12] command queues vs hardware queues (ACEs)

Started by
9 comments, last by Yours3!f 8 years, 7 months ago

hi there,

do command queues (https://msdn.microsoft.com/en-us/library/windows/desktop/dn788627(v=vs.85).aspx) correspond directly to hardware queues, aka ACEs on GCN?

i.e. should I create the same number of compute queues as there are ACEs on the GPU?

I suppose there should be only one graphics queue, as the hardware (GCN) can only use one.

Is it the same with DMA copy engines (i.e. the same number of copy queues as engines)?

or should there be one command queue per async submission thread (i.e. one graphics/compute/copy queue per thread)?

afaik it is advised to use one command allocator, one command list and one fence per thread. Is this true?

best regards,

Yours3!f

As far as I have seen, it is better to create only the copy and compute queues your render logic really needs (from a data-oriented point of view). Of course you need to profile your implementation, ideally on hardware from different IHVs.

If you are asking how many compute queues to create for running compute work concurrently with graphics, the best number is probably one (with background priority).

Note also that you cannot query the adapter's engine configuration, nor how the hardware engines are implemented or how work queues map onto them. Moreover, nothing guarantees that the number of "hardware engines" stays the same across devices of one graphics architecture at different performance/cost tiers.

EDIT: since background/low-priority queues are not available in the current version of D3D12, just assign a high priority to the graphics queue.

"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

thank you :)

seems like one should suffice for now... MSDN vs. the graphics samples is confusing: the MSDN example code has as many queues as threads, but the samples have only one. They populate command lists on separate threads and submit on the main graphics thread after syncing.
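
For reference, here is a rough sketch of the pattern the samples seem to use: record on several threads, sync, then submit once from the main thread. It is a portable C++ model, not D3D12 code. `CommandList`, `Queue` and `record_in_parallel` are made-up names for illustration. In a real app each worker would own its own `ID3D12CommandAllocator` and command list, and the single submit would be `ID3D12CommandQueue::ExecuteCommandLists`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list; not a D3D12 type.
struct CommandList {
    std::vector<std::string> commands;
};

// Hypothetical stand-in for the single graphics queue.
struct Queue {
    std::vector<std::string> executed;

    // Plays the role of ExecuteCommandLists: the lists run in the
    // order in which they are passed in.
    void execute(const std::vector<CommandList>& lists) {
        for (const CommandList& list : lists)
            for (const std::string& cmd : list.commands)
                executed.push_back(cmd);
    }
};

// Each worker records into its own list, so recording needs no locks.
std::vector<CommandList> record_in_parallel(std::size_t worker_count,
                                            std::size_t draws_per_worker) {
    std::vector<CommandList> lists(worker_count);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&lists, w, draws_per_worker] {
            for (std::size_t d = 0; d < draws_per_worker; ++d)
                lists[w].commands.push_back(
                    "draw " + std::to_string(w) + "." + std::to_string(d));
        });
    }
    // The sync point: wait until every worker has finished recording.
    for (std::thread& t : workers) t.join();
    for (const CommandList& list : lists)
        assert(list.commands.size() == draws_per_worker);
    return lists;
}
```

Calling `queue.execute(record_in_parallel(3, 2))` on a fresh `Queue` leaves six entries in `queue.executed`, ordered by worker index, so the threads only buy CPU-side recording parallelism while the GPU still sees one ordered stream.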

There is only one graphics queue per adapter node, but as far as I remember the same restriction does not apply to compute and especially copy queues. There is no restriction on the number of threads submitting command lists to a single queue; of course, you need some kind of synchronization between the threads.

yeah I know that, I guess I'll have to measure whether multiple command queues get me additional perf or not.

I did not test this, but I would guess that two copy queues with different priorities are a good example of where more than one queue is useful: a higher-priority queue for things you need to load immediately before presentation, and a "normal"-priority queue for background copy operations.

Threads on the CPU are used to gain access to multi-core and hyperthreading resources.

Queues on the GPU are used to gain access to multi-GPU and async-compute ("GPU shader hyperthreading") resources.

Don't make one queue per CPU thread just to make your life easier. Make them only where you explicitly intend to create GPU-side command concurrency. e.g. computing while rasterizing, or copying while computing.

yeah of course that makes sense :)

so what do you advise if I want to, say, do compute stuff while rendering shadow maps (i.e. depth-only passes)?

one graphics + one compute queue?


Yes, and then all of the necessary events/fences to synchronize the resources that are being shared between the two queues (just like you would for code that was split across two threads on a CPU).
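
To make the CPU-thread analogy concrete, here is a minimal sketch of one graphics "queue" and one compute "queue" sharing a resource through a fence. It is a portable C++ model under my own assumptions, not D3D12 code: `Fence` and `run_frame` are hypothetical names, the two queues are plain `std::thread`s, and the values are placeholders. In D3D12 the matching calls would be `ID3D12CommandQueue::Signal` on the producing queue and `ID3D12CommandQueue::Wait` on the consuming queue, both against the same `ID3D12Fence`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical CPU-side model of a fence: a counter that only increases.
class Fence {
public:
    // Plays the role of Signal(fence, value) on the producing queue.
    void signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > value_) value_ = value;
        }
        changed_.notify_all();
    }

    // Plays the role of Wait(fence, value) on the consuming queue.
    void wait(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [&] { return value_ >= value; });
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::uint64_t value_ = 0;
};

// Returns what the "graphics queue" computed from the shared resource.
int run_frame() {
    Fence fence;
    int shared_resource = 0;  // written by compute, read by graphics
    int shadow_map = 0;       // touched only by graphics
    int lit_result = 0;

    std::thread compute_queue([&] {
        shared_resource = 42; // placeholder for a compute pass result
        fence.signal(1);      // publish: the resource is ready
    });
    std::thread graphics_queue([&] {
        shadow_map = 1;       // depth-only work may overlap with compute
        fence.wait(1);        // block only where the dependency starts
        assert(shared_resource == 42); // the wait makes the write visible
        lit_result = shared_resource + shadow_map;
    });

    compute_queue.join();
    graphics_queue.join();
    return lit_result;
}
```

The point of the design is that the wait sits as late as possible: the shadow-map work does not touch the shared resource, so it can run before the fence is reached, and only the pass that consumes the compute output is held back.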
