[D3D12] Multi-threading: Command Queue, Allocator, List

3 comments, last by Alessio1989 8 years, 8 months ago

According to the documentation, "a given allocator can be associated with no more than one currently recording command list at a time." So realistically, in a multi-threaded rendering setup, each thread would want its own Allocator, and would then record one or more lists (sequentially) using that allocator.

Is that right?

And multiple Queues might be for when you are rendering to an off-screen texture, which you then use to render to your main-thread Queue?

Does that all sound right?


In this presentation: D3D12-A-new-meaning-for-efficiency-and-performance.ppsx

Look at slide 10: it says to reuse lists/allocators with similar data.

Basically, allocators don't release their memory when reset, so each one keeps hold of memory the size of the largest allocation ever associated with it. I guess that might become an issue.
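A toy model of that retention behaviour, assuming the allocator's backing memory acts like a bump arena (the real driver's growth policy isn't documented, so treat this as an illustration rather than how any particular driver works). ModelArena and capacity_after are made-up names, not D3D12 API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a command allocator's backing memory as a bump arena.
// reset() rewinds the write position but keeps everything already reserved,
// so capacity only grows, up to the largest frame ever recorded into it.
class ModelArena {
public:
    // Reserve `bytes` for recorded commands; returns the offset of the reservation.
    std::size_t allocate(std::size_t bytes) {
        assert(bytes <= SIZE_MAX - used_ && "reservation would overflow");
        std::size_t offset = used_;
        used_ += bytes;
        if (used_ > capacity_) capacity_ = used_;  // grow to fit, never shrink
        return offset;
    }
    void reset() { used_ = 0; }  // like Reset(): rewinds, but capacity_ is kept
    std::size_t capacity() const { return capacity_; }
    std::size_t used() const { return used_; }
private:
    std::size_t used_ = 0;
    std::size_t capacity_ = 0;
};

// Records one frame per entry through a single allocator, resetting between
// frames, and returns how much memory the allocator is still holding afterwards.
std::size_t capacity_after(const std::vector<std::size_t>& frame_bytes) {
    ModelArena arena;
    for (std::size_t bytes : frame_bytes) {
        arena.allocate(bytes);
        arena.reset();  // assume the GPU has finished this frame, so it can be reused
    }
    return arena.capacity();
}
```

With frames of 64 KiB, 4 MiB and 64 KiB recorded through one allocator, capacity_after still reports 4 MiB at the end: one heavy frame pins that memory from then on. That is one way to read the slide 10 advice about reusing allocators for similar workloads.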

I think there was another presentation with some advice, I'll try to find it for you.

But you are right that a minimum of one allocator per submission thread is necessary. I have to review when to/how to reset/reuse lists before giving you a better answer.

Edit: I found another presentation, though it's not the one I was thinking of (unless I'm thinking of a video): Getting-the-best-out-of-D3D12.ppsx, slides 28 & 29.
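To make the "one allocator per submission thread" point concrete, here is a toy model in plain C++ (no D3D12 headers; ModelAllocator, ModelCommandList and record_in_parallel are made-up names). It only models the rule quoted from the docs: a list may start recording on an allocator only if no other list is currently recording into it, while lists recorded back to back on the same allocator are fine.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for a command allocator and a command list.
// They only model "an allocator backs at most one recording list at a time";
// nothing here talks to a GPU.
struct ModelAllocator {
    bool recording = false;             // true while a list is recording into this allocator
    std::vector<std::string> commands;  // stands in for memory handed to recorded commands
};

class ModelCommandList {
public:
    // Roughly what CreateCommandList / Reset do: bind to an allocator and start recording.
    void begin(ModelAllocator& alloc) {
        if (alloc.recording)
            throw std::logic_error("allocator already backs a recording command list");
        alloc.recording = true;
        alloc_ = &alloc;
    }
    void record(const std::string& cmd) {
        assert(alloc_ != nullptr && "record() outside begin()/close()");
        alloc_->commands.push_back(cmd);
    }
    // Roughly what Close does: recording ends, so the allocator may back another list.
    void close() {
        assert(alloc_ != nullptr && "close() without begin()");
        alloc_->recording = false;
        alloc_ = nullptr;
    }
private:
    ModelAllocator* alloc_ = nullptr;
};

// One allocator per worker thread; each worker records its lists back to back.
std::vector<std::size_t> record_in_parallel(std::size_t threads, std::size_t lists_per_thread) {
    std::vector<ModelAllocator> allocators(threads);  // never shared between threads
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&allocators, t, lists_per_thread] {
            ModelCommandList list;
            for (std::size_t i = 0; i < lists_per_thread; ++i) {
                list.begin(allocators[t]);  // sequential reuse of the same allocator is allowed
                list.record("draw " + std::to_string(i));
                list.close();
            }
        });
    }
    for (auto& w : workers) w.join();
    std::vector<std::size_t> counts;
    for (const auto& a : allocators) counts.push_back(a.commands.size());
    return counts;
}
```

record_in_parallel(4, 3) leaves each of the 4 allocators holding 3 recorded commands, and beginning a second list on an allocator that is still recording throws. In the real API you would also have to wait (via a fence) until the GPU has finished executing an allocator's lists before calling Reset on that allocator; the model doesn't cover that part.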

-potential energy is easily made kinetic-

Yep, except, from my understanding, multiple queues should be somewhat rare.

AFAIK, there are no GPUs yet that have more than one graphics queue in HW, although plenty do have multiple compute and DMA queues.

As I understand it, multiple queues are useful when you want the GPU to be executing DMA copies or "async compute" shaders concurrently with the commands in your main graphics queue.
I.e. the GPU might be rasterizing some triangles, copying a texture AND running a compute dispatch all simultaneously, which requires very careful use of fences/events to avoid race conditions (just as traditional concurrent programming on the CPU does).
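A rough CPU-side analogy of that fence pattern, using std::thread, std::mutex and std::condition_variable. ModelFence and copy_then_draw are hypothetical names; on real hardware the signal and wait happen on the GPU timeline (ID3D12CommandQueue::Signal, ID3D12CommandQueue::Wait, ID3D12Fence::GetCompletedValue), not on CPU threads, and a queue's Signal only takes effect once the GPU reaches that point in the queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical CPU model of a fence: a monotonically increasing completed value.
class ModelFence {
public:
    void signal(std::uint64_t value) {  // loosely like ID3D12CommandQueue::Signal
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(value >= completed_ && "fence values must not go backwards");
            completed_ = value;
        }
        cv_.notify_all();
    }
    void wait(std::uint64_t value) {  // loosely like ID3D12CommandQueue::Wait
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }
    std::uint64_t completed() const {  // loosely like ID3D12Fence::GetCompletedValue
        std::lock_guard<std::mutex> lock(m_);
        return completed_;
    }
private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};

// A "copy queue" thread fills a buffer and signals; a "graphics queue" thread
// waits on the fence before reading it. Without the wait this would be a data race.
int copy_then_draw() {
    ModelFence fence;
    std::vector<int> buffer(4, 0);
    int sum = 0;
    std::thread graphics([&] {
        fence.wait(1);  // don't touch the buffer until the upload has completed
        for (int v : buffer) sum += v;
    });
    std::thread copy([&] {
        for (std::size_t i = 0; i < buffer.size(); ++i) buffer[i] = static_cast<int>(i + 1);
        fence.signal(1);  // upload complete
    });
    copy.join();
    graphics.join();
    return sum;
}
```

copy_then_draw() always returns 10 (1 + 2 + 3 + 4) regardless of which thread starts first, because the reader cannot proceed until the writer has signalled.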
You are correct on the queue front.

AMD GCN hardware has 1 gfx queue, at least 2 compute pipes (2 on GCN 1.0; the 290X I have at home has 8) and, iirc, 2 DMA engines. The 'compute pipes' are referred to as Async Compute Engines, or ACEs, and each can handle multiple command queues and keep more than one job in flight.

NV is a bit more complex: before Maxwell 2 you basically couldn't have a gfx pipe and a compute pipe active at once. Maxwell 2 removes this restriction, giving you 1 gfx pipe and 31 compute pipes. However, NV aren't forthcoming with details, so it is unknown how those pipes map to queues.

Intel don't have any specialised hardware and, due to how their hardware is designed, show little improvement with D3D12 in the first place. It's still worth using, however, as it will still reduce CPU overhead.

(Queue = memory address we are reading commands from, pipe = hardware consuming said queue.)

And yes, multiple queues allow work to be dealt with independently in an optimal fashion.
For example, if you had copy, gfx and compute work that is independent, you could put it all into the graphics queue, BUT it would take time for the graphics command processor to chew through all that and distribute it. You also have the serial nature of pushing every command through a single queue to execute it.

By contrast, by using a separate queue for each piece of work, the GPU can dispatch it all at the same time, as each queue will be directed at the correct bit of hardware. Front-end pressure on the graphics command processor drops by 2/3rds, since instead of dealing with 3 commands it is now dealing with 1, and the hardware can be utilised more fully, sooner. (You can also set up and dispatch each piece of work independently on the CPU side, so a win there too.)

This is, of course, a simple example, but when you start throwing loads of copies and more gfx and compute work into the mix you can see the win.

How you split things up is, of course, up to you and finding the right balance is key.

Hi, as far as I remember, there is still only one graphics (default) queue per adapter node.

"Recursion is the first step towards madness." - "Skeggöld, Skálmöld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

