"Xbox One actually has two graphical queues"

Started by
7 comments, last by Hodgman 8 years, 7 months ago

Reading here:

http://wccftech.com/xbox-one-directx-12-asynchronous-compute-hardware-specifications-ps4-comparison/4/

I was wondering, what is the point of having two graphical queues?

Is it to allow actual parallelism of different contexts?

Does it mean that command lists can be parallelized not only on the CPU but also on the GPU?

I would greatly appreciate any explanation of the motivation behind this "two graphics queues" hardware design, and of how it can be used from the dev API.

Thanks :)

The answer is covered under the async-compute sections.
It's hyperthreading for GPUs: when you've got a thousand cores, you want them to always have something useful to do.
e.g. Some draw calls are rasterization-bound and some are texture-fetch-bound, both leaving compute cores idle. More queued-up work gives those cores a better chance of finding something to keep themselves busy with.

Thanks for the quick answer!

And how does that work API wise?
Can the GPU/driver deduce which draw calls are not dependent on previous ones, or is there some API that lets the developer take care of that?

That's kind of the big deal with DX12 and Vulkan.

In D3D11 & GL there is only one queue visible API-wise; the driver must take care of it, and it does a poor job because it lacks the information needed to accurately deduce dependencies while still guaranteeing correct results.

In D3D12/Vulkan the developer must take care of it by inserting fences and barriers, explicitly managing the multiple queues, and making queues wait on one another.

Does XB1 really support multiple default/graphics command queues, or just one graphics queue plus concurrent compute queues like all GCN GPUs on PC? I was aware that the XB1 comes with a GCN 1.1/Gen 2 Bonaire-class GPU (aka the HD 7790/R7 260 series)..

I guess this is the kind of info we can find in the XDK (which I do not have .-. )

edit: multiple graphics queues could be interesting for hardware tiled rendering... (Dreamcast, anyone?)

"Recursion is the first step towards madness." - "Skeggöld, Skálmöld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/
Remember; the GPUs in the Xbox and PS4 are not just normal consumer GPUs but have been modified by both MS and Sony in small ways.

Two graphics command queues could simply be a case of one 'game' and one 'system' in order for the system to be able to launch rendering tasks without getting in the way of/blocked by game tasks.
I don't believe it has two render queues. What it has are two asynchronous compute engines (ACEs), each of which manages several GPU tasks, switching among tasks as the current one stalls (e.g. awaiting memory).

In a CPU, hyperthreading is a "logical thread" (a full set of CPU state, such as registers) which utilizes the functional units left unused by its companion thread(s) to make forward progress (e.g. when instruction-sequence or data dependencies prevent a single thread from saturating the core's capacity to issue instructions each cycle). An ACE is like that, except that it utilizes unused compute units (shader lanes).

I could be wrong about the Xbox One not having multiple render queues, but no PC hardware has them yet. I suspect, though, that we will see two true render queues soon, as it could help reduce latency in VR applications, which are very latency-sensitive.

throw table_exception("(╯°□°）╯︵ ┻━┻");

Close, but the layering is a bit more complicated than that.

An ACE is a higher-level manager which splits work across compute units. Internally, each CU can schedule and control up to 40 wavefronts of work (4 SIMDs x 10 'program counters', if you will), dispatching instructions and switching between work as required. The details are covered in AMD presentations, but basically, from each group of 10 program counters it can dispatch up to 4 instructions per cycle across the SIMD, scalar, vector-memory, scalar-memory and program-flow-control units, which is the 'hyperthreading' part.

(Each CU can handle 40 wavefronts of work, each consisting of 64 threads; multiply that by the CU count and you get the amount of 'in flight' work the GPU can handle.)

The ACE, which feeds the CUs, handles work generation and dispatch, along with work-dependency tracking. From a CPU point of view it is more like the kernel scheduler, working out what needs to be dispatched to each core (although instead of simply assigning work, it's more a case of asking "I need these resources, can anyone handle it?", with the ability to suspend work (and, IIRC, pull its state back) when more important work needs to run on a CU).

The number of ACEs varies across hardware: at least 2, currently a maximum of 8.

it can dispatch up to 4 instructions to the SIMD, scalar, vector memory and scalar memory and program flow control units, which is the 'hyper threading' part.

When getting dirty in hardware details, that is indeed the closest analogue to actual hyperthreading :D
But I also think of the whole multi-engine/multi-queue high-level system as a hyperthreading analogue, because if you forget about intra-task parallelism (i.e. that Draw/Dispatch tasks are made up of thousands of pixels), each queue is just a linear sequence of Draw/Dispatch instructions. Multi-queue suddenly means that you've got 2+ sequences of instructions to pull work from, which is akin to going from having a single hardware thread to having 2+ hardware threads.

This topic is closed to new replies.
