I've been studying multithreaded command submission in D3D12 and realized I want to render my scene front to back. So I need to synchronize the command list submissions, and I was considering using fences, but I remember reading that fences are expensive and that you shouldn't use too many. So I'm looking for the best way on the CPU side to order my command list submission... any ideas?
[D3D12] Front-to-back rendering and multi-threaded rendering
What do you mean you need to synchronize your command list submission?
If you have 4 threads each recording a command list with a subset of 1000 models (0-249 on Thread 0, 250-499 on Thread 1 etc) and they're in front to back order then you simply need to submit those command lists to ExecuteCommandLists in the order you want them executed.
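A minimal sketch of that idea, with no real D3D12 calls. `MockCommandList`, `RecordRange` and `BuildAndSubmit` are hypothetical names made up for illustration; the mock list just remembers which model indices were recorded into it. The point it tries to show is that the slot each thread writes to, not the order in which threads finish, decides the final draw order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list:
// it only remembers which model indices were "drawn" into it.
struct MockCommandList {
    std::vector<int> draws;
};

// Records models [first, last) into one list, preserving their order.
void RecordRange(MockCommandList& list, int first, int last) {
    for (int model = first; model < last; ++model)
        list.draws.push_back(model);
}

// Builds `threadCount` lists in parallel, then returns the draw order a single
// submitting thread would produce by walking the lists by slot index.
std::vector<int> BuildAndSubmit(int modelCount, int threadCount) {
    assert(threadCount > 0 && modelCount % threadCount == 0);
    const int perThread = modelCount / threadCount;

    // Sized up front, so the references handed to the workers stay valid.
    std::vector<MockCommandList> lists(static_cast<std::size_t>(threadCount));
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back(RecordRange,
                             std::ref(lists[static_cast<std::size_t>(t)]),
                             t * perThread, (t + 1) * perThread);
    }
    for (auto& worker : workers) worker.join();  // all recording is finished here

    // "Submission": slot order decides the result, not thread completion order.
    std::vector<int> executed;
    for (const auto& list : lists)
        executed.insert(executed.end(), list.draws.begin(), list.draws.end());
    return executed;
}
```

With real D3D12 objects, the final loop would presumably become a single `ExecuteCommandLists` call on the queue, passing the array of lists in slot order.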
A bubble sort is effective on arrays that are already mostly sorted, so as long as your observer does not change position in the world drastically from one frame to the next, you should be well served by it in terms of both cache coherency and computation.
It is actually so trivially fast that moving the sort to another thread can do more harm than good in most scenarios (mind the cost of moving memory between threads through the shared cache, unless you are crossing sockets, which is not cheap either).
Profile.
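To illustrate the claim about mostly sorted input, here is a small sketch of a bubble sort with an early exit. `DrawItem` and `SortFrontToBack` are hypothetical names, and the depth key is assumed to be the distance from the camera. The function returns the number of passes it made, so the effect of frame-to-frame coherence can be observed directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-object sort key: distance from the camera.
struct DrawItem {
    int   id;
    float depth;
};

// Bubble sort by ascending depth (front to back) with an early exit.
// Returns the number of passes made over the array.
std::size_t SortFrontToBack(std::vector<DrawItem>& items) {
    std::size_t passes = 0;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        ++passes;
        for (std::size_t i = 1; i < items.size(); ++i) {
            if (items[i - 1].depth > items[i].depth) {
                std::swap(items[i - 1], items[i]);
                swapped = true;
            }
        }
    }
    // Postcondition: the items are now ordered front to back.
    assert(std::is_sorted(items.begin(), items.end(),
                          [](const DrawItem& a, const DrawItem& b) {
                              return a.depth < b.depth;
                          }));
    return passes;
}
```

If only a few neighbouring objects swapped places since the last frame, the loop stops after a couple of passes; a fully shuffled array costs on the order of n passes, which is where profiling comes in.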
You have one GPU queue with multiple command lists being submitted to it, by one CPU thread. There's no synchronization problem there at all.
All you need to do is ensure that those many-lists have actually been generated (by your many threads) before one thread submits them. That's a traditional multithreading problem with no link to D3D :)
I've been studying multithreaded command submission in D3D12 and realized I want to render my scene front to back.
Sorting a set is the kind of problem that makes multithreading go grab some popcorn, though.
Process the parts of your scene computation that need a single processing unit first, then offload the parallel problems (or the other way round, or both at once), and afterwards use the central list once it is finished.
Fences are for CPU<->GPU synchronization, or synchronization between multiple GPU queues.
I know, but unless I'm mistaken I could order submission using them if I wanted to (although it would cause bubbles in both CPU and GPU execution).
You have one GPU queue with multiple command lists being submitted to it, by one CPU thread. There's no synchronization problem there at all.
All you need to do is ensure that those many-lists have actually been generated (by your many threads) before one thread submits them.
Are there any other options?
That's a traditional multithreading problem with no link to D3D
Yeah I was going to post this in general but I figured there would be more context here.
Most of the benefits of going properly multithreaded are going to be from using multiple threads to generate multiple command lists and submitting them to a single command queue. I assume you're proposing creating multiple DIRECT command queues (one per thread) while also trying to synchronise work across these queues so it executes in the order you want?
Do you have a particular scenario in mind where serialising command list submission to a single DIRECT command queue is not ideal?
To get rendering in the order you want, you have to submit your set of commands to the queue in that order.
For one command list this just means
- sort the command set
- submit each command to the command list in order
- submit the command list to the graphics queue
Extending this to multiple command lists you need to
- sort the command set
- split the command set into subsets
- submit each subset to a command list in order
- submit the command lists to the graphics queue in order
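In pseudo code
void Renderer::RenderFrame()
{
    RenderModelRange opaque, translucent;
    vector<CommandList> commandLists;
    // Reorder renderModels so that opaque models come first, translucent ones after.
    tie( opaque, translucent ) = parallel_partition( renderModels, IsOpaque() );
    // Sort each range in place.
    parallel_sort( opaque, FrontToBack() );
    parallel_sort( translucent, BackToFront() );
    // Record command lists in parallel, keeping them in model order.
    parallel_reduce( renderModels, BuildCommandLists(commandLists) );
    // Submit the lists in that same order.
    for_each( commandLists, SubmitCommandList(renderQueue) );
}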
In this pseudo code BuildCommandLists is where the magic happens (see the parallel_reduce Body concept from TBB for example). It
- splits the command set into subsets
- builds a command list for each subset
- aggregates the resulting lists into a vector (in order, command list building is associative but not commutative)
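The three steps above can be sketched without TBB as a recursive split with an ordered join. This is not the TBB Body interface; `MockCommandList`, `BuildLists` and the `grain` parameter are hypothetical, and `std::async` stands in for a real task scheduler. What it tries to show is the "associative but not commutative" part: halves can be built in any grouping, but the left result must always be joined before the right one.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a recorded command list.
struct MockCommandList {
    std::vector<int> draws;
};

// Recursively splits [first, last) until a chunk holds at most `grain` models,
// records one list per chunk, and joins the halves left before right.
std::vector<MockCommandList> BuildLists(const std::vector<int>& sortedModels,
                                        std::size_t first, std::size_t last,
                                        std::size_t grain) {
    assert(grain > 0 && first <= last && last <= sortedModels.size());
    if (last - first <= grain) {
        MockCommandList list;
        list.draws.assign(sortedModels.begin() + static_cast<std::ptrdiff_t>(first),
                          sortedModels.begin() + static_cast<std::ptrdiff_t>(last));
        return {list};
    }
    const std::size_t mid = first + (last - first) / 2;
    // The left half is built on another thread while this thread builds the right.
    auto left = std::async(std::launch::async, [&sortedModels, first, mid, grain] {
        return BuildLists(sortedModels, first, mid, grain);
    });
    std::vector<MockCommandList> right = BuildLists(sortedModels, mid, last, grain);
    std::vector<MockCommandList> result = left.get();
    // Join step: the grouping is free, but the operand order must never be swapped.
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```

The `grain` value is what caps the number of command lists; a larger grain gives fewer, bigger lists.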
Of course this is pseudo code so there are a lot of details missing.
For example you need to limit the number of command lists (the number of subsets) to a reasonable number.
This pseudo code also waits for all command lists to be built before submitting any to the queue.
Ideally you want to submit the first list as soon as it's built.
And submit the Nth command list as soon as it's built and the (N-1)th command list has been submitted.
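One possible way to get that behaviour, sketched with a plain mutex and no GPU fences. `OrderedSubmitter` and its members are hypothetical names; the sketch only records the order in which lists would be handed to the queue. Workers may finish in any order, and whichever worker completes the missing list flushes everything that was waiting behind it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper: workers hand in finished lists in any order,
// and lists are released strictly by index.
class OrderedSubmitter {
public:
    explicit OrderedSubmitter(std::size_t listCount) : ready_(listCount, false) {}

    // Called by a worker when list `index` is fully recorded.
    void MarkBuilt(std::size_t index) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(index < ready_.size() && !ready_[index]);
        ready_[index] = true;
        // Flush every list that is now unblocked: N goes out only after N-1.
        while (next_ < ready_.size() && ready_[next_]) {
            submitted_.push_back(next_);  // a real renderer would submit to the queue here
            ++next_;
        }
    }

    // Returns the indices in the order they were released so far.
    std::vector<std::size_t> SubmittedOrder() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return submitted_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<bool> ready_;
    std::size_t next_ = 0;
    std::vector<std::size_t> submitted_;
};
```

Submitting while holding the lock keeps the sketch simple but serialises submission; whether that matters is something to measure rather than assume.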
I assume you're proposing creating multiple DIRECT command queues (one per thread) while also trying to synchronise work across these queues so it executes in the order you want?
No, one queue, multiple submission threads. But right now I am imagining something along the lines of round-robin submission... a cmdlist from thread 1, then a cmdlist from thread 2... then back to thread 1 again.
Extending this to multiple command lists you need to
sort the command set
split the command set into subsets
submit each subset to a command list in order
submit the command lists to the graphics queue in order
What I was kicking around in my mind was using a spatial subdivision for a coarse-grained sort, then dividing the work up (this is a little complex since there will be overlap) among multiple threads, sorting and building cmdlists, and then using some sort of synchronization to submit in order.
But I will think about what you proposed.
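A rough sketch of the coarse-then-fine idea, under a simplifying assumption that is not in the post: instead of a spatial subdivision, objects are binned into equal slabs of view depth, which avoids the overlap problem because each object has exactly one depth key. `BucketByDepth` and its parameters are hypothetical. Each slab can be sorted and recorded by a different thread, and the slabs are already in front-to-back order relative to each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits depths in [0, maxDepth) into `bucketCount` equal slabs (coarse sort),
// then sorts each slab on its own (fine sort). Slabs are independent, so each
// one could be handed to a different thread.
std::vector<std::vector<float>> BucketByDepth(const std::vector<float>& depths,
                                              float maxDepth,
                                              std::size_t bucketCount) {
    assert(bucketCount > 0 && maxDepth > 0.0f);
    std::vector<std::vector<float>> buckets(bucketCount);
    for (float d : depths) {
        assert(d >= 0.0f && d < maxDepth);
        auto slab = static_cast<std::size_t>(d / maxDepth *
                                             static_cast<float>(bucketCount));
        // Clamp in case rounding pushes a value into a slab past the last one.
        buckets[std::min(slab, bucketCount - 1)].push_back(d);
    }
    for (auto& bucket : buckets)
        std::sort(bucket.begin(), bucket.end());
    return buckets;
}
```

Walking the buckets from index 0 upwards then visits every object front to back, so one command list per bucket, submitted in bucket order, preserves the overall ordering.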