# DX12 [D3D12] Is ExecuteCommandLists asynchronous, does Present stall, and how does signalling work?



Hello,

I keep reading about DX12 but still have trouble understanding some basic concepts. I'm an active reader of this section of the forum, but I may still have missed something; please redirect me to the right place if my questions are duplicates. So, here's the MSDN Hello World example.

1. The method OnRender(). We populate a command list and execute it with the ExecuteCommandLists() method. The documentation says: "Submits an array of command lists for execution." Not a lot. I bet the commands aren't sent immediately, right? What happens while the command list travels from the CPU to the GPU, a stall?
2. Next comes the Present() method. What happens here? A CPU stall again? What if it's called when the commands haven't finished and there's nothing to present?
3. Next we synchronize, first by calling ID3D12CommandQueue::Signal(). Is it possible for this command to execute before ExecuteCommandLists() (if ExecuteCommandLists() is asynchronous), so that the signal lands before or in the middle of the supplied commands?
4. Next we wait:
if (m_fence->GetCompletedValue() < fence)
{
    ThrowIfFailed(m_fence->SetEventOnCompletion(fence, m_fenceEvent));
    WaitForSingleObject(m_fenceEvent, INFINITE);
}


What if the fence is updated immediately after GetCompletedValue() but before SetEventOnCompletion()? Isn't that a data race?

Edited by nikitablack

##### Share on other sites
• The method OnRender(). We populate a command list and execute it with the ExecuteCommandLists() method. The documentation says: "Submits an array of command lists for execution." Not a lot. I bet the commands aren't sent immediately, right? What happens while the command list travels from the CPU to the GPU, a stall?
It is asynchronous: the call is submitted on the CPU timeline and returns without waiting for the GPU timeline to complete the work. The commands are nonetheless sent right away; the call doesn't wait for anything. Your program lives and executes in user land, and the Windows DDI does not (currently) allow commands to be submitted directly from user land.
Because of that, submitting commands to the GPU triggers a user/kernel transition, which is somewhat expensive (too expensive to pay on every draw call).

Once upon a time, because of this user-land limitation, commands were batched by the runtime and driver and submitted all at once, at arbitrary times (not immediately).
(Some commands would force this submission to happen, but that was not how commands were typically submitted.)

Now, with DX12, YOU control the rate of submission, and execute calls are submitted immediately. So you are the one making the judgement call to build a batch of commands (through command lists) big enough that the user/kernel transition is not triggered too often.

But will the GPU see the commands immediately after submission? Well, it depends. If nothing in the pipe is being rendered, the command list could be seen by the GPU immediately. If there is still work to be executed, it will be put into a queue for execution.

There's actually a higher-priority queue as well: any new work posted there is picked up before the work posted by normal apps.
(As an application writer you should not worry about that detail.)
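
To make the batching trade-off concrete, here is a minimal sketch in plain C++. It is a toy model, not D3D12 code: `ToyQueue`, `ToyCommandList`, `submit`, `gpu_execute_next` and `transitions_for` are all hypothetical names, and charging exactly one user/kernel transition per submit call is a simplifying assumption. It illustrates two points from the answer above: submitting returns before anything executes, and the number of transitions depends on how many draws are packed into each submission.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only (NOT the D3D12 API). A "command list" is just a list of draw IDs.
using ToyCommandList = std::vector<int>;

class ToyQueue {
public:
    // Plays the role of ExecuteCommandLists: each call pays one simulated
    // user/kernel transition, enqueues the work, and returns without running it.
    void submit(const std::vector<ToyCommandList>& lists) {
        ++transitions_;
        for (const auto& list : lists) pending_.push_back(list);
    }

    // Plays the role of the GPU consuming the next queued list, in FIFO order.
    // Returns false when there is nothing left to execute.
    bool gpu_execute_next() {
        if (pending_.empty()) return false;
        for (int draw : pending_.front()) executed_.push_back(draw);
        pending_.pop_front();
        return true;
    }

    std::size_t transitions() const { return transitions_; }
    const std::vector<int>& executed() const { return executed_; }

private:
    std::size_t transitions_ = 0;
    std::deque<ToyCommandList> pending_;
    std::vector<int> executed_;
};

// Hypothetical helper: record `draws` draw calls, `per_list` per command list,
// one submit() call per list, and report how many transitions that cost.
std::size_t transitions_for(int draws, int per_list) {
    ToyQueue queue;
    for (int first = 0; first < draws; first += per_list) {
        ToyCommandList list;
        for (int d = first; d < first + per_list && d < draws; ++d) list.push_back(d);
        queue.submit({list});
    }
    assert(queue.executed().empty());   // submitting alone executed nothing
    while (queue.gpu_execute_next()) {} // the "GPU" catches up later
    assert(queue.executed().size() == static_cast<std::size_t>(draws));
    return queue.transitions();
}
```

In this model `transitions_for(100, 1)` counts 100 transitions and `transitions_for(100, 25)` counts 4, for the same 100 draws.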

• Next comes the Present() method. What happens here? A CPU stall again? What if it's called when the commands haven't finished and there's nothing to present?

Present will stall, but only if you hit the render limit you set (or the one set by the API). Typically, by default, three frames of GPU work can be submitted before a Present() call stalls. This prevents the CPU from getting further ahead of the GPU than is practical. You can control that limit, and it is sometimes encouraged to lower it to reduce latency (the time between an input being made and its effect becoming visible to the end user on their monitor).

That stall does not need to consume CPU power (the thread can be paused, then resumed at the next vblank), but your app will be stuck in that thread during that time (which can be okay or not; it's up to you).
Because it doesn't consume CPU power, your OS/CPU can either run another thread that still has work to do or go into an idle mode that draws less power.
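
The frame limit described above can be sketched with a small counter. Again this is a toy model in plain C++, not DXGI code: `ToySwapChain`, `present`, `on_vblank` and `count_stalls` are hypothetical names, and retiring exactly one frame per vblank is an assumption made to keep the example deterministic. In a real application the limit is configured through DXGI (for example `IDXGISwapChain2::SetMaximumFrameLatency`), which this sketch does not call.

```cpp
#include <cassert>

// Toy model only (NOT the DXGI API): tracks how many presented frames are
// still waiting to be displayed.
class ToySwapChain {
public:
    explicit ToySwapChain(int max_queued_frames) : max_queued_(max_queued_frames) {}

    // Plays the role of Present(). Returns true if the call had to stall
    // until a vblank retired an older frame.
    bool present() {
        bool stalled = false;
        if (queued_ == max_queued_) {
            on_vblank();  // the stall: wait for one queued frame to be shown
            stalled = true;
        }
        ++queued_;
        return stalled;
    }

    // Plays the role of the display consuming one queued frame.
    void on_vblank() {
        if (queued_ > 0) --queued_;
    }

    int queued() const { return queued_; }

private:
    int max_queued_;
    int queued_ = 0;
};

// Hypothetical demo: the CPU presents `frames` frames back to back, with no
// vblank in between except those it is forced to wait for. Returns how many
// present() calls stalled.
int count_stalls(int frames, int max_queued_frames) {
    ToySwapChain swap_chain(max_queued_frames);
    int stalls = 0;
    for (int i = 0; i < frames; ++i) {
        if (swap_chain.present()) ++stalls;
    }
    return stalls;
}
```

With a limit of three, `count_stalls(5, 3)` reports 2: the first three calls return immediately, and the fourth and fifth each wait for a vblank. Lowering the limit to one makes every call after the first stall, which is the latency-versus-throughput trade-off mentioned above.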

• Next we synchronize, first by calling ID3D12CommandQueue::Signal(). Is it possible for this command to execute before ExecuteCommandLists() (if ExecuteCommandLists() is asynchronous), so that the signal lands before or in the middle of the supplied commands?

There's the notion of API order, and of multiple timelines. Within a given timeline (here, the command queue), things are ordered in the order they were submitted to that timeline.
If you submitted the Signal() AFTER the Execute() on your timeline, then you are guaranteed that the Execute() is completely done when you are notified that the Signal() has completed.

This is really important because you use fences before you recycle, reset, or destroy resources, and the GPU must not still be using them when you do.
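
The ordering guarantee can be pictured as a single FIFO that holds both kinds of entries. The following is a toy model in plain C++, not D3D12 code: `ToyOp`, `ToyTimeline`, `gpu_step` and `lists_done_when_fence_reaches_one` are hypothetical names, and a real GPU queue is of course more involved than a `std::deque`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy model only (NOT the D3D12 API): one queue timeline holding two kinds of
// entries, processed strictly in the order they were submitted.
struct ToyOp {
    bool is_signal;       // false: execute a command list, true: signal the fence
    std::uint64_t value;  // command list ID, or the fence value to signal
};

class ToyTimeline {
public:
    void execute(std::uint64_t list_id) { ops_.push_back({false, list_id}); }
    void signal(std::uint64_t fence_value) { ops_.push_back({true, fence_value}); }

    // Plays the role of the GPU retiring the next entry on the timeline.
    // Returns false when the timeline is empty.
    bool gpu_step() {
        if (ops_.empty()) return false;
        ToyOp op = ops_.front();
        ops_.pop_front();
        if (op.is_signal) completed_fence_ = op.value;
        else ++lists_finished_;
        return true;
    }

    std::uint64_t completed_fence() const { return completed_fence_; }
    int lists_finished() const { return lists_finished_; }

private:
    std::deque<ToyOp> ops_;
    std::uint64_t completed_fence_ = 0;
    int lists_finished_ = 0;
};

// Hypothetical demo: submit two lists, signal 1, submit a third list, signal 2.
// Returns how many lists had finished at the moment the fence first showed 1.
int lists_done_when_fence_reaches_one() {
    ToyTimeline queue;
    queue.execute(10);
    queue.execute(11);
    queue.signal(1);
    queue.execute(12);
    queue.signal(2);
    while (queue.completed_fence() < 1 && queue.gpu_step()) {}
    return queue.lists_finished();
}
```

Here the fence cannot show 1 until both lists submitted before that signal have finished, and the third list, submitted after it, has not run yet. That is why seeing the fence value is enough to know the earlier work is safe to recycle.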

• Next we wait:
if (m_fence->GetCompletedValue() < fence)
{
    ThrowIfFailed(m_fence->SetEventOnCompletion(fence, m_fenceEvent));
    WaitForSingleObject(m_fenceEvent, INFINITE);
}


• What if the fence is updated immediately after GetCompletedValue() but before SetEventOnCompletion()? Isn't that a data race?

It's not a race condition: SetEventOnCompletion() signals the event right away if the fence has already reached the requested value, so if the condition becomes true after the if() is taken, WaitForSingleObject() simply returns immediately.
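
Both interleavings can be walked through with a small model. This is a sketch in plain C++, not D3D12 or Win32 code: `ToyFence`, `ToyEvent`, `gpu_signal` and `wait_would_return` are hypothetical names. The behaviour it assumes is that registering an event for a value the fence has already reached signals the event at once.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model only (NOT the D3D12 or Win32 API).
struct ToyEvent {
    bool signaled = false;
};

class ToyFence {
public:
    std::uint64_t completed_value() const { return completed_; }

    // Assumed behaviour being modelled: if the fence has already reached
    // `value`, the event is signaled right away instead of being registered.
    void set_event_on_completion(std::uint64_t value, ToyEvent& event) {
        if (completed_ >= value) event.signaled = true;
        else waiters_.push_back({value, &event});
    }

    // Plays the role of the GPU reaching a Signal() on the queue.
    void gpu_signal(std::uint64_t value) {
        completed_ = value;
        for (auto& waiter : waiters_) {
            if (waiter.first <= completed_) waiter.second->signaled = true;
        }
    }

private:
    std::uint64_t completed_ = 0;
    std::vector<std::pair<std::uint64_t, ToyEvent*>> waiters_;
};

// Hypothetical demo of the CPU-side wait. `gpu_finishes_in_the_gap` selects
// the interleaving the question worries about: the fence is updated after the
// completed-value check but before the event is registered.
bool wait_would_return(bool gpu_finishes_in_the_gap) {
    ToyFence fence;
    ToyEvent event;
    const std::uint64_t target = 1;

    if (fence.completed_value() < target) {
        if (gpu_finishes_in_the_gap) fence.gpu_signal(target);
        fence.set_event_on_completion(target, event);
        if (!gpu_finishes_in_the_gap) fence.gpu_signal(target);  // GPU finishes later
        return event.signaled;  // a wait on the event would return
    }
    return true;  // early-out: nothing to wait for
}
```

`wait_would_return(true)` and `wait_would_return(false)` both come back true: whether the GPU finishes in the gap or afterwards, the event ends up signaled, so the wait cannot hang.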

The following code is functionally equivalent to the one you posted:
// Signal and increment the fence value.
const UINT64 fenceToWaitFor = m_fenceValue;
ThrowIfFailed(m_commandQueue->Signal(m_fence.Get(), fenceToWaitFor));
m_fenceValue++;

// Wait until the fence is completed.
ThrowIfFailed(m_fence->SetEventOnCompletion(fenceToWaitFor, m_fenceEvent));
WaitForSingleObject(m_fenceEvent, INFINITE);

But it doesn't do the quick early check on fenceToWaitFor; as a consequence it sets the event and waits on it every time. You can see the early check as a mini-optimization that does not change the meaning of the code.

Edited by LeGreg

##### Share on other sites

Thank you guys. Now it's clear.

Recently I read Intel's article and it helped me a lot to understand swap chains. Highly recommend.
