D3D12 Fence and Present


I have been trying to figure out how fences and Present synchronize the pipeline when vsync is enabled.

I have read:

1. https://computergraphics.stackexchange.com/questions/2166/how-does-vsync-affect-fps-exactly-when-not-at-full-vsync-fps

2. https://www.gamedev.net/forums/topic/677527-dx12-fences-and-swap-chain-present/

3. https://www.gamedev.net/forums/topic/679050-how-come-changing-dxgi-swap-chain-descbuffercount-has-no-effect/

4. https://software.intel.com/en-us/articles/sample-application-for-direct3d-12-flip-model-swap-chains

5. https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgiswapchain-present

But I'm still a little confused. My main question is: assuming we are using triple buffering, will Present block the CPU thread? If yes, when will it block?

I made this picture; please tell me which combination is the correct situation for the next frame. In my opinion it should be B, E, H. But if it really is B, E, H, it doesn't conform to what link #4 suggests under the classic mode section. As a matter of fact, I don't even understand how the GPU thread could end up two vsyncs behind the CPU thread in that situation in the first place. Also, if it really is B, E, H, it doesn't conform to what Nathan Reed suggested in link #1. It seems that in his example the CPU thread is not throttled by Present or vsync at all; the CPU thread starts working again right after the GPU finishes its work.

 

 

[Attached image: vsync3.jpg]


Blocking the CPU thread has nothing to do with fences. If you call Present 3 times very quickly, then the 4th time you call Present, it'll block until one of your previous 3 frames is done - really done, as in on-screen.

Unless, of course, you use the waitable object swapchain, in which case the waits for frame latency are done manually by the application, instead of in the call to Present. Additionally, this is the only way to change the maximum frame latency value in DX12 from the default of 3, and if you use this waitable object, the default for that scenario is 1.
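For reference, a minimal sketch of that waitable-object path might look like the following. The flag, interface, and method names are the real DXGI ones; the surrounding variable names and the latency value of 2 are illustrative assumptions only.

// Requires <dxgi1_3.h> and <wrl/client.h> (Microsoft::WRL::ComPtr).
DXGI_SWAP_CHAIN_DESC1 desc = {};
desc.BufferCount = 3;
desc.SwapEffect  = DXGI_SWAP_EFFECT_FLIP_DISCARD;
desc.Flags       = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
// ... fill in width/height/format/BufferUsage/SampleDesc, then CreateSwapChainForHwnd ...

// swapChain1 is the ComPtr<IDXGISwapChain1> returned by CreateSwapChainForHwnd.
Microsoft::WRL::ComPtr<IDXGISwapChain2> swapChain2;
swapChain1.As(&swapChain2);
swapChain2->SetMaximumFrameLatency(2);  // waitable swapchains default to 1, not 3
HANDLE frameLatencyWaitable = swapChain2->GetFrameLatencyWaitableObject();

// Per frame, before recording any commands: the application blocks here
// instead of being blocked inside Present.
WaitForSingleObjectEx(frameLatencyWaitable, INFINITE, FALSE);
// ... record, ExecuteCommandLists, Present ...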

3 hours ago, SoldierOfLight said:

Blocking the CPU thread has nothing to do with fences.

Doesn't calling WaitForSingleObject on a fence block the CPU thread?

Also, I am wondering: does Present block the GPU thread?

Assume I have called Present 3 times very quickly. Before the 4th call to Present, I call ExecuteCommandLists; after ExecuteCommandLists, I call Signal and then Present. So the loop looks like this (a rough code sketch follows the list):

0. We have already completed steps 1 to 8 three times. (i = 1, 2, 3. Now i = 1 again.)

1. WaitForSingleObject(i)

2. Barrier(i): present -> render target

3. Record commands...

4. Barrier(i): render target -> present

5. ExecuteCommandLists

6. Signal

7. Present

8. Go to step 1
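Written out as code, those steps could look roughly like the following sketch: one fence shared by all frames, with a per-slot fence value. backBuffer, commandList, fenceEvent, and the other variable names are assumptions for illustration, and allocator/list Reset is omitted.

// 1. Wait until the GPU has finished the earlier frame that used slot i.
if (fence->GetCompletedValue() < frameFenceValue[i])
{
    fence->SetEventOnCompletion(frameFenceValue[i], fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}

// 2. Barrier: present -> render target for back buffer i.
//    (CD3DX12_RESOURCE_BARRIER comes from the d3dx12.h helper header.)
auto toRenderTarget = CD3DX12_RESOURCE_BARRIER::Transition(
    backBuffer[i], D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET);
commandList->ResourceBarrier(1, &toRenderTarget);

// 3. Record draw commands ...

// 4. Barrier: render target -> present for back buffer i.
auto toPresent = CD3DX12_RESOURCE_BARRIER::Transition(
    backBuffer[i], D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT);
commandList->ResourceBarrier(1, &toPresent);
commandList->Close();

// 5. Submit the command list.
ID3D12CommandList* lists[] = { commandList };
commandQueue->ExecuteCommandLists(1, lists);

// 6. Signal the fence so step 1 can later tell when this frame's GPU work is done.
frameFenceValue[i] = ++nextFenceValue;
commandQueue->Signal(fence, frameFenceValue[i]);

// 7. Present (this is the call that may block the CPU once too many frames are queued).
swapChain->Present(1, 0);

// 8. i = (i + 1) % 3; go to step 1.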

Under these circumstances, please answer the following questions:

A. Step 1 may block the CPU thread if the previous work for frame 1 has not finished on the GPU. Am I right?

B. Assume the previous work for frame 1 has finished on the GPU. Step 7 may block the CPU thread if none of the previous 3 frames is done - really done, as in on-screen. Am I right?

C. If the answer to B is yes, then the CPU thread will be blocked at step 7, but the command list has already been submitted. What will happen to the GPU thread? Will the GPU thread be blocked? If yes, by what? (I suspect the barrier recorded in step 2, the present -> render target barrier.) If no, where will the GPU render to when none of the previous 3 frames is done - really done, as in on-screen?

2 minutes ago, acerskyline said:
4 hours ago, SoldierOfLight said:

Blocking the CPU thread has nothing to do with fences.

Doesn't calling WaitForSingleObject on a fence block the CPU thread?

Yes, explicitly calling WaitForSingleObject on an event which will be signaled by SetEventOnCompletion is related to fences. All I meant was that any implicit blocking within the Present API call is not necessarily related to fences; it is only related to the "maximum frame latency" concept.

To answer your specific questions:
A. Yes.
B. Yes.
C. Yes. Work which is submitted against a resource that is being consumed by the compositor or screen is delayed until the resource is no longer being consumed. The fact that the command list writes to the back buffer is most likely detected during the call to the resource barrier API, and implicitly negotiated with the swapchain and graphics scheduler at ExecuteCommandLists time, to ensure that the command list doesn't begin execution until the resource is available.

Also to clarify, by "GPU thread" we're talking about the command queue. If you had a second command queue, or a queue in a different process, it'd still be possible for that queue to execute while the one writing to the back buffer is waiting.
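To make that distinction concrete, here is a tiny sketch; fenceEvent and targetValue are just illustrative names.

// Explicit, fence-based CPU wait - the WaitForSingleObject case discussed above:
fence->SetEventOnCompletion(targetValue, fenceEvent);  // event is set when the fence reaches targetValue
WaitForSingleObject(fenceEvent, INFINITE);             // CPU blocks here because you asked it to

// Implicit wait - no fence involved; Present itself can block the CPU once
// "maximum frame latency" frames are already queued and not yet displayed:
swapChain->Present(1, 0);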

57 minutes ago, SoldierOfLight said:

Work which is submitted against a resource that is being consumed by the compositor or screen is delayed until the resource is no longer being consumed.

Continuing my previous example, please bear with me.

Now, assume one of the previous 3 frames is done - really done, as in on-screen, and the GPU workload for the current frame is very heavy.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

0. We have already completed steps 1 to 8 three times. (i = 1, 2, 3. Now i = 1 again.)

1. WaitForSingleObject(i)

2. Barrier(i): present -> render target <---------------- "GPU thread" (command queue) was here

3. Record commands...

4. Barrier(i): render target -> present

5. ExecuteCommandLists

6. Signal

7. Present <------------------------------------------------- CPU thread was here

8. Go to step 1

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

cpu ... present|

gpu  ... barrier|----------------heavy work----------------|

              -----------------------------------------------------------------------------

              |   3   |   1   |

              |   2   |   3   |   1   |

              |   1   |   2   |   3   |   1   |

              -----------------------------------------------------------------------------

screen   |   3   |   1   |   2   |   3   |   ?   |

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

My question is:

What should the question mark be in the diagram above? Or can this situation even happen? Thanks!

I'm not sure I'm following your diagram or question.

Are you asking what is displayed if you let your GPU queue drain entirely (i.e. stop submitting new work)? The screen just doesn't update and it will continue displaying buffer #3 until it has something new to replace it. The CPU won't be blocked though, because at that point you've only got one frame queued, so the CPU will continue running ahead until it gets back up to 3 frames queued.

3 minutes ago, SoldierOfLight said:

I'm not sure I'm following your diagram or question.

Sorry, I should have given it a little explanation.

What I'm asking is what will happen if the GPU hasn't finished rendering the frame but the Present is being "executed" to display it. The reason I didn't draw the rest of the pipeline is not that it's drained; I just don't think it's necessary to show the rest, since it's irrelevant and it's also a lot of work to type ; ).

A Present is a queued operation, just like rendering work. It doesn't get executed until the rendering work is done. If you submit rendering work A to a queue, and then rendering work B to a queue, it doesn't really make sense to ask what happens when B starts executing before A is done... because by definition A has to finish before B can start. A Present is queued the same way.
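In code terms, everything submitted to the same queue retires in submission order, so the Present can never run against a back buffer whose rendering hasn't finished. A tiny sketch of the idea, where listA, listB, and the other names are illustrative:

commandQueue->ExecuteCommandLists(1, &listA);  // rendering work A
commandQueue->ExecuteCommandLists(1, &listB);  // rendering work B, queued behind A
swapChain->Present(1, 0);                      // the flip is queued behind A and B, so it
                                               // only happens once both have finished writing
                                               // the back buffer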

54 minutes ago, SoldierOfLight said:

A Present is a queued operation, just like rendering work. It doesn't get executed until the rendering work is done. If you submit rendering work A to a queue, and then rendering work B to a queue, it doesn't really make sense to ask what happens when B starts executing before A is done... because by definition A has to finish before B can start. A Present is queued the same way.

Yeah! I totally agree. This is what I was waiting for. So, if Present is a queued operation, why does this diagram indicate that the CPU thread generates two "colored blocks", one in the GPU queue and one in the "present queue", and why does the timeline make it look like the Present runs ahead of the actual rendering? Does that make sense?

[Attached images: main-figure1-large.png, Capture.JPG]

Ah, I see where the confusion's coming from. A frame in the "present queue" is waiting for all associated GPU work to finish before actually being processed and showing up on screen, as well as for all previous frames to be completed.

The way I prefer to think about / visualize it is that a frame is waiting in the GPU queue until all previous work is completed, and is then moved to a present queue after that, where it waits for all previous presents to complete.

