# D3D12 Best Practices


## Recommended Posts

Hi all,

I've been playing around with D3D12 and while going through the samples I've run into a couple of questions that I've yet to find an answer for. The questions I have are the following:

1) How many descriptor heaps should an application have?

Is it vital for an app to have a single SRV/RTV/DSV heap each, or is it fine to have several smaller heaps tied to specific rendering tasks? (I'm specifically wondering whether cmdList->SetDescriptorHeaps() can cause cache coherency issues.) I remember reading somewhere that an app should have only one heap of each type, but I can't remember where I saw it, so my memory might just be letting me down at this point.
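For reference, a common pattern I've seen described (a sketch, not from the samples; `device`, `srvHeap`, and `cmdList` are assumed names, and the heap size is made up): create one large shader-visible CBV/SRV/UAV heap at startup and sub-allocate from it, so SetDescriptorHeaps() is called once per command list:

```cpp
// Sketch: one shader-visible heap sub-allocated for the whole app, so
// SetDescriptorHeaps() is called once per command list rather than per
// task. (Switching shader-visible heaps mid-list can flush hardware
// descriptor state on some GPUs, which is the coherency cost in question.)
D3D12_DESCRIPTOR_HEAP_DESC desc = {};
desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
desc.NumDescriptors = 4096; // size is an assumption
desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
device->CreateDescriptorHeap(&desc, IID_PPV_ARGS(&srvHeap));

// RTV/DSV heaps are CPU-only (they are never shader-visible and never
// bound via SetDescriptorHeaps), so several small ones are harmless;
// the single-heap advice applies to CBV_SRV_UAV and SAMPLER heaps.
ID3D12DescriptorHeap* heaps[] = { srvHeap.Get() };
cmdList->SetDescriptorHeaps(_countof(heaps), heaps);
```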

2) How should constant buffers be handled?

Throughout the samples I found that the applications often created several constant buffers based on the exact same structure for different draw calls: instead of calling map() at application init and then memcpy()-ing per-draw-call data into a single constant buffer, the apps created n constant buffers and used descriptor tables to reference the right one. Is that the way it should be done, or have I misunderstood something (see e.g. the Bundles sample)?
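To illustrate how the map-once-then-memcpy idea and the per-frame versioning in the samples actually combine, here is a runnable sketch with no D3D calls (a std::vector stands in for the pointer returned by ID3D12Resource::Map(), and the struct name is made up): one upload buffer, mapped once, carved into 256-byte-aligned per-frame, per-draw slices:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-draw constants; any constant buffer struct works the same way.
struct DrawConstants { float mvp[16]; };

// D3D12 requires constant buffer views to start on 256-byte boundaries.
constexpr std::size_t Align256(std::size_t n) { return (n + 255) & ~std::size_t(255); }

// One big upload-heap buffer, mapped once at init and never unmapped.
// Each frame gets its own slice, so the CPU never writes memory the GPU
// may still be reading.
struct ConstantRing {
    std::size_t frameCount, drawsPerFrame, stride;
    std::vector<unsigned char> mapped; // stand-in for the Map()'d pointer

    ConstantRing(std::size_t frames, std::size_t draws)
        : frameCount(frames), drawsPerFrame(draws),
          stride(Align256(sizeof(DrawConstants))),
          mapped(frames * draws * stride) {}

    // Offset of draw `d` in frame `f`; the CBV for this draw is the
    // buffer's base GPU virtual address plus this offset.
    std::size_t Offset(std::size_t f, std::size_t d) const {
        return (f * drawsPerFrame + d) * stride;
    }

    // Per draw call: just memcpy into this frame's slice.
    void Write(std::size_t f, std::size_t d, const DrawConstants& c) {
        std::memcpy(mapped.data() + Offset(f, d), &c, sizeof(c));
    }
};
```

So the samples' "n constant buffers" and your "map once, memcpy per draw" are not mutually exclusive: the n copies exist so that in-flight frames don't stomp on each other, but they can live in one persistently mapped resource.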

3) More generally, how should frame resources be handled?

This follows from the fact that the apps seem to create n times the number of resources used per frame: e.g. with double-buffered rendering, the constant buffer descriptor heap size is given as 2 * numCBsPerFrame (where numCBsPerFrame is an array of CBs for different draw calls), and command lists are allocated in a similar manner. What is the reason for doing this? I think it has something to do with GPU-CPU synchronization, i.e. preventing read/write clashes, but I'm not sure.
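A sketch of what that duplication typically looks like for command allocators (names are assumed, not taken from the samples): the GPU may still be consuming slot i's memory while the CPU records slot i+1, so everything the CPU rewrites each frame needs one copy per frame in flight:

```cpp
// Sketch: one command allocator per frame in flight. A command allocator
// owns the memory backing recorded commands, so it may only be Reset()
// once the GPU has finished executing the lists recorded from it --
// hence N copies for N frames in flight. The same reasoning applies to
// upload-heap constant buffers.
static const UINT kFrameCount = 2;
ComPtr<ID3D12CommandAllocator> allocators[kFrameCount];
for (UINT i = 0; i < kFrameCount; ++i)
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                   IID_PPV_ARGS(&allocators[i]));

// Per frame, only after a fence confirms the GPU is done with this slot:
allocators[frameIndex]->Reset();
cmdList->Reset(allocators[frameIndex].Get(), pipelineState.Get());
```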

4) What would be the suggested synchronization method? I'm currently using the one provided in the HelloWorld samples, i.e. waiting for the GPU to finish before continuing to the next frame. This clearly isn't the way to go, as my fullscreen D3D11 app runs at ~6k FPS whereas my D3D12 app runs at ~3k FPS. Furthermore, how would one achieve max framerate in windowed mode? (I've seen this video but I don't really follow the logic: taking the last option, wouldn't rendering something just for the sake of rendering cause possible stalls?) Is the swapchain's GetFrameLatencyWaitableObject() useful here?
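For the windowed-mode part, a sketch of the waitable-swap-chain approach (this assumes the swap chain was created with the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag and has been queried to IDXGISwapChain2; `swapChain` is an assumed name):

```cpp
// Sketch: frame-latency waitable object. Done once at init:
swapChain->SetMaximumFrameLatency(2);
HANDLE frameLatencyWaitable = swapChain->GetFrameLatencyWaitableObject();

// At the TOP of the render loop, before recording any commands: this
// blocks until DXGI is ready to accept another frame, so the CPU starts
// a frame only when a present slot is actually available instead of
// queuing work that will stall inside Present().
WaitForSingleObjectEx(frameLatencyWaitable, 1000, FALSE);
```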

---

I'm still a bit confused about the synchronization though - could you please explain it in more detail?

I've implemented the following NextFrame() sync function that is called immediately after Present():

```cpp
bufIndex_ = swapChain_->GetCurrentBackBufferIndex();

// Remember the fence value for this frame, then signal it on the queue.
frameFences_[currFrameIndex_] = fenceValue_;
THROW_FAILED(directCommandQueue_->Signal(fence_.Get(), fenceValue_));
++fenceValue_;

// Advance to the next frame slot; block only if the GPU hasn't yet
// finished the frame that last used this slot.
currFrameIndex_ = (currFrameIndex_ + 1) % bufferCount_;
UINT64 lastCompletedFence = fence_->GetCompletedValue();

if ((frameFences_[currFrameIndex_] != 0) && (frameFences_[currFrameIndex_] > lastCompletedFence)) {
    THROW_FAILED(fence_->SetEventOnCompletion(frameFences_[currFrameIndex_], fenceEvent_));
    WaitForSingleObject(fenceEvent_, INFINITE);
}
```


By increasing bufferCount_ to 3 I can achieve 120 FPS, and with 4 I can achieve 180 FPS, but anything higher than that doesn't net me any further gains. Also, I'm not sure how to interpret the FPS tracing I've set up: if I count the number of times my Render() loop is reached in one second, I get 180 FPS, but when I query the actual time passed, I get much higher values for some frames. See below, where the first number is the FPS based on the actual time spent in the Render() call and the one in brackets is the number of times Render() got called that second. I assume the second is the actual FPS, but I'm not quite sure how to interpret the first one: does it even mean anything? (Similar output can be produced in the samples by printing 1.0f / m_timer.GetElapsedSeconds() instead of GetFramesPerSecond().)

```
FPS: 65 (182)
FPS: 1663 (182)
FPS: 1693 (182)
FPS: 65 (182)
FPS: 1294 (182)
FPS: 2066 (182)
FPS: 63 (182)
FPS: 1058 (182)
FPS: 1739 (182)
FPS: 64 (182)
FPS: 1741 (182)
FPS: 2245 (182)
FPS: 65 (182)
```


So, I still have no idea how my D3D12 app compares against my D3D11 app: for D3D11 I use the 1.0f / GetElapsedSeconds() method to measure the maximum framerate, but what is the equivalent of that in D3D12?
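A small, self-contained illustration of why per-frame 1/dt and a per-second frame count disagree (pure C++, no D3D): the frames that happen to block on a fence or Present() absorb almost all of the wall-clock time, so instantaneous 1/dt swings wildly, while throughput is frames divided by total time:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Throughput FPS: number of frames divided by the total time they took.
// Instantaneous 1/dt for any single frame is dominated by whether that
// particular frame happened to block, so averaging the times (not the
// per-frame FPS values) gives the meaningful number.
double AverageFps(const std::vector<double>& frameSeconds) {
    double total = std::accumulate(frameSeconds.begin(), frameSeconds.end(), 0.0);
    return static_cast<double>(frameSeconds.size()) / total;
}
```

Feeding in times matching the trace above (one blocked frame at ~65 FPS followed by two at ~1700 FPS) yields roughly the 182 shown in brackets, which is why the bracketed count is the number that matches D3D11's 1.0f / GetElapsedSeconds() measurement.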

Also, I decided to have a look at the GPU execution times with the VS2015 debugger (debug->start diagnostic tools without debugging) but all of the Event Names are Unknown so I can't really tell which API calls are which. Is this feature not yet supported for D3D12?

---

So I've been doing further testing with the samples and when I compare the fullscreen modes in those applications to the ones in DX11, the differences are quite marginal:

Avg FPS DX11 (custom app): 8575

Avg FPS DX12 (Bundles app): 4518

Both versions use the same presentation model (FLIP_DISCARD), so I'm not sure where the difference comes from. I've only enabled the render target clear and present operations; everything else is commented out in both apps.

Furthermore, following the idea of per-frame CBs: if I render to a texture, should I also version it on a per-frame basis, like the constant buffers? (nvm: the Multithreading sample answered this - yes, this is indeed the case)

Edited by dr4cula

---

> 4) What would be the suggested synchronization method? I'm currently using the one provided in the HelloWorld samples, i.e. I'm waiting for the GPU to finish before continuing to the next frame. This clearly isn't the way to go as my fullscreen D3D11 app runs at ~6k FPS whereas my D3D12 app runs at ~3k FPS. Furthermore, how would one achieve max framerate in windowed mode (I've seen this video but I don't really follow the logic - taking the last option, wouldn't rendering something for the sake of rendering cause possible stalls? I don't really understand this). Is the swapchain's GetFrameLatencyWaitableObject() useful here?

I only recently started browsing the D3D12 samples (only HelloTriangle so far), and its wait-for-previous-frame method states the following:

```cpp
// WAITING FOR THE FRAME TO COMPLETE BEFORE CONTINUING IS NOT BEST PRACTICE.
// This is code implemented as such for simplicity. More advanced samples
// illustrate how to use fences for efficient resource usage.
```

Isn't waiting for the frame to complete what Hodgman suggested here?

> 4) For double buffering, before starting next frame, wait for the previous frame to complete.
> e.g. after submitting all commands for frame #2, make the CPU wait until the GPU has completed all commands for frame #1.

Although the "after submitting all commands for frame #2" part is confusing me a bit.

Edit: also, for those interested in more D3D12 best practices, look here: https://developer.nvidia.com/dx12-dos-and-donts
Edited by Infinisearch

---

> Isn't waiting for the frame to complete what Hodgman suggested here?

In the sample, frame #2 doesn't begin on the CPU until the GPU has finished frame #1.
I'm suggesting that frame #3 doesn't begin on the CPU until the GPU has finished frame #1.
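A GPU-free model of that pattern (a sketch; the fields stand in for a real ID3D12Fence and queue Signal()): with two frames in flight, ending frame N only requires the GPU to have finished frame N-2, so the CPU and GPU overlap by one full frame:

```cpp
#include <cassert>
#include <cstdint>

// CPU-side model of the fence pattern: with kFramesInFlight = 2 the CPU
// may record frame N while the GPU still executes frame N-1, and only
// blocks if the GPU hasn't finished frame N-2 yet.
constexpr int kFramesInFlight = 2;

struct FrameSync {
    uint64_t nextFenceValue = 1;
    uint64_t frameFence[kFramesInFlight] = {}; // value signaled per slot
    uint64_t gpuCompleted = 0; // stand-in for fence->GetCompletedValue()
    int slot = 0;

    // Called after submitting a frame. Returns the fence value the CPU
    // must wait for before reusing the next slot's resources, or 0 if
    // no wait is needed.
    uint64_t EndFrame() {
        frameFence[slot] = nextFenceValue++;   // queue->Signal(fence, value)
        slot = (slot + 1) % kFramesInFlight;
        uint64_t mustReach = frameFence[slot];
        return (mustReach > gpuCompleted) ? mustReach : 0;
    }
};
```

Note the difference from the HelloTriangle sample: the sample waits for the frame it just submitted, serializing CPU and GPU; here the wait targets the frame submitted one iteration earlier.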

---

Ah, thank you... brain short circuit.

---

Thought I shouldn't start a new topic as it's a fairly generic DX12 question: if I put my CBV in the root signature, does the driver version the data automatically, i.e. would I no longer need a per-frame constant buffer resource?

Thanks!

---

> Thought I shouldn't start a new topic as it is kind of a generic DX12 question: if I put my CBV in the root signature, does the driver version the data automatically, i.e. I wouldn't need a per-frame constant buffer resource?

You will still need a per-frame constant buffer resource; you just won't need a per-frame entry in a descriptor table for that CBV.

The only way to avoid a constant buffer resource entirely is to store the constants directly inside the root signature (root constants), but root signature memory is very limited (64 DWORDs total), so you won't be able to store everything there.
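A sketch of both options (root parameter indices, resource names, and the per-frame offset scheme are assumed, not prescribed):

```cpp
// (a) Root CBV: the root parameter holds a GPU virtual address directly,
// so no descriptor-table entry is needed -- but the memory behind that
// address must still be versioned per frame by the app, e.g. by offsetting
// into one buffer that holds a 256-byte-aligned copy per frame in flight.
cmdList->SetGraphicsRootConstantBufferView(
    0, // root parameter index (assumed)
    cbResource->GetGPUVirtualAddress() + frameIndex * alignedCBSize);

// (b) Root constants: the values live in the root signature itself, so
// no buffer resource at all and no versioning -- but only small,
// frequently-changing data fits within the 64-DWORD budget.
float tint[4] = { 1.0f, 0.5f, 0.25f, 1.0f };
cmdList->SetGraphicsRoot32BitConstants(1, 4, tint, 0);
```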

Edited by TiagoCosta
