Direct3D 12 offsets into constant buffers, similar to Vulkan's dynamic offsets

5 comments, last by Hodgman 6 years, 8 months ago

So these days I'm writing a D3D12/Vulkan abstraction for a project and I've hit a wall tackling resource binding. In an older renderer I wrote, I put all of my per-object uniforms into one big uniform buffer/constant buffer, copied all the data in one go, and bound a range of it for each object in the scene using glBindBufferRange (GL) and the XSSetConstantBuffers1 family, i.e. VSSetConstantBuffers1, PSSetConstantBuffers1 and so on (D3D11.1). It seemed more efficient than updating the buffer between draws.
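For reference, the D3D11.1 side of that older approach takes the range in units of 16-byte shader constants rather than bytes, and the D3D11.1 documentation requires both the first constant and the constant count to be multiples of 16 constants (256 bytes), with at most 4096 constants per binding. A minimal sketch of the byte-to-constant conversion; `ToConstantRange` and `ConstantRange` are hypothetical helper names, not part of the D3D11 API:

```cpp
#include <cassert>
#include <cstdint>

// One HLSL "shader constant" is a float4: 16 bytes.
constexpr uint32_t kBytesPerConstant = 16;
// D3D11.1 requires FirstConstant and NumConstants to be multiples of 16 constants.
constexpr uint32_t kConstantsPerChunk = 16;
constexpr uint32_t kMaxConstants = 4096;

// Values to pass through pFirstConstant / pNumConstants.
struct ConstantRange {
    uint32_t firstConstant;
    uint32_t numConstants;
};

// Convert a byte range inside one big constant buffer into the units that
// VSSetConstantBuffers1/PSSetConstantBuffers1 expect, rounding the size up
// to the next legal multiple of 16 constants.
ConstantRange ToConstantRange(uint32_t byteOffset, uint32_t byteSize) {
    assert(byteOffset % (kBytesPerConstant * kConstantsPerChunk) == 0 && "offset must be 256-byte aligned");
    const uint32_t first = byteOffset / kBytesPerConstant;
    uint32_t count = (byteSize + kBytesPerConstant - 1) / kBytesPerConstant;
    count = (count + kConstantsPerChunk - 1) / kConstantsPerChunk * kConstantsPerChunk;
    assert(count <= kMaxConstants);
    return {first, count};
}
```

For example, a 300-byte object at byte offset 256 becomes first constant 16 with a count of 32 constants, so the allocator packing objects into the big buffer has to leave 256-byte gaps anyway.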

The same thing can be done in Vulkan by creating a dynamic uniform buffer and providing Dynamic Offsets to vkCmdBindDescriptorSets.
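On the Vulkan side, each dynamic offset has to be a multiple of `VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment`, which the spec guarantees is a power of two. A minimal sketch of computing per-object offsets into one big `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` buffer; `AlignUp` and `DynamicOffsets` are hypothetical helpers, and each returned value is what would go into `pDynamicOffsets` for that object's draw:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round value up to the next multiple of alignment (alignment must be a power of two).
uint64_t AlignUp(uint64_t value, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Byte offset of each object's uniform block inside one shared dynamic uniform buffer.
std::vector<uint32_t> DynamicOffsets(uint32_t objectCount, uint64_t perObjectSize, uint64_t minAlignment) {
    const uint64_t stride = AlignUp(perObjectSize, minAlignment);
    std::vector<uint32_t> offsets;
    offsets.reserve(objectCount);
    for (uint32_t i = 0; i < objectCount; ++i)
        offsets.push_back(static_cast<uint32_t>(i * stride));
    return offsets;
}
```

With an 80-byte per-object block and a 64-byte alignment the stride is 128 bytes; on hardware that reports 256 the same data would need a 256-byte stride, so the alignment should be queried rather than hardcoded.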

But for Direct3D 12 I haven't found an equivalent approach yet. The only thing I've come up with so far is to create one constant buffer view (CBV) per object, all pointing into the same constant buffer, place them consecutively in a descriptor heap, and select an object's CBV by offsetting the GPU descriptor handle passed to SetGraphicsRootDescriptorTable:


// The shader-visible descriptor heap must be bound before setting descriptor tables
ID3D12DescriptorHeap* heaps[] = { pCbvSrvHeap };
pCommandList->SetDescriptorHeaps(1, heaps);

// Get the descriptor increment size (hardware-dependent, so always query it)
const UINT cbvSrvDescriptorSize = pDevice->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

// Handle for the first object's constant buffer view (descriptor index 0)
CD3DX12_GPU_DESCRIPTOR_HANDLE cbvSrvGpuHandle1(pCbvSrvHeap->GetGPUDescriptorHandleForHeapStart(), 0, cbvSrvDescriptorSize);
pCommandList->SetGraphicsRootDescriptorTable(0, cbvSrvGpuHandle1);

// Draw first object...

// Offset by one descriptor to get the handle for the second object's constant buffer view
CD3DX12_GPU_DESCRIPTOR_HANDLE cbvSrvGpuHandle2(pCbvSrvHeap->GetGPUDescriptorHandleForHeapStart(), 1, cbvSrvDescriptorSize);
pCommandList->SetGraphicsRootDescriptorTable(0, cbvSrvGpuHandle2);

// Draw second object...
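Under the hood, offsetting a GPU descriptor handle is plain address arithmetic: the handle is a 64-bit `ptr`, and selecting descriptor N adds N times the increment size. Note that the "offset" here counts descriptors in the heap, not bytes in the constant buffer, which is exactly why it does not line up with a Vulkan dynamic offset. A minimal stand-alone sketch; `GpuDescriptorHandle` is a stand-in for `D3D12_GPU_DESCRIPTOR_HANDLE` so the example compiles without the Windows SDK, and `OffsetHandle` is a hypothetical helper mirroring what the three-argument `CD3DX12_GPU_DESCRIPTOR_HANDLE` constructor computes:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for D3D12_GPU_DESCRIPTOR_HANDLE, which wraps a single UINT64 ptr.
struct GpuDescriptorHandle {
    uint64_t ptr;
};

// Handle for descriptor `index`, given the heap start and the increment size
// returned by GetDescriptorHandleIncrementSize.
GpuDescriptorHandle OffsetHandle(GpuDescriptorHandle start, uint32_t index, uint32_t incrementSize) {
    assert(incrementSize != 0);
    return {start.ptr + uint64_t(index) * incrementSize};
}
```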

While this approach works, it doesn't map cleanly onto Vulkan's dynamic offsets, so I can't build a clean abstraction over the two.

So how would you guys go about handling this? I'd also love to hear how you approach abstracting the Vulkan/D3D12 binding models in general.


I have a big block of memory for streaming dynamic constants through. Let's say, for example, that it's 32 KiB in size (it's much larger in reality); for that block of memory I pre-create 255 CBVs:
1x 32 KiB view
2x 16 KiB views
4x 8 KiB views
8x 4 KiB views
16x 2 KiB views
32x 1 KiB views
64x 512 B views
128x 256 B views
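The view layout above can be indexed with plain arithmetic if, as an assumption not stated in the post, allocations are buddy-style: each allocation's size is rounded up to a power of two between 256 B and 32 KiB, and its offset is a multiple of that size. Level 0 is the single whole-block view, level k holds 2^k views, and all views of earlier levels are stored first. A minimal sketch with hypothetical names (`ViewSizeFor`, `ViewIndex`):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockSize = 32 * 1024; // 32 KiB block, as in the example
constexpr uint32_t kMinView = 256;         // smallest view; CBV sizes must be multiples of 256 B

// Number of size levels: 32 KiB, 16 KiB, ..., 256 B
constexpr uint32_t LevelCount() {
    uint32_t n = 0;
    for (uint32_t s = kBlockSize; s >= kMinView; s /= 2) ++n;
    return n;
}

// Total pre-created views: 1 + 2 + 4 + ... + 128
constexpr uint32_t TotalViews() { return (1u << LevelCount()) - 1; }

// Size of the view used for an allocation of `bytes` (next power of two, at least 256 B).
uint32_t ViewSizeFor(uint32_t bytes) {
    assert(bytes > 0 && bytes <= kBlockSize);
    uint32_t s = kMinView;
    while (s < bytes) s *= 2;
    return s;
}

// Flat index of the pre-created CBV that exactly covers [offset, offset + size).
uint32_t ViewIndex(uint32_t offset, uint32_t size) {
    assert(size >= kMinView && size <= kBlockSize && (size & (size - 1)) == 0);
    assert(offset % size == 0 && offset + size <= kBlockSize);
    uint32_t level = 0;
    for (uint32_t s = kBlockSize; s > size; s /= 2) ++level; // level 0 = whole block
    return ((1u << level) - 1) + offset / size;              // skip all views of earlier levels
}
```

For instance, a 300-byte allocation uses a 512 B view, and a 512 B allocation at offset 1024 maps to view 63 + 2 = 65.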

When placing dynamic constants in that memory, I can then fetch the pre-created CBV that corresponds to the particular allocation. I place these CBVs in a non-shader-visible descriptor heap; at binding time I use CopyDescriptors to copy the appropriate CBVs into a dynamic descriptor table in a shader-visible heap, and then bind that single descriptor table to a root argument.
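The "dynamic descriptor table" step can be pictured as a linear allocator over the shader-visible heap: each draw reserves a contiguous run of slots, the pre-created CBVs are copied into that run, and the run's first slot is what gets bound. A minimal sketch that only records the (source, destination) pairs a real implementation would hand to `CopyDescriptors`/`CopyDescriptorsSimple`; the class name is hypothetical, and the per-frame reset is a simplification of the fence-protected ring a real renderer would need:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical linear allocator over a shader-visible CBV/SRV/UAV heap.
class DynamicTableAllocator {
public:
    explicit DynamicTableAllocator(uint32_t capacity) : capacity_(capacity) {}

    // Reserves srcViews.size() contiguous slots and records one copy per view.
    // Returns the first slot, i.e. the descriptor table to bind for this draw.
    uint32_t BuildTable(const std::vector<uint32_t>& srcViews) {
        assert(head_ + srcViews.size() <= capacity_ && "heap exhausted this frame");
        const uint32_t first = head_;
        for (uint32_t src : srcViews) copies_.emplace_back(src, head_++);
        return first;
    }

    // Only valid once the GPU has finished reading this frame's tables.
    void ResetForNewFrame() {
        head_ = 0;
        copies_.clear();
    }

    // (source index in the CPU-only heap, destination slot in the shader-visible heap)
    const std::vector<std::pair<uint32_t, uint32_t>>& Copies() const { return copies_; }

private:
    uint32_t capacity_;
    uint32_t head_ = 0;
    std::vector<std::pair<uint32_t, uint32_t>> copies_;
};
```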

34 minutes ago, Hodgman said:

When placing dynamic constants in that memory, I can then fetch a pre-created CBV that corresponds to the particular allocation. I place these CBVs in a non-shader-visible descriptor heap, and at binding time I use CopyDescriptors to copy the appropriate CBVs into a dynamic descriptor table, and then bind that single descriptor table to a root argument.

So you're copying per-object constants between draw calls instead of copying everything at once into a single large buffer? And wouldn't copying descriptors every frame have a lot of overhead?

How do you handle dynamic constants in your Vulkan path?

Two other approaches that you can take:

  1. Create the CBV descriptor on-the-fly with the desired offset. Creating descriptors is fast, since there's no allocation or heavy resource creation involved.
  2. Use a "root" CBV, which lets you pass an arbitrary GPU virtual address.

I'd recommend #2, since it's easy and cheap. Nvidia also prefers it for various reasons.
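A root CBV is arguably the closest D3D12 counterpart to a Vulkan dynamic offset: both consume "base of one big buffer plus a byte offset". A minimal sketch of lowering one backend-neutral binding to either API, assuming the allocator aligns every offset to at least 256 bytes (D3D12's constant buffer placement alignment) and to `minUniformBufferOffsetAlignment`; `ConstantBinding` and both functions are hypothetical, and the real calls they feed are `SetGraphicsRootConstantBufferView` and `vkCmdBindDescriptorSets`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical backend-neutral binding: this draw's constants live `offset`
// bytes into the big per-frame constant buffer.
struct ConstantBinding {
    uint64_t offset;
};

// D3D12: the value for SetGraphicsRootConstantBufferView's BufferLocation.
// Constant buffer data is expected to be 256-byte aligned
// (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).
uint64_t D3D12RootCbvAddress(uint64_t bufferGpuVirtualAddress, const ConstantBinding& b) {
    assert(b.offset % 256 == 0);
    return bufferGpuVirtualAddress + b.offset;
}

// Vulkan: the same offset goes into pDynamicOffsets for a
// VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC binding.
uint32_t VulkanDynamicOffset(const ConstantBinding& b, uint64_t minUniformBufferOffsetAlignment) {
    assert(b.offset % minUniformBufferOffsetAlignment == 0);
    return static_cast<uint32_t>(b.offset);
}
```

Because the abstraction only ever stores a byte offset, the D3D12 path never has to create or copy a descriptor for these constants; the cost is the root-signature space each root CBV occupies.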

7 minutes ago, MJP said:

Two other approaches that you can take:

  1. Create the CBV descriptor on-the-fly with the desired offset. Creating descriptors is fast, since there's no allocation or heavy resource creation involved.
  2. Use a "root" CBV, which lets you pass an arbitrary GPU virtual address.

I'd recommend #2, since it's easy and cheap. Nvidia also prefers it for various reasons.

I didn't know creating CBV descriptors was cheap; I should give that a try. I doubt I could use root CBVs for everything, though, since each one takes two DWORDs of the root signature's 64-DWORD limit. Now I have to figure out a way to wrap this neatly alongside Vulkan. Thank you! :D
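The space concern can be checked with simple arithmetic: a D3D12 root signature is limited to 64 DWORDs, where a descriptor table costs 1 DWORD, a root descriptor (CBV/SRV/UAV) costs 2 DWORDs because it holds a 64-bit GPU virtual address, and root constants cost 1 DWORD per 32-bit value. A minimal sketch of a budget check; the `Param`/`RootParam` types are hypothetical, not the `D3D12_ROOT_PARAMETER` structures:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RootParam { DescriptorTable, RootDescriptor, Constants32Bit };

struct Param {
    RootParam kind;
    uint32_t num32BitValues = 0; // only used for Constants32Bit
};

constexpr uint32_t kMaxRootSignatureDwords = 64;

// Cost of one root parameter in DWORDs.
uint32_t CostInDwords(const Param& p) {
    switch (p.kind) {
        case RootParam::DescriptorTable: return 1;
        case RootParam::RootDescriptor:  return 2; // 64-bit GPU virtual address
        case RootParam::Constants32Bit:  return p.num32BitValues;
    }
    return 0;
}

uint32_t RootSignatureSize(const std::vector<Param>& params) {
    uint32_t total = 0;
    for (const Param& p : params) total += CostInDwords(p);
    return total;
}

bool FitsInRootSignature(const std::vector<Param>& params) {
    return RootSignatureSize(params) <= kMaxRootSignatureDwords;
}
```

So a handful of root CBVs for the per-draw and per-frame constants is cheap, while 32 root CBVs alone would exhaust the whole budget.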

4 hours ago, DiharaW said:

So you're copying per-object constants between draw calls instead of copying everything at once into a single large buffer? And wouldn't copying descriptors every frame have a lot of overhead?

How do you handle dynamic constants in your Vulkan path?

I haven't finished a Vulkan path yet, so can't advise there.

Copying constants between draw-calls is just a call to memcpy into write-combined memory -- plenty fast! I put the dynamic constants in an upload heap and keep it persistently mapped for writing from the CPU.
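A minimal sketch of that streaming scheme, assuming a simple per-frame linear allocator: constants are written sequentially with `memcpy` into the persistently mapped upload memory, and each allocation is rounded up to 256 bytes so its address is usable as a CBV or root CBV. The class is hypothetical; `mapped` stands in for the pointer obtained once from `ID3D12Resource::Map` and `gpuBase` for `GetGPUVirtualAddress()`. Real code should never read back from write-combined memory, and would wrap around with fences instead of asserting:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-frame linear allocator over a persistently mapped upload buffer.
class ConstantStream {
public:
    ConstantStream(unsigned char* mapped, uint64_t gpuBase, uint64_t size)
        : mapped_(mapped), gpuBase_(gpuBase), size_(size) {}

    // Copies `bytes` of constants and returns the GPU virtual address to bind.
    uint64_t Push(const void* data, uint64_t bytes) {
        const uint64_t offset = (head_ + 255) & ~uint64_t(255); // 256-byte CBV alignment
        assert(offset + bytes <= size_ && "stream exhausted; a real ring would wrap and fence");
        std::memcpy(mapped_ + offset, data, bytes); // sequential writes suit write-combined memory
        head_ = offset + bytes;
        return gpuBase_ + offset;
    }

    // Only safe once the GPU has finished reading this frame's constants.
    void Reset() { head_ = 0; }

private:
    unsigned char* mapped_;
    uint64_t gpuBase_;
    uint64_t size_;
    uint64_t head_ = 0;
};
```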

I assume copying a pre-made descriptor is faster than creating one from scratch... I have also used on-the-fly creation (MJP's #1) in an Xbox One port and performance was fine, but I haven't had time yet to implement both the pre-creation and on-the-fly creation strategies in a single project and profile them against each other.

4 hours ago, MJP said:

2. Use a "root" CBV, which lets you pass an arbitrary GPU virtual address.

I support this as well, if that's what the current shader has requested. My shader compiler generates the root signatures, and the resource binding code's behavior is data-driven based on what kind of root signature was generated. A CB slot either exists directly as a root argument, in which case no CBV is required, or as an entry in a table of CBVs referenced from the root.
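A minimal sketch of what data-driven binding could look like, assuming the generated root signature is described by a per-slot layout record. All names here are hypothetical; instead of calling D3D12, the sketch records a description of what would happen (a root CBV set for one kind of slot, a CBV written into the draw's descriptor table for the other), so it runs anywhere:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Where a shader's constant buffer slot lives in the generated root signature.
enum class CbLocation { RootCbv, CbvTable };

struct CbSlotLayout {
    CbLocation location;
    uint32_t rootParameterIndex;
    uint32_t tableOffset = 0; // descriptor index within the table (CbvTable only)
};

// Stand-in for a command list: records descriptions of the work instead of issuing it.
struct RecordedCommands {
    std::vector<std::string> calls;
};

// Bind one constant buffer according to the layout chosen by the shader compiler.
void BindConstantBuffer(const CbSlotLayout& slot, uint64_t gpuAddress, RecordedCommands& out) {
    switch (slot.location) {
    case CbLocation::RootCbv: // no CBV needed, just the GPU virtual address
        out.calls.push_back("SetGraphicsRootConstantBufferView(" + std::to_string(slot.rootParameterIndex) +
                            ", " + std::to_string(gpuAddress) + ")");
        break;
    case CbLocation::CbvTable: // a CBV must end up in the draw's descriptor table
        out.calls.push_back("CBV into table (param " + std::to_string(slot.rootParameterIndex) + ", slot " +
                            std::to_string(slot.tableOffset) + ") at " + std::to_string(gpuAddress));
        break;
    }
}
```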

I also have another code path for immutable CBs: in that case the memory allocation and the descriptor to use are static, so I don't use my constants upload heap. Furthermore, I pre-create my draw calls as a "draw item" structure, and if a draw item turns out to have immutable resource bindings, I can pre-create the descriptor tables it uses and keep them around instead of dynamically creating them every frame. Honestly though, the wins are so small that it's not really worth it. Even when doing everything dynamically, without special optimizations for the static/immutable cases, you can submit more draw calls than a GPU can handle per frame in a handful of milliseconds on a single core :o 
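A minimal sketch of the immutable-bindings shortcut, assuming the shader-visible heap is split into a persistent region (tables built once and kept) and a per-frame region (tables rebuilt every frame). `DrawItem`, `TableHeap` and `GetTable` are hypothetical; `builds` counts how many tables needed descriptor copies, which is the only work the cache saves, consistent with the point above that the win is small:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pre-built draw call and the views its descriptor table needs.
struct DrawItem {
    std::vector<uint32_t> views;   // indices of pre-created CBVs/SRVs
    bool immutableBindings = false;
    int64_t cachedTableStart = -1; // persistent table, built once if immutable
};

// Stand-in for a shader-visible heap with a persistent and a per-frame region.
struct TableHeap {
    uint32_t persistentHead = 0;  // slots that live across frames
    uint32_t frameHead = 1 << 16; // per-frame slots, reset every frame (reset not shown)
    uint32_t builds = 0;          // tables that required descriptor copies
};

// Returns the first slot of the descriptor table to bind for this draw item.
uint32_t GetTable(DrawItem& item, TableHeap& heap) {
    const uint32_t count = static_cast<uint32_t>(item.views.size());
    if (item.immutableBindings) {
        if (item.cachedTableStart < 0) { // first use: build and keep the table
            item.cachedTableStart = heap.persistentHead;
            heap.persistentHead += count;
            ++heap.builds;
        }
        return static_cast<uint32_t>(item.cachedTableStart);
    }
    const uint32_t start = heap.frameHead; // dynamic: build a fresh table every time
    heap.frameHead += count;
    ++heap.builds;
    return start;
}
```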

