[D3D12] Binding multiple shader resources

7 comments, last by SoldierOfLight 7 years, 11 months ago

Hello,

In D3D12 we have SetGraphicsRootDescriptorTable/SetComputeRootDescriptorTable on the command list to bind a range of descriptors from the descriptor heap. Very often, descriptors are scattered across the descriptor heap, so you can end up making multiple calls to SetGraphicsRootDescriptorTable/SetComputeRootDescriptorTable to bind them.
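For instance (a sketch with made-up names and heap indices): two SRVs that happen to sit at unrelated offsets in the shader-visible heap need two root parameters and two binding calls:

// The root signature needs two one-descriptor tables here, because the
// SRVs are not adjacent in the shader-visible heap.
UINT inc = device->GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
D3D12_GPU_DESCRIPTOR_HANDLE base = srvHeap->GetGPUDescriptorHandleForHeapStart();

D3D12_GPU_DESCRIPTOR_HANDLE tableA = { base.ptr + UINT64(diffuseIndex) * inc };
D3D12_GPU_DESCRIPTOR_HANDLE tableB = { base.ptr + UINT64(shadowIndex) * inc };
cmdList->SetGraphicsRootDescriptorTable(0, tableA); // root parameter 0
cmdList->SetGraphicsRootDescriptorTable(1, tableB); // root parameter 1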

Recently, MJP published his DeferredTexturing sample (https://github.com/TheRealMJP/DeferredTexturing), where he follows this principle to manage resources:

1. SRVs, CBVs, UAVs, and samplers are created in a shader-invisible descriptor heap by default. Thus, they have CPU read/write access but no GPU access. I know that you need descriptors in this kind of heap to serve as the source of CopyDescriptors and for the CPU descriptor handle that ClearUnorderedAccessView* takes. Maybe there are more use cases.

2. During the shader resource binding stage, he copies their descriptors into a shader-visible descriptor heap by calling CopyDescriptors on the D3D12 device. This gives the GPU access to the resources and groups them into one contiguous range, so a single call to SetGraphicsRootDescriptorTable/SetComputeRootDescriptorTable is enough (see the sketch below).
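Here is my understanding of that gather in code (a sketch, not MJP's actual code; the cursor-based allocation into the shader-visible heap is my assumption):

// Gather three scattered shader-invisible descriptors into one contiguous
// destination range, then bind that range with a single table call.
UINT inc = device->GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

D3D12_CPU_DESCRIPTOR_HANDLE srcStarts[] = { diffuseSrv, normalSrv, materialCbv };
UINT srcSizes[] = { 1, 1, 1 };           // each source range holds one descriptor

D3D12_CPU_DESCRIPTOR_HANDLE dst = shaderVisibleCpuStart; // heap start (CPU side)
dst.ptr += SIZE_T(allocCursor) * inc;    // current allocation cursor
UINT dstSize = 3;

device->CopyDescriptors(1, &dst, &dstSize, 3, srcStarts, srcSizes,
                        D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

D3D12_GPU_DESCRIPTOR_HANDLE table = shaderVisibleGpuStart; // heap start (GPU side)
table.ptr += UINT64(allocCursor) * inc;
cmdList->SetGraphicsRootDescriptorTable(0, table);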

This approach (placing your descriptors in a shader-invisible heap at creation time and copying them into a shader-visible heap at binding time) seems like a good way to handle both operations that require CPU read/write access to your descriptors and the grouping of your resources into one descriptor table. I am wondering: is this the recommended way to manage resources?

I also had a quick look at the source code in Unreal Engine. They seem to be using the same CopyDescriptors principle while binding resources.

The Microsoft D3D12 samples I have had a chance to go through don't strike me as very representative when it comes to demonstrating this.

Usually they have only a few render passes, and they manage to place their descriptors grouped together in the descriptor heap ahead of time.

In an app with many render passes, it is very unlikely that the descriptors for each draw/dispatch call will be grouped together in advance.

CopyDescriptors feels like the way to overcome that.

There is an article from Intel about the advantages of the D3D12 resource binding model over the D3D11 one: https://software.intel.com/en-us/blogs/2014/08/07/direct3d-12-overview-part-4-heaps-and-tables. I am interested in the Redundant Resource Binding section, where they describe the driver's need to copy resource bindings to a new location in order to bind them for a draw call. Doesn't an explicit CopyDescriptors do what is otherwise handled by the driver in their case? Is it really an advantage of D3D12's resource management then? I am not sure these two copy operations can be compared directly.

I would really appreciate hearing your input on this.

Thanks


I expect a lot of content which is being ported from older APIs like D3D11 will have the CopyDescriptors model. Most content typically has an upper level and a graphics layer, and all they're doing is porting the graphics layer, which exposes the same interface as all the other graphics layers. In this case, the graphics layer typically has a concept of a "view" as something that can be bound, which, as it turns out, doesn't really match the hardware designs these days. So these engines need to collect sets of "views" in order to bind them, which results in a CopyDescriptors gather operation.

However, I do have first-hand experience with content designed from the ground up for low-level APIs like D3D12, and I can tell you that there's almost no CopyDescriptors. Instead of the primitive for binding being a "view" it's a "view table". These tables have their own lifetimes, and there's a hard upper limit such that all of them will fit in a single shader-visible descriptor heap.
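To make the idea concrete, a "view table" can be as simple as a fixed-size slice of the shader-visible heap with its own lifetime. A rough sketch (the names and the bump allocation are made up for illustration, not that engine's actual code):

// A fixed-size slice of the single shader-visible heap. Descriptors are
// written into it once (via Create*View on the CPU handle), and it is then
// bound many times via the GPU handle.
struct ViewTable
{
    D3D12_CPU_DESCRIPTOR_HANDLE cpu;  // where Create*View writes
    D3D12_GPU_DESCRIPTOR_HANDLE gpu;  // what SetGraphicsRootDescriptorTable takes
    UINT capacity;                    // fixed slot count
};

ViewTable AllocateViewTable(ID3D12Device* device, ID3D12DescriptorHeap* heap,
                            UINT& cursor, UINT capacity)
{
    const UINT inc = device->GetDescriptorHandleIncrementSize(
        D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    ViewTable t;
    t.cpu = heap->GetCPUDescriptorHandleForHeapStart();
    t.cpu.ptr += SIZE_T(cursor) * inc;
    t.gpu = heap->GetGPUDescriptorHandleForHeapStart();
    t.gpu.ptr += UINT64(cursor) * inc;
    t.capacity = capacity;
    cursor += capacity;  // bump-allocate; a real engine would recycle slots
    return t;
}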

I expect that moving forward we'll see more content moving towards this model, where we'll start to see significant benefits from the expressiveness and flexibility of the D3D12 bind model.

Jesse, thank you for the idea!

I tried to understand how the "view table" concept could be implemented. I guess you could have a dedicated range of descriptors in the descriptor heap for each render pass.

If the same resource is used in multiple render passes, you will have a copy of its descriptor in each of those ranges.

Also, for each descriptor range you will need to track the resource objects associated with each descriptor, to be able to transition each resource into the corresponding state for a particular render pass.

You still have to deal with descriptor copies, but in this case you can pre-populate the descriptor heap at the level-loading stage, thus avoiding dynamic copies for each render pass.

> Also, for each descriptor range you will need to track the resource objects associated with each descriptor, to be able to transition each resource into the corresponding state for a particular render pass.

Unless your engine also has the concept of a resource state change at the high level... :) Then there's no tracking required: the graphics layers which need state assume it's already correct, and the engines which don't can ignore those operations. Just a thought.

Jesse, thanks again for the input!

The CopyDescriptors approach is mostly for convenience and rapid iteration, since it doesn't require you to have descriptors in a contiguous table until you're ready to draw. For a real engine where you care about performance, you'll probably want to pursue something along the lines of what Jesse describes: put your descriptors in contiguous tables from the start, so that you're not constantly copying things around while you're building up your command buffers.

I also want to point out that the sample demonstrates another alternative to both approaches in its use of indexing into descriptor tables. In that sample it works by grabbing all of the textures needed to render the entire scene, putting them in one contiguous descriptor table, and then looking up the descriptor indices from a structured buffer using the material ID. Using indices can effectively give you an indirection, which means that your descriptors don't necessarily have to be contiguous inside the descriptor heap.
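Roughly, the root-signature side of that looks like the following (a sketch; the register/space choices are made up here and are not necessarily what the sample uses):

// One large SRV range holds every scene texture; the shader indexes into it
// dynamically (SM 5.1), so descriptors never need to be re-gathered per draw.
D3D12_DESCRIPTOR_RANGE1 texRange = {};
texRange.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
texRange.NumDescriptors = UINT_MAX;  // unbounded: Texture2D SceneTextures[] : register(t0, space1)
texRange.BaseShaderRegister = 0;
texRange.RegisterSpace = 1;
texRange.Flags = D3D12_DESCRIPTOR_RANGE_FLAG_DESCRIPTORS_VOLATILE;
texRange.OffsetInDescriptorsFromTableStart = 0;

D3D12_ROOT_PARAMETER1 sceneTextures = {};
sceneTextures.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
sceneTextures.DescriptorTable = { 1, &texRange };
sceneTextures.ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;

// In the shader, the material ID selects indices from a structured buffer:
//   MaterialIndices mat = Materials[materialId];
//   float4 albedo = SceneTextures[NonUniformResourceIndex(mat.AlbedoIdx)].Sample(s, uv);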

That's going to be basically the same performance as assembling your tables up front, right? Is the downside that the overall heap will probably be a little bit larger than if it were designed that way from the beginning?

I guess if you suddenly need a lot of new tables it would take a perf hit too?

> I expect that moving forward we'll see more content moving towards this model, where we'll start to see significant benefits from the expressiveness and flexibility of the D3D12 bind model.

I'm trying to head in this direction :)

In anticipation of GNM, D3D12 and Mantle (RIP, now in anticipation of D3D12 and Vulkan :lol:), we replaced "texture bindings" in our engine with "resource list bindings". We created a new API object - the resource list, which is basically an array of shader-resource-views - much like how a cbuffer is an array of constants.

Our graphics API lets the user create Resource Lists of a particular size (e.g. large enough to fit 3 SRVs), call UpdateResource on them to fill in their contents, and Map/Unmap them (with the different modes, such as WRITE_DISCARD or NOOVERWRITE). At the moment this is all a bit of a charade, as most of the time on D3D11, CreateResourceList just calls malloc, UpdateResource calls memcpy, and Map just returns the pointer.

In our shaders, the syntax for declaring the textures looks like this:


ResourceList( 0, Pixel, 'Material', {
  t_Diffuse = Texture2D(float4),
  t_Specular = Texture2D(float4),
})
 
ResourceList( 1, Pixel, 'Lighting', {
  t_ShadowAtlas = Texture2DArray(float),
  t_LUT = Texture2D(float4),
})

This declares:

Res-list slot 0 is visible to the pixel shader only, and contains two Tex2Ds for the material.
Res-list slot 1 is visible to the pixel shader only, and contains a Tex2DArray and a Tex2D from the lighting system.
That declaration ends up generating this HLSL code at the top of the shader:


//ResourceList Material : slot(0)
Texture2D<float4> t_Diffuse : register(t0);
Texture2D<float4> t_Specular : register(t1);
//ResourceList Lighting : slot(1)
Texture2DArray<float> t_ShadowAtlas : register(t2);
Texture2D<float4> t_LUT : register(t3);

I'm still getting around to the D3D12 port, but from what I've read of the API so far, I think this means I'll be able to implement my ResourceList API objects as ranges within a CBV_SRV_UAV descriptor heap, and the root signature for the shader above would have two root descriptor tables: one for the Material data and one for the Lighting data. When the material system and the lighting system ask to bind their ResourceLists, I'll actually just be setting these root descriptor tables to those pre-allocated heap ranges.
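Something like this is what I'm picturing (a guess at the mapping; the register assignments follow the generated HLSL above):

D3D12_DESCRIPTOR_RANGE materialRange = {
    D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 2, /*t0*/ 0, /*space*/ 0,
    D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND };
D3D12_DESCRIPTOR_RANGE lightingRange = {
    D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 2, /*t2*/ 2, /*space*/ 0,
    D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND };

D3D12_ROOT_PARAMETER params[2] = {};
params[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
params[0].DescriptorTable = { 1, &materialRange };  // ResourceList 'Material'
params[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
params[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
params[1].DescriptorTable = { 1, &lightingRange };  // ResourceList 'Lighting'
params[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;

// Binding a ResourceList then becomes:
//   cmdList->SetGraphicsRootDescriptorTable(0, materialListGpuHandle);
//   cmdList->SetGraphicsRootDescriptorTable(1, lightingListGpuHandle);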

My CBuffer management is still based on the D3D11 model (there are 14 CBuffer slots in my API; they do not get placed inside ResourceLists like textures/buffers do). So I'll probably have to do the CopyDescriptors pattern to create a contiguous table of a draw-item's cbuffer bindings.

However, my engine is also a stateless renderer that works in two steps:

#1) You create "draw-items", which requires specifying all the parameters for a [Multi]Draw[Indexed][Instanced] call, along with all the pipeline state, and all the resource bindings (CBuffers, ResourceLists).

#2) You can submit draw-items into drawing contexts / command buffers.

So I guess I'll be able to do the CopyDescriptors work during step #1, to build a table per draw-item which contains its contiguous cbuffer views, and then step #2 can be executed many times, simply binding this pre-created table.

While I'm here, I may as well add an actual question to the pot :) I'm reading Frank Luna's D3D12 book, and in his examples, he creates a single shader-visible CBV_SRV_UAV heap, writes directly into it from the CPU, and has the GPU read directly from it.

Is it preferable to have the CPU write into a non-shader-visible heap, and then copy from that one into a shader-visible one for the GPU to read from?

What's the difference -- will the CPU writes be extremely slow in Luna's use case?

Hodgman, your engine sounds a lot like the one I have experience with, and it should be a very efficient approach. And yes, that engine reserved a section of its descriptor heap for dynamic CBVs which were created as part of a "draw-item". The primary difference is that its resource lists were fixed-size instead of dynamically-sized, which avoids fragmentation and the need for freelist tracking in the descriptor heap, but wastes some space. Something to consider.

It is expected that the Create*View methods which initialize descriptors are implemented by the driver in a write-only manner, and that the CPU virtual addresses backing GPU-visible descriptor heaps are write-combined. Under both of these assumptions, there should be a performance win from writing directly to the shader-visible descriptor heaps, since you don't have to copy the memory around.

However note that CopyDescriptors is likely significantly cheaper than Create*View, so if you plan on using an identical descriptor multiple times in multiple places, it's probably better to stage it once in a non-shader-visible heap than to create it multiple times.
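In code, that trade-off looks roughly like this (placeholder names):

// One-off descriptor: create it directly in the shader-visible heap,
// skipping the staging copy entirely.
device->CreateShaderResourceView(texture, &srvDesc, shaderVisibleSlot);

// Reused descriptor: pay Create*View once into a CPU-only staging heap,
// then replicate it with the cheaper copy path. Note the copy source must
// live in a non-shader-visible heap.
device->CreateShaderResourceView(texture, &srvDesc, stagingSlot);
for (UINT i = 0; i < placementCount; ++i)
    device->CopyDescriptorsSimple(1, shaderVisibleSlots[i], stagingSlot,
                                  D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);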

This topic is closed to new replies.
