mark_braga

DX12: Do we need to rebind a descriptor table after a CopyDescriptors operation?


I am working on optimizing our descriptor management code. Currently I am following most of the usual guidelines, such as sorting descriptors by update frequency, ...

I have two types of descriptor ranges: static (DESCRIPTOR_RANGE_FLAG_NONE) and dynamic (DESCRIPTORS_VOLATILE). So let's say I have this scenario:

pCmd->bindDescriptorTable(pTable);

for (uint32_t i = 0; i < meshCount; ++i)
{
    // descriptor is created in a range with flag DESCRIPTORS_VOLATILE
    // setDescriptor will call CopyDescriptorsSimple to copy descriptor handle pDescriptor[i]
    // to the appropriate location in pTable
    pTable->setDescriptor("descriptor", pDescriptor[i]);
}

Do I need to call bindDescriptorTable inside the loop?
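
For reference, this is roughly how I declare the two range types with Root Signature 1.1 (just a sketch; the range types, register slots, and descriptor counts are made up for illustration and are not our actual layout):

D3D12_DESCRIPTOR_RANGE1 ranges[2] = {};

// static range: contents are not expected to change once recorded
ranges[0].RangeType                         = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
ranges[0].NumDescriptors                    = 4;
ranges[0].BaseShaderRegister                = 0;
ranges[0].RegisterSpace                     = 0;
ranges[0].Flags                             = D3D12_DESCRIPTOR_RANGE_FLAG_NONE;
ranges[0].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;

// dynamic range: contents may be rewritten up until the command list is submitted
ranges[1].RangeType                         = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
ranges[1].NumDescriptors                    = 1;
ranges[1].BaseShaderRegister                = 4;
ranges[1].RegisterSpace                     = 0;
ranges[1].Flags                             = D3D12_DESCRIPTOR_RANGE_FLAG_DESCRIPTORS_VOLATILE;
ranges[1].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;

D3D12_ROOT_PARAMETER1 tableParam = {};
tableParam.ParameterType                       = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
tableParam.DescriptorTable.NumDescriptorRanges = 2;
tableParam.DescriptorTable.pDescriptorRanges   = ranges;
tableParam.ShaderVisibility                    = D3D12_SHADER_VISIBILITY_ALL;

Under the hood, setDescriptor boils down to a CopyDescriptorsSimple into the shader-visible heap region backing pTable (destHandle below is hypothetical, just the CPU handle of the slot being written):

pDevice->CopyDescriptorsSimple(1, destHandle, pDescriptor[i], D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);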


As long as the descriptor range is marked VOLATILE, the guarantee is that the hardware reads the contents of the descriptors at execution time. There is no need to re-apply the same descriptor table handle to the command list.
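
Concretely, something along these lines is enough (a sketch with placeholder names; the root parameter index, handles, and heap type are illustrative):

// recorded once: the binding captures the table's base GPU descriptor handle, not its contents
pCommandList->SetGraphicsRootDescriptorTable(0, tableGpuBase);

// any time before the command list is submitted, rewrite the volatile slot;
// no second SetGraphicsRootDescriptorTable is required, since the GPU reads
// the updated contents when the command list actually executes
pDevice->CopyDescriptorsSimple(1, tableSlotCpu, newDescriptorCpu, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);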


Thanks for the info. Since we are on the topic, I am curious which of these is more optimal:

  • Use the DESCRIPTORS_VOLATILE flag, create one big descriptor table, and let the driver manage the versioning, or
  • Separate the tables based on update frequency?

The first scenario has only one SetRootDescriptorTable call from app code; I am not sure what it costs in driver code.

The second scenario has multiple SetRootDescriptorTable calls depending on the update frequency, but the driver has to do no versioning since the app manages it (a rough sketch of this layout is below).
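
For the second scenario I am picturing something like this (a sketch; perFrameRange, perDrawRange, and the handle names are placeholders, not our actual layout):

D3D12_ROOT_PARAMETER1 params[2] = {};

// table 0: per-frame resources, range declared with DESCRIPTOR_RANGE_FLAG_NONE, bound once per frame
params[0].ParameterType                       = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
params[0].DescriptorTable.NumDescriptorRanges = 1;
params[0].DescriptorTable.pDescriptorRanges   = &perFrameRange;
params[0].ShaderVisibility                    = D3D12_SHADER_VISIBILITY_ALL;

// table 1: per-draw resources, re-pointed at a fresh heap region for every draw (app-managed versioning)
params[1].ParameterType                       = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
params[1].DescriptorTable.NumDescriptorRanges = 1;
params[1].DescriptorTable.pDescriptorRanges   = &perDrawRange;
params[1].ShaderVisibility                    = D3D12_SHADER_VISIBILITY_ALL;

// per draw: copy the draw's descriptors to a new offset in the shader-visible heap,
// then point table 1 at that offset
pCommandList->SetGraphicsRootDescriptorTable(1, perDrawGpuHandle);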

Thanks


DESCRIPTORS_VOLATILE is the default behavior of Root Signature 1.0. It means the driver cannot read the descriptors at all until the GPU is executing. So I can do something like this:

commandList->SetGraphicsRootDescriptorTable(0, foo);   // foo = base GPU descriptor handle of the table
commandList->Close();
commandQueue->Wait(fence, 1);                          // queue-side wait: the GPU stalls until the fence reaches 1
ID3D12CommandList* lists[] = { commandList };
commandQueue->ExecuteCommandLists(1, lists);           // submit while the queue is still stalled
device->CopyDescriptors(foo, bar);                     // (shorthand) rewrite the descriptors the table points at, using bar as the source
fence->Signal(1);                                      // CPU-side signal: unblock the queue

This would be valid, and the GPU would read the updated descriptors when it becomes unblocked. If you did that with Root Signature 1.1 without the DESCRIPTORS_VOLATILE flag, it would be invalid, because the descriptors changed between the point where the command was recorded into the command list and the point where the GPU executes it. The absence of the flag is a hint to the driver that it can read the descriptors on the CPU and potentially make optimizations based on that information. Whether that means it embeds the entire contents of the descriptor in the command list or not is a driver implementation detail (and unless it's a buffer, I don't think anyone can do that today).
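
For completeness, opting into Root Signature 1.1 so the per-range flags apply at all looks roughly like this (a sketch; tableParam stands for whatever D3D12_ROOT_PARAMETER1 layout you use, rootSignature/device are placeholders, and error handling is omitted):

// In practice, query D3D12_FEATURE_ROOT_SIGNATURE first and fall back to 1.0
// if the driver does not report D3D_ROOT_SIGNATURE_VERSION_1_1.
D3D12_VERSIONED_ROOT_SIGNATURE_DESC desc = {};
desc.Version                = D3D_ROOT_SIGNATURE_VERSION_1_1;
desc.Desc_1_1.NumParameters = 1;
desc.Desc_1_1.pParameters   = &tableParam;
desc.Desc_1_1.Flags         = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

ID3DBlob* blob  = nullptr;
ID3DBlob* error = nullptr;
D3D12SerializeVersionedRootSignature(&desc, &blob, &error);
device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(), IID_PPV_ARGS(&rootSignature));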


@SoldierOfLight Unfortunately, the above example is not valid; we expect descriptors to be "set in stone" by ExecuteCommandLists (submission) time. This is in accordance with the documentation below:

https://msdn.microsoft.com/en-us/library/windows/desktop/mt709473(v=vs.85).aspx#descriptors_volatile

"With this flag set, the descriptors in a descriptor heap pointed to by a root descriptor table can be changed by the application any time except while the command list / bundles that bind the descriptor table have been submitted and have not finished executing. For instance, recording a command list and subsequently changing descriptors in a descriptor heap it refers to before submitting the command list for execution is valid. This is the only supported behavior of Root Signature version 1.0."

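In other words, with VOLATILE descriptors the cutoff is submission, roughly like this (a sketch with placeholder handles; the heap type is assumed to be CBV_SRV_UAV, and WaitForFence stands for the usual SetEventOnCompletion/WaitForSingleObject pattern):

// valid: the list is recorded but not yet submitted, so the descriptors may still change
commandList->Close();
device->CopyDescriptorsSimple(1, tableSlotCpu, newViewCpu, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
ID3D12CommandList* lists[] = { commandList };
commandQueue->ExecuteCommandLists(1, lists);

// invalid: the list has been submitted and may still be executing on the GPU
device->CopyDescriptorsSimple(1, tableSlotCpu, otherViewCpu, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

// valid again: once a fence confirms the GPU has finished executing the list
commandQueue->Signal(fence, 1);
WaitForFence(fence, 1);   // hypothetical helper: blocks the CPU until the fence reaches 1
device->CopyDescriptorsSimple(1, tableSlotCpu, otherViewCpu, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
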

  • Similar Content

    • By Mr_Fox
      Hi Guys,
      Does anyone know how to grab a video frame onto a DX texture easily using just the Windows SDK, or just play video on a DX texture without a 3rd-party library? I know that back in the DX9 days there was the DirectShow library (though it was very hard to use). After a brief search, it seems most game devs settled on Bink and left hobbyist DX programmers struggling....
      Having had so much fun playing with Metal video playback (super easy setup with AVKit, and you can grab a movie frame onto your Metal texture), I feel there must be a similarly easy path for video playback on DX12, but I failed to find it.
      Maybe I missed something? Thanks in advance to anyone who can give me a path to follow
    • By _void_
      Hello guys,
      I have a texture of format DXGI_FORMAT_B8G8R8A8_UNORM_SRGB.
      Is there a way to create a shader resource view for the texture so that I can read it as RGBA from the shader instead of reading it specifically as BGRA?
      I would like all the textures to be read as RGBA.
       
      Tx
    • By _void_
      Hello guys,
      I am wondering why the D3D12 resource size has type UINT64 while the resource view size is limited to UINT32.

      typedef struct D3D12_RESOURCE_DESC {
          …
          UINT64 Width;
          …
      } D3D12_RESOURCE_DESC;

      A vertex buffer view, for example, is described in UINT32 types:

      typedef struct D3D12_VERTEX_BUFFER_VIEW {
          D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
          UINT                      SizeInBytes;
          UINT                      StrideInBytes;
      } D3D12_VERTEX_BUFFER_VIEW;

      For a buffer we can specify the offset of the first element as UINT64, but the view itself is still defined in UINT32 terms:

      typedef struct D3D12_BUFFER_SRV {
          UINT64                 FirstElement;
          UINT                   NumElements;
          UINT                   StructureByteStride;
          D3D12_BUFFER_SRV_FLAGS Flags;
      } D3D12_BUFFER_SRV;

      Does it really mean that we can create, for instance, a structured buffer of floats with MAX_UINT64 elements (MAX_UINT64 * sizeof(float) bytes in size), but not be able to create a shader resource view that encloses it completely, since we are limited by the UINT range?
      Is there a specific reason for this? HLSL is restricted to UINT32 values, so calling GetDimensions() on a resource of UINT64 size would not be able to produce valid values. I guess that could be one of the reasons.
       
      Thanks!
    • By pcmaster
      Hello!
      Is it possible to mix ranges of samplers and ranges of SRVs and ranges of UAVs in one root parameter descriptor table? Like so:
      D3D12_DESCRIPTOR_RANGE ranges[3];
      D3D12_ROOT_PARAMETER param;
      param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
      param.DescriptorTable.NumDescriptorRanges = 3;
      param.DescriptorTable.pDescriptorRanges = ranges;
      ranges[0].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV; ..
      ranges[1].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_UAV; ..
      ranges[2].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER; ..

      I wonder especially about CopyDescriptors, which will need to copy a range of D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER and a range of D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV.
      Thanks if anyone knows (while I try it :))
      .P
    • By Infinisearch
      So I was reading the presentation Practical DirectX 12 - Programming Model and Hardware Capabilities again and finally decided to tackle proper command list submission.  Things mentioned in the document regarding this subject:
      Aim for (per frame):
        ● 15-30 command lists
        ● 5-10 ExecuteCommandLists calls

      Each ExecuteCommandLists has a fixed CPU overhead:
        ● Underneath, this call triggers a flush
        ● So batch up command lists

      Try to put at least 200 μs of GPU work in each ExecuteCommandLists, preferably 500 μs.

      Small calls to ExecuteCommandLists complete faster than the OS scheduler can submit new ones.

      The OS takes ~60 μs to schedule upcoming work.

      So basically I want to estimate how long my draw calls take. Benchmarking for a particular piece of hardware seems impractical. So given the primitive count, the pixel count (approximately how many screen-space pixels the call will be rendered to), and some precomputed metric of shader ALU complexity (like the number of ALU ops), do you think I can get a reasonable estimation of how much time a draw call will take?
      What do you do to take this into account?
      What about other things like transitions?  I can only think of actual measurement in this case.