# DX12 D3D12 Root Signatures for different shaders

How is one supposed to use Root Signatures for rendering lots of different things?  I've been reading through the documentation and sample code for DX12, but everything so far is set up for a single application that has one specific rendering purpose.

This page on Root Signatures says "Currently, there is one graphics and one compute root signature per app." Yet the first sentence in the paragraph is "The root signature defines what resources are bound to the graphics pipeline." If a root signature links the command list to the resources required by the shader (textures and cbuffers), then it would seem like I want a root signature per distinct shader.  (eg. one that has diffuse/normal textures, a different one that has 3 cbuffers, etc.)

Or am I supposed to instead make an uber root signature that contains the maximum SRVs and CBVs I could ever use?

##### Share on other sites

Now I'm thinking that the presence of the function "SetGraphicsRootSignature" implies that I should create multiple root signatures.  That quote must be saying that there can only be one in use at any given time.

##### Share on other sites

You could use multiple root signatures if you want, but it is advised to keep the number of root signatures used in your application to a minimum. It's pretty much a balance: find the smallest root signature with which you can render the largest number of objects. That said, avoid creating a huge root signature as a catch-all solution, as this might push it into less efficient memory.

Also remember that switching root signatures means you have to rebind all data originally bound to the root signature as this state is not preserved between root signature rebinds, which can introduce some overhead.

I'd experiment with a couple of different solutions and see what works best for you.

##### Share on other sites

Aren't tables part of the root signature?

##### Share on other sites

> Aren't tables part of the root signature?

Perhaps we should clarify a few confusions:

"Descriptors" are stored in the Heap.
A "Descriptor Table" is nothing more than a range of the heap.
This range of the heap must be set to the root signature.
The Root Signature has its own small heap for 'dirty' things.

In C jargon, we could say it's similar to the following:

    struct Descriptor;

    Descriptor myTable[256];
    myTable[0] = Descriptor( ... );
    myTable[1] = Descriptor( ... );
    ...
    SetNumGraphicsRootDescriptorTables( 1 ); // At initialization, tell the root signature we can have up to 1 table. AFAIK we cannot change it later.
    SetGraphicsRootDescriptorTable( 0, &myTable[2], 5 ); // Bind myTable[2] through myTable[6]


This is of course simplified.
Whether you want to use just one table or multiple ones is up to you (probably you will have to use more than just 1 table because of certain restrictions, eg. Samplers must live in their own table). You probably want to have around 4 or 5 to change the Tables based on update frequency (e.g. do not put a texture that will be used by all objects, like a lightmap or a shadow map, in the same table as you will put diffuse textures).
The max amount of tables you can have depends on how you use the limited Root space.
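To make the "a table is just a range of the heap" idea concrete, here is a minimal sketch that needs no D3D12 headers. `GpuHandle`, `DescriptorHeapModel` and `tableStart` are hypothetical stand-ins invented for illustration. In real D3D12 the per-descriptor stride comes from `ID3D12Device::GetDescriptorHandleIncrementSize`, and the resulting handle is what you pass to `ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for D3D12_GPU_DESCRIPTOR_HANDLE: just an address.
struct GpuHandle {
    std::uint64_t ptr;
};

// Hypothetical model of a shader-visible descriptor heap.
struct DescriptorHeapModel {
    GpuHandle     start;          // address of descriptor 0
    std::uint32_t incrementSize;  // bytes per descriptor (device dependent in real D3D12)
    std::uint32_t capacity;       // number of descriptors in the heap
};

// A "table" is only a starting handle inside the heap. How many descriptors
// follow it is declared in the root signature, not passed at bind time.
inline GpuHandle tableStart(const DescriptorHeapModel& heap, std::uint32_t firstDescriptor) {
    assert(firstDescriptor < heap.capacity);
    return GpuHandle{ heap.start.ptr + std::uint64_t(firstDescriptor) * heap.incrementSize };
}
```

With a heap that starts at address 1000 and uses 32 bytes per descriptor (made-up numbers), `tableStart(heap, 2)` gives the address 1064, which is all that "binding a table at myTable[2]" amounts to.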

Edit:
The key to performance is in baking the heaps as much as possible so that you just set the ranges of the table and fire rendering, rather than filling the heap data on the fly.
Examples in pseudocode (I'm assuming 2 textures per shader to keep the explanation simple):
You can bake data like this:

    Descriptor myTable[256];

    // Once, during initialization (two descriptors per material):

    myTable[0] = setupDescriptor( diffuseTextureF );   // material 0
    myTable[1] = setupDescriptor( specularTextureG );

    myTable[2] = setupDescriptor( diffuseTextureH );   // material 1
    myTable[3] = setupDescriptor( specularTextureI );

    myTable[4] = setupDescriptor( diffuseTextureJ );   // material 2
    myTable[5] = setupDescriptor( roughnessmapK );

    // Every frame:
    size_t lastId = (size_t)-1;
    for( size_t i = 0; i < numObjects; ++i )
    {
        const size_t materialId = renderable[i]->GetMaterialId();
        if( materialId != lastId )
        {
            // Each material owns 2 consecutive descriptors, so its range starts at materialId * 2.
            SetGraphicsRootDescriptorTables( 4, &myTable[materialId * 2], 2 );
            lastId = materialId;
        }
        drawPrimitiveCmd( renderable[i] );
    }


Or you can set every descriptor on the fly, D3D11/GL style:

    // Every frame:
    size_t currentTableIdx = 0;
    size_t lastId = (size_t)-1;
    for( size_t i = 0; i < numObjects; ++i )
    {
        const size_t materialId = renderable[i]->GetMaterialId();
        if( materialId != lastId )
        {
            // Write two fresh descriptors, then bind the range that starts at them.
            myTable[currentTableIdx] = setupDescriptor( renderable[i]->GetTexture0() );
            myTable[currentTableIdx+1] = setupDescriptor( renderable[i]->GetTexture1() );
            SetGraphicsRootDescriptorTables( 4, &myTable[currentTableIdx], 2 );
            currentTableIdx += 2;
            lastId = materialId;
        }

        drawPrimitiveCmd( renderable[i] );
    }


The first method lets you reuse descriptors whenever a material is reused, even if the draws that use it are not contiguous. The second method only gets to reuse descriptors if renderable[i] and renderable[i-1] use the same resources. Plus, it wastes performance setting them up every time they change.
And of course you can reduce the number of calls to SetGraphicsRootDescriptorTables by dynamically indexing the textures in the shader as shown in the samples, which is basically a much better version of D3D11's texture arrays (beware of hardware limits).
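As a rough illustration of why draw order matters for the "only rebind when the material changes" loop, the sketch below counts how many table binds a given sequence of material ids would trigger. It is a plain C++ simulation, not D3D12 code. `kDescriptorsPerMaterial`, `firstDescriptorOf` and `countTableBinds` are hypothetical names, and the two-descriptors-per-material layout is the same simplifying assumption used in the pseudocode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption carried over from the pseudocode: every material owns 2 descriptors.
constexpr std::size_t kDescriptorsPerMaterial = 2;

// Heap index of the first descriptor of a baked material range.
inline std::size_t firstDescriptorOf(std::size_t materialId) {
    return materialId * kDescriptorsPerMaterial;
}

// Counts how many descriptor-table binds a draw sequence needs when we
// rebind only if the material differs from the previous draw's material.
inline std::size_t countTableBinds(const std::vector<std::size_t>& materialIds) {
    std::size_t binds = 0;
    bool hasLast = false;
    std::size_t lastId = 0;
    for (std::size_t id : materialIds) {
        if (!hasLast || id != lastId) {
            ++binds;  // real code would bind the range at firstDescriptorOf(id) here
            lastId = id;
            hasLast = true;
        }
        // real code would issue the draw call here
    }
    return binds;
}
```

Sorting draws by material keeps the count low: the sequence 0, 0, 1, 1, 2 needs three binds, while the interleaved sequence 0, 1, 0, 1 needs four. With baked ranges those extra binds are cheap pointer changes. With the on-the-fly approach each one also rewrites descriptors.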

Edited by Matias Goldberg

##### Share on other sites

> However, since you want to minimize the number of root signatures,

Do you mean go out of your way to minimize them, or just avoid duplicates?

If you mean go out of your way, could you explain the logic as to why?

##### Share on other sites

The main issue with having too many Root Signatures is that each time you change the Root Signature, you must bind everything again (RootCBV, RootSRV, Descriptor Tables, etc.), since there is no guarantee that the next Root Signature will match the previous layout. This applies even if the new signature has a RootCBV in the same root slot. Skipping the rebind does seem to work on some platforms and in some cases, but that behavior is driver/hardware dependent, so you can't rely on it.
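One way to keep yourself honest about this is a small CPU-side tracker that forgets every binding whenever the root signature actually changes. The sketch below is a hypothetical helper, not part of D3D12. `RootBindingTracker` and its integer signature ids are invented for illustration. It assumes, as I understand the D3D12 documentation, that redundantly setting the same root signature leaves the existing bindings intact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side mirror of which root parameters are currently bound.
struct RootBindingTracker {
    int currentSignature = -1;   // -1 means "no root signature set yet"
    std::vector<bool> bound;     // one flag per root parameter

    void setRootSignature(int signatureId, std::size_t parameterCount) {
        if (signatureId == currentSignature) return;  // redundant set: bindings survive
        currentSignature = signatureId;
        bound.assign(parameterCount, false);          // a real change: everything is stale
    }

    void bind(std::size_t parameter) {
        assert(parameter < bound.size());
        bound[parameter] = true;
    }

    // True only when a signature is set and every root parameter has been bound.
    bool readyToDraw() const {
        return currentSignature != -1 &&
               std::all_of(bound.begin(), bound.end(), [](bool b) { return b; });
    }
};
```

Calling `readyToDraw()` in a debug build before each draw catches the "it happened to work on my GPU" case, because switching to a different signature resets every flag even when the new signature has the same parameter in the same slot.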

Root Slots are not the same as the old slots of DX11 (like constant buffer or texture slots). You cannot bind a CBV or an SRV to slot 0 and rely on it staying there for good.

A good approach is to have Root Signatures per usage. For example in my case I have 3 root signatures for meshes (one for DepthOnly/shadows, one for GBuffer and one for forward/material pass), so when I am going to render the shadows for the meshes, I just bind one root signature and I never change it for that pass.

There are cases like transparent meshes that are interleaved with particles that require two root signatures, but that is not the common case.

In the end, you shouldn't need more than a couple of dozen Root Signatures for your engine.

And about the root signature itself, using CBVs in the root signature is fine, just as using root constants is. In fact, for instance data or things that change per draw call, it is better to do it this way, because otherwise you would need to modify the tables. However, the root signature data is copied to the command list for each draw call, so the bigger it is, the more expensive it gets. Remember that a RootCBV takes 2 DWORDs (it is a 64-bit GPU virtual address), while a table takes only one DWORD.
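To put numbers on "the bigger it is, the more expensive it gets", here is a small sketch that adds up the size of a root signature layout. The costs follow Microsoft's documented root signature limits: 1 DWORD per 32-bit root constant, 2 DWORDs per root descriptor (root CBV/SRV/UAV), 1 DWORD per descriptor table, and a maximum of 64 DWORDs in total. `RootLayout`, `rootSignatureDwords` and `fitsInRootSignature` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what a root signature contains.
struct RootLayout {
    std::uint32_t root32BitConstants;  // individual 32-bit root constants
    std::uint32_t rootDescriptors;     // root CBVs, SRVs and UAVs
    std::uint32_t descriptorTables;    // descriptor tables
};

// Documented maximum size of a root signature, in DWORDs.
constexpr std::uint32_t kMaxRootSignatureDwords = 64;

// 1 DWORD per constant, 2 per root descriptor (64-bit address), 1 per table.
constexpr std::uint32_t rootSignatureDwords(const RootLayout& layout) {
    return layout.root32BitConstants
         + 2u * layout.rootDescriptors
         + layout.descriptorTables;
}

constexpr bool fitsInRootSignature(const RootLayout& layout) {
    return rootSignatureDwords(layout) <= kMaxRootSignatureDwords;
}
```

For example, a layout with 4 root constants, 1 root CBV and 2 tables costs 4 + 2 + 2 = 8 DWORDs, which is comfortably small. A layout with 32 root CBVs and 1 table would cost 65 DWORDs and would not fit.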

On the GPU side of things, on some platforms, constants in the Root Signature can be cheaper than in a Constant Buffer (a CBV in the root or in a table), but they should never be slower. In the end it is all about balance.

Note: SRVs in the root only work for buffers, not textures. So the only way to bind textures is through tables.

##### Share on other sites

As a follow up question:  Say I have two shaders that use slightly different resources.

Shader A:

2 Textures

1 CBuffer

1 Sampler

Shader B:

1 Texture

1 CBuffer

1 Sampler

Should I use the same root signature for these two shaders?  (One that strictly matches Shader A).  It seems like it would still work for Shader B, but it would have a place for an additional texture that the shader doesn't actually use.

Would I want to use the same root signature for B, or make one specific to its format?