D3D12: Texture2D/3D/Cube accessed inside a single SRV descriptor heap?


In my app I'm currently writing Shader Resource Views every draw call, even if they didn't change in between. After some measurement it looks like writing to the descriptor heap is not cheap (it takes up to 2 ms here), so I'm trying to modify my app to only write to the descriptor heap when necessary, and let the shader access the descriptor heap using ids provided through root constants.

Unfortunately, it looks like HLSL lacks a way to represent a descriptor heap that contains textures of different dimensions. There is no generic "Texture" type, only Texture2D, Texture3D, and so on, so the HLSL view of the descriptor heap is typed with a texture dimension.

For instance, I have to declare this in HLSL:

"Texture2d texture_array[10000] : register(t0);"

but from the C++ side the descriptor range pointed to by t0 can contain any type of texture, since the dimension is stored in the descriptor.

Is there a way to "untype" texture_array? I was thinking of declaring several descriptor arrays in HLSL and having them point to the same descriptor heap from the C++ side, but I'm not sure that works; besides, I'm not sure HLSL handles "aliased pointers" well. Another solution might be to cast a Texture2D to a Texture3D or TextureCube, but I'm not sure HLSL allows that.


2 ms sounds like way too much for a descriptor heap write... how are you doing it? Creating the SRVs (once) into a CPU-writable heap and then using CopyDescriptors, or some other way?

Also, I misclicked and downvoted you.. sorry. :/

I kept thinking about this and there are probably hardware limitations.

There are several types of divergence:

  • All pixels in wavefront A access texture[0]; all pixels in wavefront B access texture[1]. This is very likely, because the wavefronts belong to different draws.
  • Pixel A accesses texture[0]; pixel B accesses texture[1]. Both are Texture2D.

The first case is fine.
But the second one is not: you must wrap the index in NonUniformResourceIndex (e.g. texture_array[NonUniformResourceIndex(idx)]) if you expect to be in scenario 2.

On top of that, to cover what you want we would have to add several more scenarios, where the type of the texture could be divergent, not just the texture index.
Some hardware out there definitely cannot do that.

Note however, nothing's preventing you from doing this:


Texture2D texture_array2d[5000] : register(t0);    // We assume the first 5000 textures are 2D
Texture3D texture_array3d[5000] : register(t5000); // We assume textures [5000; 10000) are 3D
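
On the C++ side both arrays can be backed by the same heap, since a descriptor range is untyped from the root signature's point of view. As a minimal sketch (hedged: the register numbers, the pixel-shader-only visibility and the root-constant layout are illustrative choices, not something the API requires), a root signature with one root constant for the texture index and one descriptor table covering all 10,000 SRVs could look like this:

// One contiguous range of 10,000 SRVs starting at t0. Both HLSL arrays
// (t0..t4999 and t5000..t9999) fall inside this single range, so they read
// from the same region of the shader-visible descriptor heap.
D3D12_DESCRIPTOR_RANGE srvRange = {};
srvRange.RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
srvRange.NumDescriptors     = 10000;
srvRange.BaseShaderRegister = 0; // t0
srvRange.OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;

D3D12_ROOT_PARAMETER params[2] = {};

// Root constant carrying the texture index the shader uses to index the array.
params[0].ParameterType            = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
params[0].Constants.ShaderRegister = 0; // b0 (illustrative)
params[0].Constants.Num32BitValues = 1;
params[0].ShaderVisibility         = D3D12_SHADER_VISIBILITY_PIXEL;

// Descriptor table pointing at the big SRV range above.
params[1].ParameterType                       = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
params[1].DescriptorTable.NumDescriptorRanges = 1;
params[1].DescriptorTable.pDescriptorRanges   = &srvRange;
params[1].ShaderVisibility                    = D3D12_SHADER_VISIBILITY_PIXEL;

D3D12_ROOT_SIGNATURE_DESC rootDesc = {};
rootDesc.NumParameters = 2;
rootDesc.pParameters   = params;
rootDesc.Flags         = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;
// ... serialize with D3D12SerializeRootSignature and create it as usual.

Then per draw only the root constant changes (commandList, textureIndex and srvHeapGpuStart are assumed to exist):

commandList->SetGraphicsRoot32BitConstant(0, textureIndex, 0);    // parameter 0, offset 0
commandList->SetGraphicsRootDescriptorTable(1, srvHeapGpuStart);  // usually set once after SetDescriptorHeaps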

Also note that Tier 1 hardware supports up to 256 textures, so arrays totalling 10,000 will lock you out of a lot of cards (Haswell & Broadwell Intel GPUs, NVIDIA Fermi).

Edit: Just to clarify why it's not possible (or is very difficult, or would add a lot of pointless overhead). There are three parts:

  1. Information about the texture such as format (i.e. RGBX8888 vs Float_R16, etc.) and resolution. On some hardware it lives in a structure in GPU memory (GCN); on other hardware it lives in a physical register (Intel).
  2. Information about how to sample the texture (bilinear vs point vs trilinear, mip LOD bias, anisotropy, border/clamp/wrap, etc.). On GCN most of this information lives in an SGPR that points to a cached region of memory; the border colour (for the border colour mode) lives in a register table. On Haswell this information lives in a physical register, IIRC.
  3. Information about the type of the texture, which affects how it is sampled (1D vs 2D vs 2D Array vs 3D vs Cube vs Cube Array). On GCN, sampling a cubemap requires issuing more instructions (the V_CUBE*_F32 family, if I recall); sampling 3D textures requires providing more VGPRs (since more data is needed) than sampling 2D textures.

Your assumption is that the type of the texture lives in GPU memory alongside the format and resolution (point 1). But this is not the case: it is baked into the ISA instructions (point 3).

In fact D3D12 already provides some level of abstraction: you think the format and resolution live in GPU memory, when in fact on Intel GPUs they live in physical registers (that's where the 256 limit of Tier 1 comes from, by the way; the D3D11 spec allowed up to 128 textures, and it happens that both Fermi and Intel supported up to 256).

Therefore, it becomes too cumbersome to support this sort of generic-type texture you want.

After some measurement it looks like writing to the descriptor heap is not cheap (it takes up to 2 ms here), so I'm trying to modify my app to only write to the descriptor heap when necessary, and let the shader access the descriptor heap using ids provided through root constants.

  1. Make sure you're not stalling before writing to the descriptor heap.
  2. Make sure the assembly does not read from the descriptor heap memory by any chance (write-combined memory will hit you hard!).
  3. I prefer baking packs of descriptors (i.e. 256 is the limit), and then swapping them out depending on which pack I need (see the sketch after this list).
  4. I assume you're not mapping and unmapping memory to write to the heap?
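
A minimal sketch of that pack-swapping idea (hedged: kPackSize, packIndex, srvHeap and rootParamSrvTable are made-up names, and it assumes the descriptors of each pack were already written into the shader-visible heap up front):

// Each pack is a fixed-size window of descriptors inside one shader-visible heap.
const UINT kPackSize = 256; // illustrative pack size
const UINT increment = device->GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

D3D12_GPU_DESCRIPTOR_HANDLE packBase = srvHeap->GetGPUDescriptorHandleForHeapStart();
packBase.ptr += UINT64(packIndex) * kPackSize * increment;

// Rebinding the table to another pack is cheap; creating or copying descriptors
// in the middle of the frame is what gets expensive.
commandList->SetGraphicsRootDescriptorTable(rootParamSrvTable, packBase);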

How can I check for stalling when writing to the descriptor heap?

In one of my app's more demanding scenarios, my "upload_texture" function (which uploads textures and fills a D3D12_SHADER_RESOURCE_VIEW_DESC and sampler view array) takes 20 us, and the loop that writes the SRVs and sampler views to my descriptor heap takes 120 us, but the combination of both takes 2000 us (i.e. the 2 ms), so maybe there's a stall there.

Is there a way to map and unmap a descriptor heap? I always use ID3D12Device::CreateShaderResourceView.

How can I check for stalling when writing to the descriptor heap?

Well, if your code contains an explicit wait then I would first check there.
Otherwise, use a CPU profiler to see whether significant portions of time are spent inside the driver, with the callstack leading back to your code. Or learn to use GPUView (also check this out).

Is there a way to map and unmap a descriptor heap? I always use ID3D12Device::CreateShaderResourceView.

Could you provide some code? We're shooting in the dark.
Also name your GPU just in case.

I don't think there is a way to map/unmap descriptor heaps? There isn't even a way to write to a descriptor heap on a command list (I'm guessing a descriptor heap is a heavy abstraction on some older GPUs).

Never mind, I made a silly mistake when measuring time: I used microsecond granularity, yet I was summing ~1000 sub-microsecond durations.
The texture upload function takes 800 us on average and the descriptor writing code 1200 us, which is consistent with the 2 ms total.

I took a look at your code.

You're calling CreateShaderResourceView & CreateSampler every frame. Don't do that.

You need to create them once (i.e. on initialization), then copy the descriptors selectively, based on the textures/samplers you will be using, via CopyDescriptors or CopyDescriptorsSimple (which is what Mona2000 suggested). Alternatively, create multiple descriptor tables with lots of textures each (i.e. "packs") and swap tables if the texture you need isn't in the currently bound one. With enough luck you will be calling SetGraphicsRootDescriptorTable 10 to 12 times per frame to swap the textures (depending on how many textures you pack together).
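
For reference, a minimal sketch of that pattern (hedged: the function and variable names here are invented for illustration, and error handling is omitted): build every SRV once into a non-shader-visible "staging" heap at load time, then per frame copy only the descriptors you need into the shader-visible heap you actually bind.

#include <d3d12.h>

// Load time: create every SRV once into a CPU-only (non-shader-visible) heap.
void BuildSrvStagingHeap(ID3D12Device* device, ID3D12Resource* const* textures,
                         UINT numTextures, ID3D12DescriptorHeap** outHeap)
{
    D3D12_DESCRIPTOR_HEAP_DESC desc = {};
    desc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    desc.NumDescriptors = numTextures;
    desc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_NONE; // CPU only, cheap to write
    device->CreateDescriptorHeap(&desc, IID_PPV_ARGS(outHeap));

    const UINT inc = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_CPU_DESCRIPTOR_HANDLE dst = (*outHeap)->GetCPUDescriptorHandleForHeapStart();
    for (UINT i = 0; i < numTextures; ++i)
    {
        device->CreateShaderResourceView(textures[i], nullptr, dst); // only once, at load
        dst.ptr += inc;
    }
}

// Per frame (or per pack switch): copy the contiguous block of descriptors you
// need into the shader-visible heap that is bound with SetDescriptorHeaps.
void CopyNeededSrvs(ID3D12Device* device, ID3D12DescriptorHeap* stagingHeap,
                    ID3D12DescriptorHeap* shaderVisibleHeap,
                    UINT firstTexture, UINT count)
{
    const UINT inc = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_CPU_DESCRIPTOR_HANDLE src = stagingHeap->GetCPUDescriptorHandleForHeapStart();
    src.ptr += SIZE_T(firstTexture) * inc;
    device->CopyDescriptorsSimple(count,
                                  shaderVisibleHeap->GetCPUDescriptorHandleForHeapStart(),
                                  src, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
}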

To be clear, your costly calls here are CreateShaderResourceView & CreateSampler, rather than the SetGraphicsRootDescriptorTable calls.

Also, do reuse & share the samplers. From what I can deduce, you are not reusing samplers, although the code is too complex to tell easily whether that's the case. Keep in mind that the number of different samplers you can use at a time is very low.
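
As a hedged illustration of what sampler sharing can look like (the two filter setups below are arbitrary examples, and device/samplerHeap are assumed to already exist): create a handful of common samplers once at init and have every material pick one of them by index, instead of calling CreateSampler per draw.

// A small shared set of samplers created once at init; materials index into
// this set rather than getting a freshly created sampler every frame.
D3D12_SAMPLER_DESC samplers[2] = {};
for (auto& s : samplers)
{
    s.ComparisonFunc = D3D12_COMPARISON_FUNC_NEVER;
    s.MaxAnisotropy  = 1;
    s.MaxLOD         = D3D12_FLOAT32_MAX;
}
samplers[0].Filter   = D3D12_FILTER_MIN_MAG_MIP_LINEAR;   // trilinear / wrap
samplers[0].AddressU = samplers[0].AddressV = samplers[0].AddressW =
    D3D12_TEXTURE_ADDRESS_MODE_WRAP;
samplers[1].Filter   = D3D12_FILTER_MIN_MAG_MIP_POINT;    // point / clamp
samplers[1].AddressU = samplers[1].AddressV = samplers[1].AddressW =
    D3D12_TEXTURE_ADDRESS_MODE_CLAMP;

const UINT inc = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);
D3D12_CPU_DESCRIPTOR_HANDLE handle = samplerHeap->GetCPUDescriptorHandleForHeapStart();
for (const auto& s : samplers)
{
    device->CreateSampler(&s, handle); // once, not per draw
    handle.ptr += inc;
}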

The only thing you should be doing every frame is calling CopyDescriptors, or checking whether the texture is in the currently bound descriptor pack and, if not, binding the pack that has it.

Thanks!

