Texture sample as uniform array index.

Started by
1 comment, last by MJP 8 years ago

Background

So I'm trying to implement a material compositing system using Texture2DArrays and vector arrays indexed by a sample from a mask texture.

Questions

1.) Is it even possible to use a texture sample as an index into a uniform array?

2.) If it is, is there a huge performance hit?

Thanks.


[Hardware:] Falcon Northwest Tiki, Windows 7, Nvidia Geforce GTX 970

[Websites:] Development Blog | LinkedIn
[Unity3D :] Alloy Physical Shader Framework


1) Yes.
2) On modern hardware, no.

Get the HLSL compiler to spit out the assembly code of your shaders and have a quick look. Even if you can't really read the assembly, "constant waterfalling" will stand out like a sore thumb... On some older shader models, the asm will look like (pseudocode):


if sample == 0 then value = array[0] else
if sample == 1 then value = array[1] else
if sample == 2 then value = array[2] else
if sample == 3 then value = array[3] else
... (and so on, one compare-and-select per array element)

This shouldn't happen in shader model 4/5... but it used to be a thing that happened... and obviously it was a huge performance hit :wink:
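For reference, here's a rough sketch of the kind of SM5 pixel shader being discussed, where a mask sample becomes the index into both a constant buffer array and a Texture2DArray. All names (g_Tint, g_Mask, g_Albedo, MAX_MATERIALS) are made up for illustration, and the mask encoding (an R8_UNORM texture storing index / 255) is an assumption, not something from the original post.

```hlsl
// Sketch only: every name and the mask encoding below are assumptions.
static const uint MAX_MATERIALS = 16;

cbuffer MaterialParams : register(b0)
{
    float4 g_Tint[MAX_MATERIALS];            // one tint per material layer
};

Texture2D<float> g_Mask       : register(t0); // assumed R8_UNORM, stores index / 255
Texture2DArray   g_Albedo     : register(t1); // one slice per material layer
SamplerState     g_PointClamp : register(s0); // point filtering so indices are never blended
SamplerState     g_Linear     : register(s1);

float4 main(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // Decode the material index from the mask sample and clamp it to the array size.
    float m = g_Mask.Sample(g_PointClamp, uv);
    uint index = min((uint)(m * 255.0f + 0.5f), MAX_MATERIALS - 1);

    // Dynamic (per-pixel) index into the constant buffer array and the texture array.
    float4 tint   = g_Tint[index];
    float4 albedo = g_Albedo.Sample(g_Linear, float3(uv, (float)index));

    return albedo * tint;
}
```

Compiling something like this and reading the assembly listing is the quickest way to check whether the array access stays a single indexed load or gets expanded into a chain of compares.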

However, consider using a structured buffer (GL: texture buffer) instead of a constant buffer (GL: uniform array) -- these are a hint to the driver that you'll be reading random subsets within the buffer, whereas a constant/uniform buffer is a hint to the driver that every pixel/vert/etc will require all data contained within the buffer.
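A minimal sketch of the structured buffer variant, assuming D3D11-style HLSL. The struct layout and all names (MaterialData, g_Materials, roughness, metalness) are hypothetical, and the mask is again assumed to store index / 255 in an R8_UNORM texture.

```hlsl
// Sketch only: the struct fields and resource names are assumptions.
struct MaterialData
{
    float4 tint;
    float  roughness;
    float  metalness;
    float2 padding;   // keeps the stride at 32 bytes
};

StructuredBuffer<MaterialData> g_Materials : register(t2);

Texture2D<float> g_Mask       : register(t0); // assumed R8_UNORM, stores index / 255
SamplerState     g_PointClamp : register(s0);

float4 main(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // Same index decode as with a constant buffer; only the resource type changes.
    uint index = (uint)(g_Mask.Sample(g_PointClamp, uv) * 255.0f + 0.5f);

    // Each pixel loads only the element it needs.
    // The application is assumed to size the buffer so that index is always valid.
    MaterialData mat = g_Materials[index];

    return float4(mat.tint.rgb * (1.0f - mat.metalness), mat.tint.a);
}
```

On the application side the buffer is bound as a shader resource view rather than as a constant buffer, so the array length is no longer fixed in the shader source.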

All modern hardware that I know of can dynamically index into constant (uniform) buffers. For AMD hardware it's basically the same as using a structured buffer: for an index that can vary per-thread in a wavefront, the shader unit will issue a vector memory load through a V# that contains the descriptor (base address, num elements, etc.). On Nvidia, there are two different paths for constant buffers and structured buffers. They recommend using constant buffers if the data is very coherent between threads, since this will be a lower-latency path compared to structured buffers. I have no idea what the situation is for Intel, or any mobile GPUs.

