The most efficient way to create a const array in HLSL?

2 comments, last by pseudomarvin 8 years, 10 months ago

Say I have an array of data which does not change during the entire execution of a program, e.g. the Poisson Disk used for shadow mapping in the following code snippet:


	const uint sampleCount = 16;
	static float2 poissonDisk[sampleCount] =
	{
		float2(0.2770745f, 0.6951455f),
		float2(0.1874257f, -0.02561589f),
		float2(-0.3381929f, 0.8713168f),
		float2(0.5867746f, 0.1087471f),
		float2(-0.3078699f, 0.188545f),
		float2(0.7993396f, 0.4595091f),
		float2(-0.09242552f, 0.5260149f),
		float2(0.3657553f, -0.5329605f),
		float2(-0.3829718f, -0.2476171f),
		float2(-0.01085108f, -0.6966301f),
		float2(0.8404155f, -0.3543923f),
		float2(-0.5186161f, -0.7624033f),
		float2(-0.8135794f, 0.2328489f),
		float2(-0.784665f, -0.2434929f),
		float2(0.9920505f, 0.0855163f),
		float2(-0.687256f, 0.6711345f)
	};

	float shadow = 0.0f;
	float calculatedDepth = input.shadowPos.z - 0.002f;
	float2 shadowTexCoord = float2(input.shadowPos.x * 0.5f + 0.5f, -input.shadowPos.y * 0.5f + 0.5f);
	for (uint i = 0; i < sampleCount; i++)
	{
		if (!shadowTexture.SampleCmp(depthSampler, shadowTexCoord + poissonDisk[i] / 700.0f,
			calculatedDepth))
		{
			shadow += 0.5f / sampleCount;
		}
	}

What is the most efficient way of declaring it?

Options:

1) Send it to the GPU in a constant buffer, even though the data will never change during the program.

2) Declare it as a local array, as it is declared now. In this case, does it get created on the GPU's stack every time the shader is executed, or is the GPU smart enough to move it to its 'static memory' and define it only once?

3) Put the static keyword in front. In this case, it should only be defined once, right?

4) There are const global variables in OpenGL; is there such an alternative in D3D?

5) Something else

Thanks for any advice.


Put the static keyword in front. In this case, it should only be defined once, right?

From my understanding, it's the same as in C/C++. It's defined once and maintains static duration during run-time.

There are const global variables in OpenGL; is there such an alternative in D3D?

The equivalent, to my knowledge, is a ConstantBuffer in D3D (Shader Model 4.0+). Even if your data is declared outside of a constant buffer, it still gets tucked into a default constant buffer under the hood.

To clarify, a ConstantBuffer is a buffer that is optimized for constant variable usage and lets the app developer specify data that is constant for that pass through the shader. Those are your constant variables in HLSL.
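As a rough sketch of what option 1 could look like on the HLSL side, the array would live in a cbuffer that the application fills once at startup. The buffer name `PoissonDiskData` and the register slot `b1` are assumptions made up for illustration; they are not from the original post.

```hlsl
// Option 1 as a sketch: the application uploads these values once.
// The buffer name and register slot are illustrative assumptions.
cbuffer PoissonDiskData : register(b1)
{
	// In a cbuffer, each array element starts on its own 16-byte register,
	// so the CPU-side data must pad every float2 out to 16 bytes.
	float2 poissonDisk[16];
};
```

The loop in the question could keep reading `poissonDisk[i]` unchanged; only the source of the data differs.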

As to your original question, I don't know if it's the best way to do it (I can't say one way over the other), but I don't think the way you're doing it now is very performance prohibitive. At most, when the function is called it has to allocate 128 bytes (16 float2 values at 8 bytes each) in video memory for that array.

Marcus

Talking about this sort of thing can get a bit complicated, since there's actually a layered approach to how shaders execute on a GPU. Your HLSL code is compiled to D3D bytecode, which is a bunch of instructions and declarations for a virtual ISA. When you actually bind that shader at runtime and use it, the driver will JIT compile the virtual ISA bytecode into the actual ISA used by the GPU. It's actually a bit like CLR/.NET in this regard, in that you have different semantics for both the virtual machine and the actual hardware. So to simplify things, I'm mostly just going to talk about D3D's virtual ISA here, since there's such a wide variety of GPUs out in the wild.

The D3D shader ISA does not have any concept of a stack, so there is no possibility of your array being placed on a stack and read from there. The ISA can only read values from registers or from buffers/textures. Which of these it uses will depend on how you declare the array, and also on the code that uses the array. In your particular case, it primarily depends on whether you declare your array in a constant buffer and whether or not the compiler unrolls your for loop.

The first part is simple: if you declare your array inside of a cbuffer declaration, or declare it as a global without the "static" modifier, then the array data will be loaded out of a constant buffer.

The second part is a bit more complicated. In general the HLSL compiler likes to unroll loops whenever it can, which is typically when the number of iterations is known at compile time. In your case the number of iterations is fixed at 16, so it's likely that the compiler will unroll the loop. If it unrolls the loop and your array is constant (it doesn't come from a buffer), then the compiler can essentially inline your array data right into the code generated for the unrolled loop. This can be nice, since it removes the need for any memory access instructions when using the array data. However, it also results in more total program instructions compared to using flow control instructions. Note that you can try to force the compiler to unroll the loop by using the [unroll] attribute on your for loop.

If the compiler doesn't unroll the loop and instead uses dynamic flow control instructions (either because the loop count isn't known, or because you used the [loop] attribute), then the compiler will no longer be able to directly embed your array values into the code, since the same code is used for each loop iteration. It also can't just pre-load the array values into a bunch of registers, since the D3D ISA lacks the ability to dynamically index into registers. So what it will typically do is embed your array values into a compiler-generated constant buffer. This is a special constant buffer that the driver manages transparently, and it allows the shader program to dynamically index into the array using memory instructions.
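To make the two cases concrete, here is a minimal sketch rather than the original poster's shader. The function names, the `k`-prefixed identifiers, the register slots, and the shortened four-entry table are assumptions for illustration. It also swaps the question's SampleCmp for SampleCmpLevelZero so that the sketch has no gradient operations inside the loops, and it simply averages the comparison results instead of reproducing the original shadow accumulation.

```hlsl
Texture2D shadowTexture : register(t0);
SamplerComparisonState depthSampler : register(s0);

// static const at global scope: the values are compile-time constants,
// not something the application uploads through a constant buffer.
static const uint kSampleCount = 4; // shortened table, for illustration only
static const float2 kPoissonDisk[kSampleCount] =
{
	float2(0.2770745f, 0.6951455f),
	float2(0.1874257f, -0.02561589f),
	float2(-0.3381929f, 0.8713168f),
	float2(0.5867746f, 0.1087471f)
};

// Requests unrolling, so each iteration can use its offset as a literal.
float ShadowUnrolled(float2 uv, float depth)
{
	float lit = 0.0f;
	[unroll]
	for (uint i = 0; i < kSampleCount; i++)
	{
		lit += shadowTexture.SampleCmpLevelZero(depthSampler, uv + kPoissonDisk[i] / 700.0f, depth);
	}
	return lit / kSampleCount;
}

// Requests a real loop, so kPoissonDisk[i] is indexed dynamically at runtime.
float ShadowLooped(float2 uv, float depth)
{
	float lit = 0.0f;
	[loop]
	for (uint i = 0; i < kSampleCount; i++)
	{
		lit += shadowTexture.SampleCmpLevelZero(depthSampler, uv + kPoissonDisk[i] / 700.0f, depth);
	}
	return lit / kSampleCount;
}
```

Compiling both functions and comparing the generated assembly listings is probably the most direct way to see which path the compiler actually took for a given shader.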

With all of that said, I would recommend keeping the code the way you have it. Doing it that way should allow the compiler to unroll the loop, which is usually good from a performance point of view. Not only will you avoid memory accesses from reading the values out of a constant buffer, but unrolling can also give the hardware a better opportunity to pipeline your shadow map texture fetches in order to hide latency.

Thank you both for your help. MJP, your answer was really enlightening; it is nice to have at least a clue about what's going on under-under the hood.

