tr0x

constant buffer with an array of floats?


Hi there,

I'd like to pass a float array to the GPU, using DirectX 11 and HLSL. But obviously the following approach doesn't work:


cbuffer cbArray : register( b0 )
{
float g_floatArray[6];
};


I just found a thread about the same topic, but the answer there is too vague for me: http://forums.nvidia.com/index.php?showtopic=169643

I am new to DirectX and HLSL, so I don't know what to do now. It doesn't necessarily have to be a constant buffer - I just want to pass an array from the CPU to the GPU.


Any Suggestions?

The constant buffer must be padded correctly to work. The easiest way is to use only 4-component vectors, and arrays whose element counts are multiples of 4, since each constant register (the "machine word") is 4 floats wide.

For example:
float4 g_floatArray[n];
float g_floatArray[n * 4];
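The register arithmetic behind the float4 approach can be checked on the CPU side. This is a minimal C++ sketch, assuming the D3D11 rule that a constant buffer's ByteWidth must be a multiple of 16 bytes; the helper names `RegistersForFloats` and `ConstantBufferByteWidth` are hypothetical and not part of any D3D API.

```cpp
#include <cstddef>

// Each HLSL constant register holds 4 floats = 16 bytes.
constexpr std::size_t kFloatsPerRegister = 4;
constexpr std::size_t kBytesPerRegister = 16;

// Hypothetical helper: number of float4 registers needed to pack `count`
// floats tightly (the "float4 g_floatArray[n]" approach).
constexpr std::size_t RegistersForFloats(std::size_t count) {
    return (count + kFloatsPerRegister - 1) / kFloatsPerRegister;
}

// Hypothetical helper: round a CPU-side struct size up to the next multiple
// of 16 bytes, as D3D11 requires for a constant buffer's ByteWidth.
constexpr std::size_t ConstantBufferByteWidth(std::size_t structBytes) {
    return (structBytes + kBytesPerRegister - 1) / kBytesPerRegister * kBytesPerRegister;
}

// Six floats fit in two float4 registers, i.e. 32 bytes.
static_assert(RegistersForFloats(6) == 2, "6 floats -> 2 registers");
static_assert(ConstantBufferByteWidth(6 * sizeof(float)) == 32, "24 bytes rounds up to 32");
```

Rounding up only fixes the buffer size; the shader-side declaration still decides where each element lives.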

Can you give more detail about how your float array of 6 doesn't work?

Try the following where n is the size of the array:


cbuffer cbArray
{
float4 g_array[n/4 + !!(n%4)];
}

//To use it do the following:

float x = ((float[4])(g_array[i/4]))[i%4];
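On the CPU side, the matching layout is an array of 16-byte registers, and logical element i goes into register i / 4, component i % 4, the same slot the `((float[4])(g_array[i/4]))[i%4]` expression reads back. A minimal C++ sketch, assuming a hypothetical `Float4` struct and a count of 6 to match this thread; the filled array would then be copied into the mapped constant buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// CPU-side mirror of one HLSL float4 constant register (16 bytes).
struct Float4 {
    float v[4];
};
static_assert(sizeof(Float4) == 16, "one register is 16 bytes");

constexpr std::size_t kCount = 6;                     // logical number of floats (n)
constexpr std::size_t kRegisters = (kCount + 3) / 4;  // same as n/4 + !!(n%4)

// Mirrors: cbuffer cbArray { float4 g_array[kRegisters]; }
using PackedArray = std::array<Float4, kRegisters>;

// Write logical element i into register i / 4, component i % 4.
inline void SetElement(PackedArray& regs, std::size_t i, float value) {
    assert(i < kCount);
    regs[i / 4].v[i % 4] = value;
}

// Read it back the same way the shader does.
inline float GetElement(const PackedArray& regs, std::size_t i) {
    assert(i < kCount);
    return regs[i / 4].v[i % 4];
}
```

Six floats then occupy 32 bytes instead of the 84 a `float[6]` declaration needs.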


First of all: thanks for your replies!

I'll go into more detail, since I don't know where exactly the problem is.


I am experimenting with the new tessellation hardware, so I'd like to use an array of floats, defined by the application, in the domain shader.

As a starting point, I took a sample program from the DirectX 11 SDK and tried to modify it.

So basically, the relevant parts of the source code look as follows:



ID3D11Buffer* g_pcbPerFrame = NULL;
UINT g_iBindPerFrame = 0;

// the array I want to use in the shader
struct CB_PER_FRAME_CONSTANTS
{
float anArray[6];
};

.
.
// Create constant buffers
D3D11_BUFFER_DESC Desc;
Desc.Usage = D3D11_USAGE_DYNAMIC;
Desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
Desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
Desc.MiscFlags = 0;

Desc.ByteWidth = sizeof( CB_PER_FRAME_CONSTANTS ) * 6; // at this point it doesn't compile without "* 6" - I don't know why exactly
V_RETURN( pd3dDevice->CreateBuffer( &Desc, NULL, &g_pcbPerFrame ) );
DXUT_SetDebugName( g_pcbPerFrame, "CB_PER_FRAME_CONSTANTS" );

.
.

// Update per-frame variables
D3D11_MAPPED_SUBRESOURCE MappedResource;
pd3dImmediateContext->Map( g_pcbPerFrame, 0, D3D11_MAP_WRITE_DISCARD, 0, &MappedResource );
CB_PER_FRAME_CONSTANTS* pData = ( CB_PER_FRAME_CONSTANTS* )MappedResource.pData;

pData->anArray[0] = 0.0f;
pData->anArray[1] = 0.2f;
pData->anArray[2] = 0.4f;
pData->anArray[3] = 0.6f;
pData->anArray[4] = 0.8f;
pData->anArray[5] = 1.0f;

pd3dImmediateContext->Unmap( g_pcbPerFrame, 0 );


// Bind the CB
pd3dImmediateContext->VSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
pd3dImmediateContext->HSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
pd3dImmediateContext->DSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
pd3dImmediateContext->PSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
.
.





and the shader:




cbuffer cbPerFrame : register( b0 )
{
float g_anArray[6];
};

.
.




So more or less everything except the array is copied from the sample. The program compiles and starts, but the resulting tessellation is wrong (compared to the hard-coded one) and flickers. Changing the array changes the tessellation, so some data is being passed, but obviously in a faulty way.

It seems like a very basic problem to me, which makes it all the more stressful that I'm not able to solve it.

Concerning the padding: I changed the program as DieterVW suggested, and it resulted in the same flickering. So either the problem is somewhere else or I did something wrong.


PS: Are there any good books or tutorials on DirectX 11 and HLSL you can recommend? I'd like to be able to solve such problems on my own, but I haven't been able to find suitable material.
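One way to make the CPU-side struct match the shader's `float g_anArray[6]` is to give every element its own 16-byte slot, because HLSL starts each element of a float array in a cbuffer on a new register. With the tightly packed `float anArray[6]`, g_anArray[1] would read the value written to anArray[4] (byte 16), and g_anArray[2] onward would read bytes the code never writes after a WRITE_DISCARD map, which may explain the flicker. The padded struct below is 96 bytes, a multiple of 16, so `Desc.ByteWidth = sizeof( CB_PER_FRAME_CONSTANTS );` should be accepted without the `* 6`; D3D11 rejects constant buffers whose ByteWidth is not a multiple of 16, which is likely why the 24-byte version failed. A minimal sketch; `PaddedFloat` and `FillPerFrame` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// One array element as HLSL lays it out in a cbuffer: the float sits in .x
// of its own 16-byte register, and .yzw are padding.
struct PaddedFloat {
    float value;
    float pad[3];
};

// CPU-side mirror of:
//   cbuffer cbPerFrame : register( b0 ) { float g_anArray[6]; };
struct CB_PER_FRAME_CONSTANTS {
    PaddedFloat anArray[6];
};

static_assert(sizeof(PaddedFloat) == 16, "each element occupies one register");
static_assert(sizeof(CB_PER_FRAME_CONSTANTS) % 16 == 0, "ByteWidth must be a multiple of 16");
static_assert(offsetof(CB_PER_FRAME_CONSTANTS, anArray) == 0, "array starts at offset 0");

// Fill the mapped buffer like the original code, writing only the .x slots.
inline void FillPerFrame(CB_PER_FRAME_CONSTANTS* pData) {
    assert(pData != nullptr);
    for (int i = 0; i < 6; ++i) {
        pData->anArray[i].value = 0.2f * static_cast<float>(i);  // roughly 0.0, 0.2, ..., 1.0
    }
}
```

The shader declaration stays unchanged; only the CPU struct and the ByteWidth change. The alternative is the float4-packed layout suggested earlier in the thread, which wastes less memory.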

I have not found a sufficiently good explanation for this behavior anywhere yet. I ran into it recently while trying to write a simple compute shader example for SlimDX, where I had a 16-element constant buffer whose values I doubled and wrote to the output buffer. The buffer was filled with the elements 1 through 16 on the CPU side, declared in the shader as

cbuffer constants {
float input[16];
}

and indexed in the shader as input[x], where x was 0 to 15. To my surprise, I found that input[0] would yield 1 but input[1] returned 5. While investigating, I came across a post by John Rapp saying something about constant buffers being accessed on 16-byte boundaries, which made some sense, but I've been meaning to go in search of a more detailed explanation.
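Those numbers follow from the 16-byte rule. Below is a small C++ simulation, assuming the CPU uploads 16 tightly packed floats (1 through 16) while the shader's `float input[16]` places element x at byte offset 16 * x. `ShaderRead` is a hypothetical helper, not a D3D call; it reports reads past the uploaded data as out of range rather than guessing what a GPU would return there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bytes between consecutive elements of an HLSL "float arr[N]" in a cbuffer.
constexpr std::size_t kHlslArrayStride = 16;

// Hypothetical helper: what the shader would see for arr[x] when the CPU wrote
// `count` floats back to back. Returns false if the read falls past that data.
inline bool ShaderRead(const float* cpuData, std::size_t count, std::size_t x, float* out) {
    assert(cpuData != nullptr && out != nullptr);
    const std::size_t byteOffset = x * kHlslArrayStride;
    if (byteOffset + sizeof(float) > count * sizeof(float)) {
        return false;
    }
    std::memcpy(out, reinterpret_cast<const unsigned char*>(cpuData) + byteOffset, sizeof(float));
    return true;
}
```

With 1 through 16 uploaded, x = 0, 1, 2, 3 read 1, 5, 9, 13, matching the input[1] == 5 observation; x = 4 and above fall past the 64 uploaded bytes.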

FWIW, I changed my constant buffer to float4 input[4], did a bit of indexing sleight of hand, and things worked fine. You can find the sample here. Program.cs is the code (which is in C#, but should be easy enough to translate back to C++) and the hlsl file contains the compute shader.

So here is an example shader that shows the difference between the two techniques:


cbuffer carray1
{
float x[6];
};

cbuffer carray2
{
float4 y[2];
};

float4 main( uint i : index ) : SV_Position
{
return x[i] + y[i/4][i%4];
}





Here is the reflection information when this shader is compiled:

// cbuffer carray1
// {
// float x[6]; // Offset: 0 Size: 84
// }
//
// cbuffer carray2
// {
// float4 y[2]; // Offset: 0 Size: 32
// }





Here is the asm


vs_5_0
dcl_globalFlags refactoringAllowed
dcl_immediateConstantBuffer { { 1.000000, 0, 0, 0},
{ 0, 1.000000, 0, 0},
{ 0, 0, 1.000000, 0},
{ 0, 0, 0, 1.000000} }
dcl_constantbuffer cb0[6], dynamicIndexed
dcl_constantbuffer cb1[2], dynamicIndexed
dcl_input v0.x
dcl_output_siv o0.xyzw, position
dcl_temps 1
and r0.x, v0.x, l(3)
ushr r0.y, v0.x, l(2)
dp4 r0.x, cb1[r0.y + 0].xyzw, icb[r0.x + 0].xyzw
mov r0.y, v0.x
add o0.xyzw, r0.xxxx, cb0[r0.y + 0].xxxx
ret





So from this we can see that the two cbuffers carray1 and carray2 are very different in size. The float[6] version requires 84 bytes; the float4[2] version requires only 32 bytes. The difference lies in how the data is indexed. Keep in mind that all constant registers in HLSL are 4-component vectors (float4). If you look at the asm code, you'll see that when cb0 is indexed, only the .x component is accessed. To keep things simple and fast, the compiler indexes the cbuffer register to reach the array element and keeps the component access static. Each element of the x array therefore falls on the next register, which happens to require the fewest instructions for array indexing.
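The reflection sizes and register slots above can be reproduced with plain arithmetic: in `float x[n]` every element except the last is padded to a full 16-byte register, while `float4 y[m]` is simply m registers. A small C++ sketch covering only these two declarations (not the full HLSL packing rules); all names are hypothetical.

```cpp
#include <cstddef>

// Size in bytes of "float x[n]" in a cbuffer: n - 1 full 16-byte registers
// plus one trailing float (the last element is not padded).
constexpr std::size_t HlslFloatArrayBytes(std::size_t n) {
    return n == 0 ? 0 : (n - 1) * 16 + 4;
}

// Size in bytes of "float4 y[m]": m registers of 16 bytes each.
constexpr std::size_t HlslFloat4ArrayBytes(std::size_t m) {
    return m * 16;
}

struct RegisterSlot {
    std::size_t reg;        // which constant register (cbN[reg])
    std::size_t component;  // 0 = .x, 1 = .y, 2 = .z, 3 = .w
};

// float x[n]: element i lives in .x of register i (asm: cb0[i].xxxx).
constexpr RegisterSlot SlotInFloatArray(std::size_t i) { return {i, 0}; }

// float4 y[m] read as y[i/4][i%4]: register i >> 2, component i & 3
// (asm: cb1[i >> 2] dotted with the icb row selected by i & 3).
constexpr RegisterSlot SlotInFloat4Array(std::size_t i) { return {i / 4, i % 4}; }

static_assert(HlslFloatArrayBytes(6) == 84, "matches the reflection for float x[6]");
static_assert(HlslFloat4ArrayBytes(2) == 32, "matches the reflection for float4 y[2]");
```

The float4 layout trades an extra instruction (the icb lookup) for roughly a quarter of the memory.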

The other technique I showed allows tighter memory packing but in turn requires a bit more generated code. A simple experiment, in which we create two shaders that use the different indexing and memory packing, shows how many instructions each one requires. The difference is only 1.

// Approximately 4 instruction slots used

cbuffer carray2
{
float4 y[2];
}

float4 main1( uint i : index ) : SV_Position
{
return y[i/4][i%4];
}

//vs_5_0
//dcl_globalFlags refactoringAllowed
//dcl_immediateConstantBuffer { { 1.000000, 0, 0, 0},
// { 0, 1.000000, 0, 0},
// { 0, 0, 1.000000, 0},
// { 0, 0, 0, 1.000000} }
//dcl_constantbuffer cb0[2], dynamicIndexed
//dcl_input v0.x
//dcl_output_siv o0.xyzw, position
//dcl_temps 1
//and r0.x, v0.x, l(3)
//ushr r0.y, v0.x, l(2)
//dp4 o0.xyzw, cb0[r0.y + 0].xyzw, icb[r0.x + 0].xyzw
//ret





// Approximately 3 instruction slots used

cbuffer carray1
{
float x[6];
}

float4 main2( uint i : index ) : SV_Position
{
return x[i];
}

//vs_5_0
//dcl_globalFlags refactoringAllowed
//dcl_constantbuffer cb0[6], dynamicIndexed
//dcl_input v0.x
//dcl_output_siv o0.xyzw, position
//dcl_temps 1
//mov r0.x, v0.x
//mov o0.xyzw, cb0[r0.x + 0].xxxx
//ret





[Edited by - DieterVW on December 15, 2010 12:00:31 PM]

Thanks for your effort! Unfortunately I still wasn't able to fix it. Changing the padding resulted in the same issues. :(

Is there maybe a different way of passing an array to the shader, like a 1D texture?

This problem makes me go crazy...

Sounds like you need to crack open PIX or one of the IHVs' shader and pipeline debuggers to figure out where your problem actually lies. Arrays in HLSL are pretty simple, and other methods are going to eat your time. You may as well figure out the real problem now.
