Hi there,
I'd like to pass a float array to the GPU, using DirectX 11 and HLSL. But obviously the following approach doesn't work:
cbuffer cbArray : register( b0 )
{
float g_floatArray[6];
};
I just found a thread about the same topic, but the answer there is too vague for me: http://forums.nvidia.com/index.php?showtopic=169643
I am new to DirectX and HLSL, so I don't know what to do now. It doesn't necessarily have to be a constant buffer - I just want to pass an array from the CPU to the GPU.
Any Suggestions?
constant buffer with an array of floats?
*bump*
I still wasn't able to solve my issue. :[
Is anyone around who can tell me, what's the easiest way to pass a float array to the GPU?
The constant buffer must have the right padding to work. The easiest way is to use only 4-dimensional vectors, and arrays whose element count is a multiple of 4, since the machine word is 4 floats (16 bytes).
For example:
float4 g_floatArray[n];
float g_floatArray[n * 4];
Can you give more detail about how your wish for a float array of 6 doesn't work?
Try the following where n is the size of the array:
cbuffer cbArray
{
    float4 g_array[n/4 + !!(n%4)];
}
// To use it do the following:
float x = ((float[4])(g_array[i/4]))[i%4];
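The index arithmetic in the snippet above can be checked on the CPU. The following is only a minimal C++ sketch of the same math; `Float4`, `float4Count` and `readPacked` are hypothetical names introduced for illustration and are not part of any DirectX API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one HLSL float4 register (16 bytes).
struct Float4 { float v[4]; };

// Number of float4 registers needed to hold n floats,
// i.e. the HLSL expression n/4 + !!(n%4).
constexpr std::size_t float4Count(std::size_t n) {
    return n / 4 + (n % 4 != 0 ? 1 : 0);
}

// CPU-side equivalent of ((float[4])(g_array[i/4]))[i%4]:
// pick the register with i/4, then the component with i%4.
inline float readPacked(const Float4* regs, std::size_t n, std::size_t i) {
    assert(i < n);  // i must be a valid logical index
    return regs[i / 4].v[i % 4];
}
```

With this layout the six floats occupy two registers (32 bytes), and the CPU can fill the buffer with a plain tightly packed `float` array of `float4Count(n) * 4` elements.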
First of all: thanks for your replies!
I'll go into more detail, since I don't know where exactly the problem lies.
I am experimenting with the new tessellation hardware, and therefore I'd like to use an array of floats in the Domain Shader, which is defined by the application.
As a start, I took a sample program from the DirectX 11 SDK and tried to change it.
So basically the relevant parts of the source code look as follows:
ID3D11Buffer* g_pcbPerFrame = NULL;
UINT g_iBindPerFrame = 0;

// the array I want to use in the shader
struct CB_PER_FRAME_CONSTANTS
{
    float anArray[6];
};
..
// Create constant buffers
D3D11_BUFFER_DESC Desc;
Desc.Usage = D3D11_USAGE_DYNAMIC;
Desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
Desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
Desc.MiscFlags = 0;
Desc.ByteWidth = sizeof( CB_PER_FRAME_CONSTANTS ) * 6; // at this point it doesn't compile without "* 6" - I don't know why exactly
V_RETURN( pd3dDevice->CreateBuffer( &Desc, NULL, &g_pcbPerFrame ) );
DXUT_SetDebugName( g_pcbPerFrame, "CB_PER_FRAME_CONSTANTS" );
..
// Update per-frame variables
D3D11_MAPPED_SUBRESOURCE MappedResource;
pd3dImmediateContext->Map( g_pcbPerFrame, 0, D3D11_MAP_WRITE_DISCARD, 0, &MappedResource );
CB_PER_FRAME_CONSTANTS* pData = ( CB_PER_FRAME_CONSTANTS* )MappedResource.pData;
pData->anArray[0] = 0.0f;
pData->anArray[1] = 0.2f;
pData->anArray[2] = 0.4f;
pData->anArray[3] = 0.6f;
pData->anArray[4] = 0.8f;
pData->anArray[5] = 1.0f;
pd3dImmediateContext->Unmap( g_pcbPerFrame, 0 );

// Bind the CB
pd3dImmediateContext->VSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
pd3dImmediateContext->HSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
pd3dImmediateContext->DSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
pd3dImmediateContext->PSSetConstantBuffers( g_iBindPerFrame, 1, &g_pcbPerFrame );
..
and the shader:
cbuffer cbPerFrame : register( b0 )
{
    float g_anArray[6];
};
..
So more or less everything is copied from the sample except the array. The program compiles and starts, but the resulting tessellation is wrong (compared to the hardcoded one) and flickers. The tessellation changes when I change the array, so some data is passed, but obviously in a faulty way.
It seems to be a very basic problem, which makes it all the more frustrating that I'm not able to solve it.
Concerning the padding: I changed the program as DieterVW suggested, and it resulted in the same flickering. So either the problem is somewhere else or I did something wrong.
PS: Are there any good books or tutorials on DirectX 11 & HLSL you can recommend? I'd like to be able to solve such problems on my own, but I failed to find appropriate literature.
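One plausible explanation for the `* 6` in the buffer creation code above (an assumption, since the post does not show the actual error): Direct3D 11 requires the `ByteWidth` of a constant buffer to be a multiple of 16, and a struct of six floats is 24 bytes. Multiplying by 6 gives 144, which happens to satisfy the rule. A minimal C++ sketch of rounding up instead; `alignTo16` is a hypothetical helper name, not a DirectX function.

```cpp
#include <cstddef>

// Mirror of the struct from the post above: six tightly packed floats.
struct CB_PER_FRAME_CONSTANTS { float anArray[6]; };

// Hypothetical helper: round a byte count up to the next multiple of 16.
constexpr std::size_t alignTo16(std::size_t bytes) {
    return (bytes + 15) & ~static_cast<std::size_t>(15);
}

// Assumes a 4-byte float, as on the usual Direct3D platforms.
static_assert(sizeof(CB_PER_FRAME_CONSTANTS) == 24, "24 is not a multiple of 16");
static_assert(alignTo16(sizeof(CB_PER_FRAME_CONSTANTS)) == 32, "rounded up to 32");
static_assert((sizeof(CB_PER_FRAME_CONSTANTS) * 6) % 16 == 0, "144 is a multiple of 16");
```

Rounding the size up only makes buffer creation valid. It does not by itself make the CPU struct match the way the shader lays out `float g_anArray[6]`.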
I have not found a sufficiently good explanation for this behavior anywhere yet -- I ran into it recently while trying to write a simple compute shader example for SlimDX, where I had a 16-element constant buffer whose values I doubled and wrote to the output buffer. The buffer was filled with the elements 1 through 16 on the CPU side, declared in the shader as
cbuffer constants
{
    float input[16];
}
and indexed in the shader as input[x] where x was 0 to 15. To my surprise, I found that input[0] would yield 1 but input[1] returned 5. While investigating, I came across a post by John Rapp saying something about constant buffers being accessed on 16-byte boundaries, which made some sense, but I've been meaning to go in search of a more detailed explanation.
FWIW, I changed my constant buffer to float4 input[4] and did a bit of indexing sleight of hand and things worked fine. You can find the sample here. Program.cs is the code (which is in C# but should be easy enough to transform back to C++) and the hlsl file contains the compute shader.
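The 1-then-5 result described above can be reproduced on the CPU with a small simulation. This is only a sketch that assumes each element of a cbuffer `float` array starts on its own 16-byte (four-float) boundary; the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Tightly packed CPU data 1, 2, ..., 16, as described in the post.
inline std::array<float, 16> makeInput() {
    std::array<float, 16> data{};
    for (std::size_t k = 0; k < data.size(); ++k) {
        data[k] = static_cast<float>(k + 1);
    }
    return data;
}

// Emulates reading `float input[16]`: element x sits in the first slot
// of register x, so it is fetched from float offset 4 * x.
// at() throws std::out_of_range once 4 * x passes the supplied data.
inline float readAsFloatArray(const std::array<float, 16>& raw, std::size_t x) {
    return raw.at(4 * x);
}

// Emulates `float4 input[4]` indexed as input[x / 4][x % 4]:
// register x / 4, component x % 4, which is simply float offset x.
inline float readAsFloat4Array(const std::array<float, 16>& raw, std::size_t x) {
    return raw.at((x / 4) * 4 + (x % 4));
}
```

Under that assumption `readAsFloatArray(data, 1)` returns 5, matching the observation, while the `float4` variant returns 2 for the same index.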
So here is an example shader that will help show the difference in the techniques:
cbuffer carray1
{
    float x[6];
};
cbuffer carray2
{
    float4 y[2];
};
float4 main( uint i : index ) : SV_Position
{
    return x[i] + y[i/4][i%4];
}
Here is the reflection information when this shader is compiled:
// cbuffer carray1
// {
//   float x[6];    // Offset: 0 Size: 84
// }
//
// cbuffer carray2
// {
//   float4 y[2];   // Offset: 0 Size: 32
// }
Here is the asm
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_immediateConstantBuffer { { 1.000000, 0, 0, 0},
                              { 0, 1.000000, 0, 0},
                              { 0, 0, 1.000000, 0},
                              { 0, 0, 0, 1.000000} }
dcl_constantbuffer cb0[6], dynamicIndexed
dcl_constantbuffer cb1[2], dynamicIndexed
dcl_input v0.x
dcl_output_siv o0.xyzw, position
dcl_temps 1
and r0.x, v0.x, l(3)
ushr r0.y, v0.x, l(2)
dp4 r0.x, cb1[r0.y + 0].xyzw, icb[r0.x + 0].xyzw
mov r0.y, v0.x
add o0.xyzw, r0.xxxx, cb0[r0.y + 0].xxxx
ret
So from this we can see that the two cbuffers carray1 and carray2 are very different in size. The float[6] version requires 84 bytes; the float4[2] version requires only 32 bytes. The difference lies in how the data is indexed. Keep in mind that all registers in HLSL are vec4s. If you look at the asm code, you'll see that when cb0 is indexed it only accesses the .x component. To keep things simple and fast, the compiler indexes the cbuffer register to access the array and keeps the component access static. Each element of the array therefore sits in the .x component of the next cbuffer register (five full 16-byte registers plus one final 4-byte float gives the 84 bytes), which happens to require the fewest instructions for array indexing.
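The 84-byte figure follows from one register per element. Below is a minimal C++ sketch of that arithmetic, together with a CPU-side struct padded to the same layout; `PaddedFloat`, `CArray1`, `floatArrayOffset` and `floatArraySize` are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Byte offset of element i of an HLSL `float x[N]` inside a cbuffer:
// every element starts a new 16-byte register.
constexpr std::size_t floatArrayOffset(std::size_t i) { return 16 * i; }

// Size reported by reflection: N - 1 full registers plus the last float,
// which is not padded.
constexpr std::size_t floatArraySize(std::size_t n) {
    return n == 0 ? 0 : 16 * (n - 1) + 4;
}

// Hypothetical CPU mirror: one float followed by 12 bytes of padding.
struct PaddedFloat { float value; float pad[3]; };
struct CArray1 { PaddedFloat x[6]; };

static_assert(floatArraySize(6) == 84, "matches the reflection output");
static_assert(sizeof(PaddedFloat) == 16, "one register per element");
static_assert(sizeof(CArray1) == 96, "84 rounded up to a multiple of 16");
```

A struct like `CArray1` would be written as `data.x[i].value = ...` on the CPU, at the cost of uploading 96 bytes for six floats; the `float4 y[2]` variant needs only 32.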
The other technique I showed allows for tighter memory packing but in turn requires a bit more code gen. A simple experiment, in which we create two shaders that use the different indexing and memory packing, shows how much the number of required instructions differs. The difference is only 1.
// Approximately 4 instruction slots used
cbuffer carray2
{
    float4 y[2];
}
float4 main1( uint i : index ) : SV_Position
{
    return y[i/4][i%4];
}
//vs_5_0
//dcl_globalFlags refactoringAllowed
//dcl_immediateConstantBuffer { { 1.000000, 0, 0, 0},
//                              { 0, 1.000000, 0, 0},
//                              { 0, 0, 1.000000, 0},
//                              { 0, 0, 0, 1.000000} }
//dcl_constantbuffer cb0[2], dynamicIndexed
//dcl_input v0.x
//dcl_output_siv o0.xyzw, position
//dcl_temps 1
//and r0.x, v0.x, l(3)
//ushr r0.y, v0.x, l(2)
//dp4 o0.xyzw, cb0[r0.y + 0].xyzw, icb[r0.x + 0].xyzw
//ret
// Approximately 3 instruction slots used
cbuffer carray1
{
    float x[6];
}
float4 main2( uint i : index ) : SV_Position
{
    return x[i];
}
//vs_5_0
//dcl_globalFlags refactoringAllowed
//dcl_constantbuffer cb0[6], dynamicIndexed
//dcl_input v0.x
//dcl_output_siv o0.xyzw, position
//dcl_temps 1
//mov r0.x, v0.x
//mov o0.xyzw, cb0[r0.x + 0].xxxx
//ret
[Edited by - DieterVW on December 15, 2010 12:00:31 PM]
Thanks for your effort! Unfortunately I still wasn't able to fix it. Changing the padding resulted in the same issues. :(
Is there maybe a different way of passing an array to the shader, such as a 1D texture?
This problem makes me go crazy...
Sounds like you need to crack open PIX or one of the IHV's shader and pipeline debuggers to figure out where your problem actually lies. Arrays in HLSL are pretty simple and other methods are going to eat your time. You may as well figure out the real problem now.
This topic is closed to new replies.