[D3D 10/11] setting a bool constant in HLSL

8 comments, last by DieterVW 13 years, 5 months ago
Everything was working fine until I threw in a boolean into my constant buffer :

//////// vs.fx
#include "skinning.fx"
...


//////// skinning.fx
cbuffer cbPerObject
{
    row_major float4x4 WVP_Matrix;
    row_major float4x4 ModelView_Matrix;
    row_major float4x4 SkinningMatrices[ 72 ];
    bool bSkinning;              //<-- added this
};


///////// .cpp
struct VSConstantBuffer
{
    D3DXMATRIX WVP_Matrix;
    D3DXMATRIX ModelView_Matrix;
    D3DXMATRIX SkinningMatrices[ MAX_SKELETON_JOINTS ];
    bool bSkinning;              //<-- added this
};
...
D3D10_BUFFER_DESC cbDesc;
ZeroMemory( &cbDesc, sizeof(D3D10_BUFFER_DESC) );
cbDesc.ByteWidth = sizeof( VSConstantBuffer );
cbDesc.Usage = D3D10_USAGE_DYNAMIC;
cbDesc.BindFlags = D3D10_BIND_CONSTANT_BUFFER;
cbDesc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;

HRESULT VSConstantBufferResult = 0;
VSConstantBufferResult = m_pd3dDevice->CreateBuffer( &cbDesc, NULL, &VScBuffer );
assert( VSConstantBufferResult == D3D_OK );


Now CreateBuffer fails and VSConstantBufferResult is E_INVALIDARG.

By commenting out "bool bSkinning;" in VSConstantBuffer, the CreateBuffer function succeeds.

What am I missing???

-edit-
I've also added the D3D10_SUBRESOURCE_DATA argument :
VSConstantBuffer init;
ZeroMemory( &init, sizeof(VSConstantBuffer) );
init.bSkinning = true;

D3D10_SUBRESOURCE_DATA subres;
ZeroMemory( &subres, sizeof(D3D10_SUBRESOURCE_DATA) );
subres.pSysMem = (void*)&init;

HRESULT VSConstantBufferResult = 0;
VSConstantBufferResult = m_pd3dDevice->CreateBuffer( &cbDesc, &subres, &VScBuffer );

but that didn't work...

[Edited by - 16bit_port on October 28, 2010 2:18:12 PM]
Your sizeof(VSConstantBuffer) is not safe since constant buffers require sizes to be in multiples of 16 bytes.
Quote:Original post by DieterVW
Your sizeof(VSConstantBuffer) is not safe since constant buffers require sizes to be in multiples of 16 bytes.


So... then should I throw in extra unnecessary bytes (for padding)? That is just absurd.

I had to do the following to get it to work :

///////// skinning.fx
cbuffer cbPerObject
{
    row_major float4x4 WVP_Matrix;
    row_major float4x4 ModelView_Matrix;
    row_major float4x4 SkinningMatrices[ 72 ];
    float4 bSkinning;
};


///////// .cpp
struct VSConstantBuffer
{
    D3DXMATRIX WVP_Matrix;
    D3DXMATRIX ModelView_Matrix;
    D3DXMATRIX SkinningMatrices[ MAX_SKELETON_JOINTS ];
    D3DXVECTOR4 bSkinning;
};


Is there no other way?

[Edited by - 16bit_port on October 27, 2010 4:53:35 PM]
Just put some padding at the end of your structure to make its size a multiple of 16 bytes.

If you mix differently sized types in your structure you should also make sure that no packing rules mess things up for you. The following pages contain some information on the subject:
Working with Packing Structures.
Packing Rules for Constant Variables (DirectX HLSL).
There's no need to put extra padding into the C++ struct as that would be a waste on the CPU side. The GPU allocates registers in increments of 16 bytes, so you have no choice on that side of things. This is to some degree hidden by the HLSL compiler since it'll bump the cbuffer size up for you silently. You just need to increment the cbuffer size to the next multiple of 16 when creating the constant buffer.

C++ structs written to match constant buffer packing are a convenience, but they don't scale well if the application is large with a wide variety of different cbuffers. A more extensible solution would use reflection to determine the cbuffer layout/packing along with the data types and sizes. The application can use HLSL variable annotations to query the game engine for the data needed for the current shader and then pack a constant buffer from there. This makes shaders more 'drop in' so long as the annotations used are commonly known by the game engine along with functions for retrieving the appropriate data. Of course, for static in-box engine shaders this wouldn't be a very good approach.

Often the data needed for cbuffers is located in many places in the game code, which makes more sense from a design standpoint and probably also from a performance standpoint. When various values are needed from the engine, they can be copied directly into a buffer, created as just new BYTE[sizeNeeded], and placed at the correct memory offset using the reflection information. Then this general buffer can be sent to the GPU. If the buffer is dynamic, then the target of the data copies can just be the dynamic buffer directly.
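To make the "copy each value to its reflected offset" idea concrete, here is a minimal C++ sketch. It is only an illustration under assumptions: CBufferVariable and WriteVariable are hypothetical names, and the layout table is filled in by hand where a real engine would fill it from the shader reflection data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical description of one cbuffer variable. In a real engine the
// offset and size would come from shader reflection, not be typed in by hand.
struct CBufferVariable
{
    std::string name;
    std::size_t offset; // byte offset inside the cbuffer
    std::size_t size;   // size in bytes
};

// Copies one value into the staging buffer at the offset given by the layout.
// Returns false if the variable is unknown or the data does not fit.
bool WriteVariable(std::vector<unsigned char>& staging,
                   const std::vector<CBufferVariable>& layout,
                   const std::string& name,
                   const void* data, std::size_t dataSize)
{
    assert(data != nullptr);
    for (const CBufferVariable& var : layout)
    {
        if (var.name != name)
            continue;
        if (dataSize > var.size || var.offset + var.size > staging.size())
            return false;
        std::memcpy(staging.data() + var.offset, data, dataSize);
        return true;
    }
    return false;
}
```

The staging vector plays the role of the new BYTE[sizeNeeded] buffer mentioned above; with a dynamic buffer the same copies could target the mapped pointer instead.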
I'm not so familiar with structure packing and data alignment but after doing some reading, I have a few questions :

1)
Given the following :
struct VSConstantBuffer
{
    D3DXMATRIX WVP_Matrix;              //64 bytes
    D3DXMATRIX ModelView_Matrix;        //64 bytes
    D3DXMATRIX SkinningMatrices[ 72 ];  //64 * 72 bytes
    bool bSkinning;                     //1 byte
};


what determines how much padding is added to the end of the structure? Does it go by the largest BASIC data type (in this case a float in the matrix struct) or does it go by the largest data type/structure (D3DXMATRIX)? I believe it's the former since the compiler only slapped 3 extra bytes onto the end of it, but I just want to be sure.

2) When packing a C++ struct (done with the #pragma pack preprocessor directive), does it just pretty much say (that within this struct) every basic data type will be aligned with that byte boundary specified and padding will be supplied accordingly?

3)
Quote:
There's no need to put extra padding into the C++ struct as that would be a waste on the CPU side.

The extra padding... do you mean indirectly through packing or when I changed "bool bSkinning" to "D3DXVECTOR4 bSkinning"?

4)
Quote:
You just need to increment the cbuffer size to the next multiple of 16 when creating the constant buffer.

So then the only solution would be to use "D3DXVECTOR4"?

5) According to MSDN, "If the packsize is set equal to or greater than the default alignment, the packsize is ignored.". What do they mean by default alignment?
You can just do something like this:
cbDesc.ByteWidth = sizeof( VSConstantBuffer );
if((cbDesc.ByteWidth % 16) != 0)
	cbDesc.ByteWidth += 16 - (cbDesc.ByteWidth % 16);


It doesn't matter if the buffer is larger than your structure.

The only problem is if you use a pointer to your structure as source data for UpdateSubresource or as init-data, but if you Map your constant buffer it shouldn't be a problem.
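If init-data really is needed, one way around that problem is to copy the struct into a staging block that is as large as the rounded-up buffer, so nothing is read past the end of the struct. A minimal sketch, assuming the hypothetical helper names RoundUpTo16 and MakePaddedCopy (neither is part of D3D):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Rounds a size up to the next multiple of 16 (hypothetical helper name).
inline std::size_t RoundUpTo16(std::size_t size)
{
    return (size + 15) / 16 * 16;
}

// Returns a zero-filled staging copy of `source` whose size is a multiple of
// 16, so that reading the full buffer width from it never runs past the data.
template <typename T>
std::vector<unsigned char> MakePaddedCopy(const T& source)
{
    std::vector<unsigned char> staging(RoundUpTo16(sizeof(T)), 0);
    std::memcpy(staging.data(), &source, sizeof(T));
    assert(staging.size() % 16 == 0 && staging.size() >= sizeof(T));
    return staging;
}
```

The idea would be to set cbDesc.ByteWidth to the staging size and point subres.pSysMem at staging.data() rather than at the struct itself.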
I used Erik Rufelt's suggestion and it works in D3D10 (I'm able to toggle skinning on and off)

BUT

I tried doing the exact same thing in D3D11 and it always renders with skinning. When I run it through PIX, I get a different result (it constantly toggles skinning on and off). If I change bool VSConstantBuffer::bSkinning to D3DXVECTOR4 VSConstantBuffer::bSkinning, everything is fine and dandy.

Why is that?

Also, if someone can answer the questions in my previous post that'd be great.
If you want to pad out the structure on the CPU side then the simplest option is to use __declspec(align(16)) struct { ... };. It can be a little bit wasteful of memory doing that, but it avoids the need to manually pad out the structure.
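As a rough illustration of that option, here is the same struct with 16-byte alignment requested. This sketch uses alignas(16), the standard C++11 spelling, in place of the MSVC-specific __declspec(align(16)), and Matrix4x4 is a stand-in for D3DXMATRIX so that it compiles without the D3DX headers. The sizes assume a 4-byte float and a 1-byte bool.

```cpp
#include <cstddef>

// Stand-in for D3DXMATRIX (16 floats, 64 bytes); an assumption of this sketch.
struct Matrix4x4 { float m[16]; };

// Requesting 16-byte alignment makes the compiler pad the struct's size up to
// a multiple of 16, so sizeof() can be used directly as the buffer width.
struct alignas(16) VSConstantBuffer
{
    Matrix4x4 WVP_Matrix;
    Matrix4x4 ModelView_Matrix;
    Matrix4x4 SkinningMatrices[72];
    bool      bSkinning;
};

static_assert(alignof(VSConstantBuffer) == 16, "struct should be 16-byte aligned");
static_assert(sizeof(VSConstantBuffer) % 16 == 0, "size should be a multiple of 16");
```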

I think your problem with the bool is probably down to HLSL using 4 byte booleans, and C++ using single byte ones. That means you'd be copying your one byte boolean, and leaving the following 3 bytes uninitialized. If I'm right, changing it to the Windows BOOL type instead should fix that.
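The effect can be reproduced without any D3D code. The sketch below is only a simulation under assumptions: SlotAfterCopy is a hypothetical helper that models one 4-byte cbuffer slot holding leftover garbage, and BOOL32 stands in for the Windows BOOL typedef (a 32-bit int). It also assumes a 1-byte C++ bool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(bool) == 1, "this sketch assumes a 1-byte C++ bool");

typedef int BOOL32; // stand-in for the Windows BOOL typedef (a 32-bit int)

// Simulates one 4-byte cbuffer slot that previously held garbage (0xCD bytes),
// copies `byteCount` bytes of `value` into it, and returns the 32-bit value
// that a reader of the whole slot would see.
std::uint32_t SlotAfterCopy(const void* value, std::size_t byteCount)
{
    assert(byteCount <= 4);
    unsigned char slot[4];
    std::memset(slot, 0xCD, sizeof(slot));
    std::memcpy(slot, value, byteCount);

    std::uint32_t seen = 0;
    std::memcpy(&seen, slot, sizeof(seen));
    return seen;
}
```

Copying a false C++ bool overwrites only one byte, so the slot as a whole stays nonzero, which would plausibly read as "true" in the shader and match the "always renders with skinning" symptom. Copying a 4-byte BOOL32 set to 0 clears the whole slot.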
Normally the compiler decides how values need to be padded based on the target architecture. It's normally set to a default alignment of the base register size. The default choice usually prefers speed over size.

You can experiment with both the C++ compiler and with HLSL compiler to test packing/padding for yourself, but here is a basic description.

Let's take an example: On x86, the preferred alignment is 4 bytes. That is, a 4 byte type such as an INT32 must align to a 4 byte address in order for the CPU to do integer math on it. By default the compiler will pack your member variables of type UINT32 so that they are always on a 4 byte boundary. What does this mean when you're coding?

struct MyWastefulStruct
{
    bool dirty; // 1 byte
    int value;  // 4 bytes
};
// x86
assert( sizeof(MyWastefulStruct) == 8 );


In the above example our struct only needs 5 bytes. The trouble is that if the int member followed in memory directly after the bool it wouldn't have a 4 byte alignment, and so wouldn't be natively usable by the hardware. There are two options for the compiler: it can either add code in order to do a shift operation on the int member every time the member is used, or it can add 3 bytes of empty padding after the bool in order to push the int into 4 byte alignment. The default choice is to add the empty padding and waste the 3 bytes. Of course you don't see this since the compiler hides it from you.

If you use the #pragma to do 1 byte alignment instead, then the compiler will generate code to do the shift every time the int member is used. Most people want to avoid that overhead.
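The padding can be checked at compile time with offsetof and sizeof. A small sketch, assuming a 1-byte bool and a 4-byte int with 4-byte alignment (typical for MSVC, GCC and Clang on x86/x64); MyPackedStruct is a name made up for this example:

```cpp
#include <cstddef>

// Default packing: 3 bytes of padding are inserted after 'dirty'.
struct MyWastefulStruct
{
    bool dirty; // offset 0
    int  value; // offset 4
};

// 1-byte packing: no padding, 'value' starts right after 'dirty'.
#pragma pack(push, 1)
struct MyPackedStruct
{
    bool dirty; // offset 0
    int  value; // offset 1
};
#pragma pack(pop)

static_assert(offsetof(MyWastefulStruct, value) == 4, "3 bytes of padding expected");
static_assert(sizeof(MyWastefulStruct) == 8, "padded size");
static_assert(offsetof(MyPackedStruct, value) == 1, "no padding expected");
static_assert(sizeof(MyPackedStruct) == 5, "packed size");
```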

I once went and looked over my code that I've written over the years and realized that many of my data structures were 2x or more bigger than necessary all because of the padding and alignment.

Now, since HLSL has a different alignment size than C++ on x86 hardware, you're stuck having to be aware of padding and packing rules. The basic packing rule for HLSL is that no intrinsic data type (float2, float3, float4 for example) can straddle a 16 byte boundary. Matrices are built from the basic vector types, and so are also forced to follow these rules. The compiler will move those members so that they don't straddle the boundary, adding padding in the process.

cbuffer X
{
    float3x3 rot;
    float x;
};
// memory layout
// reg0 float3, pad
// reg1 float3, pad
// reg2 float3, float
// row 1 and row 2 of the matrix rot are shifted to the next 16 byte register
// so that they don't straddle two registers.


You can see the exact layout by looking at the reflection information from a shader constant buffer or by compiling your shader using FXC on the command line. By default FXC prints out the memory layout of cbuffers after a compile for easy viewing. Make sure that your C++ structs match that layout exactly.
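For the cbuffer X example above, a matching C++ struct could look roughly like this. It is a sketch based only on the layout listed in this post (three float3 registers with x in the last free component); the struct and member names are made up, and the real layout should still be confirmed against the FXC output or the reflection data.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of 'cbuffer X', one 16-byte register per line.
struct CBufferX
{
    float rot0[3]; float pad0; // reg0: float3 + pad
    float rot1[3]; float pad1; // reg1: float3 + pad
    float rot2[3]; float x;    // reg2: float3 + the float x
};

// Compile-time checks that the C++ offsets match the register layout.
static_assert(offsetof(CBufferX, rot1) == 16, "second vector starts at reg1");
static_assert(offsetof(CBufferX, rot2) == 32, "third vector starts at reg2");
static_assert(offsetof(CBufferX, x) == 44, "x fills the last component of reg2");
static_assert(sizeof(CBufferX) == 48, "three 16-byte registers in total");
```

Keeping the static_asserts next to the struct means a later edit to either side at least fails loudly on the C++ side.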
