Comments about HLSL array packing policy


Started by indiocolifa March 18, 2016 02:04 AM

3 comments, last by indiocolifa 8 years, 1 month ago

indiocolifa

135

Author

March 18, 2016 02:04 AM

Hi all! I want to know if I'm correct about HLSL array behavior.

I'm working with a standard lighting vertex shader, HLSL plus C++, on the DirectX 11.1 API. To store my light data I've set up a simple declaration, padded to match the packing requirements of constant buffers:


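#include <DirectXMath.h>

#define MAX_LIGHTS 16

// C++ side
struct LightBase {
   DirectX::XMFLOAT3 pos;      // 12 bytes
   DirectX::XMFLOAT3 color;    // 12 bytes
   float intensity;
   float isOn;                 // float used as a boolean flag
};                             // 32 bytes, tightly packed
struct CBUFFER_LIGHT {
   LightBase light[MAX_LIGHTS];
   float numActiveLights;
   float _padding[3];          // pads the buffer to a multiple of 16 bytes
};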

The shader declares a "matching" cbuffer as:


// HLSL vertex shader
#define MAX_LIGHTS 16
struct LightBase {
      float3 pos;
      float3 color;
      float intensity;
      float isOn;
};
cbuffer lights : register (b3) {
      LightBase light[MAX_LIGHTS];
      float numActiveLights;
}

Now, as I expected, it didn't work because of how HLSL lays out arrays in memory. For example, if I set the buffer from C++ with the following data:

[image not preserved]

I observe the following behavior when accessing, e.g., light[0] from the shader:

[image not preserved]

See how light[0].color and the other struct members got displaced? It seems that HLSL organizes arrays in Vector4 elements regardless of the real type of the array element.

So I think the array data passed from C++, { -7, -1, -100, 0.44, 0.66, 0.88, 1.0, 1.0 }, will be organized by HLSL in the following layout:


A[0] = { -7, -1, -100, 0.44 }
A[1] = { 0.66, 0.88, 1.0, 1.0 }
A[2] = { ...}

So A[0] contains .xyz of the position and .r of the color member; A[1] contains .gb of the color member plus the intensity and isOn members.
But, for example, reading light[0].color from the shader still seems to address &light + sizeof(float3)+sizeof(float3), following the declaration in the cbuffer! This unexpected result is due to how HLSL lays out arrays.
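To make the displacement concrete, here is a small C++ sketch that reads the eight floats above at the byte offsets that the documented cbuffer packing rules would give LightBase (pos at 0, color at 16, intensity at 28, isOn at 32). Those offsets, the helper names `readAsShader` and `demoView`, and the marker value 99 are assumptions for illustration; this is not output captured from the original shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// What the shader would see for one LightBase, assuming cbuffer packing
// places the members at byte offsets 0, 16, 28 and 32.
struct ShaderView {
    float pos[3];
    float color[3];
    float intensity;
    float isOn;
};

// Hypothetical helper: interpret raw cbuffer bytes the way the shader would.
ShaderView readAsShader(const unsigned char* bytes) {
    ShaderView v{};
    std::memcpy(v.pos,        bytes + 0,  sizeof v.pos);
    std::memcpy(v.color,      bytes + 16, sizeof v.color);
    std::memcpy(&v.intensity, bytes + 28, sizeof v.intensity);
    std::memcpy(&v.isOn,      bytes + 32, sizeof v.isOn);
    return v;
}

// The tightly packed floats written by the 32-byte C++ struct, followed by
// the first float of the next light (99 is a made-up marker value).
ShaderView demoView() {
    const float cpu[] = { -7.0f, -1.0f, -100.0f, 0.44f,
                          0.66f, 0.88f, 1.0f, 1.0f, 99.0f };
    unsigned char bytes[sizeof cpu];
    std::memcpy(bytes, cpu, sizeof cpu);
    return readAsShader(bytes);
}
```

Under these assumptions the position survives, but color comes out as (0.66, 0.88, 1.0), the 0.44 is skipped entirely, and isOn picks up the first float of the next light.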

If HLSL and DirectX are so fond of float4 values, isn't it better to use only float4 variables for the CPU-GPU exchange through cbuffers and access the members (.xyzw) as needed? E.g., do you need a boolean? Use a float4 and fill the first member (or all four members, and read any of them).

Any policy or good practice you can recommend regarding cbuffer arrays?

Thanks!

Dingleberry

924

March 18, 2016 02:49 AM

Putting everything into a float4 in C++ is quite legit.

A more advanced alternative is to use packoffset https://msdn.microsoft.com/en-us/library/windows/desktop/dd607358(v=vs.85).aspx

But generally, since the constants map to float4 registers no matter what, your generated assembly might end up looking fishy if you do something like store color.x in c0.w and color.yz in c1.xy. When you use the variable, the first thing the compiler will probably do is rearrange your data on you, and it will perform two reads instead of the single read it would need if the value were stored neatly in one register.

For a quick and dirty way to remember the mapping, remember:

HLSL packing rules are similar to performing a #pragma pack 4 with Visual Studio, which packs data into 4-byte boundaries. Additionally, HLSL packs data so that it does not cross a 16-byte boundary. Variables are packed into a given four-component vector until the variable will straddle a 4-vector boundary; the next variables will be bounced to the next four-component vector.

https://msdn.microsoft.com/en-us/library/windows/desktop/bb509632(v=vs.85).aspx
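As a rough illustration of the quoted rules, the following C++ sketch walks the members of the LightBase struct from the first post and places each one so that it never straddles a 16-byte register. It only models float scalars and vectors up to float4; the function and type names (`placeVariable`, `roundUpToRegister`, `LightBaseOffsets`, `computeLightBaseOffsets`) are made up for this example, and the array stride assumes each array element starts on a fresh register.

```cpp
#include <cassert>
#include <cstddef>

// One constant register is a four-component vector of 4-byte values.
constexpr std::size_t kRegisterBytes = 16;

// Returns the byte offset at which a variable of `sizeBytes` (at most 16)
// is placed when the previous variable ended at `cursor`.
constexpr std::size_t placeVariable(std::size_t cursor, std::size_t sizeBytes) {
    const std::size_t usedInRegister = cursor % kRegisterBytes;
    // Bounce to the next register if the variable would cross the boundary.
    if (usedInRegister + sizeBytes > kRegisterBytes) {
        return cursor + (kRegisterBytes - usedInRegister);
    }
    return cursor;
}

// Rounds a size up to a whole number of registers (array element stride).
constexpr std::size_t roundUpToRegister(std::size_t bytes) {
    return (bytes + kRegisterBytes - 1) / kRegisterBytes * kRegisterBytes;
}

struct LightBaseOffsets {
    std::size_t pos, color, intensity, isOn, size, arrayStride;
};

constexpr LightBaseOffsets computeLightBaseOffsets() {
    LightBaseOffsets o{};
    std::size_t cursor = 0;
    o.pos = placeVariable(cursor, 12);      cursor = o.pos + 12;       // float3
    o.color = placeVariable(cursor, 12);    cursor = o.color + 12;     // float3
    o.intensity = placeVariable(cursor, 4); cursor = o.intensity + 4;  // float
    o.isOn = placeVariable(cursor, 4);      cursor = o.isOn + 4;       // float
    o.size = cursor;
    o.arrayStride = roundUpToRegister(cursor);
    return o;
}

// color cannot share the first register with pos, so it starts at byte 16.
static_assert(computeLightBaseOffsets().color == 16, "color is bounced to c1");
```

With this model the shader-side struct spans 36 bytes with a 48-byte array stride, while the C++ struct is 32 bytes, which is consistent with the members appearing displaced.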

indiocolifa

135

Author

March 18, 2016 03:18 AM

Dingleberry wrote: "Putting everything into a float4 in C++ is quite legit."

I think for my current purposes, which are *far* from a crazily tuned, high-end, high-performance engine, using float4s seems to be a good compromise. I'm also using DirectXMath, which includes types and operations that map nicely to SSE/ARM NEON SIMD registers, so that's another reason to stick with 16-byte members.

Thank you very much.
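A minimal sketch of what the float4-only approach could look like on the C++ side. `Float4` is a stand-in for DirectX::XMFLOAT4 so the snippet builds without DirectXMath, and the grouping of intensity and isOn into one `params` vector is an assumption of this example, not something stated in the thread. The HLSL cbuffer would declare the same three float4 members in the same order.

```cpp
#include <cassert>
#include <cstddef>

constexpr int kMaxLights = 16;

// Stand-in for DirectX::XMFLOAT4 (four tightly packed floats).
struct Float4 { float x, y, z, w; };

// Every member fills exactly one 16-byte register, so nothing can straddle
// a register boundary and the C++ and HLSL layouts agree.
struct LightBaseF4 {
    Float4 pos;     // xyz = position, w unused
    Float4 color;   // xyz = color, w unused
    Float4 params;  // x = intensity, y = isOn, zw unused (assumed grouping)
};

struct CBufferLightF4 {
    LightBaseF4 light[kMaxLights];
    Float4 counts;  // x = numActiveLights, yzw unused
};

static_assert(sizeof(Float4) == 16, "Float4 must be one register");
static_assert(sizeof(LightBaseF4) == 48, "three registers per light");
static_assert(offsetof(LightBaseF4, color) == 16, "color starts at c1");
// D3D11 constant buffer sizes must be a multiple of 16 bytes.
static_assert(sizeof(CBufferLightF4) % 16 == 0, "cbuffer size is 16-byte aligned");
```

The cost is a few unused floats per light; the benefit is that the layout can be checked at compile time with nothing more than sizeof and offsetof.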

Ashaman73

13,718

March 18, 2016 07:13 AM

Have you tried moving the intensity right after the pos, so that all your 3-component vectors are aligned again? pos.x, pos.y, pos.z, intensity, col.r, col.g, col.b, is_on

Ashaman


indiocolifa

135

Author

March 18, 2016 08:45 PM

Ashaman73 wrote: "Have you tried moving the intensity right after the pos, so that all your 3-component vectors are aligned again? pos.x, pos.y, pos.z, intensity, col.r, col.g, col.b, is_on"

I "solved" it for now with float4s on both sides, but your idea should work, since pos.xyz plus intensity is 16 bytes wide, the same as col.rgb plus is_on. Thanks, maybe I'll try that next, since I need to add attenuation parameters.
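For reference, the reordering suggested above could be sketched in C++ like this. `Float3` is a stand-in for DirectX::XMFLOAT3 so the snippet builds without DirectXMath, and `LightBaseReordered` is a hypothetical name; the HLSL struct would need its members reordered the same way.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for DirectX::XMFLOAT3 (three tightly packed floats).
struct Float3 { float x, y, z; };

// Each float3 is followed by a scalar that fills the rest of its register,
// so no member has to be bounced across a 16-byte boundary.
struct LightBaseReordered {
    Float3 pos;       // bytes 0..11
    float intensity;  // bytes 12..15, completes the first register
    Float3 color;     // bytes 16..27
    float isOn;       // bytes 28..31, completes the second register
};

static_assert(offsetof(LightBaseReordered, intensity) == 12, "intensity fills c0.w");
static_assert(offsetof(LightBaseReordered, color) == 16, "color starts at c1");
static_assert(sizeof(LightBaseReordered) == 32, "two registers per light");
```

Compared with the float4-only version this keeps each light at 32 bytes, at the price of having to keep the member order in mind whenever new parameters are added.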
