
# Alignment for data structures passed between shader stages



Hey Guys,

We know that a shader constant buffer has to follow the 16-byte alignment (packing) rule:


cbuffer MyStruct : register(b0) {
    matrix mWVP;
    float2 f2ColorReso;
    float2 f2DepthReso;
    // float fThisCausesProblem; // if enabled, f4LightPos no longer fits in this 16-byte register and moves to the next one
    float4 f4LightPos;
}
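
The packing rule behind this can be sketched on the CPU side. The C++ below is a simplified model, not the compiler's actual implementation: it assumes only the basic rule that a member may not straddle a 16-byte register boundary, and it ignores the extra rules for arrays and nested structs. `PackOffsets` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Simplified model of HLSL constant-buffer packing (assumption: only the
// "no member straddles a 16-byte register" rule is modeled; arrays and
// nested structs have additional rules that are ignored here).
// Takes member sizes in bytes, returns each member's byte offset.
std::vector<std::size_t> PackOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t size : sizes) {
        std::size_t used = cursor % 16;  // bytes already used in the current register
        if (used != 0 && used + size > 16) {
            cursor += 16 - used;         // member would straddle: start a new register
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

With sizes `{64, 8, 8, 16}` (matrix, two float2s, float4) the float4 lands at byte 80. Adding a 4-byte float before it, `{64, 8, 8, 4, 16}`, moves the float4 to byte 96, leaving 12 bytes of padding that a CPU-side struct has to reproduce.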


I was wondering about structs passed between shader stages. It seems the alignment rule does not apply to data going from the VS to the PS, so the following is perfectly valid:

struct TexColPos {
    float2 Tex : TEXCOORD0;
    float3 Col : COLOR0;
    float4 Pos : SV_Position;
};


However, I was told that all GPU memory hardware prefers aligned reads and writes, so maybe the following struct would be faster?

struct TexColPos {
    float4 Tex : TEXCOORD0; // Tex.zw is not used
    float4 Col : COLOR0;    // Col.w is not used
    float4 Pos : SV_Position;
};

// Also, would the following make any difference?
struct TexColPos {
    float2 Tex : TEXCOORD0;
    float2 Dummy0 : TEXCOORD1; // not used
    float3 Col : COLOR0;
    float Dummy1 : COLOR1;     // not used
    float4 Pos : SV_Position;
};

This seems to consume more bandwidth, but since the data only travels between the VS and the PS, it shouldn't affect device memory bandwidth (the reads and writes should happen entirely in cache, right? Or am I wrong?). I tested this and didn't notice any difference (my test workload is probably too light, and my GPU is pretty old), so it would be great if anyone could provide more insight.

##### Share on other sites


TL;DR - The HLSL compiler will pad it all by itself.  [EDIT] There is no (spoon) buffer.

The common problem with CB layout is updating from C++ code. That's where you need to be careful, since the alignment and packing rules are different for C++ and HLSL structs.
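
As a minimal sketch of that mismatch, here is a hypothetical CPU-side mirror of the `MyStruct` cbuffer from the first post, with the commented-out float enabled. The name `MyStructCpu` and the `pad` member are assumptions made for illustration. C++ would place `f4LightPos` right after the float at byte 84, while HLSL places it at byte 96, so the padding has to be written out by hand.

```cpp
#include <cstddef>

// Hypothetical C++ mirror of the HLSL cbuffer (names other than the
// cbuffer members are illustrative, not from any real codebase).
struct MyStructCpu {
    float mWVP[16];            // matrix: 64 bytes, four 16-byte registers
    float f2ColorReso[2];      // byte offset 64
    float f2DepthReso[2];      // byte offset 72
    float fThisCausesProblem;  // byte offset 80
    float pad[3];              // explicit padding up to the next 16-byte register
    float f4LightPos[4];       // byte offset 96, matching the HLSL layout
};

// Compile-time checks that the C++ layout matches the offsets HLSL uses.
static_assert(offsetof(MyStructCpu, f4LightPos) == 96,
              "f4LightPos must start on a 16-byte register boundary");
static_assert(sizeof(MyStructCpu) % 16 == 0,
              "constant buffer size should be a multiple of 16 bytes");
```

Checks like these fail at compile time if someone adds a member without adjusting the padding, which is cheaper than debugging a shader that silently reads the wrong bytes.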

It doesn't really matter for inter-stage data. The layout is not exposed to the user. Conceptually, it's not even a buffer. The HLSL compiler will decompose your struct and assign input and output registers to each struct member.

There are further driver-specific optimizations that affect how communication between shaders happens. Don't worry about it; the compiler will make an optimized decision for you.

Edited by N.I.B.

##### Share on other sites


I always wondered about that. At least we don't have to worry about packing between stages. I packed a constant buffer incorrectly once, and I could see the effect visually. As you say though, that is on the C++ side, or in my case C#.

##### Share on other sites


Thanks for elaborating on that. It totally makes sense if the inter-stage data goes directly to registers. But I was told that in the GS stage, inter-stage data is written to and read from a kind of ring buffer in memory. In that case, does the packing matter? Or will the HLSL compiler pad or pack it?

Thanks