DX12 Constant buffer padding for array of structs

I am confused about why this code works, because the elements of the lights array are not 16-byte aligned.

struct Light
{
    float4 position;
    float radius;
    float intensity;
    // How does this work without adding
    // uint _pad0, _pad1;
};

cbuffer lightData : register(b0)
{
    uint lightCount;
    uint _pad0;
    uint _pad1;
    uint _pad2;
    // Shouldn't the shader be unable to read the second element of the lights array,
    // because after float intensity we need 8 more bytes to make the struct 16-byte aligned?
    Light lights[NUM_LIGHTS];
}

This has erased everything I thought I knew about constant buffer alignment. Any explanation will help clear my head.

Thank you

Just compile your code with FXC and see the printed layout. Are you sure it works?

I compiled this:

struct Light
{
    float4 position;
    float radius;
    float intensity;
};

cbuffer lightData : register(b0)
{
	uint dummy1;
	Light lights[9];
	uint dummy2;
};

float4 main(uint idx : SV_VertexID) : SV_Position
{
	return (dummy1 + lights[0].position * lights[7].radius + lights[8].intensity + dummy2).xxxx;
}

And got this:

// cbuffer lightData
// {
//
//   uint dummy1;                       // Offset:    0 Size:     4
//
//   struct Light
//   {
//
//       float4 position;               // Offset:   16
//       float radius;                  // Offset:   32
//       float intensity;               // Offset:   36
//
//   } lights[9];                       // Offset:   16 Size:   280
//   uint dummy2;                       // Offset:  296 Size:     4
//
// }

The compiler aligns lights[0] to 16 bytes, and from the member offsets it looks like sizeof(Light) = 32. What puzzles me is that the reported array size is 280, not 32*9 = 288. What's totally NOT understandable is dummy2 being at offset 296 = 16 + 32*9 - 8, as if the compiler figured out that lights[8] ends with 8 bytes of padding and decided to put dummy2 there. Would anyone care to guess what is going on?

So I totally don't understand this now :D My FXC.exe is 6.3.9600 from Windows Kit 8.1.

One thing is for sure: directly in cbuffers, you can place 4-byte constants (uint, float, int, ...) on 4-byte boundaries, while structs are aligned to 16 bytes.

The final takeaway is to always query the compiled version for offsets of individual constants, so you know where to memcpy what.
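
As a minimal sketch of that takeaway, the C++ below fills a CPU staging buffer by copying each value to an explicit byte offset instead of trusting a C++ struct's layout. The offsets are hard-coded from the FXC listing above purely for illustration; in a real program they would come from reflection. The names Staging, LightCPU, writeAt, readAt and writeLight are hypothetical helpers, not part of any D3D API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets copied by hand from the FXC listing (assumed, not queried at runtime).
constexpr std::size_t kDummy1Offset = 0;
constexpr std::size_t kLightsOffset = 16;
constexpr std::size_t kLightStride  = 32;   // 24 bytes of data + 8 bytes of padding
constexpr std::size_t kDummy2Offset = 296;
constexpr std::size_t kBufferSize   = 304;  // 296 + 4, rounded up to a multiple of 16

// Tightly packed CPU-side light: 6 floats, 24 bytes.
struct LightCPU {
    float position[4];
    float radius;
    float intensity;
};

using Staging = std::array<std::uint8_t, kBufferSize>;

// Copy one value to an explicit byte offset in the staging buffer.
template <typename T>
void writeAt(Staging& buf, std::size_t offset, const T& value) {
    std::memcpy(buf.data() + offset, &value, sizeof(T));
}

// Read one value back from an explicit byte offset.
template <typename T>
T readAt(const Staging& buf, std::size_t offset) {
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

// Write light `index` at its padded slot: base offset + index * stride.
void writeLight(Staging& buf, std::size_t index, const LightCPU& light) {
    writeAt(buf, kLightsOffset + index * kLightStride, light);
}
```

Writing lights[8] this way touches bytes 272 to 295 only, so the dummy2 value at 296 is left alone.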

Edited by pcmaster


Yes. This has broken my understanding of constant buffer alignment. The behavior seems odd: if you place lightCount after the lights array, everything breaks.

cbuffer lightData : register(b0)
{
    // If these four uints are placed after the lights array, everything breaks
    uint lightCount;
    uint _pad0;
    uint _pad1;
    uint _pad2;
    Light lights[NUM_LIGHTS];
};

 

Edited by mark_braga


If you put an array of some type inside of a cbuffer, the compiler will insert padding between elements if the size of the type is not a multiple of 16 bytes. So in your case your Light struct is 24 bytes, so you'll get 8 bytes of padding after each element in the array (except, judging by the FXC listing above, the last one, whose trailing 8 bytes stay free for whatever is declared next). It will also always start the array on a 16-byte boundary, which means there may be some padding before the array depending on what else is declared in the cbuffer.
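
To make the arithmetic concrete, here is a rough C++ sketch of that rule. It assumes the rule as stated here, plus the FXC listing earlier in the thread, is the whole story; it is not the compiler's actual algorithm. ArrayLayout, alignUp16 and layoutStructArray are hypothetical names introduced for this example.

```cpp
#include <cstddef>

// Round n up to the next multiple of 16 (one float4 register).
constexpr std::size_t alignUp16(std::size_t n) {
    return (n + 15) / 16 * 16;
}

struct ArrayLayout {
    std::size_t offset;  // byte offset of element 0 in the cbuffer
    std::size_t stride;  // bytes from one element to the next
    std::size_t size;    // bytes reported for the whole array
};

// Hypothetical helper: lay out an array of `count` structs of `elemSize`
// bytes, where `cursor` is the first free byte in the cbuffer.
constexpr ArrayLayout layoutStructArray(std::size_t cursor,
                                        std::size_t elemSize,
                                        std::size_t count) {
    const std::size_t offset = alignUp16(cursor);
    const std::size_t stride = alignUp16(elemSize);
    // Assumption taken from the FXC listing: the last element carries
    // no trailing padding, so the array ends right after its data.
    const std::size_t size = count == 0 ? 0 : stride * (count - 1) + elemSize;
    return {offset, stride, size};
}

// The cbuffer compiled earlier: uint dummy1; Light lights[9]; uint dummy2;
constexpr ArrayLayout kLights = layoutStructArray(4, 24, 9);
static_assert(kLights.offset == 16, "array starts on a 16-byte boundary");
static_assert(kLights.stride == 32, "24-byte Light padded to 32");
static_assert(kLights.size == 280, "8 * 32 + 24");
static_assert(kLights.offset + kLights.size == 296, "where dummy2 lands");
```

By the same arithmetic, a scalar declared after the array can land inside the last element's unused 8 bytes. That may be why moving lightCount after lights breaks things when the CPU side pads every Light to 32 bytes, though I have not verified that case.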


Unless you are on some NVIDIA hardware doing last-chance optimization, you should just stick to a structured buffer for large storage of constants. You do not have the exotic (not C++-compatible) alignment rules, and you do not have the restriction of updating the full buffer (partial updates need DX11.1 on Windows 8 at minimum; just an FYI, as this is not a problem in DX12 anyway).
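
For illustration, here is a small C++ sketch of why the structured buffer route is simpler on the CPU side. It assumes the shader declares the lights as a StructuredBuffer of the same Light struct and that elements are packed tightly, so the element stride is just sizeof(Light). The helper uploadSizeBytes is a hypothetical name, not a D3D call.

```cpp
#include <cstddef>
#include <vector>

// CPU mirror of the HLSL Light struct; float4 is modeled as four floats.
struct Light {
    float position[4];
    float radius;
    float intensity;
};

// Assumption: the structured buffer's element stride is the struct size itself.
constexpr std::size_t kLightStride = sizeof(Light);
static_assert(kLightStride == 24, "no per-element padding expected");
static_assert(offsetof(Light, radius) == 16, "radius follows position");
static_assert(offsetof(Light, intensity) == 20, "intensity follows radius");

// Hypothetical helper: number of bytes to upload for a list of lights.
inline std::size_t uploadSizeBytes(const std::vector<Light>& lights) {
    return lights.size() * kLightStride;
}
```

With this layout a std::vector of Light can be copied into the upload memory in one memcpy, with no manual _pad members.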

Edited by galop1n

15 minutes ago, galop1n said:

Because it would be a waste, and because it is not C++. Don't overthink it :)

Then my recommendation remains - always use shader reflection (ID3D11ShaderReflection) to get the offsets of individual members, you can assume almost nothing.

1 hour ago, pcmaster said:

Then my recommendation remains - always use shader reflection (ID3D11ShaderReflection) to get the offsets of individual members, you can assume almost nothing.

My recommendation is even better, use a StructuredBuffer for that :)
