mark_braga

DX12 Constant buffer padding for array of structs

Recommended Posts

I am confused why this code works because the lights array is not 16 bytes aligned.

struct Light
{
    float4 position;
    float radius;
    float intensity;
    // How does this work without adding
    // uint _pad0, _pad1;
};

cbuffer lightData : register(b0)
{
    uint lightCount;
    uint _pad0;
    uint _pad1;
    uint _pad2;
    // Shouldn't the shader be not able to read the second element in the light struct
    // Because after float intensity, we need 8 more bytes to make it 16 byte aligned?
    Light lights[NUM_LIGHTS];
}

This has erased everything I thought I knew about constant buffer alignment. Any explanation will help clear my head.

Thank you

Share this post


Link to post
Share on other sites

Just compile your code with FXC and see the printed layout. Are you sure it works?

I compiled this:

struct Light
{
    float4 position;
    float radius;
    float intensity;
};

cbuffer lightData : register(b0)
{
	uint dummy1;
	Light lights[9];
	uint dummy2;
};

float4 main(uint idx : SV_VertexID) : SV_Position
{
	return (dummy1 + lights[0].position * lights[7].radius + lights[8].intensity + dummy2).xxxx;
}

And got this:

// cbuffer lightData
// {
//
//   uint dummy1;                       // Offset:    0 Size:     4
//
//   struct Light
//   {
//
//       float4 position;               // Offset:   16
//       float radius;                  // Offset:   32
//       float intensity;               // Offset:   36
//
//   } lights[9];                       // Offset:   16 Size:   280
//   uint dummy2;                       // Offset:  296 Size:     4
//
// }

It will align lights[0] at 16-bytes. From this it looks like sizeof(Light)=32. What puzzles me is 32*9 = 288 and not 280 as reported. What's totally NOT understandable is dummy2 being at offset 296 = 16 + 32*9 - 8, as if it figured that in lights[8], there's 8 bytes padding, so let's put dummy2 there. Would anyone care guessing wtf?

So I totally don't understand this now :D My FXC.exe is 6.3.9600 from Windows Kit 8.1.

One thing is for sure, directly in cbuffers, you can have 4-byte constants (uint, float, int, ...) on 4-byte boundaries. Also, structs will be aligned to 16 bytes.

The final takeaway is to always query the compiled version for offsets of individual constants, so you know where to memcpy what.

Edited by pcmaster

Share this post


Link to post
Share on other sites

Yes. This has broken my understanding of the constant buffer alignment. The behavior seems odd. If you place the numLights after the lights array, everything breaks.

cbuffer lightData : register(b0)
{
    // If this part is placed after the lights array, everything breaks
  	///
    uint lightCount;
    uint _pad0;
    uint _pad1;
    uint _pad2;
  	///
    Light lights[NUM_LIGHTS];
};

 

Edited by mark_braga

Share this post


Link to post
Share on other sites

If you put an array of some type in an array inside of a cbuffer, the compiler will insert padding if the size of the type is not 16-byte aligned. So in your case your light struct is 24 bytes, so you'll get 8 bytes of padding after each element in the array. It will also always start the array on a 16-byte boundary, which means there may be some padding before the array depending on what else is declared in the cbuffer.

Share this post


Link to post
Share on other sites

unless on some nvidia hardware with last chance optimization, you should just stick to a structured buffer for large storage of constants. You do not have the exotic ( not c++ compatible ) alignment rules, you do not have the restriction of updating the full buffer ( dx11.1 windows 8 minimum for that, just saying fyi, as dx12 is not a problem here anyway ).

Edited by galop1n

Share this post


Link to post
Share on other sites
15 minutes ago, galop1n said:

Because it would be waste, and because it is not C++. Don't overthink it z)

Then my recommendation remains - always use shader reflection (ID3D11ShaderReflection) to get the offsets of individual members, you can assume almost nothing.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now


  • Forum Statistics

    • Total Topics
      628278
    • Total Posts
      2981785
  • Similar Content

    • By lubbe75
      I am looking for some example projects and tutorials using sharpDX, in particular DX12 examples using sharpDX. I have only found a few. Among them the porting of Microsoft's D3D12 Hello World examples (https://github.com/RobyDX/SharpDX_D3D12HelloWorld), and Johan Falk's tutorials (http://www.johanfalk.eu/).
      For instance, I would like to see an example how to use multisampling, and debugging using sharpDX DX12.
      Let me know if you have any useful examples.
      Thanks!
    • By lubbe75
      I'm writing a 3D engine using SharpDX and DX12. It takes a handle to a System.Windows.Forms.Control for drawing onto. This handle is used when creating the swapchain (it's set as the OutputHandle in the SwapChainDescription). 
      After rendering I want to give up this control to another renderer (for instance a GDI renderer), so I dispose various objects, among them the swapchain. However, no other renderer seem to be able to draw on this control after my DX12 renderer has used it. I see no exceptions or strange behaviour when debugging the other renderers trying to draw, except that nothing gets drawn to the area. If I then switch back to my DX12 renderer it can still draw to the control, but no other renderers seem to be able to. If I don't use my DX12 renderer, then I am able to switch between other renderers with no problem. My DX12 renderer is clearly messing up something in the control somehow, but what could I be doing wrong with just SharpDX calls? I read a tip about not disposing when in fullscreen mode, but I don't use fullscreen so it can't be that.
      Anyway, my question is, how do I properly release this handle to my control so that others can draw to it later? Disposing things doesn't seem to be enough.
    • By Tubby94
      I'm currently learning how to store multiple objects in a single vertex buffer for efficiency reasons. So far I have a cube and pyramid rendered using ID3D12GraphicsCommandList::DrawIndexedInstanced; but when the screen is drawn, I can't see the pyramid because it is drawn inside the cube. I'm told to "Use the world transformation matrix so that the box and pyramid are disjoint in world space".
       
      Can anyone give insight on how this is accomplished? 
       
           First I init the verts in Local Space
      std::array<VPosData, 13> vertices =     {         //Cube         VPosData({ XMFLOAT3(-1.0f, -1.0f, -1.0f) }),         VPosData({ XMFLOAT3(-1.0f, +1.0f, -1.0f) }),         VPosData({ XMFLOAT3(+1.0f, +1.0f, -1.0f) }),         VPosData({ XMFLOAT3(+1.0f, -1.0f, -1.0f) }),         VPosData({ XMFLOAT3(-1.0f, -1.0f, +1.0f) }),         VPosData({ XMFLOAT3(-1.0f, +1.0f, +1.0f) }),         VPosData({ XMFLOAT3(+1.0f, +1.0f, +1.0f) }),         VPosData({ XMFLOAT3(+1.0f, -1.0f, +1.0f) }),         //Pyramid         VPosData({ XMFLOAT3(-1.0f, -1.0f, -1.0f) }),         VPosData({ XMFLOAT3(-1.0f, -1.0f, +1.0f) }),         VPosData({ XMFLOAT3(+1.0f, -1.0f, -1.0f) }),         VPosData({ XMFLOAT3(+1.0f, -1.0f, +1.0f) }),         VPosData({ XMFLOAT3(0.0f,  +1.0f, 0.0f) }) } Then  data is stored into a container so sub meshes can be drawn individually
      SubmeshGeometry submesh; submesh.IndexCount = (UINT)indices.size(); submesh.StartIndexLocation = 0; submesh.BaseVertexLocation = 0; SubmeshGeometry pyramid; pyramid.IndexCount = (UINT)indices.size(); pyramid.StartIndexLocation = 36; pyramid.BaseVertexLocation = 8; mBoxGeo->DrawArgs["box"] = submesh; mBoxGeo->DrawArgs["pyramid"] = pyramid;  
      Objects are drawn
      mCommandList->DrawIndexedInstanced( mBoxGeo->DrawArgs["box"].IndexCount, 1, 0, 0, 0); mCommandList->DrawIndexedInstanced( mBoxGeo->DrawArgs["pyramid"].IndexCount, 1, 36, 8, 0);  
      Vertex Shader
       
      cbuffer cbPerObject : register(b0) { float4x4 gWorldViewProj; }; struct VertexIn { float3 PosL : POSITION; float4 Color : COLOR; }; struct VertexOut { float4 PosH : SV_POSITION; float4 Color : COLOR; }; VertexOut VS(VertexIn vin) { VertexOut vout; // Transform to homogeneous clip space. vout.PosH = mul(float4(vin.PosL, 1.0f), gWorldViewProj); // Just pass vertex color into the pixel shader. vout.Color = vin.Color; return vout; } float4 PS(VertexOut pin) : SV_Target { return pin.Color; }  

    • By HD86
      I don't know in advance the total number of textures my app will be using. I wanted to use this approach but it turned out to be impractical because D3D11 hardware may not allow binding more than 128 SRVs to the shaders. Next I decided to keep all the texture SRV's in a default heap that is invisible to the shaders, and when I need to render a texture I would copy its SRV from the invisible heap to another heap that is bound to the pixel shader, but this also seems impractical because ID3D12Device::CopyDescriptorsSimple cannot be used in a command list. It executes immediately when it is called. I would need to close, execute and reset the command list every time I need to switch the texture.
      What is the correct way to do this?
    • By DiligentDev
      Hello!
      I would like to introduce Diligent Engine, a project that I've been recently working on. Diligent Engine is a light-weight cross-platform abstraction layer between the application and the platform-specific graphics API. Its main goal is to take advantages of the next-generation APIs such as Direct3D12 and Vulkan, but at the same time provide support for older platforms via Direct3D11, OpenGL and OpenGLES. Diligent Engine exposes common front-end for all supported platforms and provides interoperability with underlying native API. It also supports integration with Unity and is designed to be used as a graphics subsystem in a standalone game engine, Unity native plugin or any other 3D application. It is distributed under Apache 2.0 license and is free to use. Full source code is available for download on GitHub. The engine contains shader source code converter that allows shaders authored in HLSL to be translated to GLSL.
      The engine currently supports Direct3D11, Direct3D12, and OpenGL/GLES on Win32, Universal Windows and Android platforms.
      API Basics
      Initialization
      The engine can perform initialization of the API or attach to already existing D3D11/D3D12 device or OpenGL/GLES context. For instance, the following code shows how the engine can be initialized in D3D12 mode:
      #include "RenderDeviceFactoryD3D12.h" using namespace Diligent; // ...  GetEngineFactoryD3D12Type GetEngineFactoryD3D12 = nullptr; // Load the dll and import GetEngineFactoryD3D12() function LoadGraphicsEngineD3D12(GetEngineFactoryD3D12); auto *pFactoryD3D11 = GetEngineFactoryD3D12(); EngineD3D12Attribs EngD3D12Attribs; EngD3D12Attribs.CPUDescriptorHeapAllocationSize[0] = 1024; EngD3D12Attribs.CPUDescriptorHeapAllocationSize[1] = 32; EngD3D12Attribs.CPUDescriptorHeapAllocationSize[2] = 16; EngD3D12Attribs.CPUDescriptorHeapAllocationSize[3] = 16; EngD3D12Attribs.NumCommandsToFlushCmdList = 64; RefCntAutoPtr<IRenderDevice> pRenderDevice; RefCntAutoPtr<IDeviceContext> pImmediateContext; SwapChainDesc SwapChainDesc; RefCntAutoPtr<ISwapChain> pSwapChain; pFactoryD3D11->CreateDeviceAndContextsD3D12( EngD3D12Attribs, &pRenderDevice, &pImmediateContext, 0 ); pFactoryD3D11->CreateSwapChainD3D12( pRenderDevice, pImmediateContext, SwapChainDesc, hWnd, &pSwapChain ); Creating Resources
      Device resources are created by the render device. The two main resource types are buffers, which represent linear memory, and textures, which use memory layouts optimized for fast filtering. To create a buffer, you need to populate BufferDesc structure and call IRenderDevice::CreateBuffer(). The following code creates a uniform (constant) buffer:
      BufferDesc BuffDesc; BufferDesc.Name = "Uniform buffer"; BuffDesc.BindFlags = BIND_UNIFORM_BUFFER; BuffDesc.Usage = USAGE_DYNAMIC; BuffDesc.uiSizeInBytes = sizeof(ShaderConstants); BuffDesc.CPUAccessFlags = CPU_ACCESS_WRITE; m_pDevice->CreateBuffer( BuffDesc, BufferData(), &m_pConstantBuffer ); Similar, to create a texture, populate TextureDesc structure and call IRenderDevice::CreateTexture() as in the following example:
      TextureDesc TexDesc; TexDesc.Name = "My texture 2D"; TexDesc.Type = TEXTURE_TYPE_2D; TexDesc.Width = 1024; TexDesc.Height = 1024; TexDesc.Format = TEX_FORMAT_RGBA8_UNORM; TexDesc.Usage = USAGE_DEFAULT; TexDesc.BindFlags = BIND_SHADER_RESOURCE | BIND_RENDER_TARGET | BIND_UNORDERED_ACCESS; TexDesc.Name = "Sample 2D Texture"; m_pRenderDevice->CreateTexture( TexDesc, TextureData(), &m_pTestTex ); Initializing Pipeline State
      Diligent Engine follows Direct3D12 style to configure the graphics/compute pipeline. One big Pipelines State Object (PSO) encompasses all required states (all shader stages, input layout description, depth stencil, rasterizer and blend state descriptions etc.)
      Creating Shaders
      To create a shader, populate ShaderCreationAttribs structure. An important member is ShaderCreationAttribs::SourceLanguage. The following are valid values for this member:
      SHADER_SOURCE_LANGUAGE_DEFAULT  - The shader source format matches the underlying graphics API: HLSL for D3D11 or D3D12 mode, and GLSL for OpenGL and OpenGLES modes. SHADER_SOURCE_LANGUAGE_HLSL  - The shader source is in HLSL. For OpenGL and OpenGLES modes, the source code will be converted to GLSL. See shader converter for details. SHADER_SOURCE_LANGUAGE_GLSL  - The shader source is in GLSL. There is currently no GLSL to HLSL converter. To allow grouping of resources based on the frequency of expected change, Diligent Engine introduces classification of shader variables:
      Static variables (SHADER_VARIABLE_TYPE_STATIC) are variables that are expected to be set only once. They may not be changed once a resource is bound to the variable. Such variables are intended to hold global constants such as camera attributes or global light attributes constant buffers. Mutable variables (SHADER_VARIABLE_TYPE_MUTABLE) define resources that are expected to change on a per-material frequency. Examples may include diffuse textures, normal maps etc. Dynamic variables (SHADER_VARIABLE_TYPE_DYNAMIC) are expected to change frequently and randomly. This post describes the resource binding model in Diligent Engine.
      The following is an example of shader initialization:
      ShaderCreationAttribs Attrs; Attrs.Desc.Name = "MyPixelShader"; Attrs.FilePath = "MyShaderFile.fx"; Attrs.SearchDirectories = "shaders;shaders\\inc;"; Attrs.EntryPoint = "MyPixelShader"; Attrs.Desc.ShaderType = SHADER_TYPE_PIXEL; Attrs.SourceLanguage = SHADER_SOURCE_LANGUAGE_HLSL; BasicShaderSourceStreamFactory BasicSSSFactory(Attrs.SearchDirectories); Attrs.pShaderSourceStreamFactory = &BasicSSSFactory; ShaderVariableDesc ShaderVars[] =  {     {"g_StaticTexture", SHADER_VARIABLE_TYPE_STATIC},     {"g_MutableTexture", SHADER_VARIABLE_TYPE_MUTABLE},     {"g_DynamicTexture", SHADER_VARIABLE_TYPE_DYNAMIC} }; Attrs.Desc.VariableDesc = ShaderVars; Attrs.Desc.NumVariables = _countof(ShaderVars); Attrs.Desc.DefaultVariableType = SHADER_VARIABLE_TYPE_STATIC; StaticSamplerDesc StaticSampler; StaticSampler.Desc.MinFilter = FILTER_TYPE_LINEAR; StaticSampler.Desc.MagFilter = FILTER_TYPE_LINEAR; StaticSampler.Desc.MipFilter = FILTER_TYPE_LINEAR; StaticSampler.TextureName = "g_MutableTexture"; Attrs.Desc.NumStaticSamplers = 1; Attrs.Desc.StaticSamplers = &StaticSampler; ShaderMacroHelper Macros; Macros.AddShaderMacro("USE_SHADOWS", 1); Macros.AddShaderMacro("NUM_SHADOW_SAMPLES", 4); Macros.Finalize(); Attrs.Macros = Macros; RefCntAutoPtr<IShader> pShader; m_pDevice->CreateShader( Attrs, &pShader ); Creating the Pipeline State Object
      To create a pipeline state object, define instance of PipelineStateDesc structure. The structure defines the pipeline specifics such as if the pipeline is a compute pipeline, number and format of render targets as well as depth-stencil format:
      // This is a graphics pipeline PSODesc.IsComputePipeline = false; PSODesc.GraphicsPipeline.NumRenderTargets = 1; PSODesc.GraphicsPipeline.RTVFormats[0] = TEX_FORMAT_RGBA8_UNORM_SRGB; PSODesc.GraphicsPipeline.DSVFormat = TEX_FORMAT_D32_FLOAT; The structure also defines depth-stencil, rasterizer, blend state, input layout and other parameters. For instance, rasterizer state can be defined as in the code snippet below:
      // Init rasterizer state RasterizerStateDesc &RasterizerDesc = PSODesc.GraphicsPipeline.RasterizerDesc; RasterizerDesc.FillMode = FILL_MODE_SOLID; RasterizerDesc.CullMode = CULL_MODE_NONE; RasterizerDesc.FrontCounterClockwise = True; RasterizerDesc.ScissorEnable = True; //RSDesc.MultisampleEnable = false; // do not allow msaa (fonts would be degraded) RasterizerDesc.AntialiasedLineEnable = False; When all fields are populated, call IRenderDevice::CreatePipelineState() to create the PSO:
      m_pDev->CreatePipelineState(PSODesc, &m_pPSO); Binding Shader Resources
      Shader resource binding in Diligent Engine is based on grouping variables in 3 different groups (static, mutable and dynamic). Static variables are variables that are expected to be set only once. They may not be changed once a resource is bound to the variable. Such variables are intended to hold global constants such as camera attributes or global light attributes constant buffers. They are bound directly to the shader object:
       
      PixelShader->GetShaderVariable( "g_tex2DShadowMap" )->Set( pShadowMapSRV ); Mutable and dynamic variables are bound via a new object called Shader Resource Binding (SRB), which is created by the pipeline state:
      m_pPSO->CreateShaderResourceBinding(&m_pSRB); Dynamic and mutable resources are then bound through SRB object:
      m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "tex2DDiffuse")->Set(pDiffuseTexSRV); m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pRandomAttrsCB); The difference between mutable and dynamic resources is that mutable ones can only be set once for every instance of a shader resource binding. Dynamic resources can be set multiple times. It is important to properly set the variable type as this may affect performance. Static variables are generally most efficient, followed by mutable. Dynamic variables are most expensive from performance point of view. This post explains shader resource binding in more details.
      Setting the Pipeline State and Invoking Draw Command
      Before any draw command can be invoked, all required vertex and index buffers as well as the pipeline state should be bound to the device context:
      // Clear render target const float zero[4] = {0, 0, 0, 0}; m_pContext->ClearRenderTarget(nullptr, zero); // Set vertex and index buffers IBuffer *buffer[] = {m_pVertexBuffer}; Uint32 offsets[] = {0}; Uint32 strides[] = {sizeof(MyVertex)}; m_pContext->SetVertexBuffers(0, 1, buffer, strides, offsets, SET_VERTEX_BUFFERS_FLAG_RESET); m_pContext->SetIndexBuffer(m_pIndexBuffer, 0); m_pContext->SetPipelineState(m_pPSO); Also, all shader resources must be committed to the device context:
      m_pContext->CommitShaderResources(m_pSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES); When all required states and resources are bound, IDeviceContext::Draw() can be used to execute draw command or IDeviceContext::DispatchCompute() can be used to execute compute command. Note that for a draw command, graphics pipeline must be bound, and for dispatch command, compute pipeline must be bound. Draw() takes DrawAttribs structure as an argument. The structure members define all attributes required to perform the command (primitive topology, number of vertices or indices, if draw call is indexed or not, if draw call is instanced or not, if draw call is indirect or not, etc.). For example:
      DrawAttribs attrs; attrs.IsIndexed = true; attrs.IndexType = VT_UINT16; attrs.NumIndices = 36; attrs.Topology = PRIMITIVE_TOPOLOGY_TRIANGLE_LIST; pContext->Draw(attrs); Build Instructions
      Please visit this page for detailed build instructions.
      Samples
      The engine contains two graphics samples that demonstrate how the API can be used.
      AntTweakBar sample demonstrates how to use AntTweakBar library to create simple user interface. It can also be thought of as Diligent Engine’s “Hello World” example.

       
      Atmospheric scattering sample is a more advanced one. It demonstrates how Diligent Engine can be used to implement various rendering tasks: loading textures from files, using complex shaders, rendering to textures, using compute shaders and unordered access views, etc. 

       
      The engine also includes Asteroids performance benchmark based on this demo developed by Intel. It renders 50,000 unique textured asteroids and lets compare performance of D3D11 and D3D12 implementations. Every asteroid is a combination of one of 1000 unique meshes and one of 10 unique textures. 

      Integration with Unity
      Diligent Engine supports integration with Unity through Unity low-level native plugin interface. The engine relies on Native API Interoperability to attach to the graphics API initialized by Unity. After Diligent Engine device and context are created, they can be used us usual to create resources and issue rendering commands. GhostCubePlugin shows an example how Diligent Engine can be used to render a ghost cube only visible as a reflection in a mirror.

       
  • Popular Now