noodleBowl

DX11 Constant buffer and names?

I've gotten to the part of my DirectX 11 project where I need to pass the MVP matrices to my vertex shader, and I'm a little lost when it comes to using a constant buffer with the vertex shader.

I understand I need to set up the constant buffer just like any other buffer:

1. Create a buffer description with the D3D11_BIND_CONSTANT_BUFFER flag
2. Map my matrix data into the constant buffer
3. Use VSSetConstantBuffers to actually use the buffer

But I get lost at the vertex shader part: how does my vertex shader know to use this constant buffer once we get to the shader side of things?

In the example I'm following they have this as their vertex shader, but I don't understand how the shader knows to use the MatrixBuffer cbuffer. They just use the members directly. What if there were multiple cbuffer declarations, like the Microsoft documentation says you can have?

//Inside vertex shader
cbuffer MatrixBuffer
{
    matrix worldMatrix;
    matrix viewMatrix;
    matrix projectionMatrix;
};

struct VertexInputType
{
    float4 position : POSITION;
    float4 color : COLOR;
};

struct PixelInputType
{
    float4 position : SV_POSITION;
    float4 color : COLOR;
};

PixelInputType ColorVertexShader(VertexInputType input)
{
    PixelInputType output;
    

    // Change the position vector to be 4 units for proper matrix calculations.
    input.position.w = 1.0f;

    // Calculate the position of the vertex against the world, view, and projection matrices.
    output.position = mul(input.position, worldMatrix);
    output.position = mul(output.position, viewMatrix);
    output.position = mul(output.position, projectionMatrix);
    
    // Store the input color for the pixel shader to use.
    output.color = input.color;
    
    return output;
}

 


You'll notice that the VS/GS/PS SetConstantBuffers functions take a start slot argument and an input array of constant buffer pointers.

When a shader is compiled, the compiler assigns a constant buffer slot to each of the cbuffers in the shader. The example shader you posted makes it easy because there is only one constant buffer, which means that it's assigned to slot/register 0. D3D11 exposes 14 constant buffer slots per shader stage (b0 through b13), but you shouldn't need that many.

I personally do not trust automatic register assignment at all! I believe it assigns them from top-to-bottom, but if you are paranoid like me you can set the constant buffer slot like so:

cbuffer MatrixBuffer : register(b0)
{
    matrix worldMatrix;
    matrix viewMatrix;
    matrix projectionMatrix;
};

where MatrixBuffer is now assigned to constant buffer slot/register 0.

Otherwise, the mapping to variables in the constant buffer itself is based on the byte data that you map into the constant buffer. I suggest that you read this article about constant buffer packing rules. You have to make sure that byte data you map in matches the appropriate data type in your shader, and that your types meet the 4-byte alignment, 16-byte boundary rules.

Feel free to ask more questions about this because it can cause wacky behavior if you aren't aware of it. For instance if you have a constant buffer like this:

cbuffer MatrixBuffer
{
    float2 SomeData;
    float4 SomeOtherData;
};

You will need to map in 8 floats of data total. Two for the opening float2, two to garbage-pad the remaining 2 floats in the 16 byte (or 4 float) boundary, and then 4 floats to map to the float4. The two floats are required for padding in order for the data to be where you expect it.
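On the C++ side, that layout could be mirrored with a struct along these lines. This is only a sketch: `PaddedBufferData`, `pad`, and `copyToMapped` are names made up for illustration, and the `memcpy` stands in for writing through the pointer you get back from Map().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical CPU-side mirror of:
//   cbuffer MatrixBuffer { float2 SomeData; float4 SomeOtherData; };
struct PaddedBufferData
{
    float someData[2];      // bytes 0..7   -> float2 SomeData
    float pad[2];           // bytes 8..15  -> unused, fills out the first 16-byte slot
    float someOtherData[4]; // bytes 16..31 -> float4 SomeOtherData
};

static_assert(offsetof(PaddedBufferData, someOtherData) == 16,
              "the float4 has to start on a 16-byte boundary");
static_assert(sizeof(PaddedBufferData) == 8 * sizeof(float),
              "8 floats in total, as described above");

// Copies the struct into `dst`, which stands in for the mapped buffer memory.
inline void copyToMapped(void* dst, const PaddedBufferData& src)
{
    assert(dst != nullptr);
    std::memcpy(dst, &src, sizeof(src));
}
```

With the padding spelled out like this, the float4 lands in floats 4..7 of whatever you copy into the buffer, and the static_asserts fail at compile time if someone reorders the members.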


Normally we explicitly define the register slots.

So for const buffers you would do:

 
cbuffer MyBuffer0 : register(b0)
{
// Declarations..
};

cbuffer MyBuffer1 : register(b1)
{
// Declarations..
};

cbuffer MyBuffer2 : register(b2)
{
// Declarations..
};

If you do not explicitly specify the register slots, the compiler will assign them for you and you have to retrieve them via HLSL reflection (which is cumbersome and error prone).

When you call VSSetConstantBuffers( 0, ... ) the 0 will correspond to MyBuffer0, and VSSetConstantBuffers( 1, .. ) will correspond to MyBuffer1, etc.

 

In the case of your buffer:

cbuffer MatrixBuffer
{
    matrix worldMatrix;
    matrix viewMatrix;
    matrix projectionMatrix;
};

If the buffer you bind via VSSetConstantBuffers is smaller than the 192 bytes required for this structure (4x4 floats x 4 bytes per float x 3 matrices), the debug layer will complain, but you are guaranteed that reading constant buffers out of bounds will return 0.
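To make the 192-byte figure concrete, here is a small C++ sketch of a CPU-side mirror of that cbuffer. `MatrixBufferData` and `constantBufferByteWidth` are hypothetical names, not part of D3D11; the helper just rounds a size up to a multiple of 16, which is what the ByteWidth of a constant buffer has to be.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side mirror of the MatrixBuffer cbuffer: three 4x4 float matrices.
struct MatrixBufferData
{
    float world[16];
    float view[16];
    float projection[16];
};

// 16 floats x 4 bytes x 3 matrices
static_assert(sizeof(MatrixBufferData) == 192, "expected 192 bytes");

// Rounds a data size up to the next multiple of 16 bytes,
// suitable as the ByteWidth of a constant buffer description.
inline std::size_t constantBufferByteWidth(std::size_t dataBytes)
{
    assert(dataBytes > 0);
    return (dataBytes + 15) / 16 * 16;
}
```

For this struct the helper returns 192 unchanged, since three matrices are already a multiple of 16 bytes; for something like a lone float2 (8 bytes) it would return 16.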

22 hours ago, BrentMorris said:

I believe it assigns them from top-to-bottom, but if you are paranoid like me you can set the constant buffer slot like so:

Indeed, the compiler will assign resource slots in increasing order based on where the resource was declared in the file. However, the catch here is that it will only do this for resources that are actually used by the shader program. So if you have 3 textures in a row, but you only use the first and third, the first texture will be assigned to t0 and the third will be assigned to t1! In those cases the only reliable way to bind things correctly is to use the reflection APIs to query the slot for each resource.
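The effect can be shown with a toy model in C++. To be clear, this is not the compiler's actual algorithm, just an illustration of "slots go to used resources in declaration order"; `assignSlots` and `slotExample` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only. used[i] says whether the i-th declared resource is
// referenced by the shader. Returns the slot given to each resource,
// or -1 for a resource that is unused and therefore gets no slot.
inline std::vector<int> assignSlots(const std::vector<bool>& used)
{
    std::vector<int> slots(used.size(), -1);
    int next = 0;
    for (std::size_t i = 0; i < used.size(); ++i)
    {
        if (used[i])
            slots[i] = next++;
    }
    return slots;
}

// Three textures declared, only the first and third used.
inline void slotExample()
{
    const std::vector<int> slots = assignSlots({true, false, true});
    assert(slots[0] == 0);  // first texture  -> t0
    assert(slots[1] == -1); // second texture -> no slot
    assert(slots[2] == 1);  // third texture  -> t1, not t2
}
```

Explicit register(...) declarations sidestep this entirely, because the slot no longer depends on which resources survive compilation.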


Awesome! When I was reading those docs I wasn't really sure what those register declarations were for. This definitely made it clearer. Thanks!

When it comes to the packing, can you explain what is meant by the 16-byte boundary rules? Are you saying that in addition to my data being 4-byte aligned, it must be divisible by 16 too?

Example:

//Example cbuffer
cbuffer test : register(b0)
{
   float a;
   float b;
}

//Above cbuffer is 4 byte aligned. 4 bytes per float * 2 floats = 8 bytes total; 8 mod 4 = 0 so meets being 4 byte aligned
//BUT does not meet 16-byte boundary rules
//4 bytes per float * 2 floats = 8 bytes total; 8 mod 16 = 8 so does not meet 16-byte boundary rules

So to have the above meet the 16-byte boundary rules I need to add 2 "garbage" floats, which would make the total bytes of the cbuffer divisible by 16

Edited by noodleBowl


No. That declaration is just fine.


What the alignment means is that if you've got:

float3 a;
float2 b;

Then the address of b when you write the data from C++ starts at 0x00000010 instead of starting at 0x0000000C, because there are 4 bytes of padding between a & b.

Please read the msdn article BrentMorris left you. It has plenty of examples on how the padding works.
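If you'd rather have the C++ compiler insert that padding for you, one option is `alignas` on the member that has to start a new 16-byte slot. A sketch, with `AB` and `offsetOfB` as made-up names:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side mirror of:  float3 a;  float2 b;
struct AB
{
    float a[3];             // bytes 0..11
    alignas(16) float b[2]; // forced to byte 16, so bytes 12..15 become padding
};

static_assert(offsetof(AB, b) == 0x10, "b starts at 0x10, not 0x0C");
static_assert(sizeof(AB) == 32, "two 16-byte slots in total");

// Runtime accessor for the same offset, handy for logging or checks.
inline std::size_t offsetOfB()
{
    return offsetof(AB, b);
}
```

Be careful not to put `alignas(16)` on every member: two consecutive floats share one slot in HLSL, but would each get their own 16 bytes in C++ if both were marked that way.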


Went back and re-read the packing article. Totally missed that part about how things are auto placed into 4 slot vectors and bumped to the next one if it does not fit entirely.

So it does make sense that my example was fine, since it was only 2 floats. And that's why the example @Matias Goldberg gave needs 2 of these 4-component vectors. The float3 fits into the 1st vector, but since the next variable, a float2, cannot be completely contained in what is left of the 1st vector, it gets placed into the next one.

One thing is still shady for me and that is the 16 byte boundary rule. I really don't understand what the article means by it. Are we just placing our variables in 16-byte blocks?

In the article they have

//2 x 16byte elements
cbuffer IE
{
  float1 val1;
  float1 val2;
  float1 val3;
  float2 val4;
}

//3 float1 x 4 bytes = 12
//1 float2 x 8 bytes =  8
//___________________  20 bytes total
//First 16 bytes placed into a container. Next 4 bytes bumped into the next 16 byte container?

Is this idea right?


In DXBC assembly, constant buffers are made up of "elements" that are 16 bytes wide. So the constant buffer will always be made up of N elements, where the total size is then 16 * N bytes. This is why you have to create your constant buffers rounded up to the next multiple of 16 bytes when you call CreateBuffer(). This is also the reason for trying to pack vector types so that they don't cross 16-byte element boundaries. DXBC is basically a virtual ISA that works in terms of 4-component vectors, which means that registers and instructions can typically work with 4 values at a time. This applies to constant buffers as well, where each element is a 16-byte value that can be treated as a 4-component vector, and can be used in instructions as if it were a register. As an example, let's look at a simple shader and its resulting DXBC output from the compiler:

cbuffer MyConstants
{
    float4 MyValue;
};

float4 PSMain() : SV_Target0
{
    return MyValue * 8.0f;
}

// ps_5_0
// dcl_globalFlags refactoringAllowed
// dcl_constantbuffer CB0[1], immediateIndexed
// dcl_output o0.xyzw
// mul o0.xyzw, cb0[0].xyzw, l(8.000000, 8.000000, 8.000000, 8.000000)
// ret

You'll see that the whole program is really just a single instruction, which basically says "multiply the first float4 element from the constant buffer by 8.0". Since "MyValue" is a float4 and lines up exactly with a constant buffer "element", the DXBC assembly can reference all of that data and multiply it with a single instruction. Now let's try another example where we split up "MyValue" so that it straddles a 16-byte boundary, which causes it to be located in two different constant buffer elements:

cbuffer MyConstants
{
    float3 SomeOtherValue;
    float MyValue_X;
    float3 MyValue_XYZ;
};

float4 PSMain() : SV_Target0
{
    return float4(MyValue_X, MyValue_XYZ) * 8.0f;
}

// ps_5_0
// dcl_globalFlags refactoringAllowed
// dcl_constantbuffer CB0[2], immediateIndexed
// dcl_output o0.xyzw
// mul o0.x, cb0[0].w, l(8.000000)
// mul o0.yzw, cb0[1].xxyz, l(0.000000, 8.000000, 8.000000, 8.000000)
// ret

In this case the compiler has to emit two separate instructions to perform the multiply, since the instruction can only use a single constant buffer element as an operand. 

Do keep in mind that this is all rather specific to the particulars of DXBC's virtual ISA, which can be (and very often is) very different from the actual native instructions executed by the GPU. For example, Nvidia and AMD have long ago dropped the notion of vector instructions within a single execution thread, and instead only work with scalar operations. So in that case a float4 multiply will always expand out to 4 individual instructions, and so it doesn't necessarily gain them anything to have the source data aligned to a 16-byte boundary in the constant buffer. The new open-source DirectX shader compiler (dxc) has a completely different (scalar) output format, and so they might even change the packing rules for that compiler in the future.
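If it helps to see the 16-byte element rule as an algorithm, here is a simplified C++ model of it. It only handles float scalars and vectors (matrices, arrays, and structs have extra rules it ignores), and `Layout` and `packFloats` are names invented for this sketch, not anything from the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Layout
{
    std::vector<std::size_t> offsets; // byte offset of each member
    std::size_t elements = 0;         // number of 16-byte elements, the N in CB0[N]
};

// Each entry in `components` is a member's component count:
// 1 = float, 2 = float2, 3 = float3, 4 = float4.
inline Layout packFloats(const std::vector<int>& components)
{
    Layout layout;
    std::size_t offset = 0;
    for (int count : components)
    {
        assert(count >= 1 && count <= 4);
        const std::size_t bytes = static_cast<std::size_t>(count) * 4;
        // A member may not straddle a 16-byte boundary: if it does not fit
        // in what is left of the current element, bump it to the next one.
        if (offset % 16 + bytes > 16)
            offset = (offset + 15) / 16 * 16;
        layout.offsets.push_back(offset);
        offset += bytes;
    }
    // The buffer is always a whole number of 16-byte elements.
    layout.elements = (offset + 15) / 16;
    return layout;
}
```

Feeding it `{3, 1, 3}` (the float3, float, float3 cbuffer above) gives offsets 0, 12, 16 and two elements, which lines up with cb0[0].w and cb0[1].xyz in the disassembly. Feeding it `{1, 1, 1, 2}` (the "cbuffer IE" example from the packing article) gives offsets 0, 4, 8, 16: the whole float2 moves to the second element rather than being split.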

15 hours ago, MJP said:

Do keep in mind that this is all rather specific to the particulars of DXBC's virtual ISA, which can be (and very often is) very different from the actual native instructions executed by the GPU. For example, Nvidia and AMD have long ago dropped the notion of vector instructions within a single execution thread, and instead only work with scalar operations. So in that case a float4 multiply will always expand out to 4 individual instructions, and so it doesn't necessarily gain them anything to have the source data aligned to a 16-byte boundary in the constant buffer. The new open-source DirectX shader compiler (dxc) has a completely different (scalar) output format, and so they might even change the packing rules for that compiler in the future.

Since I think that sounds confusing to a beginner, I'll translate it to plain English: modern GPUs no longer work like that (they don't need such crazy alignments... for the most part; there are a few exceptions not worth mentioning right now), but we're stuck with these overly conservative alignments.
