# Multiples of 16B constant buffer or Microsoft gone mad.

This topic is 1185 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

heya

i was doing my stuff when i got a weird error. i could not create a constant buffer when filling in the actual size of the structure (12B = 3 floats) in 'D3D11_BUFFER_DESC.ByteWidth'.

i debugged for hours just to discover that it works for 48B, but not 47B... then i thought: is it multiples of 16 for some reason? after a few hours I found this, saying:

Remarks

This structure is used by ID3D11Device::CreateBuffer to create buffer resources.

In addition to this structure, you can also use the CD3D11_BUFFER_DESC derived structure, which is defined in D3D11.h and behaves like an inherited class, to help create a buffer description.

If the bind flag is D3D11_BIND_CONSTANT_BUFFER, you must set the ByteWidth value in multiples of 16, and less than or equal to D3D11_REQ_CONSTANT_BUFFER_ELEMENT_COUNT.

what the hell? why? what is this? how do i know what size i need to set based on the size of my struct?

most importantly... is there another, better way?!
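For anyone landing here with the same "what size do I set" question, here is a minimal sketch of rounding a struct size up to the next multiple of 16. `RoundUpTo16` and `LightConstants` are hypothetical names made up for illustration, not part of D3D11, and the Direct3D calls appear only as comments so the snippet compiles without the Windows headers.

```cpp
#include <cstddef>

// Round a byte count up to the next multiple of 16, the granularity the
// documentation requires for ByteWidth on constant buffers.
constexpr std::size_t RoundUpTo16(std::size_t bytes)
{
    return (bytes + 15) & ~static_cast<std::size_t>(15);
}

// Hypothetical constants struct: 3 floats = 12 bytes (assuming 4-byte float).
struct LightConstants
{
    float direction[3];
};

static_assert(sizeof(LightConstants) == 12, "three floats are 12 bytes");
static_assert(RoundUpTo16(sizeof(LightConstants)) == 16, "12 rounds up to 16");
static_assert(RoundUpTo16(47) == 48, "47 rounds up to 48");
static_assert(RoundUpTo16(48) == 48, "multiples of 16 are unchanged");

// Intended use with Direct3D 11 (not compiled here):
//   D3D11_BUFFER_DESC desc = {};
//   desc.ByteWidth = static_cast<UINT>(RoundUpTo16(sizeof(LightConstants)));
//   desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
```

One caveat, as far as I understand it: if ByteWidth is larger than the struct, any initial data handed to CreateBuffer should still cover the full ByteWidth, so padding the struct itself to a multiple of 16 is often the simpler route.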

##### Share on other sites

Also, alignment.

a-har a-har a-har... not.

seriously, it's one thing to read documentation, another to read hundreds of pages just to prevent something you have no idea about. most functions and other programming instruments offer functionality that isn't needed most of the time. that's like reading about rocket science just because you are interested in a fuel tank that, by coincidence, is also used in rockets.

##### Share on other sites

You'll want to familiarize yourself with constant buffer packing rules.

this? because HLSL expects 16B-aligned data for faster processing (a la less address computation)?

Edited by InfoGeek

##### Share on other sites

One of the most, if not the most, useful programming skills you will learn is how to learn. I'm not saying you don't know how to learn - I'm saying learning how to learn to use an API (identifying what is important to get started, what is the "functionality that isn't needed most of the time", and so on) or language or piece of hardware is a skill in itself that takes practice and experience.

Understanding that data alignment is a fundamental concept with 3d APIs (or nearly any hardware-related API really) is something you must learn. Now, when you go and try out a new 3d API in the future, you won't blindly dig through 100s of pages of documentation, but instead you will say "ah yes I bet it's this".

##### Share on other sites

One of the most, if not the most, useful programming skills you will learn is how to learn. I'm not saying you don't know how to learn - I'm saying learning how to learn to use an API (identifying what is important to get started, what is the "functionality that isn't needed most of the time", and so on) or language or piece of hardware is a skill in itself that takes practice and experience.

Understanding that data alignment is a fundamental concept with 3d APIs (or nearly any hardware-related API really) is something you must learn. Now, when you go and try out a new 3d API in the future, you won't blindly dig through 100s of pages of documentation, but instead you will say "ah yes I bet it's this".

while i agree, this doesn't really help. or are you implying that i should go and learn everything by myself? if so then, socialisation and tools to collaboratively solve problems exist for a reason.

##### Share on other sites

OpenGL aligns buffer range binding to 32 bytes (for most cards). JVM aligns object's fields to 8 bytes (which is important to estimate the size of your objects). Alignment is everywhere dude.

##### Share on other sites

Almost every piece of hardware has alignment requirements.

There is really only one exception:  The x86 family decided in the early 1980s to allow misaligned integers and to (silently) take a performance hit instead. Here's one of many results, suggesting somewhere around a 2x performance hit, although it is hard to be certain due to the nature of modern processors. Back in the 386 era it was closer to a 3x performance hit. In 2011 Intel "fixed" them starting in the Sandy Bridge architecture, I'm not sure about AMD. Of course, the value can still cross two cache lines which is still somewhat painful.

Just about every other chipset out there will crash on misaligned values.

Keep your data properly aligned. Don't force your compiler to use strange packing values. It is usually a sign you're doing something wrong.

##### Share on other sites

thanks Samith for the debug layer, it seems to report the alignment requirement. for anyone who views this thread with similar problems this is a note to pay attention to:

If you run your application with the debug layer enabled, the application will be substantially slower. But, to ensure that your application is clean of errors and warnings before you ship it, use the debug layer.

i can see how the 2x figure works out: 2 reads instead of 1 to obtain the whole value (yeah, optimisation might lower the hit, but purely theoretically i think that's about right).

1. i am aware that alignment exists and of its impact, but i have only met and understood 4B and 8B alignment, because that's the "word"/"dword" size and makes sense for 32-bit and 64-bit reads. i can't imagine why 16B would be forced...

2. assuming it is as it is, what would be a solution to this problem?
do i just ignore the 4 byte overhead (or 12B to 15B if i only wanted to pass 1 char)? if ignoring is an option then i might as well create more variables "for free" in the remaining 16B alignment gaps with no additional performance hit, yeah?

Edited by InfoGeek

##### Share on other sites

One of the most, if not the most, useful programming skills you will learn is how to learn. I'm not saying you don't know how to learn - I'm saying learning how to learn to use an API (identifying what is important to get started, what is the "functionality that isn't needed most of the time", and so on) or language or piece of hardware is a skill in itself that takes practice and experience.

Understanding that data alignment is a fundamental concept with 3d APIs (or nearly any hardware-related API really) is something you must learn. Now, when you go and try out a new 3d API in the future, you won't blindly dig through 100s of pages of documentation, but instead you will say "ah yes I bet it's this".

while i agree, this doesn't really help. or are you implying that i should go and learn everything by myself? if so then, socialisation and tools to collaboratively solve problems exist for a reason.

Honestly, I was just responding to your second post and trying to encourage you about this process (including the socialization that has occurred because of it) since you seemed frustrated in that post.

Edited by achild

##### Share on other sites

Honestly, I was just responding to your second post and trying to encourage you since you seemed frustrated.

i failed to grasp that. i thank you and apologise at the same time.

##### Share on other sites

Honestly, I was just responding to your second post and trying to encourage you since you seemed frustrated.

i failed to grasp that. i thank you and apologise at the same time.

Don't even worry about it - I should have quoted what I was responding to.

##### Share on other sites

Don't even worry about it - I should have quoted what I was responding to.

The GPU registers are 16 bytes wide or 4 floats.  The alignment ensures the constants fit nicely in the registers without straddling them. A float4 stored across two registers would be inefficient. On the HLSL side, cbuffer structures are automatically aligned and padded as necessary.  On the C++ side, you'll need to manually pad your structures to match the "hidden" padding on the HLSL side.
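The manual padding on the C++ side described above could look roughly like the sketch below. The cbuffer in the comment and the names `LightParams`, `pad0` and so on are made up for illustration. The assumed HLSL rule is that a member may not straddle a 16-byte register, so a `float3` following another `float3` starts in the next register, while a lone `float` can fill the leftover fourth slot.

```cpp
#include <cstddef>

// Assumed HLSL side (hypothetical names):
//   cbuffer LightParams
//   {
//       float3 lightDir;    // register 0, xyz
//       float3 lightColor;  // would straddle registers 0 and 1, so it starts at register 1
//       float  intensity;   // fits in the w slot of register 1
//   };

// C++ mirror with the hidden HLSL padding written out explicitly.
struct LightParams
{
    float lightDir[3];    // bytes 0..11
    float pad0;           // bytes 12..15, matches the gap HLSL inserts
    float lightColor[3];  // bytes 16..27
    float intensity;      // bytes 28..31
};

static_assert(offsetof(LightParams, lightColor) == 16, "lightColor starts a new 16-byte register");
static_assert(offsetof(LightParams, intensity) == 28, "intensity shares lightColor's register");
static_assert(sizeof(LightParams) % 16 == 0, "usable directly as ByteWidth");
```

The static_asserts are there so a later edit to the struct that breaks the layout fails at compile time instead of silently feeding garbage to the shader.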

that explains it. is it written somewhere in the documentation?

also i'm still hoping to get an endorsement of the "free" padding notion, because i wouldn't want to run into some little detail that would hurt me in the long run if i get it wrong.

Edited by InfoGeek

##### Share on other sites

also i'm still hoping to get an endorsement of the "free" padding notion, because i wouldn't want to run into some little detail that would hurt me in the long run if i get it wrong.

I'm not entirely sure what you mean by "free", but here's an example: if you have a constant buffer that contains only a single float (so, 4 bytes), then no matter what that constant buffer will be 16 bytes wide, even if you only care about the 4 byte float. So yeah, you have 12 bytes that you can put stuff in without increasing the size of your constant buffer. So, the extra space is essentially "free".
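To make the "free" space concrete, here is a small sketch with two hypothetical structs (`TimeOnly` and `TimeAndWind` are invented names). It assumes 4-byte floats and a matching HLSL cbuffer of `float time; float2 windDir; float windStrength;`, which should pack into a single 16-byte register.

```cpp
#include <cstddef>

// One useful float, padded out to the 16 bytes the buffer occupies anyway.
struct TimeOnly
{
    float time;
    float unused[3];  // 12 bytes that exist whether or not they carry data
};

// Same footprint, but the spare 12 bytes now hold extra values.
struct TimeAndWind
{
    float time;
    float windDir[2];
    float windStrength;
};

static_assert(sizeof(TimeOnly) == 16, "one register");
static_assert(sizeof(TimeAndWind) == sizeof(TimeOnly), "the extra members cost no additional space");
```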

##### Share on other sites

I'm not entirely sure what you mean by "free", but here's an example: if you have a constant buffer that contains only a single float (so, 4 bytes), then no matter what that constant buffer will be 16 bytes wide, even if you only care about the 4 byte float. So yeah, you have 12 bytes that you can put stuff in without increasing the size of your constant buffer. So, the extra space is essentially "free".

that's exactly what i meant by "free".

thanks everyone, it seems you have explained everything i need (unless there is actually a better way to pass 3 floats to HLSL than a constant buffer that wastes alignment space).

edit: unless megadan wants to provide some links on where he got the information on GPU registers being 16B and HLSL auto-padding :3

Edited by InfoGeek

##### Share on other sites

edit: unless megadan wants to provide some links on where he got the information on GPU registers being 16B and HLSL auto-padding :3

The "Dimension" column shows the size of registers in "components". For the ConstantBuffer registers: 4 components x 4 bytes each (one 32-bit float or int per component) = 16 bytes.

##### Share on other sites

Also, alignment.

a-har a-har a-har... not.

seriously, it's one thing to read documentation, another to read hundreds of pages just to prevent something you have no idea about. most functions and other programming instruments offer functionality that isn't needed most of the time. that's like reading about rocket science just because you are interested in a fuel tank that, by coincidence, is also used in rockets.

RTFM. When reading MSDN documentation always, ALWAYS, read the remarks section! In this case the alignment of 16 bytes is used because you can nicely stuff 4 floats in there, or a vector4. Coincidentally, CPUs like this alignment because they can now copy a nice block to another nice block without having to do extra address computations.

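Since you specify your own buffers, you can enforce the alignment by putting `__declspec( align( 16 ) )` before struct in MSVC.

```cpp
// MSVC-specific: aligns MyStruct to 16 bytes and rounds its size up to a multiple of 16.
__declspec( align( 16 ) ) struct MyStruct
{
    // constant buffer members go here
};
```

This will pad your structure so its size is a multiple of 16 bytes and it always lands on a 16 byte boundary.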
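If you are not tied to MSVC, the standard C++11 `alignas` specifier should give the same result. The sketch below uses a made-up `CameraConstants` struct and assumes 4-byte floats. Note that this only fixes the total size and the starting address; it does not reproduce HLSL's per-member packing, so padding between members still has to be written by hand.

```cpp
#include <cstddef>

// Standard C++11 spelling of the same idea as __declspec(align(16)).
struct alignas(16) CameraConstants
{
    float eyePosition[3];  // 12 bytes of data
    // the compiler adds 4 bytes of tail padding to reach a multiple of 16
};

static_assert(alignof(CameraConstants) == 16, "aligned to a 16-byte boundary");
static_assert(sizeof(CameraConstants) == 16, "12 bytes of data padded up to 16");
```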

And even in D3D9 times MS actually always wrote to the shaders on a 16 byte boundary: in the effect code everything was sent to the shader as a float4, even if you called a setFloatParam on the effect or shader. MS has just made it easier for you to construct your buffers, without lifting this restriction.

Edited by NightCreature83
