# DX11 HLSL Addition of two float4 yields zero


Posted (edited)
26 minutes ago, Magogan said:

You do realize that the dot product can be negative and saturate clamps it to the interval from 0 to 1, so it would be 0?

Oh ok, I didn't realize that if it's negative it would clamp to zero. Then it makes sense.
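
For anyone following along, here is a minimal C++ sketch of what the quoted reply describes. `Vec3`, `dot`, and `saturate` are hypothetical stand-ins written for illustration (they mirror the HLSL intrinsics of the same names, but this is not shader code). When the surface normal faces away from the light, the dot product is negative, and clamping it to [0, 1] turns it into 0.

```cpp
#include <algorithm>

// Hypothetical stand-in for an HLSL float3.
struct Vec3 { float x, y, z; };

// Plain three-component dot product.
float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Mirrors what HLSL saturate() does: clamp the value to [0, 1].
float saturate(float value) {
    return std::min(std::max(value, 0.0f), 1.0f);
}
```

With a normal of (0, 0, 1) and a light direction of (0, 0, -1), `dot` returns -1 and `saturate` maps that to 0, so any color multiplied by the result becomes zero.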

So all the buffers need to be multiples of 16 bytes? So the declspec just makes sure that they are aligned to boundaries when allocated but doesn't pad them automatically?

And should I also do it on this struct as well?

__declspec(align(16)) struct CameraPosition
{
    DirectX::XMFLOAT3 EyePosition;
};


Hold on, I just tested it and it appears declspec automatically pads the class as well.

I did sizeof(CameraPosition) without declspec and it was 12, with declspec the size was 16. So why do I need to pad anything?
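
The measurement above can be reproduced without DirectX. This sketch uses the standard `alignas(16)` as a portable stand-in for `__declspec(align(16))` and a hypothetical `Float3` in place of `DirectX::XMFLOAT3`. Both substitutions are assumptions made so the snippet compiles outside MSVC.

```cpp
#include <cstddef>

// Hypothetical stand-in for DirectX::XMFLOAT3 (three tightly packed floats).
struct Float3 { float x, y, z; };

// No alignment request: the struct is exactly as large as its member.
struct CameraPositionPlain {
    Float3 EyePosition;
};

// 16-byte alignment request: the compiler appends trailing padding so that
// the size becomes a multiple of the alignment.
struct alignas(16) CameraPositionAligned {
    Float3 EyePosition;
};

static_assert(sizeof(CameraPositionPlain) == 12, "three floats, no padding");
static_assert(sizeof(CameraPositionAligned) == 16, "rounded up to the alignment");
static_assert(alignof(CameraPositionAligned) == 16, "alignment request applied");
```

Note that the alignment specifier only adds padding at the end of the struct. It does not move members that sit in the middle of it.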

Edited by VanillaSnake21

##### Share on other sites
21 minutes ago, VanillaSnake21 said:

So all the buffers need to be multiples of 16 bytes? So the declspec just makes sure that they are aligned to boundaries when allocated but doesn't pad them automatically?

Yes and no. __declspec(align(16)) only works for some stack allocations (normal definitions, but apparently not function arguments), not for heap allocations. It does add padding at the end to make the size a multiple of 16, though. But the structure itself doesn't even need to be aligned on the CPU side unless you use XMVECTOR or XMMATRIX (which you shouldn't do; just use XMFLOAT(n) or XMFLOAT(n)X(m)). What you do need is to lay out the vectors so that none of them straddles a 16-byte boundary. For example, float3 float3 float float wouldn't work, because the first element of the second float3 is in the first 16 bytes and the other two elements are in the second 16-byte chunk. Just add padding or reorder so that this doesn't happen (in this case float3 float float3 float) and you'll be fine.
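
The 16-byte rule can be sketched as a small C++ helper. `hlslOffsets` and `packedOffsets` are hypothetical names invented for this illustration, and the rule implemented here is a simplification that only covers scalars and vectors built from 4-byte components (float, float2, float3, float4). Arrays, matrices, and nested structs follow additional rules that are not modeled.

```cpp
#include <cstddef>
#include <vector>

// Byte offsets under the simplified constant-buffer rule: a member that would
// cross a 16-byte boundary starts at the next 16-byte boundary instead.
// Sizes are in bytes and are assumed to be between 4 and 16.
std::vector<std::size_t> hlslOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t size : sizes) {
        const std::size_t firstChunk = cursor / 16;
        const std::size_t lastChunk = (cursor + size - 1) / 16;
        if (firstChunk != lastChunk) {
            cursor = (firstChunk + 1) * 16;  // skip ahead to the next chunk
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}

// Byte offsets of the same members when they are simply placed back to back,
// which is how a C++ struct of floats is laid out.
std::vector<std::size_t> packedOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t size : sizes) {
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

For sizes {12, 12, 4, 4} (float3 float3 float float), `hlslOffsets` gives 0, 16, 28, 32 while `packedOffsets` gives 0, 12, 24, 28, so the two sides disagree from the second member on. For {12, 4, 12, 4} (float3 float float3 float), both give 0, 12, 16, 28.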

You should group together the data in the constant buffers based on when you update them (once per frame, multiple times per frame, etc.) instead of using a lot of constant buffers.

##### Share on other sites
2 minutes ago, Magogan said:

Yes and no. __declspec(align(16)) only works for some stack allocations (normal definitions, but apparently not function arguments), not for heap allocations. It does add padding at the end to make the size a multiple of 16, though. But the structure itself doesn't even need to be aligned on the CPU side unless you use XMVECTOR or XMMATRIX (which you shouldn't do; just use XMFLOAT(n) or XMFLOAT(n)X(m)). What you do need is to lay out the vectors so that none of them straddles a 16-byte boundary. For example, float3 float3 float float wouldn't work, because the first element of the second float3 is in the first 16 bytes and the other two elements are in the second 16-byte chunk. Just add padding or reorder so that this doesn't happen (in this case float3 float float3 float) and you'll be fine.

You should group together the data in the constant buffers based on when you update them (once per frame, multiple times per frame, etc.) instead of using a lot of constant buffers.

This is just really new stuff to me, so I'm trying to get the hang of it. I currently use XMVECTOR and XMMATRIX liberally in my CPU structures any time I need computations. You're saying I shouldn't do that for speed reasons? What can I use instead, since XMFLOAT doesn't allow any kind of math operations (can't add them, multiply them, etc.)?

Also, you're saying that I can have unaligned structures on the CPU but have them aligned on the GPU; I just want to make sure I understand that correctly. So on the CPU it's sufficient to just use declspec, and I don't necessarily have to add the padding elements to the structure?

This is what I mean:

//CPU
__declspec(align(16)) struct DirectionalLight
{
    DirectX::XMFLOAT3 LightDirection;
    DirectX::XMFLOAT4 LightColor;
};

//GPU
cbuffer dlight
{
    float3 direction;
    float4 color;
};

So that should be fine, and it's not going to mess with the buffer copying or anything like that (since the CPU structure has one fewer element)?

Quote

You should group together the data in the constant buffers based on when you update them (once per frame, multiple times per frame, etc.) instead of using a lot of constant buffers.

But what if I want to update the buffer in various places in my code? For example, let's say I have this buffer:

__declspec(align(16)) struct cbuff_perframe
{
    DirectX::XMMATRIX View;
    DirectX::XMMATRIX Projection;

    DirectX::XMMATRIX World;
    DirectX::XMFLOAT3 CameraEyePosition;
};

And I want to update View and Projection inside the main Update method of the app, but I'd like to update the World matrix and the CameraEyePosition inside the update methods of other objects. How would I do that?

In other words, how can I update, let's say, just the CameraEyePosition in one place and World in another place if they're in the same buffer?

##### Share on other sites
Posted (edited)

You can use XMVECTOR and XMMATRIX, but those must be aligned to 16 bytes. If you want to have them on the heap, things get really messy, as you would need to use _aligned_malloc or something similar. So, for computations use XMVECTOR and XMMATRIX; if you need to store data on the heap in general (that includes the buffers but also other classes and structs), use the XMFLOAT* data types.

The dlight structure in the shader is correct, but the one on the CPU side needs to match it. What I was trying to say is that the structure itself doesn't need to be aligned (its memory address doesn't need to be a multiple of 16), but the elements inside need to have the same layout as in the shader and they need to be organized into 16-byte chunks.

So both on the CPU and GPU the layout needs to be:

------ start of structure, cpu memory location doesn't need to be multiple of 16 (e.g. 0xCDF12528 would be fine).
float3
float
------ 16 bytes
float4
------ 32 bytes
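
As a rough C++ sketch of that layout, the struct below adds the explicit padding float and checks the offsets at compile time. `Float3` and `Float4` are hypothetical stand-ins for `DirectX::XMFLOAT3` and `DirectX::XMFLOAT4`, `alignas(16)` stands in for `__declspec(align(16))`, and the `Padding` member name is made up for this example.

```cpp
#include <cstddef>

// Hypothetical stand-ins for DirectX::XMFLOAT3 and DirectX::XMFLOAT4.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Alignment on the struct alone: LightColor still starts at byte 12, right
// after LightDirection, so it does not match the shader's layout.
struct alignas(16) DirectionalLightUnpadded {
    Float3 LightDirection;
    Float4 LightColor;
};
static_assert(offsetof(DirectionalLightUnpadded, LightColor) == 12,
              "color straddles the first 16-byte boundary");
static_assert(sizeof(DirectionalLightUnpadded) == 32,
              "28 bytes of data rounded up by trailing padding");

// Explicit padding float: LightColor now starts at byte 16, as in the diagram.
struct DirectionalLight {
    Float3 LightDirection;
    float  Padding;  // fills the rest of the first 16-byte chunk
    Float4 LightColor;
};
static_assert(offsetof(DirectionalLight, LightColor) == 16,
              "color starts on a 16-byte boundary");
static_assert(sizeof(DirectionalLight) == 32, "two 16-byte chunks");
```

Both structs are 32 bytes, so checking `sizeof` alone would not reveal the mismatch; the member offsets are what matter.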

I don't know of any way to update only parts of the constant buffer, so just update the whole one or use multiple buffers.
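
One possible way to "update the whole one" while still setting fields from different places is to keep a CPU-side copy and upload it in one go. This is only a sketch under assumptions, not something from the posts above: `PerFrameShadow`, `flush`, and the byte vector standing in for the GPU buffer are hypothetical, and `Matrix4` and `Float3` are stand-ins for the XMFLOAT* storage types.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for DirectX::XMFLOAT3 and DirectX::XMFLOAT4X4.
struct Float3 { float x, y, z; };
struct Matrix4 { float m[4][4]; };

// CPU-side mirror of the constant buffer, padded to a multiple of 16 bytes.
struct PerFrameData {
    Matrix4 View;
    Matrix4 Projection;
    Matrix4 World;
    Float3  CameraEyePosition;
    float   Padding;
};
static_assert(sizeof(PerFrameData) == 208, "three matrices plus one 16-byte chunk");

// Different parts of the program write individual fields into the CPU copy;
// the whole struct is uploaded once, and only if something changed.
struct PerFrameShadow {
    PerFrameData data{};
    bool dirty = false;

    void setWorld(const Matrix4& world) { data.World = world; dirty = true; }
    void setCameraEyePosition(const Float3& eye) { data.CameraEyePosition = eye; dirty = true; }

    // Copies the full struct into `destination` (a stand-in for the GPU
    // buffer) and returns the number of bytes written, or 0 if unchanged.
    std::size_t flush(std::vector<unsigned char>& destination) {
        if (!dirty) return 0;
        destination.resize(sizeof(PerFrameData));
        std::memcpy(destination.data(), &data, sizeof(PerFrameData));
        dirty = false;
        return sizeof(PerFrameData);
    }
};
```

In a real D3D11 application the destination would be the constant buffer itself, for example through `ID3D11DeviceContext::UpdateSubresource`. Splitting the data into a per-frame and a per-object buffer, as suggested earlier in the thread, is the other option.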

Edited by Magogan

##### Share on other sites
Posted (edited)
6 minutes ago, Magogan said:

You can use XMVECTOR and XMMATRIX, but those must be aligned to 16 bytes. If you want to have them on the heap, things get really messy, as you would need to use _aligned_malloc or something similar. So, for computations use XMVECTOR and XMMATRIX; if you need to store data in a buffer or a struct, use the XMFLOAT* data types.

The dlight structure in the shader is correct, but the one on the CPU side needs to match it. What I was trying to say is that the structure itself doesn't need to be aligned (its memory address doesn't need to be a multiple of 16), but the elements inside need to have the same layout as in the shader and they need to be organized into 16-byte chunks.

So both on the CPU and GPU the layout needs to be:

------ start of structure, cpu memory location doesn't need to be multiple of 16 (e.g. 0xCDF12528 would be fine).
float3
float
------ 16 bytes
float4
------ 32 bytes

I don't know of any way to update only parts of the constant buffer, so just update the whole one or use multiple buffers.

Ok, I'll keep it in mind. Thanks for your help!

Edited by VanillaSnake21
