# Copy from UAV to cbuffer in DX11 without Map/Unmap


Hi guys,
is it possible to copy from a RWStructuredBuffer<float2x4> to a cbuffer of the same size using the CopyResource function?
According to MSDN, it should work if the size, format, etc. are the same.
There is a note: "You can't use an Immutable resource as a destination." I guess by immutable they mean D3D11_USAGE_IMMUTABLE, so I used D3D11_USAGE_DEFAULT instead.

The RWStructuredBuffer<float2x4> is created like this:

D3D11_BUFFER_DESC desc = {}; // zero-initialize so unset fields (e.g. CPUAccessFlags) are not garbage
desc.ByteWidth = 2048; // 64 lights * size of float2x4
desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
desc.StructureByteStride = 32; // size of float2x4
desc.Usage = D3D11_USAGE_DEFAULT;
hr = m_p_device->CreateBuffer(&desc, 0, &sourceBuffer);

D3D11_UNORDERED_ACCESS_VIEW_DESC uavd = {}; // zero-initialize so Buffer.Flags is 0
uavd.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavd.Format = DXGI_FORMAT_UNKNOWN; // required for structured buffers
uavd.Buffer.FirstElement = 0;
uavd.Buffer.NumElements = 64;
hr = m_p_device->CreateUnorderedAccessView(sourceBuffer, &uavd, &sourceBufferView);

// generate 64 lights and store them in sourceBuffer

Then the cbuffer is created like this:

D3D11_BUFFER_DESC desc = {}; // zero-initialize so unset fields (e.g. CPUAccessFlags) are not garbage
desc.ByteWidth = 2048; // 64 lights * size of float2x4
desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
desc.Usage = D3D11_USAGE_DEFAULT;
hr = m_p_device->CreateBuffer(&desc, 0, &destinationBuffer);

Then the copy is done via a deferred context:

m_p_deferred_context->CopyResource(destinationBuffer, sourceBuffer);

// call the final lighting shader

In my lighting shader I have 64 lights, each with a float4 for color and a float4 for position in view space, hence float2x4.
The colors and positions of the lights are generated on the fly in another shader, so I store them in a RWStructuredBuffer<float2x4>.
Then in my final lighting shader I have to read all 64 lights per pixel, so I could just read the data back from the RWStructuredBuffer<float2x4>.
However, since I'm doing tons of other texture reads, I suspect this thrashes the texture cache, because I get a huge FPS drop.
So I tried moving the RWStructuredBuffer<float2x4> data into a cbuffer and got almost double the performance.
The problem is that the data layout of these two buffers appears to be different somehow.
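To keep the magic numbers 2048 and 32 honest, the light layout described above can be mirrored on the CPU side and checked at compile time. This is only a sketch: the struct `GILight` and the constant names are hypothetical and not part of the original code, and it assumes each light is exactly one float4 color followed by one float4 view-space position.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of one light as described in the post:
// a float4 color followed by a float4 position in view space.
struct GILight {
    float color[4];
    float viewPosition[4];
};

constexpr std::size_t kLightCount       = 64;
constexpr std::size_t kLightStride      = sizeof(GILight);            // candidate for StructureByteStride
constexpr std::size_t kLightBufferBytes = kLightCount * kLightStride; // candidate for ByteWidth

// These match the values hard-coded in the buffer descriptions.
static_assert(kLightStride == 32, "one light should be 8 floats");
static_assert(kLightBufferBytes == 2048, "64 lights * 32 bytes");

// A constant buffer's ByteWidth has to be a multiple of 16.
static_assert(kLightBufferBytes % 16 == 0, "cbuffer size must be 16-byte aligned");
```

Note that this only checks the sizes passed to CreateBuffer. It says nothing about how the shader compiler lays out `float2x4` inside the cbuffer, which is where the two buffers can still disagree.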

For debugging, I divided the screen into 8x8=64 squares, and every square displays the color of one light from the RWStructuredBuffer<float2x4>.
If I read it as a RWStructuredBuffer<float2x4>, everything is correct: a few red, green and white lights.

However, if I read it from the copied cbuffer instead, the color channels are somehow messed up.
Obviously some data was copied, and even the pattern was preserved.

Any idea what could have happened and how to do it correctly?

I could just do Map/Unmap, but since it's a deferred context, that's a bit tricky. Moreover, I'd like to avoid any CPU communication and another staging buffer, so I'd like to just use CopyResource.

Thanks.

Edited by gamer9xxx

##### Share on other sites

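float2x4 stuff[64]; is not 2048 bytes in a constant buffer, it's 4096 bytes, because each 'register' in a constant buffer is padded to float4. With the default column-major packing, each of the four float2 columns takes a whole register.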

No such padding occurs with a StructuredBuffer, so perhaps you're copying a 2048-byte structured buffer into the first half of a constant buffer that the compiler expects to be 4096 bytes? You probably wanted float4x2 stuff[64] instead?

Can you show me your cbuffer layout so we can be sure that that's the problem? I expect either you've only got half the data in the right place or it has been transposed between float2x4 and float4x2.
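The padding argument can be checked with a little arithmetic. The following is a minimal C++ sketch, not D3D code: the helper names `cbufferMatrixArrayBytes` and `structuredMatrixArrayBytes` are made up for illustration, and it assumes the usual constant-buffer rule that every row (row-major) or column (column-major) of a float matrix occupies its own 16-byte register, with the total rounded up to whole registers.

```cpp
#include <cstddef>

// Bytes taken by `count` elements of a rows x cols float matrix array in a
// constant buffer, rounded up to whole 16-byte registers.
// Assumption: one register per row when row-major, one per column when
// column-major (the HLSL default).
constexpr std::size_t cbufferMatrixArrayBytes(std::size_t rows, std::size_t cols,
                                              std::size_t count, bool rowMajor) {
    return count * (rowMajor ? rows : cols) * 16;
}

// Bytes taken by the same array in a structured buffer, where the floats
// are tightly packed with no register padding.
constexpr std::size_t structuredMatrixArrayBytes(std::size_t rows, std::size_t cols,
                                                 std::size_t count) {
    return count * rows * cols * sizeof(float);
}

// float2x4 stuff[64], default column-major: 4 registers per element.
static_assert(cbufferMatrixArrayBytes(2, 4, 64, false) == 4096, "padded");
// float4x2 stuff[64], default column-major: 2 registers per element.
static_assert(cbufferMatrixArrayBytes(4, 2, 64, false) == 2048, "tight");
// RWStructuredBuffer<float2x4> with 64 elements.
static_assert(structuredMatrixArrayBytes(2, 4, 64) == 2048, "tight");
```

Under these assumptions, a 2048-byte copy fills only half of what the shader expects for a column-major `float2x4 stuff[64]`.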

##### Share on other sites

It seems you are right.
My cbuffer looks as you wrote.

cbuffer GILights : register(b2)
{
float2x4 GIColorViewPosition[64];
};

But when I change it to float4x2, I run into a problem when I try to read this:

float4 color = GIColorViewPosition[ i ][ 0 ];

The compiler complains that it cannot convert float2 to float4. Perhaps it's related to the fact that I compile the shader with D3DCOMPILE_PACK_MATRIX_ROW_MAJOR.

Does this flag really affect not just the matrix type, but all the float#x# types and all the related int, bool, etc. versions of them?

When I store the lights in a RWStructuredBuffer<float4x2> and then read them from a RWStructuredBuffer<float2x4>, I get exactly the same broken image, so it must be the problem you just described.

Edited by gamer9xxx

##### Share on other sites

float4x2 and float2x4 are every bit as much a 'matrix' as float4x4 for the purposes of packing.

/Zpr (row-major packing) affects float2x4/float4x2 as well. With /Zpr, float2x4 stuff[64] takes 2048 bytes and float4x2 stuff[64] takes 4096 bytes; without the flag, the sizes are swapped.

This shader, when compiled with /Zpr, has a 2048-byte constant buffer and reads float4s:

cbuffer B
{
float2x4 stuff[64];
}

float4 main(uint i : I) : SV_TARGET
{
return stuff[i][0] + stuff[i][1];
}
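To see why the packing order decides whether `stuff[i][0]` comes back as a sensible float4, here is a small CPU-side model of the two layouts. It is a sketch under assumptions, not the compiler's actual code: `cbufferFloatIndex`, `readRow` and the sample data `kTwoLights` are hypothetical, and the source bytes are assumed to be 8 tightly packed floats per light, color first and position second.

```cpp
#include <cstddef>

struct Float4 { float x, y, z, w; };

// Flat float index of component (row r, column c) of element i of a
// float2x4 array in a constant buffer.
// Row-major: each row is one register, 2 registers per element.
// Column-major: each column is one register (only .xy used), 4 per element.
constexpr std::size_t cbufferFloatIndex(std::size_t i, std::size_t r,
                                        std::size_t c, bool rowMajor) {
    return rowMajor ? (i * 2 + r) * 4 + c
                    : (i * 4 + c) * 4 + r;
}

// Models what stuff[i][row] would fetch from the raw buffer contents.
constexpr Float4 readRow(const float* raw, std::size_t i, std::size_t row,
                         bool rowMajor) {
    return Float4{ raw[cbufferFloatIndex(i, row, 0, rowMajor)],
                   raw[cbufferFloatIndex(i, row, 1, rowMajor)],
                   raw[cbufferFloatIndex(i, row, 2, rowMajor)],
                   raw[cbufferFloatIndex(i, row, 3, rowMajor)] };
}

// Hypothetical sample: two tightly packed lights (color, then position).
constexpr float kTwoLights[16] = {
    1, 0, 0, 1,   10, 11, 12, 1,   // light 0: red,   position (10, 11, 12)
    0, 1, 0, 1,   20, 21, 22, 1,   // light 1: green, position (20, 21, 22)
};

// Row-major: row 0 of element 1 is exactly light 1's color.
static_assert(readRow(kTwoLights, 1, 0, true).y == 1.0f, "green light");
// Column-major: row 0 of element 0 mixes color and position of two lights.
static_assert(readRow(kTwoLights, 0, 0, false).y == 10.0f, "position leaked into color");
```

In this model the row-major read lines up with the tightly packed copy, while the column-major read gathers one float from each of four registers, which would give scrambled channels of the kind described earlier in the thread.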
