[D3D12] How to correctly update constant buffers in different scenarios.

Started by
40 comments, last by SoldierOfLight 7 years, 10 months ago

Hello,

This topic is super confusing for me and I hope you'll help me understand it better. Here are several usage scenarios:

1. I have n objects, and for every object I need to change some constant data before the draw. Let's say the constant data ConstData has one integer field and is different for every object. I have a constant buffer constBuffer and I need to update it. In pseudocode:


ConstData cd;

cd.data = 1;

ptr = constBuffer.Map();
memcpy(ptr, &cd, sizeof(cd));
constBuffer.Unmap();

draw(obj1);

cd.data = 2;

ptr = constBuffer.Map();        // same buffer, same memory as for obj1
memcpy(ptr, &cd, sizeof(cd));
constBuffer.Unmap();

draw(obj2);

...

It is well known that a draw doesn't happen immediately but goes to the GPU queue for later execution. From the code above, I see that by the time the GPU is ready to actually draw obj1, the constant data will have been overwritten by the later calls to memcpy. Does that mean that I need a different constant buffer for every object? So if I have 100 objects and 2 back buffers, I need 100 * 2 = 200 constant buffers. Is that right?

2. If I want to store constant data directly in the root signature, can I update the root signature before every draw call? Taking my previous example, I would not have calls to Map()/memcpy()/Unmap(), but would write directly to the signature. Do I need to wait until the previous object or frame has finished rendering?

3. If I have a rarely changed constant buffer, a projection matrix for example, do I need different constant buffers for every render target? In other words, if the projection matrix changes, can I safely update the buffer only once and expect the change to propagate to all frames in the queue?
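
The hazard described in scenario 1 can be reproduced without a GPU. The following is a minimal sketch, not D3D12 code: `ToyDraw`, `execute`, `drawWithSharedBuffer` and `drawWithPerObjectSlots` are hypothetical names, and the "GPU" is just a loop that runs after all CPU writes have happened. It assumes a draw records only the location of its constants and reads them later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy model, not D3D12: a "draw" only records WHERE its constants live.
// The value is read later, when the recorded commands are executed.
struct ConstData { int data; };
struct ToyDraw { const ConstData* constants; };

// Runs the recorded draws and returns the value each draw observed.
std::vector<int> execute(const std::vector<ToyDraw>& commands) {
    std::vector<int> seen;
    for (const ToyDraw& d : commands) seen.push_back(d.constants->data);
    return seen;
}

// One shared buffer: every draw points at the same memory.
std::vector<int> drawWithSharedBuffer(std::size_t objectCount) {
    ConstData shared{};
    std::vector<ToyDraw> commands;
    for (std::size_t i = 0; i < objectCount; ++i) {
        ConstData cd{static_cast<int>(i) + 1};
        std::memcpy(&shared, &cd, sizeof cd);  // overwrites the previous object's data
        commands.push_back({&shared});
    }
    return execute(commands);  // the "GPU" runs only after all CPU writes
}

// One slot per object: each draw points at its own memory.
std::vector<int> drawWithPerObjectSlots(std::size_t objectCount) {
    std::vector<ConstData> slots(objectCount);  // sized up front, pointers stay valid
    std::vector<ToyDraw> commands;
    for (std::size_t i = 0; i < objectCount; ++i) {
        ConstData cd{static_cast<int>(i) + 1};
        std::memcpy(&slots[i], &cd, sizeof cd);
        commands.push_back({&slots[i]});
    }
    return execute(commands);
}
```

With three objects, the shared-buffer version makes every draw observe the last value written, while the per-object version lets each draw observe its own value.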


3 - yes you should/could differentiate between static and dynamic cbuffers and use a different strategy for each.
A projection matrix is still dynamic though if it changes every frame, so you need at least two versions of it to sustain two frames in flight.

2 - yes you can put a small amount of constants directly in the root. These are "automatically versioned", so you don't have to worry about lifetimes/overwriting/etc - the driver takes care of it. This is a good choice for cbuffers that are both small and very dynamic (e.g. changing every draw).

A note on terminology - the root signature is a data structure description. It defines the layout of the root, but holds no values.
The device has an automatically versioned root that you can store values in, and a pointer to a signature, which tells it how to interpret those values.
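
To make "automatically versioned" concrete, here is a small sketch of the behaviour as a toy model. It is not the driver's implementation: `ToyRoot`, `ToyCommandList` and `recordThreeDraws` are hypothetical names, and the only assumption is that each recorded draw keeps its own copy of the root values that were current when it was recorded. In real D3D12 the corresponding call is `ID3D12GraphicsCommandList::SetGraphicsRoot32BitConstants`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not D3D12: the "root" is a small array of 32-bit values that the
// recorder copies BY VALUE into every draw it records.
using ToyRoot = std::array<std::uint32_t, 4>;

class ToyCommandList {
public:
    // Loosely mirrors the idea behind SetGraphicsRoot32BitConstants.
    void setRootConstant(std::size_t slot, std::uint32_t value) {
        assert(slot < root_.size());
        root_[slot] = value;
    }

    // Recording a draw snapshots the current root values.
    void draw() { recorded_.push_back(root_); }

    // Deferred execution: returns slot 0 as seen by each recorded draw.
    std::vector<std::uint32_t> execute() const {
        std::vector<std::uint32_t> seen;
        for (const ToyRoot& r : recorded_) seen.push_back(r[0]);
        return seen;
    }

private:
    ToyRoot root_{};
    std::vector<ToyRoot> recorded_;
};

// Records three draws, changing the root constant before each one.
std::vector<std::uint32_t> recordThreeDraws() {
    ToyCommandList list;
    for (std::uint32_t value = 1; value <= 3; ++value) {
        list.setRootConstant(0, value);  // no Map/memcpy/Unmap, no waiting
        list.draw();
    }
    return list.execute();
}
```

Because every draw carries its own snapshot, later writes to the root cannot disturb earlier draws, which is why no per-object buffer or wait is needed in this case.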

1 - you can either allocate multiple copies of every dynamic cbuffer (number of times modified per frame * number of frames in flight), or use any other system with the same total memory usage.
I'm currently allocating dynamic cbuffers with regular malloc, and when the user asks to bind one, I memcpy its contents into a per-frame stack, and put the stack pointer into the root.

Thanks @Hodgman.

I'm currently allocating dynamic cbuffers with regular mallow, and when the user asks to bind one, I memcpy its contents into a per-frame stack, and put the stack pointer into the root.

I'm afraid I didn't get it. Can I ask you to explain it in more detail, with the code maybe?

I think Hodgman meant malloc, unless there's some other keyword I'm not aware of. About the constant buffers in your first question: yes, you will need a separate constant buffer for each object, or more specifically you will need enough memory in a heap to hold every changed constant buffer, so sizeof(ConstantBuffer) * numConstantBuffers * numFrames.

EDIT: In fact, you will need enough memory for each constant buffer per frame AS WELL as padding between constant buffers, so that each one starts at a 256-byte-aligned offset from the beginning of the heap.
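
As a worked version of that size estimate, here is a short sketch. The helper names `alignUp` and `constantHeapBytes` are made up for illustration; the 256 matches the value of `D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT`.

```cpp
#include <cassert>
#include <cstddef>

// Same value as D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
constexpr std::size_t kCbAlignment = 256;

// Rounds size up to the next multiple of alignment.
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Bytes needed when every constant buffer is padded to 256 bytes and each
// frame in flight gets its own copy of every buffer.
constexpr std::size_t constantHeapBytes(std::size_t cbSize,
                                        std::size_t buffersPerFrame,
                                        std::size_t framesInFlight) {
    return alignUp(cbSize, kCbAlignment) * buffersPerFrame * framesInFlight;
}

// The case from the first post: 100 objects, 2 frames, one int per object.
static_assert(constantHeapBytes(sizeof(int), 100, 2) == 51200,
              "256 bytes * 100 buffers * 2 frames");
```

Note how the padding dominates here: a 4-byte struct still occupies a full 256-byte slot.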

The stack that Hodgman was talking about could work like this:

You allocate a heap with enough memory to store all the constant buffers you will need for a frame, one heap per frame. (You could also create one giant heap, but if you ever need to expand the heap for a frame, it's going to be tough.) You get a pointer to the beginning of the heap; this is the start of the "stack", and we will call that pointer the stack pointer. Every time a user binds a constant buffer, you memcpy that constant buffer to the current stack pointer, then increment the stack pointer. Constant buffer reads have to be 256-byte aligned, so I believe that when you increment the stack pointer, you have to make sure it lands on a multiple of 256 bytes from the beginning of the heap.

The first constant buffer you bind gets memcpy'd to the stack pointer, which is the beginning of the heap, and then the stack pointer is incremented. The next constant buffer you bind gets memcpy'd to the new position of the stack pointer, which is then incremented again. The next time you come around to this frame, you reset the stack pointer to the beginning of the heap and do it all over. You do this for each frame.
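
The stack described above might look roughly like the sketch below. This is an illustration under assumptions, not anyone's actual engine code: `FrameConstantStack` is a hypothetical class, and a `std::vector` stands in for the persistently mapped upload buffer. In real D3D12 the returned offset would typically be added to the buffer's `GetGPUVirtualAddress()` and passed to `SetGraphicsRootConstantBufferView`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Per-frame bump allocator for constant data (toy stand-in for a mapped upload buffer).
class FrameConstantStack {
public:
    static constexpr std::size_t kAlignment = 256;

    explicit FrameConstantStack(std::size_t capacityBytes) : memory_(capacityBytes) {}

    // Copies size bytes to the current stack pointer and returns the byte offset
    // from the start of the heap. The stack pointer then moves to the next
    // 256-byte boundary.
    std::size_t push(const void* data, std::size_t size) {
        const std::size_t offset = top_;
        assert(offset + size <= memory_.size() && "frame stack exhausted");
        std::memcpy(memory_.data() + offset, data, size);
        top_ = alignUp(offset + size);
        return offset;
    }

    // Called when this frame's memory is about to be reused.
    void reset() { top_ = 0; }

    // Read access, used here only to inspect what was written.
    const unsigned char* at(std::size_t offset) const { return memory_.data() + offset; }

private:
    static std::size_t alignUp(std::size_t value) {
        return (value + kAlignment - 1) / kAlignment * kAlignment;
    }

    std::vector<unsigned char> memory_;
    std::size_t top_ = 0;
};
```

You would keep one such stack per frame in flight and call `reset()` on it only when you start recording that frame again.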

Thank you iedoc, now it's almost clear. The last piece of the puzzle is synchronization. When I map GPU memory and memcpy the data, it's not available on the GPU immediately. So the actual drawing on the GPU could happen before the constant data arrives from the CPU. This is theory, because I never saw any sync in the samples I investigated. And by the way, how can I synchronize? The mapping is not part of the queue/command list API, so I can't set a signal or put a barrier on it.

Basically, for an upload heap, the data is uploaded the first time you use it, so there is no need to put a fence on the upload heap to make sure the data is uploaded before you make a draw call that uses it. For a default heap, though, you need to use a fence, because the GPU can, for example, be copying from an upload heap to a default heap at the same time it is drawing. For constant buffers that change often, you don't want to put them in a default heap; otherwise you will spend a lot of extra time waiting on fences and doing extra copying that you don't need (since the upload heap will be uploaded every frame either way).

I think the upload heap actually gets copied to a cache on the GPU the first time it's used when a command list is executed, so every draw call that needs data from the upload heap just reads from the cache.
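
One piece of synchronization the per-frame stack does rely on is not overwriting a frame's memory while the GPU may still be reading it. The sketch below is a toy model of that check, with hypothetical names (`ToyFence`, `FrameSlot`, `canReuse`); it assumes the GPU finishes submitted frames in order. In real D3D12 the equivalent building blocks are `ID3D12CommandQueue::Signal` and `ID3D12Fence::GetCompletedValue`.

```cpp
#include <cassert>
#include <cstdint>

// Toy fence: the CPU hands out increasing values, the "GPU" completes them in order.
class ToyFence {
public:
    // CPU side: called after submitting a frame; returns the value to wait for.
    std::uint64_t signalNext() { return ++lastSubmitted_; }

    // GPU side: one more submitted frame has finished.
    void gpuFinishOne() {
        if (completed_ < lastSubmitted_) ++completed_;
    }

    std::uint64_t completed() const { return completed_; }

private:
    std::uint64_t lastSubmitted_ = 0;
    std::uint64_t completed_ = 0;
};

// Per-frame bookkeeping: the fence value signalled after this slot's last submit.
struct FrameSlot {
    std::uint64_t fenceValue = 0;  // 0 means the slot was never submitted
};

// True when the slot's constant memory may be overwritten by the CPU.
bool canReuse(const FrameSlot& slot, const ToyFence& fence) {
    return fence.completed() >= slot.fenceValue;
}
```

With two slots, the CPU would check (or wait on) `canReuse` for a slot before resetting that slot's stack pointer and writing new constants into it.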

My take on this would be to use per-instance data in addition to per-vertex data, instead of constant buffers. For example, when you are drawing 2500 identical cubes but with different colors and world transformation, I would add an input element for the color and another for the world transform, each element spanning one instance. Afterwards, just set the instance data buffer as the second vertex buffer (this is why IASetVertexBuffers allows you to set more than one vertex buffer, which does not make much sense alone).

In general, it would be something like this


// CreateDeviceResources()
ID3D12Resource *vertexBuffer = /* CreateCommittedResource with D3D12_HEAP_TYPE_DEFAULT */;
ID3D12Resource *instanceBuffer = /* CreateCommittedResource with D3D12_HEAP_TYPE_UPLOAD */;
// Buffers do not have to be unmapped before being sent to the graphics processor. Keeping it mapped can reduce latency
void *instanceBufferData;
instanceBuffer->Map(..., &instanceBufferData);

// Update()
// Copies the data for one instance; scale the size by the instance count to update all of them
memcpy(instanceBufferData, newData, sizeof(VertexShaderInput_Instance));

// Render()
ID3D12GraphicsCommandList *presentList;

// Slot 0 holds the per-vertex data, slot 1 the per-instance data
D3D12_VERTEX_BUFFER_VIEW vertexAndInstanceBuffer[] = {vertexBufferView, instanceBufferView};
presentList->IASetVertexBuffers(0, _countof(vertexAndInstanceBuffer), vertexAndInstanceBuffer);

presentList->DrawInstanced(_countof(vertices), instanceCount, 0, 0);
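
For the per-instance data itself, a possible CPU-side layout is sketched below. Everything here is an assumption for illustration: `InstanceData` and `makeInstances` are made-up names, and where the translation lives inside the matrix depends on your own matrix convention. The offsets and stride are the numbers you would feed into the per-instance input elements and the instance buffer view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record: a 4x4 world matrix followed by an RGBA color.
struct InstanceData {
    float world[16];
    float color[4];
};

// Byte offsets for the two per-instance input elements, and the buffer stride.
constexpr std::size_t kWorldOffset = offsetof(InstanceData, world);
constexpr std::size_t kColorOffset = offsetof(InstanceData, color);
constexpr std::size_t kInstanceStride = sizeof(InstanceData);

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
static_assert(kWorldOffset == 0 && kColorOffset == 64 && kInstanceStride == 80,
              "tightly packed layout");

// Builds instance data for `count` cubes spaced 2 units apart along x.
std::vector<InstanceData> makeInstances(std::size_t count) {
    std::vector<InstanceData> instances(count);  // value-initialized, all zeros
    for (std::size_t i = 0; i < count; ++i) {
        InstanceData& inst = instances[i];
        // Identity matrix on the diagonal.
        inst.world[0] = inst.world[5] = inst.world[10] = inst.world[15] = 1.0f;
        // Assumption: x translation stored in element 12.
        inst.world[12] = static_cast<float>(i) * 2.0f;
        // Red ramps from 0 to 1 across the instances, alpha is opaque.
        inst.color[0] = count > 1 ? static_cast<float>(i) / static_cast<float>(count - 1) : 1.0f;
        inst.color[3] = 1.0f;
    }
    return instances;
}
```

The resulting array is what would be copied into the mapped instance buffer, `count * kInstanceStride` bytes in total.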

Thank you guys, now it's crystal clear.

For constant buffers I read the following advice that may be useful to you:

"

• Placement dependent on usage:

• Write once/Read once => UPLOAD
• Write once/Read many => Copy to DEFAULT
"
From page 28 of the document Hodes_Stephan_DirectX12 And Vulkan.pdf


There is no need for an explicit fence to wait for the copy from the upload heap to the default heap to complete, as long as your command lists are executed on the same command queue:

- copy from upload to default heap (cmd list 1)

- transition your copy dest constant buffer into shader read state (cmd list 2 or could be a part of cmd list 1)

- do drawing using the constant buffer (cmd list 3)

However, I have not seen Microsoft, AMD or Intel in their D3D12 samples following this pattern.
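
The copy / transition / draw sequence above can be pictured with a toy model. This is only a sketch of the ordering argument, not real D3D12: `ToyGpu`, `ToyCommandList`, `executeInOrder` and `runCopyTransitionDraw` are hypothetical, and the two toy states loosely correspond to `D3D12_RESOURCE_STATE_COPY_DEST` and `D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER`. The model assumes a single queue that runs its command lists strictly in submission order.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class ToyState { CopyDest, ShaderRead };

// Toy GPU state: one upload value, one default-heap value, and a resource state.
struct ToyGpu {
    int uploadValue = 0;
    int defaultValue = 0;
    ToyState state = ToyState::CopyDest;
    std::vector<int> drawn;  // constant value observed by each draw
};

using ToyCommandList = std::function<void(ToyGpu&)>;

// A single queue: command lists run one after another, in submission order.
void executeInOrder(ToyGpu& gpu, const std::vector<ToyCommandList>& lists) {
    for (const ToyCommandList& list : lists) list(gpu);
}

ToyGpu runCopyTransitionDraw(int value) {
    ToyGpu gpu;
    gpu.uploadValue = value;  // CPU wrote this into the upload heap earlier

    const std::vector<ToyCommandList> lists = {
        // cmd list 1: copy from the upload heap to the default heap
        [](ToyGpu& g) {
            assert(g.state == ToyState::CopyDest);
            g.defaultValue = g.uploadValue;
        },
        // cmd list 2: transition the copy destination to a shader-readable state
        [](ToyGpu& g) { g.state = ToyState::ShaderRead; },
        // cmd list 3: draw using the constant buffer
        [](ToyGpu& g) {
            assert(g.state == ToyState::ShaderRead);
            g.drawn.push_back(g.defaultValue);
        },
    };

    executeInOrder(gpu, lists);
    return gpu;
}
```

In this model the draw always sees the copied value because the three lists share one queue; with separate queues that ordering would have to be established explicitly.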

This topic is closed to new replies.
