matt77hias

Deferred device contexts

I know you can record D3D11 commands on a deferred device context into a command list, which the immediate context can then replay. But how does this work for buffer updates?

Let's say I need to do the following to render a single model:

  1. Update the model buffer.
  2. Bind the model buffer.
  3. Bind the SRVs.
  4. Bind the input layout.
  5. Draw the model.

Is it possible to concatenate multiple such sequences of commands before replaying? Or do I have to replay after only one sequence, because each sequence re-updates the buffer data?

How many threads, each with its own deferred context, does one normally set loose on the models? Does one still use the immediate context for direct model rendering, or does it only replay command lists?


When updating buffers, you have two options: UpdateSubresource() and Map(). UpdateSubresource() just works on deferred contexts as usual, which probably means it creates a copy of the data and uploads it to the GPU. With Map() on a deferred context, you can only use the WRITE_DISCARD or NO_OVERWRITE flags. WRITE_DISCARD is a buffer rename operation: the driver allocates a new copy of the buffer in CPU-accessible memory and gives you a pointer to write to. With NO_OVERWRITE, you tell the driver that you just want access to the memory and that your application will ensure there are no race conditions on the resource, so neither the GPU nor another CPU thread will be competing for the same memory.

Take constant buffers for example which are usually updated with WRITE_DISCARD. If you have a global constant buffer like PerFrameVariables, then you have to update that for each deferred context that will reference it.

53 minutes ago, turanszkij said:

which are usually updated with WRITE_DISCARD

So assume I use Map with the WRITE_DISCARD flag for my constant buffers.

Does it just work when multiple deferred contexts Map the same resource at the same time? I guess so, but then a local copy needs to be stored as part of the command in the command list?

Furthermore, does it also work if one deferred context Maps the same resource multiple times while recording a single command list? I guess D3D11 uses the same mechanism and stores multiple local copies for that command list?

So if my reasoning is correct, memory usage explodes if one records huge command lists?
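To make the guess above concrete, here is a toy model in plain C++. It is not D3D11 code and it does not claim to describe what any real driver does: `ToyCommandList`, `RecordedDraw`, `ModelConstants` and `Replay` are made-up names, and the only assumption modelled is the one in the question, namely that every discard-style update recorded on a deferred context keeps its own private copy of the data until replay.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these types are part of the D3D11 API.
struct ModelConstants {
    float world[16];
};

// One recorded "update buffer + draw" step. The snapshot is stored by value,
// which models a private copy being kept for every discard-style update.
struct RecordedDraw {
    ModelConstants snapshot;
    int modelId;
};

class ToyCommandList {
public:
    // Stands in for "Map with discard, memcpy, Unmap, Draw" on a deferred context.
    void UpdateAndDraw(const ModelConstants& data, int modelId) {
        draws_.push_back(RecordedDraw{data, modelId});
    }

    // Bytes held by the snapshots; grows linearly with the number of updates.
    std::size_t SnapshotBytes() const {
        return draws_.size() * sizeof(ModelConstants);
    }

    const std::vector<RecordedDraw>& Draws() const { return draws_; }

private:
    std::vector<RecordedDraw> draws_;
};

// Stands in for replay on the immediate context: every draw sees the data
// that was written just before it was recorded, not the latest data.
inline std::vector<float> Replay(const ToyCommandList& list) {
    std::vector<float> firstElements;
    for (const RecordedDraw& draw : list.Draws()) {
        firstElements.push_back(draw.snapshot.world[0]);
    }
    return firstElements;
}
```

Under that assumption, several update/draw sequences can be concatenated in one list before replaying, and the cost is memory proportional to the number of recorded updates rather than anything exponential.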


On the application side, you can keep a single buffer resource and Map it as many times as you like, from different contexts as well. The allocations are done by the driver and I assume they allocate constant buffers from the command list memory. You might want to avoid doing this too many times per frame, though: for example AMD GCN drivers have a command buffer of 4MB, and once you exceed that limit there is probably some sort of synchronization involved.

10 minutes ago, turanszkij said:

for example AMD GCN drivers have a command buffer of 4MB

Ah OK, so the driver allocates some amount of memory in advance and allocates more if you need more.

But 4MB seems quite a lot for some model data, for example, so one or two deferred contexts could record all or half of the commands each, and the immediate context would then only need to replay at most two command lists per frame.
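A back-of-the-envelope check of that intuition, as a sketch only. It assumes that "4MB" means 4 MiB, that the whole budget is available for per-draw constant data (it ignores the space the commands themselves take), and that constant data lives in that memory at all, which is only a guess in this thread. `kAssumedBudgetBytes` and `UpdatesThatFit` are made-up names.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the quoted 4MB figure, read as 4 MiB.
constexpr std::size_t kAssumedBudgetBytes = 4u * 1024u * 1024u;

// How many per-draw updates of a given size fit into the budget,
// ignoring the space needed for the recorded commands themselves.
inline std::size_t UpdatesThatFit(std::size_t budgetBytes, std::size_t bytesPerUpdate) {
    assert(bytesPerUpdate > 0);
    return budgetBytes / bytesPerUpdate;
}
```

With 256 bytes of constants per draw, `UpdatesThatFit(kAssumedBudgetBytes, 256)` gives 16384 updates, so a few thousand models per frame would stay well inside such a budget under these assumptions.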

4 hours ago, turanszkij said:

The allocations are done by the driver and I assume they allocate constant buffers from the command list memory.

I don't know about AMD but Nvidia doesn't.  See here: https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0

Also, it sort of doesn't make sense. IIRC GPUs consume command buffers from a circular buffer. I don't think you would break that continuity with constant buffers (it would take longer to find a command). I could be wrong though... maybe someone with more knowledge can chime in.

9 hours ago, turanszkij said:

Take constant buffers for example which are usually updated with WRITE_DISCARD. If you have a global constant buffer like PerFrameVariables, then you have to update that for each deferred context that will reference it.

I thought that if you update this using the immediate context prior to recording your commands in the command buffer you didn't have to update.  Just have to rebind to each context.

12 hours ago, ErnieDingo said:

I thought that if you update this using the immediate context prior to recording your commands in the command buffer you didn't have to update.  Just have to rebind to each context.

You are probably right, I have only used them a long time ago, can't remember that well. :)

20 hours ago, ErnieDingo said:

I thought that if you update this using the immediate context prior to recording your commands in the command buffer you didn't have to update.  Just have to rebind to each context.

But does the command allocate memory for buffer mappings or buffer updates?

3 hours ago, matt77hias said:

But does the command allocate memory for buffer mappings or buffer updates?

Turan's explanation is pretty complete, but I have some gaps in my own knowledge. If you are doing updates during the command recording phase, then it's allocating memory for each map discard. I'm not sure, though, about updates that don't use WRITE_DISCARD. Hodge has more experience on this front.

I do multithreading and command list recording in my DX11 engine, but nothing to the level you are after. I've set my constant buffers up prior to command recording to reduce the amount of buffer update calls during replay. And I try to avoid doing map discard updates more than once per frame anyway, only because it becomes a bad habit and encourages some bad design decisions.

From what I remember, everything you do in a deferred context incurs memory consumption. I would give you a more definitive answer, but I'm in the middle of Thailand on a phone 😁😁😁

 

10 hours ago, ErnieDingo said:

I've set my constant buffers up prior to command recording to reduce the amount of buffer update calls during replay.

But if you do not update buffers more than once every frame, then I assume you have some buffer resource for every model? I currently reuse one buffer resource for all my models, so more like an update per draw.


Maybe think about a constant buffer pool (instead of fixed per-asset buffers), where before each draw operation you pick a "fresh" buffer of a suitable size and memcpy your transforms and whatnot into it. You promise the driver that you won't overwrite data the GPU is still reading (NO_OVERWRITE), and you keep that promise in one of two ways. The simple solution is to not write into a constant buffer until at least 2 frames have passed since its last use, declaring buffers fresh again once you're sure the GPU has moved on far enough. The complex solution is to use fences to be sure the GPU isn't reading anymore. That way, the driver won't have to make hidden copies of the buffer. This approach is multi-platform-friendly. The pool needs to be big enough, and/or you have to implement the complex solution with more precise fences.
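A minimal sketch of the "simple solution" bookkeeping, in plain C++ without any D3D11 calls. `ConstantBufferPool`, `Acquire`, `kNoSlot` and the `framesInFlight` parameter are hypothetical names; in a real renderer each slot index would refer to a dynamic constant buffer that is then mapped with NO_OVERWRITE. As far as I know, mapping dynamic constant buffers with NO_OVERWRITE needs Direct3D 11.1 runtime support, so treat that part as something to check for your target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returned by Acquire when every slot may still be in use by the GPU.
constexpr int kNoSlot = -1;
// Marker for a slot that has never been handed out.
constexpr std::uint64_t kNeverUsed = std::numeric_limits<std::uint64_t>::max();

// Hypothetical CPU-side bookkeeping for a pool of equally sized constant buffers.
class ConstantBufferPool {
public:
    ConstantBufferPool(int slotCount, std::uint64_t framesInFlight)
        : lastUsedFrame_(static_cast<std::size_t>(slotCount), kNeverUsed),
          framesInFlight_(framesInFlight) {
        assert(slotCount > 0);
    }

    // Returns the index of a slot that was last used at least framesInFlight
    // frames ago (or never), and marks it as used in currentFrame.
    int Acquire(std::uint64_t currentFrame) {
        for (std::size_t i = 0; i < lastUsedFrame_.size(); ++i) {
            const std::uint64_t last = lastUsedFrame_[i];
            const bool fresh =
                (last == kNeverUsed) || (currentFrame >= last + framesInFlight_);
            if (fresh) {
                lastUsedFrame_[i] = currentFrame;
                return static_cast<int>(i);
            }
        }
        return kNoSlot;  // Pool too small for this frame: grow it or use fences.
    }

private:
    std::vector<std::uint64_t> lastUsedFrame_;
    std::uint64_t framesInFlight_;
};
```

The linear scan keeps the sketch short; a free list or a per-frame ring would avoid it. Choosing `framesInFlight` is the actual promise made to the driver, so it has to be at least as large as the number of frames the GPU can lag behind the CPU.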

Edit: An even simpler solution is to stick to your per-asset cbuffer, but keep multiple copies (one per frame in flight, so 2-3) and cycle through them each frame, also with NO_OVERWRITE.
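The cycling variant can be sketched as a tiny wrapper that selects one of N copies from the frame index. `FrameCycled` and `ForFrame` are made-up names, and `T` stands for whatever handle the engine uses for a constant buffer; the sketch uses no D3D11 types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Holds N copies of a per-asset resource handle and picks one per frame,
// so the copy written this frame is not the one used in the previous N-1 frames.
template <typename T, std::size_t N>
class FrameCycled {
    static_assert(N > 0, "need at least one copy");

public:
    // Returns the copy that belongs to the given frame index.
    T& ForFrame(std::uint64_t frameIndex) {
        return copies_[static_cast<std::size_t>(frameIndex % N)];
    }

private:
    std::array<T, N> copies_{};
};
```

With `N = 3`, frames 0, 1 and 2 each get their own copy and frame 3 reuses the copy from frame 0. Note that this only covers one update per asset per frame; several updates of the same asset within a frame would still need a pool or a discard.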

Also, do read the article from NVIDIA linked by Infinisearch, it's a good approach.

Edited by pcmaster

44 minutes ago, matt77hias said:

But if you do not update buffers more than once every frame, then I assume you have some buffer resource for every model? I currently reuse one buffer resource for all my models, so more like an update per draw.

Yeah. I use instance buffers for instance-specific attributes such as color, and a cbuffer for any material-related attributes. My setup isn't too complicated at the moment.

2 hours ago, pcmaster said:

Also, do read the article from NVIDIA linked by Infinisearch, it's a good approach

Thanks for pointing that out; I didn't receive an email/popup for every post in this topic. :)

 

I need to refactor some day. Currently I do one buffer update per draw call, with the buffer bound to two shader stages (vertex and pixel). That is OK for D3D11.0, but could be optimized for D3D11.1 by packing the per-draw data into one larger buffer and binding an offset into that buffer. But why is there a 256-byte alignment instead of the normal 16-byte alignment?

I didn't know, though, that UpdateSubresource is implemented as Map/memcpy/Unmap on NVIDIA hardware.
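On where the 256 comes from at the API level: going by the documentation, the D3D11.1 calls such as `VSSetConstantBuffers1` take the offset and size of the bound range in units of 16-byte shader constants, and both values must be multiples of 16 constants, which works out to 16 × 16 = 256 bytes. The sketch below only does that arithmetic; `AlignUp`, `ConstantRange` and `RangeForSlice` are made-up helper names, and it says nothing about the hardware reason behind the rule.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kBytesPerConstant = 16;  // one float4 shader constant
constexpr std::uint32_t kConstantsPerStep = 16;  // offsets and counts come in steps of 16 constants
constexpr std::uint32_t kSliceAlignmentBytes = kBytesPerConstant * kConstantsPerStep;  // 256

// Rounds bytes up to the next multiple of alignment.
constexpr std::uint32_t AlignUp(std::uint32_t bytes, std::uint32_t alignment) {
    return (bytes + alignment - 1) / alignment * alignment;
}

// Offset and size of one slice, expressed in 16-byte constants.
struct ConstantRange {
    std::uint32_t firstConstant;
    std::uint32_t numConstants;
};

// Range of the slice-th per-draw block inside one large buffer, where every
// block holds payloadBytes of data padded to the 256-byte step.
inline ConstantRange RangeForSlice(std::uint32_t slice, std::uint32_t payloadBytes) {
    assert(payloadBytes > 0);
    const std::uint32_t strideBytes = AlignUp(payloadBytes, kSliceAlignmentBytes);
    return ConstantRange{slice * strideBytes / kBytesPerConstant,
                         strideBytes / kBytesPerConstant};
}
```

For a 64-byte world matrix per draw, every slice is padded to 256 bytes, so slice 3 starts at constant 48 and spans 16 constants. The padding is the price of binding sub-ranges of one buffer instead of updating a small buffer per draw.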

Edited by matt77hias

1 hour ago, matt77hias said:

Thanks for pointing out, I didn't receive an email/popup for every post in this topic. :)

IIRC there are two more articles in that series... they were all informative.

Edit: I was wrong; the series with three articles was on structured buffers, not constant buffers.

Edited by Infinisearch
