[D3D12] Updating Constant Buffer Data


Hi,

 

I'm trying to figure out the best way to manage constant buffers in a Direct3D 12 pipeline.
 
The provided samples show either one constant buffer being updated per frame (the Hello samples) or two constant buffers both being updated within the same frame (the multithreading sample). The samples are very small, however, and I'm wondering how well this scales and applies to larger scenes.
 
Basically, what's the new usage pattern to replace Direct3D 11's
  set constant buffer for thing 1
  draw thing 1
  set constant buffer for thing 2
  draw thing 2
  ...
  set constant buffer for thing n
  draw thing n
 
From here https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0 it seems that mapping one very large buffer at the start of the frame is the currently suggested way of doing this. Is this practical for scenes with large numbers of objects? For example, take a crowd of people scattered about a scene. Each person needs at least a world matrix (let's say it's pre-multiplied with view and projection, so a 4x4 matrix) and a set of skinning data (say an upper limit of 96 bones packed into 4x3 matrices). If (for argument's sake) you also allow non-uniform scaling, each instance also needs a world inverse transpose matrix to correctly apply normal mapping.
 
Even if you were to postpone the pre-multiplication of the world, view, and projection matrices to the vertex shader, you're still only saving 32 bytes per buffer. It just seems like a whole lot of storage, especially once you go further down the pipeline and realize you need some of this data again in the shadow map passes, once per casting light. Even with culling helping, I imagine you have to at least set aside the memory for the worst-case scenario.
 
More clearly stated: is this the practical way we should be updating constant buffers at this point, or am I just underestimating current hardware?
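 
To make the question concrete, here's roughly the allocation scheme I'm picturing based on that article (just a sketch; FrameResource, AllocConstants, and the sizing are all my own names and assumptions):

#include <cassert>
#include <cstdint>
#include <cstring>
#include <d3d12.h>

struct FrameResource
{
    ID3D12Resource* uploadBuffer; // created in D3D12_HEAP_TYPE_UPLOAD
    uint8_t*        mappedPtr;    // persistently mapped once via Map(0, nullptr, ...)
    size_t          offset;       // bump-allocator cursor, reset to 0 each frame
    size_t          capacity;     // sized up front for the worst case
};

// Constant buffer views must start at 256-byte-aligned offsets.
constexpr size_t kCbAlign = D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT;

// Copy one object's constants into the frame's big buffer and return the
// GPU virtual address to bind for that draw.
D3D12_GPU_VIRTUAL_ADDRESS AllocConstants(FrameResource& fr,
                                         const void* data, size_t size)
{
    const size_t aligned = (size + kCbAlign - 1) & ~(kCbAlign - 1);
    assert(fr.offset + aligned <= fr.capacity); // worst-case reservation
    memcpy(fr.mappedPtr + fr.offset, data, size);
    const D3D12_GPU_VIRTUAL_ADDRESS va =
        fr.uploadBuffer->GetGPUVirtualAddress() + fr.offset;
    fr.offset += aligned;
    return va;
}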

Edit: I suppose another question would be: should I just go ahead and use root constant buffer views for data that updates this frequently? I would still keep a few infrequently updated constant buffers in a descriptor heap for per-frame data and similar uses, but I could spend 1 or 2 root CBVs in the model renderer's root signature and still have 8 DWORDs left over for anything else I need. https://msdn.microsoft.com/en-us/library/windows/desktop/dn899209(v=vs.85).aspx
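 
In other words, the per-draw loop would look something like this (reusing the AllocConstants sketch above; Object and ObjectCB are made-up types):

struct ObjectCB { float worldViewProj[16]; }; // stand-in per-object data

struct Object
{
    ObjectCB cb;
    UINT     indexCount;
};

// Root parameter 0 is assumed to be a root CBV
// (D3D12_ROOT_PARAMETER_TYPE_CBV), which costs 2 DWORDs of the signature.
void DrawObjects(ID3D12GraphicsCommandList* cmdList, FrameResource& frame,
                 const Object* objects, size_t count)
{
    for (size_t i = 0; i < count; ++i)
    {
        const D3D12_GPU_VIRTUAL_ADDRESS va =
            AllocConstants(frame, &objects[i].cb, sizeof(ObjectCB));
        cmdList->SetGraphicsRootConstantBufferView(0, va);
        cmdList->DrawIndexedInstanced(objects[i].indexCount, 1, 0, 0, 0);
    }
}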
 
Thanks,
WFP
Edited by WFP


Basically, what's the new usage pattern to replace Direct3D 11's
  set constant buffer for thing 1
  draw thing 1
  set constant buffer for thing 2
  draw thing 2
  ...
  set constant buffer for thing n
  draw thing n

D3D11 can use the other pattern just fine and is generally faster there too, just saying. (There are a few annoying issues: until D3D11.1 you couldn't bind constant buffers at an offset or map them with NO_OVERWRITE, and constant buffers can't be larger than 64 KB, but you can work around all of those.)
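For what it's worth, the D3D11.1 path looks roughly like this (just a sketch with made-up names; check D3D11_FEATURE_DATA_D3D11_OPTIONS::MapNoOverwriteOnDynamicConstantBuffer before relying on it):

#include <cstdint>
#include <cstring>
#include <d3d11_1.h>

// Writes one object's constants into a big dynamic constant buffer and
// binds the 256-byte-aligned window that contains them.
void BindObjectConstants(ID3D11DeviceContext1* context1, ID3D11Buffer* bigCb,
                         UINT byteOffset, const void* data, UINT dataSize,
                         bool firstWriteThisFrame)
{
    D3D11_MAPPED_SUBRESOURCE mapped;
    context1->Map(bigCb, 0,
                  firstWriteThisFrame ? D3D11_MAP_WRITE_DISCARD
                                      : D3D11_MAP_WRITE_NO_OVERWRITE,
                  0, &mapped);
    memcpy(static_cast<uint8_t*>(mapped.pData) + byteOffset, data, dataSize);
    context1->Unmap(bigCb, 0);

    // *SetConstantBuffers1 measures in 16-byte shader constants, and both
    // values must be multiples of 16 constants (256 bytes).
    UINT firstConstant = byteOffset / 16;
    UINT numConstants  = ((dataSize + 255) & ~255u) / 16;
    context1->VSSetConstantBuffers1(0, 1, &bigCb, &firstConstant, &numConstants);
}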
 

Is this practical for scenes with large amounts of objects?

Yes.
 

Each person needs at least a world matrix (let's say it's pre-multiplied with view and projection, so a 4x4 matrix) and a set of skinning data (say an upper limit of 96 bones packed into 4x3 matrices).

Best case scenario, you need just the 96 4x3 matrices, since you can concatenate the world matrix with the bone matrices before sending them, and send the viewProj matrix in another constant buffer. Note that if a character only needs 40 matrices, you write those 40 and skip the other 56. It doesn't help the GPU's cache, but it does help with bandwidth.
But still, yes, you're left with 4,608 bytes per object, and with the 64 KB constant buffer limit you can only pack 14 objects per constant buffer.
In that case, just use a texture buffer instead of a constant buffer, since texture buffers don't have the 64 KB limit. There aren't many differences between the two on modern hardware anyway, and on older hardware a texture buffer would still be the preferred choice for storing skinning matrices.
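Creating the view for that is trivial; something like this (rough sketch with made-up names, with the HLSL side declaring e.g. StructuredBuffer<float3x4> Bones : register(t0)):

#include <d3d12.h>

// One shared buffer holds the skinning matrices for many characters;
// each character's SRV just starts at a different FirstElement.
void CreateBoneMatrixSrv(ID3D12Device* device, ID3D12Resource* boneBuffer,
                         UINT64 firstBone, UINT boneCount,
                         D3D12_CPU_DESCRIPTOR_HANDLE cpuHandle)
{
    D3D12_SHADER_RESOURCE_VIEW_DESC srv = {};
    srv.Format                     = DXGI_FORMAT_UNKNOWN; // structured buffer
    srv.ViewDimension              = D3D12_SRV_DIMENSION_BUFFER;
    srv.Shader4ComponentMapping    = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srv.Buffer.FirstElement        = firstBone;          // offset into the shared buffer
    srv.Buffer.NumElements         = boneCount;          // e.g. up to 96
    srv.Buffer.StructureByteStride = 12 * sizeof(float); // one 4x3 matrix, 48 bytes
    device->CreateShaderResourceView(boneBuffer, &srv, cpuHandle);
}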

 

It just seems like a whole lot of storage

Yes. But you're thinking about skinned crowd rendering, which is one of the things games FAKE. A LOT.
Use high-quality normal-mapped impostors. Upload a couple of sets of animation matrices and share them randomly among characters, so players don't notice you're actually playing 32-64 different animations and repeating them. Use stick figures very far away. Play a video of a crowd in the background. Use sound to fool people into thinking there are more people than there actually are. Be creative.
Most objects rendered in a game don't involve skinning; crowds are the exception, and that's exactly where we fake.
 

especially once you go further down the pipeline and realize you need some of this data again in shadow map passes per casting light.

Whoa, wait a second. Yes, you can do that, and it works well for non-skinned objects. But if you've uploaded the matrices in world space in one buffer and the view/proj matrices in another buffer (and concatenate them in the vertex shader), you don't need to re-upload the data from the first buffer at all! You just need to re-upload the data for the second buffer, which is 64-128 bytes per pass depending on whether you send both the view and viewProj matrices or just the viewProj.
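In other words, split the data something like this (struct names are mine; the vertex shader concatenates world and viewProj itself):

#include <DirectXMath.h>

// Written once per frame, read by the main pass and every shadow pass.
struct ObjectConstants
{
    DirectX::XMFLOAT4X4 world;             // 64 bytes
    DirectX::XMFLOAT4X4 worldInvTranspose; // only if you allow non-uniform scale
};

// Re-uploaded per pass: just 64-128 bytes each time.
struct PassConstants
{
    DirectX::XMFLOAT4X4 viewProj;
    // DirectX::XMFLOAT4X4 view; // add this only if the shaders need it
};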
 
Finally: what do you think was happening before? You were doing exactly the same thing, just with more calls to Map/Unmap and XSSetConstantBuffers. Whatever extra work you have to do now was being done for you behind the scenes. Now you just get to see the cost for yourself.


Thanks for the detailed response, Matias. I think I've got a much better understanding of what needs to be done now. It seems the biggest change will be to stop pre-multiplying the world transforms with the view and projection matrices, so I can zip through all my data at once, push the new values into the constant buffer heap, and then fire off the respective draw calls with the correct offsets into the descriptor heap.
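 
For anyone finding this later, the draw loop I'm describing would look roughly like this (a sketch with made-up names; one CBV per object at consecutive slots of a shader-visible heap):

#include <d3d12.h>

// Root parameter 0 is assumed to be a descriptor table of CBVs.
void DrawWithCbvOffsets(ID3D12Device* device, ID3D12DescriptorHeap* cbvHeap,
                        ID3D12GraphicsCommandList* cmdList,
                        const UINT* indexCounts, UINT objectCount)
{
    const UINT inc = device->GetDescriptorHandleIncrementSize(
        D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    const D3D12_GPU_DESCRIPTOR_HANDLE base =
        cbvHeap->GetGPUDescriptorHandleForHeapStart();

    for (UINT i = 0; i < objectCount; ++i)
    {
        D3D12_GPU_DESCRIPTOR_HANDLE h = { base.ptr + UINT64(i) * inc };
        cmdList->SetGraphicsRootDescriptorTable(0, h);
        cmdList->DrawIndexedInstanced(indexCounts[i], 1, 0, 0, 0);
    }
}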

 

Yeah, the comment on the second trip through the data for the shadow map pass was poorly worded on my part, but you seemed to do fine getting an explanation out around it anyway :).

 

I've definitely been a little spoiled by the abstraction D3D11 gives you, and from the documentation (https://msdn.microsoft.com/en-us/library/windows/desktop/dn899223(v=vs.85).aspx) it seems that root descriptors have a similar effect to the old map/unmap pattern (just without the actual map/unmap calls).  Regardless, I'm definitely glad to be getting more familiar with how the hardware actually works.
