[DX11] Fastest way to update a constant buffer per draw call


9 replies to this topic

#1 360GAMZ   Members   -  Reputation: 133

Posted 05 December 2011 - 07:30 PM

Let's say I have 500 draw calls per frame and all 500 draw calls use the same shader and that shader uses one constant buffer. Let's also assume that the data in the constant buffer needs to be built dynamically for each draw call. What would be the most desirable way to update the constant buffers, in terms of efficiency?

A) Create a single constant buffer and call Map/Unmap on that same constant buffer before each draw call.

B) Create 500 constant buffers, one for each draw call, and call Map/Unmap on the draw call's own constant buffer.

C) Or, another idea?

I know that for (A) the driver will rename the buffer each time I Map it, discarding the previous contents, which is fine. But is it OK to expect that the driver can handle hundreds or even thousands of renames per frame? I assume the rename process consumes some time, too.

On the other hand, (B) avoids the renaming and any associated overhead at the expense of possibly more video memory being consumed (500 constant buffers, even if fewer draw calls are actually used) and more code complexity.

#2 Hodgman   Moderators   -  Reputation: 31938

Posted 05 December 2011 - 09:08 PM

On older GPUs, there's no such thing as a cbuffer; there's just one global set of shader registers. On these cards, when you ask to set a cbuffer, it copies the register-id/value pairs out of the cbuffer and into the command-buffer. The GPU consumes the command-buffer in order, reading out the register values before reading the draw-call.
For these kinds of GPUs, I'd theorise that option (A) would be the most efficient, as there really is no cbuffer management going on behind the scenes.

On newer GPUs, it's possible for cbuffers to be stored in VRAM, and then moved into registers when required. On these cards, when you put data into a cbuffer, it can actually perform a VRAM transfer (and possibly issue a cache-invalidation command to the command-buffer). When you bind a cbuffer, you're writing a command into the command buffer that instructs the GPU to fetch some register values from VRAM.
On these cards, using option (B) would allow you to perform all of the VRAM transfers well in advance of any draw-calls that use that data, which reduces the amount of data flowing through the command-buffer. However, as you're still moving the same amount of data to the GPU every frame anyway (as you're regenerating the cbuffers each frame), there isn't really a bandwidth saving here... though it still might be more efficient...
You'd probably have to test it (on multiple GPUs) to find out.


On really old GPUs, there's no such thing as cbuffers AND there's no such thing as shader registers! On these cards, when you set a cbuffer, the driver actually takes the compiled shader code and inserts new instructions into it that contain your shader values (now as hard-coded numbers, not variables). On this class of GPUs, no matter what you do, setting shader variables is going to be bad for performance, as every change-of-variables actually produces a whole new shader program ;)

#3 360GAMZ   Members   -  Reputation: 133

Posted 05 December 2011 - 11:41 PM

This is a DX11 compliant card. An NVIDIA GeForce GTX 460, for example. The cbuffers are indeed in VRAM on this type of graphics card. I suppose my question boils down to, is it ok to assume that the driver for this class of modern graphics card can handle hundreds or even thousands of buffer renames each frame without breaking a sweat? Or is the buffer renaming mechanism really there only to handle a few rare cases of multiple Map/Unmaps to the same buffer?





#4 phantom   Moderators   -  Reputation: 7592

Posted 06 December 2011 - 07:45 AM

Is there any reason you can't generate the data up front, before issuing draw calls, then build one large cbuffer and index in the shader based on an instance ID? Maybe split this up over a few buffers depending on cbuffer size so you aren't updating a massive chunk of data in one go.

So; [generate all data] -> [bind] -> [draw objects as required with indexing]

Generating data at render time seems like Bad Voodoo to me anyway; render time should just be rendering, so sort your data out beforehand.
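
A minimal CPU-side sketch of the [generate all data] -> [bind] -> [draw] flow might look like the following. The `PerDrawData` layout and the names `FrameData`, `addDraw` and `buildFrame` are hypothetical, invented for illustration; the actual GPU upload and the shader-side indexing are only indicated in comments, since they depend on the renderer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-draw payload: 16 bytes, so each entry fills exactly one
// float4-sized slot if the array is later copied into a GPU buffer.
struct PerDrawData {
    std::uint32_t boneCount;
    std::uint32_t matrixOffset;
    std::uint32_t materialId;
    std::uint32_t padding;
};
static_assert(sizeof(PerDrawData) == 16, "one 16-byte slot per draw");

// Collects the data for every draw call before any rendering starts.
class FrameData {
public:
    // Stores one entry and returns the index the shader would use to find it
    // (for example via an instance ID or a small per-draw constant).
    std::uint32_t addDraw(const PerDrawData& data) {
        entries_.push_back(data);
        return static_cast<std::uint32_t>(entries_.size() - 1);
    }

    // Pointer and size for a single bulk copy into the GPU buffer, e.g. one
    // Map/Unmap per frame instead of one per draw call.
    const void* bytes() const { return entries_.data(); }
    std::size_t byteSize() const { return entries_.size() * sizeof(PerDrawData); }

    const PerDrawData& at(std::uint32_t index) const { return entries_.at(index); }

private:
    std::vector<PerDrawData> entries_;
};

// Builds a frame's worth of entries up front. The values are placeholders
// standing in for whatever the scene's real draw list would provide.
inline FrameData buildFrame(std::uint32_t drawCount) {
    FrameData frame;
    for (std::uint32_t i = 0; i < drawCount; ++i) {
        frame.addDraw(PerDrawData{i % 64u, i * 64u, i, 0u});
    }
    return frame;
}
```

With 500 draws at 16 bytes each, the whole frame is 8,000 bytes, which is why splitting across a few buffers only becomes a concern as the per-draw payload grows.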

#5 Tordin   Members   -  Reputation: 604

Posted 06 December 2011 - 10:33 AM


For (A), I believe you should use UpdateSubresource instead.
I think I read in the SDK that it's faster for constant buffers.

Map/Unmap is for vertex buffers and textures, I think.
NOTE: not 100% sure.


"There will be major features. none to be thought of yet"

#6 360GAMZ   Members   -  Reputation: 133

Posted 06 December 2011 - 01:22 PM


I'm trying to implement what DICE has done for Battlefield 3, in terms of using buffers to store per-instance matrices to reduce draw calls. The constant buffer will hold data such as the number of matrices in the bone matrix palette, and that number (as well as the additional data stored alongside it) can be different for each type of object, so it needs to be updated for each draw call. Here's a link to the DICE presentation. The instancing material is the first part of the Performance section, about halfway through the document.
http://publications.dice.se/attachments/GDC11_DX11inBF3_Public.pdf


#7 phantom   Moderators   -  Reputation: 7592

Posted 06 December 2011 - 04:29 PM

Right, I see. Well, what I said above still stands for most of your data. If you look at slides 30/31 you'll see they have a very small cbuffer for the per-draw-call data, so you might want to consider how much you place in it.

I suspect that if you are moving small enough buffers around, either option would be fine. We had a purely CPU-limited rendering test at work which drew 50,000 cubes and, for each draw call, did a Map/Unmap on a cbuffer across multiple contexts (6 IIRC; there might have been one per context, but don't quote me on that, it's been a while since I played with that bit of the code). With that test we were good up until around 15,000 draw calls, at which point the driver started to get into trouble internally with memory issues.

Do whatever makes organisation sense I guess...

#8 360GAMZ   Members   -  Reputation: 133

Posted 06 December 2011 - 06:25 PM

My cbuffer consists of eight 32-bit integers, so only 2 vector registers. Pretty darn small. We won't have anywhere near 15,000 draw calls. Probably under 1,000, but we need to maintain 60 FPS at all times. Also, since DICE is using this method, the hardware vendors may target it for optimization in their drivers. Though, there's no telling whether DICE is using a single cbuffer and relying on renaming by the driver, or using a bunch of cbuffers. Or, using UpdateSubresource instead of Map/Unmap, as Tordin mentioned earlier.
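
To make the size arithmetic concrete, here is a small sketch under the assumption that the cbuffer is mirrored on the CPU side by a plain struct. `PerDrawConstants`, `registersFor` and `alignedByteWidth` are hypothetical names; the only D3D11 fact relied on is that a constant buffer's ByteWidth must be a multiple of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of the per-draw cbuffer described above:
// eight 32-bit integers. What each field means is not stated in the post.
struct PerDrawConstants {
    std::int32_t values[8];
};

// One shader constant register holds four 32-bit components (16 bytes).
constexpr std::size_t kRegisterBytes = 16;

// Number of vector registers needed to hold `bytes`, rounding up.
constexpr std::size_t registersFor(std::size_t bytes) {
    return (bytes + kRegisterBytes - 1) / kRegisterBytes;
}

// Size rounded up to a multiple of 16 bytes, as D3D11 requires for the
// ByteWidth of a constant buffer.
constexpr std::size_t alignedByteWidth(std::size_t bytes) {
    return registersFor(bytes) * kRegisterBytes;
}

static_assert(sizeof(PerDrawConstants) == 32, "eight 32-bit integers");
static_assert(registersFor(sizeof(PerDrawConstants)) == 2, "two vector registers");
```

At 32 bytes per draw, even 1,000 draw calls move only about 32 KB of constant data per frame.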





#9 phantom   Moderators   -  Reputation: 7592

Posted 06 December 2011 - 06:49 PM

I've not seen anything which says to prefer UpdateSubresource over Map/Unmap. A quick look at the SDK docs suggests that, in the best case, UpdateSubresource puts the data straight into "destination memory"; in the worst case it creates an extra buffer, copies the data there first, and then copies it again into destination memory when the command buffer is flushed. A discard-map would likely do much the same, but probably quicker: it doesn't have to check for resource contention, it can just throw away the reference and grab a new chunk (or reuse a chunk) of memory.
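
As a rough mental model of "throw away the reference and grab a new chunk", here is a toy CPU-only simulation. It is not how any particular driver implements renaming; `RenamedBuffer`, `mapDiscard` and `simulateDraws` are hypothetical names, and the ring of 1,024 slices is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model only: real drivers manage renaming internally and differently.
// Each "discard map" hands out the next slice of a ring of memory, so data
// written for earlier draws stays intact while the GPU may still read it.
class RenamedBuffer {
public:
    RenamedBuffer(std::size_t sliceBytes, std::size_t sliceCount)
        : sliceBytes_(sliceBytes), sliceCount_(sliceCount),
          storage_(sliceBytes * sliceCount), current_(sliceCount - 1) {}

    // Analogous to Map with D3D11_MAP_WRITE_DISCARD: no copy of the old
    // contents and no wait, just a fresh slice. Wrapping around assumes the
    // GPU has finished with the oldest slice, which a real driver must check.
    void* mapDiscard() {
        current_ = (current_ + 1) % sliceCount_;
        ++renameCount_;
        return storage_.data() + current_ * sliceBytes_;
    }

    // Slice that the most recent draw call would read from.
    const void* currentSlice() const { return slice(current_); }
    const void* slice(std::size_t index) const {
        return storage_.data() + index * sliceBytes_;
    }
    std::size_t renameCount() const { return renameCount_; }

private:
    std::size_t sliceBytes_;
    std::size_t sliceCount_;
    std::vector<std::uint8_t> storage_;
    std::size_t current_;
    std::size_t renameCount_ = 0;
};

// Simulates `drawCount` draw calls that each update the same logical buffer
// with eight 32-bit values (32 bytes per draw).
inline RenamedBuffer simulateDraws(std::uint32_t drawCount) {
    RenamedBuffer buffer(32, 1024);
    for (std::uint32_t i = 0; i < drawCount; ++i) {
        std::uint32_t constants[8] = {i, 0, 0, 0, 0, 0, 0, 0};
        std::memcpy(buffer.mapDiscard(), constants, sizeof(constants));
        // A real renderer would Unmap and issue the draw call here.
    }
    return buffer;
}
```

In this model a rename is just a pointer bump, which is the intuition behind expecting a discard-map to be cheap; whether a real driver behaves this way for hundreds of renames per frame still has to be measured.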

In short I'd probably go for a discard-map + a cbuffer per object type but make it easy to go with multiples if it proves to be a bottleneck.

#10 360GAMZ   Members   -  Reputation: 133

Posted 06 December 2011 - 06:55 PM

Sounds like a plan. Thanks again for all of your help!



