cippyboy

DX11 Single vs Multiple Constant Buffers


So basically after upgrading from DX9 to DX10 I read a lot of docs from Microsoft about how it's better to organize constant buffers by update frequency, so I made 3 types of constant buffers:

 

PerFrame (view & projection matrix)

PerMaterial (MaterialColor, specular, shininess, etc.)

PerObject ( world matrix )
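
As a rough sketch of that split, the CPU-side structs mirroring the three cbuffers might look like the following in C++. The field names, the `Float4`/`Float4x4` helper types and the exact contents are assumptions for illustration, not the actual engine code. The only hard rule relied on here is that D3D11 requires a constant buffer's ByteWidth to be a multiple of 16.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper types (assumed, not from any particular math library).
struct Float4   { float x, y, z, w; };
struct Float4x4 { float m[16]; };

// Updated once per frame.
struct PerFrame {
    Float4x4 view;
    Float4x4 projection;
};

// Updated when the material changes.
struct PerMaterial {
    Float4 materialColor;
    Float4 specular;
    float  shininess;
    float  padding[3];  // pads the struct out to a 16-byte boundary
};

// Updated for every object.
struct PerObject {
    Float4x4 world;
};

// D3D11 rejects constant buffers whose ByteWidth is not a multiple of 16.
constexpr bool isValidCBufferSize(std::size_t bytes) {
    return bytes > 0 && bytes % 16 == 0;
}

static_assert(isValidCBufferSize(sizeof(PerFrame)),    "PerFrame needs padding");
static_assert(isValidCBufferSize(sizeof(PerMaterial)), "PerMaterial needs padding");
static_assert(isValidCBufferSize(sizeof(PerObject)),   "PerObject needs padding");
```

With these assumed fields the three structs come out at 128, 48 and 64 bytes respectively.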

 

I didn't really think about the performance implications, but one day it struck me: how about I just make 1 buffer to hold all the data? After I did this I noticed that performance actually increased by ~3-5%, even though I was updating an entire, slightly bigger buffer. I thought that maybe the drivers at the time (I had an HD5770, a first-gen DX11 device) were not that well optimized for multiple constant buffers, and reverted back to multiple buffers.

 

I now have an HD7850, and after running this little test again I'm seeing a performance boost of up to +50% for ~100 draw calls when using a single huge constant buffer per object. So in effect the difference has not shrunk, it has grown, which suggests there's something inherently wrong with having too many constant buffers bound. I'm now assuming this may be because my buffers are fairly small. The huge constant buffer is around 460 bytes (I only have 4 matrices, one light and a few other variables), so perhaps switching between multiple buffers is more advantageous when you are doing something like fetching an entire vertex buffer (for real-time, vertex-based ambient occlusion) or when you work with skinned meshes of 100 bones each.

 

My question: have you tried rendering a scene both with multiple buffers and with a single huge buffer, and compared the performance?


I suspect that, with constant buffers this small, the cost of binding is actually greater than the cost of passing a small amount of data to the shader. How many objects are you rendering? How many objects per material? The frequency of updates to these constant buffers has implications for your performance. Even an object with 1 bone will contain 2x as many bytes as your current per-object buffer. Try increasing the size of that per-object buffer and see what happens to your performance data.


There are around 107 objects, and for the most part it's just 1-3 objects per material, so I have around 107 materials too. However, with the 3 constant buffers I was only updating the per-frame buffer once per frame; each material had its own per-material buffer and each object its own per-object buffer that didn't change (I don't animate any objects or material properties currently), so only the view/projection matrices changed.

 

Also, why do you say 1 bone would be 2x my current per object buffer ? 1 bone would be just a float4x4 so that's like 64 bytes. I'm planning to do some skinning in the near future and I have around 30 bones, so I'm curious how that will go.
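
For reference, a minimal sketch of the size arithmetic, assuming one float4x4 per bone (other encodings, such as 4x3 matrices, would be smaller). The function names are made up for illustration. The two D3D11 facts used are that a constant buffer holds at most 4096 16-byte vectors and that its size must be a multiple of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>

// One float4x4 is 16 floats of 4 bytes each.
constexpr std::size_t kBytesPerFloat4x4 = 16 * sizeof(float);

// D3D11 limit: 4096 float4 vectors of 16 bytes per constant buffer.
constexpr std::size_t kMaxCBufferBytes = 4096 * 16;

// Size of a bone palette stored as one float4x4 per bone (assumed layout).
constexpr std::size_t bonePaletteBytes(std::size_t boneCount) {
    return boneCount * kBytesPerFloat4x4;
}

// Round a byte count up to the next multiple of 16, as ByteWidth requires.
constexpr std::size_t alignUp16(std::size_t bytes) {
    return (bytes + 15) / 16 * 16;
}

// Whether a palette of this many bones fits in a single constant buffer.
constexpr bool paletteFitsInOneCBuffer(std::size_t boneCount) {
    return bonePaletteBytes(boneCount) <= kMaxCBufferBytes;
}
```

Under these assumptions 1 bone is 64 bytes, 30 bones are 1920 bytes, 100 bones are 6400 bytes, and the ~460-byte combined buffer would be created with a ByteWidth of 464.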


Do you know if the performance difference is on the CPU-side, GPU-side, or both?

How many ms per frame is your game using in both scenarios?

How are you creating/updating the buffers?

How are you applying these state changes?


How can I tell if it's CPU or GPU since both techniques result in a 99% GPU usage ? (according to Catalyst control center)

3 CB result in ~400 FPS (or 2.5ms per frame), and 1 CB results in ~660 FPS ( or ~1.51 ms per frame). The CPU is not a bottleneck, even GPU Perf Client says this, CPU is doing ~0.27ms work per frame.
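
To put those numbers in per-object terms, here is a small sketch of the conversion. It assumes the ~107 objects mentioned earlier in the thread and that the whole difference is attributable to the constant buffer setup, which has not been verified.

```cpp
#include <cassert>
#include <cmath>

// Frame time in milliseconds for a given frame rate.
constexpr double frameMs(double fps) {
    return 1000.0 / fps;
}

// Milliseconds saved per frame when going from slowFps to fastFps.
constexpr double savedMsPerFrame(double slowFps, double fastFps) {
    return frameMs(slowFps) - frameMs(fastFps);
}

// The same saving spread evenly over the objects drawn, in microseconds.
constexpr double savedMicrosecondsPerObject(double slowFps, double fastFps,
                                            int objectCount) {
    return savedMsPerFrame(slowFps, fastFps) * 1000.0 / objectCount;
}
```

Going from 400 FPS to 660 FPS is a saving of just under 1 ms per frame, or roughly 9 microseconds per object with 107 objects. Comparing milliseconds rather than FPS makes the size of the gap easier to judge.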

 

I'm creating them with default usage and updating them with UpdateSubresource. I was previously using Map/Unmap with write discard, but there's just no performance difference between the two.

How am I applying state changes ? Through PSSetConstantBuffers :). I do admit I call PSSetConstantBuffers 3 times for 3 constant buffers instead of calling it once with all the 3 buffers, but I kind of doubt my speed penalty of ~50% is due to state changes.


PSSetConstantBuffers 3 times per object or 3 times per frame?

The only difference in your D3D code between 1 and 3 constant buffers should be how many bytes you send to the constant buffer for each object, except for once-per-frame setup. If you draw 1000 objects with 1 constant buffer you do 1 Map and 1 Draw per object. If you draw 1000 objects with 2 constant buffers, you still do 1 Map and 1 Draw per object, it's just that you Map a constant buffer with fewer bytes since the camera matrices are in a once-per-frame setup constant buffer that doesn't change between objects, and so doesn't need to be touched. You should never set the constant buffers with *SetConstantBuffers more than once per frame. Even if you change shaders the constant buffers remain set and do not need a new call to *SetConstantBuffers.


"each object it's own perobject buffer that didn't change"

 

This is definitely the cause of your observed performance differences.  With, say, 100 objects, you're making 100 SetConstantBuffers calls, and they're more expensive than discarding and refilling a single buffer.

 

In the past, I've observed that the best performance comes from:

 

 - One per-frame buffer.

 - One per-material buffer, irrespective of how many materials you have.

 - One per-object buffer, irrespective of how many objects you have.

 

When changing materials you just Map with Discard, then write in the new material properties.  When drawing a new object you also Map with Discard and write in the new object properties.  This is substantially cheaper than having to switch buffers each time, and also lends itself well to piggybacking instancing on top of the same code when time comes to do that.
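
To make the trade-off concrete, here is a sketch that only counts calls for the two approaches. `CallCounter` is a stand-in, not the D3D11 API: `binds` models *SetConstantBuffers calls and `updates` models a Map with Discard plus Unmap. It says nothing about how expensive each call is on a given driver, which is what actually decides the outcome.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a device context: counts calls instead of issuing them.
struct CallCounter {
    int binds = 0;    // models *SetConstantBuffers
    int updates = 0;  // models Map(WRITE_DISCARD) + Unmap
    int draws = 0;
};

struct Object { int materialId; };

// Strategy A: each material and each object owns a pre-filled buffer,
// so switching means rebinding.
CallCounter drawWithBufferPerObject(const std::vector<Object>& objects) {
    CallCounter c;
    c.updates += 1;  // per-frame buffer refreshed once
    c.binds += 1;    // per-frame buffer bound once
    int boundMaterial = -1;
    for (const Object& o : objects) {
        if (o.materialId != boundMaterial) {
            c.binds += 1;  // bind this material's buffer
            boundMaterial = o.materialId;
        }
        c.binds += 1;      // bind this object's buffer
        c.draws += 1;
    }
    return c;
}

// Strategy B: three shared buffers are bound once and rewritten in place.
CallCounter drawWithSharedBuffers(const std::vector<Object>& objects) {
    CallCounter c;
    c.updates += 1;  // per-frame buffer refreshed once
    c.binds += 3;    // per-frame, per-material, per-object bound once
    int currentMaterial = -1;
    for (const Object& o : objects) {
        if (o.materialId != currentMaterial) {
            c.updates += 1;  // rewrite the shared material buffer
            currentMaterial = o.materialId;
        }
        c.updates += 1;      // rewrite the shared object buffer
        c.draws += 1;
    }
    return c;
}

// Test scene: every object has its own material, as in the case discussed.
std::vector<Object> makeScene(int objectCount) {
    std::vector<Object> objects;
    for (int i = 0; i < objectCount; ++i) objects.push_back(Object{i});
    return objects;
}
```

For 100 objects with unique materials, strategy A issues 201 binds and 1 update, while strategy B issues 3 binds and 201 updates. The two approaches trade one kind of call for the other, so which is faster depends on the relative cost of a bind versus a discard-and-refill.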


"This is definitely the cause of your observed performance differences.  With, say, 100 objects, you're making 100 SetConstantBuffers calls, and they're more expensive than discarding and refilling a single buffer."

 

 

Interesting. Benchmarking my engine, I found that calling SetConstantBuffers with pre-filled buffers (i.e. one per scene material) was faster than mapping a single buffer with Map on every material change.

From what I understand, a constant buffer lives in video memory, so calling SetConstantBuffers just updates "a pointer to data" on the GPU, while Map actually moves data from the CPU.

 

At the end of the day, always benchmark (on different GPUs) before committing to a strategy.

Edited by kunos


"each object it's own perobject buffer that didn't change"

 

"In the past, I've observed that the best performance comes from [one per-frame buffer, one per-material buffer and one per-object buffer]. When changing materials you just Map with Discard, then write in the new material properties.  When drawing a new object you also Map with Discard and write in the new object properties."

 

I tried that once, all materials having their own PerMaterial buffer versus having a single material buffer and constantly updating that but the performance difference was almost 0.

 

So basically one buffer update (the per frame buffer) and 300 buffer sets ( even though I set the same perframe buffer which should be a no-op ), is slower than 100 buffer updates and 100 buffer sets.
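
Whether the runtime or driver really treats a repeated bind of the same buffer as a no-op is not something this thread confirms. One way to take it out of the equation is to filter redundant binds on the application side. The sketch below uses a hypothetical `BindCache` wrapper and plain integer ids in place of real buffer pointers. The slot count of 14 is the number of constant buffer slots D3D11 exposes per shader stage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// D3D11 exposes 14 constant buffer slots per shader stage.
constexpr std::size_t kCBufferSlotsPerStage = 14;

// Hypothetical wrapper, not part of D3D11: remembers which buffer id is
// bound to each slot and skips binds that would not change anything.
class BindCache {
public:
    BindCache() { bound_.fill(-1); }  // -1 means "nothing bound yet"

    // Returns true if a real bind call would be issued, false if filtered.
    bool bind(std::size_t slot, int bufferId) {
        assert(slot < kCBufferSlotsPerStage);
        ++requested_;
        if (bound_[slot] == bufferId) return false;
        bound_[slot] = bufferId;
        ++issued_;
        return true;
    }

    int requested() const { return requested_; }
    int issued() const { return issued_; }

private:
    std::array<int, kCBufferSlotsPerStage> bound_;
    int requested_ = 0;
    int issued_ = 0;
};

// Models the frame described above: every object requests the shared
// per-frame buffer plus its own material and object buffers.
BindCache simulateFrame(int objectCount) {
    BindCache cache;
    for (int i = 0; i < objectCount; ++i) {
        cache.bind(0, 0);         // shared per-frame buffer, same id each time
        cache.bind(1, 1000 + i);  // this object's material buffer
        cache.bind(2, 2000 + i);  // this object's own buffer
    }
    return cache;
}
```

With 100 objects this turns 300 requested binds into 201 issued ones, since only the per-frame bind is redundant after the first object.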

 

Have you tried just making one single buffer versus the 3 you're mapping/discarding on a per object basis ? That was my question all along, if someone tried one buffer versus multiple, I'm really curious about your results.


Tried it right now and got zero difference between them, even with abnormally large buffers. In certain cases, though, there can certainly be a difference: for example with one very large buffer, one very small one, and bandwidth to video memory already heavily used for other things...

 

Consider a case where a complex shader indexes into a constant buffer of a couple of thousand vectors that doesn't change between objects, and there's 1000 objects where the only change is the translation matrix, then uploading 64 bytes per object is much better than 32,000 bytes, especially if each frame there is also a couple of different transfers of reasonably large dynamic textures going on.
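
The arithmetic behind that example can be sketched as follows, assuming 16-byte vectors, a 64-byte matrix per object, and (conservatively) that the large table is uploaded once per frame in the split case. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBytesPerVector = 16;  // one float4
constexpr std::size_t kBytesPerMatrix = 64;  // one float4x4

// Split layout: the large table lives in its own buffer and is uploaded
// once per frame at most; only the matrix is uploaded per object.
constexpr std::size_t bytesPerFrameSplit(std::size_t objectCount,
                                         std::size_t tableVectors) {
    return tableVectors * kBytesPerVector + objectCount * kBytesPerMatrix;
}

// Merged layout: table and matrix share one buffer, so the whole buffer
// is re-uploaded for every object.
constexpr std::size_t bytesPerFrameMerged(std::size_t objectCount,
                                          std::size_t tableVectors) {
    return objectCount * (tableVectors * kBytesPerVector + kBytesPerMatrix);
}
```

For 1000 objects and a 2000-vector table, the split layout uploads 96,000 bytes per frame and the merged layout 32,064,000 bytes, a factor of about 334.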
