
cbuffer per object or share one cbuffer


Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

5 replies to this topic

#1 Quat   Members   


Posted 08 August 2014 - 12:34 PM

Say you have N static objects with different world transforms. Which method is better?


1. N constant buffers.  Update once, then just change binding per object.  Pro: No Map/Discard overhead for static objects.  Con: Binding overhead.


2. 1 constant buffer.  Map/Discard per object.  Pro: Lower memory presumably (probably not a big deal), only one bind call.  Con: Lots of Map/Discard. 


Googling around, people seem to say both methods perform about the same. Are there any recommendations from AMD/NVIDIA?


I looked at http://gamedevs.org/uploads/efficient-buffer-management.pdf


and it says "One for per-object constants (World matrix, dynamic material properties, etc.)", but it is not clear whether that means exactly one cbuffer, or one cbuffer for every object.


#2 Promit   Senior Moderators   


Posted 08 August 2014 - 01:52 PM

In general, if you repeatedly map/discard the same buffer, the driver is forced to silently allocate new memory in the background. After all, there's a potentially long delay between draw submission and hardware processing. Better to do it explicitly up front, IMO. Most of the documentation now seems to recommend triple buffering for every dynamic buffer to avoid stalls; that's a three-frame latency. Anything that involves mapping any given buffer at high frequency makes me nervous.


That said, I have gotten the impression that the drivers' internal handling of cbuffers differs from their handling of other, more traditional buffers. I think the high-frequency Map/Discard pattern is common enough to justify significant optimizations to that code path inside the driver. But that basically has to involve hundreds of shadow copies internally, so once again I would avoid it if at all possible. The memory savings are likely to be an illusion because of the internal copies.


(One alternative possibility is that many cbuffers are a waste of memory if the entire buffer is simply copied into the command buffer. I have no evidence to suggest this actually happens, but it might be a reasonable optimization for small high frequency buffers.)

Edited by Promit, 08 August 2014 - 01:54 PM.


#3 kauna   Members   


Posted 09 August 2014 - 12:03 AM

I use a generic buffer object (one buffer of a few megabytes for all objects) to send transform matrices etc., filled once per frame. I also use a cbuffer for certain parameters (filled once per draw call, i.e. multiple times per frame). I haven't observed any performance loss with this configuration at a few hundred draw calls per frame. The advantage is that the generic buffer object can hold all the transform matrices required for one frame, whereas a cbuffer might run out of space when drawing lots of skinned meshes, for example.



#4 Jason Z   Members   


Posted 09 August 2014 - 06:29 AM

It all depends.  If you render 100 objects that all use the same cbuffer and have to update it each time, then you are synching the pipeline with every draw call.  If you have a variety of effects that use different cbuffers (but multiple objects using each one) then that can hide the synching action from one draw call to the next.


On the other hand, if you have one cbuffer for each object, then you have to set the cbuffer on every draw call. You have to profile to see whether that cost is more or less than the syncing effects described above.

#5 Adam_42   Members   


Posted 09 August 2014 - 09:34 AM


Have a read of http://fgiesen.wordpress.com/2013/03/05/mopping-up/


At least in that case one single dynamic constant buffer that you rewrite repeatedly was significantly faster than lots of individual constant buffers that you rewrite once per frame.


However, it may be different if you don't update them all. You need to test it.

#6 mhagain   Members   


Posted 09 August 2014 - 05:42 PM

Last time I benchmarked this (and admittedly it was a few years ago) sharing a single cbuffer was substantially faster.


If you have a single cbuffer per-object you're still going to get the overhead of mapping and allocation, but in addition to this you'll also get the extra overhead of a buffer change per-object.


You could of course triple-buffer per-object, but that doesn't avoid the buffer changes and may be quite heavy on memory usage.

