cbuffer per object or share one cbuffer

4 comments, last by 21st Century Moose 9 years, 8 months ago

Say you have N static objects with different world transforms. What method is better:

1. N constant buffers. Update once, then just change binding per object. Pro: No Map/Discard overhead for static objects. Con: Binding overhead.

2. 1 constant buffer. Map/Discard per object. Pro: Lower memory presumably (probably not a big deal), only one bind call. Con: Lots of Map/Discard.
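To make the trade-off concrete, here is a toy tally of the calls each method issues per frame. It is not real Direct3D code: `CallCounts`, `frameWithBufferPerObject` and `frameWithSharedBuffer` are hypothetical names, and it only counts calls, it says nothing about what each call costs in the driver.

```cpp
#include <cstddef>

// Toy model only: counts API calls per frame, not their actual cost.
struct CallCounts {
    std::size_t maps  = 0;  // Map(WRITE_DISCARD)/Unmap pairs
    std::size_t binds = 0;  // constant buffer bind calls
    std::size_t draws = 0;  // draw calls
};

// Method 1: N cbuffers, each filled once at load time.
// Per frame we only rebind before each draw.
CallCounts frameWithBufferPerObject(std::size_t objectCount) {
    CallCounts c;
    for (std::size_t i = 0; i < objectCount; ++i) {
        ++c.binds;  // switch to object i's own buffer
        ++c.draws;
    }
    return c;
}

// Method 2: one shared cbuffer, bound once and rewritten before every draw.
CallCounts frameWithSharedBuffer(std::size_t objectCount) {
    CallCounts c;
    if (objectCount > 0) {
        ++c.binds;  // the binding persists across draws
    }
    for (std::size_t i = 0; i < objectCount; ++i) {
        ++c.maps;   // upload object i's world matrix
        ++c.draws;
    }
    return c;
}
```

For 100 objects, method 1 gives 100 binds and 0 maps, while method 2 gives 1 bind and 100 maps. Which is cheaper depends on the relative cost of a bind versus a Map/Discard on the driver in question.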

Googling around, people seem to say both methods work out about the same performance-wise. Are there any recommendations from AMD/NVIDIA?

I looked at http://gamedevs.org/uploads/efficient-buffer-management.pdf

and it says "One for per - object constants (World matrix, dynamic material properties, etc )", but it is not clear whether it means exactly one cbuffer, or one cbuffer for every object.

-----Quat

In general, if you repeatedly map/discard the same buffer, the driver is going to be forced to silently allocate in the background. After all, there's a potentially long delay between draw submission and hardware processing. Better to do it explicitly up front, IMO. Most of the documentation now seems to recommend triple buffering for every dynamic buffer to avoid stalls -- that's a three frame latency. Anything that involves mapping any given buffer at high frequency makes me nervous.
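A rough sketch of the explicit triple-buffering bookkeeping, with no real GPU calls. `FrameRing` and `acquire` are made-up names, and it assumes frames are numbered from 1 and that the application can ask which frame the GPU has finished (for example via a fence or query).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for three copies of one dynamic buffer.
// Frame numbers start at 1; 0 in lastUsedFrame means "never written".
struct FrameRing {
    static constexpr std::size_t kSlots = 3;
    std::array<std::uint64_t, kSlots> lastUsedFrame{};

    // Picks the copy to write during `frame`. Sets wouldStall when the GPU
    // has not yet finished the frame that last used that copy, i.e. when
    // writing now would force a wait (or a hidden driver allocation).
    std::size_t acquire(std::uint64_t frame,
                        std::uint64_t gpuCompletedFrame,
                        bool& wouldStall) {
        const std::size_t slot = static_cast<std::size_t>(frame % kSlots);
        wouldStall = lastUsedFrame[slot] > gpuCompletedFrame;
        lastUsedFrame[slot] = frame;
        return slot;
    }
};
```

With three copies, the copy chosen for frame 4 is the one last written in frame 1, so there is no stall as long as the GPU is at most two frames behind the CPU.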

That said, I have gotten the impression that the drivers' internal handling for cbuffers is not similar to the other, more traditional buffers. I think that the high frequency Map/Discard pattern is common enough to justify significant optimizations to that code flow inside the driver. But that basically has to involve hundreds of shadow copies internally, so once again I wouldn't do this if at all possible. The memory savings are likely to be an illusion due to the internal copies.

(One alternative possibility is that many cbuffers are a waste of memory if the entire buffer is simply copied into the command buffer. I have no evidence to suggest this actually happens, but it might be a reasonable optimization for small high frequency buffers.)


I use a generic buffer object (one buffer of a few megabytes for all objects) to send transform matrices etc., filled once per frame. I also use a cbuffer for certain parameters (filled once per draw call, i.e. multiple times per frame). I haven't observed any performance loss with this configuration at a few hundred draw calls per frame. The advantage is that the generic buffer object can hold all the transform matrices required for one frame. A cbuffer might run out of space when drawing lots of skinned meshes, for example.
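The post does not say how the large buffer is laid out, so the following is only one plausible scheme: a linear allocator that hands out aligned offsets into the per-frame buffer and is reset at the start of each frame. `FrameAllocator`, `kInvalidOffset` and the 256-byte alignment used in the example are assumptions for illustration; the real alignment requirement depends on the API and on how the buffer is bound.

```cpp
#include <cassert>
#include <cstddef>

// Returned when the per-frame buffer is full (hypothetical sentinel).
constexpr std::size_t kInvalidOffset = static_cast<std::size_t>(-1);

// Hands out aligned byte offsets into one large per-frame buffer.
class FrameAllocator {
public:
    FrameAllocator(std::size_t capacity, std::size_t alignment)
        : capacity_(capacity), alignment_(alignment) {
        assert(alignment_ > 0);
    }

    // Reserves `bytes` at the next aligned offset, or fails if it won't fit.
    std::size_t allocate(std::size_t bytes) {
        const std::size_t start = alignUp(head_);
        if (start > capacity_ || bytes > capacity_ - start) {
            return kInvalidOffset;
        }
        head_ = start + bytes;
        return start;
    }

    // Call once per frame, before filling the buffer again.
    void reset() { head_ = 0; }

private:
    std::size_t alignUp(std::size_t v) const {
        return (v + alignment_ - 1) / alignment_ * alignment_;
    }

    std::size_t capacity_;
    std::size_t alignment_;
    std::size_t head_ = 0;
};
```

Each draw then only needs the offset (or an index derived from it) to find its matrices, so the buffer itself is written once per frame rather than once per draw.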

Cheers!

It all depends. If you render 100 objects that all use the same cbuffer and have to update it each time, then you are synching the pipeline with every draw call. If you have a variety of effects that use different cbuffers (but multiple objects using each one) then that can hide the synching action from one draw call to the next.

On the other side, if you have one cbuffer for each object, then you are having to set the cbuffer each draw call. You have to profile to see if that cost is more or less than the synching effects from above.

Have a read of http://fgiesen.wordpress.com/2013/03/05/mopping-up/

At least in that case, a single dynamic constant buffer that you rewrite repeatedly was significantly faster than lots of individual constant buffers that you each rewrite once per frame.

However, it may be different if you don't update them all. You need to test it.

Last time I benchmarked this (and admittedly it was a few years ago) sharing a single cbuffer was substantially faster.

If you have a single cbuffer per-object you're still going to get the overhead of mapping and allocation, but in addition to this you'll also get the extra overhead of a buffer change per-object.

You could of course triple-buffer per-object, but that doesn't avoid the buffer changes and may be quite heavy on memory usage.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.
