Why no Constant Buffer offsets like for Vertex Buffers?

6 comments, last by Juliean 8 years, 9 months ago

This might sound stupid, but I was reviewing my instanced sprite code and something bothered me.

[It's not about instancing, just the overall architecture/usage of cbuffers.]

With a draw call I can say which piece of a vertex buffer to use. That means I can leave the buffer in place and reuse it many times by passing an offset and a count for each drawable. OK.

So instead of having a "drawable" cbuffer for each drawable/'renderable unit', or sharing one between several and remapping it every time (which is bad due to syncing), why can't I just allocate one huge cbuffer and use it in slices, just like vertex buffers? I could allocate a cbuffer for all my sprites, have each sprite hold an offset, and map it only once per frame.

So on the draw call I could have extra params for the cbuffers.

Well, I see you can have lots of different cbuffers on a shader, but it's nothing a description structure couldn't solve. (Actually, I think an array with offsets for each cbuffer would be enough.)

Why the difference?

I hope I'm making sense. This stuff has gone cold in my head again.


It was simply not available at the time. With DirectX 11.1, you have access to this kind of functionality via ID3D11DeviceContext1::VSSetConstantBuffers1 and its equivalents for the other shader stages. pFirstConstant does just what you described. I can't recall the requirements for DirectX 11.1, but I suppose you need a pretty new GPU and Windows 8. So if you are still on DirectX 10, it seems you are out of luck.
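A rough sketch of the slice arithmetic this implies, not taken from the post itself. The documented rule for the *SetConstantBuffers1 calls is that pFirstConstant and pNumConstants are counted in 16-byte shader constants and must each be a multiple of 16 constants, so slices start on 256-byte boundaries. The `CBufferSlice` type and the helper names are hypothetical, and the sketch assumes every drawable uses the same payload size.

```cpp
#include <cassert>
#include <cstdint>

// One shader constant is 16 bytes (four 32-bit components).
constexpr std::uint32_t kBytesPerConstant = 16;
// Offsets and counts must be multiples of 16 constants.
constexpr std::uint32_t kConstantGranularity = 16;

// Hypothetical helper type: the window of the big buffer one drawable sees.
struct CBufferSlice {
    std::uint32_t firstConstant; // value to pass as pFirstConstant
    std::uint32_t numConstants;  // value to pass as pNumConstants
};

// Rounds a payload size in bytes up to a whole number of 16-constant blocks.
constexpr std::uint32_t constantsPerSlice(std::uint32_t payloadBytes) {
    return ((payloadBytes + kBytesPerConstant - 1) / kBytesPerConstant
            + kConstantGranularity - 1)
           / kConstantGranularity * kConstantGranularity;
}

// Slice for drawable `index`, assuming all drawables share one payload size.
inline CBufferSlice sliceFor(std::uint32_t index, std::uint32_t payloadBytes) {
    assert(payloadBytes > 0);
    const std::uint32_t n = constantsPerSlice(payloadBytes);
    return CBufferSlice{index * n, n};
}

// With a real ID3D11DeviceContext1* ctx and ID3D11Buffer* big (both assumed),
// the per-drawable bind would then be along the lines of:
//   ctx->VSSetConstantBuffers1(0, 1, &big, &slice.firstConstant, &slice.numConstants);
```

For an 80-byte payload (a 4x4 matrix plus a float4), each slice rounds up to 16 constants, so the fourth drawable would start at constant 48.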

EDIT: Pardon, I linked the wrong MSDN page, should be fixed now

A little different than I expected. I was thinking of offsetting from the draw call, just like for vertex buffers.

With VSSetConstantBuffers1, I don't quite get it. You still need to bind the buffer with its offset for every drawable, and call draw() for each sprite.

If it were from the draw call, you wouldn't need to care about when or in what order things get bound. You would just upload all the data to the buffer and bind it once before rendering everything.

The advantages of VSSetConstantBuffers1 over VSSetConstantBuffers would be, I guess:

It doesn't move the buffer when you rebind an already bound one with just a different offset.

Still a single buffer instead of a bunch of small ones.

But why not from the draw call?


But why not from the draw call?

Oops, I misread that part of your question.

As for why: You can see that the complexity of the parameters to be bound is quite high. You have an array with N entries of offsets for each bound cbuffer, and that for every shader stage. This would bloat the draw call quite a bit.
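To put a rough number on that, here is a purely hypothetical descriptor a draw call would have to carry if cbuffer offsets were draw parameters. No such structure exists in Direct3D; the sketch assumes the D3D11 limits of 5 programmable graphics stages (VS, HS, DS, GS, PS) and 14 API-visible constant buffer slots per stage, plus an offset and a count per slot as in the *SetConstantBuffers1 calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed D3D11 limits: 5 graphics shader stages, 14 cbuffer slots per stage.
constexpr std::size_t kGraphicsStages = 5;
constexpr std::size_t kCBufferSlotsPerStage = 14;

// Hypothetical: per-draw cbuffer windows for every stage and every slot.
struct DrawCBufferOffsets {
    std::uint32_t firstConstant[kGraphicsStages][kCBufferSlotsPerStage];
    std::uint32_t numConstants[kGraphicsStages][kCBufferSlotsPerStage];
};

// Extra bytes each draw call would have to pass and the runtime would have to read.
constexpr std::size_t kExtraBytesPerDraw = sizeof(DrawCBufferOffsets);

// Bounds-checked access to one entry (hypothetical helper).
inline std::uint32_t& firstConstantAt(DrawCBufferOffsets& d,
                                      std::size_t stage, std::size_t slot) {
    assert(stage < kGraphicsStages && slot < kCBufferSlotsPerStage);
    return d.firstConstant[stage][slot];
}
```

Under these assumptions that is 560 bytes of extra parameters per draw, most of which would go unused for a typical sprite.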

I also don't see what you are gaining beyond one less API call per shader stage for binding the buffers. Whenever you call Draw() you need to have the information about the cbuffers around either way. So whether you call Bind() and Draw(), or Draw() with parameters, doesn't change anything in my mind. IDK, but in my renderer's architecture it would make things even worse.

The main advantage I can see in the way DirectX 11.1 does it is being able to have one large cbuffer that you can update once per frame with a single memcpy. This can improve performance quite a bit over having to map one cbuffer per drawable.
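A minimal CPU-side sketch of that "one large cbuffer, one memcpy" idea, under stated assumptions. `SpriteConstants`, `packSprites` and `upload` are hypothetical names; the 256-byte stride follows from the rule that slice offsets are multiples of 16 constants of 16 bytes each. In a real renderer the `mapped` pointer would come from mapping the buffer (for example the pData of a D3D11_MAPPED_SUBRESOURCE); here it is just any writable memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-sprite constants: 80 bytes, i.e. 5 shader constants.
struct SpriteConstants {
    float world[16];
    float tint[4];
};

// Slices start on multiples of 16 constants * 16 bytes = 256 bytes.
constexpr std::size_t kSliceAlignment = 256;

constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

constexpr std::size_t kSliceStride = alignUp(sizeof(SpriteConstants), kSliceAlignment);

// Packs every sprite into one contiguous block, one aligned slice per sprite.
inline std::vector<std::uint8_t> packSprites(const std::vector<SpriteConstants>& sprites) {
    std::vector<std::uint8_t> staging(sprites.size() * kSliceStride, 0);
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        std::memcpy(staging.data() + i * kSliceStride, &sprites[i], sizeof(SpriteConstants));
    }
    return staging;
}

// Copies the whole packed block to the mapped destination with a single memcpy.
inline void upload(void* mapped, const std::vector<std::uint8_t>& staging) {
    assert(mapped != nullptr);
    if (staging.empty()) return;
    std::memcpy(mapped, staging.data(), staging.size());
}
```

Sprite i then lives at byte offset i * kSliceStride, which is constant offset i * 16 when binding its slice.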

In terms of GPU support this will work with all feature levels; according to the documentation, it is emulated in the runtime for 9.x levels and available in the driver for 10+ levels. I've personally used it and can vouch for it on D3D_FEATURE_LEVEL_10_0, so you don't need a recent GPU for it at all.

Unfortunately your objective of just binding once and reading from a user-specified offset isn't too easily obtainable. You could in theory do it with a DrawInstanced call (remembering that 1 is a valid value for InstanceCount) but SV_InstanceID is 0-based irrespective of the value of StartInstanceLocation.

Personally I'd classify worrying about this as "sweating the small stuff". Unless you're doing something extremely strange, binding cbuffers is extremely unlikely to be a bottleneck.

As for why it's not possible directly from the draw call, the API itself can tell you this. cbuffers are shader-stage state, whereas the draw call operates on input assembler state.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Did you originally come from an OpenGL background, Juliean? I think the hardware abstraction layer optimizes the draw calls. You may have a bottleneck in your render loop. Don't do any updates in the render loop; do them on a separate thread.

So on the draw call I could have extra params for the cbuffers.
Well, I see you can have lots of different cbuffers on a shader, but it's nothing a description structure couldn't solve. (Actually, I think an array with offsets for each cbuffer would be enough.)

There are two issues with that:

1. cbuffers often live in the L1 cache or in a special "register file" (and on older hardware, in actual physical read-only registers). Allowing an offset means their contents can change per draw, and changing a cbuffer per draw can result in a flush. Flushes cause minor GPU pipeline stalls, which are tiny if they happen once but can accumulate quickly over many draws.

2. There can be many cbuffers, many of them completely static for the entire frame (e.g. view and projection matrices, fog parameters). So the API would either have to make all of these cbuffers dynamic (which goes against the very purpose they're optimized for) or let you specify which cbuffers are offset, which implies parsing a structure and validating it (e.g. what if you offset cbuffer4 and there is no cbuffer4?) PER DRAW. This would be expensive in CPU terms.

What you want to achieve can easily be done via the baseInstance parameter and indexing into a huge const or tex buffer array in the shader. Note that with baseInstance = 10, SV_InstanceID is still 0-based, but attaching an extra instanced vertex buffer filled with 0 to 4096 (assuming the limit is 4096 per draw) will work around the issue.
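The workaround can be modelled on the CPU to show why it works. This is a simulation under assumptions, not GPU code: it mimics the behaviour described in the thread, where the per-instance vertex fetch honours the start instance while SV_InstanceID always counts from 0. `InstanceInputs`, `makeIndexStream` and `simulateDrawInstanced` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// What the vertex shader would receive for one instance (CPU-side model).
struct InstanceInputs {
    std::uint32_t svInstanceId; // always counts from 0
    std::uint32_t drawIndex;    // fetched from the per-instance vertex stream
};

// Per-instance vertex buffer contents: 0, 1, 2, ..., count - 1.
inline std::vector<std::uint32_t> makeIndexStream(std::uint32_t count) {
    std::vector<std::uint32_t> stream(count);
    std::iota(stream.begin(), stream.end(), 0u);
    return stream;
}

// Models a DrawInstanced-style call: the stream fetch is offset by the start
// instance, the instance ID seen by the shader is not.
inline std::vector<InstanceInputs> simulateDrawInstanced(
        const std::vector<std::uint32_t>& stream,
        std::uint32_t instanceCount,
        std::uint32_t startInstanceLocation) {
    assert(static_cast<std::size_t>(startInstanceLocation) + instanceCount <= stream.size());
    std::vector<InstanceInputs> out;
    for (std::uint32_t i = 0; i < instanceCount; ++i) {
        out.push_back(InstanceInputs{i, stream[startInstanceLocation + i]});
    }
    return out;
}
```

In the shader, the fetched `drawIndex` would then be used to index the large constant or texture buffer, so the buffer itself is bound only once.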


Did you originally come from an OpenGL background, Juliean? I think the hardware abstraction layer optimizes the draw calls. You may have a bottleneck in your render loop. Don't do any updates in the render loop; do them on a separate thread.

No, I started out with DirectX, and only recently adopted an OpenGL4 renderer. I gotta say I can't make much sense of what you are implying with your post right now, so if you could elaborate on that in more detail I'd be very grateful. What I was implying in my post is this:

a) In order to achieve what the OP wants at the draw-call level, each draw call has to have either 1) 5 additional parameters merely for cbuffer offsets for each shader stage or 2) a double array for each shader stage and all bound buffers. This is not even so much a problem on the performance end, but more for the API bloat it imposes.

It is a strict fact, though, that you get more performance if you map one 128 MB cbuffer and copy data to it in one pass than you would if you mapped, say, 1024 buffers of the same size separately. That's why I was saying that what DirectX 11.1 proposes definitely makes sense, but having offsets passed into the draw call doesn't gain much and can make matters worse on an architectural level.

