Vertex and Index Buffer Locking

9 comments, last by 21st Century Moose 10 years, 5 months ago

I was cleaning up my dynamic vertex and index buffer code when I started wondering about something.

Am I locking these buffers correctly? This is a little hard for me to explain, so please let me know if anything does not make sense.

Currently I just lock the entire buffer like this:


vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);

But since these are dynamic buffers, wouldn't I want to lock them like this?


vBuffer->Lock(vertexDataAmountUsed, vertexBufferSize - vertexDataAmountUsed, (void**) &vertices, bufferLockFlag);
iBuffer->Lock(indexDataAmountUsed, indexBufferSize - indexDataAmountUsed, (void**) &indices, bufferLockFlag);

For example, let's say that:

- We are drawing quads

- Each quad has 4 vertices and 6 indices, since we are using an index buffer

- Each quad has a vertex data size of 10

- Each quad has an index data size of 2

- Our vertex and index buffers can each hold 3 quads before they are "full"

- This means our buffer sizes are:

vertexBufferSize = 'vertex data size per quad' * 'number of quads the buffer can hold' = 10 * 3 = 30

indexBufferSize = 'index data size per quad' * 'number of quads the buffer can hold' = 2 * 3 = 6

- We are starting with fresh, empty buffers
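In the example above the sizes are abstract "data sizes", but IDirect3DVertexBuffer9::Lock and IDirect3DIndexBuffer9::Lock take OffsetToLock and SizeToLock in bytes. A minimal sketch of the conversion, assuming a hypothetical SpriteVertex layout (three position floats plus two texture coordinates) and 16-bit indices:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vertex layout, for illustration only: position + texture coords.
struct SpriteVertex {
    float x, y, z;
    float u, v;
};

constexpr std::size_t kVerticesPerQuad = 4;
constexpr std::size_t kIndicesPerQuad  = 6;

// Bytes one quad occupies in each buffer (16-bit indices assumed).
constexpr std::size_t kVertexBytesPerQuad = kVerticesPerQuad * sizeof(SpriteVertex);
constexpr std::size_t kIndexBytesPerQuad  = kIndicesPerQuad * sizeof(std::uint16_t);

// Total buffer sizes in bytes for buffers that hold maxQuads quads.
constexpr std::size_t vertexBufferBytes(std::size_t maxQuads) {
    return maxQuads * kVertexBytesPerQuad;
}
constexpr std::size_t indexBufferBytes(std::size_t maxQuads) {
    return maxQuads * kIndexBytesPerQuad;
}
```

On typical platforms, where sizeof(SpriteVertex) is 20, a 3-quad batch needs vertexBufferBytes(3) = 240 bytes of vertex data and indexBufferBytes(3) = 36 bytes of index data. Those byte counts, not the per-quad unit counts, are what the Lock calls expect.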

vertexDataAmountUsed = 0;
indexDataAmountUsed = 0;

We lock the buffers like so:


vBuffer->Lock(vertexDataAmountUsed, vertexBufferSize - vertexDataAmountUsed, (void**) &vertices, bufferLockFlag);
iBuffer->Lock(indexDataAmountUsed, indexBufferSize - indexDataAmountUsed, (void**) &indices, bufferLockFlag);

This means we are locking the entire vertex and index buffers, because:

- vertexDataAmountUsed and indexDataAmountUsed are both 0, which tells each Lock call to start at offset 0 (first parameter of the Lock calls)

- We are locking each buffer's full size, based on the amount of data to lock (second parameter of the Lock calls):

vertexBufferSize - vertexDataAmountUsed = 30 - 0 = 30

indexBufferSize - indexDataAmountUsed = 6 - 0 = 6

Further down the line, let's say we do not need a fresh buffer yet, because there is still room left.

Let's say each buffer already holds 2 quads' worth of data, meaning:

vertexDataAmountUsed = 'vertex data per quad' * 2 = 10 * 2 = 20

indexDataAmountUsed = 'index data per quad' * 2 = 2 * 2 = 4

So we can fit one more quad before we have to use the DISCARD flag to get a fresh buffer.

This means that when we lock again using:


vBuffer->Lock(vertexDataAmountUsed, vertexBufferSize - vertexDataAmountUsed, (void**) &vertices, bufferLockFlag);
iBuffer->Lock(indexDataAmountUsed, indexBufferSize - indexDataAmountUsed, (void**) &indices, bufferLockFlag);

We are locking like so:

- The vertex buffer lock starts at offset 20 (first parameter of the vertex buffer lock)

- The index buffer lock starts at offset 4 (first parameter of the index buffer lock)

- For the vertex buffer, we lock only the space still available (second parameter of the vertex buffer lock). In this case:

vertexBufferSize - vertexDataAmountUsed = 30 - 20 = 10

- For the index buffer, we lock only the space still available (second parameter of the index buffer lock). In this case:

indexBufferSize - indexDataAmountUsed = 6 - 4 = 2

So only the one remaining quad slot in each buffer gets locked.
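The offset/size arithmetic in this walkthrough can be sketched as a small helper. LockRange and remainingRange are hypothetical names, and the units are whatever the caller uses consistently (bytes in real D3D9 code). One case worth guarding against: D3D9 gives a SizeToLock of 0 a special meaning (0, 0 locks the whole buffer), so a completely full buffer should be handled by discarding rather than by asking for a zero-sized tail.

```cpp
#include <cassert>
#include <cstddef>

// Offset/size pair to pass as OffsetToLock / SizeToLock.
struct LockRange {
    std::size_t offset;
    std::size_t size;
};

// Range covering the unused tail of a buffer, as in the walkthrough above.
// Requires used < capacity: a full buffer should be discarded instead,
// because a size of 0 means something special to Lock().
LockRange remainingRange(std::size_t capacity, std::size_t used) {
    assert(used < capacity);
    return LockRange{used, capacity - used};
}
```

With the thread's numbers, remainingRange(30, 20) gives offset 20 and size 10 for the vertex buffer, remainingRange(6, 4) gives offset 4 and size 2 for the index buffer, and remainingRange(30, 0) covers the whole fresh vertex buffer.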

Now, what am I really asking?

Well, I want to know whether what I described above is correct. Is that how I should be locking dynamic vertex and index buffers?

And can I use vertexBufferSize - vertexDataAmountUsed to lock my whole buffer, assuming vertexBufferSize matches the buffer's maximum size?

I know that passing 0, 0 as the first and second parameters locks the whole buffer, but can this be used as an alternative?

Or should I just stick with locking the entire thing?


From an API point of view, you are correct. However, the OffsetToLock and SizeToLock parameters are only hints to the driver, and there's no guarantee that it will actually lock only those parts of the buffer.

From a performance point of view, locking the entire buffer with the DISCARD flag is better. The driver will most likely just allocate a new buffer, so it doesn't have to merge the new and old parts of the buffer. This does have the drawback of using more memory, so take care when mapping very large buffers.

The real question is: why are you using dynamic VBs/IBs? Drivers don't really like that...



I'm using dynamic buffers because I'm creating a sprite batcher, and since the data in them is almost always changing, a dynamic buffer should be the way to go.

Your statement actually confuses me, because according to this article by Microsoft it's a performance optimization:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb147263%28v=vs.85%29.aspx#Using_Dynamic_Vertex_and_Index_Buffers

Actually, this article states exactly what I said. In order of performance, best first:

- Don't use a dynamic VB if you don't need one. This is the most common case.

- Use MAP_DISCARD. Like I said, what really happens is that the driver allocates a new buffer, so you don't interfere with the current draw and don't need to merge the old copy of the buffer with the new one.

- Use MAP_NOOVERWRITE. This is useful if you have a large buffer but only need to change a small portion of it. This has some overhead, and in some cases can cause the GPU to stall, but in most cases there are no performance implications.
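The discard/no-overwrite options above are usually combined into one append policy: keep writing after the last used byte with NOOVERWRITE, and when the next batch will not fit, lock with DISCARD and start again at offset 0. A minimal sketch; LockMode is a stand-in for D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE so it compiles without the DirectX headers, and DynamicBufferCursor is a hypothetical name. Unlike the walkthrough in the first post, it locks only the region about to be written rather than the whole remaining tail.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE (real code passes the D3D flags).
enum class LockMode { Discard, NoOverwrite };

// What to pass to Lock(): where the new data starts, how much, and which flag.
struct AppendLock {
    std::size_t offset;
    std::size_t size;
    LockMode mode;
};

// Hypothetical helper that tracks the write position in one dynamic buffer.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity) : capacity_(capacity) {}

    // Plan a lock for `bytes` of new data. While there is room, append after
    // existing data with NoOverwrite (a promise not to touch data the GPU may
    // still be reading). On the first write, or when the data will not fit,
    // Discard and restart at offset 0.
    AppendLock beginAppend(std::size_t bytes) {
        assert(bytes > 0 && bytes <= capacity_);
        LockMode mode = LockMode::NoOverwrite;
        if (used_ == 0 || used_ + bytes > capacity_) {
            mode = LockMode::Discard;
            used_ = 0;
        }
        AppendLock lock{used_, bytes, mode};
        used_ += bytes;
        return lock;
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

With the thread's numbers (capacity 30, 10 per quad), successive appends come back as offset 0 with Discard, 10 and 20 with NoOverwrite, then 0 with Discard again once a fourth quad no longer fits.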

I used to do driver optimizations. We hated it when games mapped VBs/IBs, and we disabled some optimizations for dynamic VBs. I only saw one game that used a dynamic IB, and only a few that used dynamic VBs.

If you are using DX10+, consider using the geometry shader (GS) for billboarding, especially if you have a fixed number of sprites. The DX10 SDK has a sample called ParticleGS that implements a particle system using GS, stream output (SO) and DrawAuto().
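For a sprite batcher specifically, the index pattern is identical for every quad, so one common way to avoid a dynamic index buffer is to fill a static one once at startup and stream only vertices. A sketch of generating that index data; buildQuadIndices and the corner order are assumptions for illustration, not something stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build index data for maxQuads quads once, e.g. to fill a static index buffer.
// Assumes each quad's 4 vertices are stored consecutively as
// 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
// (hypothetical order; adjust to your own winding and cull mode).
// 16-bit indices cap this at 16384 quads (65536 vertices).
std::vector<std::uint16_t> buildQuadIndices(std::size_t maxQuads) {
    assert(maxQuads * 4 <= 65536);
    std::vector<std::uint16_t> indices;
    indices.reserve(maxQuads * 6);
    for (std::size_t q = 0; q < maxQuads; ++q) {
        const std::uint16_t b = static_cast<std::uint16_t>(q * 4);
        // Two triangles per quad: (0, 1, 2) and (2, 1, 3), offset by b.
        const std::uint16_t quad[6] = {
            b,
            static_cast<std::uint16_t>(b + 1),
            static_cast<std::uint16_t>(b + 2),
            static_cast<std::uint16_t>(b + 2),
            static_cast<std::uint16_t>(b + 1),
            static_cast<std::uint16_t>(b + 3)};
        indices.insert(indices.end(), quad, quad + 6);
    }
    return indices;
}
```

Because these indices never change, only the vertex buffer needs the discard/no-overwrite treatment; each batch draws however many quads were written using the leading part of the static index buffer.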

<snip>

Sorry, but this doesn't make much sense. What would you recommend for, say, text rendering, where the text being rendered may change every frame? Or a dynamic particle system where the number of particles being drawn may change every frame?

Static buffers are certainly preferable where possible, and keeping as much of your geometry as possible static is the right thing to do, but there are scenarios where no approach other than dynamic offers itself as a reasonable solution. Similarly, in D3D10+ you simply must use dynamic buffers (or default buffers with UpdateSubresource) for some scenarios. Issuing what looks like a blanket ban on dynamic buffers "just because" seems to me to be denying the existence of those scenarios.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

What I do is create a number of dynamic VBOs equal to the swap frame count. So if I'm double buffered, I create 2: in frame A I write to buffer 0, in frame B I write to buffer 1; rinse, wash, repeat. Hopefully the driver won't stall as much, since I'm not reusing the same VBO on the next draw.
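The per-frame rotation just described amounts to indexing a small array by frame number. A minimal sketch; PerFrameBuffers is a hypothetical name, and BufferHandle stands in for whatever buffer object your API returns. Whether this beats a plain DISCARD lock, which already lets the driver hand back fresh memory, depends on the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: one dynamic buffer per frame in flight, used in turn.
template <typename BufferHandle>
class PerFrameBuffers {
public:
    explicit PerFrameBuffers(std::vector<BufferHandle> buffers)
        : buffers_(std::move(buffers)) {
        assert(!buffers_.empty());
    }

    // Buffer to write during a given frame: frame N uses buffers_[N % count],
    // so with 2 buffers, frames alternate between buffer 0 and buffer 1.
    BufferHandle& forFrame(std::size_t frameIndex) {
        return buffers_[frameIndex % buffers_.size()];
    }

private:
    std::vector<BufferHandle> buffers_;
};
```

With two buffers (double buffering), frames 0, 1, 2, 3 write to buffers 0, 1, 0, 1.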


What would you recommend for, say, text rendering, where the text being rendered may change every frame? Or a dynamic particle system where the number of particles being drawn may change every frame?

For dynamic particles, use GS/SO/DrawAuto().

For text, instancing will do (even if you don't want to use GS).


Issuing what looks like a blanket ban on dynamic buffers "just because" seems to me to be denying the existence of those scenarios

I never said not to use them. I said they are more costly than static buffers, so use them with caution, and know that you have alternatives.

For dynamic particles, use GS/SO/DrawAuto().

From the code in the first post, I would guess that noodleBowl is using DX9, so that won't help him too much.

For text, instancing will do (even if you don't want to use GS).

Don't you still need dynamic buffers when you do instancing? I mean for the per-instance buffer. And as each letter requires just 4 vertices, I don't think instancing would help here. You would be filling the per-instance buffer with as much data as you would the main buffer, wouldn't you?


What would you recommend for, say, text rendering, where the text being rendered may change every frame? Or a dynamic particle system where the number of particles being drawn may change every frame?

For dynamic particles, use GS/SO/DrawAuto().

For text, instancing will do (even if you don't want to use GS).

And how does any of that handle the fact that the text and/or the number of particles may change each frame? Not forgetting the other points raised above?

The discard/no-overwrite pattern is well-known and has been advised for as long as dynamic buffers have existed in D3D. Microsoft recommend it, the major GPU vendors recommend it and have even published papers discussing it; this is the one pattern where so much has been written hinting "use this, it's the fast path", that claims against it which only emerge now must be viewed with suspicion.

If we were talking OpenGL and glMapBuffer (not glMapBufferRange) then yes, warnings against it are appropriate, but the D3D buffer locking mechanism has never had those problems when used properly.



You would be filling the per-instance buffer with as much data as you would the main buffer, wouldn't you?

No, even in DX9 you can save 50% of the bandwidth.
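To see where a saving like that can come from, compare the data rewritten per glyph each frame. Assuming hypothetical layouts (a 16-byte vertex of 2D position plus UV, versus a 32-byte per-instance record holding a screen rectangle and a texture rectangle), expanding each glyph to 4 vertices costs twice as much as writing one instance record. The index buffer and the static per-corner vertex stream used by instancing are left out, since neither has to change per frame; in D3D9 the instanced path would be set up with IDirect3DDevice9::SetStreamSourceFreq. The actual saving depends on the formats you choose.

```cpp
#include <cstddef>

// Hypothetical per-vertex layout for non-instanced text: 2D position + UV.
struct GlyphVertex {
    float x, y, u, v;
};

// Hypothetical per-instance layout: screen rectangle + font-texture rectangle.
struct GlyphInstance {
    float x, y, w, h;      // where the glyph goes on screen
    float u0, v0, u1, v1;  // which part of the font texture to sample
};

// Bytes rewritten per glyph per frame in each approach.
constexpr std::size_t bytesPerGlyphExpanded  = 4 * sizeof(GlyphVertex);  // 4 corners
constexpr std::size_t bytesPerGlyphInstanced = sizeof(GlyphInstance);    // one record
```

On typical platforms these come to 64 and 32 bytes respectively, which is where a 50% figure can come from under these assumptions.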


claims against it which only emerge now must be viewed with suspicion

GPUs evolve. APIs evolve. And so techniques evolve.

And none of this discussion makes any sense, because I never said not to use dynamic buffers...
