vertex buffer locking

Is there any problem with locking a vertex buffer for an extended period of time (say, less than a frame)? I need to throw a dynamic vertex buffer into my rendering class, and I have three options for adding vertices:

1) Add vertices to a vector, then copy them to the vertex buffer just before rendering. (The simple choice.)
2) Lock the vertex buffer each time I need to add a vertex. (Even if I only lock small portions of the VB, this seems like the obvious bad choice.)
3) Lock the vertex buffer at the start of adding vertices, and unlock it just before rendering.

It seems like option (3) would be the fastest unless there is some sort of caching or video card problem. Suggestions?
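For concreteness, here's a rough sketch of what I mean by option 1 (D3D9-style; the Vertex layout, buffer size checks, and class names are just made up for illustration):

// Rough sketch of option 1: vertices accumulate in a std::vector with no lock
// held; just before rendering, one short Lock/memcpy/Unlock fills the dynamic
// vertex buffer. Vertex layout and names are hypothetical.
#include <d3d9.h>
#include <vector>
#include <cstring>

struct Vertex { float x, y, z; DWORD color; };
static const DWORD VERTEX_FVF = D3DFVF_XYZ | D3DFVF_DIFFUSE;

class DynamicBatch
{
public:
    DynamicBatch() : m_vb(NULL), m_maxVertices(0) {}

    bool Create(IDirect3DDevice9* device, UINT maxVertices)
    {
        m_maxVertices = maxVertices;
        return SUCCEEDED(device->CreateVertexBuffer(
            maxVertices * sizeof(Vertex),
            D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
            VERTEX_FVF, D3DPOOL_DEFAULT, &m_vb, NULL));
    }

    // Objects just append to the CPU-side vector; no lock is held here.
    void AddVertex(const Vertex& v) { m_vertices.push_back(v); }

    // Called once, just before rendering: one bulk copy into the VB.
    void Flush(IDirect3DDevice9* device)
    {
        if (m_vertices.empty() || m_vertices.size() > m_maxVertices)
            return;

        void* data = NULL;
        if (SUCCEEDED(m_vb->Lock(0, 0, &data, D3DLOCK_DISCARD)))
        {
            std::memcpy(data, &m_vertices[0], m_vertices.size() * sizeof(Vertex));
            m_vb->Unlock();

            device->SetFVF(VERTEX_FVF);
            device->SetStreamSource(0, m_vb, 0, sizeof(Vertex));
            device->DrawPrimitive(D3DPT_TRIANGLELIST, 0,
                                  (UINT)m_vertices.size() / 3);
        }
        m_vertices.clear();
    }

private:
    IDirect3DVertexBuffer9* m_vb;
    UINT                    m_maxVertices;
    std::vector<Vertex>     m_vertices;
};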
------- Harmotion - Free 1v1 top-down shooter! | Double Jump Studios | Blog
I also think option 3 is preferable, since you lock the VB only once (per frame), dynamically fill it with vertices, and unlock it once before rendering.
Copying a vector into the vertex buffer every frame, especially with a high number of vertices, will be slow and reduce your frame rate, and it still requires locking and unlocking the VB. (Think about it: to copy data into a VB you need access to it, which means locking it.)
Option number 2 is out of the question if you ask me ;)

There is nothing that can't be solved. Just people that can't solve it. :)
I'd actually hazard a guess that #1 is the better option.

Holding a lock for an extended period of time (okay, a frame is 5-20ms, but relatively speaking that's a long time) means that the GPU can't use that resource (or anything indirectly related to it) for the duration that the lock is held. The best case is that it just stalls that particular resource; the worst case is that it somehow creates a dependency that stalls other parts.

Given that GPUs tend to run and render ahead of what is currently being displayed, you could possibly lock the buffer for a future frame and consequently stall it from completing a previous frame that is already deep in the pipeline.

Bottom line is that the two strategies should be fairly trivial to implement in code, so go do some stress-testing and benchmarking [grin]

I would imagine the trade-off is going to be whether the extended lock stalls the pipeline versus whether uploading a big chunk of data in a single copy operation is slow due to bandwidth overheads...
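As an aside, whichever option wins, the usual way to keep the lock itself from stalling the pipeline is the DISCARD/NOOVERWRITE pattern on a dynamic buffer. Roughly something like this (a sketch only; the Vertex struct and helper name are hypothetical):

// Sketch of the standard no-stall lock pattern for a dynamic vertex buffer:
// append with D3DLOCK_NOOVERWRITE until the buffer runs out, then wrap around
// with D3DLOCK_DISCARD so the driver can hand back a fresh block of memory
// instead of waiting for the GPU to finish reading the old contents.
#include <d3d9.h>

struct Vertex { float x, y, z; DWORD color; };   // hypothetical layout

static UINT g_writePos = 0;   // current write offset, in vertices

Vertex* LockForAppend(IDirect3DVertexBuffer9* vb, UINT vertexCount, UINT maxVertices)
{
    DWORD flags = D3DLOCK_NOOVERWRITE;
    if (g_writePos + vertexCount > maxVertices)
    {
        g_writePos = 0;            // buffer exhausted: discard and start over
        flags = D3DLOCK_DISCARD;
    }

    void* data = NULL;
    if (FAILED(vb->Lock(g_writePos * sizeof(Vertex),
                        vertexCount * sizeof(Vertex),
                        &data, flags)))
        return NULL;

    g_writePos += vertexCount;
    return static_cast<Vertex*>(data);
}

The unwritten contract is that with NOOVERWRITE you promise not to touch anything the GPU might still be reading from earlier in the frame, which is why the write offset only ever moves forward until the next DISCARD.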

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Well, option (3) isn't going to work, now that I've started implementing it. The whole reason I wanted to do it this way is that it is indirect: objects simply add themselves to the render class, and the render class does not draw them until they have been sorted and optimized. Thus, for option (3), once the dynamic VB is full, it must be rendered before more vertices can be added.

So I decided on option #1, and to keep bus traffic down, I'm going to throw in a dynamic index buffer as well to limit the number of vertices sent.

I hope the memcpy from the vector to the vertex buffer and index buffer isn't too slow; I don't think it will be a bottleneck. And with this method I can move vertex generation to a separate thread.
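For the index-buffer side I'm picturing something like this (just a sketch; the 16-bit index format and the names are made up):

// Sketch of the matching index-buffer copy: indices accumulate in a vector and
// are copied in one Lock/memcpy/Unlock right before the indexed draw call.
// Assumes the IB was created with D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY and
// D3DFMT_INDEX16.
#include <d3d9.h>
#include <vector>
#include <cstring>

void FlushIndices(IDirect3DIndexBuffer9* ib, const std::vector<WORD>& indices)
{
    if (indices.empty())
        return;

    void* data = NULL;
    if (SUCCEEDED(ib->Lock(0, 0, &data, D3DLOCK_DISCARD)))
    {
        std::memcpy(data, &indices[0], indices.size() * sizeof(WORD));
        ib->Unlock();
    }
}

After that it's just SetIndices() and DrawIndexedPrimitive() as usual.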

Thanks for the suggestions.
------- Harmotion - Free 1v1 top-down shooter! | Double Jump Studios | Blog
Your best bet would be to generalise things as much as possible, try out all three options, and profile.
I'd say that 1 and 3 are going to be very close in performance, with 1 maybe taking the edge since apparently locking VBs for a long time is A Bad Thing.
2 is going to be slow as hell. The last time I tried that, D3D had a hissy fit at me, and started spewing debug messages saying something like "Vertex buffer locked more than once per frame: severe performance warning". Basically, don't do it - it'll confuse the hell out of D3D / the driver, and probably cause it to try and upload resources several times in one frame.
In my experience copying an array into a vertex buffer is significantly faster than writing small chunks of memory to it. (Even when the lock is short.)

My guess for the reason is that locking can give you a chunk of AGP memory to write to, and that memory is typically write-combined (uncached) rather than write-back like normal system memory, so lots of small writes end up much slower than one big sequential copy.
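In practical terms (an illustration only, reusing the hypothetical Vertex type and names from the sketches above): treat the locked pointer as write-only and fill it in one sequential pass.

// Illustration only: the locked pointer usually refers to write-combined
// memory, so write it sequentially and never read it back.
void FillVertexBuffer(IDirect3DVertexBuffer9* vb, const std::vector<Vertex>& cpuVertices)
{
    void* data = NULL;
    if (SUCCEEDED(vb->Lock(0, 0, &data, D3DLOCK_DISCARD)))
    {
        // Fast: one bulk sequential copy out of normal (cached) system memory.
        std::memcpy(data, &cpuVertices[0], cpuVertices.size() * sizeof(Vertex));

        // Slow: anything that reads the locked memory back, e.g.
        //   Vertex* v = static_cast<Vertex*>(data);
        //   v[i].x += 1.0f;   // read-modify-write of uncached memory
        vb->Unlock();
    }
}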

