One or multiple vertex buffer locks? (Performance of vertex buffer manipulation)

Started by
4 comments, last by MJP 15 years, 7 months ago
Which is faster in a single-threaded situation? (The vertex buffer is created with the write-only usage flag.)

1) Locking the whole vertex buffer and writing lots of data sequentially.
2) Locking the vertex buffer for each primitive and writing that primitive's data.

Can D3DLOCK_NOOVERWRITE improve this in some way? Is locking expensive? Is there an advantage to not locking the whole vertex buffer?

Is it possible to write into the vertex buffer from different threads (with the Lock call made in the main D3D device thread) without the D3D multithreaded flag? I think the Lock itself needs the flag, am I right?

Thx, Vertex
Making a DX call, any DX call, once per primitive simply won't work. That's too frequent.
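To make the difference in call counts concrete, here is a minimal sketch using a mock buffer. `MockVertexBuffer`, `uploadPerPrimitive` and `uploadBatched` are hypothetical names invented for this illustration; the mock only counts `Lock()` calls and does not model the driver or any stalls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical stand-in for a vertex buffer. It only counts Lock() calls;
// the real IDirect3DVertexBuffer9::Lock takes byte offsets and lock flags.
struct MockVertexBuffer {
    std::vector<Vertex> storage;
    int lockCalls = 0;

    explicit MockVertexBuffer(std::size_t vertexCount) : storage(vertexCount) {}

    Vertex* Lock(std::size_t firstVertex) {
        ++lockCalls;
        return storage.data() + firstVertex;
    }
    void Unlock() {}
};

// One Lock/Unlock pair per triangle: the pattern the reply warns against.
void uploadPerPrimitive(MockVertexBuffer& vb, const std::vector<Vertex>& tris) {
    for (std::size_t i = 0; i + 3 <= tris.size(); i += 3) {
        Vertex* dst = vb.Lock(i);
        std::memcpy(dst, &tris[i], 3 * sizeof(Vertex));
        vb.Unlock();
    }
}

// One Lock/Unlock pair for the whole batch.
void uploadBatched(MockVertexBuffer& vb, const std::vector<Vertex>& tris) {
    if (tris.empty()) return;
    Vertex* dst = vb.Lock(0);
    std::memcpy(dst, tris.data(), tris.size() * sizeof(Vertex));
    vb.Unlock();
}
```

For 100 triangles, the first function makes 100 lock calls and the second makes one, while both leave the same data in the buffer. How much that matters in a real application still has to be measured.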

For specific concerns about performance, I would highly recommend you simply try it and profile. Performance considerations are much more complex than what call you're making, and how often you're doing it, especially when we're talking about Lock calls.

Lock calls are not inherently slower than other DX calls, but the tricky bit about them is that if the buffer is currently in use, they will wait (stall) until it is available, which might take a while.

There are various optimizations you can make, including using D3DLOCK_NOOVERWRITE, but that flag specifically isn't a "Make it go faster" flag. It has a distinct impact on what a Lock call does (or doesn't do when you use the flag) and as a result you might not get the results you expect.

As for multithreading, if you plan on calling *any* D3D call that involves an interface object ID3DSomething, from a thread that isn't the one used to create your window, you need to use the multithreaded flag. Failing to do so might cause different behaviour depending on the different Video Drivers installed, and ultimately might break things without even a hint at what the problem really is.

Remember, when it comes to performance, just try it.
Sirob Yes.» - status: Work-O-Rama.
D3DLOCK_NOOVERWRITE is best used in a situation like a particle system. In this case, you'd allocate a vertex buffer of, say, 1000 vertices in your particle system class's setup code. Then when you come to update your particle system, you may only have 100 particles active. You lock the vertex buffer with the D3DLOCK_NOOVERWRITE flag to say to D3D "I will not overwrite any vertices that are in use", since none are in use at all (this is the first frame). You then fill in 100 verts, unlock and render.

Next frame, you need 500 vertices, so you lock with D3DLOCK_NOOVERWRITE from offset 100, which tells D3D you need vertices 100-599 locked, you won't touch any others, and vertices 100-599 are not being used currently. Fill, unlock, render.

Third frame, you need another 500 vertices. That'd mean locking vertices 600-1099, which is past the end of the buffer, so you lock the first 500 verts with D3DLOCK_DISCARD to say to D3D "I'm done with the existing vertices in the buffer, I want to be allowed to start filling it again". This marks all vertices as not being in use, so you can lock with nooverwrite on the next frame. Fill, unlock, render.

Fourth frame, you need 400 vertices. Because of the discard flag on the last frame, vertices 500-999 are no longer in use (internally, D3D has given you a different vertex buffer pointer when you locked with discard). You can lock vertices 500-899 with D3DLOCK_NOOVERWRITE, fill, etc.


In this situation, you have a fixed sized vertex buffer, but you don't know how much of it you'll be using ahead of time in any given frame. That's where you get the best usage from D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.
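The bookkeeping behind that frame-by-frame walkthrough can be sketched as a small cursor that decides where to lock and with which flag. This is only an illustration: `DynamicVbCursor`, `LockRange` and `LockMode` are hypothetical names, and `LockMode` merely stands in for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels standing in for D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD.
enum class LockMode { NoOverwrite, Discard };

struct LockRange {
    std::size_t firstVertex;  // where the locked range starts
    std::size_t vertexCount;  // how many vertices to lock
    LockMode mode;            // which flag to pass to Lock
};

class DynamicVbCursor {
public:
    explicit DynamicVbCursor(std::size_t capacity) : capacity_(capacity) {}

    // Picks the range to lock for `count` vertices (count must fit the buffer).
    LockRange reserve(std::size_t count) {
        assert(count <= capacity_);
        LockMode mode = LockMode::NoOverwrite;
        if (next_ + count > capacity_) {
            // Not enough room left: restart at the beginning and discard.
            next_ = 0;
            mode = LockMode::Discard;
        }
        const LockRange range{next_, count, mode};
        next_ += count;
        return range;
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;  // first vertex not yet handed out
};
```

With a capacity of 1000 and requests of 100, 500, 500 and 400 vertices, the cursor yields offsets 0, 100, 0 (discard) and 500. When passing the result to a real Lock call, keep in mind that IDirect3DVertexBuffer9::Lock expects its offset and size in bytes, so both would need to be multiplied by the vertex stride.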
Thx @ both!

Quote:Original post by sirob
Making a DX call, any DX call, once per primitive simply won't work. That's too frequent.

For specific concerns about performance, I would highly recommend you simply try it and profile. Performance considerations are much more complex than what call you're making, and how often you're doing it, especially when we're talking about Lock calls.

Lock calls are not inherently slower than other DX calls, but the tricky bit about them is that if the buffer is currently in use, they will wait (stall) until it is available, which might take a while.
Who can lock the buffer besides me? The graphics hardware/driver? When I only create and fill the VB (before rendering) and do not manipulate it during rendering, would it be better to lock it once or several times? (Could it be that the VB gets locked?)
Quote:Original post by sirob
There are various optimizations you can make, including using D3DLOCK_NOOVERWRITE, but that flag specifically isn't a "Make it go faster" flag. It has a distinct impact on what a Lock call does (or doesn't do when you use the flag) and as a result you might not get the results you expect.
Thx to Evil Steve for the detailed explanation.
Quote:Original post by sirob
As for multithreading, if you plan on calling *any* D3D call that involves an interface object ID3DSomething, from a thread that isn't the one used to create your window, you need to use the multithreaded flag. Failing to do so might cause different behaviour depending on the different Video Drivers installed, and ultimately might break things without even a hint at what the problem really is.
Wouldn't it be possible to lock a known range of the VB, or the whole VB, in the main thread (the one that created the D3D device) and afterwards let several threads simultaneously write/memcpy into the VB? Although I think that bandwidth should be the bottleneck here.

Quote:Original post by Vertex333
Who can lock the buffer besides me? The graphics hardware/driver? When I only create and fill the VB (before rendering) and do not manipulate it during rendering, would it be better to lock it once or several times? (Could it be that the VB gets locked?)
Yep. If the driver is submitting the data to the card (for managed buffers you lock without the WRITEONLY flag), or if the driver is rendering from that buffer (for default pool buffers), then you can't have it because it's in use. If that happens, the CPU will wait for the driver / GPU to finish with the buffer before Lock() succeeds. This all happens behind the scenes, so you'll just see Lock() taking 20ms instead of 1ms or whatever when there's a stall; the function will block and not return until the buffer is available.
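Since a stall only shows up as a Lock() call that takes longer than usual, one crude way to spot it is to time the call. The sketch below assumes nothing about D3D itself: `timeMs` is a hypothetical helper, and `fakeStallingLock` is a made-up stand-in that sleeps to imitate a blocked Lock().

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Runs any callable and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical stand-in for a Lock() that blocks while the buffer is in use.
inline void fakeStallingLock(int stallMs) {
    std::this_thread::sleep_for(std::chrono::milliseconds(stallMs));
}
```

In a real application, the lambda passed to `timeMs` would wrap the actual Lock() call, and the measured times could be logged per frame to see which locks spike.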

Quote:Original post by Vertex333
Wouldn't it be possible to lock a known range of the VB, or the whole VB, in the main thread (the one that created the D3D device) and afterwards let several threads simultaneously write/memcpy into the VB? Although I think that bandwidth should be the bottleneck here.
It'd be preferable to write into a normal memory buffer from the thread, and then lock, memcpy and unlock the VB in the main thread. You want to keep the VB locked for as short a time as possible so the driver can use it, and the overhead of thread switching and suchlike would make that take longer than just memcpy()ing in the main thread.
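A minimal sketch of that suggestion, assuming worker threads that each fill a disjoint slice of an ordinary staging array, followed by a single copy on the main thread. `buildStaging` and `copyToLocked` are hypothetical helpers, the vertex contents are dummy values, and the `locked` pointer stands in for whatever Lock() would return.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

struct Vertex { float x, y, z; };

// Worker threads fill disjoint slices of a normal CPU-side buffer.
// No D3D object is touched here, so no D3D calls happen off the main thread.
std::vector<Vertex> buildStaging(std::size_t vertexCount, unsigned workerCount) {
    assert(workerCount > 0);
    std::vector<Vertex> staging(vertexCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (vertexCount + workerCount - 1) / workerCount;
    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(vertexCount, w * chunk);
        const std::size_t end = std::min(vertexCount, begin + chunk);
        workers.emplace_back([&staging, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                const float f = static_cast<float>(i);
                staging[i] = Vertex{f, f * 2.0f, 0.0f};  // dummy vertex data
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return staging;
}

// Main thread only: `locked` stands in for the pointer returned by Lock().
// The buffer stays locked just for the duration of this one memcpy.
void copyToLocked(void* locked, const std::vector<Vertex>& staging) {
    std::memcpy(locked, staging.data(), staging.size() * sizeof(Vertex));
}
```

The main thread would call `buildStaging`, then lock the vertex buffer, call `copyToLocked` with the locked pointer, and unlock right away.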
Just to add: if you're doing locking and you want to profile to check for performance implications, use PIX to capture a sequence of frames. If any locks are causing stalls in the pipeline, that portion of the frame will show up as red on the graph.
