Vertexbuffer - huge lag.

24 comments, last by gchewood 9 years, 9 months ago

I'm using XNA but I assume the problem is analogous in DX9?

So I'm having a huge problem rendering a model that's created dynamically at runtime. It renders fine but it's creating a huge lag.

Here's the issue illustrated through a comparison:

1) Model with 10,000 vertices created in 3ds max and rendered with shader X in XNA ---> 200fps (after everything else has happened in the game)

2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps

This is a completely unacceptable drop and I assume I'm doing something wrong. Something like Minecraft would be impossible to run if this was a necessary drop. I can post my code if necessary but I'm just using the same approach used by the 3D particles sample. I've profiled the project with NProf and all of the time is being spent in GraphicsDevice.Present().

Help?


Are you setting things up to get debug warnings? When I use a debug device in DX11 (I don't remember how that works in DX9) and I do something not completely legit, my framerate plummets from all the reporting of warnings every frame.

No, I'm not setting up any debug warnings. Like I say, I'm just using the same code found in the 3D particles sample from Microsoft so there's really no room for me to be doing anything wrong in that sense. It must be more of a conceptual problem that I'm fundamentally missing.

And I've also tried using the 'NoOverwrite' option as described in Shawn Hargreaves' blog. No change.

2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps


Are you recreating/reuploading the model each frame? Uploading a model takes time, so uploading once will necessarily be faster than uploading it repeatedly.

Did you create the vertex buffer with similar properties for both models? The properties of a vertex buffer (e.g. dynamic vs non-dynamic) can affect speed.

Are you using the same vertex attributes between both models? The same shaders? Different shaders have different performance characteristics.

Like I say, I'm just using the same code found in the 3D particles sample from microsoft so there's really no room for me to be doing anything wrong in that sense.


That's so cute that you think Microsoft's code samples are necessarily the best way to do things. :P

Sean Middleditch – Game Systems Engineer – Join my team!

Thanks for the reply Sean, even if you called me cute (maybe I should change my avatar).

No, the SetData function is only called when necessary. Not every frame.
No, I didn't create the vertex buffer similarly for both models. I just use the inbuilt approach for rendering a model in the pre-fabricated example. I wasn't making the comparison to say they should be identical in the frame rate. If the run-time version was at 180fps or something, I'd just assume that was a necessary price to pay. But 100fps seems criminal. Otherwise, the vertex types and shaders are the same.

I don't think Microsoft's code samples are necessarily the best. But the fact that I've used that particular 3D particle sample a lot, and have found it performs very nicely, seems to be a good sign.

It's normal enough to see all (or most) of your time being spent in Present: have a read of this: http://tomsdxfaq.blogspot.com/

As for causes of your performance drop, the first thing to do is check the vertex buffer creation and locking flags. For a dynamic buffer used in this manner, you should be creating with D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, and locking with D3DLOCK_DISCARD. You most definitely should not be calling CreateVertexBuffer each frame; create it once and reuse it (with a discard lock) each time you need to update. Also be careful that you don't attempt to read from the buffer while you have it locked.

Assuming that these are all correct, you'll need to talk a little about how you're creating the new vertex data (CPU-side) to load into the buffer. My own guess is that you're possibly doing CPU-side skinning or frame interpolation. If the former, there's a high probability that the slowdown is not from your usage of a dynamic buffer, but more simply because CPU-side skinning is slow. If the latter, you can quite easily switch frame interpolation to run on the GPU and thereby keep your vertex data entirely static. Either way, there is a good chance that your performance issue is coming from extra CPU-side work associated with using dynamic data, and thinking a little about how you can make this data (or as much of it as possible) static can reap huge rewards.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Cool, I'll have a read through that page.

Yep, those flags are all set (or their equivalents in XNA) and I've tried with the Discard option.

I'm not calling CreateVertexBuffer every frame, just once at the beginning. I'm not entirely clear on the part where you say "careful that you don't attempt to read from the buffer while you have it locked". What do you mean by 'read' in this context? As I say, the model is dynamically being added to, but not each frame.

The vertex creation CPU-side is quite simple. There are about 4-6 base models that I'm combining to make a larger structure. Those base models are loaded into vertex and index arrays at the beginning. Based on the user input, these base arrays are then combined into the larger array, which is set to the vertex/index buffers when it's changed.
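For readers following along, the combining step described above can be sketched roughly as below. This is only an illustration, not the poster's code: the `Vertex` layout, the `Mesh` struct, the function names and the use of 16-bit indices are all assumptions. The one detail worth showing is that each base model's indices have to be offset by the number of vertices already in the combined array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the thread does not say which one is used.
struct Vertex {
    float px, py, pz;  // position
    float nx, ny, nz;  // normal
};

// Hypothetical CPU-side mesh: the "base arrays" loaded once at startup.
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;  // assumes 16-bit indices
};

// Appends `part` to `combined`. Indices are rebased by the number of
// vertices already present so they keep pointing at the right vertices.
void appendMesh(Mesh& combined, const Mesh& part) {
    const std::size_t base = combined.vertices.size();
    // 16-bit indices can address at most 65536 vertices.
    assert(base + part.vertices.size() <= 65536);

    combined.vertices.insert(combined.vertices.end(),
                             part.vertices.begin(), part.vertices.end());
    for (std::uint16_t i : part.indices) {
        combined.indices.push_back(static_cast<std::uint16_t>(base + i));
    }
}

// Builds the combined mesh, reserving once so the vectors do not
// reallocate repeatedly while the parts are appended.
Mesh combine(const std::vector<const Mesh*>& parts) {
    std::size_t vertexCount = 0;
    std::size_t indexCount = 0;
    for (const Mesh* p : parts) {
        vertexCount += p->vertices.size();
        indexCount += p->indices.size();
    }

    Mesh out;
    out.vertices.reserve(vertexCount);
    out.indices.reserve(indexCount);
    for (const Mesh* p : parts) {
        appendMesh(out, *p);
    }
    return out;
}
```

Combining the same triangle twice, for example, gives six vertices, and the second copy's indices become 3, 4, 5 rather than 0, 1, 2.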

From what you've mentioned, it seems like you are not doing anything out of the ordinary. Do you have the actual draw timings for comparison? FPS is not a reliable performance metric, especially when developing.

What do you mean by the draw timings? As in, the number of ms taken in the Present() function?

Why will that give any more info than the FPS I've already mentioned? Everything else about the solution is identical.
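One reason per-frame timings are easier to reason about than FPS: frame rate is the reciprocal of frame time, so equal-looking FPS drops can hide very different costs. A minimal sketch of the arithmetic, using the numbers quoted in this thread (the function names are made up for illustration):

```cpp
#include <cassert>

// Converts a frame rate to the time spent on one frame, in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the frame rate falls from
// `fpsBefore` to `fpsAfter`.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

With these, `addedCostMs(200.0, 100.0)` is 5 ms per frame, while the "acceptable" drop to 180fps mentioned earlier would be roughly 0.56 ms. Timing the update and draw calls directly would show where those 5 ms go, which a single FPS number cannot.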

these base arrays are then combined into the larger array which is set to the vertex/index buffers when it's changed.

Can you give more detail on how you're doing this part? Are you, for example, combining them into a std::vector (or whatever the equivalent container object is in your C# code), then copying them to a locked vertex buffer? You'll get better performance if you lock the buffer first, then write directly to the locked memory. For one, you'll avoid an extra memory copy; for another, you'll avoid a lot of runtime allocation and garbage collection.
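A rough sketch of the "write straight into the locked memory" idea, under some loud assumptions: the `Vertex` layout and `writeVertices` are hypothetical, and `dst` is just a stand-in for the pointer a buffer lock would hand back (in D3D9 that would come from IDirect3DVertexBuffer9::Lock; here it is ordinary memory so the example runs anywhere). As far as I know XNA exposes SetData on arrays rather than a raw lock, so this maps more directly onto native D3D9 than onto C#.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical vertex layout, for illustration only.
struct Vertex {
    float x, y, z;
};

// Copies each base model straight into `dst`, which stands in for the
// pointer returned by a buffer lock. `capacity` is the number of vertices
// the destination can hold. Returns the number of vertices written.
// The destination is only ever written to, never read back.
std::size_t writeVertices(void* dst, std::size_t capacity,
                          const std::vector<const std::vector<Vertex>*>& parts) {
    auto* out = static_cast<unsigned char*>(dst);
    std::size_t written = 0;
    for (const std::vector<Vertex>* part : parts) {
        if (part->empty()) {
            continue;  // nothing to copy for this part
        }
        assert(written + part->size() <= capacity);
        std::memcpy(out + written * sizeof(Vertex),
                    part->data(),
                    part->size() * sizeof(Vertex));
        written += part->size();
    }
    return written;
}
```

The point of the shape is that no intermediate combined container is built: each base array is copied once, directly to its final position, so there is no second copy and no per-update allocation.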


This topic is closed to new replies.
