# Vertexbuffer - huge lag.

I'm using XNA but I assume the problem is analogous in DX9?

So I'm having a huge problem rendering a model that's created dynamically at runtime. It renders fine but it's creating a huge lag.

Here's the issue illustrated through a comparison:

1) Model with 10,000 vertices created in 3ds max and rendered with shader X in XNA ---> 200fps (after everything else has happened in the game)

2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps

This is a completely unacceptable drop and I assume I'm doing something wrong. Something like Minecraft would be impossible to run if this was a necessary drop. I can post my code if necessary, but I'm just using the same approach used by the 3D particles sample. I've profiled the project with NProf and all of the time is being spent in GraphicsDevice.Present().

Help?

Are you setting things up to get debug warnings? When I use a debug device in DX11 (I don't remember how that works in DX9) and I do something not completely legit, my framerate plummets from all the reporting of warnings every frame.

No, I'm not setting up any debug warnings. Like I say, I'm just using the same code found in the 3D particles sample from Microsoft, so there's really no room for me to be doing anything wrong in that sense. It must be more of a conceptual problem that I'm fundamentally missing.

And I've also tried using the 'NoOverwrite' option as described in Shawn Hargreaves' blog. No change.

> 2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps

Did you create the vertex buffer with similar properties for both models? The properties of a vertex buffer (e.g. dynamic vs non-dynamic) can affect speed.

Are you using the same vertex attributes between both models? The same shaders? Different shaders have different performance characteristics.

> Like I say, I'm just using the same code found in the 3D particles sample from microsoft so there's really no room for me to be doing anything wrong in that sense.

That's so cute that you think Microsoft's code samples are necessarily the best way to do things.

Thanks for the reply Shaun, even if you called me cute (maybe I should change my avatar).

No, the SetData function is only called when necessary. Not every frame.

No, I didn't create the vertex buffer similarly for both models. I just use the inbuilt approach for rendering a model in the pre-fabricated example. I wasn't making the comparison to say they should be identical in the frame rate. If the run-time version was at 180fps or something, I'd just assume that was a necessary price to pay. But 100fps seems criminal. Otherwise, the vertex types and shaders are the same.

I don't think Microsoft's code samples are necessarily the best. But the fact that I've used that particular 3D particle sample a lot, and have found it performs very nicely, seems to be a good sign.

Cool, I'll have a read through that page.

Yep, those flags are all set (or their equivalents in XNA) and I've tried with the Discard option.

I'm not calling CreateVertexBuffer every frame, just once at the beginning. I'm not entirely clear on the part where you say "careful that you don't attempt to read from the buffer while you have it locked". What do you mean by 'read' in this context? As I say, the model is dynamically being added to. But not each frame.

The CPU-side vertex creation is quite simple. There are about 4-6 base models that I'm combining to make a larger structure. Those base models are loaded into vertex and index arrays at the beginning. Based on the user input, these base arrays are then combined into the larger array, which is set to the vertex/index buffers whenever it changes.
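A minimal sketch of what such a combine step might look like, assuming the pieces are simply translated into place. `Vertex` and `MeshCombiner` are hypothetical names, not part of XNA or of the poster's code; `Vertex` stands in for a type such as VertexPositionNormalTexture. The two points it illustrates are that the scratch arrays are allocated once at the maximum size, and that each piece's indices must be shifted by the number of vertices already written.

```csharp
using System;

// Hypothetical stand-in for a vertex type such as VertexPositionNormalTexture.
public struct Vertex
{
    public float X, Y, Z;
    public Vertex(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical helper: combines base pieces into preallocated arrays.
public sealed class MeshCombiner
{
    // Scratch arrays allocated once, so rebuilding the mesh creates no garbage.
    private readonly Vertex[] vertices;
    private readonly int[] indices;

    public int VertexCount { get; private set; }
    public int IndexCount { get; private set; }
    public Vertex[] Vertices { get { return vertices; } }
    public int[] Indices { get { return indices; } }

    public MeshCombiner(int maxVertices, int maxIndices)
    {
        vertices = new Vertex[maxVertices];
        indices = new int[maxIndices];
    }

    // Start a fresh combined mesh without reallocating anything.
    public void Clear()
    {
        VertexCount = 0;
        IndexCount = 0;
    }

    // Appends one base piece, translated by (dx, dy, dz).
    public void Append(Vertex[] pieceVertices, int[] pieceIndices,
                       float dx, float dy, float dz)
    {
        if (VertexCount + pieceVertices.Length > vertices.Length ||
            IndexCount + pieceIndices.Length > indices.Length)
        {
            throw new InvalidOperationException(
                "Combined mesh exceeds the preallocated capacity.");
        }

        // Remember where this piece's vertices start in the combined array.
        int baseVertex = VertexCount;

        for (int i = 0; i < pieceVertices.Length; i++)
        {
            Vertex v = pieceVertices[i];
            vertices[VertexCount++] = new Vertex(v.X + dx, v.Y + dy, v.Z + dz);
        }

        // Piece indices are relative to the piece, so shift them by baseVertex.
        for (int i = 0; i < pieceIndices.Length; i++)
        {
            indices[IndexCount++] = pieceIndices[i] + baseVertex;
        }
    }
}
```

After a rebuild, only the used part of the arrays would need uploading, e.g. `vb.SetData(combiner.Vertices, 0, combiner.VertexCount, SetDataOptions.Discard)` with the real vertex type in place of `Vertex`.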

From what you've mentioned, it seems like you're not doing anything out of the ordinary. Do you have the actual draw timings for comparison? FPS isn't a reliable performance metric, especially when developing.

What do you mean by the draw timings? As in, the number of ms taken in the Present() function?
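For what it's worth, one way to get per-section timings in milliseconds is to wrap the code of interest in a `System.Diagnostics.Stopwatch`. The sketch below is only an illustration: `SectionTimer` is a hypothetical helper, not an XNA class. It also only measures CPU-side time. Because the GPU runs asynchronously, GPU-bound cost tends to show up as time blocked in Present() (as the NProf result above suggests), so a tool like PIX is still needed to see what the GPU itself is doing.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: averages the CPU-side time spent in a section of code.
public sealed class SectionTimer
{
    private readonly Stopwatch stopwatch = new Stopwatch();
    private double totalMs;
    private int samples;

    // Call immediately before the code being measured.
    public void Begin()
    {
        stopwatch.Restart();
    }

    // Call immediately after the code being measured.
    public void End()
    {
        stopwatch.Stop();
        totalMs += stopwatch.Elapsed.TotalMilliseconds;
        samples++;
    }

    public int Samples { get { return samples; } }

    // Average time per measured section, in milliseconds.
    public double AverageMs
    {
        get { return samples == 0 ? 0.0 : totalMs / samples; }
    }

    // Converts a frame rate into the equivalent time per frame.
    public static double FpsToMs(double fps)
    {
        return 1000.0 / fps;
    }
}
```

In a game loop this would mean calling `Begin()` and `End()` around the draw code each frame and logging `AverageMs` every few hundred frames.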

> these base arrays are then combined into the larger array which is set to the vertex/index buffers when it's changed.

Can you give more detail on how you're doing this part? Are you, for example, combining them into a std::vector (or whatever the equivalent container is in your C# code) and then copying them to a locked vertex buffer? You'll get better performance if you lock the buffer first and then write directly to the locked memory: for one, you'll avoid an extra memory copy, and you'll also avoid a lot of runtime allocation and garbage collection.

Yeah, I'm combining them into an array and then copying that to the vertex buffer. But that's not every frame, it's just when the model is occasionally updated. So I assumed that couldn't be the cause of the frame rate issue?

As for your suggestion, can you explain that in a little more depth? It sounds like you just said the same thing twice. How is copying them to a locked vertex buffer different from locking the buffer and then writing directly to the locked memory?

> 1) Model with 10,000 vertices created in 3ds max and rendered with shader X in XNA ---> 200fps (after everything else has happened in the game)
>
> 2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps

I wouldn't call a difference of 5 ms a huge lag when you've changed from a static to a dynamic vertex buffer and the amount of data you send per frame is 10,000 vertices. Not to mention the GPU now has to sync more often (i.e. the GPU waits for the CPU's data to arrive, or the CPU waits for the GPU to finish), which means that if you do more work, the framerate won't drop (because one of your components was sitting idle, waiting).

If your game had originally been running at 60 fps (no vsync), it would have dropped to about 46 fps. Measure your timings in milliseconds.
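To spell out the conversion behind these numbers, here is the arithmetic as a worked example. It uses only the figures quoted in this thread; small differences in the last digit come from rounding.

```latex
\begin{align*}
t_{\text{frame}} &= \frac{1000\ \text{ms}}{\text{fps}} \\
200\ \text{fps} &\rightarrow 5\ \text{ms}, \qquad
100\ \text{fps} \rightarrow 10\ \text{ms}, \qquad
\Delta t = 10 - 5 = 5\ \text{ms} \\
60\ \text{fps} &\rightarrow 16.67\ \text{ms}, \qquad
16.67 + 5 = 21.67\ \text{ms} \rightarrow \frac{1000}{21.67} \approx 46.1\ \text{fps}
\end{align*}
```

The same 5 ms costs 100 fps when starting from 200 fps but only about 14 fps when starting from 60 fps, which is why frame time is the more useful number to compare.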

As long as you're not recreating the vertex buffer every frame, you're creating it with D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, and you're locking with D3DLOCK_DISCARD, there's not much you can do.

Reducing the size of the vertex should help.

Thanks for the response Matias. I'm quite shocked that you say that's not a big drop actually.

Further evidence to support my general suspicion that I'm fundamentally doing something wrong:

I just re-ran the game and drew each of the individual pieces (the base pieces I mentioned previously) separately with the appropriate locations and the same shader.

As I'd mentioned before, if I drew everything as a single model, I got 200fps. With the vertex buffer I was getting 100fps. With this new test I just ran (which I would think should be the most inefficient by far), I got about 180fps.

So yeah, I'm not buying that it's a necessary cost at all, there must be something else to it!

> But that's not every frame, it's just when the model is occasionally updated. So I assumed that couldn't be the cause of the frame rate issue?

It should be very easy to test this. Just update your dynamic vertex buffer once. What's the framerate then? (Put a breakpoint on the code that updates it to be absolutely sure it's not being called accidentally).

If it really is a 5ms drop even without updating the vertex buffer, then yes, you're doing something wrong somewhere. That wouldn't be expected. (edit: but as L. Spiro mentioned, there's so much we don't know about how you're drawing things).

@phi_t

Ok, yep tried just updating the vertex buffer in 1 big jump. Same outcome. Around 100fps.

@L.Spiro

Yes, I realise fps isn't a linear measurement. But I assumed everyone knows that and therefore there would be no issue? Or am I missing something other than the semantic preference?

By occasional, I mean it's not a set time interval. It's based on the user input, like in something like Minecraft. Yet the framerate remains entirely proportional to the vertex count, whether the buffer is being constantly updated or left alone for a few minutes.

Yes, I'm using an index buffer. And the code for the vertex buffer is

vb = new DynamicVertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertices.Length*100, BufferUsage.WriteOnly);


By 'single model', I mean it was loaded as the standard XNA model and only uses a single material. That means just 1 draw call right? Yes, I realise that uses a vertex buffer too, I meant 'vertex buffer' for the explicit one that I created, as opposed to the one generated automatically by an XNA model.

And I'm pretty sure the number of state changes was identical in both situations. Why do you say I added multiple draw calls and state changes?
As for you not having much information, obviously the issue is that it's currently part of a fairly complicated program. Distilling the part that's problematic so I could show you the code takes time, so I was hoping it would be resolved easily without requiring that. But now that it hasn't, maybe I should just upload some code?

What happens if the only thing you do is draw the stuff in your DynamicVertexBuffer? Does it still take 5ms?

Does your model use textures? Are the textures identical in both cases (mipmaps, dimensions, format, etc.)? Are you using the same sampling states? What happens if you change the shader so it doesn't sample textures? What's the difference between the two methods then?

Take a capture in PIX and compare the two scenarios. Does anything stand out w.r.t draw calls, state changes, etc?

Have you tried anything that can profile GPU performance, like Intel GPA?

If the only thing I draw is the dynamic vertex buffer, it jumps to 10ms (for the same 10,000 vertex buffer). Isn't that to be expected? There were various other things going on as well.

Yes, the models use the identical textures. They're using the same shader, which is where the sampling states are set. So the same.

If I change the shader to not use textures, the outcome is more or less the same, only slightly better performance which I guess is to be expected.

I'm completely new to using PIX so I'll certainly try that but it'll take a while to familiarise myself with it!

You have a scene S and you draw it all in X calls and Y state changes (which could both be 1) using a static vertex buffer at 5 milliseconds per frame.

You draw the same scene S with the same number of calls (X) and state changes (Y) using a dynamic vertex buffer which is not updated after being set at 10 milliseconds per frame.

The difference between a static vertex buffer and the same dynamic vertex buffer in scene S is 5 milliseconds.

Case closed.  The end.

Unless you want to show exactly what flags you are passing when creating the dynamic vertex buffer, which, regardless of how complex your project is, is just one line of code.

Or unless you try PIX and find out some other difference between the 2 render methods.

L. Spiro

Lol, well everyone else's responses don't seem to match your certainty on that?

I thought I already did show you the code used to create the vertex buffer. Again:

vb=new DynamicVertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertices.Length*100, BufferUsage.WriteOnly);


and then to set the vertices:

vb.SetData(activeVertices, 0, activeVertices.Length, SetDataOptions.Discard);


And that performs the same no matter what I set for SetDataOptions. That's what you were asking for right?

So why is a dynamic vertex buffer with just 1 draw call slower than doing like 20 or so from the individual models? That doesn't seem right?

> Lol, well everyone else's responses don't seem to match your certainty on that?
>
> So why is a dynamic vertex buffer with just 1 draw call slower than doing like 20 or so from the individual models? That doesn't seem right?

It really should be about as fast, from my experience. But with the information you have given us, the only conclusion is that it is much slower (I think that's what L. Spiro was getting at).

No one's going to be able to help you any further at this point, given the info in this thread. Either you'll have to upload a repro to some repository somewhere and hope someone is nice enough to look at it, or you'll need to do more detective work yourself.

Yeah, I'm gonna run through some tutorials with PIX and see if that can help me. I'm also gonna see if I can cut the polycount a bit and then try geometry instancing instead. If I'm really not making any big mistakes in my use of the vertex buffer, it just seems that's not the ideal solution to my problem.

It's that last comparison though that's still making me skeptical. My, admittedly uninformed, intuition doesn't seem to accept that a dynamic vertex buffer which is supposedly created for this very purpose, is slower than drawing each piece independently.

gchewood, why are you creating such a massive vertex buffer?

vb=new DynamicVertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertices.Length*100, BufferUsage.WriteOnly);


I thought you said your mesh is 10,000 vertices; doesn't that mean you're creating a 1,000,000-vertex buffer here? I wonder if this can cause the FPS drop.

Hi.

Looks like it's time to see your create mesh function.

Maybe lots of duplicated vertices.

No index buffer.

> gchewood, why are you creating such a massive vertex buffer?
>
> vb=new DynamicVertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertices.Length*100, BufferUsage.WriteOnly);
>
> I thought you said your mesh is 10,000 vertices; doesn't that mean you're creating a 1,000,000-vertex buffer here? I wonder if this can cause the FPS drop.

No, sorry. That's my mistake for not making it clear. The vertex buffer is around 20,000 vertices as the array 'vertices' at that point is of size 200. It's that size as that's around the maximum it will need to be.

> Hi.
>
> Looks like it's time to see your create mesh function.
>
> Maybe lots of duplicated vertices.
>
> No index buffer.

Hmmmm, ok. This is a potential problem. Just checked the index buffer for when I'm up to about 10,000 vertices. The index buffer is at about 200,000.

Right.

So that seems problematic.
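A quick sanity check along these lines can catch this kind of problem early. The sketch below is a hypothetical helper (`IndexSanity` is not an XNA class) and assumes a triangle list. The ratio test is only a rule of thumb: a closed triangle mesh has roughly two triangles per vertex, so about six indices per vertex, and 200,000 indices for 10,000 vertices is well beyond that.

```csharp
using System;

// Hypothetical helper for sanity-checking triangle-list index data.
public static class IndexSanity
{
    // Returns a description of the first problem found, or null if none.
    public static string Check(int[] indices, int vertexCount)
    {
        // A triangle list needs exactly three indices per triangle.
        if (indices.Length % 3 != 0)
        {
            return "Index count is not a multiple of 3.";
        }

        // Every index must refer to an existing vertex.
        for (int i = 0; i < indices.Length; i++)
        {
            if (indices[i] < 0 || indices[i] >= vertexCount)
            {
                return "Index " + i + " refers to vertex " + indices[i] +
                       ", which is out of range.";
            }
        }

        // Heuristic only: far more than ~6 indices per vertex is suspicious.
        if ((long)indices.Length > 6L * vertexCount)
        {
            return "Suspiciously many indices: " + indices.Length +
                   " for " + vertexCount + " vertices.";
        }

        return null;
    }
}
```

Running `IndexSanity.Check` on each base piece right after extraction, and again on the combined arrays, should narrow down which step produces the extra indices.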

Yep, the issue is with the code I'm using to create the base vertex and index arrays. They're being extracted from a .X model. I just checked: the vertices, normals and texture coordinates are all fine. But it's creating a hideous number of indices for some reason. Right, at least I know where the issue is. I'm glad my intuition was right.

Thanks for all of the help, everyone, given my very vaguely described problem. If anyone has ever extracted the index data from a model in XNA before, please post with any info. Otherwise, I'm sure I'll figure it out.
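One possible cause, which this thread does not confirm: the parts of an XNA model can share a single index buffer, with each ModelMeshPart owning only the range described by its StartIndex and PrimitiveCount. If the whole buffer is read (e.g. via IndexBuffer.GetData) once per part, every extracted piece ends up with the indices of all parts. The sketch below is plain C# with a hypothetical helper name, and shows only the slicing step on an already-read array.

```csharp
using System;

// Hypothetical helper: extracts one part's indices from a shared index array.
public static class PartExtractor
{
    // sharedIndices:  the full contents of the (possibly shared) index buffer.
    // startIndex:     first index belonging to the part (cf. ModelMeshPart.StartIndex).
    // primitiveCount: number of triangles in the part (cf. ModelMeshPart.PrimitiveCount).
    public static int[] SliceTriangleList(int[] sharedIndices,
                                          int startIndex,
                                          int primitiveCount)
    {
        // A triangle list uses three indices per primitive.
        int count = primitiveCount * 3;

        if (startIndex < 0 || count < 0 ||
            startIndex + count > sharedIndices.Length)
        {
            throw new ArgumentOutOfRangeException("startIndex");
        }

        // Copy only this part's range, not the whole shared buffer.
        int[] result = new int[count];
        Array.Copy(sharedIndices, startIndex, result, 0, count);
        return result;
    }
}
```

As far as I understand the standard draw call, the part's VertexOffset is passed as the base vertex, so the sliced indices are relative to the part's own vertices, which would then be read starting at VertexOffset for NumVertices entries.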

Thanks

