Vertexbuffer - huge lag.


25 replies to this topic

#1 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 10:03 AM

I'm using XNA but I assume the problem is analogous in DX9?

 

So I'm having a huge problem rendering a model that's created dynamically at runtime. It renders fine but it's creating a huge lag.

 

Here's the issue illustrated through a comparison:

 

1) Model with 10,000 vertices created in 3ds max and rendered with shader X in XNA ---> 200fps (after everything else has happened in the game)

 

2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps

This is a completely unacceptable drop and I assume I'm doing something wrong. Something like Minecraft would be impossible to run if this was a necessary drop. I can post my code if necessary but I'm just using the same approach used by the 3D particles sample. I've profiled the project with NProf and all of the time is being spent in GraphicsDevice.Present().

 

Help?




#2 cephalo   Members   -  Reputation: 575

Posted 07 July 2014 - 10:24 AM

Are you setting things up to get debug warnings? When I use a debug device in DX11 (I don't remember how that works in DX9) and I do something not completely legit, my framerate plummets from all the reporting of warnings every frame.



#3 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 10:44 AM

No, I'm not setting up any debug warnings. Like I say, I'm just using the same code found in the 3D particles sample from microsoft so there's really no room for me to be doing anything wrong in that sense. It must be more of a conceptual problem that I'm fundamentally missing. 

And I've also tried using the 'NoOverwrite' option as described in Shawn Hargreaves' blog. No change.


Edited by gchewood, 07 July 2014 - 11:22 AM.


#4 SeanMiddleditch   Members   -  Reputation: 7261

Posted 07 July 2014 - 11:29 AM

"2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps"


Are you recreating/reuploading the model each frame? Uploading a model takes time, so uploading once will necessarily be faster than uploading it repeatedly.

Did you create the vertex buffer with similar properties for both models? The properties of a vertex buffer (e.g. dynamic vs non-dynamic) can affect speed.

Are you using the same vertex attributes between both models? The same shaders? Different shaders have different performance characteristics.
 

"Like I say, I'm just using the same code found in the 3D particles sample from microsoft so there's really no room for me to be doing anything wrong in that sense."


That's so cute that you think Microsoft's code samples are necessarily the best way to do things. :P

#5 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 11:37 AM

Thanks for the reply, Sean, even if you called me cute (maybe I should change my avatar).

No, the SetData function is only called when necessary. Not every frame.
No, I didn't create the vertex buffer similarly for both models. I just use the inbuilt approach for rendering a model in the pre-fabricated example. I wasn't making the comparison to say they should be identical in the frame rate. If the run-time version was at 180fps or something, I'd just assume that was a necessary price to pay. But 100fps seems criminal. Otherwise, the vertex types and shaders are the same.

I don't think Microsoft's code is necessarily the best. But the fact that I've used that particular 3D particle sample a lot, and have found it performs very nicely, seems to be a good sign.



#6 mhagain   Crossbones+   -  Reputation: 8285

Posted 07 July 2014 - 11:52 AM

It's normal enough to see all (or most) of your time being spent in Present: have a read of this: http://tomsdxfaq.blogspot.com/

 

As for causes of your performance drop, the first thing to do is check the vertex buffer creation and locking flags.  For a dynamic buffer used in this manner, you should be creating with D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, and locking with D3DLOCK_DISCARD.  You most definitely should not be calling CreateVertexBuffer each frame; create it once and reuse it (with a discard lock) each time you need to update.  Also be careful that you don't attempt to read from the buffer while you have it locked.

 

Assuming that these are all correct, you'll need to talk a little about how you're creating the new vertex data (CPU-side) to load into the buffer.  My own guess is that you're possibly doing CPU-side skinning or frame interpolation; if the former then there's a high probability that the slow down is not from your usage of a dynamic buffer, but more simply because CPU-side skinning is slow.  If the latter you can quite easily switch frame interpolation to run on the GPU and thereby keep your vertex data entirely static.  Either way, there is a probability that your performance issue is coming from extra CPU-side work associated with using dynamic data, and thinking a little about how you can make this data (or as much of it as possible) static can reap huge rewards.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#7 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 12:10 PM

Cool, I'll have a read through that page.

 

Yep, those flags are all set (or their equivalents in XNA) and I've tried with the Discard option. 

I'm not calling createvertexbuffer every frame, just once at the beginning. I'm not entirely clear on the part where you say "careful that you don't attempt to read from the buffer while you have it locked". What do you mean by 'read' in this context? As I say, the model is dynamically being added to. But not each frame. 

 

The vertex creation on the CPU side is quite simple. There are about 4-6 base models that I'm combining to make a larger structure. Those base models are loaded into vertex and index arrays at the beginning. Based on the user input, these base arrays are then combined into the larger array which is set to the vertex/index buffers when it's changed.
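The combining step described above can be sketched roughly as follows. This is a minimal illustration in standard C++, not the poster's actual code: `Vertex`, `Mesh` and `append_piece` are hypothetical names, the vertex holds only a position, and 32-bit indices are an assumption (an XNA project may well use 16-bit ones). The one detail that is easy to get wrong is that each piece's indices have to be shifted by the number of vertices already in the combined array.

```cpp
#include <cstdint>
#include <vector>

// Position-only vertex, kept small for the sketch.
struct Vertex { float x, y, z; };

// A base piece or the combined structure: vertex array plus index array.
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Appends a copy of `piece` to `combined`, translated by (dx, dy, dz).
// Indices are shifted by the number of vertices already present so that
// they keep pointing at the piece's own vertices.
void append_piece(Mesh& combined, const Mesh& piece, float dx, float dy, float dz) {
    const std::uint32_t base = static_cast<std::uint32_t>(combined.vertices.size());
    for (const Vertex& v : piece.vertices)
        combined.vertices.push_back(Vertex{v.x + dx, v.y + dy, v.z + dz});
    for (std::uint32_t i : piece.indices)
        combined.indices.push_back(base + i);
}
```

Calling `append_piece` once per placed piece and then uploading `combined.vertices` and `combined.indices` only when the structure changes matches the "occasional update" pattern described in the post.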



#8 cgrant   Members   -  Reputation: 762

Posted 07 July 2014 - 01:19 PM

From what you've mentioned, it seems like you are not doing anything out of the ordinary. Do you have the actual draw timings for comparison? FPS is not a reliable performance metric, especially when developing.
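One way to collect timings in milliseconds rather than FPS is a small accumulator around the frame (or around just the draw section). The sketch below uses only `std::chrono`; `FrameTimer` is a hypothetical helper, and it measures CPU-side wall-clock time, so with an asynchronous GPU it mostly shows where the CPU ends up waiting (typically in Present).

```cpp
#include <chrono>

// Hypothetical helper: accumulates per-frame durations and reports the mean in ms.
class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Call at the start of the section being measured.
    void begin() { start_ = Clock::now(); }

    // Call at the end of the section; adds the elapsed time to the running total.
    void end() {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        total_ms_ += elapsed.count();
        ++frames_;
    }

    // Mean duration in milliseconds, or 0 if nothing has been measured yet.
    double average_ms() const { return frames_ > 0 ? total_ms_ / frames_ : 0.0; }
    int frames() const { return frames_; }

private:
    Clock::time_point start_{};
    double total_ms_ = 0.0;
    int frames_ = 0;
};
```

Wrapping the static-model path and the dynamic-buffer path in separate timers and comparing `average_ms()` gives a difference in milliseconds that can be compared directly, whatever the baseline frame rate is.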



#9 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 01:56 PM

What do you mean by the draw timings? As in, the number of ms taken in the Present() function? 

Why will that give any more info than the FPS I've already mentioned? Everything else about the solution is identical.



#10 mhagain   Crossbones+   -  Reputation: 8285

Posted 07 July 2014 - 02:35 PM

"these base arrays are then combined into the larger array which is set to the vertex/index buffers when it's changed."

 

Can you give more detail on how you're doing this part?  Are you, for example, combining them to an std::vector (or whatever the equivalent container object is in your C# code), then copying them to a locked vertex buffer?  You'll get better performance if you lock the buffer first, then write directly to the locked memory; for one you'll avoid an extra memory copy, another thing is that you'll also avoid a lot of runtime allocation and garbage collection.
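The difference between the two approaches can be sketched without any graphics API. Everything here is an assumption made for illustration: `FakeVertexBuffer` is a hypothetical stand-in whose `lock()` simply returns a pointer into a `std::vector`, whereas a real Direct3D lock returns driver-owned memory. XNA's managed API appears to expose only `SetData`, so the second variant maps more directly onto native code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical stand-in for a lockable vertex buffer.
class FakeVertexBuffer {
public:
    explicit FakeVertexBuffer(std::size_t capacity) : storage_(capacity) {}
    Vertex* lock() { return storage_.data(); }
    void unlock() {}
    std::size_t capacity() const { return storage_.size(); }
    const Vertex& at(std::size_t i) const { return storage_[i]; }
private:
    std::vector<Vertex> storage_;
};

using Pieces = std::vector<std::vector<Vertex>>;

// Approach 1: build a temporary array, then copy it into the locked buffer.
// Costs one extra allocation and one extra pass over the data.
std::size_t upload_via_staging(FakeVertexBuffer& vb, const Pieces& pieces) {
    std::vector<Vertex> staging;
    for (const auto& piece : pieces)
        staging.insert(staging.end(), piece.begin(), piece.end());
    if (staging.size() > vb.capacity()) return 0;  // does not fit
    Vertex* dst = vb.lock();
    std::copy(staging.begin(), staging.end(), dst);
    vb.unlock();
    return staging.size();
}

// Approach 2: lock first, then write each piece straight into the locked memory.
// The code only writes through the pointer and never reads back from it.
std::size_t upload_direct(FakeVertexBuffer& vb, const Pieces& pieces) {
    std::size_t total = 0;
    for (const auto& piece : pieces) total += piece.size();
    if (total > vb.capacity()) return 0;  // does not fit
    Vertex* dst = vb.lock();
    std::size_t written = 0;
    for (const auto& piece : pieces) {
        std::copy(piece.begin(), piece.end(), dst + written);
        written += piece.size();
    }
    vb.unlock();
    return written;
}
```

Both functions leave the same data in the buffer; the second just skips the intermediate array. For an update that happens only occasionally, the saving is probably small compared with a per-frame cost.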




#11 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 03:10 PM

Yeah, I'm combining them into an array and then copying that to the vertex buffer. But that's not every frame, it's just when the model is occasionally updated. So I assumed that couldn't be the cause of the frame rate issue?

As for your suggestion, can you explain that a little more in depth? It sounds like you just said the same thing twice. How is copying them to a locked vertex buffer different from locking the buffer then writing directly to the locked memory?


Edited by gchewood, 07 July 2014 - 03:11 PM.


#12 Matias Goldberg   Crossbones+   -  Reputation: 3723

Posted 07 July 2014 - 03:16 PM


"1) Model with 10,000 vertices created in 3ds max and rendered with shader X in XNA ---> 200fps (after everything else has happened in the game)

2) Similar 10,000 vertex model constructed at runtime with a dynamic vertex buffer and rendered with shader X ---> 100fps"

I wouldn't call a difference of 5 ms a huge lag when you've changed from a static to a dynamic vertex buffer and the amount of data you send per frame is 10,000 vertices. Not to mention that the GPU and CPU now have to sync more often (i.e. the GPU waits for the CPU's data to arrive, or the CPU has to wait for the GPU to finish), which means that if you do more work, the framerate won't drop (because one of your components was idle, waiting).

 

If your game was originally running at 60 fps (no vsync), it would've dropped to roughly 46 fps. Measure your timings in milliseconds.
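The arithmetic behind that point is short enough to write down. A minimal sketch (the function names are made up for illustration):

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double fps_to_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate for a given frame time in milliseconds.
double ms_to_fps(double ms) {
    assert(ms > 0.0);
    return 1000.0 / ms;
}

// Frame rate after adding a fixed per-frame cost (in ms) to a baseline frame rate.
double fps_after_extra_cost(double baseline_fps, double extra_ms) {
    return ms_to_fps(fps_to_ms(baseline_fps) + extra_ms);
}
```

Here `fps_to_ms(100.0) - fps_to_ms(200.0)` is 5 ms, and `fps_after_extra_cost(60.0, 5.0)` comes out at about 46.15 fps. The same 5 ms halves a 200 fps baseline but costs a 60 fps game only about 14 fps, which is why a fixed cost looks dramatic when quoted as an FPS drop at high frame rates.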

 

As long as you're not recreating the vertex buffer every frame, and creating it with D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, and locking with D3DLOCK_DISCARD; there's not much you can do.

Reducing the size of the vertex should help.
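To put numbers on the vertex size: a position + normal + texture-coordinate vertex stored as 32-bit floats is 32 bytes, so 10,000 vertices come to 320,000 bytes per full upload. The sketch below compares that with a hypothetical packed layout. `VertexPacked` and `pack_snorm8` are illustrative assumptions only; whether the target API and shader can consume such formats has to be checked separately.

```cpp
#include <cstddef>
#include <cstdint>

// Position + normal + texture coordinate, all as 32-bit floats.
struct VertexFull {
    float position[3];
    float normal[3];
    float uv[2];
};

// Hypothetical packed layout: normal as signed bytes, UV as 16-bit values.
struct VertexPacked {
    float position[3];
    std::int8_t normal[4];   // x, y, z scaled to [-127, 127]; fourth byte is padding
    std::uint16_t uv[2];     // u, v scaled to [0, 65535]
};

static_assert(sizeof(VertexFull) == 32, "8 floats");
static_assert(sizeof(VertexPacked) == 20, "3 floats + 4 bytes + 2 shorts");

// Bytes transferred when `count` vertices of `stride` bytes are uploaded.
constexpr std::size_t upload_bytes(std::size_t count, std::size_t stride) {
    return count * stride;
}

// Maps a normal component in [-1, 1] to a signed byte, rounding to nearest.
std::int8_t pack_snorm8(float v) {
    if (v > 1.0f) v = 1.0f;
    if (v < -1.0f) v = -1.0f;
    return static_cast<std::int8_t>(v * 127.0f + (v >= 0.0f ? 0.5f : -0.5f));
}
```

With these layouts the same 10,000 vertices shrink from 320,000 to 200,000 bytes, at the price of unpacking in the vertex shader and some loss of precision.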



#13 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 03:31 PM

Thanks for the response Matias. I'm quite shocked that you say that's not a big drop actually. 

 

Further evidence to support my general suspicion that I'm fundamentally doing something wrong:

 

I just re-ran the game and drew each of the individual pieces (the base pieces I mentioned previously) separately with the appropriate locations and the same shader.

As I'd mentioned before, if I drew everything as a single model, I got 200fps. With the vertex buffer I was getting 100fps. With this new test I just ran (which I would think should be the most inefficient by far), I got about 180fps. 

 

So yeah, I'm not buying that it's a necessary cost at all, there must be something else to it!


Edited by gchewood, 07 July 2014 - 03:32 PM.


#14 L. Spiro   Crossbones+   -  Reputation: 14423

Posted 07 July 2014 - 03:51 PM

"I got 200fps. With the vertex buffer I was getting 100fps."

I repeat: Do not time in FPS, time in milliseconds.
You did not drop by 100 FPS, you dropped by 5 milliseconds.
 
Before we can really give you any meaningful advice you are long overdue in giving us real information.
How often is “occasionally” updating?
Show the update routine.
Show the code for making the vertex buffer.
Are you using index buffers?
 
 

"As I'd mentioned before, if I drew everything as a single model, I got 200fps."

What does that mean?
A single draw call?

 

"With the vertex buffer I was getting 100fps."

What does that mean? You were drawing without a vertex buffer before?

 

"So yeah, I'm not buying that it's a necessary cost at all, there must be something else to it!"

Without clarification on what you actually changed to go from 5 milliseconds per frame to 5.55555 milliseconds per frame (see how we don’t measure things in FPS because it’s misleading?), I assume:
#1: You were drawing everything in a single draw call originally.
#2: You went from that to drawing some things in a single draw call and other things in multiple draw calls with a dynamic buffer. That is when you noticed an increase of 5 milliseconds per frame (200 FPS -> 100 FPS).


Then it makes perfect sense.
Drawing everything all at once with a single draw call and no state changes is of course absurdly fast.  It’s also impossible in a real game, which means there is no point in benchmarking it.

When you break the scene into multiple draw calls with state changes, that is where your performance is going.

There is a heavy hit for changing shaders (even if it is the same shader being set twice in a row), textures, render targets, vertex buffers, etc.

It’s not just because you switched to a dynamic vertex buffer.  You also added multiple draw calls and state changes.
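One common way to keep redundant state changes from reaching the device is a small cache in front of it. The sketch below is only an illustration of that idea: `FakeDevice` is a hypothetical stand-in that counts binds, and `StateCache` is not part of XNA or Direct3D.

```cpp
#include <cstdint>

// Hypothetical device stand-in that only counts how many binds reach it.
struct FakeDevice {
    int shader_binds = 0;
    int texture_binds = 0;
    void bind_shader(std::uint32_t) { ++shader_binds; }
    void bind_texture(std::uint32_t) { ++texture_binds; }
};

// Remembers the last bound objects and skips binds that would change nothing.
class StateCache {
public:
    explicit StateCache(FakeDevice& device) : device_(device) {}

    void set_shader(std::uint32_t id) {
        if (has_shader_ && id == shader_) return;  // same shader already bound
        device_.bind_shader(id);
        shader_ = id;
        has_shader_ = true;
    }

    void set_texture(std::uint32_t id) {
        if (has_texture_ && id == texture_) return;  // same texture already bound
        device_.bind_texture(id);
        texture_ = id;
        has_texture_ = true;
    }

private:
    FakeDevice& device_;
    std::uint32_t shader_ = 0;
    std::uint32_t texture_ = 0;
    bool has_shader_ = false;
    bool has_texture_ = false;
};
```

Counting binds this way for both rendering paths is also a cheap check on the assumption that the number of state changes really is identical in the two cases.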

 

That is, assuming my #1 and #2 are correct.

Again, the amount of information we don’t have here is extremely high.

 

 

L. Spiro


It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#15 phil_t   Crossbones+   -  Reputation: 4109

Posted 07 July 2014 - 03:51 PM


"But that's not every frame, it's just when the model is occasionally updated. So I assumed that couldn't be the cause of the frame rate issue?"

 

It should be very easy to test this. Just update your dynamic vertex buffer once. What's the framerate then? (Put a breakpoint on the code that updates it to be absolutely sure it's not being called accidentally).

 

If it really is a 5ms drop even without updating the vertex buffer, then yes, you're doing something wrong somewhere. That wouldn't be expected. (edit: but as L. Spiro mentioned, there's so much we don't know about how you're drawing things).


Edited by phil_t, 07 July 2014 - 03:53 PM.


#16 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 04:30 PM

@phil_t

 

Ok, yep tried just updating the vertex buffer in 1 big jump. Same outcome. Around 100fps.

 

@L.Spiro

 

Yes, I realise fps isn't a linear measurement. But I assumed everyone knows that and therefore there would be no issue? Or am I missing something other than the semantic preference?

 

By occasional, I mean it's not a set time interval. It's based on the user input. Like in something like minecraft. Yet the framerate remains entirely proportional to the vertex count. Whether or not it's being constantly updated or left for a few minutes.

 

Yes, I'm using an index buffer. And the code for the vertex buffer is 


vb = new DynamicVertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertices.Length*100, BufferUsage.WriteOnly);

By 'single model', I mean it was loaded as the standard XNA model and only uses a single material. That means just 1 draw call right? Yes, I realise that uses a vertex buffer too, I meant 'vertex buffer' for the explicit one that I created, as opposed to the one generated automatically by an XNA model.

And I'm pretty sure the number of state changes was identical in both situations. Why do you say I added multiple draw calls and state changes?
As for you not having much information, obviously the issue is that it's currently part of a fairly complicated program. Distilling the part that's problematic so I could show you the code takes time, so I was hoping it would be resolved easily without requiring that. But now that it hasn't, maybe I should just upload some code?


Edited by gchewood, 07 July 2014 - 04:35 PM.


#17 phil_t   Crossbones+   -  Reputation: 4109

Posted 07 July 2014 - 04:56 PM

What happens if the only thing you do is draw the stuff in your DynamicVertexBuffer? Does it still take 5ms?

 

Does your model use textures? Are the textures identical in both cases (mipmaps, dimensions,  format, etc...). Are you using the same sampling states? What happens if you change the shader so it doesn't sample textures? What's the difference between the two methods then?

 

Take a capture in PIX and compare the two scenarios. Does anything stand out w.r.t draw calls, state changes, etc?

 

Have you tried anything that can profile GPU performance, like Intel GPA?



#18 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 05:10 PM

If the only thing I draw is the dynamic vertex buffer, it jumps to 10ms (for the same 10,000 vertex buffer). Isn't that to be expected? There were various other things going on as well.

 

Yes, the models use the identical textures. They're using the same shader, which is where the sampling states are set. So the same.

If I change the shader to not use textures, the outcome is more or less the same, only slightly better performance which I guess is to be expected.

I'm completely new to using PIX so I'll certainly try that but it'll take a while to familiarise myself with it!


Edited by gchewood, 07 July 2014 - 05:12 PM.


#19 L. Spiro   Crossbones+   -  Reputation: 14423

Posted 07 July 2014 - 05:41 PM

You have a scene S and you draw it all in X calls and Y state changes (which could both be 1) using a static vertex buffer at 5 milliseconds per frame.

You draw the same scene S with the same number of calls (X) and state changes (Y) using a dynamic vertex buffer which is not updated after being set at 10 milliseconds per frame.

 

The difference between a static vertex buffer and the same dynamic vertex buffer in scene S is 5 milliseconds.

 

Case closed.  The end.

 

 

Unless you want to show exactly what flags you are passing when creating the dynamic vertex buffer, which, regardless of how complex your project is, is just one line of code.

Or unless you try PIX and find out some other difference between the 2 render methods.

 

 

L. Spiro



#20 gchewood   Members   -  Reputation: 236

Posted 07 July 2014 - 06:56 PM

Lol, well everyone else's responses don't seem to match your certainty on that? 

 

I thought I already did show you the code used to create the vertex buffer. Again:

vb=new DynamicVertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertices.Length*100, BufferUsage.WriteOnly);

and then to set the vertices:

vb.SetData(activeVertices, 0, activeVertices.Length, SetDataOptions.Discard);

And that performs the same no matter what I set for SetDataOptions. That's what you were asking for right? 

So why is a dynamic vertex buffer with just 1 draw call slower than doing like 20 or so from the individual models? That doesn't seem right?


Edited by gchewood, 07 July 2014 - 07:07 PM.




