crucifer

VertexBuffer Optimization


In my game, the terrain is dynamic, which requires me to rebuild 2 vertex buffers every 5-6 frames. Right now I simply trash the 2 vertex buffers and create new ones, but when I do so I notice a significant performance loss between two frames. How exactly can I optimize that? I have heard about using two groups of vertex buffers instead of only one, switching from group 1 to group 2 every time I have to modify the buffers, so that I avoid the dead time spent waiting on a memory lock with the video card. Unfortunately, doing so didn't help the performance.

1) Destroying and recreating a vertex buffer regularly should be avoided at all costs. As you have discovered, it can affect your performance significantly.


2) Using multiple vertex buffers in a round-robin fashion will indeed help prevent stalls at both the CPU and GPU ends of the pipeline on any modern T&L card. You don't need to explicitly create these multiple buffers though...


3) ...Instead, you want to use D3DUSAGE_DYNAMIC vertex buffers. With these, if you call Lock() with the right flags at the right time (depending on how you fill the buffer and what you draw from it), Direct3D and the graphics driver will handle all the multi-buffering (a.k.a. vertex buffer renaming) for you.


4) The advantage of letting D3D and the driver do the work for you is that they know much more about the current hardware configuration than you do, so they can tune things like how many buffers are cycled through. As long as you use the Lock() flags correctly it works fine.


5) You can find more details of how to lock dynamic buffers properly in the DirectX SDK docs, under the "Using Dynamic Vertex and Index Buffers" topic.


6) If you're only writing data to the buffer, you should also use the D3DUSAGE_WRITEONLY flag when creating the buffer. This allows the driver to make a more informed choice when deciding which type of memory to place the buffer in.


7) Additionally, if your device was created for software vertex processing, make sure your buffers are too!


8) It could be the case that your buffer usage isn't a bottleneck at all - what leads you to that conclusion?

For example, could the real issue be the number of "batches" you have (i.e. how many Draw* calls you make per frame)? If you have too many batches (as you would with an algorithm such as ROAM drawn with triangle fans), then that's an equally likely suspect.

Another could be fill-rate if you have a lot of alpha effects happening.

There are several good papers about tracking down the bottleneck in your application on the various IHV sites (most notably nVidia's and ATI's), in the Microsoft Meltdown slides, and even in the SDK docs for recent versions of the DirectX SDK.

Thanks a lot S1CA.

I currently do use "D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY" (creation) and "D3DLOCK_DISCARD" (lock).

The performance issue that I am experiencing is really due to the creation/deletion (actually I should write "overwriting") of the buffers: when I do not take the time to recreate the buffers, the game is stable at 60 FPS; it can drop as low as 20 FPS when I create/delete the buffers every frame.

Using two groups of vertex buffers (and switching between them) didn't help at all last time I tried (not even a single FPS more). Here is what I was doing:
create buffer1
create buffer2
-------------------------
lock/fill/unlock buffer1
render buffer1
------------------------- next scene
lock/fill/unlock buffer2
render buffer2
------------------------- next scene
lock/fill/unlock buffer1 ... etc


Is there a more specific way to do this "multi-buffering" you were talking about ?

Quote:
Original post by crucifer
Thanks a lot S1CA.

I currently do use "D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY" (creation) and "D3DLOCK_DISCARD" (lock).


But are you using it as per the doc in the link I posted?

If you're using D3DLOCK_DISCARD on every Lock() you make, are you filling all of the vertex buffer? You definitely shouldn't be using D3DLOCK_DISCARD if you're only filling in a few vertices at a time!


Quote:
The performance issue that I am experiencing is really due to the creation/deletion(actually I should write overwriting) of the buffers


Well, don't keep re-creating/deleting the buffers then; you don't need to. Just use the dynamic buffer. I'm probably missing something here, but I don't see any reason you need to recreate any vertex buffers.

If the number of vertices your dynamic terrain creates increases, then just make the size of your dynamic buffer large enough to hold enough vertices for your largest batch.

As you already know, tearing down and re-creating the vertex buffers is going to cost you a lot. I can't see any real reason you need to do that.

Quote:
When I do not take the time to recreate the buffers, the game is stable at 60 FPS; it can drop as low as 20 FPS when I create/delete the buffers every frame.


That's the kind of drop I'd expect.

There are a few things you should bear in mind when "profiling" though:

1) you should be using the RETAIL DirectX runtimes rather than the DEBUG ones you're using for development (change in the control panel). The Debug runtimes do a lot of extra validation work, output warnings to the current debugger etc.

2) you shouldn't be waiting for VSYNC (60fps sounds like you are); this imposes an artificial cap on your upper performance (i.e. 60Hz), and typically also rounds any lower frame rate down to a whole-number fraction of the refresh rate (30, 20, 15 FPS and so on at 60Hz).
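To see why VSYNC distorts the numbers, here is a minimal sketch of the arithmetic. It is a toy model, not anything from the D3D API: it assumes plain double buffering, where a finished frame can only be shown on a refresh boundary, so every frame occupies a whole number of refresh intervals. The function name vsyncLimitedFps is made up for this illustration.

```cpp
#include <cassert>

// Toy model (an assumption, not D3D behaviour on every setup): with VSYNC and
// plain double buffering, each frame occupies a whole number of refresh
// intervals, so the visible frame rate is refreshHz divided by that number.
double vsyncLimitedFps(long long frameTimeMicros, long long refreshHz) {
    assert(frameTimeMicros > 0 && refreshHz > 0);
    const long long microsPerSecond = 1000000;
    // Round the frame time up to whole refresh intervals (integer ceiling).
    long long intervals =
        (frameTimeMicros * refreshHz + microsPerSecond - 1) / microsPerSecond;
    return static_cast<double>(refreshHz) / static_cast<double>(intervals);
}
```

Under this model a 17 ms frame on a 60Hz display reports 30 FPS even though it only just missed the refresh, and a 40 ms frame reports 20 FPS. That is why profiling figures taken with VSYNC on say little about the real cost of a change.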


Quote:
Using two groups of vertex buffers (and switching between them) didn't help at all last time I tried (not even a single FPS more). Here is what I was doing:
create buffer1
create buffer2
-------------------------
lock/fill/unlock buffer1
render buffer1
------------------------- next scene
lock/fill/unlock buffer2
render buffer2
------------------------- next scene
lock/fill/unlock buffer1 ... etc

Is there a more specific way to do this "multi-buffering" you were talking about ?


That's exactly what I mean by "multi-buffering" (i.e. more than one buffer and round robin usage). Beware that the card may be working more than a scene behind so you might need more buffers.
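As a rough illustration of why two buffers may not be enough, here is a toy simulation. It is only a sketch under a simplifying assumption: the GPU runs a fixed number of frames behind the CPU (gpuLagFrames, a made-up parameter), and frame f always writes buffer f % bufferCount. A real driver's behaviour is far less regular than this.

```cpp
#include <cassert>

// Toy model of manual round-robin buffering. Frame f writes buffer
// f % bufferCount, so the buffer was last used in frame f - bufferCount.
// Assumption: when the CPU starts frame f, the GPU may still be reading
// the previous gpuLagFrames frames (f - gpuLagFrames .. f - 1).
int countStalls(int bufferCount, int gpuLagFrames, int frames) {
    assert(bufferCount > 0 && gpuLagFrames >= 0 && frames >= 0);
    int stalls = 0;
    for (int frame = 0; frame < frames; ++frame) {
        int previousUse = frame - bufferCount;      // last frame that used this buffer
        int oldestInFlight = frame - gpuLagFrames;  // oldest frame the GPU may still read
        // Locking a buffer the GPU is still reading forces the CPU to wait.
        if (previousUse >= 0 && previousUse >= oldestInFlight) {
            ++stalls;
        }
    }
    return stalls;
}
```

In this model two buffers are fine while the GPU is one frame behind, but as soon as it falls two frames behind every lock after the first two stalls, and a third buffer removes the stalls again. You cannot easily measure that lag yourself, which is the argument for leaving the cycling to the driver.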

That's also what D3DUSAGE_DYNAMIC does when used with the correct Lock() flags. However since it's (usually) handled by the driver, it can be a lot cleverer than doing it yourself because the driver has extra knowledge that you don't (i.e. how many frames behind the GPU is rendering, the size of things such as the vertex cache, where each buffer _really_ is, etc).

I'd always recommend using D3DUSAGE_DYNAMIC buffers instead of doing that multi-buffering manually.


What exact issue are you trying to solve ?

- if you're trying to stop destroying/creating buffers every few frames, then I'd definitely go for D3DUSAGE_DYNAMIC buffers.

- if the 60FPS of D3DUSAGE_DYNAMIC buffers still isn't enough, then first profile without VSYNC, then take a look at other possible bottlenecks in your application. The number of batches you're sending per frame is an important one to look at.

[edit: fixed quote tag]

[Edited by - S1CA on October 14, 2004 8:28:19 AM]

First: thanks for telling me that it is possible to remove the VSYNC cap. I noticed I was limited by it, but I didn't expect I could do anything to change that. How should I proceed to prevent VSYNC from limiting my FPS?

Second : About the buffers :
Right now, by creating/destroying the buffers, I really mean "lock/fill/unlock". I don't actually call CreateVertexBuffer(). Also, I do use dynamic vertex buffers, and with the Lock() DISCARD flag I fill the whole buffer with my terrain (only the necessary part).

The part I do not understand from your message is how I should let D3D do the work for me because it knows the settings better than I do. What "work" should I leave to D3D, and how should I proceed? As I understand it, the multi-buffering I did was the right solution (but it failed to improve the FPS because I was using only 2 buffers while I might need more).

Am I on the right track here ?

Quote:
Original post by crucifer
First: thanks for telling me that it is possible to remove the VSYNC cap. I noticed I was limited by it, but I didn't expect I could do anything to change that. How should I proceed to prevent VSYNC from limiting my FPS?


Set the PresentationInterval of your D3DPRESENT_PARAMETERS structure to D3DPRESENT_INTERVAL_IMMEDIATE.

If you're finding that you're still capped by VSYNC, check in the advanced display properties for your graphics card. Some have extra D3D options which allow you to force all games to wait for VSYNC.


Quote:
Second : About the buffers :
Right now, by creating/destroying the buffers, I really mean "lock/fill/unlock". I don't actually call CreateVertexBuffer(). Also, I do use dynamic vertex buffers, and with the Lock() DISCARD flag I fill the whole buffer with my terrain (only the necessary part).


Ah right, I misunderstood then, I interpreted that as regular calls to "CreateVertexBuffer()".

When you fill the buffer, do you fill ALL of the buffer or only a few vertices at a time?

If you're only filling in a few vertices and there's enough space for another draw call worth of vertices, then you should be using D3DLOCK_NOOVERWRITE at least some of the time, as per the doc I linked to above.
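The append-then-discard pattern described in that doc can be sketched roughly as below. This is only the bookkeeping, with no Direct3D calls: LockMode, LockRequest and DynamicBufferCursor are hypothetical names invented for the illustration, and the two enum values stand in for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD. The caller would pass the returned offset and flag to its own Lock() call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.
enum class LockMode { NoOverwrite, Discard };

// Where to start writing and which flag to lock with.
struct LockRequest {
    std::size_t firstVertex;
    LockMode mode;
};

// Tracks the next free vertex in one dynamic buffer (illustrative helper).
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity)
        : capacity_(capacity), next_(0) {}

    // Reserve room for `count` vertices for the next draw call.
    LockRequest reserve(std::size_t count) {
        assert(count > 0 && count <= capacity_);
        LockMode mode = LockMode::NoOverwrite;  // append: earlier data is untouched
        if (next_ + count > capacity_) {
            // Not enough room left: restart at the front and discard the old
            // contents so the driver can hand back a fresh region.
            next_ = 0;
            mode = LockMode::Discard;
        }
        LockRequest request{next_, mode};
        next_ += count;
        return request;
    }

private:
    std::size_t capacity_;
    std::size_t next_;
};
```

With a 100-vertex buffer, reserving 60 vertices appends at offset 0; a second request for 60 no longer fits, so it restarts at offset 0 with the discard flag; a following request for 40 appends at offset 60. If every fill really does rewrite the whole buffer, the cursor degenerates to discarding on each lock, which matches the "fill ALL of the buffer" case.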


Quote:
The part I do not understand from your message is how I should let D3D do the work for me because it knows the settings better than I do. What "work" should I leave to D3D, and how should I proceed? As I understand it, the multi-buffering I did was the right solution (but it failed to improve the FPS because I was using only 2 buffers while I might need more).

Am I on the right track here ?


Yep, sorry for any confusion.

The "work" of creating multiple buffers instead of just one; the work of cycling through those buffers; the work of choosing the best number of buffers based on your usage flags; the work of choosing the best number of buffers based on internal knowledge of what the hardware is up to etc.

Although you can do that work yourself (albeit with lots of caps checking to determine things such as UMA motherboards and AGP aperture sizes), D3D and the driver are better placed to do that work for you.

D3D and the driver will do a very good job of that multiple buffer management too - but only if you use creation and lock flags which best match your usage patterns (e.g. use D3DUSAGE_WRITEONLY if you can, use D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD as per the above link, etc).

It's *very* difficult (if not impossible) to beat the driver and D3D when it comes to management of multiple buffers (when used properly).


Other tips regarding this stuff and performance in general which may help:

1) Write data sequentially to the buffer; random access is bad.


2) The buffer is likely to end up in AGP or even video memory - thus CPU reads aren't cached and might even have to come over a slow bus. Any reads of the data in the buffer will hurt your performance.


3) Ensure that any Draw* call(s) following a Lock() on a dynamic buffer only refer to vertices within the range specified in that Lock(). Make sure any indices of index buffers fall correctly within that range too!


4) The efficiency of your Lock()s and buffer creation only helps parallelism between the CPU and GPU. It's pointless trying to optimise it if the real bottleneck is somewhere else. Make sure you're not making too many Draw* calls per frame - if you're making 1000s of draw calls, then THAT is likely a much bigger bottleneck and any amount of tweaking elsewhere won't improve your performance much.


5) When trying to get your D3D performance up, do things in the following order:

a. if you're creating the device with the D3DCREATE_PUREDEVICE flag, remove it for now. That flag disables some of the checking and help D3D provides and passes the data straight through to the driver!

b. ensure you aren't waiting for VSYNC and aren't limited by anything unrelated to graphics (such as AI, sound, physics etc).

c. make any changes to algorithms, locking methods etc etc and test using the DEBUG Direct3D runtime with the output level and validation set to maximum. Then test any new code path in the debugger and check for any debug output messages from D3D.

d. fix issues related to any ERROR and WARN messages D3D has produced (obviously you can ignore some warnings if they're about something you're aware of).

e. run a Release build of your application with the RETAIL Direct3D runtime to do your profiling.

f. go back to step (c) and refine until you're happy with the performance based on whatever goals you've set, or until you're happy you've tried all possibilities. If the performance doesn't change at all or only by a very insignificant amount, then unless you're totally maxing out the hardware, the bottleneck is elsewhere in your application/graphics pipeline.

g. once you're satisfied with the performance, put the D3DCREATE_PUREDEVICE back onto your CreateDevice() call and test performance again.
If performance drops, then you've either ignored a device cap, you've ignored the D3D documentation, you're relying on something D3D is storing for you (states etc) or you're passing something which the D3D runtime was fixing for you but not warning about.
If you've got the time, fix the problem and use the D3DCREATE_PUREDEVICE flag, but it'll only shave a bit of time off and only when your app is perfectly behaved.

h. finally, put VSYNC back on if you need it.
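Tip 3 above (draw only from the range you just locked) is easy to get wrong with index buffers, so a debug-only check can help. The sketch below is a hypothetical helper, not part of Direct3D: it assumes 16-bit indices and that each index is added to a base vertex offset before the vertex is fetched, which is how a base vertex index is applied in an indexed draw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if every vertex an indexed draw would fetch lies inside the
// range [lockedFirst, lockedFirst + lockedCount) written by the last lock.
// All names here are illustrative; nothing in this function calls D3D.
bool drawStaysInLockedRange(const std::vector<std::uint16_t>& indices,
                            std::size_t baseVertex,
                            std::size_t lockedFirst,
                            std::size_t lockedCount) {
    assert(lockedCount > 0);
    for (std::uint16_t index : indices) {
        std::size_t vertex = baseVertex + index;  // vertex actually fetched
        if (vertex < lockedFirst || vertex >= lockedFirst + lockedCount) {
            return false;  // the draw would read outside the freshly written data
        }
    }
    return true;
}
```

For a quad written at vertices 60 to 63 with indices {0, 1, 2, 2, 1, 3}, the check passes with a base vertex of 60 and fails if the base vertex is left at 0 or if only 3 vertices were locked. Running it in debug builds only keeps it out of the profiling numbers.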


[edit]tidied up a few words, made some parts clearer.[/edit]

[Edited by - S1CA on October 13, 2004 10:00:19 AM]

Wow, great posts by S1CA in this thread.

Anyways, both ATI and Nvidia have published some excellent papers/slidesets on D3D performance tuning:

ATI Performance Guide
Batching Performance Guide
D3D Pipeline Performance Guide

Although S1CA already covered a lot of the stuff about buffer creation/locking, these guides also take an in-depth view of topics like:

- Batching
- State changes
- Mip-mapping
- Instancing
- Shader Optimization
- Depth + Stencil Buffers

I was just about to create a thread asking how to create vertex buffers with an "unknown/changing" number of vertices. Looks like D3DUSAGE_DYNAMIC is for me :)

Thx S1CA

Guest Anonymous Poster
Quote:

If performance drops, then you've either ignored a device cap,


Thanks for all the excellent data.
Any particular device caps to check, i.e. the usual suspects?

Regards,
