

Organization of vertex buffers in D3D


rileyriley    235
I'm new to D3D, and sort of struggling to get my head around the idea of vertex buffers (though I suspect they may be much simpler than I perceive). My main question is: is it OK to have many vertex buffers (say, one per entity in my game world) and then switch between stream sources while rendering, or should I have only one central vertex buffer into which I smash all of the models in the scene? Any general advice about optimizing them would be appreciated as well. Thanks

Share on other sites
wazoo69    157
1. REALLY really really (no really) read the DirectX FAQs hosted over at msdn.microsoft.com. Both the DX8 and DX9 versions have many suggestions for VBs.

2. Have as few as possible. The less switching you have to do between VBs, the better (even though there have been lots of optimizations in the drivers for this).

3. I don't even bother using static VBs anymore (even though they're suggested by the FAQ). I stick to one or two dynamic VBs that I empty and fill each frame.

That's a start, anyway.
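
To make the "as few switches as possible" point a bit more concrete, here is a minimal sketch in plain C++ (no D3D calls). `DrawItem`, `countBufferSwitches` and `sortByBuffer` are made-up names for illustration, and `bufferId` is just a stand-in for a vertex buffer pointer. It only shows that submitting draws grouped by buffer means fewer stream source changes than submitting them in arbitrary order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw record: which vertex buffer a draw call reads from.
struct DrawItem {
    int bufferId;        // stand-in for a vertex buffer pointer (assumed >= 0)
    int startVertex;     // first vertex of the model inside that buffer
    int primitiveCount;  // number of triangles to draw
};

// Counts how many times the stream source would have to change
// if the items were submitted in the given order.
std::size_t countBufferSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    int bound = -1;  // -1 means "nothing bound yet"
    for (const DrawItem& item : items) {
        assert(item.bufferId >= 0);
        if (item.bufferId != bound) {
            ++switches;
            bound = item.bufferId;
        }
    }
    return switches;
}

// Reorders draws so that all draws from the same buffer are adjacent.
// stable_sort keeps the original relative order within each buffer.
void sortByBuffer(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.bufferId < b.bufferId;
                     });
}
```

In a real renderer you would probably sort on more than the buffer (textures, shaders and so on), so treat this as the simplest possible version of the idea.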

Share on other sites
quote:
Original post by wazoo69
3. I don't even bother using static VBs anymore (even though they're suggested by the FAQ). I stick to one or two dynamic VBs that I empty and fill each frame.

I wouldn't recommend this one. Static VBs (mostly) end up being in video memory, while dynamic VBs (mostly) end up being in AGP memory. As a result, static VBs are faster to render, while dynamic VBs are much, much faster to write to.

The recommendation is to use static VBs whenever possible; if you have data that is static, then it should be placed in a static VB for optimum performance. The performance difference might not be apparent in simple situations/applications/tests, yet when things get complicated you'll appreciate the perf gain you get.

One thing to keep in mind: There's currently no support for the "slightly dynamic" usage profile, i.e. data that is updated every now and then (every 100 frames or more, for example). The best thing to do with these is to make them static, to take advantage of the fast rendering (since we're not writing frequently). Problem is: some "smart" drivers used to spot when you did this (locked a static buffer) and then moved it to AGP memory.
As far as I know, recent drivers don't do this anymore.

A recent post from an ATI devrel guy at DXDEV:
quote:
I've been talking to our driver guys about how we manage VBs, and this is what happens in our drivers

1) We do not move STATIC VBs that are locked with 0 (ie no

2) The driver doesn't strictly place STATIC VBs in local vidmem and DYNAMIC VBs in AGP. A create of a STATIC VB will attempt local allocation first, followed by AGP if the local allocation fails. A create of a DYNAMIC VB will attempt AGP first, then local.

So you can see from point 2 that we treat the DYNAMIC flag as an indicator to use AGP memory, so the recommendation for our hardware (which should apply from R1xx onwards) would be to NOT use this flag if you plan on relatively infrequent updates. In this way, as long as you lock with 0 you should always get the local vidmem version for a STATIC VB, and never have to worry about the driver moving the VB around no matter how many times you lock it.

Obviously there are things to be careful of. First, there is the potential stall if the VB is still in use. Second, there is the issue of "sparse" updates. In general it is way better to update large consecutive areas of memory in local vidmem, and not a good idea to do a large number of small
affect

It's also possible that the DirectX runtime might get involved before we get to see the Lock. Someone from Microsoft might care to comment on this.

Dave Horne
Technical Developer Relations, Europe
ATI Technologies Inc

And an nVidia devrel seconded this. So this should apply for both IHVs.
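
The advice above can be boiled down to a small decision rule. This is only a sketch: `BufferUsage`, `chooseUsage` and the `rareThreshold` default of 100 frames are assumptions picked for illustration (the 100 comes from the "every now and then" example earlier in this post, not from any documented cut-off). It does not create any D3D resource; it just decides whether you would ask for a dynamic buffer or not.

```cpp
#include <cassert>

// Hypothetical labels; in D3D9 these would correspond to creating the
// vertex buffer with or without the D3DUSAGE_DYNAMIC flag.
enum class BufferUsage { Static, Dynamic };

// framesBetweenUpdates: 0 means the data never changes after creation,
// 1 means it is rewritten every frame, N means roughly once every N frames.
// rareThreshold is an assumed cut-off, not a documented value.
BufferUsage chooseUsage(unsigned framesBetweenUpdates,
                        unsigned rareThreshold = 100) {
    assert(rareThreshold > 1);
    if (framesBetweenUpdates == 0) {
        return BufferUsage::Static;   // never rewritten: favour fast rendering
    }
    if (framesBetweenUpdates >= rareThreshold) {
        return BufferUsage::Static;   // "slightly dynamic": still keep it static
    }
    return BufferUsage::Dynamic;      // rewritten often: favour fast writes
}
```

Where exactly the threshold should sit is something you would have to measure on the hardware you care about.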

MHaggag's corner

Share on other sites
I'll give you a real world example. The title I'm working on has a grand total of, IIRC, five vertex buffers. One of those is a dummy which we ended up not using, and another is because of some weird requirements in our diagnostic text; it's only for development. So really we have three.

How's that for minimizing the # of VBs? Although I'm not sure that we couldn't have reduced it to two: one static, and one dynamic. But at a certain point you just say "good enough".

I'm not saying that it's an A-1 priority to reduce your VBs to an absolute minimum of two or three. It's not, and it can conceivably be a bad idea when you're overcommitting video memory and things have to be swapped in and out. We haven't shipped yet and we may find some hardware configurations where our current scheme doesn't work well. But so far so good, no problems.

Share on other sites
rileyriley    235
Thanks for the info. I'm going to try to process more of the docs and code a few test runs; I'll probably come back here for confirmation later.

Share on other sites
thedo    124
Donovan,

Out of interest, do you set the size of the VBs to the maximum allowed by the GFX card? Or do you have a defined maximum? How do you handle it if a scene requires more triangles than you can fit into one VB?

I'm trying to come up with a system with two VBs (one dynamic, one static) and it's all fine on my GFX card (GF4Ti, which supports lots of triangles, but others support a lot less). Just interested in how others handle this, really.
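
For what it's worth, one way to cope with a card that accepts fewer triangles per call is to split a big triangle list into several draws. This is a rough sketch of that idea only, not something anyone in this thread has described doing. `Batch` and `splitIntoBatches` are made-up names, and the limit is passed in as a plain number; in D3D9 it would presumably come from the device caps (`D3DCAPS9::MaxPrimitiveCount`).

```cpp
#include <cassert>
#include <vector>

// One draw-call-sized slice of a larger non-indexed triangle list.
struct Batch {
    unsigned startVertex;     // index of the first vertex of the slice
    unsigned primitiveCount;  // number of triangles in the slice
};

// Splits totalTriangles (3 vertices each) into batches of at most
// maxPrimitivesPerCall triangles, in order.
std::vector<Batch> splitIntoBatches(unsigned totalTriangles,
                                    unsigned maxPrimitivesPerCall) {
    assert(maxPrimitivesPerCall > 0);
    std::vector<Batch> batches;
    unsigned done = 0;  // triangles already assigned to a batch
    while (done < totalTriangles) {
        unsigned remaining = totalTriangles - done;
        unsigned count = remaining < maxPrimitivesPerCall
                             ? remaining
                             : maxPrimitivesPerCall;
        batches.push_back({done * 3, count});
        done += count;
    }
    return batches;
}
```

Each batch would then turn into one draw call against the same buffer, so the buffer count stays at two.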

Cheers

Neil

WHATCHA GONNA DO WHEN THE LARGEST ARMS IN THE WORLD RUN WILD ON YOU?!?!

Share on other sites
rileyriley    235
I'm looking at the Donuts4 demo in the DX SDK docs and noticing that they use a different mesh for each type of object, and call Render() from it for every instance of every type of object on the screen!

On the other hand, it runs very poorly ;o

Are meshes somehow different from VBs? Is there internal optimization that makes it a good idea to use them in this manner? Or is this the equivalent of making a new vertex buffer for every type of object?

Thanks again

Share on other sites
rileyriley    235
Also: say I load 2 models into a vertex buffer. Now I want to rotate the second one. If I:

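// Set the stream source to my vertex buffer, which holds BOTH models
d3dDevice->SetStreamSource(0, pVertexBuffer, 0, stride);
// Draw the first one
d3dDevice->DrawPrimitive(D3DPT_TRIANGLELIST, 0, numTrisInModel1);
// Now I want to change the world transform
// (assuming object2Transform is a D3DMATRIX/D3DXMATRIX value, so pass its address)
d3dDevice->SetTransform(D3DTS_WORLD, &object2Transform);
// And draw the second one, starting at the first vertex after model 1
// (a non-indexed triangle list uses 3 vertices per triangle)
d3dDevice->DrawPrimitive(D3DPT_TRIANGLELIST, 3 * numTrisInModel1, numTrisInModel2);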

So I'm using only one vertex buffer... but still require many calls to DrawPrimitive. Is that efficient? Or should I try to draw all of my models using the same VB *and* the same draw call?

Thanks
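
For the "several models in one buffer" setup, the main bookkeeping is remembering where each model starts. Here is a minimal CPU-side sketch of that, assuming non-indexed triangle lists. `Vertex`, `ModelRange` and `SharedVertexData` are hypothetical names, and the `std::vector` is only a stand-in for the contents you would copy into the real vertex buffer.

```cpp
#include <cassert>
#include <vector>

struct Vertex { float x, y, z; };

// Where one model lives inside the shared buffer.
struct ModelRange {
    unsigned startVertex;    // index of the model's first vertex
    unsigned triangleCount;  // number of triangles (3 vertices each)
};

// CPU-side stand-in for the data of a single shared vertex buffer.
class SharedVertexData {
public:
    // Appends a non-indexed triangle list and returns where it landed.
    ModelRange addModel(const std::vector<Vertex>& verts) {
        assert(verts.size() % 3 == 0);
        ModelRange range{static_cast<unsigned>(vertices_.size()),
                         static_cast<unsigned>(verts.size() / 3)};
        vertices_.insert(vertices_.end(), verts.begin(), verts.end());
        return range;
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }

private:
    std::vector<Vertex> vertices_;
};
```

Each returned range would then feed one draw, along the lines of DrawPrimitive(D3DPT_TRIANGLELIST, range.startVertex, range.triangleCount), with the world transform set just before it.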

Share on other sites
quote:
Out of interest, do you set the size of the VBs to the maximum allowed by the GFX card? Or do you have a defined maximum? How do you handle it if a scene requires more triangles than you can fit into one VB?

Well, I seem to recall doing a test and being able to create a static vertex buffer out of all remaining video memory. So, no, they're not the maximum allowed.

What I did was take the largest level, see how much it needed, and then add another 50% or so for good measure. Near the end of the project, we profiled the levels, put some of them on a vertex diet, and brought the ceiling closer in to free up memory. The plan was possibly to have each level specify its requirements, but that ended up being unnecessary.
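
The sizing rule in the paragraph above is simple arithmetic, sketched here under the assumption that you already know the vertex count of each level. `staticCapacity` and `headroomPercent` are illustrative names; the default of 50 mirrors the "another 50% or so" figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a vertex capacity for the static buffer: the largest level's
// vertex count plus headroomPercent extra on top.
std::size_t staticCapacity(const std::vector<std::size_t>& verticesPerLevel,
                           unsigned headroomPercent = 50) {
    assert(!verticesPerLevel.empty());
    std::size_t largest = *std::max_element(verticesPerLevel.begin(),
                                            verticesPerLevel.end());
    return largest + largest * headroomPercent / 100;
}
```

Bringing the ceiling in later, as described, would just mean calling this with a smaller headroom once the levels have been profiled.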

Dynamic is trickier of course. I tracked the maximum dynamic memory used in any frame at runtime and used that stat to come up with a reasonable ceiling. If that ceiling were ever hit, particles might disappear or characters would LOD down. This should almost never happen and in the circumstances where it would, the player is unlikely to notice because there would be so much going on on the screen.
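
A rough sketch of the dynamic side, as I understand the description: count the dynamic bytes requested each frame, remember the peak, and refuse requests past the ceiling so the caller can degrade. `DynamicBudget`, `tryReserve` and `peakBytes` are made-up names, and this version only records reservations that actually succeeded.

```cpp
#include <cassert>
#include <cstddef>

// Tracks how many bytes of dynamic vertex data each frame asks for.
class DynamicBudget {
public:
    explicit DynamicBudget(std::size_t ceilingBytes) : ceiling_(ceilingBytes) {}

    // Call once at the start of every frame.
    void beginFrame() { usedThisFrame_ = 0; }

    // Returns false when the request would exceed the ceiling, so the
    // caller can skip the effect (e.g. drop particles or pick a lower LOD).
    bool tryReserve(std::size_t bytes) {
        assert(usedThisFrame_ <= ceiling_);
        if (bytes > ceiling_ - usedThisFrame_) {
            return false;
        }
        usedThisFrame_ += bytes;
        if (usedThisFrame_ > peak_) {
            peak_ = usedThisFrame_;
        }
        return true;
    }

    // Largest per-frame total seen so far; useful for tuning the ceiling.
    std::size_t peakBytes() const { return peak_; }

private:
    std::size_t ceiling_;
    std::size_t usedThisFrame_ = 0;
    std::size_t peak_ = 0;
};
```

During development you would set the ceiling generously, play through the heaviest scenes, and then read the peak back to pick the shipping value.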

Share on other sites
quote:
Original post by rileyriley
Are meshes somehow different from VBs? Is there internal optimization that makes it a good idea to use them in this manner? Or is this the equivalent of making a new vertex buffer for every type of object?

Generally, meshes use vertex and index buffers internally, so they're theoretically equivalent to using your own. However, DrawSubset sets the required states (VBs, IBs, render states, etc.) every time it's called, so you get some state-setting redundancy. On PURE devices, that'll slow you down.

Anyway, ID3DXMesh offers excellent optimization functionality (optimizing for the adapter's cache size, generating adjacency, stuff like this) so you can always:
- Use a mesh interface for optimizing your buffers, then release it.
- Use meshes normally, and when you render them, just get the vertex and index buffers and do a DrawIndexedPrimitive.

Just keep in mind that meshes support indexed triangle lists *only*.
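
If you do pull the buffers out and draw them yourself, one common way to avoid the state-setting redundancy mentioned above is to remember what is already bound. This is a minimal sketch of that idea in plain C++; `StreamSourceCache` and `needsBind` are hypothetical names, and the `const void*` is just a stand-in for a vertex buffer pointer.

```cpp
#include <cassert>

// Hypothetical helper that remembers the last bound vertex buffer.
class StreamSourceCache {
public:
    // Returns true if the caller actually needs to issue the bind
    // (e.g. SetStreamSource), false if that buffer is already bound.
    bool needsBind(const void* buffer) {
        assert(buffer != nullptr);
        if (buffer == bound_) {
            ++skipped_;
            return false;
        }
        bound_ = buffer;
        return true;
    }

    // How many redundant binds were filtered out so far.
    unsigned skippedBinds() const { return skipped_; }

private:
    const void* bound_ = nullptr;
    unsigned skipped_ = 0;
};
```

The same pattern extends to index buffers, textures and render states, though how much it helps depends on how your draws are ordered.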

quote:
but still require many calls to DrawPrimitive. Is that efficient? Or should I try to draw all of my models using the same VB *and* the same draw call?

It's natural that you do a DIP per world-matrix (and per common-state, generally).
Don't worry too much about this at an early stage. Make a proper design (i.e. don't call DIP excessively per frame), and when it needs optimization (later), do it.

MHaggag's corner