Index buffers and batching question

Ok, so I've been doing lots of reading in these forums and on developer.nvidia.com about batching index buffers by texture and state type. I currently have a vertex buffer that holds all of the world's static geometry. My question is how exactly I should use the index buffer. Say, for example, half of my world used TextureA and the other half used TextureB. That is exactly two state changes. Neglecting visibility culling, I sort the world's geometry and place all the indices of triangles that use TextureA in the first half of an index buffer, and all the indices of triangles that use TextureB in the second half. Then I simply call DrawIndexedPrimitive twice, once for each half of the index buffer. So each frame I need to lock and write to the index buffer. I know doing this is slow; is the above a proper method? Is it best to have one huge-ass index buffer or an index buffer for each state change? What does everyone else use?
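A minimal sketch of the sorting step described above, assuming a hypothetical Triangle record that stores three vertex indices and a texture id (none of these names come from Direct3D). Because the world geometry is static, this could run once at load time rather than every frame; each resulting range maps to one DrawIndexedPrimitive call (StartIndex = startIndex, PrimitiveCount = primitiveCount).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical triangle record: three vertex indices plus the texture it uses.
struct Triangle {
    std::uint16_t v[3];
    int texture;  // e.g. 0 = TextureA, 1 = TextureB
};

// One contiguous run of the index buffer that shares a single texture.
struct Range {
    int texture;
    unsigned startIndex;      // first index of the run in the buffer
    unsigned primitiveCount;  // number of triangles in the run
};

// Sorts triangles by texture and flattens them into one index array,
// recording where each texture's run starts and how many triangles it holds.
std::vector<Range> buildSortedIndices(std::vector<Triangle> tris,
                                      std::vector<std::uint16_t>& indices) {
    std::stable_sort(tris.begin(), tris.end(),
                     [](const Triangle& a, const Triangle& b) {
                         return a.texture < b.texture;
                     });
    indices.clear();
    std::vector<Range> ranges;
    for (const Triangle& t : tris) {
        if (ranges.empty() || ranges.back().texture != t.texture) {
            // New texture: start a new run at the current end of the buffer.
            ranges.push_back({t.texture, static_cast<unsigned>(indices.size()), 0u});
        }
        indices.insert(indices.end(), {t.v[0], t.v[1], t.v[2]});
        ++ranges.back().primitiveCount;
    }
    assert(indices.size() == 3 * tris.size());
    return ranges;
}
```

Using stable_sort keeps triangles in their original relative order within each texture, so any ordering already chosen for the vertex cache is disturbed as little as possible.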

You could have a single index buffer for the whole mesh; the DrawIndexedPrimitive() function allows you to specify which index to start with. Pseudocode:


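// Created once, at load time (hypothetical helpers).
CreateVertexBuffer();
CreateIndexBuffer();

pDevice->BeginScene();

// The same vertices & indices will be used for both passes (Direct3D 9 signatures;
// Vertex and VertexFVF stand for your own vertex struct and its FVF code).
pDevice->SetStreamSource(0, VertexBuffer, 0, sizeof(Vertex));
pDevice->SetIndices(IndexBuffer);
pDevice->SetFVF(VertexFVF);

// First set up the first set's render states (e.g. its texture).
SetupFirstHalfStates();

// Draw the first range, starting at index 0.
// MinIndex = 0 and NumVertices = the whole vertex buffer is a safe, conservative range.
pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, NumVertices,
                              0, NumTrianglesInFirstSet);

// Then set up the second set's render states.
SetupSecondHalfStates();

// Draw the second range, starting where the first one ended.
pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, NumVertices,
                              SecondSetStartIndex, NumTrianglesInSecondSet);

pDevice->EndScene();
pDevice->Present(NULL, NULL, NULL, NULL);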

I have recently started a project called MYOX in which I had to solve this problem too. The absolute best way that I have discovered is this:

#1) Keep ALL your framework (hard vertex) data in 2 buffers: one static vertex buffer (FVF = D3DFVF_XYZ|D3DFVF_NORMAL|D3DFVF_TEX1, 32 bytes = optimal) and one static index buffer
#2) NEVER LOCK THE BUFFERS after they are full of data. EVER!!!
#3) Designate 'index chunks' that have 4 data members, one for each call-specific parameter of DIP (DrawIndexedPrimitive); ignore BaseVertexIndex. There is one chunk per object.
#4) Throw objects (with one texture each; you never need more) that reference the index chunks into a big array, which is kept sorted by texture and rendered according to each object's type. Many types can share one chunk.
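A rough sketch of steps #3 and #4, under stated assumptions: IndexChunk, RenderObject and DrawStats are hypothetical names, textures are referred to by their position in a plain array, and the device calls are replaced by counters so the effect of sorting is visible without Direct3D. The chunk fields mirror the last four DrawIndexedPrimitive parameters in Direct3D 9.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical index chunk: the four per-call DrawIndexedPrimitive
// parameters (BaseVertexIndex is ignored, as in the post).
struct IndexChunk {
    unsigned minIndex;
    unsigned numVertices;
    unsigned startIndex;
    unsigned primCount;
};

// Hypothetical render object: one texture (index into a texture array)
// and a pointer to a chunk that several objects may share.
struct RenderObject {
    int texture;
    const IndexChunk* chunk;
};

// Stand-in for the device: counts texture switches and draw calls.
struct DrawStats {
    int textureChanges = 0;
    int drawCalls = 0;
};

// Keep the object array sorted by texture so identical textures are adjacent.
void sortByTexture(std::vector<RenderObject>& objects) {
    std::sort(objects.begin(), objects.end(),
              [](const RenderObject& a, const RenderObject& b) {
                  return a.texture < b.texture;
              });
}

DrawStats renderAll(const std::vector<RenderObject>& objects) {
    DrawStats stats;
    int current = -1;  // no texture bound yet
    for (const RenderObject& obj : objects) {
        assert(obj.chunk != nullptr);
        if (obj.texture != current) {
            // Real code: pDevice->SetTexture(0, textures[obj.texture]);
            current = obj.texture;
            ++stats.textureChanges;
        }
        // Real code: pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0,
        //     chunk->minIndex, chunk->numVertices, chunk->startIndex, chunk->primCount);
        ++stats.drawCalls;
    }
    return stats;
}
```

With four objects alternating between two textures, the unsorted order would bind a texture four times; after sortByTexture it binds each texture once, while the draw-call count stays the same.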

Now that you have a very fast and expandable system, you can start doing more fun things like dynamic objects, actors and stuff. I don't want to divulge too much but there are a few pointers I would like to make...

: Need a scrolling texture for water? Don't lock that vertex buffer! Just use TCI (look it up in the DX9 tutorial on textures).
: It is possible, using this system, to completely avoid ANY calls to a virtual function.
: Using a big array of textures is awesome. You don't need a fancy manager or what-not...
: Avoid making complex classes to expose minimal functionality, or functionality that you *think* you or someone else might need. DON'T include something because "someone else may find it helpful." Chances are, after you've finished this project, you'll rewrite your next one from scratch anyway.
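One way to scroll a texture without touching the vertex buffer is a per-stage texture transform matrix; this is my reading of the water tip, and it may not be exactly what "TCI" refers to. The sketch uses a plain hypothetical Mat4 rather than D3DMATRIX. The layout is an assumption worth checking against the SDK docs: with D3DTSS_TEXTURETRANSFORMFLAGS set to D3DTTFF_COUNT2, 2D coordinates are, as I understand it, multiplied as the row vector (u, v, 1), so the scroll goes in _31/_32 rather than the usual _41/_42 translation slots.

```cpp
#include <cassert>
#include <cmath>

// Plain row-major 4x4 matrix (stand-in for D3DMATRIX's _11.._44 layout).
struct Mat4 { float m[4][4]; };

Mat4 identity() {
    Mat4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Offset after `seconds` at `speed` texture widths per second, wrapped into
// [0, 1) so the value stays small and precise however long the game runs.
float scrollOffset(float speed, float seconds) {
    float o = std::fmod(speed * seconds, 1.0f);
    return o < 0.0f ? o + 1.0f : o;
}

// Texture matrix for scrolling 2D coordinates. Assumption: (u, v) is treated
// as the row vector (u, v, 1), so the translation lives in row 3 (_31, _32).
Mat4 scrollMatrix(float du, float dv) {
    assert(du >= 0.0f && du < 1.0f && dv >= 0.0f && dv < 1.0f);
    Mat4 r = identity();
    r.m[2][0] = du;
    r.m[2][1] = dv;
    return r;
}

// What the pipeline would do to one coordinate pair under that assumption.
void transformUV(const Mat4& t, float u, float v, float& outU, float& outV) {
    outU = u * t.m[0][0] + v * t.m[1][0] + t.m[2][0];
    outV = u * t.m[0][1] + v * t.m[1][1] + t.m[2][1];
}
```

Each frame the result would be copied into a D3DMATRIX and passed to SetTransform(D3DTS_TEXTURE0, ...); the vertex buffer itself never changes.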


Okay, so I've strayed a lot from the original question, but I'd like to point out one more thing: even for bone-animated objects, you will not need to lock the buffers or use more than one texture if you create the geometry correctly; the same goes for multitextured polygons. Multitexturing can be done by simply rendering two objects with different alpha-overlay textures in the same location. Neat, eh?
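What the second, alpha-overlay object does to the pixels already drawn can be worked out with the standard alpha-blend equation. This assumes the overlay is drawn with D3DRS_SRCBLEND = D3DBLEND_SRCALPHA and D3DRS_DESTBLEND = D3DBLEND_INVSRCALPHA (an assumed setup; the post doesn't name blend states). A CPU-side sketch with a hypothetical Color type:

```cpp
#include <cassert>

struct Color { float r, g, b, a; };

// Framebuffer result of drawing `src` over `dst` with
// SRCBLEND = SRCALPHA and DESTBLEND = INVSRCALPHA:
// out = src * srcAlpha + dst * (1 - srcAlpha), applied to every channel.
Color blendOver(const Color& src, const Color& dst) {
    float a = src.a;
    assert(a >= 0.0f && a <= 1.0f);
    return {src.r * a + dst.r * (1.0f - a),
            src.g * a + dst.g * (1.0f - a),
            src.b * a + dst.b * (1.0f - a),
            src.a * a + dst.a * (1.0f - a)};
}
```

Where the overlay texture's alpha is 0 the base object shows through untouched, and where it is 1 the overlay replaces it, which is what lets two single-texture objects stand in for one multitextured one.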

[edited by - Karl G on May 8, 2004 5:07:11 PM]

That makes total sense. Thanks for posting, both of you =)

So what happens when rendering a BSP tree? Surely you're going to be doing some sort of visibility occlusion tests so you don't have to send as much data to the video card each frame.

For example: each frame you do a view frustum cull, gather the faces that are in the view frustum, then build the index buffer from only what's visible. At minimum, that would require a lock and a write to an index buffer each frame, then?

Is this a correct assumption?

Anyone else have any other thoughts they'd like to chime in with?

Others may argue, but building a perfect view set each frame just isn't necessary anymore. Nearly all video cards are smart enough to throw out stuff that is outside of the screen anyway, so doing that yourself is a bit fruitless. I'm not saying that you shouldn't do gross frustum checks to see if entire batches of 1000s of polygons are out of view (that you should DEFINITELY do), but don't rebuild an index buffer on a per-polygon basis. No part of your pipeline should ever be per-polygon: that defeats the purpose of the speed increase, because it requires buffer locks.

What you *could* do, on the other hand, is divide your terrain (or whatever) into patches of 16x16 or maybe 8x8 quads (pairs of 2 triangles that form a square) and do culling on those patches, in which case you simply don't draw an entire set of 512 (or 128) polygons.
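A sketch of the patch approach, assuming flat terrain, hypothetical Patch/Plane/Sphere types, and frustum planes whose normals point inward and are unit length. Each patch keeps a fixed range in the static index buffer, so culling only decides which ranges to draw; nothing is locked.

```cpp
#include <cassert>
#include <vector>

struct Plane { float a, b, c, d; };  // a*x + b*y + c*z + d >= 0 means "inside"
struct Sphere { float x, y, z, radius; };

// Hypothetical patch record: bounding sphere plus its fixed index range.
struct Patch {
    Sphere bounds;
    unsigned startIndex;
    unsigned primCount;  // 16x16 quads -> 512 triangles
};

// Lays out a patchesX x patchesZ grid of flat patches, each owning a
// consecutive block of the (static) index buffer.
std::vector<Patch> makeFlatPatchGrid(int patchesX, int patchesZ,
                                     float patchSize, unsigned quadsPerSide) {
    assert(patchesX > 0 && patchesZ > 0);
    std::vector<Patch> grid;
    unsigned tris = quadsPerSide * quadsPerSide * 2;  // 2 triangles per quad
    float radius = patchSize * 0.70710678f;           // half the patch diagonal
    for (int z = 0; z < patchesZ; ++z)
        for (int x = 0; x < patchesX; ++x) {
            unsigned i = static_cast<unsigned>(z * patchesX + x);
            grid.push_back({{(x + 0.5f) * patchSize, 0.0f, (z + 0.5f) * patchSize, radius},
                            i * tris * 3, tris});
        }
    return grid;
}

// Conservative test: reject only when the whole sphere is behind some plane.
bool sphereVisible(const Sphere& s, const std::vector<Plane>& frustum) {
    for (const Plane& p : frustum)
        if (p.a * s.x + p.b * s.y + p.c * s.z + p.d < -s.radius) return false;
    return true;
}

// Collects the patches to draw this frame; the index buffer is untouched.
std::vector<const Patch*> visiblePatches(const std::vector<Patch>& patches,
                                         const std::vector<Plane>& frustum) {
    std::vector<const Patch*> out;
    for (const Patch& patch : patches)
        if (sphereVisible(patch.bounds, frustum)) out.push_back(&patch);
    return out;
}
```

Each visible patch then becomes one DrawIndexedPrimitive call over its startIndex/primCount range; patches that straddle the frustum edge are simply drawn whole and left to the card's clipping.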

You don't stall just because you lock a buffer. Consider particle systems: you generate the position of each particle every frame.

If you properly use NOOVERWRITE on a DYNAMIC vertex buffer, and DISCARD when you get to the end of it, the driver can double/triple/etc.-buffer the data under the hood, and you're not stalling.
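The NOOVERWRITE/DISCARD pattern boils down to a ring allocator. This sketch only tracks offsets and which flag each lock would use; DynamicRing, Allocation and LockMode are hypothetical names, not Direct3D types.

```cpp
#include <cassert>
#include <cstddef>

// Which of the two lock flags a given lock would use (hypothetical enum).
enum class LockMode { NoOverwrite, Discard };

struct Allocation {
    std::size_t offset;  // byte offset to lock at
    LockMode mode;
};

// Hands out space in a dynamic vertex buffer. Appending uses NOOVERWRITE
// (a promise not to touch data the GPU may still be reading); when the next
// batch doesn't fit, wrap to the start with DISCARD so the driver can hand
// back fresh memory instead of waiting for the GPU to finish.
class DynamicRing {
public:
    explicit DynamicRing(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    Allocation allocate(std::size_t bytes) {
        assert(bytes <= capacity_);
        if (cursor_ + bytes > capacity_) {
            cursor_ = bytes;  // wrap: this batch now occupies [0, bytes)
            return {0, LockMode::Discard};
        }
        // The very first lock of a fresh buffer also uses DISCARD.
        Allocation a{cursor_, cursor_ == 0 ? LockMode::Discard : LockMode::NoOverwrite};
        cursor_ += bytes;
        return a;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

In Direct3D 9 each Allocation would become a Lock(offset, bytes, &ptr, D3DLOCK_NOOVERWRITE or D3DLOCK_DISCARD) on a buffer created with D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, followed by Unlock() and a draw call for that range.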

"hplus0603" is right, actually... I exaggerated a tad about never locking the buffer. It's really hard to get a dynamic buffer to work out well for any kind of starting game, though: if you look at the PointSprites demo, you'll notice that it runs fairly slowly (~180 FPS in a standard window on my P4 2.4 GHz hyperthreaded / 512 MB RAM / GeForce 4). This is due to the enormous amount of memory that has to be pushed each frame to do particle effects, which requires a dynamic buffer.

I was emphasizing the fact that it is possible to create a good, workable engine without needing to lock the buffer. Think about it: for just about anything in a 3D program, you want to use static buffers, because copying data is costly, ESPECIALLY each frame. Let's say you have 1000 diffuse-colored point sprites (12 + 4 = 16 bytes each) that you build each frame. 16 x 1000 = 16 KB per frame, x 60 frames per second = almost 1 MB per second of pure copying, which must then be sent over to the graphics card to render.

That said, if there's no way around locking (i.e. you can't use a vertex shader to get the same effect, which is highly doubtful), then make sure it's a dynamic buffer.
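Spelled out with units, the point-sprite estimate in this post (12 bytes of position plus 4 bytes of diffuse color per sprite) is a per-frame figure first, then a per-second one:

```latex
\begin{aligned}
16~\text{B/sprite} \times 1000~\text{sprites} &= 16{,}000~\text{B per frame} \approx 16~\text{KB per frame} \\
16{,}000~\text{B/frame} \times 60~\text{frames/s} &= 960{,}000~\text{B/s} \approx 0.96~\text{MB/s}
\end{aligned}
```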


(Edit: changed some wording)

[edited by - Karl G on May 9, 2004 5:45:51 PM]

quote:

16x1000 = 16KBps x 60 frames per second = almost a MB of just pure coping per second, which must then be sent over to the graphics card to render.



Er, you've got the right idea, but that's a bad analogy. The frame rate is going to depend on the amount of data you transfer, not vice versa. Also, the AGP bus can handle plenty of data, so...

But that's beside the point. Locking a vertex buffer is a very, very slow process, which should be avoided as much as possible.

Also, re: particles.

Locking the buffer is one option, but you could also try simply giving each particle its own matrix, though that might be slower than the lock anyway.

quote:
Original post by Etnu
Also, re: particles.

Locking the buffer is one option, but you could also try simply giving each particle it's own matrix too, though that might be slower than the lock is anyway.



That's left me a little confused. Isn't it better to batch all the particles in a particle system into one vertex buffer, since you then avoid multiple draw primitive calls?

quote:

Is it not better to batch all particles in a particle system into one vertex buffer because you then avoid multiple draw primative calls?



That depends on whether it's faster to lock the buffer or to call DrawPrimitive a few more times. You'd have to do some performance testing to determine that.
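A minimal CPU-side timing harness for that kind of test, assuming each strategy (for example, lock plus one batched draw versus one SetTransform plus draw per particle) can be wrapped in a callable that renders one complete frame; averageMs is a hypothetical name. Since the GPU runs asynchronously, timing only the submission calls can be misleading, so the callable should include Present and the average should be taken over many frames.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `frame` `frames` times and returns the average milliseconds per call.
// `frame` stands in for one full frame of either strategy; plug real
// rendering code in, including Present.
double averageMs(const std::function<void()>& frame, int frames) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    auto start = Clock::now();
    for (int i = 0; i < frames; ++i) frame();
    std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frames;
}
```

Comparing averageMs(lockAndBatch, 1000) against averageMs(perParticleDraws, 1000), with both callables supplied by your own code, on the target hardware gives the answer for that particle count; the crossover point is likely to move with particle count and driver.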
