Triangle Batching Question

Started by
8 comments, last by matches81 18 years ago
We've recently run into some performance issues with the terrain drawing in our game. Currently, our test terrain contains 800 polygons, created by a grid that draws two triangles per grid square. The grid is 21 by 21 vertices (400 squares x 2 triangles per square is 800 polygons). We draw sections of the map by creating individual vertex buffers for a number of "Regions". Each region has a square of the map that it draws via an individual "DrawIndexedPrimitive()" call. The number of regions affects the number of calls made to "DrawIndexedPrimitive()", but the total number of triangles drawn is always the same.

Now here's the trouble. When we have 4x4 "Regions" (16 total), our FPS is around 120-150. When we have 10x10 regions (100 total), our FPS is around 30. Again, the number of polygons drawn in either case is 800. We've tried turning off all of the "SetTextureStageState()" and other texture calls, as well as materials and lighting. The only call whose removal makes things a lot faster is the call to "DrawIndexedPrimitive()".

Clearly something pretty inefficient happens when we use 10x10 regions. 800 triangles is by no means too much for our video hardware, but I'm trying to figure out where to start on improving performance. One thing I'd thought of is that instead of drawing small batches per "Region", we could just draw them all at once with a single "DrawIndexedPrimitive()" call. But is 100 calls really that much overhead? It seems like most games, with so many models and different textures to set, would need to do far more than 100 calls per frame.

The other thing I'd thought of is that I've heard "DrawIndexedPrimitive()" is not as efficient as using "DrawPrimitive()" with a triangle strip or triangle list. I would have to repeat vertices, but maybe it would be worth the extra memory.

Do you guys have any suggestions? Are there any other questions about our app that would give you a better idea of what is going on?
I greatly appreciate any help. jujumbura
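For reference, the arithmetic in the post can be checked with a small sketch. This is plain C++ with no Direct3D calls, and the function name and the row-by-row vertex layout are assumptions made for illustration, not details taken from the poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index array for a square grid of
// vertsPerSide x vertsPerSide vertices stored row by row.
// (Hypothetical layout; the original post does not show its own.)
std::vector<std::uint16_t> BuildGridIndices(int vertsPerSide)
{
    // 16-bit indices can address at most 65536 vertices.
    assert(vertsPerSide >= 2 && vertsPerSide * vertsPerSide <= 65536);

    const int squares = vertsPerSide - 1;
    std::vector<std::uint16_t> indices;
    indices.reserve(static_cast<std::size_t>(squares) * squares * 6);

    for (int z = 0; z < squares; ++z) {
        for (int x = 0; x < squares; ++x) {
            const auto topLeft     = static_cast<std::uint16_t>(z * vertsPerSide + x);
            const auto topRight    = static_cast<std::uint16_t>(topLeft + 1);
            const auto bottomLeft  = static_cast<std::uint16_t>(topLeft + vertsPerSide);
            const auto bottomRight = static_cast<std::uint16_t>(bottomLeft + 1);

            // Two triangles per grid square, six indices in total.
            indices.insert(indices.end(), {topLeft, bottomLeft, topRight,
                                           topRight, bottomLeft, bottomRight});
        }
    }
    return indices;
}
```

For a 21 by 21 grid this yields 2400 indices, i.e. 800 triangles. Split into 4x4 regions that is 50 triangles per draw call; split into 10x10 regions it is only 8 triangles per draw call.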
Yes, call overhead is very costly. Each call costs at least as much as drawing hundreds of polygons. So drawing your 800 polys at once should be faster than most other ways of drawing them.

Changing vertex buffers also costs. I'd suggest putting all your geometry into one buffer. You can easily choose to draw just a part of it if needed.

Also, DrawIndexedPrimitive is the *most efficient* way to draw. It can achieve considerably better results than non-indexed triangle strips, since indexed drawing is the only method that can take advantage of the post-transform vertex cache. However, you have to order the triangles so that shared vertices are reused soon enough to still be in the cache. (ID3DXMesh::Optimize can do this for you.)
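To make the cache argument concrete, here is a rough sketch that counts how many vertices would need transforming for a given index order. It models the post-transform cache as a simple FIFO, which is an assumption: real hardware caches vary in size and replacement policy, and the function name is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts vertex transforms (cache misses) for an indexed draw, assuming a
// FIFO post-transform cache holding `cacheSize` vertices. The FIFO model
// is a simplification, not a description of any particular GPU.
std::size_t CountCacheMisses(const std::vector<std::uint16_t>& indices,
                             std::size_t cacheSize)
{
    assert(cacheSize > 0);

    std::deque<std::uint16_t> cache;
    std::size_t misses = 0;

    for (std::uint16_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue; // Hit: the transformed vertex is reused.
        }
        ++misses;                // Miss: the vertex must be transformed.
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front();   // Evict the oldest entry.
        }
    }
    return misses;
}
```

For one quad drawn as two indexed triangles (0,1,2 and 2,1,3), this counts 4 transforms with a cache large enough to hold all four vertices, versus the 6 that a non-indexed triangle list would always need. With a cache of a single entry the count rises to 5, which is why index order matters.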
Hmm. So that makes sense, but if I try to use a single VB as much as possible, I'm confused about how I achieve certain effects. How would I then cut up my map into partial regions for things like frustum clipping, changing textures, or translation, if I draw them all with one DrawPrimitive call?
Create a static index buffer for each region and one dynamic index buffer for the vertex buffer. Each frame, clear the dynamic index buffer. For each visible region, append the region's indices to the large index buffer. Then call DrawIndexedPrimitive once with the large index buffer, instead of once per region index buffer. This is the method I use, and it seems to be pretty fast.
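A minimal sketch of the merge step described above. To keep it runnable without a Direct3D device, the per-region indices live in plain system-memory arrays rather than static index buffers, which is a deliberate substitution; the Region struct and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-region data: indices into the one shared vertex buffer.
struct Region {
    std::vector<std::uint16_t> indices; // Triangle list, 3 indices per triangle.
    bool visible = false;               // Set by whatever culling is in use.
};

// Rebuilds `merged` from the indices of all visible regions and returns
// the number of triangles to draw with a single call.
std::size_t MergeVisibleIndices(const std::vector<Region>& regions,
                                std::vector<std::uint16_t>& merged)
{
    merged.clear(); // Keeps capacity, so steady-state frames do not reallocate.
    for (const Region& region : regions) {
        if (!region.visible) {
            continue;
        }
        assert(region.indices.size() % 3 == 0);
        merged.insert(merged.end(), region.indices.begin(), region.indices.end());
    }
    return merged.size() / 3;
}
```

In a Direct3D 9 renderer, the contents of `merged` would then be copied into the dynamic index buffer between Lock() and Unlock(), and the returned triangle count passed as the primitive count of the single DrawIndexedPrimitive() call.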
So you're suggesting that after I determine which portions of the map I wish to draw, I then compile all of their little index buffers into one big one? That's an interesting idea. But don't I need to Lock() all of the little static index buffers in order to extract their data? Why not just maintain an equivalent of what WOULD be in the index buffers in system memory? Then you could avoid the locking process for each of the little buffers (which I thought was relatively slow). I guess maybe it comes down to whether you have more RAM available on your system or on your card...

Also, could anyone remind me which flag on the CreateVertexBuffer()/CreateIndexBuffer() call lets me choose dynamic vs. static?

Thanks for all the info,

jujumbura
Few ideas and comments:

- Why not create a quadtree to minimize the number of blocks and create a level of texture?

- 1 dynamic vertex buffer + 1 dynamic index buffer: store your terrain data in system memory and you can stream plenty of lightweight terrain data from memory. Recreate the system-memory index/vertex data when the LOD changes. A tight 8 bytes per vertex might do it. Pros: very few VB/IB changes. Cons: streaming eats memory/bus bandwidth.

- For LOD, a few extra vertices sent over the bus aren't bad; you can adapt a block to its neighbouring blocks by changing the indices (instead of tweaking vertex positions).

- You can use indexed triangle strips with DrawIndexedPrimitive to benefit from the caches.
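As an illustration of the "tight 8 bytes per vertex" idea from the list above, here is one possible layout. The field choice is purely an assumption (the post does not say what the 8 bytes contain): four signed 16-bit values, which in Direct3D 9 would correspond to a single D3DDECLTYPE_SHORT4 vertex element that a vertex shader would have to rescale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-byte terrain vertex made of four signed 16-bit values.
struct PackedTerrainVertex {
    std::int16_t x;      // Grid column.
    std::int16_t height; // Height quantized to [0, 32767].
    std::int16_t z;      // Grid row.
    std::int16_t extra;  // Unused here; room for e.g. a blend weight.
};
static_assert(sizeof(PackedTerrainVertex) == 8, "vertex must stay 8 bytes");

// Quantizes a height in [minHeight, maxHeight] into the 16-bit field.
PackedTerrainVertex PackVertex(int gridX, int gridZ, float height,
                               float minHeight, float maxHeight)
{
    assert(maxHeight > minHeight);
    assert(height >= minHeight && height <= maxHeight);

    const float t = (height - minHeight) / (maxHeight - minHeight);
    PackedTerrainVertex v;
    v.x = static_cast<std::int16_t>(gridX);
    v.height = static_cast<std::int16_t>(t * 32767.0f + 0.5f); // Round to nearest.
    v.z = static_cast<std::int16_t>(gridZ);
    v.extra = 0;
    return v;
}

// Recovers an approximate height; this is what a shader would do per vertex.
float UnpackHeight(const PackedTerrainVertex& v, float minHeight, float maxHeight)
{
    return minHeight
         + (static_cast<float>(v.height) / 32767.0f) * (maxHeight - minHeight);
}
```

The trade-off is precision: heights are snapped to 32768 steps across the terrain's height range, which is usually fine for streaming but worth checking against the actual range in use.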
Quote:Original post by jujumbura
Hmm. So that makes sense, but if I try to use a single VB as much as possible, I'm confused about how I achieve certain effects. How would I then cut up my map into partial regions for things like frustum clipping, changing textures, or translation, if I draw them all with one DrawPrimitive call?


Will you be staying with 20x20 or going higher? For 20x20 let the card do the frustum culling. Transforming these vertices is nothing for a modern card, and the work you do trying to optimise this will likely end up taking more time. If you're going higher, you can break up into pretty large chunks (like 20x20) and cull them.

I don't understand what you mean by translation. The terrain isn't one block that moves together?

As for changing textures, I'm not sure what you're trying to do, exactly. If it's a relatively rare thing, just lock the relevant part and update it.
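For larger terrains, culling whole chunks as suggested above could look roughly like the following sketch. It tests a chunk's axis-aligned bounding box against six frustum planes; the types, names, and the convention that plane normals point into the frustum are all assumptions made for this example, and extracting the planes from the view-projection matrix is left out.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane dot(normal, p) + d = 0, with the normal pointing into the frustum.
struct Plane { Vec3 normal; float d; };

// Axis-aligned bounding box of one terrain chunk.
struct Aabb { Vec3 minCorner, maxCorner; };

// Conservative visibility test: returns false only when the box lies
// entirely behind at least one frustum plane.
bool IsChunkVisible(const Aabb& box, const std::array<Plane, 6>& frustum)
{
    assert(box.minCorner.x <= box.maxCorner.x &&
           box.minCorner.y <= box.maxCorner.y &&
           box.minCorner.z <= box.maxCorner.z);

    for (const Plane& plane : frustum) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 corner = {
            plane.normal.x >= 0.0f ? box.maxCorner.x : box.minCorner.x,
            plane.normal.y >= 0.0f ? box.maxCorner.y : box.minCorner.y,
            plane.normal.z >= 0.0f ? box.maxCorner.z : box.minCorner.z };

        const float distance = plane.normal.x * corner.x
                             + plane.normal.y * corner.y
                             + plane.normal.z * corner.z
                             + plane.d;
        if (distance < 0.0f) {
            return false; // Even the farthest corner is outside this plane.
        }
    }
    return true;
}
```

With chunks of around 20x20 squares, this is six plane tests per chunk per frame on the CPU, which is cheap compared with the draw call it can save.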
If you are benchmarking on NVIDIA hardware, try their performance tools. Without them, you won't know which part of the pipeline, from CPU to GPU, causes the problem.

Maybe the problem isn't state changes but drawing too many polygons. Or you may even be stalling between draw calls: for example, when you call Draw...(), data may be being sent from system memory to GPU memory, stalling your main thread.
The poorest programmer in the game industry!!
Quote:Original post by ET3D
Quote:Original post by jujumbura
Hmm. So that makes sense, but if I try to use a single VB as much as possible, I'm confused about how I achieve certain effects. How would I then cut up my map into partial regions for things like frustum clipping, changing textures, or translation, if I draw them all with one DrawPrimitive call?


Will you be staying with 20x20 or going higher? For 20x20 let the card do the frustum culling. Transforming these vertices is nothing for a modern card, and the work you do trying to optimise this will likely end up taking more time. If you're going higher, you can break up into pretty large chunks (like 20x20) and cull them.

I don't understand what you mean by translation. The terrain isn't one block that moves together?

As for changing textures, I'm not sure what you're trying to do, exactly. If it's a relatively rare thing, just lock the relevant part and update it.



Definitely higher than 20 by 20. But I didn't know what a good size for a terrain patch was, so I was cutting up the 21 by 21 grid into 6 by 6 regions. From what I've heard here, it sounds like that's really just a waste. Plus I need to try using a single vertex buffer, like people have suggested.

My question about translation wasn't really relevant to terrain; I was just wondering how it factors into things like models. For terrain I can see how you could use index buffers and a single DrawIndexedPrimitive() call to draw everything you want in its place, but when it comes down to drawing lots of models at lots of different locations at the same time, I'm wondering how you cut down on the Draw() calls.
If you can (for texturing reasons), try using bigger regions. A modern graphics card shouldn't have any difficulty transforming your whole terrain (with 800 tris) every frame. My guess is that drawing the whole terrain in one DrawIndexedPrimitive() call will be far more efficient than doing 100 DrawIndexedPrimitive() calls with tiny batches.
So, if you don't have a reason that keeps you from putting all the terrain together in one batch, try that. If your terrain gets too large to do in one batch, or gets more triangles, I would suggest doing the static vertex buffer / dynamic index buffer thing. For that I would just store the indices for a region as a normal short or int array and copy that into the index buffer used to render the terrain. This could be done with one Lock-Write-Unlock operation per frame and should be faster than rendering every region with a distinct draw call.
