Terrain rendering, should I fill an index buffer each frame?

Started by
11 comments, last by matches81 18 years, 10 months ago
I have a quadtree that runs to find what needs to be rendered. I also have a vertex buffer containing every vertex. My problem is that before rendering, I have to fill the index buffer with what is being rendered. Is this a bad idea? If so, what's a way around it? Thanks.
--X
xsirxx,

Generally what happens is that each "type" of object that is to be rendered has a vertex buffer and an associated index buffer. The index buffer is only used to reduce the number of vertices, since indices require much less memory than vertices. As such you only need to fill the vertex and index buffers once, unless you're doing some magic that requires changing indices or vertex data in your buffers.

Once the object has an index buffer and a vertex buffer, then before you draw the object, set the vertex buffer and index buffer on the device. You can do that with the following calls:

IDirect3DDevice9::
HRESULT SetIndices( IDirect3DIndexBuffer9 *pIndexData );
HRESULT SetStreamSource( UINT StreamNumber,
IDirect3DVertexBuffer9 *pStreamData,
UINT OffsetInBytes, UINT Stride );

As you can see, you don't need to fill the index buffer each frame before you draw; you only need to select *which* index buffer to draw with.
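As a rough illustration of the "fill once" idea for a heightfield, the indices can be generated a single time at load and then copied into a static index buffer. This is only a sketch: `buildGridIndices` is a hypothetical helper, it assumes a square grid of vertices stored row by row, a triangle list, and 16-bit indices, and the winding order would need to match whatever cull mode is in use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build triangle-list indices for a vertsPerSide x vertsPerSide heightfield
// whose vertices are stored row by row. Two triangles per grid cell.
// (Hypothetical helper; layout and winding are assumptions.)
std::vector<std::uint16_t> buildGridIndices(unsigned vertsPerSide)
{
    assert(vertsPerSide >= 2);
    assert(vertsPerSide * vertsPerSide <= 65536u); // must fit in 16-bit indices

    const unsigned cells = vertsPerSide - 1;
    std::vector<std::uint16_t> indices;
    indices.reserve(static_cast<std::size_t>(cells) * cells * 6);

    for (unsigned z = 0; z < cells; ++z) {
        for (unsigned x = 0; x < cells; ++x) {
            const auto topLeft     = static_cast<std::uint16_t>(z * vertsPerSide + x);
            const auto topRight    = static_cast<std::uint16_t>(topLeft + 1);
            const auto bottomLeft  = static_cast<std::uint16_t>(topLeft + vertsPerSide);
            const auto bottomRight = static_cast<std::uint16_t>(bottomLeft + 1);

            // First triangle of the cell.
            indices.push_back(topLeft);
            indices.push_back(bottomLeft);
            indices.push_back(topRight);
            // Second triangle of the cell.
            indices.push_back(topRight);
            indices.push_back(bottomLeft);
            indices.push_back(bottomRight);
        }
    }
    return indices;
}
```

The resulting vector would be copied into the index buffer once at startup; after that, each frame only needs SetIndices and the draw call.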
Jeromy Walsh
That makes sense, but what if I have a quadtree, and that quadtree culls out most of the verts before render time? I also have what I call "Land", which contains TONS of triangles that range across the entire map. The quadtree cuts it down. Once it's cut down, I can't see a way around filling an index buffer. I could obviously create an index buffer for every leaf. But should I, even if each buffer contains only 25 triangles?

I temporarily removed the quadtree (I only ran it once to fill the buffer, then stopped), and the fps didn't rise at all.

[Edited by - xsirxx on June 8, 2005 4:46:28 AM]
--X
Quote:Original post by xsirxx
That makes sense, but what if I have a quadtree, and that quadtree culls out most of the verts before render time? I also have what I call "Land", which contains TONS of triangles that range across the entire map. The quadtree cuts it down. Once it's cut down, I can't see a way around filling an index buffer. I could obviously create an index buffer for every leaf. But should I, even if each buffer contains only 25 triangles?

I temporarily removed the quadtree (I only ran it once to fill the buffer, then stopped), and the fps didn't rise at all.


You should create an index buffer for each leaf of the quadtree, but try to ensure that each one contains approximately 1500 primitives. A leaf is skipped only when all of the triangles within that patch are outside the frustum, as determined by the quadtree.

This might seem wasteful, but it is actually more efficient than culling at the polygon level, as long as each index buffer represents approximately 1500 primitives.
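A minimal sketch of per-leaf culling, under some simplifying assumptions: each leaf owns a fixed range of indices that was written once at load, a 2D view rectangle stands in for a real frustum test, and a flat list of leaves stands in for the actual tree traversal. The names `Rect`, `LeafRange`, `DrawCall` and `collectVisible` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Rect { float minX, minZ, maxX, maxZ; };

// One quadtree leaf: its bounds plus the index range it owns.
struct LeafRange {
    Rect bounds;
    std::uint32_t startIndex;      // first index of the leaf's triangles
    std::uint32_t primitiveCount;  // number of triangles in the leaf
};

struct DrawCall {
    std::uint32_t startIndex;
    std::uint32_t primitiveCount;
};

// Axis-aligned overlap test; a stand-in for a proper frustum check.
bool overlaps(const Rect& a, const Rect& b)
{
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minZ <= b.maxZ && a.maxZ >= b.minZ;
}

// Per frame: no index data is rewritten, only the list of ranges to draw changes.
std::vector<DrawCall> collectVisible(const std::vector<LeafRange>& leaves,
                                     const Rect& viewBounds)
{
    std::vector<DrawCall> calls;
    for (const LeafRange& leaf : leaves) {
        assert(leaf.primitiveCount > 0);
        if (overlaps(leaf.bounds, viewBounds)) {
            calls.push_back({leaf.startIndex, leaf.primitiveCount});
        }
    }
    return calls;
}
```

In Direct3D 9 terms, each entry would map onto the StartIndex and PrimitiveCount arguments of DrawIndexedPrimitive, whether the leaves use separate index buffers or ranges inside one shared buffer.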

Your fps might not have risen for one of two reasons.
  • V-Sync may be enabled, so any rise above the refresh rate is not detectable.

  • You have so few polygons in your total scene that the quadtree processing actually takes more time than just rendering everything would have. Quadtrees only become beneficial past a certain threshold in the number of polygons.


- Oscar [smile]
You shouldn't normally need to fill an index buffer every frame. I wouldn't allocate an index buffer for 25 triangles, though. I'm pretty sure that buffers (index, vertex, texture) carry video memory overhead, so too fine a granularity will just use up VRAM unnecessarily. Besides, if you're using Direct3D you want to batch objects of similar type (i.e. same textures, render state, etc.) into as few draw calls as possible. For a heightfield terrain this is certainly feasible.
Quote:
I temporarily removed the quadtree (I only ran it once to fill the buffer, then stopped), and the fps didn't rise at all.

I'm not sure what you mean but if your fps is too low it could be that you have too many render calls per frame (under Direct3D you can become CPU limited in this case). That could happen if you only render leaves of your quad-tree and there are many leaves visible (like thousands). Then it doesn't matter how fast your GPU can process triangles because most of your CPU time is taken up in Direct3D and the driver and the GPU could spend part of its time idle.
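One way to keep the draw call count down when many leaves are visible is to merge leaves whose index ranges sit next to each other in the buffer. The sketch below assumes a triangle list (3 indices per primitive) and that all merged leaves share the same textures and render state; `IndexRange` and `mergeContiguous` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct IndexRange {
    std::uint32_t startIndex;      // first index of the range
    std::uint32_t primitiveCount;  // triangles in the range
};

// Merge ranges that are back to back in the index buffer so that
// several visible leaves can be submitted as one draw call.
std::vector<IndexRange> mergeContiguous(std::vector<IndexRange> ranges)
{
    std::sort(ranges.begin(), ranges.end(),
              [](const IndexRange& a, const IndexRange& b) {
                  return a.startIndex < b.startIndex;
              });

    std::vector<IndexRange> merged;
    for (const IndexRange& r : ranges) {
        assert(r.primitiveCount > 0);
        const bool adjacent =
            !merged.empty() &&
            merged.back().startIndex + merged.back().primitiveCount * 3 == r.startIndex;
        if (adjacent) {
            merged.back().primitiveCount += r.primitiveCount;
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

How much this helps depends on how the leaves are laid out in the buffer; ordering them so that spatial neighbours are also neighbours in the index buffer gives the merge more to work with.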
Quote:Original post by Croc

This might seem wasteful, but it is actually more efficient than culling at the polygon level, as long as each index buffer represents approximately 1500 primitives.


Just a quick question, why approx 1500 primitives? Why not 500, 1000, or 2000+? I know that there is overhead involved for each index buffer so you don't want to have many with few primitives each, but I'm just wondering why you came up with 1500.


Basically, what I meant is that I even took out the quadtree entirely and just rendered normally without culling, and the fps didn't budge by more than maybe 4 frames. Also, I am only calling DrawIndexedPrimitive(...) once per frame right now. When I run my deferred lighting and separate by material-type shaders, I can use DIP(...) much more.

The slowdown, though, is outside of the quadtree for right now, although I am still looking for ways to make the quadtree faster. So if I don't recreate an index buffer every frame, should I create one for every leaf, even if it holds fewer than 1500 primitives? Or should I make the maps so dense (which I don't want to do) that each leaf reaches 1500 or more?

UPDATE: I get 11.4M tri/sec with no quadtree and nothing besides rendering @ 16796 tri. I then get 3.8M tri/sec WITH the quadtree running at the same settings.

So any ideas on how to speed this thing up would be LOVELY! :) Thanks a lot, guys.

[Edited by - xsirxx on June 8, 2005 7:33:22 PM]
--X
It looks like the quadtree does do significant culling, because you get 11.4M tri/sec without it and 3.8M tri/sec with it, and yet the fps doesn't change significantly, so you're not GPU vertex limited. You're currently only issuing one DIP per frame, so you're unlikely to be CPU limited in Direct3D and the driver. It's odd!

It looks like the problem is in the deferred lighting. How many DIP calls does it make per frame? I.e., how many total DIP calls per frame are there?

Need a bit more info on how you're doing things to continue...

What fps are you getting?
What CPU and GPU are you using?
What else is the CPU doing besides calling Direct3D? Anything intensive?
Is your fps limited by vsync?
Are you saying that with no quad-tree culling you're rendering 16796 tris total per frame (which at 11.4M tri/sec works out to roughly 679 fps)?
Are you creating index or vertex buffers every frame, or just creating them once at startup and writing to one or both types of buffer every frame?
Are you using dynamic or static index buffer(s)? What about vertex buffer(s)?
How many buffers (both index and vertex) are you using (I assume one each, since you only issue one DIP per frame)?
How are you filling them each frame?
Is there a lot of overdraw in the scene, are you using expensive pixel shaders, and are you using z-test? Since you're drawing everything in one DIP call, you might not be getting any advantage from the z-buffer in some view directions (though I'd be surprised if this was the problem).
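For reference, the frame rate implied by the throughput figures quoted in this thread can be checked with a line of arithmetic. This is just a sanity-check sketch; `impliedFps` is a made-up helper and the inputs are the numbers posted above.

```cpp
#include <cassert>

// Frames per second implied by a triangle throughput and a per-frame triangle count.
double impliedFps(double trianglesPerSecond, double trianglesPerFrame)
{
    assert(trianglesPerFrame > 0.0);
    return trianglesPerSecond / trianglesPerFrame;
}
```

With 11.4M tri/sec and 16796 tris per frame this gives a value just under 679 fps. If the frame rate really stays about the same with the quadtree on, 3.8M tri/sec would correspond to roughly a third of those triangles (around 5600) being drawn per frame.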
According to the DX9 documentation, you should not be calling Release() or Create() during your main game loop (those should all be done at load or shutdown). I've had instances where 70% of the CPU time was spent issuing release/create calls for CPU skinning. That's not good.
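The "allocate once, reuse every frame" pattern can be sketched without any Direct3D at all. Here a `std::vector` stands in for a dynamic index buffer, and `FrameIndexScratch` is a hypothetical class: storage is reserved once at load, and each frame only resets the write position and refills it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scratch storage that is allocated once and refilled each frame.
// (A std::vector stands in for a real GPU buffer; this is an assumption.)
class FrameIndexScratch {
public:
    explicit FrameIndexScratch(std::size_t maxIndices) { storage_.reserve(maxIndices); }

    // Start of frame: forget last frame's contents but keep the allocation.
    void beginFrame() { storage_.clear(); }

    // Append indices for one visible chunk; never grows past the reserved size.
    void append(const std::uint16_t* src, std::size_t count)
    {
        assert(storage_.size() + count <= storage_.capacity());
        storage_.insert(storage_.end(), src, src + count);
    }

    std::size_t size() const { return storage_.size(); }
    std::size_t capacity() const { return storage_.capacity(); }

private:
    std::vector<std::uint16_t> storage_;
};
```

In Direct3D 9 the corresponding approach is usually described as creating the buffer once with D3DUSAGE_DYNAMIC and locking it with D3DLOCK_DISCARD each frame, rather than releasing and recreating it.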

On the other side of things, make sure you REALLY understand the point of batching, z-first rendering, and all the lovely optimizations that GDC puts out every year to take advantage of new graphics cards. Batching your prims into a lower number of draw calls serves two purposes: 1) lowering the number of batches submitted, and 2) minimizing the number of state changes during your rendering process. Of course, there is an odd middle ground where your app may not be suffering from either of those two problems, and you're just wasting CPU cycles doing all the combining without getting the expected performance gain.

If you function-profile your app (which I'd recommend greatly, rather than just going off FPS fluctuation, which, believe it or not, does vary somewhat per execution of the program due to memory fragmentation), you can get a better target to shoot for.
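If a full profiler isn't at hand, a crude first step is to time individual sections of the frame yourself. The sketch below uses a scoped timer built on `std::chrono::steady_clock`; `ScopedTimer` is a hypothetical helper, not something from the Direct3D SDK, and it only measures CPU-side wall time for the enclosed scope.

```cpp
#include <cassert>
#include <chrono>

// Adds the wall-clock time spent in a scope (in milliseconds) to an accumulator.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(double& accumulatedMs)
        : out_(accumulatedMs), start_(Clock::now())
    {
        assert(accumulatedMs >= 0.0);
    }

    ~ScopedTimer()
    {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        out_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& out_;
    Clock::time_point start_;
};
```

Wrapping the quadtree traversal and the draw submission in separate timers and comparing the totals over a few hundred frames gives a steadier picture than watching the fps counter.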


~Main
Colt "MainRoach" McAnlis, Graphics Engineer - http://mainroach.blogspot.com
Quote:Original post by kosmon_x
Just a quick question, why approx 1500 primitives? Why not 500, 1000, or 2000+? I know that there is overhead involved for each index buffer so you don't want to have many with few primitives each, but I'm just wondering why you came up with 1500.


Too few primitives per DrawPrimitive call, and CPU usage increases from processing the extra calls needed. Too many primitives, however, results in diminishing returns and, more seriously, concurrency conflicts. The graphics driver attempts to internally minimise render state changes, and so re-orders and parallelises the calls for performance. As the number of primitives per call increases, the opportunity to re-order optimally decreases.

Thus, a balance needs to be struck between these two conflicting ideals, and the best way is through profiling, as this is not an exact science by any means.
Microsoft recommends a primitive batch size of 1000 in the SDK docs, whereas NVIDIA recommends 1500. My own testing of my current engine on my development machine puts this figure at approx. 1400.

There is no way to derive a true optimal figure as this varies between applications and even the systems that the application is run on (so on other machines, my engine would be optimal at a different figure), and so only ballpark figures can be given.
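As a starting point for profiling, a leaf size can be derived from a target batch size. The sketch assumes a regular heightfield with two triangles per grid cell and square leaves with a power-of-two number of cells per side; `leafCellsPerSide` and `trianglesPerLeaf` are hypothetical helpers.

```cpp
#include <cassert>

// Triangles in a square leaf of cellsPerSide x cellsPerSide grid cells.
unsigned trianglesPerLeaf(unsigned cellsPerSide)
{
    return 2u * cellsPerSide * cellsPerSide;
}

// Smallest power-of-two leaf width (in cells) whose triangle count
// reaches the target batch size.
unsigned leafCellsPerSide(unsigned targetPrimitives)
{
    assert(targetPrimitives > 0);
    assert(targetPrimitives <= (1u << 30)); // keeps the arithmetic below from overflowing

    unsigned cells = 1;
    while (trianglesPerLeaf(cells) < targetPrimitives) {
        cells *= 2;
    }
    return cells;
}
```

For a target of 1500 this picks 32x32 cells (2048 triangles); the next size down, 16x16, gives only 512. Neither lands exactly on the ballpark figures, so both are worth trying under a profiler.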

- Oscar [smile]

