lots of small meshes or one big dynamic buffer?

14 comments, last by _the_phantom_ 11 years ago

Well, now that you mention outdoor scenes, you will actually get better performance by grouping the small objects (clutter/props/rocks) into a few chunks, where you can render each group with a single DIP (DrawIndexedPrimitive) call.

The threshold value depends on the GPU/CPU combo you use, but it is generally faster to render a group of 10 objects totalling, say, 3,000 tris in one DIP call (see the sketch after this list), compared to:

- frustum culling 10 objects on the CPU

- 10 DIP calls for a measly 300 tris, on average, per object
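To make that concrete, here is a minimal sketch (C++/D3D9, assuming <d3d9.h>) of drawing a pre-merged prop group with a single DIP call. The buffer names and the PropVertex layout are hypothetical, and the merge itself is assumed to have happened at load time:

```cpp
#include <d3d9.h>

// Hypothetical vertex layout shared by all props in the group.
struct PropVertex { float x, y, z, nx, ny, nz, u, v; };
#define PROP_FVF (D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1)

// Draw a whole group of small props (pre-merged into one VB/IB at load
// time) with one texture change and one DrawIndexedPrimitive, instead of
// one cull test plus one DIP per tiny object.
void DrawPropGroup(IDirect3DDevice9* dev,
                   IDirect3DVertexBuffer9* groupVB,
                   IDirect3DIndexBuffer9* groupIB,
                   IDirect3DTexture9* tex,
                   UINT numVerts, UINT numTris)
{
    dev->SetTexture(0, tex);
    dev->SetFVF(PROP_FVF);
    dev->SetStreamSource(0, groupVB, 0, sizeof(PropVertex));
    dev->SetIndices(groupIB);
    dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numVerts, 0, numTris);
}
```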

Think of it the same way you partition the terrain. I assume you use some kind of quadtree-like scheme for cutting the terrain into chunks and doing the frustum culling.

Now, while a single terrain chunk, say 128x128, will be a leaf in quadtree terms, you may have lots and lots of props/rocks/clutter, and it may well be prohibitive to render ALL props from a single VB - especially in a scenario where 4 terrain chunks are in the frustum and only a small part of each is actually visible, yet you'd still be sending 4 huge prop VBs through the card's pipeline.
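A sketch of that per-chunk grouping, reusing DrawPropGroup from the sketch above and a hypothetical ChunkAabbVisible() frustum test:

```cpp
// One pre-merged prop batch per terrain chunk; a single cheap AABB test
// accepts or rejects the whole group.
struct PropChunk {
    float aabbMin[3], aabbMax[3];       // bounds of this chunk's props
    IDirect3DVertexBuffer9* vb;
    IDirect3DIndexBuffer9*  ib;
    IDirect3DTexture9*      tex;
    UINT numVerts, numTris;
};

// Stand-in for whatever AABB-vs-frustum test the engine uses (hypothetical).
bool ChunkAabbVisible(const float* aabbMin, const float* aabbMax);

void DrawVisiblePropChunks(IDirect3DDevice9* dev, PropChunk* chunks, int count)
{
    for (int i = 0; i < count; ++i) {
        if (!ChunkAabbVisible(chunks[i].aabbMin, chunks[i].aabbMax))
            continue;                   // culls the whole group at once
        DrawPropGroup(dev, chunks[i].vb, chunks[i].ib, chunks[i].tex,
                      chunks[i].numVerts, chunks[i].numTris);
    }
}
```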

As for the dynamic VB, I forgot to mention that the framerate dropped noticeably for a short moment whenever the VB was being recreated - you might very well be allergic to such behaviour, but if the amount of RAM is an issue, this is a great option, especially if you can spread the task of filling the dynamic VB across multiple frames.

Which, admittedly, becomes harder to manage, since during those few frames the camera position might change, forcing you to recreate a VB that was never fully filled in the first place...
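A minimal sketch of spreading that fill across frames with a per-frame vertex budget, reusing the PropVertex layout from the first sketch. The g_fill* state and CopyNextObjects() are hypothetical, and the caller is assumed to reset g_fillCursor (restarting the fill) whenever the camera moves:

```cpp
// Per-frame budgeted fill of a dynamic VB created once at startup.
// D3DLOCK_NOOVERWRITE promises the driver we only append, so the GPU can
// keep reading the vertices written on earlier frames without a stall.
UINT g_fillCursor = 0;                  // vertices written so far
UINT g_fillTotal  = 0;                  // vertices the full rebuild needs
const UINT kVertsPerFrame = 4096;       // tune to your CPU budget

// Stand-in: copies `count` vertices of scene data starting at `first`.
void CopyNextObjects(PropVertex* dst, UINT first, UINT count);

bool ContinueFillVB(IDirect3DVertexBuffer9* vb)
{
    UINT left = g_fillTotal - g_fillCursor;
    UINT todo = (left < kVertsPerFrame) ? left : kVertsPerFrame;
    if (todo == 0) return true;         // fill already finished

    void* p = 0;
    if (SUCCEEDED(vb->Lock(g_fillCursor * sizeof(PropVertex),
                           todo * sizeof(PropVertex),
                           &p, D3DLOCK_NOOVERWRITE))) {
        CopyNextObjects((PropVertex*)p, g_fillCursor, todo);
        vb->Unlock();
        g_fillCursor += todo;
    }
    return g_fillCursor >= g_fillTotal; // true once the VB is complete
}
```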

VladR My 3rd person action RPG on GreenLight: http://steamcommunity.com/sharedfiles/filedetails/?id=92951596


The threshold value depends on the GPU/CPU combo you use, but it is generally faster to render a group of 10 objects totalling, say, 3,000 tris in one DIP call, compared to:
- frustum culling 10 objects on the CPU
- 10 DIP calls for a measly 300 tris, on average, per object

It's worse than that! It's more like frustum culling 2000+ objects of 10-50 triangles each, and still having 500 DIP calls of 10-50 triangles each when you're through.

Think of it the same way you partition the terrain. I assume you use some kind of quadtree-like scheme for cutting the terrain into chunks and doing the frustum culling.

The ground is drawn as individual 10x10 quads out to clip range (50-300 units). A heightmap function displaces a dynamic quad's vertices, and a "pattern map" determines the texture ID # to use for the quad. superclip4() is called on each quad; superclip is the "clip to frustum" routine, but it does a bit more, like trivially rejecting things behind the camera, etc.

This is for an FPS/RPG title.

So I guess you could say the ground is in 10x10 chunks. The size is small so I can have seamless ground texture tile sets that are only 10x10 units in size (10 feet x 10 feet at the scale I'm using of 1 D3D unit = 1 foot).
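A sketch of the per-quad ground draw described above; superclip4(), heightmap(), patternMap() and DrawGroundQuad() are stand-ins with guessed signatures for the poster's own routines:

```cpp
// Stand-in declarations (hypothetical signatures) for the poster's routines.
bool  superclip4(float x0, float z0, float x1, float z1);
float heightmap(float x, float z);
int   patternMap(float x, float z);
void  DrawGroundQuad(IDirect3DDevice9* dev, float x, float z,
                     float h00, float h10, float h01, float h11, int texID);

// Walk the 10x10 grid cells around the camera out to clip range,
// heightmap each surviving quad, and draw it with its tile texture.
void DrawGround(IDirect3DDevice9* dev, float camX, float camZ, float clipRange)
{
    for (float z = camZ - clipRange; z < camZ + clipRange; z += 10.0f)
    for (float x = camX - clipRange; x < camX + clipRange; x += 10.0f)
    {
        if (!superclip4(x, z, x + 10.0f, z + 10.0f)) // frustum + behind-camera
            continue;
        float h00 = heightmap(x, z),         h10 = heightmap(x + 10.0f, z);
        float h01 = heightmap(x, z + 10.0f), h11 = heightmap(x + 10.0f, z + 10.0f);
        DrawGroundQuad(dev, x, z, h00, h10, h01, h11, patternMap(x, z));
    }
}
```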

Since it appears (according to various docs, at least) that changing textures is one of the most expensive state changes you can make on a GPU, I've been following the mantra of one mesh, one texture, and sorting everything into optimal order before sending it off to the pipeline.

Ground quads are the only thing that's not sorted on texture before drawing. To do that, I'd need to do a pass for each ground tile texture used, and heightmap and draw just those quads on each pass.

I'm approaching the point where it's time for final graphics. I do final graphics last, so nothing has been optimized within an inch of its life yet. All I've done so far is make sure the frame rate stays up and that I can achieve the desired visual results. Most of the optimization in my future will be geared towards pushing the cutoff range between high and low LOD out farther from the camera. In thick woods and jungle, the cutoff is 50 feet right now. Then again, you're hard pressed to see 50 feet in that kind of bush anyway.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Now, while a single terrain chunk, say 128x128, will be a leaf in quadtree terms, you may have lots and lots of props/rocks/clutter, and it may well be prohibitive to render ALL props from a single VB - especially in a scenario where 4 terrain chunks are in the frustum and only a small part of each is actually visible, yet you'd still be sending 4 huge prop VBs through the card's pipeline.

Yes, I've recently started considering how I'd do a shooter-type title, and came to the same question: # of batches (size of "level" chunks) vs. # of triangles in a chunk that are entirely outside the viewing frustum - i.e. DIP overhead vs. DirectX clipping overhead.

It's possible that the best way (app-dependent, of course) would be one pass per texture: for each texture, clip all objects to the frustum. Things that are entirely inside get added to the VB; those that are partially inside get clipped and added one triangle at a time. Then draw that VB with its texture and move on to the next texture. Each texture gets touched exactly once, each VB only has triangles that are partially or entirely in the viewing frustum (or darn close), and the scene is "composited" in layers, one texture at a time.
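A minimal sketch of that one-pass-per-texture compositing, reusing the PropVertex/PROP_FVF layout from the first sketch and refilling a dynamic VB per texture with D3DLOCK_DISCARD. Object, ClipToFrustum() and AppendTriangles() are hypothetical stand-ins for the engine's own types:

```cpp
struct Object { /* engine mesh instance; fields omitted in this sketch */ };
enum CullResult { CULL_OUT, CULL_IN, CULL_PARTIAL };
CullResult ClipToFrustum(const Object& obj);                    // stand-in
// Appends the object's triangles to dst, clipping per-triangle if asked;
// returns the number of vertices written (stand-in).
UINT AppendTriangles(PropVertex* dst, const Object& obj, bool clipPerTri);

void CompositeByTexture(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* dynVB,
                        IDirect3DTexture9** textures, int numTextures,
                        Object** objectsByTex, int* objectCount)
{
    for (int t = 0; t < numTextures; ++t) {
        // refill the dynamic VB with this texture's surviving triangles
        PropVertex* dst = 0;
        if (FAILED(dynVB->Lock(0, 0, (void**)&dst, D3DLOCK_DISCARD)))
            continue;
        UINT verts = 0;
        for (int i = 0; i < objectCount[t]; ++i) {
            CullResult r = ClipToFrustum(objectsByTex[t][i]);
            if (r == CULL_OUT) continue;
            verts += AppendTriangles(dst + verts, objectsByTex[t][i],
                                     r == CULL_PARTIAL);
        }
        dynVB->Unlock();
        if (verts == 0) continue;
        dev->SetTexture(0, textures[t]);    // each texture touched once
        dev->SetFVF(PROP_FVF);
        dev->SetStreamSource(0, dynVB, 0, sizeof(PropVertex));
        dev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, verts / 3);
    }
}
```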

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

As for the dynamic VB, I forgot to mention that the framerate dropped noticeably for a short moment whenever the VB was being recreated - you might very well be allergic to such behaviour, but if the amount of RAM is an issue, this is a great option, especially if you can spread the task of filling the dynamic VB across multiple frames.
Which, admittedly, becomes harder to manage, since during those few frames the camera position might change, forcing you to recreate a VB that was never fully filled in the first place...


Created or filled?

It looks like the way to go is: create once, lock many.

I'm thinking about filling a buffer each frame before drawing - perhaps one buffer for each texture, or just a few for the textures used on lots of small meshes.
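A sketch of the create-once/lock-many pattern, again reusing the PropVertex layout from the first sketch: one dynamic VB created at startup, then a DISCARD lock at the start of each frame and NOOVERWRITE locks for any later appends:

```cpp
#include <cstring> // memcpy

// Create the per-frame buffer once; D3DUSAGE_DYNAMIC + D3DPOOL_DEFAULT is
// the combination intended for buffers that get locked every frame.
IDirect3DVertexBuffer9* CreatePerFrameVB(IDirect3DDevice9* dev, UINT maxVerts)
{
    IDirect3DVertexBuffer9* vb = 0;
    dev->CreateVertexBuffer(maxVerts * sizeof(PropVertex),
                            D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                            PROP_FVF, D3DPOOL_DEFAULT, &vb, 0);
    return vb;
}

// DISCARD on the frame's first lock hands back a fresh buffer without
// stalling on the GPU; NOOVERWRITE on later locks appends behind it.
void FillFrameVerts(IDirect3DVertexBuffer9* vb, const PropVertex* src,
                    UINT count, UINT offsetVerts)
{
    void* p = 0;
    DWORD flags = (offsetVerts == 0) ? D3DLOCK_DISCARD : D3DLOCK_NOOVERWRITE;
    if (SUCCEEDED(vb->Lock(offsetVerts * sizeof(PropVertex),
                           count * sizeof(PropVertex), &p, flags))) {
        memcpy(p, src, count * sizeof(PropVertex));
        vb->Unlock();
    }
}
```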

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

I would strongly advise you against supporting the fixed function pipeline any more, especially for the sake of "compatibility". What do you want to be compatible with? 15-year-old graphics hardware? Outdated fixed function samples, when probably twice as many shader-equivalent tutorials exist? I don't see any point in carrying on with the fixed function pipeline for any reason. Recent GPUs don't even have a fixed function pipeline as such; they just emulate it, so there likely isn't even any performance gain from it. As for compatibility, almost all relevant graphics chips support shaders.

Of course it is your choice, but I see fixed function as a waste of time and something that should only be used by beginners to learn the very basics before going on to shaders. Especially if it keeps you from using techniques like instancing, that should be an alarm sign!
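As a concrete example of what fixed function locks you out of, here is a minimal sketch of D3D9 hardware instancing, which requires a vertex shader (SM3-class hardware) and a vertex declaration with a second, per-instance stream; the buffers, strides and counts here are hypothetical:

```cpp
// Draw numInstances copies of one mesh with a single DIP call.
// Assumes a vertex declaration (SetVertexDeclaration) mapping stream 1's
// per-instance data (e.g. a world transform) into shader inputs.
void DrawInstanced(IDirect3DDevice9* dev,
                   IDirect3DVertexBuffer9* geomVB, UINT geomStride,
                   IDirect3DIndexBuffer9* geomIB, UINT numVerts, UINT numTris,
                   IDirect3DVertexBuffer9* instVB, UINT instStride,
                   UINT numInstances)
{
    // stream 0: the mesh geometry, repeated numInstances times
    dev->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | numInstances);
    dev->SetStreamSource(0, geomVB, 0, geomStride);
    // stream 1: one element of per-instance data per instance
    dev->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1);
    dev->SetStreamSource(1, instVB, 0, instStride);
    dev->SetIndices(geomIB);
    // one DIP draws every instance; the vertex shader combines the streams
    dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numVerts, 0, numTris);
    // restore default (non-instanced) stream frequencies
    dev->SetStreamSourceFreq(0, 1);
    dev->SetStreamSourceFreq(1, 1);
}
```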

I need to second this - nowadays the maximum-compatibility path is shaders. Ever since SM3 hardware became ubiquitous, all graphics hardware actually emulates the fixed pipeline using driver-provided shaders; what that generally means is tortuous code paths with dynamic branching and/or lots of runtime shader recompilation and/or lots of shader changes, not to mention exercising code paths that driver writers no longer put much effort into. Maybe 5 years ago you could just about get away with not using shaders for compatibility reasons, but nowadays there really is no excuse.

The sole exception would be if you're targeting a very specialized community that you know uses retro hardware; otherwise, using shaders just makes sense.

I did, about 10 years ago on a GeForce2 GTS, for a top-down 3D scene consisting of the walls/props/floor of a quad-grid-based level.

1. Brute force: SetTexture per quad + VB/IB per quad

2. Single VB/IB for the whole level, DIP per quad

3. Dynamic VB: recreating a single VB per frame (sorting/copying objects in the frustum) upon camera change, DIP per texture

I generally prefer a variant on your option 3 - a static VB (sorted by texture/material at build time) with a dynamic IB - but it's a tradeoff: you avoid the overhead of rebuilding the VB, but you accept the overhead of draw calls jumping randomly about in the VB (and hope to come out on the right side of the tradeoff). The old advice about constraining your DIP call to a specific range of vertices isn't relevant with hardware T&L (and it's worth noting that D3D10+ no longer specifies a vertex range, the reasoning being that many D3D9 drivers ignored it anyway), so that's nothing to be concerned about any more.
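A sketch of that variant, reusing the PropVertex layout and the <cstring> include from the earlier sketches: the static VB is sorted by texture at build time, and each frame the indices of visible objects are gathered into a dynamic IB, then drawn with one DIP per texture run. MeshEntry, VisibleThisFrame() and the build-time index pool are hypothetical:

```cpp
struct MeshEntry { UINT firstIndex, numIndices; int texID; }; // sorted by texID
struct TexRun    { int texID; UINT startIndex, numTris; };
bool VisibleThisFrame(const MeshEntry& m);                    // stand-in cull
const int kMaxRuns = 256;

void DrawWithDynamicIB(IDirect3DDevice9* dev,
                       IDirect3DVertexBuffer9* staticVB, UINT vbNumVerts,
                       IDirect3DIndexBuffer9* dynIB,
                       const WORD* srcIndices,        // build-time index pool
                       const MeshEntry* entries, int numEntries,
                       IDirect3DTexture9** textures)
{
    // pass 1: append each visible object's indices, grouped by texture
    // (entries[] is already sorted by texID at build time)
    TexRun runs[kMaxRuns]; int numRuns = 0;
    WORD* dst = 0;
    if (FAILED(dynIB->Lock(0, 0, (void**)&dst, D3DLOCK_DISCARD))) return;
    UINT written = 0;
    for (int i = 0; i < numEntries; ) {
        int tex = entries[i].texID;
        UINT runStart = written;
        for (; i < numEntries && entries[i].texID == tex; ++i) {
            if (!VisibleThisFrame(entries[i])) continue;
            memcpy(dst + written, srcIndices + entries[i].firstIndex,
                   entries[i].numIndices * sizeof(WORD));
            written += entries[i].numIndices;
        }
        if (written > runStart && numRuns < kMaxRuns) {
            TexRun r = { tex, runStart, (written - runStart) / 3 };
            runs[numRuns++] = r;
        }
    }
    dynIB->Unlock();                    // must unlock before drawing

    // pass 2: one DIP per texture run; the VB itself never changes
    dev->SetFVF(PROP_FVF);
    dev->SetStreamSource(0, staticVB, 0, sizeof(PropVertex));
    dev->SetIndices(dynIB);
    for (int r = 0; r < numRuns; ++r) {
        dev->SetTexture(0, textures[runs[r].texID]);
        dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vbNumVerts,
                                  runs[r].startIndex, runs[r].numTris);
    }
}
```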

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

and i'm trying to stick with fixed function for maximum compatibility.

Compatibility with what?
2002 - ATI releases the R300 GPU. No fixed function hardware.
2004 - NVIDIA releases the NV40. No fixed function hardware.

Heck, everything you've said you're worried about smacks of problems from nearly 10 years ago...

