lots of small meshes or one big dynamic buffer?

Started by Norm Barrows · 14 comments, last by _the_phantom_ 11 years ago

what's faster? drawing lots of small meshes, or copying them all into one big dynamic buffer and drawing that?

lets say 150 to 500 meshes of about 10 tri's each, all with the same texture.

another way to phrase it:

would it be faster to take every triangle in the scene that uses the same texture and copy them all to a dynamic vertex buffer, do that for every texture in the scene, then just draw the dynamic vertex buffers?

you'd basically be sorting all the triangles in the scene on texture, into separate dynamic vertex buffers.

but i'm thinking that if you had say 5 textures, and 500 meshes of 10 tri's, that 5 batch calls of 1000 tri's each might be faster than 500 batch calls of 10 tri's each. despite the overhead of copying the triangles to the dynamic vertex buffers each frame.

has anyone ever tried this?

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php


The general advice is that if you are CPU limited at all, then you should reduce the number of draw calls if possible. With such small triangle counts per primitive, I would suggest using some form of instancing if possible, which could give you the benefits of both worlds.

To be perfectly honest though, nobody can tell you which one will be faster, or by how much. It completely depends on the scene and the rendering techniques that you are using. You need to attempt each method, and see which one is faster in the given situation - there is no hard rule to go by... The best case is that you could configure your engine to use either method as appropriate. That would let you customize your rendering approach for each scene that you render.

I use a method like you describe, but at data-compilation time, and using static buffers at run-time.

On the last console game I made, we did extensive profiling and settled on a rough rule of thumb that every draw call should cover at least 400 pixels, in order to avoid stalls inside the GPU pipeline, and maybe 1 triangle per 16 pixels. These guidelines vary *hugely* depending on your shaders and the actual GPU though...

Depending on the API that you're using, there can be large amounts of CPU overhead in calling any graphics API function, so you often want to reduce API calls to a minimum - that one is easy to profile yourself though, by measuring the time taken by your D3D/GL calls.

Do you want to optimize CPU time, GPU time, or both?

With such small triangle counts per primitive, I would suggest using some form of instancing if possible, which could give you the benefits of both worlds.

that was the very first thing i checked, but it requires shaders, and i'm trying to stick with fixed function for maximum compatibility.

i've been working on a design for some game library modules, and came to the conclusion that the whole problem with games is the graphics, they take up too much time. computers are fast enough to model most anything we want for game purposes, but much/most/almost all of the computer's time is spent drawing.

we can draw scenes at the complexity we want, or at the detail we want, but not both yet really.

i don't think there's a graphics programmer out there who wouldn't draw more if they had twice the processing power. I don't think anyone would say "naw, thats ok, i got enough stuff in my scene".

since apparently graphics cards like to draw lots of triangles at once using the same texture, i was thinking one big buffer with all the triangles for a texture might be faster.

To be perfectly honest though, nobody can tell you which one will be faster, or by how much. It completely depends on the scene, and the rendering techniques that you are using. You need to attempt each method, and see which one is faster in the given situation - there is no hard rule to go by.

i know what you mean. you would think it wouldn't be that way and that some methods would tend to rise to the top, but like everything else in games there's 6+ ways to do it and it all depends.

looks like i might be spending some quality time with vertex and index buffers.

am i correct in the assumption that i want to copy the vertices one after the other, and add the "vertex base index" of each mesh to its index values?

since i've come to the conclusion that graphics is the problem, i'm going to see what i can do to get some better performance. perhaps even go to shaders and sacrifice some backward compatibility.

it's either that, or i don't draw rich environments, or i only draw them out to 50 feet, or i do it all with 2d billboards. : P

Do you want to optimize CPU time, GPU time, or both?

not sure. the goal is to be able to draw rich environments, and still have cpu power left for semi-serious simulation.

i'm not necessarily thinking terms of a specific title, more like general approaches that can be used in multiple titles.

I use a method like you describe, but at data-compilation time, and using static buffers at run-time.

is it fast enough that i might do that at the start of a new game when i generate the world? Or are we talking 30 days runtime on a MIPS alpha?

Depending on the API that you're using, there can be large amounts of CPU overhead in calling any graphics API function, so you often want to reduce API calls to a minimum - that one is easy to profile yourself though, by measuring the time taken by your D3D/GL calls.

i'm using DX9.0c fixed function. looks like i might finally have a reason to fire up the profiler. but as i said i'm thinking more in terms of general approaches rather than a specific title, so i guess, technically, i still don't have anything to profile. i guess i'll need to try it both ways and see what happens. God! so much time in game development is spent on experimentation and R&D!

Just wanted to add something that stuck out for me:

that was the very first thing i checked, but it requires shaders, and i'm trying to stick with fixed function for maximum compatibility.

I would strongly advise you against supporting the fixed-function pipeline any more, especially for the sake of "compatibility". What do you want to be compatible with? 15-year-old graphics hardware? Outdated fixed-function samples, when probably twice as many shader-equivalent tutorials exist? I don't see any point in carrying on with the fixed-function pipeline for any reason. Recent GPUs don't even have a fixed-function pipeline in that sense; they just emulate it, so there likely isn't even any performance gain from it. As for compatibility, almost all relevant graphics chips support shaders.

Of course it is your choice, but I see fixed function as a waste of time and something that should only be used by beginners to learn the very basics before moving on to shaders. Especially if it keeps you from using techniques like instancing, this should be an alert sign!

Especially if it keeps you from using techniques like instancing, this should be an alert sign!

This is good advice - you should probably stay away from fixed function stuff unless you have a very specific reason to use it!

is there boilerplate shader code available that implements basic fixed-function capabilities (aniso mipmaps, Gouraud, and Phong)?

i could use that to quickly convert to programmable then implement instancing. i could really use it to draw all these bushes and rocks and plants and such for caveman.

when MS bought rend386, i was forced to write my own perspective correct texture mapped poly engine.

i've also written assembly blitters for sprite engines that did mirror, zoom, and rotate simultaneously in real time.

but i don't relish the thought of having to twiddle xyz's, uv's, and rgb's. all i want is 1000 rocks on the screen ! <g>.

then again, it would allow me to write a shader that did mipmapped sprite textures without blending the background into the edges. i can't believe MS released DirectX with that basic incompatibility between their color-key transparency / alpha-test system and their mip filtering system. with a shader you could actually test alpha == 0 instead of alpha < threshold, and it would work correctly.

by now i would have thought that the most common shader implementations would be widely available. while i haven't ever gone searching for some, i also haven't seen any posted anywhere.

has anyone ever tried this?

I did, about 10 yrs ago on a GeForce 2 GTS, for a top-down 3D scene consisting of the walls/props/floor of a quad-grid based level.

1. Brute-Force : SetTexture per quad + VB/IB per quad

2. Single VB / IB for whole level, DIP per quad

3. Dynamic VB - Recreating Single VB per frame (sorting/copying objects in frustum)- upon camera change. DIP per texture

Slowest - Option 1

Faster - Option 2

Fastest - Option 3, since the VB does not really get recreated every frame, just every time new quads from the grid pop into frustum

Honestly, it took me a single afternoon to code the 2 additional render methods so that I could switch between them at runtime on a keypress. So I propose you spend a little bit of effort and do the same - it's really drop-dead easy and straightforward (just watch the pool / usage flags for the VB / IB create/update - check the nVidia papers for that).

When benchmarking, make sure to switch off everything else in the engine. It is pointless to make these optimizations and then run them at full load at 12 fps and wonder why you can't see any difference - e.g. go for lowest resolution, no Vsync / AA / AF, no AI / Physics...

VladR My 3rd person action RPG on GreenLight: http://steamcommunity.com/sharedfiles/filedetails/?id=92951596

I did, about 10 yrs ago on a GeForce 2 GTS for a top-down 3D scene consisting of walls/props/floor of a quad-grid based level.
1. Brute-Force : SetTexture per quad + VB/IB per quad
2. Single VB / IB for whole level, DIP per quad
3. Dynamic VB - Recreating Single VB per frame (sorting/copying objects in frustum)- upon camera change. DIP per texture

Slowest - Option 1
Faster - Option 2
Fastest - Option 3, since the VB does not really get recreated every frame, just every time new quads from the grid pop into frustum

interesting.

i do most of my drawing by creating (outdoor) scenes from many small meshes (rocks, plants, trees, etc.), which is analogous to option 1.

guess it's time to write some test code.

so we're talking the GPU's slower memory access of the dynamic VB and IB, vs. the additional quads of one big VB, vs. the API overhead of drawing individual quads.

and dynamic was still fastest eh?

sounds like clip to frustum and place in a dynamic buffer may be the trick. thanks!

