Storage vs Draw calls
#1 Members - Reputation: 319
Posted 02 March 2011 - 12:55 PM
I have a quad tree terrain. When I place meshes on that terrain, I assign them to the terrain quad tree node they fall in, and when I render the terrain I mark a node's meshes as "visible" if the node itself is visible. Each mesh type exists only once; if it needs to be repeated, I store pointers to it along with a transformation matrix for each placement.
That's OK, but rendering lots of small meshes like rocks, flowers and grass blades leads to many draw calls, which brings my engine to its knees. I'm thinking of combining all the meshes of a terrain node into a single vertex buffer, pre-transformed into world space, and rendering them in a single draw call. That would eat up more memory for vertex data than the "one mesh per kind" approach of rendering the same mesh in multiple places. Should I use instancing in such a situation? Is there any additional overhead to instancing itself?
Which route should I go?
-save memory: transform and render a single mesh in several places using several draw calls
-make one big vertex buffer that eats up more memory and render everything in one draw call
-use a single mesh and use instancing
I even experimented a bit with instancing and found it a bit slow compared to the "transform and render" approach. My meshes are about 100-200 triangles each, with 100 or more meshes per terrain chunk.
Thank you in advance.
#3 Members - Reputation: 4032
Posted 02 March 2011 - 02:24 PM
Regarding the storage vs draw calls question, storage is cheap and plentiful whereas performance isn't. Storage is a resource to be used, and if you're not using it then it's going to waste. So I always believe that - unless you're in a very specific situation where you know storage is limited (like on a mobile device) - in an age where 1GB GPUs are commonplace you should definitely be availing of storage in cases where doing so can increase performance. There's very little return on investment if you're scrimping and saving memory these days.
#4 Members - Reputation: 4750
Posted 03 March 2011 - 03:42 AM
I use a combination of several techniques:
1. batching: my terrain is subdivided into tiles. All smaller meshes (grass, little stones etc.) are combined into a single mesh for each tile. When using a cache (e.g. LRU) you only need to hold N of these meshes in memory (N ~ 50-100).
2. instancing: all larger objects (trees, rocks etc.) are rendered using separate render calls (=> this could be optimized using geometry shaders).
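The cache from point 1 could look roughly like the sketch below. This is only an illustration, not the poster's actual code: `TileBatch`, `TileBatchCache` and the integer tile id are hypothetical names, and the merged mesh is represented by a plain vector of floats rather than a real GPU buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the merged vertex buffer of one terrain tile.
struct TileBatch {
    std::vector<float> vertices; // world-space vertices of all small meshes in the tile
};

// Holds at most `capacity` batched tiles; the least recently used tile is evicted first.
class TileBatchCache {
public:
    explicit TileBatchCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached batch for a tile, or nullptr if it has to be (re)built.
    TileBatch* find(std::uint32_t tileId) {
        auto it = index_.find(tileId);
        if (it == index_.end()) return nullptr;
        // Move the entry to the front: it is now the most recently used.
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    // Inserts (or replaces) a batch, evicting the oldest entry when the cache is full.
    TileBatch& insert(std::uint32_t tileId, TileBatch batch) {
        auto it = index_.find(tileId);
        if (it != index_.end()) {
            it->second->second = std::move(batch);
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(tileId, std::move(batch));
        index_[tileId] = order_.begin();
        return order_.front().second;
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<std::uint32_t, TileBatch>;
    std::size_t capacity_;
    std::list<Entry> order_; // front = most recently used
    std::unordered_map<std::uint32_t, std::list<Entry>::iterator> index_;
};
```

When a tile becomes visible you would call `find` first and only merge the small meshes and call `insert` on a miss, so the memory cost stays bounded by N tiles.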
#5 Members - Reputation: 319
Posted 03 March 2011 - 10:08 AM
I don't want to use instancing for large meshes like trees, because I already have quad-tree culling, and to cull the trees that are not on visible terrain patches I would most probably need to lock/update the instance vertex buffer, which could be slow? On the other hand, I need to switch to a low-res mesh as the distance to the trees increases, and eventually to billboards. How can I have these LOD transitions and culling with an "instanced forest"?
What about vertex texture fetch? Let me explain what I mean.
Make a mesh that covers a terrain patch, let's say an array of small rocks. When rendering, offset each vertex of that mesh to match the terrain below by sampling the terrain height texture in the vertex shader. Also sample a "distribution map" to see if rocks are allowed at that position (if it's a road, skip rocks etc.). If rocks are not allowed at that position in the terrain patch, move the vertex below the ground in the vertex shader?
Is this a good idea? Is vertex texture fetch too slow on SM 3.0?
#6 Members - Reputation: 157
Posted 03 March 2011 - 11:16 AM
http://http.develope...gems3_ch02.html
It's used there for an animated crowd, but I think the part about using different sets of vertices for the LODs would translate to your problem.
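One way to combine LODs with instancing, sketched here under my own assumptions rather than taken from the linked chapter, is to sort the visible trees into one instance list per LOD each frame and issue one instanced draw per list. The names `LodRanges`, `selectLod` and `binByLod` and the two distance thresholds are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One bin per representation: full mesh, low-res mesh, billboard.
enum Lod : std::size_t { LodHighRes = 0, LodLowRes = 1, LodBillboard = 2, LodCount = 3 };

// Hypothetical switch distances in world units.
struct LodRanges {
    float lowResStart;    // beyond this distance use the low-res mesh
    float billboardStart; // beyond this distance use a billboard
};

Lod selectLod(const Vec3& camera, const Vec3& p, const LodRanges& r) {
    const float dx = p.x - camera.x;
    const float dy = p.y - camera.y;
    const float dz = p.z - camera.z;
    const float distSq = dx * dx + dy * dy + dz * dz; // compare squared distances, no sqrt
    if (distSq >= r.billboardStart * r.billboardStart) return LodBillboard;
    if (distSq >= r.lowResStart * r.lowResStart) return LodLowRes;
    return LodHighRes;
}

using LodBins = std::array<std::vector<Vec3>, LodCount>;

// Splits tree positions into one instance list per LOD.
LodBins binByLod(const std::vector<Vec3>& treePositions,
                 const Vec3& camera,
                 const LodRanges& ranges) {
    assert(ranges.lowResStart <= ranges.billboardStart);
    LodBins bins;
    for (const Vec3& p : treePositions) {
        bins[selectLod(camera, p, ranges)].push_back(p);
    }
    return bins;
}
```

Each bin then maps to at most one instanced draw call with the matching mesh, so a forest costs about three draw calls per tree type regardless of how many trees are visible.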
#7 Members - Reputation: 146
Posted 03 March 2011 - 11:17 AM
...I most probably need to lock/update the instance vertex buffer, which could be slow ?
It isn't slow. We draw thousands of flora instances this way, each one individually culled against the view frustum. The time to build the instance buffer each frame is negligible. We did design the per-instance data to be as compact as possible to facilitate this, however.
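A rough sketch of what such a per-frame rebuild can look like is below. It is not our actual code: the six-plane frustum representation, the bounding-sphere test and the names `Instance`, `InstanceData` and `buildInstanceBuffer` are assumptions made for the example, and the upload to the GPU is left out.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with the normal pointing into the frustum.
struct Plane { float nx, ny, nz, d; };

// CPU-side description of one placed object (hypothetical layout).
struct Instance {
    Vec3 position;
    float boundingRadius;
};

// Compact per-instance data that would be copied into the instance vertex buffer.
struct InstanceData { float x, y, z; };
static_assert(sizeof(InstanceData) == 12, "position-only instance data should be 12 bytes");

bool sphereInFrustum(const std::array<Plane, 6>& frustum, const Vec3& c, float radius) {
    for (const Plane& p : frustum) {
        const float signedDistance = p.nx * c.x + p.ny * c.y + p.nz * c.z + p.d;
        if (signedDistance < -radius) return false; // entirely behind this plane
    }
    return true; // inside or intersecting every plane
}

// Rebuilds the list of visible instances; `out` is reused so its capacity is kept.
void buildInstanceBuffer(const std::vector<Instance>& instances,
                         const std::array<Plane, 6>& frustum,
                         std::vector<InstanceData>& out) {
    out.clear();
    for (const Instance& inst : instances) {
        assert(inst.boundingRadius >= 0.0f);
        if (sphereInFrustum(frustum, inst.position, inst.boundingRadius)) {
            out.push_back({inst.position.x, inst.position.y, inst.position.z});
        }
    }
}
```

After the loop, `out.size()` is the instance count for the draw call, and the contents of `out` are what gets copied into the locked instance buffer.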
#8 Members - Reputation: 416
Posted 03 March 2011 - 08:55 PM
Typically, when instancing you need an additional 16 * 4 = 64 bytes of data per instance (a 4x4 matrix of floats). So, if you have 1000 instanced trees that you want to draw each frame and you rebuild ONLY THE INSTANCE buffer, then it's 64 * 1000 = 64,000 bytes of data you need to update and send to the video card each frame. If that hurts your frame rate in any way, then you did it wrong.
Now, if you use billboards, you can cut that down to only 12 bytes per instance: if they are camera facing, you just need to send the position instead of an entire matrix like above.
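To make the numbers concrete, here is a small sketch of the two layouts. The struct names and the `bytesPerFrame` helper are made up for the example, and it assumes 4-byte floats with no padding.

```cpp
#include <cassert>
#include <cstddef>

// Full transform per instance: a 4x4 matrix of floats.
struct MatrixInstance { float m[16]; };

// Camera-facing billboard: only the world-space position is needed.
struct BillboardInstance { float x, y, z; };

static_assert(sizeof(MatrixInstance) == 64, "16 floats * 4 bytes");
static_assert(sizeof(BillboardInstance) == 12, "3 floats * 4 bytes");

// Bytes that have to be rewritten and uploaded each frame.
constexpr std::size_t bytesPerFrame(std::size_t instanceCount, std::size_t bytesPerInstance) {
    return instanceCount * bytesPerInstance;
}

static_assert(bytesPerFrame(1000, sizeof(MatrixInstance)) == 64000, "1000 trees");
static_assert(bytesPerFrame(1000, sizeof(BillboardInstance)) == 12000, "1000 billboards");
```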
#9 Members - Reputation: 319
Posted 16 March 2011 - 11:43 AM
#10 Members - Reputation: 132
Posted 17 March 2011 - 10:16 AM
In support of the post above me: if you use a quaternion instead of a full matrix, that'd be 10 * 4 = 40 bytes per instance (position + rotation + scale). You could also use a 4x3 matrix (12 * 4 = 48 bytes) instead of a 4x4. However, I'm unsure if losing alignment would impact performance.
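As a sketch of the compact layout, the struct below packs position, a unit quaternion and a per-axis scale into 40 bytes, and `toMatrix` shows the expansion that would otherwise happen in the vertex shader. The names, the x, y, z, w quaternion order and the row-major, column-vector matrix convention are assumptions for the example.

```cpp
#include <cassert>

// 10 floats: position (3) + rotation quaternion (4) + scale (3).
struct CompactInstance {
    float position[3];
    float rotation[4]; // unit quaternion, stored as x, y, z, w
    float scale[3];
};
static_assert(sizeof(CompactInstance) == 40, "10 floats * 4 bytes");

// The "4x3" alternative: 12 floats, here stored as 3 rows of 4 (rotation*scale | translation).
struct AffineMatrix { float m[3][4]; };
static_assert(sizeof(AffineMatrix) == 48, "12 floats * 4 bytes");

// Expands the compact form to translation * rotation * scale.
AffineMatrix toMatrix(const CompactInstance& in) {
    const float x = in.rotation[0];
    const float y = in.rotation[1];
    const float z = in.rotation[2];
    const float w = in.rotation[3];
    // Standard rotation matrix of a unit quaternion.
    const float r[3][3] = {
        {1.0f - 2.0f * (y * y + z * z), 2.0f * (x * y - w * z), 2.0f * (x * z + w * y)},
        {2.0f * (x * y + w * z), 1.0f - 2.0f * (x * x + z * z), 2.0f * (y * z - w * x)},
        {2.0f * (x * z - w * y), 2.0f * (y * z + w * x), 1.0f - 2.0f * (x * x + y * y)}
    };
    AffineMatrix out{};
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            out.m[row][col] = r[row][col] * in.scale[col]; // scale is applied before rotation
        }
        out.m[row][3] = in.position[row]; // translation in the last column
    }
    return out;
}
```

The trade-off is a few extra instructions per vertex to rebuild the matrix in exchange for 24 fewer bytes per instance than the full 4x4.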






