Storage vs Draw calls


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

9 replies to this topic

#1 solenoidz   Members   -  Reputation: 319

Posted 02 March 2011 - 12:55 PM

Hi, I need some advice.
I have a quad-tree terrain. When I place meshes on that terrain, I assign them to the quad-tree node they fall in, and when I render the terrain I mark a node's meshes as "visible" if the node itself is visible. There is only one mesh of each type; if it needs to be repeated, I store a pointer to it along with a transformation matrix for each placement.
That works, but rendering lots of small meshes like rocks, flowers and grass blades leads to many draw calls, which brings my engine to its knees. I'm thinking of combining all of a terrain node's meshes into a single vertex buffer, transformed into world space, and rendering them in a single draw call. That would take more memory for the vertex data than keeping one mesh per kind and rendering it in multiple places. Should I use instancing in such a situation? Is there any additional overhead from instancing itself?

Which route should I take?
- Save memory: transform and render a single mesh in several places using several draw calls.
- Make one big vertex buffer that uses more memory and render everything in one draw call.
- Use a single mesh and instancing.

I even experimented with instancing a bit and found it somewhat slow compared to the "transform and render" approach. My meshes are about 100-200 triangles each, with 100 or more meshes per terrain chunk.
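To put rough numbers on the memory side of these three routes, here is a minimal sketch. The 32-byte vertex, the unindexed triangle list and the 4x4 float matrix per placement are assumptions for illustration, not figures from the post; only the triangle and mesh counts come from it.

```cpp
#include <cstddef>

// Hypothetical sizes, not taken from the thread: a 32-byte vertex
// (e.g. position + normal + UV) and a 4x4 float matrix per placed mesh.
constexpr std::size_t kVertexBytes = 32;
constexpr std::size_t kMatrixBytes = 16 * sizeof(float);

// Vertex bytes for one mesh, assuming an unindexed triangle list.
constexpr std::size_t meshBytes(std::size_t triangles) {
    return triangles * 3 * kVertexBytes;
}

// "One big vertex buffer": every placement is baked into world space.
constexpr std::size_t bakedChunkBytes(std::size_t triangles, std::size_t placements) {
    return meshBytes(triangles) * placements;
}

// "One mesh per kind" or instancing: one shared mesh plus a matrix per placement.
constexpr std::size_t sharedChunkBytes(std::size_t triangles, std::size_t placements) {
    return meshBytes(triangles) + kMatrixBytes * placements;
}
```

Under these assumptions, a chunk with 100 placements of a 150-triangle mesh comes to 1,440,000 bytes when baked and 20,800 bytes when the mesh is shared.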

Thank you in advance.

#2 0xffffffff   Members   -  Reputation: 146

Posted 02 March 2011 - 01:47 PM

Use instancing.

#3 mhagain   Members   -  Reputation: 4032

Posted 02 March 2011 - 02:24 PM

There's always some additional overhead for instancing so you need to do a performance comparison between instanced and non-instanced and make an informed decision based on the result of that. However, based on your description it looks as though you can confidently use instancing here.

Regarding the storage vs draw calls question, storage is cheap and plentiful whereas performance isn't. Storage is a resource to be used, and if you're not using it then it's going to waste. So I always believe that - unless you're in a very specific situation where you know storage is limited (like on a mobile device) - in an age where 1GB GPUs are commonplace you should definitely be availing of storage in cases where doing so can increase performance. There's very little return on investment if you're scrimping and saving memory these days.


#4 Ashaman73   Members   -  Reputation: 4750

Posted 03 March 2011 - 03:42 AM

solenoidz wrote:
That works, but rendering lots of small meshes like rocks, flowers and grass blades leads to many draw calls, which brings my engine to its knees. I'm thinking of combining all of a terrain node's meshes into a single vertex buffer, transformed into world space, and rendering them in a single draw call. That would take more memory for the vertex data than keeping one mesh per kind and rendering it in multiple places. Should I use instancing in such a situation? Is there any additional overhead from instancing itself?

I use a combination of several techniques:
1. batching: my terrain is subdivided into tiles. All smaller meshes (grass, little stones etc.) are combined into a single mesh for each tile. When using a cache (e.g. LRU) you only need to hold N of these meshes in memory (N ~ 50-100).
2. instancing: all larger objects (trees, rocks etc.) are rendered using separate render calls (=> this could be optimized using geometry shaders).
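The LRU cache of per-tile batches from point 1 could look roughly like the sketch below. `TileBatch` and `TileBatchCache` are hypothetical names, and the batch is just a CPU-side vector standing in for a real vertex buffer; this is not the poster's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batched per-tile vertex buffer.
struct TileBatch {
    std::vector<float> vertices;  // world-space vertex data for the tile
};

// Keeps at most `capacity` tile batches and evicts the least recently used.
class TileBatchCache {
public:
    explicit TileBatchCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached batch for `tileId`, or nullptr on a miss.
    // A hit marks the tile as most recently used.
    TileBatch* find(int tileId) {
        auto it = index_.find(tileId);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    // Inserts (or replaces) the batch for `tileId`, evicting the least
    // recently used tile first if the cache is full.
    void insert(int tileId, TileBatch batch) {
        auto it = index_.find(tileId);
        if (it != index_.end()) {
            it->second->second = std::move(batch);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(tileId, std::move(batch));
        index_[tileId] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<int, TileBatch>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<int, std::list<Entry>::iterator> index_;
};
```

On a miss, the caller would rebuild the tile's combined mesh and `insert` it; with N around 50-100 as suggested above, only the tiles near the camera stay resident.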

#5 solenoidz   Members   -  Reputation: 319

Posted 03 March 2011 - 10:08 AM

Thank you, people.
I don't want to use instancing for the large meshes like trees. I already have quad-tree culling, and to cull the trees that do not occupy visible terrain patches I would most probably need to lock and update the instance vertex buffer, which could be slow? On the other hand, I need to switch to a low-res mesh as the distance to a tree increases, and eventually to billboards. How can I have these LOD transitions and culling with an "instanced forest"?

What about vertex texture fetch? Let me explain what I mean.
Make a mesh that covers a terrain patch, say an array of small rocks. When rendering, offset each vertex of that mesh to match the terrain below by sampling the terrain height texture in the vertex shader. Also sample a "distribution map" to see whether rocks are allowed at that position (if it's a road, skip the rocks, etc.). If rocks are not allowed at that position in the terrain patch, move the vertex below the ground in the vertex shader?
Is this a good idea? Is vertex texture fetch too slow on SM 3.0?
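The per-vertex logic described here can be tried out on the CPU before writing the shader. The sketch below only mirrors the idea in C++; `Grid`, `placeRockVertex` and the sink depth are hypothetical, the sampling is nearest-neighbour, and it assumes every vertex of one rock samples at the rock's anchor coordinate so the whole rock moves together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a single-channel texture with
// nearest-neighbour sampling at normalized coordinates in [0, 1].
struct Grid {
    std::size_t width, height;
    std::vector<float> texels;  // row-major, width * height entries

    float sample(float u, float v) const {
        assert(width > 0 && height > 0 && texels.size() == width * height);
        auto clampIndex = [](float t, std::size_t n) -> std::size_t {
            if (t <= 0.0f) return 0;
            std::size_t i = static_cast<std::size_t>(t * static_cast<float>(n));
            return i < n ? i : n - 1;
        };
        return texels[clampIndex(v, height) * width + clampIndex(u, width)];
    }
};

struct Vec3 { float x, y, z; };

// Mirrors the proposed vertex shader step: lift the vertex onto the terrain
// if rocks are allowed at (u, v), otherwise push it far below the ground.
// (u, v) is assumed to be the anchor of the rock this vertex belongs to.
inline Vec3 placeRockVertex(Vec3 local, float u, float v,
                            const Grid& heightMap, const Grid& distributionMap,
                            float sinkDepth = 1000.0f) {
    float ground = heightMap.sample(u, v);
    bool allowed = distributionMap.sample(u, v) > 0.5f;
    float y = allowed ? ground + local.y : ground - sinkDepth;
    return {local.x, y, local.z};
}
```

Note that hidden rocks are still submitted and transformed in this scheme; they are only moved out of sight, so the vertex cost is paid either way.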

#6 BattleMetalChris   Members   -  Reputation: 157

Posted 03 March 2011 - 11:16 AM

There's an article in GPU Gems 3 about using instancing with different models and LODs:

http://http.develope...gems3_ch02.html

It's used there for an animated crowd, but I think the part involving the use of different sets of vertices for the LODs would translate to your problem.
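One common way to combine LODs with instancing is to sort the visible instances into one list per LOD and issue one instanced draw per list. The sketch below shows only that bucketing step; it is not taken from the linked article, and the LOD names and distance thresholds are made-up values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical LOD levels: full mesh, low-res mesh, billboard.
enum Lod : std::size_t { kLodHigh = 0, kLodLow = 1, kLodBillboard = 2, kLodCount = 3 };

// Picks a LOD from the squared distance; the thresholds are illustrative.
inline Lod selectLod(float distSq, float lowStart = 50.0f, float billboardStart = 200.0f) {
    assert(lowStart <= billboardStart);
    if (distSq < lowStart * lowStart) return kLodHigh;
    if (distSq < billboardStart * billboardStart) return kLodLow;
    return kLodBillboard;
}

// Sorts visible tree positions into one instance list per LOD, so each
// list can be drawn with its own mesh in a single instanced call.
inline std::array<std::vector<Vec3>, kLodCount>
bucketByLod(const std::vector<Vec3>& visibleTrees, Vec3 eye) {
    std::array<std::vector<Vec3>, kLodCount> buckets;
    for (const Vec3& p : visibleTrees) {
        float dx = p.x - eye.x, dy = p.y - eye.y, dz = p.z - eye.z;
        buckets[selectLod(dx * dx + dy * dy + dz * dz)].push_back(p);
    }
    return buckets;
}
```

A forest then costs at most one draw call per tree type per LOD, however many trees are visible.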

#7 0xffffffff   Members   -  Reputation: 146

Posted 03 March 2011 - 11:17 AM

solenoidz wrote:
...I most probably need to lock/update the instance vertex buffer, which could be slow?


It isn't slow. We draw thousands of flora instances this way, each one individually culled against the view frustum. The time to build the instance buffer each frame is negligible. We did design the per-instance data to be as compact as possible to facilitate this, however.
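A per-frame rebuild of that kind might look like the sketch below. The 16-byte `InstanceData` layout, the bounding-sphere test and all names are assumptions, not the poster's actual code, and the API-specific lock/copy/unlock of the dynamic vertex buffer is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane with a unit normal; points with dot(n, p) + d >= 0 are inside.
struct Plane { Vec3 n; float d; };

// Compact per-instance record (16 bytes): position plus uniform scale.
// The layout is an assumption chosen to keep the upload small.
struct InstanceData { float x, y, z, scale; };

struct Flora { Vec3 position; float scale; float boundingRadius; };

// Bounding-sphere test against the six frustum planes.
inline bool sphereInFrustum(const std::array<Plane, 6>& frustum, Vec3 c, float r) {
    for (const Plane& p : frustum) {
        float dist = p.n.x * c.x + p.n.y * c.y + p.n.z * c.z + p.d;
        if (dist < -r) return false;  // entirely outside this plane
    }
    return true;
}

// Refills `out` with one record per visible instance and returns the count.
// `out` is reused across frames so its storage is not reallocated each time.
inline std::size_t buildInstanceBuffer(const std::vector<Flora>& all,
                                       const std::array<Plane, 6>& frustum,
                                       std::vector<InstanceData>& out) {
    out.clear();
    for (const Flora& f : all) {
        assert(f.boundingRadius >= 0.0f);
        if (sphereInFrustum(frustum, f.position, f.boundingRadius))
            out.push_back({f.position.x, f.position.y, f.position.z, f.scale});
    }
    return out.size();
}
```

The returned count is what would be passed as the instance count of the draw call after copying `out` into the instance stream.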

#8 smasherprog   Members   -  Reputation: 416

Posted 03 March 2011 - 08:55 PM

In support of the post above me...

Typically, when instancing you need an additional 16 * 4 = 64 bytes of data per instance (a 4x4 matrix of floats). So, if you have 1000 instanced trees that you want to draw each frame and you rebuild ONLY THE INSTANCE buffer, then it's 64 * 1000 = 64,000 bytes of data you need to update and send to the video card each frame. If that hurts your frame rate in any way... then you did it wrong :P

Now, if you use billboards, you can cut that down to only 12 bytes per instance if they are camera facing: you just need to send the position instead of an entire matrix like above.
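The arithmetic above can be checked at compile time. The struct names in this small sketch are hypothetical; the sizes assume 4-byte floats.

```cpp
#include <cstddef>

// Full transform: a 4x4 float matrix per instance.
struct MatrixInstance { float m[16]; };

// Camera-facing billboard: position only.
struct BillboardInstance { float x, y, z; };

static_assert(sizeof(MatrixInstance) == 64, "16 floats * 4 bytes");
static_assert(sizeof(BillboardInstance) == 12, "3 floats * 4 bytes");

// Bytes rebuilt and uploaded per frame for `count` instances.
template <typename Instance>
constexpr std::size_t bytesPerFrame(std::size_t count) {
    return sizeof(Instance) * count;
}
```

For 1000 instances this gives 64,000 bytes per frame with matrices and 12,000 bytes with positions only. At an assumed 60 frames per second, the matrix case is about 3.84 MB per second of upload.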

#9 solenoidz   Members   -  Reputation: 319

Posted 16 March 2011 - 11:43 AM

Thank you, people. Well, I do use billboards for the distant representation of the trees, but I don't use instancing right now; I draw a vertex buffer that contains only 3D positions and texture coordinates. All 6 vertices of a billboard are at the same position, and in the vertex shader I calculate the right vector from the camera eye position and "spread" the vertices along the right and up directions to form a camera-facing billboard. I also do a vertex texture fetch for Y offsets, but that's a different story.

Actually, I didn't mean to bump the thread, but I have a new question. Right now, I'm using a quad-tree to do frustum culling, and I store pointers to the geometry in the quad-tree leaves. It seems to do what it's supposed to do and I don't think it bottlenecks the rendering for the scenes I make. However, I'm thinking of using a regular grid to store objects instead: make a grid with a fixed cell size and traverse the visible cells using an algorithm similar to the one used for rasterization, finding the cells that are intersected by a line segment. I may need them sorted by view depth as well, for occlusion culling. What do you think about a plain grid vs. the quad-tree structure for view frustum culling of a large world? I heard that traversing tree-like structures could become a bottleneck for huge worlds.
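For the regular-grid idea, the simplest starting point is a conservative test: take the bounding rectangle of the view frustum's footprint on the ground and visit every cell it overlaps. This is a coarser technique than the line-segment traversal described above, and it does no depth sorting. The grid layout (XZ plane, origin at the world origin) and all names in the sketch are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical uniform grid over the XZ plane, starting at the world origin.
struct Grid {
    float cellSize;
    int cellsX, cellsZ;
};

// Inclusive cell range; empty if x0 > x1 or z0 > z1.
struct CellRange { int x0, z0, x1, z1; };

// Cells overlapped by the rectangle [minX, maxX] x [minZ, maxZ], clamped
// to the grid. The rectangle is assumed to bound the visible ground area.
inline CellRange cellsOverlapping(const Grid& g, float minX, float minZ,
                                  float maxX, float maxZ) {
    assert(g.cellSize > 0.0f && g.cellsX > 0 && g.cellsZ > 0);
    auto toCell = [&](float w) { return static_cast<int>(std::floor(w / g.cellSize)); };
    CellRange r;
    r.x0 = std::max(toCell(minX), 0);
    r.z0 = std::max(toCell(minZ), 0);
    r.x1 = std::min(toCell(maxX), g.cellsX - 1);
    r.z1 = std::min(toCell(maxZ), g.cellsZ - 1);
    return r;
}

// Flat indices (z * cellsX + x) of the overlapped cells, row by row.
inline std::vector<int> visibleCells(const Grid& g, const CellRange& r) {
    std::vector<int> cells;
    for (int z = r.z0; z <= r.z1; ++z)
        for (int x = r.x0; x <= r.x1; ++x)
            cells.push_back(z * g.cellsX + x);
    return cells;
}
```

Finding the candidate cells costs two divisions per axis regardless of world size, which is the main appeal of a grid; the objects in those cells would still need a per-object frustum test.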

#10 Manoel Balbino   Members   -  Reputation: 132

Posted 17 March 2011 - 10:16 AM

smasherprog wrote:
In support of the post above me...

Typically, when instancing you need an additional 16 * 4 = 64 bytes of data per instance. So, if you have 1000 instanced trees that you want to draw each frame and you rebuild ONLY THE INSTANCE buffer, then it's 64 * 1000 = 64,000 bytes of data you need to update and send to the video card each frame. If that hurts your frame rate in any way... then you did it wrong :P

Now, if you use billboards, you can cut that down to only 12 bytes if they are camera facing: you just need to send the position instead of an entire matrix like above.

If you use a quaternion instead of a full matrix, that'd be 10 * 4 = 40 bytes per instance (3 floats of position + 4 of rotation + 3 of scale). You could also use a 4x3 matrix instead of a 4x4. However, I'm unsure if losing alignment would impact performance.
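The two layouts can be written down and their sizes checked at compile time. The struct names and the `isFloat4Multiple` helper in this sketch are hypothetical, and the sizes assume 4-byte floats.

```cpp
#include <cstddef>

// Position + rotation quaternion + per-axis scale: 3 + 4 + 3 = 10 floats.
struct QuatInstance {
    float position[3];
    float rotation[4];  // quaternion (x, y, z, w)
    float scale[3];
};

// 4x3 matrix: the constant part of an affine 4x4 transform is dropped.
struct Matrix4x3Instance { float m[12]; };

static_assert(sizeof(QuatInstance) == 40, "10 floats * 4 bytes");
static_assert(sizeof(Matrix4x3Instance) == 48, "12 floats * 4 bytes");

// True when a record is a whole number of 16-byte float4 elements.
constexpr bool isFloat4Multiple(std::size_t bytes) { return bytes % 16 == 0; }
```

The 48-byte 4x3 record is still a whole number of float4 elements, while the 40-byte quaternion record is not. Whether that affects performance depends on the hardware and vertex declaration, so it would need measuring.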





