Foliage on Trees - Best Approach - Transforms or Batching ?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

5 replies to this topic

#1 PhillipHamlyn   Members   -  Reputation: 454


Posted 05 April 2013 - 04:36 AM

Hi,

 

I have a number of tree models cribbed from Sketchup8's model repository and have converted them through a Collada pre-processor. The tree models themselves are reasonably small and make extensive use of a simple set of foliage quads transformed using a local matrix to be placed all around the tree.

 

I realise that the total number of triangles to be drawn is the same in either case, but for performance, which would be the best approach to rendering my mesh ?

 

  1. Render each foliage quad using the model transform matrix for that "leaf", giving a large number of calls but a small vertex buffer.
  2. Pre-calculate the model position of each "leaf" using the model transform, put the baked-in vertex positions into a vertex buffer, and accept a small number of calls but a large vertex buffer.

Since the accepted wisdom is to batch, I would have thought option 2 would give the best performance, but I want to know if I'm missing a technique here.
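
For illustration only, here is a minimal sketch of what option 2 amounts to: each leaf quad is pushed through its local matrix once, up front, and the results are appended to a single vertex and index list. It is written in Python rather than XNA code, it uses a column-vector convention, and every name in it (`UNIT_QUAD`, `translation`, `bake_leaves`) is a made-up assumption, not something from the actual project.

```python
# Sketch of option 2: bake every leaf's local transform into one vertex list.
# All names here are illustrative assumptions, not part of any real engine API.

# A unit quad in leaf-local space, as four (x, y, z) corners.
UNIT_QUAD = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
# Two triangles per quad, indexing into the four corners.
QUAD_INDICES = [0, 1, 2, 0, 2, 3]


def translation(tx, ty, tz):
    """Row-major 4x4 matrix that moves a point by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (w is assumed to be 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def bake_leaves(leaf_matrices):
    """Return (vertices, indices) for all leaves merged into one mesh."""
    vertices, indices = [], []
    for m in leaf_matrices:
        base = len(vertices)  # index offset for this leaf's four corners
        vertices.extend(transform_point(m, p) for p in UNIT_QUAD)
        indices.extend(base + i for i in QUAD_INDICES)
    return vertices, indices


leaf_matrices = [translation(0.0, 2.0, 0.0), translation(1.0, 3.0, 0.5)]
vertices, indices = bake_leaves(leaf_matrices)
print(len(vertices), "vertices and", len(indices), "indices for one draw call")
```

The baked lists would then be uploaded once as a static vertex and index buffer, so the per-leaf matrices are no longer needed at draw time.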

 

Thanks

 

Phillip


Edited by PhillipHamlyn, 05 April 2013 - 04:36 AM.



#2 Dawoodoz   Members   -  Reputation: 331


Posted 05 April 2013 - 04:44 AM

Option 2: merge everything into one model. Triangles cost almost nothing compared to draw calls, because most engines are bottlenecked on CPU-to-GPU communication. You should also try reducing the detail level, since 6 quads facing the camera usually look better than 10000 quads in random directions.


Edited by Dawoodoz, 05 April 2013 - 04:46 AM.

My open source DirectX 10/11 graphics engine. https://sites.google.com/site/dawoodoz

"My design pattern is the simplest to understand. Everyone else is just too stupid to understand it."


#3 eppo   Crossbones+   -  Reputation: 2629


Posted 05 April 2013 - 07:41 AM

It really depends on how many leaves/trees you're going to draw.

 

If you want to draw a large number of trees, you're going to end up with enormous vertex buffers. The demands on video memory will grow rapidly with every baked tree you add, so unless you have a way of quickly streaming large chunks of mesh data in and out of video memory, your foliage will end up eating memory.

 

3. Hardware instancing.

 

Instanced leaves can be placed in smaller 16 bit vertex buffers accompanied by a texture (buffer) filled with transformation matrices. Combined they take up less memory, and can still be drawn in a single batch.
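
As a rough back-of-the-envelope check of the memory claim, the sketch below compares a fully baked buffer against one shared quad plus a matrix per leaf. Every size in it is an assumption picked for illustration (a 32-byte vertex, 16-bit indices, a 4x4 float matrix per instance), and the function names are hypothetical.

```python
# Rough memory comparison of baked vs. instanced leaves.
# Every size below is an assumption chosen for illustration.
VERTEX_STRIDE = 32     # bytes: position (12) + normal (12) + texcoord (8)
INDEX_SIZE = 2         # bytes: 16-bit indices
MATRIX_SIZE = 64       # bytes: 4x4 matrix of 32-bit floats
VERTS_PER_QUAD = 4
INDICES_PER_QUAD = 6

# Storage for a single quad: four vertices plus six indices.
QUAD_BYTES = VERTS_PER_QUAD * VERTEX_STRIDE + INDICES_PER_QUAD * INDEX_SIZE


def baked_bytes(leaf_count):
    """Every leaf stores its own four vertices and six indices."""
    return leaf_count * QUAD_BYTES


def instanced_bytes(leaf_count):
    """One shared quad plus one transform matrix per leaf."""
    return QUAD_BYTES + leaf_count * MATRIX_SIZE


for leaves in (1_000, 10_000, 100_000):
    print(leaves, baked_bytes(leaves), instanced_bytes(leaves))
```

With these assumed sizes the instanced layout needs a bit under half the memory. The result is sensitive to the vertex format: if a vertex carried only a 12-byte position, a baked quad (60 bytes) would cost about the same as one 64-byte matrix, and the saving would only show up when the instanced unit is larger than a single quad.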

 

The drawback with static instance buffers is they make depth sorting harder and since all leaves carry the same vertex data, it's hard to bake local information like ambient occlusion.
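
To make the depth-sorting point concrete, here is a small sketch of ordering instances back to front for a given camera position. The function names and sample positions are assumptions for illustration; a real renderer would reorder the per-instance transforms in the same way.

```python
# Sketch: order per-instance positions back to front for alpha blending.
# sort_back_to_front and the sample data are illustrative assumptions.

def squared_distance(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sort_back_to_front(instance_positions, camera_position):
    """Return the instances ordered from farthest to nearest the camera."""
    return sorted(
        instance_positions,
        key=lambda p: squared_distance(p, camera_position),
        reverse=True,
    )


leaves = [(0.0, 2.0, 1.0), (0.0, 2.0, 9.0), (0.0, 2.0, 4.0)]

# The same leaves come out in opposite orders from two opposite viewpoints.
from_front = sort_back_to_front(leaves, (0.0, 2.0, 0.0))
from_behind = sort_back_to_front(leaves, (0.0, 2.0, 10.0))
print(from_front)
print(from_behind)
```

Because the correct order changes with the camera, an instance buffer that is written once cannot hold a valid order for every viewpoint; it would have to be re-sorted and re-uploaded, which makes it dynamic rather than static.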


Edited by eppo, 05 April 2013 - 07:45 AM.


#4 PhillipHamlyn   Members   -  Reputation: 454


Posted 05 April 2013 - 08:20 AM

It really depends on how many leaves/trees you're going to draw.

 

If you want to draw a large number of trees, you're going to end up with enormous vertex buffers. The demands on video memory will grow rapidly with every baked tree you add, so unless you have a way of quickly streaming large chunks of mesh data in and out of video memory, your foliage will end up eating memory.

 

3. Hardware instancing.

 

Instanced leaves can be placed in smaller 16 bit vertex buffers accompanied by a texture (buffer) filled with transformation matrices. Combined they take up less memory, and can still be drawn in a single batch.

 

The drawback with static instance buffers is they make depth sorting harder and since all leaves carry the same vertex data, it's hard to bake local information like ambient occlusion.

 

Eppo,

 

Thanks for the response.

 

I draw each tree individually, but my concern is that some of the trees are >20k triangles built from only a small number of unique vertices (say around 500), so what I'm after is the best way of arranging my mesh for rendering speed. It seems both you and Dawoodoz recommend not calling the shader with a new transform matrix for each "leaf", so I'm happy I'm on the right track. I haven't looked into hardware instancing at all on my project; I noticed a few comments on GameDev suggesting it is a somewhat old approach now, and that GPU speed has outstripped the need for it. Can you point me to a URL that demonstrates or discusses using texture buffers to pass transformation matrices? I've not come across that form of hardware instancing before.

 

Phillip



#5 eppo   Crossbones+   -  Reputation: 2629


Posted 05 April 2013 - 09:06 AM

I overlooked the fact that you're using XNA, which means 9.3-style stream-based instancing is your only option. This is an older method, but the results are the same when it comes to basic instancing.

 

This page explains it well.

 

The technique is still relevant, since it is still desirable to limit the number of draw calls, the amount of memory used, and the amount of dynamic data uploaded per frame.


Edited by eppo, 05 April 2013 - 09:18 AM.


#6 PhillipHamlyn   Members   -  Reputation: 454


Posted 05 April 2013 - 09:19 AM

I overlooked the fact that you're using XNA, which means 9.3-style stream-based instancing is your only option. This is an older method, but results are the same.

 

This page explains it well.

 

The technique is still relevant, as it is still desired to limit the number of draw calls, limit the amount of memory used and limit the amount of dynamic data uploaded per frame.

 

Now that; ... that is cool.

 

I hadn't realised that hardware instancing multiplies the number of calls to the shader by the number of instances in the instance buffer - I had mistakenly assumed that it was a method of merging two vertex buffers together (i.e. using two streams of data to create one combined vertex buffer). Don't know where I got that idea from. I will definitely pursue this.
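
A CPU-side sketch of that behaviour, as I now understand it: one draw call walks every instance, and for each instance the vertex shader runs over every vertex of the shared mesh. This is a conceptual illustration only; `vertex_shader` and `draw_instanced` are made-up stand-ins, not XNA or Direct3D calls, and the per-instance data is simplified to a translation offset instead of a full matrix.

```python
# CPU-side illustration of what an instanced draw does conceptually.
# vertex_shader and draw_instanced are illustrative stand-ins, not a real API.

def vertex_shader(vertex, instance_offset):
    """Combine one mesh vertex with one instance's data (here a translation)."""
    return tuple(v + o for v, o in zip(vertex, instance_offset))


def draw_instanced(mesh_vertices, instance_offsets):
    """One 'draw call': the shader runs once per (instance, vertex) pair."""
    output = []
    for offset in instance_offsets:      # stream 1: per-instance data
        for vertex in mesh_vertices:     # stream 0: shared mesh vertices
            output.append(vertex_shader(vertex, offset))
    return output


quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]

result = draw_instanced(quad, offsets)
print(len(result), "shader invocations from a single draw call")
```

So four vertices and three instances give twelve shader invocations, while the buffers only hold four vertices and three sets of instance data.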

 

Phillip





