PhillipHamlyn

Foliage on Trees - Best Approach - Transforms or Batching ?


PhillipHamlyn    579

Hi,

 

I have a number of tree models cribbed from Sketchup8's model repository and have converted them through a Collada pre-processor. The tree models themselves are reasonably small and make extensive use of a simple set of foliage quads transformed using a local matrix to be placed all around the tree.

 

I realise that the total number of triangles to be drawn is the same in either case, but for performance, which would be the best approach to rendering my mesh ?

 

  1. Render each foliage quad using the model transform matrix for that "leaf", giving a large number of calls but a small vertex buffer.
  2. Pre-calculate the model position of each "leaf" using the model transform, and put the baked in vertex positions into a vertex buffer, and accept a small number of calls but a large vertex buffer.

Since the accepted wisdom is to batch, I would have thought option 2 would give the best performance, but I want to know if I'm missing a technique here.

 

Thanks

 

Phillip

Dawoodoz    461

Option 2: merge everything into one model. Triangles cost almost nothing compared to draw calls, since most engines have their bottleneck in the CPU-to-GPU communication. You should also try reducing the detail level, since 6 quads facing the camera usually look better than 10,000 quads in random directions.
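
For illustration, here is a minimal sketch of what "merging everything into one model" could look like on the CPU side. It is plain Python rather than XNA code, and every name in it (`UNIT_QUAD`, `translation`, `bake_leaves`) is hypothetical. It also assumes leaf transforms are simple 4x4 matrices applied to column vectors; the real Collada data may use a different convention.

```python
# Hypothetical sketch: bake per-leaf transforms into one merged mesh.
# Matrices are 4x4 tuples of rows; points are treated as column vectors.

UNIT_QUAD = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
QUAD_INDICES = [0, 1, 2, 0, 2, 3]  # two triangles per quad


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix (stand-in for a real leaf transform)."""
    return (
        (1.0, 0.0, 0.0, tx),
        (0.0, 1.0, 0.0, ty),
        (0.0, 0.0, 1.0, tz),
        (0.0, 0.0, 0.0, 1.0),
    )


def transform_point(m, p):
    """Apply a 4x4 affine matrix to a 3D point (w is assumed to be 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def bake_leaves(leaf_matrices):
    """Return (vertices, indices) with every leaf quad merged into one mesh."""
    vertices, indices = [], []
    for m in leaf_matrices:
        base = len(vertices)  # index offset for this leaf's four corners
        vertices.extend(transform_point(m, p) for p in UNIT_QUAD)
        indices.extend(base + i for i in QUAD_INDICES)
    return vertices, indices


leaves = [translation(0.0, 2.0, 0.0), translation(1.0, 3.0, -1.0)]
baked_vertices, baked_indices = bake_leaves(leaves)
```

The resulting vertex and index lists would go into one vertex buffer and one index buffer, so the whole tree's foliage can be drawn with a single call.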

eppo    4877

It really depends on how many leaves/trees you're going to draw.

 

If you want to draw a large number of trees, you're going to end up with enormous vertex buffers. The demands on video memory will grow rapidly with every baked tree you add, so unless you have a way of quickly streaming large chunks of mesh data in and out of video memory, your foliage will end up eating memory.

 

3. Hardware instancing.

 

Instanced leaves can be placed in smaller 16-bit vertex buffers accompanied by a texture (buffer) filled with transformation matrices. Combined they take up less memory, and they can still be drawn in a single batch.
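
As a rough back-of-the-envelope check on the memory claim, the sketch below compares a baked leaf mesh with an instanced one. All the sizes are assumptions chosen for illustration, not figures from this thread: a 32-byte vertex, 2-byte indices, one 64-byte 4x4 float matrix per leaf, and a hypothetical tree with 10,000 leaf quads.

```python
# Assumed sizes, for illustration only.
VERTEX_BYTES = 32     # position (12) + normal (12) + texcoord (8)
INDEX_BYTES = 2       # 16-bit indices
MATRIX_BYTES = 64     # one 4x4 float32 transform per instanced leaf
VERTS_PER_QUAD = 4
INDICES_PER_QUAD = 6

QUAD_BYTES = VERTS_PER_QUAD * VERTEX_BYTES + INDICES_PER_QUAD * INDEX_BYTES


def baked_bytes(leaf_count):
    """Every leaf stores its own four vertices and six indices."""
    return leaf_count * QUAD_BYTES


def instanced_bytes(leaf_count):
    """One shared quad plus one transform matrix per leaf."""
    return QUAD_BYTES + leaf_count * MATRIX_BYTES


leaf_count = 10_000  # hypothetical tree
print("baked:    ", baked_bytes(leaf_count), "bytes")
print("instanced:", instanced_bytes(leaf_count), "bytes")
```

Under these assumptions each baked leaf costs 140 bytes against 64 bytes for an instanced one. The real ratio depends on the vertex layout and on how compactly the per-instance transform is stored.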

 

The drawback with static instance buffers is that they make depth sorting harder, and since all leaves carry the same vertex data, it's hard to bake in local information like ambient occlusion.

PhillipHamlyn    579

Eppo,

 

Thanks for the response.

 

I draw each tree individually, but my concern is that some of the trees are >20k triangles built from only a small number of unique vertices (say around 500), so what I'm after is the best way of arranging my mesh for rendering speed. It seems both you and Dawoodoz recommend not calling the shader with a new transform matrix for each "leaf", so I'm happy I'm on the right track. I haven't looked into hardware instancing at all on my project; I noticed a few comments on GameDev about it being a somewhat old approach now, with GPU speed having outstripped the need for it. Can you point me to a URL which demonstrates or discusses using texture buffers to pass transformation matrices? I've not come across that example of hardware instancing before.

 

Phillip

eppo    4877

I overlooked the fact that you're using XNA, which means 9.3-style stream-based instancing is your only option. This is an older method, but the results are the same when it comes to basic instancing.

 

This page explains it well.

 

The technique is still relevant, as you still want to limit the number of draw calls, the amount of memory used and the amount of dynamic data uploaded per frame.

PhillipHamlyn    579

Now that; ... that is cool.

 

I hadn't realised that hardware instancing multiplies the number of calls to the shader by the number of instances in the instance buffer - I had mistakenly assumed that it was a method of merging two vertex buffers together (i.e. using two streams of data to create one combined vertex buffer). Don't know where I got that idea from. I will definitely pursue this.
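
To make that mental model concrete, here is a small CPU-side emulation of a single instanced draw. It is only a sketch: the names `vertex_shader` and `draw_instanced` are hypothetical, per-instance data is reduced to a simple offset instead of a full matrix, and real hardware may cache or reuse vertex results, so the invocation count is the conceptual one rather than a measured one.

```python
def vertex_shader(position, instance_offset):
    """Stand-in for a vertex shader: combine per-vertex and per-instance data."""
    return tuple(p + o for p, o in zip(position, instance_offset))


def draw_instanced(vertices, instances):
    """Emulate one instanced draw call; returns outputs and invocation count."""
    outputs = []
    for inst in instances:      # the instance stream advances once per instance
        for v in vertices:      # the geometry stream is replayed for each instance
            outputs.append(vertex_shader(v, inst))
    return outputs, len(outputs)


quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(0.0, 2.0, 0.0), (5.0, 3.0, 1.0), (-2.0, 4.0, 0.5)]

transformed, invocations = draw_instanced(quad, offsets)
print(invocations)  # vertices per quad times number of instances
```

The geometry buffer stays at four vertices however many leaves are drawn; only the instance list grows.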

 

Phillip
