Foliage on Trees - Best Approach - Transforms or Batching ?

4 comments, last by PhillipHamlyn 11 years, 1 month ago

Hi,

I have a number of tree models cribbed from Sketchup8's model repository and have converted them through a Collada pre-processor. The tree models themselves are reasonably small and make extensive use of a simple set of foliage quads transformed using a local matrix to be placed all around the tree.

I realise that the total number of triangles to be drawn is the same in either case, but for performance, which would be the best approach to rendering my mesh ?

  1. Render each foliage quad using the model transform matrix for that "leaf", giving a large number of calls but a small vertex buffer.
  2. Pre-calculate the model-space position of each "leaf" using its transform, put the baked-in vertex positions into a single vertex buffer, and accept a small number of calls but a large vertex buffer.

Since the accepted wisdom is to batch, I would have thought option 2 would give the best performance, but I want to know if I'm missing a technique here.

Thanks

Phillip


Option 2: merge everything into one model. Triangles cost almost nothing compared to draw calls, since most engines are bottlenecked on CPU-to-GPU communication over the motherboard. You should also try reducing the detail level, since 6 quads facing the camera usually look better than 10,000 quads in random directions.
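A minimal sketch of what merging (option 2) could look like, written in Python rather than XNA purely for illustration. Every name here (`QUAD`, `translation`, `bake_leaves`) is hypothetical, the leaf matrices are simple translations, and the column-vector matrix convention is an assumption; a real Collada import would supply full local matrices.

```python
# Sketch: bake per-leaf transforms into one merged vertex/index list.
# Assumption: 4x4 matrices as nested lists, column-vector convention
# (translation stored in the last column).

QUAD = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
QUAD_INDICES = [0, 1, 2, 0, 2, 3]  # two triangles per quad


def translation(tx, ty, tz):
    """Build a translation-only matrix (stand-in for a real leaf matrix)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def transform_point(m, p):
    """Apply the upper three rows of m to point p, with w assumed to be 1."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def bake_leaves(leaf_matrices):
    """Transform the shared quad once per leaf and append it to one big mesh."""
    vertices, indices = [], []
    for m in leaf_matrices:
        base = len(vertices)  # index offset for this leaf's four vertices
        vertices.extend(transform_point(m, p) for p in QUAD)
        indices.extend(base + i for i in QUAD_INDICES)
    return vertices, indices


leaves = [translation(0, 2, 0), translation(1, 3, 0.5), translation(-1, 2.5, -0.5)]
vertices, indices = bake_leaves(leaves)
print(len(vertices), len(indices))  # 12 18
```

The baking runs once at load time (or in the content pipeline), after which the whole tree can be submitted with a single draw call.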

It really depends on how many leaves/trees you're going to draw.

If you want to draw a large number of trees, you're going to end up with enormous vertex buffers. The demands on video memory will grow rapidly with every baked tree you add, so unless you have a way of quickly streaming large chunks of mesh data in and out of video memory, your foliage will end up eating memory.

3. Hardware instancing.

Instanced leaves can be placed in smaller 16-bit vertex buffers accompanied by a texture (buffer) filled with transformation matrices. Combined, they take up less memory and can still be drawn in a single batch.

The drawback with static instance buffers is that they make depth sorting harder, and since all leaves carry the same vertex data, it's hard to bake local information like ambient occlusion.
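The memory argument for instancing can be checked with rough arithmetic. The sketch below is only an estimate under stated assumptions: a 32-byte vertex (position, normal, UV), one 64-byte 4x4 float matrix per instance, and a hypothetical tree with 10,000 leaf quads. None of these figures come from the thread's actual models.

```python
# Rough memory estimate: baked leaf quads versus one instanced quad.
# All sizes are assumptions for illustration, not measurements.

VERTEX_BYTES = 32      # assumed: float3 position + float3 normal + float2 UV
MATRIX_BYTES = 64      # assumed: one 4x4 float matrix per instance
VERTS_PER_QUAD = 4
INDICES_PER_QUAD = 6


def baked_bytes(leaf_count):
    """Every leaf gets its own four vertices and six indices."""
    vertex_count = leaf_count * VERTS_PER_QUAD
    # 16-bit indices can only address 65,536 vertices; beyond that use 32-bit.
    index_bytes = 2 if vertex_count <= 65536 else 4
    return vertex_count * VERTEX_BYTES + leaf_count * INDICES_PER_QUAD * index_bytes


def instanced_bytes(leaf_count):
    """One shared quad (16-bit indices) plus one matrix per leaf."""
    quad = VERTS_PER_QUAD * VERTEX_BYTES + INDICES_PER_QUAD * 2
    return quad + leaf_count * MATRIX_BYTES


leaf_count = 10_000
print(baked_bytes(leaf_count))      # 1400000
print(instanced_bytes(leaf_count))  # 640140
```

Under these assumptions the instanced layout is a bit less than half the size. The gap would widen if the per-instance data were smaller than a full matrix, or if each leaf were more than a single quad.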

Eppo,

Thanks for the response.

I draw each tree individually, but my concern was that some of the trees are >20k triangles, of which only a small number of vertices are unique (say around 500), so what I'm after is the best way of arranging my mesh for rendering speed. It seems both you and Dawoodoz recommend not calling the shader with a new transform matrix for each "leaf", so I'm happy I'm on the right track. I haven't looked into hardware instancing at all on my project; I noticed a few comments on GameDev saying it is a somewhat old approach now and that GPU speed has outstripped the need for it. Can you give me a URL that demonstrates or discusses using texture buffers to pass transformation matrices? I've not come across that example of hardware instancing before.

Phillip

I overlooked the fact that you're using XNA, which means 9.3-style stream-based instancing is your only option. This is an older method, but the results are the same when it comes to basic instancing.

This page explains it well.

The technique is still relevant, as it is still desirable to limit the number of draw calls, the amount of memory used, and the amount of dynamic data uploaded per frame.

Now that... that is cool.

I hadn't realised that hardware instancing multiplies the number of calls to the shader by the number of instances in the instance buffer - I had mistakenly assumed that it was a method of merging two vertex buffers together (i.e. using two streams of data to create one combined vertex buffer). Don't know where I got that idea from. I will definitely pursue this.
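That behaviour can be mimicked on the CPU to make it concrete. The sketch below is not XNA code: `draw_instanced` and `vertex_shader` are hypothetical stand-ins, and the per-instance data is reduced to a plain offset instead of a full matrix. It only shows that the shader runs once per vertex for every entry in the instance stream.

```python
# CPU emulation of an instanced draw: the "vertex shader" is invoked for
# every (instance, vertex) pair, reading from two separate streams.
# Names and data layout are illustrative assumptions only.

QUAD = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
INSTANCES = [(0.0, 2.0, 0.0), (1.0, 3.0, 0.5), (-1.0, 2.5, -0.5)]  # per-leaf offsets


def vertex_shader(vertex, instance_offset):
    """Combine per-vertex data with per-instance data (here: add an offset)."""
    return tuple(v + o for v, o in zip(vertex, instance_offset))


def draw_instanced(vertices, instances, shader):
    """Replay the whole vertex stream once for each instance."""
    output = []
    for instance in instances:      # advances once per instance
        for vertex in vertices:     # advances once per vertex
            output.append(shader(vertex, instance))
    return output


transformed = draw_instanced(QUAD, INSTANCES, vertex_shader)
print(len(transformed))  # 12 shader invocations: 4 vertices x 3 instances
```

On the GPU the same multiplication happens inside a single draw call, which is why the vertex buffer stays small while the call count stays low.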

Phillip
