Instancing for trees/foliage

Started by
8 comments, last by CC Ricers 11 years, 2 months ago

Hi,

I was thinking about creating trees for my game. I have implemented instancing, but after creating 300 trees everything became very slow. After that, I was thinking about combining instancing with octrees, but I wonder whether it's overkill. Would multiple individual tree objects (without instancing) plus an octree be enough? That would let me avoid the more complicated combination of octrees and instanced data.

I am talking about a bigger terrain, let's say 1/3 the size of Skyrim's.

I know that I could check it myself, but it would take me at least a few days to implement and I don't know if it's even worth it, so maybe some of you already know?

Thanks for any advice.

Without profiling you won't know what the slow part is, so try to do that.

You should certainly be culling trees that are offscreen, but there's no way you would need an octree with a mere 300, so I doubt that's the sole solution. Once you have more, yes, you will need some sort of structure.
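An octree is one option once the count grows. For terrain that is mostly flat, a uniform 2D grid over the ground plane is a simpler structure that does the same coarse job; the sketch below swaps that in for illustration. `build_grid`, `query_radius` and the cell size are hypothetical, and the result is only a candidate list that still needs a per-object frustum test.

```python
from collections import defaultdict
from math import floor

def build_grid(positions, cell_size):
    """Bucket object indices by the (x, z) ground-plane cell that contains them."""
    grid = defaultdict(list)
    for index, (x, _y, z) in enumerate(positions):
        grid[(floor(x / cell_size), floor(z / cell_size))].append(index)
    return grid

def query_radius(grid, cell_size, center, radius):
    """Coarse query: indices of objects in every cell overlapping the square
    [cx - radius, cx + radius] x [cz - radius, cz + radius]."""
    cx, _cy, cz = center
    result = []
    for i in range(floor((cx - radius) / cell_size), floor((cx + radius) / cell_size) + 1):
        for k in range(floor((cz - radius) / cell_size), floor((cz + radius) / cell_size) + 1):
            result.extend(grid.get((i, k), []))
    return result

# Hypothetical example: three trees, 10-unit cells, query around the origin.
trees = [(5.0, 0.0, 5.0), (15.0, 0.0, 5.0), (105.0, 0.0, 105.0)]
grid = build_grid(trees, 10.0)
print(sorted(query_radius(grid, 10.0, (0.0, 0.0, 0.0), 12.0)))  # [0, 1]
```

Cell size is a trade-off: too small and a query visits many empty cells, too large and each cell returns many trees that the frustum test then has to reject.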

Instancing is useful: it saves memory and speeds up rendering, but it may not be worth the complexity (as long as you still share buffers and textures). You'll only find out by experimenting.
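To make the "share buffers" point concrete, here is a minimal sketch, assuming each scene object is a hypothetical `(mesh_id, transform)` pair: objects that share a mesh can be grouped so that each distinct mesh is submitted once as an instanced draw, instead of once per object.

```python
from collections import defaultdict

def group_by_mesh(objects):
    """Map each mesh id to the list of per-instance transforms that use it."""
    batches = defaultdict(list)
    for mesh_id, transform in objects:
        batches[mesh_id].append(transform)
    return dict(batches)

def draw_call_count(objects, instanced):
    """One draw per object without instancing; one instanced draw per distinct mesh with it."""
    return len(group_by_mesh(objects)) if instanced else len(objects)

# Hypothetical scene: 300 trees sharing one mesh plus 50 rocks sharing another.
# The integers stand in for real per-instance transforms.
scene = [("tree", i) for i in range(300)] + [("rock", i) for i in range(50)]
print(draw_call_count(scene, instanced=False))  # 350
print(draw_call_count(scene, instanced=True))   # 2
```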

Thanks for the answer.

The 300 trees were just an example; I expect over 1k, plus other objects like rocks, etc.

Actually, I wasn't asking whether some method would give me 3 more FPS, but whether using separate objects for the trees (no instancing, only an octree) could become a hard bottleneck. I have never written a 3D game before, so I don't have a general picture of this yet. I'm going to experiment with plain objects + an octree, then.

a. Define "Slow"

b. What was the vertex count of those trees?

c. Do you use a heavy vertex shader?

d. Do you alpha blend or alpha test the leaves?

e. Do you also alpha blend/test the trunk?

f. Do you write your own custom depth? (you shouldn't be doing that)

g. Are you doing shadow maps? If so, are you using cascades?

h. Don't think in terms of gaining "3 fps"; think in terms of saving "10 ms" (milliseconds) per frame.

There wasn't nearly enough information to help you.
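Item h in numbers: frame time is the reciprocal of the frame rate, so the same "+3 fps" stands for very different savings depending on where you start. A small worked example (the function names are illustrative only):

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

def ms_saved(fps_before, fps_after):
    """Milliseconds per frame saved when the frame rate goes from fps_before to fps_after."""
    return frame_ms(fps_before) - frame_ms(fps_after)

print(f"30 -> 33 fps saves {ms_saved(30, 33):.2f} ms per frame")     # 3.03
print(f"100 -> 103 fps saves {ms_saved(100, 103):.2f} ms per frame")  # 0.29
print(f"60 fps budget: {frame_ms(60):.2f} ms")                        # 16.67
```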

Uhm, well...

a. Just a feeling: when I move the mouse, it isn't fluid (lol).

b. Hmm, before posting the question I thought my tree had around 300 vertices, which is why I was wondering why the game isn't fluid with 300 trees, but it actually has 2800... I don't know how I overlooked this.

c. I don't think it's heavy. In it I compute the instance data (position, scale and rotation) and multiply it by the main matrices (world, view, projection).

d. Alpha blending, with a black-and-white map.

e. No.

f. What is that? I am passing depth to the pixel shader in order to calculate fog.

g. No shadows yet. I don't even know what cascades are.

Sorry for the irritating questions/sentences; I am a beginner in game dev.

a. Just a feeling: when I move the mouse, it isn't fluid (lol).

Ugh, measure the frame rate, ideally as frame time in milliseconds and (lower priority) in fps.
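A minimal frame-timing sketch, assuming you call a hypothetical `tick()` once per frame. It uses Python's `time.perf_counter` and keeps a sliding window so you can see both the average and the worst frame, since an average alone can hide the occasional long frame that reads as stutter. The optional `now` argument exists only so the example can feed in simulated timestamps.

```python
import time
from collections import deque

class FrameTimer:
    """Records per-frame durations in milliseconds over a sliding window."""

    def __init__(self, window=120):
        self.samples = deque(maxlen=window)
        self.last = None

    def tick(self, now=None):
        """Call once per frame; returns the previous frame's duration in ms (None on the first call)."""
        now = time.perf_counter() if now is None else now
        elapsed = None
        if self.last is not None:
            elapsed = (now - self.last) * 1000.0
            self.samples.append(elapsed)
        self.last = now
        return elapsed

    def average_ms(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def worst_ms(self):
        return max(self.samples, default=0.0)

    def average_fps(self):
        avg = self.average_ms()
        return 1000.0 / avg if avg > 0 else 0.0

timer = FrameTimer()
for t in (0.000, 0.016, 0.036, 0.052):  # simulated timestamps in seconds
    timer.tick(t)
print(f"avg {timer.average_ms():.1f} ms, worst {timer.worst_ms():.1f} ms, {timer.average_fps():.0f} fps")
# avg 17.3 ms, worst 20.0 ms, 58 fps
```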

b. Hmm, before posting the question I thought my tree had around 300 vertices, which is why I was wondering why the game isn't fluid with 300 trees, but it actually has 2800... I don't know how I overlooked this.

2800 vertices * 300 instances = 840,000 vertices per frame.
At 30 fps that's 25,200,000 vertices per second. That's still way below what a decent GPU can handle (e.g. a quick Google search shows that the old GeForce 6600 can do 375 million vertices per second; divide that value by 4 to 10, because those are raw specs).
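The arithmetic above as a short script, using the post's own numbers (2800 vertices, 300 instances, 30 fps, a 375 million vertices/s peak derated by 4 to 10). It only covers vertex throughput and says nothing about pixel cost:

```python
def vertex_load(verts_per_mesh, instances, fps):
    """Return (vertices per frame, vertices per second)."""
    per_frame = verts_per_mesh * instances
    return per_frame, per_frame * fps

def derated_budget(peak_verts_per_sec, derate):
    """Spec-sheet peaks are optimistic; the post suggests dividing by 4 to 10."""
    return peak_verts_per_sec / derate

per_frame, per_sec = vertex_load(2800, 300, 30)
print(per_frame, per_sec)  # 840000 25200000
for derate in (4, 10):
    budget = derated_budget(375_000_000, derate)
    print(f"derate {derate}: {budget:,.0f} verts/s, load {per_sec / budget:.0%}")
# derate 4: 93,750,000 verts/s, load 27%
# derate 10: 37,500,000 verts/s, load 67%
```

Even with the pessimistic factor of 10, the vertex load stays under the budget, which is why the reply goes on to look at other costs.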

But of course, running on an Intel HD 3000 series is not the same as running on a GeForce GTX 690 or Radeon HD 7990. What GPU do you have? (And what CPU, btw?)

c. I don't think it's heavy. In it I compute the instance data (position, scale and rotation) and multiply it by the main matrices (world, view, projection).

Ok, fair enough. Maybe there's some inefficiency that has gone unnoticed (e.g. I assume you pass a single worldViewProj matrix, instead of passing the 3 matrices and concatenating them in the shader).
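Why a single matrix is cheaper: with row vectors (the convention assumed here, as in D3D/XNA-style math), (v·W)·V·P equals v·(W·V·P), so concatenating once on the CPU turns three vector-matrix products per vertex (48 multiplies) into one (16 multiplies). The sketch uses plain lists and arbitrary integer matrices standing in for world, view and projection, so the equality check is exact. With instancing, where the per-instance transform varies, a common variant is to precompute View·Projection once per frame.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]

def transform(v, m):
    """Row vector times matrix (v * M): 16 multiplies per call."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]

# Arbitrary integer stand-ins for the world, view and projection matrices.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [5, 0, 2, 1]]   # translation by (5, 0, 2)
V = [[0, 0, 1, 0], [0, 1, 0, 0], [-1, 0, 0, 0], [0, -3, 0, 1]]
P = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 1], [0, 0, -1, 0]]
vertex = [1, 2, 3, 1]

three_steps = transform(transform(transform(vertex, W), V), P)  # done per vertex in the shader
wvp = mat_mul(mat_mul(W, V), P)                                  # done once, on the CPU
one_step = transform(vertex, wvp)                                # per vertex: one product
print(three_steps == one_step)  # True
```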

d. Alpha blending, with a black-and-white map.
e. No.

Gotcha! This is most likely the culprit. Alpha blending is expensive: it consumes a lot of bandwidth. Switch to alpha testing. If you don't like the results, consider using alpha blending for the close trees and alpha testing for the far trees.
Using CSAA to get nice, smooth leaves with alpha testing is another possibility (though this topic is probably more advanced).
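A sketch of the near/far split suggested above, assuming trees are represented by hypothetical position tuples and `blend_distance` is a tuning value you would pick by eye. The blended group is sorted back to front because alpha blending generally needs far-to-near draw order to composite correctly; the alpha test itself would still be done in the shader or render state.

```python
import math

def split_by_distance(tree_positions, camera, blend_distance):
    """Return (blended_near, tested_far) lists of tree indices.

    Near trees get alpha blending and are ordered farthest-first;
    far trees get alpha testing and need no particular order.
    """
    near, far = [], []
    for index, pos in enumerate(tree_positions):
        d = math.dist(pos, camera)
        (near if d < blend_distance else far).append((d, index))
    near.sort(reverse=True)  # back to front for correct blending
    return [i for _d, i in near], [i for _d, i in far]

trees = [(0, 0, 10), (0, 0, 100), (0, 0, 5)]
print(split_by_distance(trees, (0, 0, 0), blend_distance=50))  # ([0, 2], [1])
```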

f. What is that? I am passing depth to the pixel shader in order to calculate fog.

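Good, don't worry, just checking. By the way, that feature lets the pixel shader override the depth value (interpolated from the vertex shader's output) that would otherwise be written to the Z-buffer. However, it has a severe performance impact, typically because the GPU can no longer reject hidden pixels early (early-Z) before running the pixel shader.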
If you don't know what it is, then you're probably not using it.

g. No shadows yet. I don't even know what cascades are.

Ok, just checking. Don't worry, you'll get there.

Cheers

Hey, thank you for your interest.

Ugh, measure the frame rate, ideally as frame time in milliseconds and (lower priority) in fps.

My debug output says it's ~70 fps (and the frame time is around 0.015 to 0.02ms), but I can't believe it... I'm sure 60 fps makes a game very fluid, but I still have the feeling that mine isn't.

What GPU do you have? (and what CPU btw.)

It's a GeForce 8800 GT 512MB. The CPU is an Intel Core 2 Quad Q6600 at 2.4 GHz.

I assume you pass the worldViewProj matrix, instead of passing the 3 matrices and concatenating them in the shader

I pass the 3 matrices to the shader separately.

If the frame time is 15-20ms (you actually gave seconds above) then perhaps the lack of smoothness is due to inconsistency, especially if you have vsync turned on. You need to be at or under 16ms for a smooth 60fps.
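A simplified model of why 15-20 ms frames can feel uneven with vsync on: assuming double-buffered vsync at 60 Hz and that each frame starts on a refresh, a finished frame waits for the next vblank, so anything slightly over ~16.7 ms is shown for two refreshes (~33.3 ms). Real drivers and triple buffering behave differently; this only illustrates the effect, and the sample frame times are hypothetical.

```python
import math

REFRESH_HZ = 60.0
VBLANK_MS = 1000.0 / REFRESH_HZ  # ~16.67 ms per refresh at 60 Hz

def presented_ms(render_ms, vblank_ms=VBLANK_MS):
    """Time a frame stays on screen if it can only be shown on a vblank boundary."""
    return math.ceil(render_ms / vblank_ms) * vblank_ms

samples = [15.0, 16.0, 17.0, 20.0, 15.5]  # hypothetical render times in ms
print([round(presented_ms(t), 1) for t in samples])  # [16.7, 16.7, 33.3, 33.3, 16.7]
```

The render times differ by only a few milliseconds, yet the displayed frame times alternate between one and two refreshes, which is the kind of unevenness that reads as "not fluid".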

The good news is you're not far off. Unfortunately I can't tell you what the slow part is. Could be GPU, could be draw calls, could be something else. Hence the advice to profile it.

Easy way to tell if alpha blending/overdraw is the culprit: halve the resolution and see if it makes a difference.
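What the resolution test measures: halving both width and height leaves the vertex work unchanged but cuts the pixel count to a quarter, so a large drop in frame time points at pixel cost (blending, overdraw, fill rate). The 1.3x threshold and the frame times below are arbitrary, hypothetical values for illustration.

```python
def pixel_ratio(full, reduced):
    """How many times fewer pixels the reduced resolution renders."""
    return (full[0] * full[1]) / (reduced[0] * reduced[1])

def looks_fill_bound(full_ms, reduced_ms, min_speedup=1.3):
    """Heuristic only: a clear speed-up from rendering fewer pixels suggests pixel cost
    (alpha blending, overdraw, fill rate) dominates. min_speedup is an arbitrary cut-off."""
    return full_ms / reduced_ms >= min_speedup

print(pixel_ratio((1280, 720), (640, 360)))  # 4.0
print(looks_fill_bound(20.0, 11.0))  # True: frame time nearly halved
print(looks_fill_bound(20.0, 19.0))  # False: barely changed, look elsewhere
```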

Remove the post, please.

If you are going to do arbitrary matrix manipulation for some instances, that's a good reason to switch to dynamic vertex buffers. Your second stream (instance) would contain the dynamic buffer.
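A minimal sketch of such a dynamic instance stream, assuming a hypothetical layout of one 4x4 float32 world matrix (64 bytes) per instance. It keeps a CPU-side copy and tracks which instances changed, so only that byte range needs to be uploaded each frame; the actual upload call depends on your graphics API and is not shown.

```python
import struct

# Hypothetical per-instance layout: a 4x4 float32 world matrix, little-endian.
INSTANCE_FORMAT = "<16f"
INSTANCE_STRIDE = struct.calcsize(INSTANCE_FORMAT)  # 64 bytes

class InstanceBuffer:
    """CPU-side mirror of a dynamic per-instance vertex stream."""

    def __init__(self, capacity):
        self.data = bytearray(capacity * INSTANCE_STRIDE)
        self.dirty_min = None
        self.dirty_max = None

    def set_instance(self, index, matrix_rows):
        """Write one instance's matrix and widen the dirty range to include it."""
        flat = [value for row in matrix_rows for value in row]
        struct.pack_into(INSTANCE_FORMAT, self.data, index * INSTANCE_STRIDE, *flat)
        self.dirty_min = index if self.dirty_min is None else min(self.dirty_min, index)
        self.dirty_max = index if self.dirty_max is None else max(self.dirty_max, index)

    def take_dirty_range(self):
        """Return (byte_offset, bytes) for one contiguous span covering every changed
        instance, or None if nothing changed; then reset the dirty range."""
        if self.dirty_min is None:
            return None
        start = self.dirty_min * INSTANCE_STRIDE
        end = (self.dirty_max + 1) * INSTANCE_STRIDE
        self.dirty_min = self.dirty_max = None
        return start, bytes(self.data[start:end])

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
buffer = InstanceBuffer(capacity=1000)
buffer.set_instance(2, identity)
buffer.set_instance(5, identity)
offset, chunk = buffer.take_dirty_range()
print(offset, len(chunk))  # 128 256
```

Tracking a single min/max span is the simplest choice; it re-uploads unchanged instances in between, which is usually acceptable when the changed instances are clustered.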

For me, high polycount was the bottleneck: I was able to brute-force cull many thousands of instanced cubes at >60 FPS with no problem. But trees are obviously more complex, and my engine is not the same as yours, so you would need your own approach to profiling it. Unless these objects are over 100k triangles each, instancing 30 of each usually shouldn't cause a problem.
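Brute-force culling in its simplest form tests every instance's bounding sphere against the frustum planes. In this sketch the planes are assumed to be given in world space with normals pointing inward (in practice they are usually extracted from the view-projection matrix, not shown here), and an axis-aligned box stands in for a frustum in the example.

```python
def normalize_plane(a, b, c, d):
    """Scale (a, b, c, d) so the normal has unit length and distances are in world units."""
    length = (a * a + b * b + c * c) ** 0.5
    return (a / length, b / length, c / length, d / length)

def sphere_visible(planes, center, radius):
    """A point p is inside a plane when a*x + b*y + c*z + d >= 0 (inward-facing normals)."""
    x, y, z = center
    for a, b, c, d in planes:
        if a * x + b * y + c * z + d < -radius:
            return False  # the whole sphere lies behind this plane
    return True  # inside or intersecting every plane (conservative)

def cull(planes, spheres):
    """Brute force: test every instance's bounding sphere and keep the visible indices."""
    return [i for i, (center, radius) in enumerate(spheres) if sphere_visible(planes, center, radius)]

# Stand-in "frustum": the box -10 <= x <= 10, -10 <= y <= 10, 0 <= z <= 100.
box = [normalize_plane(*p) for p in [
    (1, 0, 0, 10), (-1, 0, 0, 10),
    (0, 1, 0, 10), (0, -1, 0, 10),
    (0, 0, 1, 0), (0, 0, -1, 100),
]]
spheres = [((0, 0, 50), 1.0), ((20, 0, 50), 1.0), ((10.5, 0, 50), 1.0)]
print(cull(box, spheres))  # [0, 2]
```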

Eventually I plan to go with impostors for rendering most of the trees, so you may consider that or simpler billboard sprites. Only the closest trees will be full meshes with instancing.
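For the billboard route, trees are usually rotated only around the vertical axis so trunks stay upright (a cylindrical billboard). A sketch of computing such a quad's corners, assuming a right-handed, Y-up coordinate system; `billboard_corners` is a hypothetical helper, not part of any engine.

```python
import math

def billboard_corners(base, camera, width, height):
    """Corners of a quad that faces the camera but rotates only around world Y.

    Returns (bottom_left, bottom_right, top_right, top_left) in world space,
    assuming a right-handed, Y-up coordinate system.
    """
    bx, by, bz = base
    dx, dz = camera[0] - bx, camera[2] - bz  # horizontal direction to the camera
    length = math.hypot(dx, dz)
    if length == 0.0:
        dx, dz, length = 0.0, 1.0, 1.0  # camera directly above: pick an arbitrary facing
    # Right vector: horizontal and perpendicular to the direction to the camera.
    rx, rz = dz / length, -dx / length
    hw = width / 2.0
    bl = (bx - rx * hw, by, bz - rz * hw)
    br = (bx + rx * hw, by, bz + rz * hw)
    return bl, br, (br[0], by + height, br[2]), (bl[0], by + height, bl[2])

# Camera in front of the tree (+Z) and slightly above it; the height is ignored.
print(billboard_corners((0.0, 0.0, 0.0), (0.0, 1.7, 10.0), 2.0, 5.0))
# ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 5.0, 0.0), (-1.0, 5.0, 0.0))
```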

New game in progress: Project SeedWorld

My development blog: Electronic Meteor

This topic is closed to new replies.
