remigius

Is it possible to efficiently combine hardware instancing and culling?

This topic is 4522 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hello, I was just reading up on occlusion culling and I was thinking about how I might combine that with my hardware instancing sample, but I don't see how that would be possible when rendering a lot (50,000+) of small meshes. Because I'm using instancing, I'd first need to come up with an efficient way to turn the rendering of specific objects on and off. Trivial as that is for normal rendering, it seems a big downside to instancing.

For true (SM3) hardware instancing, I'd have to remove the instance data of the objects that shouldn't be rendered from the instance data vertex buffer. It shouldn't be too hard to implement, but it introduces a CPU load to a technique that's supposed to offload completely to the GPU.

For shader instancing (GPU batch drawing) it would be a bit more efficient, since I can work directly on the instance data array. But I'd need to check the instance data entries one by one and pass the ones that should be rendered to the effect, so it'll still add some overhead.

And even with that out of the way, the typical occlusion query is probably undoable (if I understand it correctly), since I can't render the bounding meshes for the 50,000+ instances one-by-one to perform the query, as doing that would remove the whole purpose of the instancing. I might be able to use a bounding mesh for performing the occlusion query for multiple instances at once, but it seems a bit of a messy workaround.

The same problem applies to view frustum culling. I think the large number of objects/instances makes it undesirable to check them all against the frustum, so I'd have to go mess with octrees to get this to work efficiently. It may be doable, but I doubt it would improve the overall performance, considering the overhead of the culling technique AND the overhead of specifying which instances should be rendered.

Does anyone have any experience with this problem? From what I can tell, I don't think culling techniques in general can improve the performance of instancing much when rendering a large number of small meshes (for a few large instanced models, I do see that it can be an improvement). Any insights would be very much appreciated, since I've got a feeling I'm missing something here :)

I think that in any real-world scenario, you'd need to actually move the objects you're instancing. If you don't need them to move separately, they might as well just be one mesh.

So, in true hardware instancing (SM3), this means you'd be accessing the instance-specific data every frame in order to update it. Since you're already touching it all, you might as well just do away with the whole buffer and make a new one. This pretty much means that in any real situation, you'd have a dynamic buffer for the instance-specific data that you lock with discard and refill completely each frame. If you're already filling it, you might as well just skip the objects that are culled or occluded, and render a few less prims.
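A minimal CPU-side sketch of that refill step, assuming the visibility flags have already been produced by whatever culling you use. `InstanceData`, `fillVisibleInstances` and the plain pointer standing in for the locked buffer are made-up names for illustration; in Direct3D 9 the destination would be the pointer you get back from locking the dynamic vertex buffer with D3DLOCK_DISCARD.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance data; a real sample would more likely store a
// full world matrix here.
struct InstanceData {
    float x, y, z;
};

// Copies the instances flagged as visible to the front of `dst` (standing in
// for the locked dynamic buffer) and returns how many were written.
std::size_t fillVisibleInstances(const std::vector<InstanceData>& all,
                                 const std::vector<bool>& visible,
                                 InstanceData* dst,
                                 std::size_t dstCapacity)
{
    assert(all.size() == visible.size());
    std::size_t written = 0;
    for (std::size_t i = 0; i < all.size() && written < dstCapacity; ++i) {
        if (visible[i]) {
            dst[written++] = all[i];  // culled entries are simply skipped
        }
    }
    return written;
}
```

The returned count is what you'd use as the instance count for the draw, so the culled objects never reach the GPU at all.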

I don't think this would cause any slow-down at all. This could only improve performance over no culling.

As for shader instancing, the same applies. Like with true hardware instancing, you're gonna be resetting the instance registers each frame to update their movement. If you're doing that, setting up a system to skip a certain object and add the next in its place shouldn't be a problem.
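Roughly what I mean for the shader instancing case, as a sketch only. The batch size of 4 is an arbitrary stand-in for however many instances fit in your shader's constant array, and `drawBatch` is a hypothetical callback where you'd upload the constants and issue the draw call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-instance data, as it would be copied into shader constants.
struct InstanceData {
    float x, y, z;
};

// Assumed number of instances that fit in the shader's constant array.
constexpr std::size_t kBatchSize = 4;

// Packs visible instances into a fixed-size batch and calls `drawBatch`
// whenever the batch is full, plus once more for a partial last batch.
// Returns the number of batches submitted.
std::size_t drawVisibleInBatches(
    const std::vector<InstanceData>& all,
    const std::vector<bool>& visible,
    const std::function<void(const InstanceData*, std::size_t)>& drawBatch)
{
    assert(all.size() == visible.size());
    std::array<InstanceData, kBatchSize> batch{};
    std::size_t count = 0;
    std::size_t batches = 0;
    for (std::size_t i = 0; i < all.size(); ++i) {
        if (!visible[i]) {
            continue;  // culled: the next visible instance takes this slot
        }
        batch[count++] = all[i];
        if (count == kBatchSize) {
            drawBatch(batch.data(), count);
            ++batches;
            count = 0;
        }
    }
    if (count > 0) {
        drawBatch(batch.data(), count);
        ++batches;
    }
    return batches;
}
```

Since batches are only submitted once they're full (apart from the last one), culling objects also cuts down the number of draw calls, not just the prim count.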


For 50,000 objects, a quad/octree would be mandatory. Also, I'd recommend you skip that last step of checking each object individually. If a tree node is in view, render all objects in it.
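Something along these lines, as a sketch under a few assumptions: the frustum is given as six planes with normals pointing inward, each tree node carries an axis-aligned bounding box, and all the type and function names are made up for illustration. Building the tree is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane with an inward-pointing normal: dot(n, p) + d >= 0 means "inside".
struct Plane { Vec3 n; float d; };

// Axis-aligned bounding box given by its lowest and highest corners.
struct Aabb { Vec3 lo, hi; };

// Hypothetical tree node: bounds, the instances stored in this node, and
// child nodes (empty for a leaf).
struct Node {
    Aabb bounds;
    std::vector<std::size_t> instances;
    std::vector<Node> children;
};

// Conservative test: the box is rejected only if it lies completely behind
// at least one frustum plane.
bool intersectsFrustum(const Aabb& box, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum) {
        // Box corner that lies farthest along the plane normal.
        const Vec3 v{p.n.x >= 0.0f ? box.hi.x : box.lo.x,
                     p.n.y >= 0.0f ? box.hi.y : box.lo.y,
                     p.n.z >= 0.0f ? box.hi.z : box.lo.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f) {
            return false;
        }
    }
    return true;
}

// Appends every instance of every node in view. Objects are never tested
// individually; a node that is out of view drops its whole subtree.
void collectVisible(const Node& node, const std::array<Plane, 6>& frustum,
                    std::vector<std::size_t>& out)
{
    if (!intersectsFrustum(node.bounds, frustum)) {
        return;
    }
    out.insert(out.end(), node.instances.begin(), node.instances.end());
    for (const Node& child : node.children) {
        collectVisible(child, frustum, out);
    }
}
```

You'll render a few objects that are just outside the view this way, but with small meshes that should cost less than testing 50,000 bounding volumes one by one.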

Hope this helps.

Quote:
Original post by remigius
And even with that out of the way, the typical occlusion query is probably undoable (if I understand it correctly), since I can't render the bounding meshes for the 50,000+ instances one-by-one to perform the query, as doing that would remove the whole purpose of the instancing. I might be able to use a bounding mesh for performing the occlusion query for multiple instances at once, but it seems a bit of a messy workaround.

Yeah, I would definitely recommend against using occlusion queries. I tried them a little while back, but it just didn't work out. Since you need to make so many DIP (DrawIndexedPrimitive) calls, it sucks up tons of performance. Like sirob said, an octree/quadtree would be a much better solution.

Quote:
I think that in any real-world scenario, you'd need to actually move the objects you're instancing. If you don't need them to move separately, they might as well just be one mesh.

For dynamic objects (e.g. characters, anything that moves, etc.), this is a definite. However, for static objects (e.g. trees, houses, landscape geometry), you probably aren't going to need to ever move them. Since they are most likely going to be in very different parts of the map, though, you will want to perform some type of culling heuristic on them. If you group them together as the same mesh, this isn't possible.
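As a rough sketch of keeping static instances separate but still cheap to cull, you could bucket them once at load time. Note that this uses a uniform 2D grid instead of the quadtree/octree mentioned above, simply because it is shorter to show; `Position`, `CellKey` and `bucketByCell` are made-up names, and the cell size is something you'd have to tune.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical ground-plane position of a static instance (tree, house, ...).
struct Position { float x, z; };

// Integer grid coordinates of a cell.
using CellKey = std::pair<int, int>;

// Groups instance indices by the grid cell their position falls into. For
// static objects this only needs to run once, not every frame.
std::map<CellKey, std::vector<std::size_t>> bucketByCell(
    const std::vector<Position>& positions, float cellSize)
{
    assert(cellSize > 0.0f);
    std::map<CellKey, std::vector<std::size_t>> cells;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        const CellKey key{
            static_cast<int>(std::floor(positions[i].x / cellSize)),
            static_cast<int>(std::floor(positions[i].z / cellSize))};
        cells[key].push_back(i);
    }
    return cells;
}
```

Each frame you'd then test only the cells against the view and draw the instances of the cells that pass, which is exactly what you lose if everything is baked into one big mesh.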

Thanks for the insights! I think I'll go with the octree for view frustum culling and see how it turns out. I was a bit pessimistic about the performance hit of changing the instance data, but some testing shows that it is barely noticeable.

Quote:
For dynamic objects (e.g. characters, anything that moves, etc.), this is a definite. However, for static objects (e.g. trees, houses, landscape geometry), you probably aren't going to need to ever move them. Since they are most likely going to be in very different parts of the map, though, you will want to perform some type of culling heuristic on them. If you group them together as the same mesh, this isn't possible.


Yep, that's what I was thinking too. I still had the 'warehouse visualization' application in mind, where you could pull static positional data from some source as the instance data.
