Large amounts of scene objects.

Started by
12 comments, last by spinningcube 13 years, 1 month ago
I would like to achieve large thick forests that can be seen from quite a ways away. Not unlike Outerra. My scene is represented as a hierarchal scene graph (for static objects like vegetation the global transformation is only computed once). I have tried just adding a bunch of trees to my scene but the bottleneck is in the traversal of the scene graph (each node is culled by an axis aligned box and by distance). I have deepened the graph by making chunks of "ecosystems" that contain a large amount of trees so that large portions can be culled very quickly (similar result as an Octree would achieve) but performance is still bounded by the scene traversal, not rendering.

I am aware of how I can optimize the rendering (billboards, imposters, etc) but the rendering is not the bottleneck. Does anyone have any insight on dealing with the problem of queue large amounts of render operations to your renderer?
Advertisement
Spatial queries acceleration structure, like you mention, are what you need. A quadtree might be enough in your case.

I'd like some details about what you call scene traversal.

To me there are 3 "views" in a rendering engine:
-Hirerachical view (for transformations)
-Spatial view (for culling)
-Rendering view (for rendering)

Which one of those is your bottleneck ?
-* So many things to do, so little time to spend. *-
Tree structures can make your scene traversal slow if it means that each scene item is located randomly in memory with regards to the others.
Pitfalls of OOP has a good explanation of the impact of memory layout on these kinds of traversals.


Try laying all your structures out flat in memory so that when traversing you usually read data in a linear pattern. Also try to break code up into chunks that only work on a small bit of data, and do the same instructions as many times as possible. For example, culling all systems in one go (gather indices of visible ones), then using the results to traverse a tree array.struct Scene
{
const static int numEcosystems = 1024;
const static int numTreesPerSystem = 1024;
Aabb ecosystem[numEcosystems];
Tree drawables[numEcosystems*numTreesPerSystem];
};

............

int isVisible[numEcosystems];
int numVisible = 0;
for( int i=0; i != numEcosystems; ++i )
{
if( Cull( camera, ecosystem ) == false )
{
isVisible[numVisible++] = i;
}
}

for( int i=0; i != numVisible; ++i )
{
int systemIndex = isVisible;
for( Tree* tree = drawables[systemIndex*numTreesPerSystem], end=tree+numTreesPerSystem; tree != end; ++tree )
{
tree->Draw(...);
}
}
Quite a few engines use a spacial tree, like an octree to help cull objects that do not need to be drawn. However, the speed-up is similiar to an early-z. Most of the time, the earl-z pass does not produce a speedup. I tried implementing my own octree and I started to realize that this was expensive, doubling my draw calls per frame, adding the time spend on traversing the tree, and also the fact that having your objects seperated like that does not make logical sense. The reason it doesnt work well is because it breaks up your ability ( or makes it EXTREMLY difficult) to do instance calls and state changes as well. Some objects all use similiar textures, shaders, depth states, etc. When rendering using a tree you will end up having to do double work most of the time, while getting very little in return.

You can read my rant about this experience here http://nolimitsdesig...chical-culling/

I also have code available with my partial implementation.
Wisdom is knowing when to shut up, so try it.
--Game Development http://nolimitsdesigns.com: Reliable UDP library, Threading library, Math Library, UI Library. Take a look, its all free.
Thanks for the replies. I doubt that implementing a quadtree would improve the speed since currently it can cull large amounts of trees in a single bound check (ecosystems are large, and there are only about 200 in the scene and maybe 20 visible at a given moment). I can try marking the nodes that are in the frustum as being visible and traversing those later to queue the render operations. Here is my code:


void SceneNode::renderToQueue(RenderQueue* renderQueue, double frameDelta) {
if (!visible) {
return;
}

bool inFrustum = !bounded || scene->getActiveCamera()->containsBox(boundingBox);
bool inRange = cullDistance == 0.0 || scene->getActiveCamera()->getOrigin()->distanceFrom(getGlobalOrigin()) < cullDistance;

if (inFrustum && inRange) {

// Queue visible renderables to the render queue.
foreach (renderable, renderables) {
if ((*renderable)->isVisible()) {
(*renderable)->queueRenderOperation(renderQueue, globalTransformation);
}
}


// Queue all children nodes.
foreach (childNode, children) {
(*childNode)->renderToQueue(renderQueue, frameDelta);
}
}
}



The the children of a scene node are stored in an std::vector of pointers. Perhaps an std::vector of objects (rather than pointers) would be faster since all child nodes would be contiguous in memory? However it is depth-first so all children of a child node would be traversed before the next child.

Edit: Come to think of it, I think this should definitely be a breadth-first search anyway. I'm not sure why I decided to go depth first.
There was a recent presentation about a similar subject at GDC. You can find the slides here:

http://forums.electronicarts.co.uk/battlefield-3/1394275-daniel-collins-gdc-presentation-culling-battlefield.html

It might be a bit overkill, but it might also give you an idea or two on how to better handle large numbers of objects. I don't really understand a lot of it, and the first part of the presentation might be the only relevant part, but it is still an interesting read.

Quite a few engines use a spacial tree, like an octree to help cull objects that do not need to be drawn. However, the speed-up is similiar to an early-z. Most of the time, the earl-z pass does not produce a speedup. I tried implementing my own octree and I started to realize that this was expensive, doubling my draw calls per frame, adding the time spend on traversing the tree, and also the fact that having your objects seperated like that does not make logical sense. The reason it doesnt work well is because it breaks up your ability ( or makes it EXTREMLY difficult) to do instance calls and state changes as well. Some objects all use similiar textures, shaders, depth states, etc. When rendering using a tree you will end up having to do double work most of the time, while getting very little in return.

You can read my rant about this experience here http://nolimitsdesig...chical-culling/

I also have code available with my partial implementation.


Yep, that's what I thought when I needed decide which route to take : instancing or storage vs draw calls, unless for lots of similar objects that need to be repeated over and over. By the way for the occlusion culling you are using in your example, isn't the objects need to be sorted and rendered front to back ? If so, instancing would pretty much spoil that approach.
Personally I'm doing [color=#1C2837][size=2]hierarchal culling and occlusion culling at the moment. Occlusion culling is done per object, not per visible node.
Speed boost is visible when occlusion culling is enabled, so I'm happy, at least for my hardware and my particular scene. I think it all depends of the situation. For example if you need to make a big world, but relatively small object count is in the view frustum and in range at any given time, I think you need to use [color=#1C2837][size=2]hierarchal culling after all, because the number of total objects in the world would be to big to handle the individually. Maybe the most universal approach is to have the hierarchical tree(for example quad-tree)
[color="#1C2837"]and in every leaf have pointers to instance buffer for every unique object in that node. For example, if it's for a wood, store in the node cell 2 pointers. To instance data for the trees and for instance data bushes that fall in that node. Then traverse the tree, mark the nodes that are visible and in range, sort them back to front, and render them using occlusion culling and instancing. You would be culling one node against another not per objects in the node, but rendering a node using only 2 draw call via instancing should be fast.
[color="#1C2837"]I'm not sure if I'm not missing something and that's a good way to mix the 3 (quad-tree, occlusion culling, instancing) approaches though.


One obvious optimization area I can see in your code (that doesn't involve too much re-engineering) is this: if your tree is laid out correctly, then if any given node is in the frustum all of it's children are also guaranteed to be in the frustum. Likewise if a node is outside the frustum then all of it's children are also guaranteed to be outside of the frustum (and you can bypass a LOT of scene traversal with this one in particular). So the only case where you need to do a check vs frustum is if a node intersects the frustum; otherwise you can trivially accept or trivially reject a whole bunch of stuff.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


One obvious optimization area I can see in your code (that doesn't involve too much re-engineering) is this: if your tree is laid out correctly, then if any given node is in the frustum all of it's children are also guaranteed to be in the frustum. Likewise if a node is outside the frustum then all of it's children are also guaranteed to be outside of the frustum (and you can bypass a LOT of scene traversal with this one in particular). So the only case where you need to do a check vs frustum is if a node intersects the frustum; otherwise you can trivially accept or trivially reject a whole bunch of stuff.



In my code, when a node is outside the frustum (or out of the node's culling range) none of the children are tested. However I did not think of the case of not having to test children of a node that is completely contained in the frustum.


Thanks.
I ended up implemented an Octree to get a huge speedup. However, I still need to render many more objects than my scene graph can handle. I can render around 10,000 objects before the scene traversal starts to bottleneck rather than the rendering. So I will need to come up with another way to render trees on my terrain.

This topic is closed to new replies.

Advertisement