Sign in to follow this  

Feedback for a tree structure design

This topic is 833 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I currently have a SceneGraph which resembles a tree of SceneNodes where each SceneNode has its own vector of children. The terminology might be off but you get the point. This adds alot of indirection and jumping around in memory when traversing the tree.
 
I'm thinking of an alternate approach where all SceneNodes are stored in a single container and the only information each SceneNode has is its own parent. Then if the relationships between the SceneNodes change (such as a new child is created or removed) the SceneGraph is marked as dirty.
Then once per frame, if SceneGraph is dirty another data entity traverses the tree and for each SceneNode constructs an Iterator that can be used for traversing all immediate children. 
Since the children will probably be spread out in the container the Iterator needs some additional metadata rather than just begin/end index to iterate correctly.
 
Optionally one could also let this new data entity reshuffle the SceneNodes in the container each time it is dirty so that all immediate children are contigously placed "infront" of their parent using memcpy() or similiar to move SceneNodes around. Since a SceneNode has no complex constructor/destructor (typically just a name, some world transform data and ID of its parent) it wouldn't be much of a problem.
 
As for storing the iterators, simply keeping a mirrored container and use the same index as the SceneNode in question to get its Iterator seems solid enough?
 
And so finally to traverse all immediate children of a SceneNode one just asks the SceneGraph for an Iterator for a particular SceneNode.
 
I'm mainly looking for some feedback. Is this a reasonable idea? Would the gain of memory coherence (plus nice decoupling of responsability compared to previous solution - each SceneNode is more "dumb") outweigh the cost of reconstructing the Iterator sets (possibly, but not probably) each frame? Is the memcpy option reasonable?
 
Any feedback is welcome

Share this post


Link to post
Share on other sites

Is there any reason that you have to traverse the scene in a particular order to begin with?

 

In my experience, the main use of this kind of tree is hierarchical transforms -- i.e. attachments and skeletons.

To implement those, you just need to ensure that the node-container is sorted such that parents always come before their children, and then do a linear iteration through the container to generate the world-transforms from the parent's world-transform and the current node's local transform. To make this fast, the process that performs this task should be able to access only the parent IDs, the (output) world matrices, and the local matrices.

After that task is complete, the scene can be traversed in any order at all for rendering purposes.

 

The last game that I worked on actually had multiple of these node-containers per scene (usually one per character), so that they could be parallelized even more effectively. After a character's scene nodes had been computed, it was possible for rendering commands for that character to be submitted, even if other parts of the scene were still being updated.

Edited by Hodgman

Share this post


Link to post
Share on other sites

Is there any reason that you have to traverse the scene in a particular order to begin with?

 

In my experience, the main use of this kind of tree is hierarchical transforms -- i.e. attachments and skeletons.

To implement those, you just need to ensure that the node-container is sorted such that parents always come before their children, and then do a linear iteration through the container to generate the world-transforms from the parent's world-transform and the current node's local transform. To make this fast, the process that performs this task should be able to access only the parent IDs, the (output) world matrices, and the local matrices.

After that task is complete, the scene can be traversed in any order at all for rendering purposes.

 

The last game that I worked on actually had multiple of these node-containers per scene (usually one per character), so that they could be parallelized even more effectively. After a character's scene nodes had been computed, it was possible for rendering commands for that character to be submitted, even if other parts of the scene were still being updated.

 

One of the reasons is for updating world transform as you noted once per frame.

 

The relationship between the SceneNodes have to be explicit somehow. What if I want to rotate all children of a particular SceneNode for example?

Share this post


Link to post
Share on other sites

In my experience, the main use of this kind of tree is hierarchical transforms -- i.e. attachments and skeletons.

To implement those, you just need to ensure that the node-container is sorted such that parents always come before their children, and then do a linear iteration through the container to generate the world-transforms from the parent's world-transform and the current node's local transform. To make this fast, the process that performs this task should be able to access only the parent IDs, the (output) world matrices, and the local matrices.

 

That be wasteful if only a few nodes needed to be updated. But then again if its once per frame maybe its not an issue...?

 

Also I've read the link where he suggests the swapping technique of swapping dirty elements and their children to the end of the array and then perform the update. That would fix the problem above.

My problem with this is that I use an ID-based container that maps an ID to a index and doing so invalidates all IDs which is not an option. Another problem with this ID-system is that child nodes may be fragmented in the container as the nodes cannot be moved to reaccomodate new children.

Share this post


Link to post
Share on other sites

My ID-based pool has a layer of indirection and each element stores index to the indirection-vector so I can swap them in memory (exactly to avoid fragmentation and do swap-with-last to remove).

 

If you delay the swapping etc. to the end of the frame or such, you might be able to use the element vector indices directly for temporary things (to avoid going through indirection layer).

Share this post


Link to post
Share on other sites

That be wasteful if only a few nodes needed to be updated. But then again if its once per frame maybe its not an issue...?

You can always use some kind of bit-field to keep track of which regions of the array are dirty, and then skip over coarse regions. It's then still a linear search, and can still be paralleled easily by splitting up the "coarse regions" between threads.
e.g. instead of having a dirty flag on every single node in the array, maybe you have a bit array where each bit indicates that 16 nodes are dirty. That lets you track the (coarse) dirty status of 2048 nodes with a single SSE __m128i register.
 
Also, the most important situation to optimize for in a game is the worse-case performance.
Say you're making a game where 100 animated characters charge at you at a time, and each of them has 100 bones -- that means your node hierarchy system has to track 10k nodes.
It's no point doing all sorts of fancy lazy-update / dirty-tracking optimizations if they don't actually help you in this worst case scenario.
Having all those enemies frustum culled (and then not updated) when they're off-screen is nice, and something you should probably do... but it's an absolutely useless optimization if your framerate is below-target when they are all on screen and un-cullable biggrin.png
A solid 30Hz framerate is much better that one that continually alternates between 60Hz and 15Hz, so just keep overall performance in mind when implementing techniques that only benefit certain use-cases. It also helps to have statistical data on what your typical usage situation is. e.g. One game might have 200 characters loaded but on average only 2 are visible, but another game might have 20 characters loaded and on average have 19 visible. The "correct" optimization strategies will be different for those two games.

My ID-based pool has a layer of indirection and each element stores index to the indirection-vector so I can swap them in memory (exactly to avoid fragmentation and do swap-with-last to remove).

Yeah I've seen this used quite a bit. It allows external systems to use simple integer IDs, while allowing the internal system to reorder items whenever it needs to. Often you want to compact your "active"/"dirty" items, so that your update loop has a nice solid, contiguous block of the array to process.

Edited by Hodgman

Share this post


Link to post
Share on other sites

This topic is 833 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this