Data-oriented scene graph?


I've been reading a lot on engine development lately, and it's making me question my code. Coming from a C#/Java background, I'm used to OOD. I'm trying to understand C++'s data-oriented design and, especially, how it applies to game engine development. My biggest question so far is about the implementation of a scene graph: how can you do that with data-oriented design? For me, scene graphs, and well... graphs, are naturally hierarchical, so OOD matches perfectly.


So, I can't go into deep specifics in the time available to me now, and I'm probably not the right person to do so anyway -- but in general, it's not usually the case that design patterns (of which the scene graph is one) survive a transformation from OOP to DOD. You end up with something that serves basically the same purpose, but the reorganization necessary to make the transformation renders it unrecognizable as the original thing. DOD is really a call to re-imagine a solution from first principles, taking into account the machine's organization -- DOD is not a call to take your OOP patterns and architecture and just "reproject" them into this new paradigm.

But I'm speaking generally, and not to whether the scene graph specifically has a straightforward-ish transformation. My gut says that DOD and graphs of most kinds are at odds, and nothing immediately comes to mind when I try to imagine a system that is logically a graph, but employs serious DOD underneath, while still achieving the performance benefits one would expect of DOD. You can do relatively straightforward things, maybe, like making sure your scene-graph nodes compose only "hot" data, and while that would be of some benefit, it doesn't fundamentally change the graph representation.

That said, I'm not expert enough myself to believe that no one here is going to come along and disprove me :)

throw table_exception("(╯°□°)╯︵ ┻━┻");

And I'll add: nothing says you must DOD-ify everything in your program. If OOP suits a problem, and the difficulty-to-benefit ratio of DOD is unclear, and the OOP solution is not holding back adoption of beneficial DOD elsewhere, then it is probably wisest not to replace a principled OOP solution with a poor DOD one just for DOD's sake.

throw table_exception("(╯°□°)╯︵ ┻━┻");

graphs, are naturally hierarchical, so OOD matches perfectly.

This is one of those things that people who are used to OO tend to *think*. It tends to be less true in practice. When all you have is a hammer, everything looks like a nail, and so forth...

Most scene graphs are actually trees. What is a tree, at its most essential? It's a set of parent pointers. Nothing object-oriented about a flat array of parent pointers, now, is there?

edit: even if you actually have a graph, a graph is a set of nodes and a set of edges. The typical OO encoding of Node objects with outgoing edge pointers has about as much to do with "graphness" as any other representation.
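
To make that concrete, here's a minimal sketch (indices in place of raw pointers, names made up for illustration):

#include <cstdint>
#include <utility>
#include <vector>

// An entire tree as one flat array: node i's parent is parents[i],
// and the root stores -1. Nothing here is an "object" in the OO sense.
struct FlatTree
{
  std::vector<int32_t> parents; // parents[i] == -1 marks the root
};

// A graph is the same idea: a set of nodes plus a flat set of edges.
struct FlatGraph
{
  uint32_t nodeCount;
  std::vector<std::pair<uint32_t, uint32_t>> edges; // (from, to)
};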

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

I'm trying to understand C++'s data-oriented design and, especially, how it applies to game engine development. My biggest question so far is about the implementation of a scene graph: how can you do that with data-oriented design? For me, scene graphs, and well... graphs, are naturally hierarchical, so OOD matches perfectly.

Most scene graphs are actually trees. What is a tree, at its most essential? It's a set of parent pointers. Nothing object-oriented about a flat array of parent pointers, now, is there?

^^That.

DoD means that you're not just thinking about what data you have, but also about how it is read and written. How and why is it generated? How and why is it consumed? What are the data access patterns?
For example, given this class of graph nodes, you can't even begin to apply DoD, because we know nothing about how it is used:


struct Node
{
  Aabb m_bounds;                 // Aabb assumed defined elsewhere
  std::vector<Node*> m_children;
};

Instead of thinking in terms of objects, you need to think in terms of Inputs -> Processes -> Outputs.

e.g. One common operation in "scene graphs" is traversing the graph to find visible nodes - so, let's say our Node class is used like this:


struct Node
{
  Aabb m_bounds;
  std::vector<Node*> m_children;
  void Traverse( const Frustum& in_camera, std::vector<Node*>& out_visible )
  {
    if( m_bounds.Touches(in_camera) ) // skip the whole branch when culled
    {
      out_visible.push_back(this);
      for( size_t i=0, end=m_children.size(); i!=end; ++i )
        m_children[i]->Traverse( in_camera, out_visible );
    }
  }
};

Looking at this, we can see that the tree is traversed in depth-first order, sometimes skipping over branches that are outside the frustum.
With that bit of knowledge, we now know how this data should be laid out in memory -- all nodes should be contiguously allocated, and they should be sorted in topological, depth-first order, so that during the traversal the CPU is linearly walking forwards through the allocation, never randomly jumping backwards.
i.e. they should not be allocated individually with new :P
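
In sketch form, that layout amounts to filling one pool in pre-order (AppendDepthFirst is a made-up helper; the copied child pointers would still need to be rebuilt to point into the pool, which is one more reason to go further, below):

#include <vector>

// One contiguous pool for the whole tree, filled depth-first (pre-order),
// so Traverse() above touches memory strictly front-to-back.
void AppendDepthFirst( const Node& n, std::vector<Node>& pool )
{
  pool.push_back( n );                // parent before its children
  for( const Node* child : n.m_children )
    AppendDepthFirst( *child, pool ); // then each subtree in turn
}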

From there, you can start to completely restructure the data if you like.
Instead of every node storing a vector of child pointers (which is not only quite large, but is actually stored in a separate memory allocation elsewhere in RAM), we only really need to store two "pointers": which node to visit next if this node is visible, and which node to visit next if this node is hidden.
- For most nodes, next_if_visible is the offset to their first child; if they have no children, it's the same value as next_if_hidden.
- For most nodes, next_if_hidden is their next sibling; if they have no more siblings, it's their parent's next_if_hidden value; for the root node, it's 0 as a terminator value.
Using those rules, we can represent the exact same parent/child relationships as in the original data, but implicitly instead of explicitly.
Let's say that in any one branch of the graph you've always got fewer than 65k objects -- so these "pointers" can actually be nice, compact 16-bit offsets! Traversal can now look like:


struct Node
{
  Aabb     bounds;
  uint16_t next_if_visible; // offset to next node when visible; 0 = done
  uint16_t next_if_hidden;  // offset to next node when culled;  0 = done
  void Traverse( const Frustum& in_camera, std::vector<Node*>& out_visible )
  {
    bool visible = bounds.Touches(in_camera);
    if( visible )
      out_visible.push_back(this);
    uint16_t offset = visible ? next_if_visible : next_if_hidden;
    if( offset ) // pointer arithmetic works because all nodes are contiguous
      (this + offset)->Traverse( in_camera, out_visible );
  }
};

Not counting the Aabb member, we've gone from a Node being stored in two allocations of 24 bytes + 8 bytes * numChildren down to just 4 bytes, while remaining functionally equivalent!
Now... I've made this particular consumer of the data happy, but may have complicated life for other consumers or producers of this data (producing this array of nodes is now much more complex than with the original code) -- so in practice it's a balancing act where you're trying to keep multiple different algorithms happy with their optimal data layouts.
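
For illustration, producing the offset array from the original pointer tree might look something like this (a sketch only; SrcNode is a made-up name for the original bounds+children representation):

#include <cstdint>
#include <vector>

struct SrcNode
{
  Aabb bounds;
  std::vector<SrcNode*> children;
};

// Pass 1: flatten in pre-order, stashing each node's subtree size in
// next_if_hidden so pass 2 can turn it into a real skip offset.
static size_t FlattenPreOrder( const SrcNode& n, std::vector<Node>& out )
{
  size_t self = out.size();
  out.push_back( Node{ n.bounds, 0, 0 } );
  size_t size = 1;
  for( const SrcNode* child : n.children )
    size += FlattenPreOrder( *child, out );
  out[self].next_if_visible = n.children.empty() ? 0 : 1; // first child sits at self+1
  out[self].next_if_hidden  = uint16_t( size );           // subtree size, for now
  return size;
}

std::vector<Node> BuildOffsetTree( const SrcNode& root )
{
  std::vector<Node> out;
  FlattenPreOrder( root, out );
  for( size_t i = 0; i < out.size(); ++i )
  {
    // Pass 2: next_if_hidden = skip my whole subtree; 0 = end of traversal.
    size_t after = i + out[i].next_if_hidden;
    out[i].next_if_hidden = ( after < out.size() ) ? uint16_t( after - i ) : 0;
    if( out[i].next_if_visible == 0 ) // leaves fall through to the skip offset
      out[i].next_if_visible = out[i].next_if_hidden;
  }
  return out;
}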
In some cases, you may even end up splitting a particular "object" into two different structures, which it's cloned into at different points in time, in order to deliver optimal data to different consumer algorithms. You've got to think much more about how data flows from one system to another. You also often end up with code that's much more functional in style -- where instead of long-lived mutable objects, you end up with short-lived immutable objects.

However, before doing any of this, we first should've looked at the actual data transform itself:
- Is this the best way to traverse the scene to find visible nodes? Would breadth-first be better than depth-first?
- Could the node data be split / merged / moved / rearranged to support this operation better?
- What other systems will consume this visibility list, and what format would they like their data to be in?
- How can we utilize multiple CPU cores to accelerate the traversal? What data changes are required to support parallelism?
- Are AABBs and Frustums the best data structures to be using here?
- Does this operation even need to be hierarchical? Could it be faster if it was done on a flat/dumb array? (See the sketch below.)
- What process updates the AABB locations in the first place? Where does that data come from, and what's the optimal output format for that process?
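
As a taste of that flat/dumb-array option: the non-hierarchical version of the query is a handful of lines, and because each test is independent it splits across cores trivially (appending results in parallel would want per-core buffers):

#include <cstdint>
#include <vector>

// Non-hierarchical culling over a flat array of bounds (Aabb/Frustum as
// in the examples above). Outputs indices rather than pointers.
void CullFlat( const std::vector<Aabb>& bounds, const Frustum& camera,
               std::vector<uint32_t>& out_visible )
{
  for( uint32_t i = 0, n = uint32_t( bounds.size() ); i < n; ++i )
    if( bounds[i].Touches(camera) )
      out_visible.push_back( i );
}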

--Make sure you're performing the right operations in the first place before massaging the data to fit those operations.

>> I'm trying to understand C++'s data-oriented design

Data-oriented design is an optimization method -- a generic, hardware-oriented one that is rather language-neutral. The language simply determines how easy, hard, or impossible it is to apply. So DOD is not exclusive to C++.

>> and, especially, how it applies to game engine development.

When profiling indicates a bottleneck and the best choices so far aren't working out, a DOD analysis of the situation, as described in Hodgman's last paragraph, is called for.

>> My biggest question so far is about the implementation of a scene graph: how can you do that with data-oriented design?

>> nothing immediately comes to mind when I try to imagine a system that is logically a graph, but employs serious DOD underneath

>> What is a tree, at its most essential? It's a set of parent pointers

An array of structs with a parent pointer field in each struct was the first and simplest thing that came to my mind.

OP: remember, OO is just one of many ways to organize code, and is not specific to just one language. Likewise, DOD is just one of many ways to optimize code, and also is not specific to just one language. If you like OO, use it. If profiling indicates some part is too slow using standard OO-style code, first research the standard optimizations used in that case; if none apply or work, then consider a DOD redesign of the data structures. I used "lite" DOD principles to redesign my render queue for fast sorting into optimal draw order, and got fantastic results. All I did was start at the hardware end and determine what order the hardware wanted to see the data in, then redesign the data structures to essentially do a bucket sort of draw calls into that order (which was defined by the hardware).
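
The gist of that kind of render-queue sort looks roughly like this (a sketch with made-up key fields; the real split depends on your hardware and renderer, and a bucket sort over the same key gives the same ordering):

#include <algorithm>
#include <cstdint>
#include <vector>

// Pack the state that's most expensive to change into the highest bits,
// so a plain sort on the key yields the hardware's preferred draw order.
struct DrawCall
{
    uint64_t key; // [shader:16][texture:16][mesh:16][depth:16] -- hypothetical
    // ... payload: transforms, constants, etc.
};

uint64_t MakeKey( uint16_t shader, uint16_t texture, uint16_t mesh, uint16_t depth )
{
    return (uint64_t(shader) << 48) | (uint64_t(texture) << 32)
         | (uint64_t(mesh)   << 16) |  uint64_t(depth);
}

void SortQueue( std::vector<DrawCall>& queue )
{
    std::sort( queue.begin(), queue.end(),
               []( const DrawCall& a, const DrawCall& b ) { return a.key < b.key; } );
}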

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Data-oriented design is thinking about what data you need and what algorithms you'll run it through, because that will be the deciding factor in how to lay it out.

Here's an example:

http://fr.slideshare.net/DICEStudio/culling-the-battlefield-data-oriented-design-in-practice

or rather:

http://fr.slideshare.net/DICEStudio/introduction-to-data-oriented-design?next_slideshow=1

http://fr.slideshare.net/EmanWebDev/pitfalls-of-object-oriented-programminggcap09?next_slideshow=1

-* So many things to do, so little time to spend. *-


One of the issues I ran into while building a scene graph like that: the pure index-based approach works great for static graphs, such as a skeletal mesh, but becomes a headache for dynamic graphs :(

I ended up building a graph structure on top of the ID/indirection container Sean suggested here (http://gamedev.stackexchange.com/questions/33888/what-is-the-most-efficient-container-to-store-dynamic-game-objects-in), but perhaps there's a better way to do it.
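
For reference, the core of that ID/indirection idea is roughly the following (a minimal sketch; it ignores free-list reuse of slots for brevity): stable IDs carry a generation counter, and an indirection table maps each ID to a slot in a densely packed array that stays cheap to iterate.

#include <cstdint>
#include <vector>

struct Id { uint32_t index; uint32_t generation; };

template<typename T>
class SlotMap
{
    struct Slot { uint32_t dense; uint32_t generation; };
    std::vector<Slot>     m_slots; // sparse: Id.index -> dense position
    std::vector<T>        m_data;  // dense: iterate this linearly
    std::vector<uint32_t> m_owner; // dense -> owning slot, for swap-erase
public:
    Id insert( const T& value )
    {
        uint32_t slot = uint32_t( m_slots.size() );
        m_slots.push_back( { uint32_t( m_data.size() ), 0 } );
        m_data.push_back( value );
        m_owner.push_back( slot );
        return { slot, 0 };
    }
    T* get( Id id ) // null if the ID is stale or out of range
    {
        if( id.index >= m_slots.size() ) return nullptr;
        Slot& s = m_slots[id.index];
        return ( s.generation == id.generation ) ? &m_data[s.dense] : nullptr;
    }
    void erase( Id id )
    {
        if( !get( id ) ) return;
        Slot& s = m_slots[id.index];
        uint32_t last = uint32_t( m_data.size() ) - 1;
        m_data[s.dense]  = m_data[last];           // swap-erase keeps data dense
        m_owner[s.dense] = m_owner[last];
        m_slots[m_owner[s.dense]].dense = s.dense; // re-point the moved object's slot
        m_data.pop_back();
        m_owner.pop_back();
        ++s.generation;                            // invalidate outstanding IDs
    }
};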

One of the issues I ran into while building a scene graph like that: the pure index-based approach works great for static graphs, such as a skeletal mesh, but becomes a headache for dynamic graphs :(

Yeah, so that's an issue with data-orienting a scene graph. However, I think it's worth saying again, and again, and again: a scene graph is almost never what you actually want in this day and age.

You generally need a spatial structure to accelerate culling and other visibility queries, and a loose quad-tree or a spatial hash is much better for this than a hierarchical scene graph. You also generally want to attach one object to another (a turret to a hardpoint, a shield to a character's arm, etc.), and for that a flat list of attachments can be just as effective as a hierarchy.
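
A flat attachment list can be as small as this (a sketch; the Transform type and its operator* are assumed, and entries are kept sorted so parents precede children):

#include <vector>

// Each entry says "my world transform = parent's world * my local offset".
// With parents stored before children, one linear pass resolves everything.
struct Attachment
{
    int       parent; // index into this same array; -1 = attached to the world
    Transform local;  // offset relative to the parent
    Transform world;  // computed output
};

void UpdateTransforms( std::vector<Attachment>& items )
{
    for( Attachment& a : items )
        a.world = ( a.parent < 0 ) ? a.local
                                   : items[a.parent].world * a.local;
}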

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

One of the issues I ran into while building a scene graph like that: the pure index-based approach works great for static graphs, such as a skeletal mesh, but becomes a headache for dynamic graphs :(

I ended up building a graph structure on top of the ID/indirection container Sean suggested here (http://gamedev.stackexchange.com/questions/33888/what-is-the-most-efficient-container-to-store-dynamic-game-objects-in), but perhaps there's a better way to do it.

Something I've done before: each object (typically static in its own design) is its own data-oriented array, but that array can be parented to any particular index in another object's data-oriented array. This allowed the transform code to be parallelized by object, based on a very simple dependency model.

I took it a step further, so that each bone in the array could be parented to a different bone in the parent object's array, but limited each array to having one parent array. That kept the dependency model simple, but with enough flexibility to do, for example, skinning-only transforms, by parenting every bone in a skeleton to the matching bone in the animated skeleton.

EDIT: This also allowed each of the flat arrays to be inserted into an acceleration structure under a single spatial hash, instead of having to test each node within it.
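
In sketch form, that scheme is something like this (simplified to the cross-array parenting only; Transform is assumed, and objects must be updated in dependency order, parents first):

#include <vector>

// One flat bone array per object. The whole array may reference at most
// one parent object's array, with a per-bone index into it. Updating
// objects in dependency order makes per-object parallelism safe.
struct BoneArray
{
    BoneArray*             parent = nullptr; // at most one parent array
    std::vector<int>       parentBone;       // per-bone index into parent->world
    std::vector<Transform> local, world;     // same length as parentBone

    void Update()
    {
        for( size_t i = 0; i < local.size(); ++i )
            world[i] = parent ? parent->world[parentBone[i]] * local[i]
                              : local[i];
    }
};

// e.g. skinning-only transforms: parent every bone of the render skeleton
// to the matching bone index of the animated skeleton's array.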

