Jump to content

  • Log In with Google      Sign In   
  • Create Account

KaiserJohan

Member Since 08 Apr 2011
Offline Last Active Today, 09:21 AM

Posts I've Made

In Topic: my C++ console 3D fps Game on Windows from scratch

13 May 2016 - 02:16 AM

Awesome, would love to read some form of post-mortem, i.e unexpected difficulties, what went right/wrong and how you designed the software renderer.


In Topic: Data-oriented scene graph?

11 May 2016 - 08:00 AM

 

I'm trying to understand C++'s data-oriented design and, specially, how it applies to game engine development. My biggest question so far is about the implementation of a scene graph: how can you do that with data-oriented design? For me, scene graphs, and well... graphs, are naturally hierarchical, so OOD matches perfectly.

Most scene graphs are actually trees. What is a tree, at its most essential? It's a set of parent pointers. Nothing object-oriented about a flat array of parent pointers, now, is there?

^^That.

DoD means that you're not just thinking about what data you have, but are thinking about how it is read and written. How and why is it generated? How and why is it consumed? What's the data access patterns?
For example, given this graph of nodes, you can't even begin to apply DoD to this class, because we know nothing about how it is used:

class Node
{
  Aabb m_bounds;
  vector<Node*> m_children;
}

Instead of thinking in terms of objects, you need to think in terms of Inputs -> Processes -> Outputs.

 

e.g. One common operation in "scene graphs" is traversing the graph to find visible nodes - so, let's say our Node class is used like this:

class Node
{
  Aabb m_bounds;
  vector<Node*> m_children;
  void Traverse( const Frustum& in_camera, vector<Node*>& out_visible )
  {
    if( m_bounds.Touches(in_camera) )
    {
      out_visible.push_back(this);
      for( int i=0, end=m_children.size(); i!=end; ++i )
        m_children[i]->Traverse( in_camera, out_visible );
    }
  }
}

Looking at this, we can see that the tree is traversed in a depth-first order, sometimes skipping over branches that are outside the frustum.
With that bit of knowledge, we now know how this data should be laid out in memory -- all nodes should be contiguously allocated, and they should be sorted in a topological depth first order, so that during the traversal, the CPU is linearly walking forwards through the allocation, and never randomly jumping backwards.
i.e. they should not be allocated individually with new :P

From there, you can start to completely restructure the data if you like.
Instead of every node storing a vector of child pointers (which not only is quite large, but is actually stored in a separate memory allocation elsewhere in RAM), we only really need to store two pointers! The two different conditions are which node to visit next if this node is visible, and which node to visit next if this node is hidden. For most nodes, next_if_visible is the offset to their first child - or if they have no children, it's the same value as next_if_hidden. For most nodes next_if_hidden is their next sibling - or if they have no more siblings, then it's their parent's next_if_hidden value - or if it's the root node, then it's 0 as a terminator value. Using those rules, we can represent the exact same parent/child relationships as in the original data, but implicitly instead of explicitly.
Let's say that in any one branch of the graph you've always got less than 65k objects in the graph -- so these "pointers" can actually be nice, compact 16bit offsets! Traversal can now look like:

class Node
{
  Aabb     bounds;
  uint16_t next_if_visible;
  uint16_t next_if_hidden;
  void Traverse( const Frustum& in_camera, vector<Node*>& out_visible )
  {
    bool visible = m_bounds.Touches(in_camera);
    if( visible )
      out_visible.push_back(this);
    u16 offset = visible ? next_if_visible : next_if_hidden;
    if( offset )
      (this + offset)->Traverse( in_camera, out_visible );
  }
}

Not counting the Aabb member, we've gone from a Node being stored in two allocations of 24bytes + 8bytes * numChildren to a Node being just 4bytes, while remaining functionally equivalent!
Now... I've made this particular consumer of the data happy, but may have complicated life for other consumers or producers of this data (producing this array of nodes is now much more complex than with the original code) -- so in practice it's a balancing act where you're trying to keep multiple different algorithms happy with their optimal data layouts.
In some cases, you may even end up splitting a particular "object" into two different structures, which it's cloned into at different points in time, in order to deliver optimal data to different consumer algorithms. You've got to think much more about how data flows from one system to another. You also often end up with code that's much more functional in style -- where instead of long-lived mutable objects, you end up with short-lived immutable objects.

 

 

However, before doing any of this, we first should've looked at the actual data transform itself -- is this the best way to traverse the scene to find visible nodes? Would bredth-first be better than depth-first? Could the node data be split / merged / moved / rearranged to support this operation better? What other systems will consume this visibility list, and what format would they like their data to be in? How can we utilize multiple CPU cores to accelerate the traversal? What data changes are required to support parallelism? Are AABB's and Frustums the best data structures to be using here? Does this operation even need to be hierarchical - could it be faster if it was done on a flat/dumb array? What process updates the AABB locations in the first place - where does that data come from? What's the optimal output format for that process?

--Make sure you're performing the right operations in the first place before massaging the data to fit those operations.

 

 

One of the issues I ran into while building a scenegraph like that was the pure index-based approach works great for static graphs such as a skeleton mesh but becomes a headache for dynamic graphs :(

 

I ended up building a graph structure ontop of the ID/indirection container Sean suggested here (http://gamedev.stackexchange.com/questions/33888/what-is-the-most-efficient-container-to-store-dynamic-game-objects-in) but perhaps there's a better way to do it.


In Topic: interactive tutorials are quests?

29 March 2016 - 04:20 AM

Some games also make an optional tutorial mission. What really turns me off is when important plot elements are baked into it which you miss if you skip it. XCOM 2 is the worst offender to date for this.


In Topic: Cascaded Shadow Map problem: seam between cascade

24 March 2016 - 02:07 AM

Have you tried making the cascades overlap slightly?


In Topic: some a* questions

22 March 2016 - 03:24 AM

Thanks for input. I found very large test maps. It's at http://movingai.com/benchmarks/dao/ if anyone else wants to know.


PARTNERS