Algorithm How to synchronize a Scene Graph?

Recommended Posts

Hello,

when I have multiple Threads, reading and writing to a scene graph, how do I synchronize data over several nodes?

I.e. when a character with a gun moves, the gun moves with him. A thread dedicated to calculating matrices of both objects might update the character first but before it is able to recalculate the gun's matrix the render thread is already drawing both. Inevitably this causes the character and the gun to be out of sync...

Now this doesn't only apply to the renderer but for the other threads, too.

Share on other sites

There are a number of ways to approach this problem, personally I like the most simplistic variation.  Basically there is a simple trick which can be used involving dependency chains and work ownership.  In each of the scene graph nodes add an int32_t or anything which can be used in std::atomic's.  Now, every frame, update a counter and then tell the threads about that update such that each thread will have a copy of the old and new value of the counter.  At this point the obvious item is to perform an atomic compare/exchange on the counters in the nodes, if it succeeds that means the node needs to be updated before any of it's data is used.  Of course there is a problem with this, if one thread succeeds in updating the counter and then starts updating the data in the node, other threads could use the data as it is being updated.  So, things get a little tricky though still pretty simple.

In order to fix the problem, reserve some value of the counter, I use -1 usually, which will represent a 'locked' state.  Now rather than the compare/exchange just updating to the new value, it updates to the locked state first.  It does the work needed to update the node and then stores the new counter value when done.  Adding a little spin wait on the locked state is trivial and while it may help to do fancy exponential backoffs and such on the spin, 'usually' it is a waste of time since contention in a fair sized scene graph is pretty minimal as long as there is a decent amount of work for each thread to perform.

At the end of the day, this system works quite well and utilizes the threads reasonably.

Share on other sites

Why not just update the character transform and the children's transforms together?  Then have the update thread only iterate over the top level objects.  That way an object and all it's dependent child objects will be moved together, and the render thread will see them all move as one.  Of course I'm assuming you have some way to lock/unlock the data for the object transforms.  This lock just needs to encompass both the parent and child updates.

Share on other sites
2 hours ago, 0r0d said:

Of course I'm assuming you have some way to lock/unlock the data for the object transforms.  This lock just needs to encompass both the parent and child updates.

As I said, there are many possible solutions, even to the deadlock as described.  The problem is that many of the obvious solutions are worse than just giving up on threads and running things single threaded.  For a scene graph, the best solution is usually the eventual consistency model I describe simply because it avoids most cases of contention and doesn't need non-uniform work ownership rules.  Amdahl's law is unforgiving, the more complexity you put into deciding what work to perform the less you will be able to utilize the threads.  As such, what conceptually seems simple in this case is actually really hard to implement and eventually a single threaded approach would be better.

Share on other sites
6 hours ago, electrolysing said:

Hello,

when I have multiple Threads, reading and writing to a scene graph, how do I synchronize data over several nodes?

I.e. when a character with a gun moves, the gun moves with him. A thread dedicated to calculating matrices of both objects might update the character first but before it is able to recalculate the gun's matrix the render thread is already drawing both. Inevitably this causes the character and the gun to be out of sync...

Now this doesn't only apply to the renderer but for the other threads, too.

Don't.

Call the old gun transform T[n]  and the new one T[n+1].

You're asking about a situation where you don't know whether a function will be operating on T[n] or T[n+1]. That situation means the architecture is invalid. It can happen in single-threaded code or multi-threaded code too, so threading isn't the problem. The problem is not planning out your data flow and using that as a requirement to inform the architecture.

In a typical OO architecture, you either have the T[n] object available or the T[n+1] object available, but not both. So, find all the functions that need to read T[n] and place them earlier in the update order before the function that overwrites T[n] with T[n+1]. This "overwriting" function is dependent on these earlier functions completing, so in a multi-threaded architecture, that's a sync point (overwrite task may not begin until all read tasks have completed).

For updating a transform hierarchy, often every node can update their local transform in parallel, and then all the world transforms calculated in a second parallel pass. For sending updates to a renderer thread, it should NOT be sampling from the scene structure while it's being updated. Again, it's your job to get the scheduling correct that that the renderer only consumes fully completed sets of transforms. You may also want to create a copy of these transforms so that the rendering thread can safely consume the copy at it's leisure while you continue with the next update. Often you also want to send interpolated transformation to the renderer too...

Share on other sites
41 minutes ago, All8Up said:

I see that perhaps my reply wasnt entirely clear.  What I meant was to have a single lock/unlock for the parent, and the parent internally handles the update/render for itself and the children.   This the simplest solution to the parent/child synchronization that the OP was asking about.  Of course it wont solve all sorts of other dependencies and how to deal with them while multithreading, but since the OP asked about a very simple use case, I thought I'd start with a simple solution and see where it goes from there.  I'm also assuming a pretty shallow scenegraph where almost all objects are top-level parents and only their parts are children.  If the scenegraph is more complicated then this wont work.

Maybe I didnt properly understand how your solution works, but it seems to just be a lock/unlock system using atomics instead of mutexes (or whatever).  But you still have to lock (i.e. update your atomic to -1) an object before you proceed to the children, and that's not going to be atomic.  How does that not result in deadlocks?  If you have parent A and child B, the update thread "locks" A and the render thread "locks" B, then you'll get a deadlock when each tries to lock the other's parent/child.

Share on other sites
3 hours ago, 0r0d said:

But you still have to lock (i.e. update your atomic to -1) an object before you proceed to the children, and that's not going to be atomic.  How does that not result in deadlocks?  If you have parent A and child B, the update thread "locks" A and the render thread "locks" B, then you'll get a deadlock when each tries to lock the other's parent/child.

(I don't approve of this fine grained locking as mentioned in my post but) You solve deadlocks by enforcing a standard order for locking - e.g. Always lock parents before children. Unfortunately this boils down to having to lock the root node though, so you could have a different policy for the root's children, such as having to lock them in order of node-ID... Or just use that policy for all nodes. Make sure to enforce these policies with assertions in your dev builds though, or they'll certainly be broken

Share on other sites
12 hours ago, All8Up said:

This is an interesting approach. Reminds me of a worker thread based system.
I am wondering about the stability - in theory the render thread could potentially try to steal heavy loaded work and stalls its actual work.

12 hours ago, 0r0d said:

Why not just update the character transform and the children's transforms together?  Then have the update thread only iterate over the top level objects.  That way an object and all it's dependent child objects will be moved together, and the render thread will see them all move as one.  Of course I'm assuming you have some way to lock/unlock the data for the object transforms.  This lock just needs to encompass both the parent and child updates.

I would rather not trying to lock/unlock*. As I see it, that kind of defeats the purpose of Multithreading.
At least on a bigger scale (methods, objects), especially when locking the whole scene graph.

8 hours ago, Hodgman said:

This "overwriting" function is dependent on these earlier functions completing, so in a multi-threaded architecture, that's a sync point (overwrite task may not begin until all read tasks have completed).

If I got that correctly (from all your answers), all threads should be synced aka running at the same hz rate? At first I was under the impression that all threads should be decoupled from each other, so that i.e. the physics engine can continue to run normally while another thread renders a very complex scene.

This would, at least, allow a double buffered approach.
- In pass n: All threads read from var1 and write to var2.
- In pass n+1: All threads read from var2 and write to var1.

Buffering data was what I was doing before (and right now) inside of an object, because I had control of all the data. When the calculations were ready, I made an atomic pointer swap to publish them. This approach doesn't work with variables scattered over multiple objects - at least none I could come up with.

However, syncing the threads after every loop could solve this problem.

If I am understanding you correctly, your solution would be to use two sync points/two passes(read&write):
[sync] -> [read and buffer needed data] -> [sync] -> [compute and write results] -> [repeat]

?

Share on other sites
8 hours ago, Hodgman said:

Don't.

To be honest, this is probably the better point.  But, the OP suggested a functional threading model in the description.  But even with data flow controls, scene graphs are actually a problem in distribution systems and this counter approach usually ends up in both functional and job based solutions as part of the method allowing multiple threads to efficiently traverse the scene graph and not duplicate work.  You could do this with OS mutex and probably perform about the same, but I've found many uses for the counter approach in areas beyond simple update and rendering, not to mention the lower memory footprint.

The thing I see most often is folks pushing nodes into a job system and then each child is pushed as individual jobs.  In most cases I've seen that approach it definitely utilizes all the cores, unfortunately REALLY poorly.  The overhead is usually just horrible in such solutions and you'd be better off with a single threaded solution.  Tree structures are always difficult to get effective parallel utilization from.  The counter trick is actually one of the better ways to partition the tree's.  We haven't even touched on the fact that most scene graphs have to be traversed twice, once to update all the transforms and bounds at the leaves and then again to accumulate the bounds back up to the root.

8 hours ago, 0r0d said:

How does that not result in deadlocks?  If you have parent A and child B, the update thread "locks" A and the render thread "locks" B, then you'll get a deadlock when each tries to lock the other's parent/child.

That's actually a fairly simple item to solve with the approach suggested.  I did forget to mention it of course.    Basically you have to choose which direction you intend to perform updates and follow that direction appropriately.  Let's say we choose depth first priority, if an item like the gun is locked by the renderer and it steps to it's parent to find it currently locked, it unlocks the gun and spin waits for the atomic to become 'updated'.  Any time something is locking children it assumes it will always eventually get the lock so it will wait.  Given that the contention is exceptionally rare in most cases, this doesn't hurt performance much and just works.

6 minutes ago, electrolysing said:

I am wondering about the stability - in theory the render thread could potentially try to steal heavy loaded work and stalls its actual work.

This is a matter of balance and purity.  If you want the render thread to do nothing but render things, you have to either remove it from scene graph traversal while others are updating the graph, or block when it runs into something which is not yet updated.  On the other hand, if you don't mind that this is turning all of your threads into a more generic 'worker' concept this approach is actually an optimization.  Consider that something has to do the work and the render thread would be blocked anyway, why not let it do the work to unblock itself?

But, going back to Hodman's point, the approach I'm suggesting is not horrible but there are other solutions which will perform better.  Though, as I mentioned, I've ended up using this as part of those other solutions anyway so I don't think it would be completely wasted work.  Tagging nodes in tree's is a very common approach with parallel updates, it is common in generic concurrent tree types and many algorithm's which have complex traversals such as this.

Share on other sites
On 3.12.2017 at 4:11 PM, All8Up said:

To be honest, this is probably the better point.  But, the OP suggested a functional threading model in the description.

Well... is multithreading a bad choice for games?

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

1. 1
Rutin
35
2. 2
3. 3
4. 4
5. 5

• 12
• 14
• 9
• 9
• 14
• Forum Statistics

• Total Topics
633343
• Total Posts
3011430
• Who's Online (See full list)

There are no registered users currently online

×