• Advertisement

Algorithm How to synchronize a Scene Graph?

Recommended Posts

Hello,

when I have multiple Threads, reading and writing to a scene graph, how do I synchronize data over several nodes?

I.e. when a character with a gun moves, the gun moves with him. A thread dedicated to calculating matrices of both objects might update the character first but before it is able to recalculate the gun's matrix the render thread is already drawing both. Inevitably this causes the character and the gun to be out of sync...

Now this doesn't only apply to the renderer but for the other threads, too.

Share this post


Link to post
Share on other sites
Advertisement

There are a number of ways to approach this problem, personally I like the most simplistic variation.  Basically there is a simple trick which can be used involving dependency chains and work ownership.  In each of the scene graph nodes add an int32_t or anything which can be used in std::atomic's.  Now, every frame, update a counter and then tell the threads about that update such that each thread will have a copy of the old and new value of the counter.  At this point the obvious item is to perform an atomic compare/exchange on the counters in the nodes, if it succeeds that means the node needs to be updated before any of it's data is used.  Of course there is a problem with this, if one thread succeeds in updating the counter and then starts updating the data in the node, other threads could use the data as it is being updated.  So, things get a little tricky though still pretty simple.

In order to fix the problem, reserve some value of the counter, I use -1 usually, which will represent a 'locked' state.  Now rather than the compare/exchange just updating to the new value, it updates to the locked state first.  It does the work needed to update the node and then stores the new counter value when done.  Adding a little spin wait on the locked state is trivial and while it may help to do fancy exponential backoffs and such on the spin, 'usually' it is a waste of time since contention in a fair sized scene graph is pretty minimal as long as there is a decent amount of work for each thread to perform.

For all intents and purposes, this approach turns your threads accessing the scene graph pretty much into a work stealing job system.  The reason for that description: say your render thread wants to render the gun, it will check the counter and if it updates it, then it knows it needs to do the matrix updates.  In order to do that it will track to the parent and check it's counter, if it swaps that also, it gets to perform the matrix updates for both items, i.e. stealing the work from your matrix update thread.  If the matrix update thread see's already updated or locked nodes, it just ignores them since obviously some other thread is busy updating them.  So long as all threads in the scene graph respect the update counter, dependencies get solved properly and everything is thread safe.

At the end of the day, this system works quite well and utilizes the threads reasonably.

Share this post


Link to post
Share on other sites

Why not just update the character transform and the children's transforms together?  Then have the update thread only iterate over the top level objects.  That way an object and all it's dependent child objects will be moved together, and the render thread will see them all move as one.  Of course I'm assuming you have some way to lock/unlock the data for the object transforms.  This lock just needs to encompass both the parent and child updates.

Share this post


Link to post
Share on other sites
2 hours ago, 0r0d said:

Of course I'm assuming you have some way to lock/unlock the data for the object transforms.  This lock just needs to encompass both the parent and child updates.

Believe it or not, the solution I suggested is exactly this, it just skips over all the multithreaded issues that you run into when implementing it.  Threading throws a serious wrench in the mix for many obvious and simple solutions.  For instance, the naive solution to implement your suggestion is a deadlock prone item in a threaded codebase.  Say the rendering thread starts to render the gun, it locks the gun, the thread decides it needs to lock the parent of the gun so it tries to lock that.  Unfortunately the other thread has the parent locked as it updates it and then it tries to lock the gun.  Oops, dreaded deadlock issue.

As I said, there are many possible solutions, even to the deadlock as described.  The problem is that many of the obvious solutions are worse than just giving up on threads and running things single threaded.  For a scene graph, the best solution is usually the eventual consistency model I describe simply because it avoids most cases of contention and doesn't need non-uniform work ownership rules.  Amdahl's law is unforgiving, the more complexity you put into deciding what work to perform the less you will be able to utilize the threads.  As such, what conceptually seems simple in this case is actually really hard to implement and eventually a single threaded approach would be better.

Share this post


Link to post
Share on other sites
6 hours ago, electrolysing said:

Hello,

when I have multiple Threads, reading and writing to a scene graph, how do I synchronize data over several nodes?

I.e. when a character with a gun moves, the gun moves with him. A thread dedicated to calculating matrices of both objects might update the character first but before it is able to recalculate the gun's matrix the render thread is already drawing both. Inevitably this causes the character and the gun to be out of sync...

Now this doesn't only apply to the renderer but for the other threads, too.

Don't. 

Call the old gun transform T[n]  and the new one T[n+1]. 

You're asking about a situation where you don't know whether a function will be operating on T[n] or T[n+1]. That situation means the architecture is invalid. It can happen in single-threaded code or multi-threaded code too, so threading isn't the problem. The problem is not planning out your data flow and using that as a requirement to inform the architecture. 

In a typical OO architecture, you either have the T[n] object available or the T[n+1] object available, but not both. So, find all the functions that need to read T[n] and place them earlier in the update order before the function that overwrites T[n] with T[n+1]. This "overwriting" function is dependent on these earlier functions completing, so in a multi-threaded architecture, that's a sync point (overwrite task may not begin until all read tasks have completed). 

For updating a transform hierarchy, often every node can update their local transform in parallel, and then all the world transforms calculated in a second parallel pass. For sending updates to a renderer thread, it should NOT be sampling from the scene structure while it's being updated. Again, it's your job to get the scheduling correct that that the renderer only consumes fully completed sets of transforms. You may also want to create a copy of these transforms so that the rendering thread can safely consume the copy at it's leisure while you continue with the next update. Often you also want to send interpolated transformation to the renderer too... 

Share this post


Link to post
Share on other sites
41 minutes ago, All8Up said:

Believe it or not, the solution I suggested is exactly this, it just skips over all the multithreaded issues that you run into when implementing it.  Threading throws a serious wrench in the mix for many obvious and simple solutions.  For instance, the naive solution to implement your suggestion is a deadlock prone item in a threaded codebase.  Say the rendering thread starts to render the gun, it locks the gun, the thread decides it needs to lock the parent of the gun so it tries to lock that.  Unfortunately the other thread has the parent locked as it updates it and then it tries to lock the gun.  Oops, dreaded deadlock issue.

 

I see that perhaps my reply wasnt entirely clear.  What I meant was to have a single lock/unlock for the parent, and the parent internally handles the update/render for itself and the children.   This the simplest solution to the parent/child synchronization that the OP was asking about.  Of course it wont solve all sorts of other dependencies and how to deal with them while multithreading, but since the OP asked about a very simple use case, I thought I'd start with a simple solution and see where it goes from there.  I'm also assuming a pretty shallow scenegraph where almost all objects are top-level parents and only their parts are children.  If the scenegraph is more complicated then this wont work.

Maybe I didnt properly understand how your solution works, but it seems to just be a lock/unlock system using atomics instead of mutexes (or whatever).  But you still have to lock (i.e. update your atomic to -1) an object before you proceed to the children, and that's not going to be atomic.  How does that not result in deadlocks?  If you have parent A and child B, the update thread "locks" A and the render thread "locks" B, then you'll get a deadlock when each tries to lock the other's parent/child.

Share this post


Link to post
Share on other sites
3 hours ago, 0r0d said:

But you still have to lock (i.e. update your atomic to -1) an object before you proceed to the children, and that's not going to be atomic.  How does that not result in deadlocks?  If you have parent A and child B, the update thread "locks" A and the render thread "locks" B, then you'll get a deadlock when each tries to lock the other's parent/child.

(I don't approve of this fine grained locking as mentioned in my post :P but) You solve deadlocks by enforcing a standard order for locking - e.g. Always lock parents before children. Unfortunately this boils down to having to lock the root node though, so you could have a different policy for the root's children, such as having to lock them in order of node-ID... Or just use that policy for all nodes. Make sure to enforce these policies with assertions in your dev builds though, or they'll certainly be broken ;)

Share this post


Link to post
Share on other sites
12 hours ago, All8Up said:

For all intents and purposes, this approach turns your threads accessing the scene graph pretty much into a work stealing job system.  The reason for that description: say your render thread wants to render the gun, it will check the counter and if it updates it, then it knows it needs to do the matrix updates.  In order to do that it will track to the parent and check it's counter, if it swaps that also, it gets to perform the matrix updates for both items, i.e. stealing the work from your matrix update thread.  If the matrix update thread see's already updated or locked nodes, it just ignores them since obviously some other thread is busy updating them.  So long as all threads in the scene graph respect the update counter, dependencies get solved properly and everything is thread safe.

This is an interesting approach. Reminds me of a worker thread based system.
I am wondering about the stability - in theory the render thread could potentially try to steal heavy loaded work and stalls its actual work.

 

12 hours ago, 0r0d said:

Why not just update the character transform and the children's transforms together?  Then have the update thread only iterate over the top level objects.  That way an object and all it's dependent child objects will be moved together, and the render thread will see them all move as one.  Of course I'm assuming you have some way to lock/unlock the data for the object transforms.  This lock just needs to encompass both the parent and child updates.

I would rather not trying to lock/unlock*. As I see it, that kind of defeats the purpose of Multithreading.
At least on a bigger scale (methods, objects), especially when locking the whole scene graph.

 

8 hours ago, Hodgman said:

This "overwriting" function is dependent on these earlier functions completing, so in a multi-threaded architecture, that's a sync point (overwrite task may not begin until all read tasks have completed). 

If I got that correctly (from all your answers), all threads should be synced aka running at the same hz rate? At first I was under the impression that all threads should be decoupled from each other, so that i.e. the physics engine can continue to run normally while another thread renders a very complex scene.

This would, at least, allow a double buffered approach.
- In pass n: All threads read from var1 and write to var2.
- In pass n+1: All threads read from var2 and write to var1.

Buffering data was what I was doing before (and right now) inside of an object, because I had control of all the data. When the calculations were ready, I made an atomic pointer swap to publish them. This approach doesn't work with variables scattered over multiple objects - at least none I could come up with.

However, syncing the threads after every loop could solve this problem.

If I am understanding you correctly, your solution would be to use two sync points/two passes(read&write):
[sync] -> [read and buffer needed data] -> [sync] -> [compute and write results] -> [repeat]

?

Share this post


Link to post
Share on other sites
8 hours ago, Hodgman said:

Don't. 

To be honest, this is probably the better point.  But, the OP suggested a functional threading model in the description.  But even with data flow controls, scene graphs are actually a problem in distribution systems and this counter approach usually ends up in both functional and job based solutions as part of the method allowing multiple threads to efficiently traverse the scene graph and not duplicate work.  You could do this with OS mutex and probably perform about the same, but I've found many uses for the counter approach in areas beyond simple update and rendering, not to mention the lower memory footprint.

The thing I see most often is folks pushing nodes into a job system and then each child is pushed as individual jobs.  In most cases I've seen that approach it definitely utilizes all the cores, unfortunately REALLY poorly.  The overhead is usually just horrible in such solutions and you'd be better off with a single threaded solution.  Tree structures are always difficult to get effective parallel utilization from.  The counter trick is actually one of the better ways to partition the tree's.  We haven't even touched on the fact that most scene graphs have to be traversed twice, once to update all the transforms and bounds at the leaves and then again to accumulate the bounds back up to the root.

8 hours ago, 0r0d said:

How does that not result in deadlocks?  If you have parent A and child B, the update thread "locks" A and the render thread "locks" B, then you'll get a deadlock when each tries to lock the other's parent/child.

That's actually a fairly simple item to solve with the approach suggested.  I did forget to mention it of course.  :)  Basically you have to choose which direction you intend to perform updates and follow that direction appropriately.  Let's say we choose depth first priority, if an item like the gun is locked by the renderer and it steps to it's parent to find it currently locked, it unlocks the gun and spin waits for the atomic to become 'updated'.  Any time something is locking children it assumes it will always eventually get the lock so it will wait.  Given that the contention is exceptionally rare in most cases, this doesn't hurt performance much and just works.

6 minutes ago, electrolysing said:

I am wondering about the stability - in theory the render thread could potentially try to steal heavy loaded work and stalls its actual work.

This is a matter of balance and purity.  If you want the render thread to do nothing but render things, you have to either remove it from scene graph traversal while others are updating the graph, or block when it runs into something which is not yet updated.  On the other hand, if you don't mind that this is turning all of your threads into a more generic 'worker' concept this approach is actually an optimization.  Consider that something has to do the work and the render thread would be blocked anyway, why not let it do the work to unblock itself?

But, going back to Hodman's point, the approach I'm suggesting is not horrible but there are other solutions which will perform better.  Though, as I mentioned, I've ended up using this as part of those other solutions anyway so I don't think it would be completely wasted work.  Tagging nodes in tree's is a very common approach with parallel updates, it is common in generic concurrent tree types and many algorithm's which have complex traversals such as this.

Share this post


Link to post
Share on other sites
On 3.12.2017 at 4:11 PM, All8Up said:

To be honest, this is probably the better point.  But, the OP suggested a functional threading model in the description.

Well... is multithreading a bad choice for games?

Share this post


Link to post
Share on other sites
1 hour ago, electrolysing said:

Well... is multithreading a bad choice for games?

Definitely not, it's just the free threaded nature of how you described your implementation which is not the best way to go.  Typically in games you perform several steps in order rather than all at once.  In that manner you put 'all' threads to updating the scene graph and only when that completes do you let the render thread walk the graph to start rendering.  Or, more commonly, after the update then the cull pass is run which issues the rendering commands and the render thread picks them up.  Basically everything in that more common model is a 'push' towards worker threads rather than multiple items trying to process shared data.

Share this post


Link to post
Share on other sites

Having just watched this video, I think it would apply well to the discussion:
http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

More so the last half of the video on their multithreading, multi frame system.  But honestly, you should try and go as wide as possible.  Do all physics, all animation, moving on after each stage is done.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now


  • Advertisement
  • Advertisement
  • Popular Tags

  • Advertisement
  • Popular Now

  • Similar Content

    • By francoisdiy
      So I wrote a programming language called C-Lesh to program games for my game maker Platformisis. It is a scripting language which tiles into the JavaScript game engine via a memory mapper using memory mapped I/O. Currently, I am porting the language as a standalone interpreter to be able to run on the PC and possibly other devices excluding the phone. The interpreter is being written in C++ so for those of you who are C++ fans you can see the different components implemented. Some background of the language and how to program in C-Lesh can be found here:

      http://www.codeloader.net/readme.html
      As I program this thing I will post code from different components and explain.
    • By isu diss
      I'm trying to duplicate vertices using std::map to be used in a vertex buffer. I don't get the correct index buffer(myInds) or vertex buffer(myVerts). I can get the index array from FBX but it differs from what I get in the following std::map code. Any help is much appreciated.
      struct FBXVTX { XMFLOAT3 Position; XMFLOAT2 TextureCoord; XMFLOAT3 Normal; }; std::map< FBXVTX, int > myVertsMap; std::vector<FBXVTX> myVerts; std::vector<int> myInds; HRESULT FBXLoader::Open(HWND hWnd, char* Filename, bool UsePositionOnly) { HRESULT hr = S_OK; if (FBXM) { FBXIOS = FbxIOSettings::Create(FBXM, IOSROOT); FBXM->SetIOSettings(FBXIOS); FBXI = FbxImporter::Create(FBXM, ""); if (!(FBXI->Initialize(Filename, -1, FBXIOS))) { hr = E_FAIL; MessageBox(hWnd, (wchar_t*)FBXI->GetStatus().GetErrorString(), TEXT("ALM"), MB_OK); } FBXS = FbxScene::Create(FBXM, "REALMS"); if (!FBXS) { hr = E_FAIL; MessageBox(hWnd, TEXT("Failed to create the scene"), TEXT("ALM"), MB_OK); } if (!(FBXI->Import(FBXS))) { hr = E_FAIL; MessageBox(hWnd, TEXT("Failed to import fbx file content into the scene"), TEXT("ALM"), MB_OK); } FbxAxisSystem OurAxisSystem = FbxAxisSystem::DirectX; FbxAxisSystem SceneAxisSystem = FBXS->GetGlobalSettings().GetAxisSystem(); if(SceneAxisSystem != OurAxisSystem) { FbxAxisSystem::DirectX.ConvertScene(FBXS); } FbxSystemUnit SceneSystemUnit = FBXS->GetGlobalSettings().GetSystemUnit(); if( SceneSystemUnit.GetScaleFactor() != 1.0 ) { FbxSystemUnit::cm.ConvertScene( FBXS ); } if (FBXI) FBXI->Destroy(); FbxNode* MainNode = FBXS->GetRootNode(); int NumKids = MainNode->GetChildCount(); FbxNode* ChildNode = NULL; for (int i=0; i<NumKids; i++) { ChildNode = MainNode->GetChild(i); FbxNodeAttribute* NodeAttribute = ChildNode->GetNodeAttribute(); if (NodeAttribute->GetAttributeType() == FbxNodeAttribute::eMesh) { FbxMesh* Mesh = ChildNode->GetMesh(); if (UsePositionOnly) { NumVertices = Mesh->GetControlPointsCount();//number of vertices MyV = new XMFLOAT3[NumVertices]; for (DWORD j = 0; j < NumVertices; j++) { FbxVector4 Vertex = Mesh->GetControlPointAt(j);//Gets the control point at the specified index. MyV[j] = XMFLOAT3((float)Vertex.mData[0], (float)Vertex.mData[1], (float)Vertex.mData[2]); } NumIndices = Mesh->GetPolygonVertexCount();//number of indices MyI = (DWORD*)Mesh->GetPolygonVertices();//index array } else { FbxLayerElementArrayTemplate<FbxVector2>* uvVertices = NULL; Mesh->GetTextureUV(&uvVertices); int idx = 0; for (int i = 0; i < Mesh->GetPolygonCount(); i++)//polygon(=mostly triangle) count { for (int j = 0; j < Mesh->GetPolygonSize(i); j++)//retrieves number of vertices in a polygon { FBXVTX myVert; int p_index = 3*i+j; int t_index = Mesh->GetTextureUVIndex(i, j); FbxVector4 Vertex = Mesh->GetControlPointAt(p_index);//Gets the control point at the specified index. myVert.Position = XMFLOAT3((float)Vertex.mData[0], (float)Vertex.mData[1], (float)Vertex.mData[2]); FbxVector4 Normal; Mesh->GetPolygonVertexNormal(i, j, Normal); myVert.Normal = XMFLOAT3((float)Normal.mData[0], (float)Normal.mData[1], (float)Normal.mData[2]); FbxVector2 uv = uvVertices->GetAt(t_index); myVert.TextureCoord = XMFLOAT2((float)uv.mData[0], (float)uv.mData[1]); if ( myVertsMap.find( myVert ) != myVertsMap.end() ) myInds.push_back( myVertsMap[ myVert ]); else { myVertsMap.insert( std::pair<FBXVTX, int> (myVert, idx ) ); myVerts.push_back(myVert); myInds.push_back(idx); idx++; } } } } } } } else { hr = E_FAIL; MessageBox(hWnd, TEXT("Failed to create the FBX Manager"), TEXT("ALM"), MB_OK); } return hr; } bool operator < ( const FBXVTX &lValue, const FBXVTX &rValue) { if (lValue.Position.x != rValue.Position.x) return(lValue.Position.x < rValue.Position.x); if (lValue.Position.y != rValue.Position.y) return(lValue.Position.y < rValue.Position.y); if (lValue.Position.z != rValue.Position.z) return(lValue.Position.z < rValue.Position.z); if (lValue.TextureCoord.x != rValue.TextureCoord.x) return(lValue.TextureCoord.x < rValue.TextureCoord.x); if (lValue.TextureCoord.y != rValue.TextureCoord.y) return(lValue.TextureCoord.y < rValue.TextureCoord.y); if (lValue.Normal.x != rValue.Normal.x) return(lValue.Normal.x < rValue.Normal.x); if (lValue.Normal.y != rValue.Normal.y) return(lValue.Normal.y < rValue.Normal.y); return(lValue.Normal.z < rValue.Normal.z); }  
    • By Descent
      Wow what a wild game by GalaXa Games Entertainment Interactive. Play now... it's really fun but IF you have epilepsy then don't play. It does not feature flashing pictures, but there is lots of animated stuff that might get ya. Anyway, 4 levels, 2 endings, insane action, BY INFERNAL. Please play it, right nao! Also , nice midi music composed by me is in the game.
       
      demons.rar
    • By Stewie.G
      Hi,
       
      I've been trying to implement a basic gaussian blur using the gaussian formula, and here is what it looks like so far:
      float gaussian(float x, float sigma)
      {
          float pi = 3.14159;
          float sigma_square = sigma * sigma;
          float a = 1 / sqrt(2 * pi*sigma_square);
          float b = exp(-((x*x) / (2 * sigma_square)));
          return a * b;
      }
      My problem is that I don't quite know what sigma should be.
      It seems that if I provide a random value for sigma, weights in my kernel won't add up to 1.
      So I ended up calling my gaussian function with sigma == 1, which gives me weights adding up to 1, but also a very subtle blur.
      Here is what my kernel looks like with sigma == 1
              [0]    0.0033238872995488885    
              [1]    0.023804742479357766    
              [2]    0.09713820127276819    
              [3]    0.22585307043511713    
              [4]    0.29920669915475656    
              [5]    0.22585307043511713    
              [6]    0.09713820127276819    
              [7]    0.023804742479357766    
              [8]    0.0033238872995488885    
       
      I would have liked it to be more "rounded" at the top, or a better spread instead of wasting [0], [1], [2] with values bellow 0.1.
      Based on my experiments, the key to this is to provide a different sigma, but if I do, my kernel values no longer adds up to 1, which results to a darker blur.
      I've found this post 
      ... which helped me a bit, but I am really confused with this the part where he divide sigma by 3.
      Can someone please explain how sigma works? How is it related to my kernel size, how can I balance my weights with different sigmas, ect...
       
      Thanks :-)
    • By MonterMan
      Hi all. I have been looking for a real-time global illumination algorithm to use in my game. I've found voxel cone tracing and I'm debating whether or not it's an algorithm worth investing my time researching and implementing. I have this doubt due to the following reasons:
      . I see a lot of people say it's really hard to implement.
      . Apparently this algorithm requires some Nvidia extension to work efficiently according to the original paper (I highly doubt it though)
      . Barely real-time performance, meaning it's too slow to be implemented in a game 
       
      So in order to determine if I should invest time in voxel cone tracing, I want to ask the following questions:
      . Is the algorithm itself flexible enough so that I can increase the performance by tweaking it (probably lowering the GI quality at the same time, but I don't care)
      . Can I implement it without any driver requirement or special extensions, like the paper claims?
  • Advertisement