Feedback on this Threading architecture


I've been racking my brain on implementing this threading architecture for my game engine. Research hasn't done much good, and it's really going to be needed pretty soon. But I am curious how exactly I should approach this. I thought about just diving on in... but I realize that implementing something like this without a good design first is just a cause of frustration and headaches. Here is what I am currently thinking of. And pardon the crappy artwork; MS Paint is not very good at 1:30 in the morning.

 

To help add some context: the engine is being optimized for Diablo and Baldur's Gate style games.

 

[Image: LN23czt.png — threading architecture diagram]

 

There is only one Lua VM, and it's driven on the main thread. Most of the loop code is handled here, as well as the game code.

 

My system will be using a sort of bastardized version of ECS (entity-component-system), implemented in Lua.


The game does all pre-updates first (updates where we have requested a raycast, user input, etc.), then logic updates, animations, physics, then multithreaded rendering.
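That phase order can be sketched as a plain sequential frame loop. Everything here is a placeholder to show the ordering, not the engine's actual API:

```cpp
#include <string>
#include <vector>

std::vector<std::string> frameLog;  // records which stage ran, in order

// Placeholder stages matching the order described above.
void preUpdate()  { frameLog.push_back("pre");     } // raycast results, user input
void logic()      { frameLog.push_back("logic");   } // game/ECS updates in Lua
void animation()  { frameLog.push_back("anim");    }
void physics()    { frameLog.push_back("physics"); }
void render()     { frameLog.push_back("render");  } // multithreaded in the real thing

// One frame: each stage fully completes before the next begins.
void frame() { preUpdate(); logic(); animation(); physics(); render(); }
```

The point of the sketch is only that each stage acts as a barrier: nothing from a later stage starts until the earlier one (and its worker jobs) has finished.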

 

This might seem like a bit of a naive approach. The center is the main thread. The branches are dispatches to worker threads.

 

Main thread:  PreUpdate -> UpdateLogic -> DispatchAnimationSystem -> Animation -> PhysicsDispatch -> Physics -> DeferredContextJobs -> Render

Fork stages:  PreUpdate, Animation, and Physics each fan their work out to worker jobs (PreUpdateJobs, animation jobs, physics jobs) and join before the main thread moves on to the next stage.

The PreUpdateJobs also dispatch immediate data to other workers: streaming sounds, loading level data.

 

As you can see, UpdateLogic will not have any threaded jobs. I really couldn't work out how to thread entities when they constantly ping each other for information. But just about everything else is threaded.
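The fork/join pattern in the diagram — a stage fans out to worker jobs, then joins before the next stage runs — might look something like this sketch. It uses std::async for brevity; a real engine would use a persistent job system, and all the names here are made up:

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Hypothetical per-slice animation job: advance a range of entities.
int animateSlice(int first, int count) {
    // ...pose blending for entities [first, first+count) would go here...
    return count;  // number of entities processed
}

// Fork one stage into worker jobs, then join before the next stage,
// mirroring the PreUpdate/Animation/Physics fan-out above.
int runAnimationStage(int entityCount, int workers) {
    std::vector<std::future<int>> jobs;
    int slice = (entityCount + workers - 1) / workers;  // ceil division
    for (int i = 0; i < entityCount; i += slice)
        jobs.push_back(std::async(std::launch::async, animateSlice,
                                  i, std::min(slice, entityCount - i)));
    int done = 0;
    for (auto& j : jobs) done += j.get();  // join: barrier before next stage
    return done;
}
```

The `j.get()` loop is the join point; only after every job reports back does the main thread dispatch the next stage.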

 

Any suggestions to this? A better architecture?

 

I am also really curious about the deferred context in DirectX 11. There is very little guidance on how to use it effectively in Microsoft's documentation, and just about everyone else who uses it does not really elaborate on what they did with the renderer to make it effective. Does it need to be on a separate thread for it to be effective?


Maybe? So I use two solid threads: one for the main update, one for rendering, and any others for jobs and DirectX's deferred contexts? It won't be easy... most of my data exists in Lua, but the scene graph and rendering systems are implemented in C++.

One common technique is to "double-buffer" your data. Basically, you determine which data your renderer needs, make copies, and hand all the copies off to the renderer. Then the renderer does its thing with the copies while the rest of the game works on the originals in parallel.

This can even be extended to pretty much any other system. For example AI can easily use the position copies made for rendering to make all of its decisions for pathfinding and such for multiple objects in parallel.

Of course, for the above to all work, the copies must be const.

Then, once your frame is done, you take the results of all the work the other subsystems did, duplicate them, and hand them off to the renderer (and other systems) for the next frame.

Conceptually, you're never mutating data, you're just transforming it from state A (previous frame) to state B (new frame). And since no one writes to state A and no one reads from state B you don't have any threading issues or even any locks.

In practice, it may be infeasible to basically consume twice the amount of memory for everything, and it does introduce a frame of lag to user input, but you can mix and match this technique with shared and mutating data (with appropriate locks) to mitigate these issues.
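A minimal sketch of that double-buffer idea, with invented types and names: the simulation writes the back snapshot, the renderer only ever reads the const front one, and an end-of-frame flip swaps the roles.

```cpp
#include <array>

// Hypothetical render-relevant state copied out each frame.
struct Snapshot { float x = 0, y = 0; };

// Two buffers: the simulation writes back(), the renderer reads read().
// No locks are needed because no one writes state A and no one reads state B.
class DoubleBuffer {
    std::array<Snapshot, 2> buf;
    int front = 0;
public:
    Snapshot&       back()       { return buf[1 - front]; } // sim writes here
    const Snapshot& read() const { return buf[front]; }     // renderer reads here
    void flip()                  { front = 1 - front; }     // end of frame
};
```

The flip is the only point of contact between the two sides, which is what makes the scheme lock-free in the common case; in a real engine you'd synchronize just that swap.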


Alright, so let me see if I understand then.

 

The renderer realistically only needs to know of:

Entity and prop positions
Entity animations

Shaders
Textures

I would need to have three copies of data: Lua's data (which is where ALL game details are applied: ECS, game logic, positional data, material information, etc. C does not have direct access to anything inside of Lua), then the double buffer?

 

Why not just submit one buffer and let the renderer use that? If we are just interpolating, then the renderer can continue to work with what it has. When Lua is done with its simulation, delete that and replace it with new data?


I would need to have three copies of data: Lua's data (...), then the double buffer?
 
Why not just submit one buffer and let the renderer use that? If we are just interpolating, then the renderer can continue to work with what it has. When Lua is done with its simulation, delete that and replace it with new data?

 

Because it doesn't let you decouple your systems.

 

 

 

Any suggestions to this? A better architecture?

 

Fix your timestep, and decouple your renderer from your simulation.

 

Rendering and simulating should be completely separate actions.  On smaller hobby games it usually doesn't matter much, but as systems grow this changes.  Simulation should be fixed at regular intervals to prevent an enormous number of bugs related to fast and slow simulation steps. Advance the simulator when you are ready; advance multiple times if you need to.

 

Bigger engines tend to render "as fast as possible", and tend to interpolate between the previous simulation state's time and the current simulation state's time, advancing through one simulator step in the past.  

 

The buffered versions of the data give you the two snapshots for advancing the renderer. It tends to be a little more complex because you are interpolating through animations and other systems, and you'll want to catch some of the nuance that animators create.
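The fixed-step-plus-interpolation loop described above is the classic accumulator pattern. A sketch, with a toy simulation and hypothetical names:

```cpp
const double DT = 1.0 / 60.0;  // fixed simulation step

// Toy simulation keeping the two snapshots the renderer interpolates between.
struct Sim {
    double prev = 0, curr = 0;
    void step() { prev = curr; curr += 1; }  // stand-in for real physics/logic
};

// Advance the simulation in fixed steps; whatever frame time is left over
// becomes the interpolation factor alpha for the renderer.
double pump(Sim& sim, double frameTime, double& accumulator) {
    accumulator += frameTime;
    while (accumulator >= DT) {  // may step multiple times on slow frames
        sim.step();
        accumulator -= DT;
    }
    return accumulator / DT;  // alpha in [0,1): how far between prev and curr
}
```

The renderer then draws `prev + (curr - prev) * alpha`, which is exactly the "one simulator step in the past" interpolation mentioned above.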

 

 

For smaller systems and hobby games that much effort is likely not worth it; you're spending your valuable time writing more advanced engine technology rather than creating your game.  For a large group where several engine developers work alongside the gameplay dev team, it is a great system to build.

 

There are many great benefits to decoupling the two.  Some of the more utilitarian are that you avoid bugs that come from variable timesteps. Probably the coolest is that you can run multiple headless simulations.  Just tell a single computer to launch 20 or 50 or 100 simulations, and have the AI play against itself. If a problem is detected, attach a graphical renderer and log everything. Since automated tests are relatively rare in this industry, it serves as a nice way to exercise the system and detect hard-to-catch issues both in the simulator and in the AI.

It's also pretty beneficial to split your main game logic into its own thread. The "main" thread (the one started by the OS) is the only one that can access the message pump on most OSes. It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX. When your app starts, spawn a new thread for all your gameplay logic, and keep only the core message pump and render command submission on the original main thread. Then spawn additional worker threads equal to NumCores-2.

So you'd have these threads:
Main/Render
Logic <-- Lua lives here
Worker0
...
WorkerN
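That layout might be spawned like the following sketch, assuming std::thread; the thread bodies are placeholders:

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> started{0};  // just to observe that the threads ran

void logicThread()  { ++started; /* the Lua VM and gameplay would live here */ }
void workerThread() { ++started; /* would pull jobs from a shared queue */ }

// The calling (main) thread keeps the message pump and render submission;
// everything else is spawned: one logic thread plus NumCores-2 workers.
int spawnThreads(unsigned cores) {
    std::vector<std::thread> pool;
    pool.emplace_back(logicThread);                // gameplay/Lua thread
    unsigned workers = cores > 2 ? cores - 2 : 0;  // NumCores - 2
    for (unsigned i = 0; i < workers; ++i)
        pool.emplace_back(workerThread);
    for (auto& t : pool) t.join();  // a real engine would keep these running
    return static_cast<int>(pool.size());
}
```

In a real engine the threads would loop for the program's lifetime rather than join immediately; the join here just makes the sketch self-contained.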



It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX

 

I am super curious about this statement.  As far as I'm aware, the only limitation on Present() is that it needs to be called on the ImmediateContext.  Maybe I missed something, and you can fill in the gap?

 

My experience has been to spin up a thread solely for CommandList execution by the immediate context (IC), and that thread is the only thing that touches the IC.  You can legally touch the IC across threads as long as you follow basic thread-safety concerns, but I've never found it worthwhile to do so.  So basically I'm saying exactly what Sean is saying, but I separate Main and Render into two threads and reduce the worker thread count by an additional one.
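The record-on-workers, execute-on-one-thread pattern can be illustrated without the D3D11 API at all. In this portable stand-in (not real D3D11 code), a "command list" is just a list of closures that the thread owning the "immediate context" replays in order, the way deferred-context command lists are replayed by the IC:

```cpp
#include <functional>
#include <vector>

// Stand-in for a recorded GPU command: mutates the "device" state.
using Command     = std::function<void(std::vector<int>&)>;
using CommandList = std::vector<Command>;

// Recording: done on a worker thread against a "deferred context".
// Nothing touches the device here; we only queue up work.
CommandList record(int value) {
    CommandList cl;
    cl.push_back([value](std::vector<int>& device) { device.push_back(value); });
    return cl;
}

// Execution: done only on the single thread that owns the immediate context,
// so the device is never touched concurrently.
void execute(const CommandList& cl, std::vector<int>& device) {
    for (auto& cmd : cl) cmd(device);
}
```

The D3D11 analogues would be recording on an `ID3D11DeviceContext` created with `CreateDeferredContext`, calling `FinishCommandList`, and replaying with `ExecuteCommandList` on the immediate context; the structure of who-touches-what is the same.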


 


It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX

 

I am super curious about this statement.  As far as I'm aware, the only limitation on Present() is that it needs to be called on the ImmediateContext.  Maybe I missed something, and you can fill in the gap?

 

My experience has been to spin up a thread solely for CommandList execution by the IC, and that thread is the only thing that touches the IC.  You can legally touch the IC across threads as long as you follow basic thread safety concerns, but I've never found it to be worthwhile to do so.  So, basically I'm saying exactly what Sean is saying, but I separate Main and Render into 2 threads and reduce worker thread count by an additional 1.

 

No, in DX, Present() has to be called on the main thread. The reason is in the remarks of the Present documentation here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb174576%28v=vs.85%29.aspx It comes down to the fact that the Present call may have to wait on the message pump to do its work, so having them on the same thread solves the problems of unexpected stalls and waiting threads.


There is a link on that part of the page pointing here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb205075%28v=vs.85%29.aspx#Multithread_Considerations

It seems you could present on another thread, if you are careful and the message thread never waits on anything (no critical sections)?


Whoa... hold on. This is going over my head.

 

 

It's also pretty beneficial to split your main game logic into its own thread. The "main" thread (the one started by the OS) is the only one that can access the message pump on most OSes. It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX. When your app starts, spawn a new thread for all your gameplay logic, and keep only the core message pump and render command submission on the original main thread. Then spawn additional worker threads equal to NumCores-2.

So you'd have these threads:
Main/Render
Logic <-- Lua lives here
Worker0
...
WorkerN

 

First, why am I moving game logic to the second thread and the renderer to the main thread? The main thread is always the entry point, and it's usually the most populated thread on Windows. Throwing a renderer there seems like it'd be counterproductive. Not to mention that this thread is also where most of your input is usually read, and clogging that thread with game logic could be horribly problematic.

 

 

 


I would need to have three copies of data: Lua's data (...), then the double buffer?
 
Why not just submit one buffer, and let the renderer use that. If we are just interpolating, then the renderer can continue to work with what it has. When lua is done with it's simulation, delete that, and replace it with new data?

 

Because it doesn't let you decouple your systems.

 

 

 

Any suggestions to this? A better architecture?

 

Fix your timestep, and decouple your renderer from your simulation.

 

Rendering and simulating should be completely separate actions.  On smaller hobby games it usually doesn't matter much, but as systems grow this changes.  Simulation should be fixed at regular intervals to prevent an enormous number of bugs related to fast and slow simulation steps. Advance the simulator when you are ready; advance multiple times if you need to.

 

Bigger engines tend to render "as fast as possible", and tend to interpolate between the previous simulation state's time and the current simulation state's time, advancing through one simulator step in the past.  

 

The buffered versions of the data give you the two snapshots for advancing the renderer. It tends to be a little more complex because you are interpolating through animations and other systems, and you'll want to catch some of the nuance that animators create.

 

 

For smaller systems and hobby games that much effort is likely not worth it; you're spending your valuable time writing more advanced engine technology rather than creating your game.  For a large group where several engine developers work alongside the gameplay dev team, it is a great system to build.

 

There are many great benefits of decoupling the two.  Some of the more utilitarian are that you avoid bugs that come from variable timesteps. Probably the coolest is that you can run multiple headless simulations.  Just tell a single computer to launch 20 or 50 or 100 simulations, and have the AI play against itself. If a problem is detected, attach a graphics rendering and log everything. Since automated tests are relatively rare in this industry, it serves as a nice way to exercise the system and detect hard-to-catch issues both in the simulator and in the AI.

 

 

I just realized I did something horribly stupid in the design after seeing this. I have two sets of coordinate data, and I am surprised it hadn't caused any errors: the ones in Lua, and the natively coded scene graph, which probably was not a good idea. I'll have to remove that bit then. Which brings us back to a swap buffer. Sounds more plausible now.

 

But wouldn't fixing your timestep on your simulation logic cause problems as well? Most processors are fast enough for simple systems. But if the world is highly player-controlled, wouldn't a massive simulation forever cause you a horrible delay? Lots of physics objects, lots of enemies getting aggro.

 

And lastly, the deferred context. If the render thread is constantly polling, wouldn't this cause issues with the job system, delaying updates from the logic thread? Or is it just better to wait and see if I need a deferred context?

 

EDIT: Never mind, I decided against the deferred context.


 

But, wouldn't fixing your time step on your simulation logic cause problems as well? Most processors are fast enough for simple systems. But if the world is highly player controlled, then wouldn't a massive simulation forever cause you a horrible delay? Lots of physics objects, lots of enemies getting aggro.

Nope. It decouples the rendering and update logic and makes all updates independent of the processing capabilities of the CPU.

 

You update objects based on a delta time (the time between updates), and all the loop does is ensure that you are updating the data at the correct rate.

 

HTH

