Feedback on this Threading architecture

11 comments, last by frob 8 years, 6 months ago

I've been racking my brain over implementing a threading architecture for my game engine, and research hasn't done much good. It's really going to be needed pretty soon, but I'm curious how exactly I should approach this. I thought about just diving on in... but I realize that implementing something like this without a good design first is just a recipe for frustration and headaches. Here is what I am currently thinking of. And pardon the crappy artwork; MS Paint is not very forgiving at 1:30 in the morning.

To help add some context: the engine is being optimized for Diablo and Baldur's Gate style games.

[Diagram: threading architecture sketch, reproduced in text below]

There is only one Lua VM, and it's driven on the main thread. Most of the loop code is handled here, as is the game code.

My system will be using a sort of bastardized version of ECS, implemented in Lua.


The game does all pre-updates first (updates where we have requested a raycast, user input, etc.), then logic updates, animations, and physics, and finally multithreaded rendering.

This might seem like a bit of a naive approach. The center is the main thread. The branches are dispatches to worker threads.

Main thread:

PreUpdateJobs -> PreUpdate -> UpdateLogic -> DispatchAnimationSystem -> Animation -> PhysicsDispatch -> Physics -> DeferredContextJobs -> Render

Worker threads:

PreUpdate jobs fan out around the PreUpdate step, Animation jobs around the Animation step, and Physics jobs around the Physics step, each joining back into the main thread before the next stage begins.

Side channel:

Dispatch immediate data to other workers: streaming sounds, loading level data.
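In code, the flow above boils down to something like this (a minimal sketch using std::async as a stand-in for a real job system; the job functions are placeholders):

#include <future>
#include <vector>

void pre_update_job(int) { /* resolve raycast requests, poll input, ... */ }
void animation_job(int)  { /* advance skeletons, blend poses, ... */ }
void physics_job(int)    { /* integrate bodies, solve contacts, ... */ }

// Dispatch n copies of a job to worker threads, then join before the
// next stage; each join is the barrier between phases.
template <typename F>
void dispatch(F job, int n) {
    std::vector<std::future<void>> tasks;
    for (int i = 0; i < n; ++i)
        tasks.push_back(std::async(std::launch::async, job, i));
    for (auto& t : tasks) t.get();
}

void run_frame() {
    dispatch(pre_update_job, 4);  // PreUpdateJobs -> PreUpdate
    // UpdateLogic: single-threaded Lua, stays on the main thread
    dispatch(animation_job, 4);   // DispatchAnimationSystem -> Animation
    dispatch(physics_job, 4);     // PhysicsDispatch -> Physics
    // DeferredContextJobs -> Render
}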

As you can see... UpdateLogic will not have any threaded jobs. I really couldn't be bothered to work out how to thread entities when they constantly ping each other for information. But just about everything else is threaded.

Any suggestions to this? A better architecture?

I am also really curious about the deferred context in DirectX 11. There is very little guidance on how to use it effectively in Microsoft's documentation, and just about everyone else who uses it doesn't really elaborate on what they did with the renderer to make it effective. Does it need to be on a separate thread to be effective?


Can your renderer exist on a separate thread, so that thread only has to deal with rendering transforms and feeding the GPU, whilst you update the game on the other thread?

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

Maybe? So I use two solid threads: one for the main update, one for rendering, and any others for jobs and DirectX's deferred contexts? It won't be easy... most of my data exists in Lua, but the scene graph and rendering systems are implemented in C++.

One common technique is to "double-buffer" your data. Basically, you determine which data your renderer needs, make copies, and hand all the copies off to the renderer. Then the renderer does its thing with the copies while the rest of the game works on the originals in parallel.

This can even be extended to pretty much any other system. For example, AI can easily use the position copies made for rendering to make all of its pathfinding decisions for multiple objects in parallel.

Of course, for the above to all work, the copies must be const.

Then, once your frame is done, you take the results of all the work the other subsystems did, duplicate them, and hand them off to the renderer (and other systems) for the next frame.

Conceptually, you're never mutating data, you're just transforming it from state A (previous frame) to state B (new frame). And since no one writes to state A and no one reads from state B you don't have any threading issues or even any locks.

In practice, it may be infeasible to basically consume twice the amount of memory for everything, and it does introduce a frame of lag to user input, but you can mix and match this technique with shared and mutating data (with appropriate locks) to mitigate these issues.
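A minimal sketch of that idea (the snapshot struct and names here are illustrative, and a real engine would guard the flip with a sync point between the two threads):

#include <vector>

struct Transform { float x, y, z; };

// Everything the renderer needs for one frame, copied out of the sim.
struct FrameState {
    std::vector<Transform> transforms;
};

FrameState states[2];
int writeIndex = 0;  // simulation mutates states[writeIndex] (state B);
                     // renderer reads states[1 - writeIndex] (state A)

void simulate(float dt) {
    for (auto& t : states[writeIndex].transforms)
        t.x += dt;  // mutate the new frame only
}

void render() {
    const FrameState& snapshot = states[1 - writeIndex];  // const: read-only
    for (const auto& t : snapshot.transforms) {
        // submit draw calls from the read-only snapshot
        (void)t;
    }
}

void end_of_frame() {
    // Publish the new frame and seed the next one. Since no one writes
    // to state A and no one reads from state B, no locks are needed
    // during the frame itself, only around this flip.
    writeIndex = 1 - writeIndex;
    states[writeIndex] = states[1 - writeIndex];
}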

Alright, so let me see if I understand then.

The renderer realistically only needs to know of...

Entity and prop positions
Entity animations
Shaders
Textures

I would need to have three copies of data: Lua's data (which is where ALL game details are applied: ECS, game logic, positional data, material information, etc. C does not have direct access to anything inside of Lua), then the double buffer?

Why not just submit one buffer and let the renderer use that? If we are just interpolating, then the renderer can continue to work with what it has. When Lua is done with its simulation, delete that and replace it with new data?


I would need to have three copies of data: Lua's data (...), then the double buffer?

Why not just submit one buffer and let the renderer use that? If we are just interpolating, then the renderer can continue to work with what it has. When Lua is done with its simulation, delete that and replace it with new data?

Because it doesn't let you decouple your systems.

Any suggestions to this? A better architecture?

Fix your timestep, and decouple your renderer from your simulation.

Rendering and simulating should be completely separate actions. On smaller hobby games it usually doesn't matter much, but that changes as systems grow. Simulation should step at fixed, regular intervals to prevent an enormous number of bugs related to fast and slow simulation steps. Advance the simulator when you are ready; advance multiple times if you need to.

Bigger engines tend to render "as fast as possible", and tend to interpolate between the previous simulation state's time and the current simulation state's time, advancing through one simulator step in the past.
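In loop form it looks roughly like this (a minimal sketch in the spirit of the well-known "Fix your Timestep" article; the State struct and integration step are placeholders):

#include <chrono>

struct State { float x = 0; };

State previous, current;
const double dt = 1.0 / 60.0;  // the simulation only ever advances by this

void game_loop() {
    using Clock = std::chrono::steady_clock;
    double accumulator = 0.0;
    auto last = Clock::now();

    for (;;) {
        auto now = Clock::now();
        accumulator += std::chrono::duration<double>(now - last).count();
        last = now;

        // Advance the simulation in whole fixed steps, as many as needed.
        while (accumulator >= dt) {
            previous = current;
            current.x += static_cast<float>(dt);  // integrate(current, dt)
            accumulator -= dt;
        }

        // Render as fast as possible, blending one step in the past.
        float alpha = static_cast<float>(accumulator / dt);
        float renderX = previous.x * (1 - alpha) + current.x * alpha;
        (void)renderX;  // draw(renderX)
    }
}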

The buffered versions of the data give you the two snapshots for advancing the renderer. It tends to be a little more complex because you are interpolating through animations and other systems, and you'll want to catch some of the nuance that animators create.

For smaller systems and hobby games that much effort is likely too costly; you're spending your valuable time writing more advanced engine technology rather than creating your game. For a large group where several engine developers work alongside the gameplay dev team, it is a great system to build.

There are many great benefits to decoupling the two. Some of the more utilitarian ones are that you avoid bugs that come from variable timesteps. Probably the coolest is that you can run multiple headless simulations: tell a single computer to launch 20 or 50 or 100 simulations and have the AI play against itself. If a problem is detected, attach a graphical renderer and log everything. Since automated tests are relatively rare in this industry, it serves as a nice way to exercise the system and detect hard-to-catch issues in both the simulator and the AI.

It's also pretty beneficial to split your main game logic into its own thread. The "main" thread (the one started by the OS) is the only one that can access the message pump on most OSes. It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX. When your app starts, spawn a new thread for all your gameplay logic, and keep only the core message pump and render command submission on the original main thread. Then spawn additional worker threads equal to NumCores-2.

So you'd have these threads:
Main/Render
Logic <-- Lua lives here
Worker0
...
WorkerN
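A minimal sketch of spawning that layout (the thread entry points are placeholders):

#include <algorithm>
#include <thread>
#include <vector>

void logic_main()  { /* Lua VM and all gameplay logic live here */ }
void worker_main() { /* pull jobs off a shared queue until shutdown */ }

int main() {
    std::thread logic(logic_main);            // the Logic thread

    // hardware_concurrency() can return 0, so clamp to at least 1 worker.
    unsigned cores = std::max(3u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < cores - 2; ++i)  // NumCores - 2 workers
        workers.emplace_back(worker_main);

    // The OS-created main thread stays here: message pump plus render
    // command submission (including Present).
    // pump_messages_and_render();

    logic.join();
    for (auto& w : workers) w.join();
}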

Sean Middleditch – Game Systems Engineer – Join my team!


It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX

I am super curious about this statement. As far as I'm aware, the only limitation on Present() is that it needs to be called on the ImmediateContext. Maybe I missed something, and you can fill in the gap?

My experience has been to spin up a thread solely for CommandList execution by the immediate context (IC), and that thread is the only thing that touches the IC. You can legally touch the IC across threads as long as you follow basic thread safety concerns, but I've never found it to be worthwhile to do so. So, basically I'm saying exactly what Sean is saying, but I separate Main and Render into 2 threads and reduce the worker thread count by an additional 1.
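The shape of it, with the actual D3D11 calls (error handling and the real draw recording omitted):

#include <d3d11.h>

// Worker thread: record state and draws into a deferred context, then
// bake them into a command list. In practice you'd create one deferred
// context per worker up front and reuse it every frame.
ID3D11CommandList* record_scene_chunk(ID3D11Device* device) {
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... set pipeline state and issue draws on `deferred`, exactly as
    // you would on the immediate context ...

    ID3D11CommandList* list = nullptr;
    deferred->FinishCommandList(FALSE, &list);
    deferred->Release();
    return list;
}

// Render thread: the only code that ever touches the immediate context.
void submit(ID3D11DeviceContext* immediate, ID3D11CommandList* list) {
    immediate->ExecuteCommandList(list, FALSE);
    list->Release();
}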


It's also the only one that can (safely) submit certain kinds of rendering commands, like the Present() call in DX

I am super curious about this statement. As far as I'm aware, the only limitation on Present() is that it needs to be called on the ImmediateContext. Maybe I missed something, and you can fill in the gap?

No, in DX Present() has to be called on the main thread; the reason is in the remarks of the Present documentation here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb174576%28v=vs.85%29.aspx It comes down to the fact that the Present call may have to wait on the message pump to do its work, so having them on the same thread avoids unexpected stalls and waiting threads.
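Which is why the usual pattern is a non-blocking pump and Present on the same thread (a minimal Win32 sketch for a D3D11/DXGI swap chain; the queued render work is elided):

#include <windows.h>
#include <dxgi.h>

// Keep the pump and Present() together so Present can never be stuck
// waiting on a message pump that lives on another thread.
void main_thread_loop(IDXGISwapChain* swapChain) {
    MSG msg = {};
    bool running = true;
    while (running) {
        // Drain pending window messages without blocking.
        while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT) running = false;
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        // Execute queued rendering work here, then flip.
        swapChain->Present(1, 0);  // sync interval 1 (vsync), no flags
    }
}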

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

That page links to the multithreading considerations here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb205075%28v=vs.85%29.aspx#Multithread_Considerations

It seems you could Present on another thread, if you are careful and the message-pump thread never waits on anything (no critical sections)?

