Multithreaded Engine Design?

Hiya. After looking at the feature lists for some high-end commercial game engines (Source, id Tech 4, CryEngine), I've noticed that most of them run their major subsystems on different threads. This has made me wonder how you would go about doing this. I was under the impression that it's generally better to design an engine as a collection of high-level components (e.g. AudioSystem, SceneGraph) which can be used by themselves if need be, and that an all-encompassing 'Engine' class is too restrictive. But how can the engine designers run different components on different threads, if they don't have this 'Engine' class to control them? I'm sure the engines above are very complex and don't easily fit into either of these designs, but could anyone give me some information? Any info would be much appreciated [smile]

What's stopping you from creating completely separate AudioSystem and SceneGraph engines, then creating a higher-level Engine to contain them all?

For what it's worth, it's rarely ever advantageous to use multiple threads, barring networking and timer related tasks.

Quote:
Original post by Kest
For what it's worth, it's rarely ever advantageous to use multiple threads, barring networking and timer related tasks.


False. The Xbox 360 has 3 hyper-threaded cores, giving you access to 6 hardware threads. The PS3 has (IIRC) one hyper-threaded core and all those SPUs to work with. I don't even think you can buy a non-hyper-threaded single core PC any more, and dual and quad cores are becoming increasingly common, especially for gaming rigs.

By not taking advantage of threading, you are wasting vast amounts of CPU on modern systems, and if you're making a AAA quality game, you really can't afford that waste. In fact, I would say it's rarely ever advantageous not to use multiple threads any more (although, if you are not trying to make a AAA quality game, you may well be able to get away with a single-threaded architecture).

Quote:
Original post by beebs1
But how can the engine designers run different components on different threads, if they don't have this 'Engine' class to control them?

I'm not sure I see why this would be different. Obviously, somewhere you have some logic that controls, say, the audio component (initializes it, sends messages to it, tracks its lifetime). Why would it make a difference whether the audio component runs in its own thread or not?

That said, don't assume that commercial engines are particularly good examples of "good design".
Some of them might be, some of them are definitely not.

Sooner or later, you will have to deal with the multi-core trend, particularly because the individual cores are actually going to become slower (the experimental 80 core Intel for example runs each core at 400 MHz [that is MEGAhertz not GIGAhertz]). So unless you want to waste a ton of CPU power, your application had better make use of concurrency.

The problem with concurrency, however, is that it's complex. Concurrent programming is fundamentally different from non-concurrent programming... even more so than most programmers generally acknowledge. It's not so much about avoiding race conditions and deadlocks; it's about scaling and about not killing the benefits of concurrency with unnecessary locking. If you write your engine so that your core subsystems run on separate threads, your app may scale to four cores, but what happens if next year Intel releases a 16 core or a 32 core CPU? And what if those cores no longer run at 3 GHz but at just 1 GHz? Although that new CPU technically has more processing power than, say, a 3 GHz quad core, your app will actually perform worse (this is not at all an unrealistic scenario if you believe hardware makers).

I think the only way to write a multi-threaded engine that makes good use of the available processing power and scales to basically any number of cores with a near-linear performance gain is to have a job system. The way this works is to have a pool of threads (the number of threads should correspond to the number of cores) that are given jobs to process by a thread manager. This manager keeps a queue (preferably a lock-free queue) of incoming jobs and distributes them to the individual threads once they're available (i.e. they've completed a previous job). For this to work, the jobs need to be completely isolated which means they can't access shared data (this corresponds to the functional paradigm). So when for example you need to do simulations for a particle system, a job would consist of a function that does the simulation along with all the data required by the function (= the input).
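A minimal sketch of the job system described above, under some loud assumptions: the names `JobSystem`, `simulateParticles` and `simulateFrame` are made up for illustration, a mutex-protected `std::queue` stands in for the lock-free queue, and jobs are isolated by giving each one a disjoint slice of the particle array rather than a private copy of its input. It is not how any particular engine does it.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical JobSystem: a fixed pool of workers pulling jobs from one queue.
class JobSystem {
public:
    explicit JobSystem(unsigned workerCount) {
        if (workerCount == 0) workerCount = 1;  // hardware_concurrency() may report 0
        for (unsigned i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Blocks until every submitted job has run, then joins the workers.
    ~JobSystem() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached once stopping_ is set
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so workers do not serialise each other
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// One isolated job: it touches only the slice [begin, end) of the particles.
void simulateParticles(std::vector<float>& positions, const std::vector<float>& velocities,
                       std::size_t begin, std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) positions[i] += velocities[i] * dt;
}

// Splits one simulation step into jobs. A real engine would keep the pool alive
// across frames; it is created here only to keep the sketch short.
void simulateFrame(std::vector<float>& positions, const std::vector<float>& velocities, float dt) {
    const std::size_t chunk = 256;
    JobSystem jobs(std::thread::hardware_concurrency());
    for (std::size_t begin = 0; begin < positions.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, positions.size());
        jobs.submit([&positions, &velocities, begin, end, dt] {
            simulateParticles(positions, velocities, begin, end, dt);
        });
    }
}
```

Because no two jobs write to the same element, the jobs themselves need no locks; the only synchronisation is around the queue.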

Now of course there isn't enough to do in most game engines to efficiently occupy say 32 cores... this is when it can make sense to actually process several frames in advance even if that means you have to copy data to avoid having to share state.

The kind of concurrent design I just described is apparently employed by the latest version of the Source engine (they call it "HybridThreading"). Other engines such as the CryEngine or the Unreal Engine seem to follow the more conventional "one thread per subsystem" model, which scales very poorly even on today's hardware. Bioshock (an Unreal Engine game), for example, doesn't seem to occupy more than two cores, which is quite wasteful.

@Kest: I recommend dropping your 1995 thinking and advancing to 2007, when 90% of CPUs are multicore or have hyperthreading. Unless you're building an old school game or refuse to modernize, there is no point in not learning multicore techniques now, rather than playing a game of major catch up in a year's time.

Well..... does anyone have some tips or links for where to get started with something like boost::thread or similar?

And also how hard is it to change a sizable program from single thread to multi thread?


Quote:
Original post by vs322
And also how hard is it to change a sizable program from single thread to multi thread?

Depends. You could just spawn half a dozen threads doing nothing much, and technically your program would be multithreaded.
The hard part is actually splitting up the workload so it can be executed in parallel. With an existing single-threaded codebase, that might be quite tricky.

The problem is to isolate each task so much that it doesn't depend on other parts of your program, and so other parts of your program don't depend on it. In short, side-effects are bad.

For adapting an existing program to run multithreaded, I'd suggest looking at OpenMP, which Visual Studio 2005 (and, I think, GCC) supports. That just requires you to insert a couple of #pragmas in your code, and it'll be (somewhat) parallelized without you having to completely overhaul your existing code.
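As a rough illustration of the "couple of #pragmas" idea, here is a small sketch. The function name `sumOfSquares` is made up; the pragma itself is standard OpenMP. It assumes the compiler's OpenMP switch is on (`/openmp` for Visual C++, `-fopenmp` for GCC); without it the pragma is ignored and the loop simply runs on one thread with the same result.

```cpp
#include <vector>

// Sums the squares of the input. With OpenMP enabled, the loop iterations are
// split across threads and the per-thread partial sums are combined by the
// reduction clause, so no manual locking is needed.
double sumOfSquares(const std::vector<double>& values) {
    double total = 0.0;
    const int count = static_cast<int>(values.size());  // signed index for older OpenMP versions
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < count; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

This only works because each iteration is independent of the others, which is the "no side-effects" point made above.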

Quote:
@Kest: I recommend dropping your 1995 thinking and advancing to 2007, when 90% of CPUs are multicore or have hyperthreading. Unless you're building an old school game or refuse to modernize, there is no point in not learning multicore techniques now, rather than playing a game of major catch up in a year's time.


Major catch up, eh? I haven't seen a lot of concrete multicore designs or architectures discussed. I would say it's more beneficial to work on other aspects of your game, and let mp development mature some. Since when did multicore architectures guarantee a good game? I know tons of games that aren't multi-threaded and are fun to play. Multi-threading is great, but if you're counting on that to make your game awesome, I wish you luck.

Wow - thank you all for your posts!

Quote:
Original post by Kest
What's stopping you from creating completely separate AudioSystem and SceneGraph engines, then creating a higher-level Engine to contain them all?

For what it's worth, it's rarely ever advantageous to use multiple threads, barring networking and timer related tasks.


While I disagree with the last part of your post, this is very interesting to me - thanks. I guess you could build separate components, such as audio, scene management and networking, and compile them into DLLs. This way they can operate by themselves. Then you could build an 'engine' system, which loads the component DLLs, and can control them concurrently - it could do a lot more as well.

Another concept that has me interested lately is how the Source engine is an executable, and the application code is written in DLLs. I wonder how they do that - I don't know that there's a way of searching for functions inside an executable, so perhaps they use a lot of callbacks. I'd love to give this a try - can anyone elaborate? I suppose this sort of thing is covered by NDA, though...

Thanks again for your input!

When you create a dll you can explicitly specify which classes/functions will be available for public use through an export macro. The exe that makes use of the dlls imports these exported functions/classes. If the dll contains a function that is not exported, other modules cannot get access to it.
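A sketch of what such an export macro commonly looks like. Everything named `AUDIO_*` or `Audio_*` here is hypothetical; `__declspec(dllexport)` / `__declspec(dllimport)` are the Visual C++ keywords, and the `visibility` attribute is the rough GCC/Clang counterpart. Header and implementation are shown in one block only to keep the example self-contained.

```cpp
// Normally defined by the dll project's build settings, not in source.
#define AUDIO_BUILD_DLL

// --- audio_api.h (hypothetical header shared by the dll and the exe) ---
#if defined(_WIN32)
  #if defined(AUDIO_BUILD_DLL)
    #define AUDIO_API __declspec(dllexport)   // compiling the dll: export the symbol
  #else
    #define AUDIO_API __declspec(dllimport)   // compiling a client: import the symbol
  #endif
#else
  #define AUDIO_API __attribute__((visibility("default")))
#endif

// Exported: visible to any module that links against the dll.
extern "C" AUDIO_API int Audio_GetChannelCount();

// Not exported: usable only from code inside the dll.
int clampChannelCount(int requested);

// --- audio.cpp (inside the dll) ---
int clampChannelCount(int requested) {
    if (requested < 1) return 1;
    if (requested > 32) return 32;
    return requested;
}

extern "C" AUDIO_API int Audio_GetChannelCount() {
    return clampChannelCount(64);  // the exported function may use internal helpers freely
}
```

The exe includes the same header without `AUDIO_BUILD_DLL` defined, so the macro expands to the import form there.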

Quote:
Original post by beebs1
Another concept that has me interested lately is how the Source engine is an executable, and the application code is written in DLLs. I wonder how they do that - I don't know that there's a way of searching for functions inside an executable, so perhaps they use a lot of callbacks. I'd love to give this a try - can anyone elaborate? I suppose this sort of thing is covered by NDA, though...


There is very little difference between an exe and a dll: both use the same Portable Executable file format, so you can expose functions in an executable as exported symbols which you can link to from a dll or another executable. Look up the Portable Executable file format on Google.

I've noticed that a recurring theme in all of the "pro multi-core" arguments is the idea that you're wasting CPU if you're not using every core. But what about multitasking environments? Sure, your game is the most important thing running on the computer at that time, but are you positive it's faster to occupy every core and let the system processes compete for CPU time with your game? I'm totally ignorant here so I'm not trying to say anyone's wrong, but it seems logical (to me) that using ALL of the cores or processors on a given system would actually hurt performance because there's more going on than just your game. Am I insane or what?

cheers
-Dan

This is how my framework is set up:



Brief Overview:

CEngine manages the processes (which are not to be confused with actual OS processes; in this case the term simply means a module of code). The CEngine class is the main loop of the application: it iterates over the processes and interacts with the interface of each one in a controlled manner.

The process interface provides the following functions, among other less important ones:

Initialise()
Frame()
End()

RunThread()
Run()

The CEngine class (which is more of a kernel really) runs in three phases. First it initialises and calls Initialise() on each of the processes, then it runs, calling Frame() on each one, and when it is shut down it calls End() on each of the processes. The process base class also provides threading capability. The process's constructor takes a bool which indicates whether the kernel (when initialising the process) should put it into a new thread. If the bool is true then it does so.

As well as each thread running on its own, the Process it is associated with has an interface through which the thread is given things to do. I also have CriticalSection and Mutex classes which I use to serialise access to shared data.

I have tested this architecture on single core and multi core machines and it seems to work well so far.
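To make the description above concrete, here is a rough reconstruction. Only the names CEngine, Initialise(), Frame(), End(), RunThread() and Run() come from the post; the `Process` base class name, `Stop()`, `Add()`, `IsThreaded()` and the `CountingProcess` example are assumptions, and `std::thread` stands in for whatever threading API the original framework uses.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Assumed shape of the process base class.
class Process {
public:
    explicit Process(bool threaded) : threaded_(threaded) {}
    virtual ~Process() = default;  // Stop() must have been called before destruction

    virtual void Initialise() {}
    virtual void Frame() = 0;
    virtual void End() {}

    bool IsThreaded() const { return threaded_; }

    // RunThread() starts Run() on a new thread; Run() calls Frame() until stopped.
    void RunThread() {
        running_ = true;
        thread_ = std::thread([this] { Run(); });
    }
    void Run() {
        while (running_) Frame();
    }
    void Stop() {
        running_ = false;
        if (thread_.joinable()) thread_.join();
    }

private:
    bool threaded_;
    std::atomic<bool> running_{false};
    std::thread thread_;
};

// The kernel: initialise, run frames, shut down.
class CEngine {
public:
    void Add(std::unique_ptr<Process> process) { processes_.push_back(std::move(process)); }

    void Run(int frames) {
        for (auto& p : processes_) {
            p->Initialise();
            if (p->IsThreaded()) p->RunThread();  // threaded processes drive themselves
        }
        for (int i = 0; i < frames; ++i)
            for (auto& p : processes_)
                if (!p->IsThreaded()) p->Frame();  // the rest are stepped by the main loop
        for (auto& p : processes_) {
            p->Stop();
            p->End();
        }
    }

private:
    std::vector<std::unique_ptr<Process>> processes_;
};

// Example process: counts how many frames it has run.
class CountingProcess : public Process {
public:
    CountingProcess(bool threaded, std::atomic<int>& counter)
        : Process(threaded), counter_(counter) {}
    void Frame() override {
        ++counter_;
        std::this_thread::yield();  // avoid hogging a core in the threaded case
    }

private:
    std::atomic<int>& counter_;
};
```

The non-threaded process runs exactly once per main-loop frame, while the threaded one runs as often as the OS schedules it, which is why any data the two share needs the kind of mutex protection mentioned above.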




Quote:
Original post by Ademan555
...But what about multitasking environments? Sure your game is the most important thing running on the computer at that time, but are you positive it's faster to occupy every core and let the system processes compete for CPU time with your game?

Interesting point... Usually game developers make the assumption that the user is going to quit all other applications before running the game, but that assumption isn't necessarily true.

I've found on my quad-core with older (single-threaded) games, I can actually let the virus scanner run in the background without actually hurting performance much at all.
If a game was assuming that it could use all cores equally, but I was actually using one core to run the virus scanner, then whatever the game runs on that particular core is going to run half as fast as the things that it runs on the other 3 cores...

Quote:
Original post by Ademan555
I've noticed that a recurring theme in all of the "pro multi-core" arguments is the idea that you're wasting CPU if you're not using every core. But what about multitasking environments? Sure, your game is the most important thing running on the computer at that time, but are you positive it's faster to occupy every core and let the system processes compete for CPU time with your game? I'm totally ignorant here so I'm not trying to say anyone's wrong, but it seems logical (to me) that using ALL of the cores or processors on a given system would actually hurt performance because there's more going on than just your game. Am I insane or what?

cheers
-Dan


You CAN set the affinity of each thread to attempt to manage which of the N cores a thread runs on. However, I (along with MSDN) take the view that the best you can really do is create the appropriate threads and let the OS do the rest of the work. Restricting a thread to a core can actually hurt performance, since it is not allowed to go anywhere else should the OS need the core the thread is on.

In addition, you obviously suffer a performance loss on most hardware these days by restricting things to a single core. At least you have a chance to use more cores by making things multithreaded; whether the OS acts appropriately or not is another matter.

Quote:
Original post by Hodgman
Interesting point... Usually game developers make the assumption that the user is going to quit all other applications before running the game, but that assumption isn't necessarily true.

I've found on my quad-core with older (single-threaded) games, I can actually let the virus scanner run in the background without actually hurting performance much at all.
If a game was assuming that it could use all cores equally, but I was actually using one core to run the virus scanner, then whatever the game runs on that particular core is going to run half as fast as the things that it runs on the other 3 cores...


The OS can shuffle things between cores as and when it wants; mix in preemptive multitasking and, while the game might well run slower, everyone still gets a slice of the resource pie.

Quote:
Original post by phantom
Quote:
Original post by Hodgman
Interesting point... Usually game developers make the assumption that the user is going to quit all other applications before running the game, but that assumption isn't necessarily true.

I've found on my quad-core with older (single-threaded) games, I can actually let the virus scanner run in the background without actually hurting performance much at all.
If a game was assuming that it could use all cores equally, but I was actually using one core to run the virus scanner, then whatever the game runs on that particular core is going to run half as fast as the things that it runs on the other 3 cores...


The OS can shuffle things between cores as and when it wants; mix in preemptive multitasking and, while the game might well run slower, everyone still gets a slice of the resource pie.


Note that switching threads between cores is pretty slow, unfortunately.

Quote:
Original post by Ademan555
I've noticed that a recurring theme in all of the "pro multi-core" arguments is the idea that you're wasting CPU if you're not using every core. But what about multitasking environments? Sure, your game is the most important thing running on the computer at that time, but are you positive it's faster to occupy every core and let the system processes compete for CPU time with your game? I'm totally ignorant here so I'm not trying to say anyone's wrong, but it seems logical (to me) that using ALL of the cores or processors on a given system would actually hurt performance because there's more going on than just your game. Am I insane or what?

cheers
-Dan

The OS will handle it correctly. By using the other cores, you won't starve the other processes or the kernel for CPU time. The OS is smart enough to ration out CPU time. In any case, games are notoriously hard to parallelise, and it's very unlikely that you'll be able to fully utilise a modern quad core CPU. So the cycles you're not using will be allocated to all the other processes (which really usually only take about 1% CPU time).

IMHO the best way to handle multithreading in games is a job-based method. You divide your per-frame work up into individual jobs, preferably quite small, and keep them in a big list. A single job might be to complete pathfinding for an enemy, or scan the game objects for collisions. You then spawn as many worker threads as you've got CPU cores and each worker thread removes the next job from the list and goes to work on it. When it finishes it starts the next one, until eventually the whole frame's work is done.

Jobs can have dependencies (i.e. specific other jobs must be completed first) to allow for jobs to be chained together. Jobs may also spawn additional jobs (such as a collision checking job spawning jobs to handle collision resolution).

The advantage of this method is that it scales very well to any number of CPUs and load balances itself with very little effort. So unlike approaches where you statically assign tasks to threads (e.g. thread 1 does physics, thread 2 does graphics, etc.) you get very little wasted CPU time. Task switching should also be minimal, and if your dependencies between jobs are good then it'll be more predictable than traditional multithreading.

The downside is that it's fairly different from the usual approach to writing games, so it'll take a while to get used to it. Plus you've still got the usual problems with memory read/write synchronisation.
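One way the job dependencies described above could be tracked is with a per-job count of unfinished prerequisites. The sketch below is only an illustration under assumptions: `JobGraph` and its methods are made-up names, the job count is fixed up front, the dependency graph is assumed to have no cycles, and jobs spawning further jobs at run time is left out.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class JobGraph {
public:
    explicit JobGraph(std::size_t jobCount) : jobs_(jobCount) {}

    void setWork(std::size_t job, std::function<void()> work) {
        jobs_[job].work = std::move(work);
    }

    // Declares that `job` may not start until `dependency` has finished.
    void addDependency(std::size_t job, std::size_t dependency) {
        jobs_[dependency].dependents.push_back(job);
        ++jobs_[job].pending;
    }

    // Runs every job exactly once on `workerCount` threads, respecting dependencies.
    void run(unsigned workerCount) {
        if (workerCount == 0) workerCount = 1;
        for (std::size_t i = 0; i < jobs_.size(); ++i)
            if (jobs_[i].pending == 0) ready_.push_back(i);
        remaining_ = jobs_.size();
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < workerCount; ++i)
            workers.emplace_back([this] { workerLoop(); });
        for (std::thread& t : workers) t.join();
    }

private:
    struct Job {
        std::function<void()> work;
        std::vector<std::size_t> dependents;  // jobs waiting on this one
        int pending = 0;                      // unfinished dependencies
    };

    void workerLoop() {
        for (;;) {
            std::size_t index = 0;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return !ready_.empty() || remaining_ == 0; });
                if (ready_.empty()) return;  // nothing ready and nothing left: done
                index = ready_.back();
                ready_.pop_back();
            }
            if (jobs_[index].work) jobs_[index].work();  // run outside the lock
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::size_t d : jobs_[index].dependents)
                if (--jobs_[d].pending == 0) ready_.push_back(d);  // last prerequisite finished
            --remaining_;
            wake_.notify_all();
        }
    }

    std::vector<Job> jobs_;
    std::vector<std::size_t> ready_;
    std::size_t remaining_ = 0;
    std::mutex mutex_;
    std::condition_variable wake_;
};
```

For example, with job 0 as pathfinding, job 1 as collision checking and job 2 as collision resolution, calling `addDependency(2, 1)` lets jobs 0 and 1 run in parallel while job 2 waits for job 1.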


It comes down to identifying data dependencies and scheduling access (which jobs can be run when...). Different data goes through periods of updating, after which it can be used read-only by all jobs/fibers running on one or more CPUs without costly locks. The update operation can be segregated (a single CPU working linearly through the data set, again without locks). Independent data sets can each be updated by a different CPU. Some data operations generate results which, when finished, are passed to a subsequent stage, with each stage potentially being parallelized (depending on the nature of the data operation).

AI can consume a huge amount of CPU (always more than you have) and may need to process broad sets of game state data, but it can also be partially independent of the game's render cycle. Some AI can then be used as background processing to keep CPUs busy when the render cycle is stuck doing segregated operations.

The render pipeline processing of course can be done in parallel once the positioning data is frozen for the current frame -- and other CPUs then get busy working out the game state for the next render. Independent background tasks can fill in work for idle CPUs.


Job queues have to work within a context of a coarser scheduling pattern of data read/write subsets.

There are a few components you can keep fairly separate, such as net code, AI, game code, rendering and audio. Obviously there may be overlap but it is generally going to be minor. But once you start going beyond that high-level representation of things and down into the low level - which is going to be important as core counts increase - that's where the headaches and profanities come into play.

Quote:
Original post by beebs1
Another concept that has me interested lately is how the Source engine is an executable, and the application code is written in DLLs. I wonder how they do that - I don't know that there's a way of searching for functions inside an executable, so perhaps they use a lot of callbacks. I'd love to give this a try - can anyone elaborate? I suppose this sort of thing is covered by NDA, though...


Quake II, Half-Life (and I believe Source, I may be wrong), Quake III etc just have a struct with function pointers. The engine fills this struct in (there are a wide range of functions, such as rendering stuff, networking stuff, etc) and then pushes it down to the game DLL when it is initialised. The engine makes the game DLL "do stuff" through a single function which it sends messages to (this is in Quake III at least, and Quake II. Not sure about the rest).
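The pattern described above can be sketched roughly as follows. All names here (`EngineApi`, `GameLoadApi`, `GameMain`, the `GAME_*` messages) are hypothetical and much smaller than anything in the real engines; engine and game code sit in one file only so the example is self-contained. In a real setup the engine would load the DLL and look up the two entry points with `GetProcAddress` (or `dlsym` on Unix-like systems).

```cpp
#include <cstdio>

// Table of engine services, filled in by the engine and handed to the game DLL.
struct EngineApi {
    void (*print)(const char* message);
    int (*getTimeMs)();
};

// ---- game DLL side ----
enum GameMessage { GAME_INIT, GAME_RUN_FRAME, GAME_SHUTDOWN };

static const EngineApi* g_engine = nullptr;

// Called once by the engine right after loading the DLL.
extern "C" void GameLoadApi(const EngineApi* engine) { g_engine = engine; }

// Single entry point through which the engine tells the game what to do.
extern "C" int GameMain(GameMessage message) {
    switch (message) {
    case GAME_INIT:
        g_engine->print("game initialised");
        return 1;
    case GAME_RUN_FRAME:
        return g_engine->getTimeMs();  // the game only ever talks to the engine via the table
    case GAME_SHUTDOWN:
        g_engine = nullptr;
        return 1;
    }
    return 0;
}

// ---- engine side ----
static void EnginePrint(const char* message) { std::printf("[engine] %s\n", message); }
static int EngineGetTimeMs() { return 16; }  // fixed value to keep the sketch deterministic

int runOneFrame() {
    const EngineApi api = { EnginePrint, EngineGetTimeMs };
    GameLoadApi(&api);
    GameMain(GAME_INIT);
    const int frameTime = GameMain(GAME_RUN_FRAME);
    GameMain(GAME_SHUTDOWN);
    return frameTime;
}
```

Because the game only sees function pointers, the engine never has to export symbols by name, which sidesteps the "searching for functions inside an executable" question entirely.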
