Multi-threading in games (Boost and C++)

12 comments, last by All8Up 12 years, 6 months ago
I'd like to grow in my knowledge of computer programming (and game programming) and dive into the realm of multi-threading. I've looked online for many concepts and I understand how multi-threading works, but I do not know about the other concepts in multi-threading: i.e. Semaphores? Mutexes? Locked variables? Some of these things kind of go over my head.

Now, I'll learn these concepts eventually as I figure some things out, but my main question is how to use multi-threading in games. Primarily with the boost library. How should I render my game with two threads? Should one primarily be for simultaneous data ( reading object files run-time for instance without lag ) ?

Also, a question about locked threads (or variables). From what I've read, there are a lot of dangers involving locking and unlocking inappropriately. What should I look out for when doing this?

Basically, I have a lot of unclarity about using threads with C++ in games and I'd appreciate any feedback. :)
I'm that imaginary number in the parabola of life.
For the most part, you don't create 1 thread for rendering, 1 thread for physics, 1 thread for loading, etc. Generally, you want to break your game up into tasks. For example, loading something, rendering, physics, etc. can be thought of as "tasks" that need to be done. You write the program in such a way that you can have a thread pool, and then you take your tasks and throw them to the thread pool, more or less, and let the threads hammer away at them.

If you need to process task A before task B, just throw task A to the thread pool, have some kind of flag/callback/something to let you know when it's done, and then you can throw task B into the thread pool.
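
The "throw task A at the pool, then throw task B when A is done" idea can be sketched roughly as below. This is a minimal illustration using C++11 standard threads, not code from anyone in this thread. `ThreadPool`, `submit` and `run_a_then_b` are hypothetical names, and the arithmetic stands in for real work.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers pulling jobs off one queue.
class ThreadPool {
public:
    explicit ThreadPool(unsigned workerCount) {
        assert(workerCount > 0);
        for (unsigned i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached when stopping
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Task A computes a value; when it finishes it submits task B, which
// depends on A's result. The promise tells the caller when B is done.
int run_a_then_b() {
    std::promise<int> done;                  // declared before the pool so it outlives the workers
    std::future<int> result = done.get_future();
    ThreadPool pool(2);

    pool.submit([&pool, &done] {
        int a = 6 * 7;                       // task A (placeholder work)
        pool.submit([a, &done] {             // A is finished: queue task B
            done.set_value(a + 1);           // task B uses A's output
        });
    });

    return result.get();  // blocking here is for the demo only
}
```

In a game loop you would poll the future or a flag instead of blocking on `result.get()`. Here the "callback" is simply task A submitting task B as its last step.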

One reason you want to break it up into tasks is because you want to avoid locks and things like that (because a lock destroys the idea of multithreading, and the more you lock the less you multithread, plus it's easier to get into situations of deadlock). You'll want to try and separate things so two threads don't have to access the same data (as much as is reasonably possible). Try to break your tasks up so that each task has its own data and memory. One big reason for this is the fact that debugging multithreaded apps sucks. A lot. Because it's incredibly hard to track down which thread is accessing what and modifying what, and if two threads try to modify the same thing at the same time, it's hard to catch both of them at the same time in a debugging environment.

Basically, what I'm saying is try building some simple multithreaded programs that are task based. Split things up into tasks and throw them at a thread pool. Try to keep things as atomic as possible. Try writing a fractal renderer, for example, seeing as fractals are generally embarrassingly parallel. But since you specifically asked about games, take a simple game you'd like to make. Try to break it down into each of its components and see what you can split up into separate tasks. Then try to design a program that can process those tasks asynchronously (though some may have to be synchronous, so try to figure out which ones need what). Mind you, try to design as much of it as you can before you sit down to program. Multithreading an application takes a lot of thinking about exactly how things can be broken up, and what kind of potential problems can arise from computing any of the tasks you've come up with asynchronously. Follow some good tutorials and experiment. You'll get bitten in the butt plenty of times, but that's what happens in multithreading.
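
As a rough illustration of the fractal suggestion, here is one way a Mandelbrot renderer might split its rows across threads. It is only a sketch under assumptions: the viewport, the function names (`mandelbrot_iterations`, `render`) and the row-striding scheme are invented for this example, and it spawns plain `std::thread`s rather than using a pool.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

// Iterations before the point c escapes |z| > 2, capped at maxIter.
int mandelbrot_iterations(std::complex<double> c, int maxIter) {
    std::complex<double> z(0.0, 0.0);
    int i = 0;
    while (i < maxIter && std::norm(z) <= 4.0) {
        z = z * z + c;
        ++i;
    }
    return i;
}

// Renders a width x height grid of iteration counts over [-2,1] x [-1,1].
// Each thread owns a disjoint set of rows, so no locks are needed.
std::vector<int> render(int width, int height, int maxIter, unsigned threadCount) {
    assert(width > 1 && height > 1 && threadCount > 0);
    std::vector<int> image(static_cast<std::size_t>(width) * height);

    auto renderRows = [&](unsigned first) {
        for (int y = static_cast<int>(first); y < height;
             y += static_cast<int>(threadCount)) {
            for (int x = 0; x < width; ++x) {
                double re = -2.0 + 3.0 * x / (width - 1);
                double im = -1.0 + 2.0 * y / (height - 1);
                image[static_cast<std::size_t>(y) * width + x] =
                    mandelbrot_iterations(std::complex<double>(re, im), maxIter);
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t)
        threads.emplace_back(renderRows, t);
    for (std::thread& th : threads) th.join();
    return image;
}
```

Because every pixel depends only on its own coordinates, the result should be identical whether you call `render` with one thread or several. That makes it a handy first test of a threaded program.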

Sorry I can't tell you a lot of specifics. Things are different for different games. My best advice is to write a small game and split it up into tasks that you can compute in parallel. That's pretty much it, really.
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]
I recommend the following video tutorials:
C++11 Concurrency Series
http://www.corensic....ialPartOne.aspx
(the link leads to part one, but you can get to all the other parts from there)

You may also find "The Language of Concurrency" presentation of interest ("for a quick primer on some of the concepts and terminology used in this tutorial series"):
http://www.corensic....rencyVideo.aspx

Note: The above uses C++11 threads, which you can use pretty much interchangeably with the Boost.Thread library (on which the standard version is based). If you don't have a C++11 compiler yet but have Boost, simply replace the relevant namespaces (as in: use boost::thread instead of std::thread, etc.); so far the syntax used has been identical.
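
To make the namespace swap concrete, here is a small sketch of a mutex-protected counter written against a namespace alias. It is built and described here only with `std`. Pointing the alias at `boost` (with `<boost/thread.hpp>`) is an assumption I have not tested, and the lambda and `emplace_back` still need a C++11 compiler either way. `mt` and `count_with_threads` are made-up names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
// Assumed Boost equivalent of the two threading headers: <boost/thread.hpp>

// Change this alias to `boost` to try the Boost.Thread types instead
// (untested assumption; the names thread, mutex and lock_guard exist in both).
namespace mt = std;

// Several threads increment one counter; the mutex serialises access so
// no increment is lost.
int count_with_threads(int threadCount, int incrementsPerThread) {
    assert(threadCount > 0 && incrementsPerThread >= 0);
    int counter = 0;
    mt::mutex counterMutex;

    std::vector<mt::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&counter, &counterMutex, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                mt::lock_guard<mt::mutex> lock(counterMutex);  // unlocks at scope exit
                ++counter;
            }
        });
    }
    for (mt::thread& th : threads) th.join();
    return counter;
}
```

Removing the `lock_guard` line turns this into a data race, which is an easy way to see for yourself what a mutex is for.
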
There's a good list of 31 links to articles here: http://herbsutter.com/2010/07/12/effective-concurrency-prefer-using-active-objects-instead-of-naked-threads/
Found this online. Would thread-pooling really be this simple?

Clicky clicky

I feel like using threads like this is TOO simple. Anything that could be a problem with this?
I'm that imaginary number in the parabola of life.
I feel like using threads like this is TOO simple. Anything that could be a problem with this?
A thread pool by itself is very simple, my implementation of a thread pool wouldn't be more than a page of code.
The difficulty in using a simple thread pool is that you've got no guarantees that two scheduled jobs will not both try to access the same data (giving you race conditions, etc).

As well as a thread pool, you at least need some way of expressing dependencies between jobs -- e.g. if job B uses the output of job A, then A must signal its completion, and B must not begin work until that signal is given.

Usually a higher-level job system will be implemented on top of the thread pool to make this foolproof.
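
A bare-bones version of "A signals, B waits" might look like the following. It uses two plain threads and a `std::promise`/`std::shared_future` pair as the completion signal. That is my choice for the sketch, not something the posts above prescribe, and `run_dependent_jobs` is a hypothetical name.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <vector>

// Job A fills `data` and then signals; job B waits for the signal before
// reading `data`. The promise/future pair is the "completion signal".
int run_dependent_jobs() {
    std::vector<int> data;
    int sum = 0;

    std::promise<void> aDone;
    std::shared_future<void> aFinished = aDone.get_future().share();

    std::thread jobB([&data, &sum, aFinished] {
        aFinished.wait();              // B must not start before A signals
        for (int v : data) sum += v;   // safe: A no longer touches data
    });

    std::thread jobA([&data, &aDone] {
        for (int i = 1; i <= 10; ++i) data.push_back(i);
        aDone.set_value();             // publish completion to B
    });

    jobA.join();
    jobB.join();
    assert(data.size() == 10);
    return sum;  // 1 + 2 + ... + 10
}
```

Inside a real pool, blocking a worker in `wait()` ties that worker up. A higher-level job system would more likely hold B back and only enqueue it once A has signalled.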

Ignoring some other good comments and such (and a couple I'll respond to separately :)), using threads and using them well are very different things, so I'll stick to the basics of learning and using threads here. First, when you decide to use a thread, have a good reason to do so. For instance, if you have to load large items off disk regularly, that is a good place to start doing threaded work. Resource loading is one of the first things all game programmers realize can, and probably should, be threaded. But it is no trivial task, and a "good" system is a challenge at many levels. So assume, for all intents and purposes, that you are writing a system which is software rendered; this is the only way to keep a whole ton of issues out of this discussion for the time being.


I'll get into the boost bits later, but to start with, assume you are going to do this the hard way: you create a thread to do the loading, and your main game thread posts "load resource" messages and "is loaded" requests to that resource thread. The critical bit here is that you don't ever want the main game thread to have to wait around; everything should be immediate or there is no reason to be using a thread. I.e. you basically want the ability to do the following:

int main( int argc, char** argv )
{
    create resource thread
    post "load xxx"
    while( running )
    {
        if xxx loaded
            display title screen
        else
            keep displaying black, or better, an animated swirly progress thingy at 30/60 fps
    }
}

That simple little outline is all I would suggest you try to start with and then post the results back here or to some of the folks that replied to this directly. The basic reasoning is that even the above is easy to screw up. But, I'll give you an outline of what should be happening:

The resource thread should be a function which basically does the following:
  • Pull message off of work queue.
  • Process message.
  • Post a shared "complete" back to main thread.

The main thread should basically be doing the following:
  1. Create the resource thread.
  2. Post a "load" request.
  3. Check if the request is complete (non-blocking).
  4. If the request is complete, switch mode to displaying the title screen.
  5. If not complete, keep displaying something trivial to show the system is still running. Just keep changing the clear color or something.

Performing the above will introduce you to mutexes, probably events or condition variables depending on platform, and perhaps some atomic operations, depending on where your search for how to make the above work leads you.
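
For readers who want something to compare their own attempt against, the outline above could be sketched like this with C++11 standard threads. Everything here is an assumption made for illustration: `ResourceLoader`, `requestLoad`, `isLoaded`, the resource name and the `sleep_for` that stands in for disk I/O.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical loader: one worker thread, a request queue, and a
// mutex-protected table of finished resources.
class ResourceLoader {
public:
    ResourceLoader() { worker_ = std::thread([this] { run(); }); }

    ~ResourceLoader() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Called by the main thread; returns immediately.
    void requestLoad(const std::string& name) {
        assert(!name.empty());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(name);
        }
        wake_.notify_one();
    }

    // Does not wait for the load: the lock is only held for a map lookup.
    bool isLoaded(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        return loaded_.count(name) != 0;
    }

private:
    void run() {
        for (;;) {
            std::string name;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
                if (stopping_) return;   // pending requests are dropped on shutdown
                name = requests_.front();
                requests_.pop();
            }
            // Stand-in for slow disk I/O, done without holding the lock.
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            std::string contents = "contents of " + name;
            std::lock_guard<std::mutex> lock(mutex_);
            loaded_[name] = contents;    // post "complete" back to the main thread
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::string> requests_;
    std::map<std::string, std::string> loaded_;
    bool stopping_ = false;
    std::thread worker_;
};

// Main-loop shape from the outline: keep "rendering" frames until the
// title resource is ready. Returns the number of placeholder frames shown.
int run_until_title_loaded() {
    ResourceLoader loader;
    loader.requestLoad("title.png");   // hypothetical resource name
    int placeholderFrames = 0;
    while (!loader.isLoaded("title.png")) {
        ++placeholderFrames;           // draw the black screen / spinner here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // fake frame
    }
    return placeholderFrames;
}
```

The main thread never waits on the load itself. It only takes the mutex briefly to look up whether the resource has arrived, which is the property the outline asks for.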


Now, using boost makes it all very simple, and you don't actually need to worry about the learning bits at all if you don't want to. Just learn ASIO; yes, it can be used for MANY things beyond IO/sockets etc. I won't get into this side because I think you should learn the above first so you have at least some understanding of what is going on behind the scenes.

If you need some help figuring out where to start with the above, let me know.

For the most part, you don't create 1 thread for rendering, 1 thread for physics, 1 thread for loading, etc. Generally, you want to break your game up into tasks. For example, loading something, rendering, physics, etc. can be thought of as "tasks" that need to be done. You write the program in such a way that you can have a thread pool, and then you take your tasks and throw them to the thread pool, more or less, and let the threads hammer away at them.

I don't completely agree with this description. I most definitely separate several of the threads you mention to run by themselves, without trying to multi-thread them. The entire rendering driver (i.e. the issuer of draw calls, without any computation work) is absolutely single threaded even with DX10/11, as the APIs are still too primitive/slow (API or driver, that's being argued still) to get any notable benefit from multi-threading. Additionally, if I want to support OpenGL, that makes the driver API incompatible for no good reason. I "do" go with a task system for certain things, but for the best results I use a "single" thread to manage tasks, as you call them, and use all other threads for a team/swarm system. The task driven systems in games generally suck and end up either being massively kernel bound or bound by Amdahl's law to the point that you are getting limited benefit from the threading. Tasks imply a central queue, which is bad news for threading.


If you need to process task A before task B, just throw task A to the thread pool, have some kind of flag/callback/something to let you know when it's done, and then you can throw task B into the thread pool.


... snip ...
It's all a good rule of thumb but bad news for multi-core processing. A central queue kills all performance because even a lockless one can't get around the fact that it is the primary point of contention if you work everything as a task. You really have to divide your queues to get effective multicore, and limit any locking such that it allows all items between locks to be processed without needing/causing additional locks.

I do use a central queue in my system but it is just a trampoline for the worker threads to enter dedicated processing. I.e. a task in my system gets "all" worker threads to enter it and exit it before the next task is performed. What I mean is that rarely do I enter something which is single threaded and does a single task into my system, the "task" is itself a work queue in 90+% of the cases expecting "x" number of threads to enter it and perform the actual processing. Assume 10 animated characters on screen, 2 blending animations per character average, the "task" to update all animations will have 20ish subtasks which have 0 interdependencies, so 4-8-16 threads enter the task, process the various updates and exit without a single lock. (Workers have a thread local index they can use to figure out what tasks to take.)
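
The lock-free "every worker enters the task and picks its own subtasks" idea might be sketched as follows. This is only an approximation of what is described above: it spawns threads per call instead of reusing persistent workers, passes the worker index as an argument rather than reading it from thread-local storage, and `Animation` and `update_all_animations` are invented stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one animation blend update.
struct Animation {
    float time = 0.0f;
    void update(float dt) { time += dt; }
};

// Every worker enters the same "task" and takes the subtasks whose index
// matches its own worker index (strided), so no lock or shared counter is
// needed. Each subtask touches only its own Animation.
void update_all_animations(std::vector<Animation>& animations, float dt,
                           unsigned workerCount) {
    assert(workerCount > 0);
    auto workerBody = [&animations, dt, workerCount](unsigned workerIndex) {
        for (std::size_t i = workerIndex; i < animations.size(); i += workerCount)
            animations[i].update(dt);
    };

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w)
        workers.emplace_back(workerBody, w);
    for (std::thread& t : workers) t.join();  // all workers exit before the next task
}
```

With 20 animations and 4 workers, worker 0 updates indices 0, 4, 8, 12 and 16, and so on. Handing each worker a contiguous chunk instead may be kinder to the cache, at the cost of slightly more index arithmetic.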

This is getting so beyond the initial question though, I just wanted to explain why I don't like the task description, it is the simple way which doesn't scale well in most cases. (I've seen task systems which "SLOW" overall processing, not saying you are describing such a system but it is easy to get bit by that.)
The entire rendering driver (i.e. the issuer of draw calls, without any computation work) is absolutely single threaded even with DX10/11, as the APIs are still too primitive/slow (API or driver, that's being argued still) to get any notable benefit from multi-threading. Additionally, if I want to support OpenGL, that makes the driver API incompatible for no good reason.
FWIW, it is possible to make a 'task' based renderer, even on GL/DX9.
Yes, the actual submission of draw-calls is restricted to being done by a single thread, but after delegating all other rendering work to the 'task' system, our "rendering submission task" (i.e. the part that executes the draw-calls) ended up being about 1ms (which queues up ~32ms of GPU work). Dedicating an entire OS-thread to only 1ms work per frame is not an option for us, so it shares a thread with other tasks too.

Also, there's no need to dedicate an entire thread to background loading of files, unless you're using a custom compressed file-system (and use that thread for long-running decompression tasks). The native OS file-loading APIs are already asynchronous, and using them will be more efficient than using a thread to wrap the blocking file-loading API (which is a wrapper around the asynchronous file-loading API).

The task based rendering was split properly. A "thread" exists to submit the draw calls, be it shared with other things or not; I didn't mean to say exclusive. But generally I do just launch a thread instead of sharing, because it executes as soon as possible on whatever processor is available at the time it wakes up. Given that my primary threading system will be thrashing the CPUs, it is best in my case to let the OS schedule the rendering thread as a separate entity. Mixing other items into the tasks allows the driver side queue to empty all too often in my experience, which wastes hardware capability. Having multiple threads is not a problem in a game; I run one heavyweight thread per logical core, and those are the primary worker threads in my system. All other threads (IO, rendering, etc.) generally end up running on completely random cores in between my primary worker threads. This is intended and very desirable, as the heavy threads are at 100% for short bursts of time and usually one will finish a little before the others and be free for other work. At worst, an OS sync has to happen at frame flip to allow the worker threads some time to process.

A thread per specific task is still very viable even in a massively multicored system, as it fills in the holes where the big system failed to be at 100%. Given you can't drive disk IO multicore to make it faster, why bother with more than a single thread? And, given that it is generally blocking, why try to merge that into your primary threading system? And given all of this, if it makes sense to be a linear task accessing limited resources, why try to multithread it at all?

It is viable to run a game with 20+ threads on a 4 logical CPU box if only 4 threads are "MOSTLY" busy and all the others are just filling in the wait states. The OS is highly optimized for this sort of thing, and as such I see no reason to avoid using single threads for a specific item unless you get into the 50+ thread range on a "4 logical core box". With more cores I might allow 100 before I worry about it, but scale is all about balance and when/where to worry about it. IO cannot, for the most part, be multi-cored, so it's a standalone thread in all my work. Submitting data to the rendering API can't be multi-cored (well, it can be, but at a massive cost), so it is a single thread. Separating those things which make sense as multicore tasking from those which are message driven communication is much more important than any neat concept of "task driven" for the entire system.
