Florian22222

Multithreading for Games

18 posts in this topic

Hi everyone!

Is multithreading necessary for real-time 3D games? I've always just called update and paint in a loop, without any threading.

So my question: do I need to use multithreading, or is it better without it?

Thanks in advance!
At the very minimum, multithreading is essential for fast resource loading; otherwise you're stuck with linear processing, which has no chance of keeping the already slow disk busy at full capacity. Once you get to the territory where your scenes require massive per-task processing (physics, multiple occlusion queries, complex sound processing, etc.), multithreading becomes the only viable option for maintaining a solid framerate at high quality. Generally speaking, the number of threads you have should equal the number of processors available, and each one should be able to process any job (with certain limitations, e.g. the draw pipeline being bound to the specific thread that owns the context). Your architecture should be conceptually parallel: state/job-based tasks where each task can be processed by any thread. This isn't that difficult to do, actually - all you need is a boolean (job ready yet?) response system (no mutexes, critical sections, etc.!) backed by a main loop for job distribution.

As I noted above, even for simpler games, if you have any respectable amount of disk access you should at the very minimum have a disk access thread that processes all disk-read jobs and queues each completed read for another thread to process in parallel while it moves on to the next read.

A flag-based response system is based on the concept that only the one thread responsible for processing a task within the scope of a particular job can [i]write[/i] into the memory allocated for that task (parallel reads can still occur). Once the update completes, the thread sets the completion flag for the task. In the meantime, the main thread keeps polling all pending tasks for completion (by reading a boolean flag) and passes each task on to the next processing phase when the previous one completes. Consider three linearly dependent jobs that are distinct but cannot be executed out of order:

TASK
job1: read file from disk (thread1)
job2: create alpha channel or whatever (thread2)
job3: copy texture to GPU (draw thread)

All of these steps can be parallelized across multiple load operations. The real benefit kicks in once you realize that job1 can do much faster reads, since all reads are more likely to be sequential (other applications are less likely to interrupt the disk heads) and you're effectively making use of HDD access (the slowest link in the sequence) 90+% of the time, effectively winning back the time spent on job2 and job3.

Any further parallelization really depends on your game and the complexity you need. Overall I think a task-based system is a simple and elegant solution that will scale automatically the more cores/threads the system has while still having the ability to fall back to a single core/thread if only one is available.
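The flag-based job pipeline described above can be sketched roughly like this - an illustrative Python mock-up (the thread is about C++ engines, but the pattern is language-agnostic). Task, run_stage, and pump are invented names, and joining the worker threads each round is a simplification of real job scheduling:

```python
import threading

STAGES = 3  # stage 0: disk read, stage 1: processing, stage 2: upload

class Task:
    def __init__(self, name):
        self.name = name
        self.stage = 0    # next stage to run
        self.done = True  # completion flag: has the previous stage finished?
        self.log = []     # record of completed (stage, task) pairs

def run_stage(task, stage):
    # Only this worker writes into `task` while the job runs.
    task.log.append((stage, task.name))
    task.done = True      # set the completion flag; the main loop may advance

def pump(tasks):
    # Main loop: poll each task's completion flag and dispatch its next stage.
    threads = []
    while any(t.stage < STAGES for t in tasks):
        for t in tasks:
            if t.done and t.stage < STAGES:
                t.done = False
                th = threading.Thread(target=run_stage, args=(t, t.stage))
                t.stage += 1
                th.start()
                threads.append(th)
        for th in threads:
            th.join()
        threads.clear()
    return tasks
```

Each task advances through its own stages strictly in order, but different tasks run their stages in parallel - the overlap that hides job2 and job3 behind the next disk read.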
[quote name='irreversible' timestamp='1328539670' post='4910156']
As I noted above, even for simpler games, if you have any respectable amount of disk access you should at the very minimum have a disk access thread that processes all disk-read jobs and queues each completed read for another thread to process in parallel while it moves on to the next read.
[/quote]I'd say "for simpler games" there's no need for multithreading at all. Unless perhaps on Atoms, but even those are going to run "simpler games" at interactive rates, albeit with some sweat.
[quote name='swiftcoder' timestamp='1328540912' post='4910164']
Have you run into a performance wall with single-core, where you simply can't make your game run at interactive frame-rates?

If so, then you need to multi-thread your game.
[/quote]Seconded. Given today's processors, I hardly believe a beginner can saturate them. Of course one can always do things wrong in the first place.
[quote]Generally speaking, the number of threads you have should be equal to the number of processors that are available[/quote]

Actually, you may want more threads than cores. Consider a single-core CPU: it can still benefit from having multiple threads, especially if each thread is likely to block on something. If you have a single core and 4 threads:

- game logic & render thread : constantly runs, 'interrupted' by occasional work in the other threads
- sound thread : will periodically do a 'spurt' of work processing audio every so often. most of the time it's waiting
- network thread : will spend most of its time blocked, waiting on a socket
- file thread: will spend most of its time either waiting for the game to ask to load something, or blocked on a slow disk operation

On a modern OS, when a thread is blocked on something, such as waiting for the disk to read something, the kernel will set that thread aside and run another thread. The above design already benefits a little bit from having a multi-core processor, but isn't ideal. Since most of the work is still happening in the game logic/render thread, it could be further broken out like this:

-Render thread (just 1. OGL doesn't like threads, DX might not either)
-file thread (just 1; don't thrash the disk)
-network thread (1 per socket, but the client will probably only have 1 socket open anyway)
-sound thread (just 1)
-game logic thread (1 or many)

There is a bit of a difference now. Before, the game logic & render thread alternated between advancing the game state and rendering; you could just lock the game state and take turns processing the next gamestate and rendering. At first glance this looks no better than having them share a thread, except that with multiple cores you could alternate: 4 cores process 4 game state threads, then 1 core renders, then 4 cores process game state, then 1 core renders, and so on. This starts to reap the benefits of multiple cores; but now, what are the other cores doing while you are rendering? Probably not much. This is the point where you have to do one of two things: 1) break the game state into pieces, so you can start rendering one part of the gamestate while you begin to compute the next part, or 2) double-buffer the gamestate (or at least the parts that affect rendering), so that you can have 3 cores processing the NEXT game state while the 4th core renders the PREVIOUS gamestate.

But, given the scope of most hobby projects, just breaking out the super-slow things (like disk access) and the most bottlenecked calculations (physics, etc.) into separate threads will give you the most benefit with the least effort (and bugs), which is more important than trying to achieve 100% CPU utilization.
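As a toy illustration of why even one dedicated file thread pays off on a single core, here is a hedged Python sketch (loader and run_frame_loop are made-up names): while the loader thread is blocked on the disk read, the OS parks it and runs the main loop, so the "game" keeps ticking.

```python
import threading

def loader(path, results):
    # Blocking disk read, kept off the main thread. While this thread
    # waits on I/O, the kernel schedules the main loop instead.
    with open(path, "rb") as f:
        results["data"] = f.read()

def run_frame_loop(path, ticks=1000):
    results = {}
    t = threading.Thread(target=loader, args=(path, results))
    t.start()
    for _ in range(ticks):
        pass          # stand-in for game logic / rendering work per frame
    t.join()          # wait for the load to finish before using the data
    return results["data"]
```

In a real engine you would poll a completion flag instead of joining, and queue the loaded data for the next processing stage, but the scheduling benefit is the same.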
[quote name='Hodgman' timestamp='1328661555' post='4910710']On Windows, it also (by default) won't wake that blocked thread up for at least 15ms, even if it only needs to block for 0.5ms, which isn't very good for a real-time system - hence you should avoid blocking.
[/quote]

This is really more for my sake than anything, but I was under the impression that this tidbit only held for explicit calls to Sleep(), and things like WaitForSingleObject() and friends can potentially be more responsive?
Besides everything that has been said, there are two things I want to add.
1. Even with a single-core CPU you are already using a multicore system: CPU + GPU. You can exploit this by filling up the GPU's command queue and then doing other tasks on the CPU. As long as you don't exhaust this buffer (the GPU keeps running while the CPU would otherwise idle), you don't need any multicore support (besides resource processing/loading).

2. When you want to add multicore support, you should start with it from the beginning. It is incredibly hard to add multicore support later, and some design decisions can easily ruin it (after writing thousands of lines of game logic code in a scripting language like Lua, it is somewhat demotivating to discover that Lua doesn't support multiple cores in a single VM).
[quote name='InvalidPointer' timestamp='1328682374' post='4910796']This is really more for my sake than anything, but I was under the impression that this tidbit only held for explicit calls to Sleep(), and things like WaitForSingleObject() and friends can potentially be more responsive?[/quote]It doesn't matter how you transition a thread into a [i]sleep[/i] state - once in that state, the kernel's scheduler won't come around and poke it back into a [i]ready[/i] state until the next scheduling cycle, which [url="http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx"]defaults[/url] to 15ms on Windows IIRC.
I actually should've said "up to 15ms", because if you go to sleep right before the next tick, you'll get woken up quickly ([i]i.e. somewhere between 0ms and 15ms[/i]). However, if there are too many threads, a [i]ready[/i] thread might not be chosen to execute in any given 15ms tick - other threads can cause starvation. After a thread has starved for 3-5 seconds, the kernel will give it a priority boost to ensure it gets to run for at least one tick. So any time you block (with default settings), you're sleeping for a best case of 0-15ms and a worst case of ~4 seconds.

On a side note, [font=courier new,courier,monospace]Sleep(0)[/font] is allowed to immediately return instead of sleeping for a tick ([i]which is usually bad[/i]), if thread priorities allow for it.
[quote name='Hodgman' timestamp='1328684518' post='4910803']
It doesn't matter how you transition a thread into a sleep state - once in that state, the kernel's scheduler won't come around and poke it back into a ready state until the next scheduling cycle, which defaults to 15ms on Windows IIRC.
[/quote]
Hmmm... I use Sleep(1) in my multicore support code. After reading more about this topic, it seems you can use timeBeginPeriod/timeEndPeriod to increase the timer resolution. As far as I understand, this affects the whole OS (the OS takes the highest requested resolution). Therefore I would try to encapsulate the sleep like this:
[CODE]
timeBeginPeriod(1);
Sleep(1);
timeEndPeriod(1);
[/CODE]
Is this a bad idea (too much overhead, slowing down the OS)?
[quote name='Ashaman73' timestamp='1328710959' post='4910899']
[quote name='Hodgman' timestamp='1328684518' post='4910803']
It doesn't matter how you transition a thread into a sleep state - once in that state, the kernel's scheduler won't come around and poke it back into a ready state until the next scheduling cycle, which defaults to 15ms on Windows IIRC.
[/quote]
Hmmm... I use Sleep(1) in my multicore support code. After reading more about this topic, it seems you can use timeBeginPeriod/timeEndPeriod to increase the timer resolution. As far as I understand, this affects the whole OS (the OS takes the highest requested resolution). Therefore I would try to encapsulate the sleep like this:
[CODE]
timeBeginPeriod(1);
Sleep(1);
timeEndPeriod(1);
[/CODE]
Is this a bad idea (too much overhead, slowing down the OS)?
[/quote]

Even if you change the timer resolution, you still won't have any guarantee that a thread will wake up on time. (If you also boost the thread's priority so it is higher than everything else on the system, then it most likely will wake up on time.) Also, only call timeBeginPeriod once, early in the application:

[quote] (from msdn.microsoft.com)
Use caution when calling [b]timeBeginPeriod[/b], as frequent calls can significantly affect the system clock, system power usage, and the scheduler. If you call [b]timeBeginPeriod[/b], call it one time early in the application and be sure to call the [b]timeEndPeriod[/b] function at the very end of the application.[/quote]
[quote name='Ashaman73' timestamp='1328710959' post='4910899'][CODE]timeBeginPeriod(1);
Sleep(1);
timeEndPeriod(1);
[/CODE]Is this a bad idea (too much overhead, slowing down the OS) ?[/quote]In my experience, it seems to be fairly common practice to call [font=courier new,courier,monospace]timeBeginPeriod[/font]/[font=courier new,courier,monospace]timeEndPeriod[/font] in your main function ([i]i.e. applying the increased scheduling precision for the entire life of your program[/i]). N.B. you're also supposed to use [font=courier new,courier,monospace]timeGetDevCaps[/font] to find the system's minimum value, instead of using a hard-coded [font=courier new,courier,monospace]1[/font].
Also, despite the bad guarantees for how long a [font=courier new,courier,monospace]Sleep(1)[/font] will last on Windows, there aren't many other good options if you have to put a thread to sleep -- e.g. if you've got two threads sharing one core, you'll [b]have[/b] to sleep them at some point ([i]especially if their priorities are different - in which case one could be starved until it gets its priority boost after 3-5 seconds of starvation[/i]).
There is a simple two-threaded approach that gives a large performance gain for very little work.

Use 1 thread for Drawing and 1 thread for Updating.

Because Update() can never communicate with the DirectX device,
you need two intermediate buffers per component (I call them DrawState):

[code]var state = new DrawState[2];[/code]

Then you only need to synchronize the switch between buffers ... I use an array index:

[code]lock { updateIndex = 0; drawIndex = 1; }
or
lock { updateIndex = 1; drawIndex = 0; }[/code]

and each Component accesses its DrawState like so:

[code]Draw() { drawState = state[drawIndex]; }
Update() { drawState = state[updateIndex]; }[/code]

Once you have implemented this, it can be trivial to add more threads on the Update() side.

To summarise:
Update() only writes to one DrawState object
Draw() only reads from the other DrawState object
Every frame you switch / flip / toggle between the DrawState objects

I don't see the point in writing some games as single-threaded and some as multithreaded when you can use this technique for all games.
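A minimal sketch of the double-buffered DrawState idea above, written in Python for brevity (DoubleBuffered and its members are invented for illustration): Update() writes one buffer, Draw() reads the other, and flipping the indices is the only synchronized step.

```python
import threading

class DoubleBuffered:
    def __init__(self):
        self.state = [dict(pos=0), dict(pos=0)]  # two DrawState buffers
        self.update_index, self.draw_index = 0, 1
        self._flip_lock = threading.Lock()

    def update(self, pos):
        # The update side only ever writes its own buffer.
        self.state[self.update_index]["pos"] = pos

    def draw(self):
        # The draw side only ever reads the other buffer.
        return self.state[self.draw_index]["pos"]

    def flip(self):
        # The only synchronized step: swap which buffer is which,
        # done once per frame between Update() and Draw().
        with self._flip_lock:
            self.update_index, self.draw_index = (
                self.draw_index, self.update_index)
```

Note that draw() keeps returning the previous frame's value until flip() runs; that one frame of latency is what the scheme trades for contention-free reads and writes.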
Here's a really good article that answered a lot of questions for me. I also really suggest you use Intel Threading Building Blocks, since it implements a lot of the stuff you need, like a task manager and lockless data structures.

[url="http://software.intel.com/en-us/articles/designing-the-framework-of-a-parallel-game-engine/"]Designing the framework of a parallel game engine[/url]

I'm still working on my engine, but my idea so far is to have separate threads for the game logic, the rendering, and the audio. Then each thread spawns tasks each update. The rendering happens completely asynchronously from the game. I've never really been big on the whole Model-View-Controller idea, especially in games, but I ended up going with that kind of model naturally without even trying: the game thread has a controller that updates the entities, which are like the model; the renderer is like the view and has scene nodes that get updated by the controller.

Also, PhysX 3 uses this kind of task-based system, which fits naturally into the multithreaded task-processing architecture I'm going for.
I've been thinking about this a lot recently after [url="http://www.gdcvault.com/play/1012192/-Sponsored-Firaxis-Civilization-V"]watching the GDC presentation by the guys who wrote Civ V[/url]. As many have pointed out above, task-based systems are best because they are more scalable. I can kind of understand how you'd do it for a game like Civ, which seems to me to be very well tailored to multithreading from a processing point of view. But when it comes to processing more complex structures, like the entities in an octree, I keep finding blocks where the code will end up becoming serialised regardless of threading. That is, you break the tree up into little processing blocks and dispatch them to various threads, but at some point you have to complete the circle and update the structure to a newer version. As soon as you start locking various data structures in order to apply updates, you're losing the benefits of threading. Worse, you can lose cache coherence and make performance worse rather than better.

As far as Civ was concerned, the moral of the story is to profile the application and the various ways of doing things. I'm not sure you can write scalability like that in a generic way. Rather, it has to be done on a case-by-case basis, perhaps with constructs like "parallel for" and so on.
You basically have all entities in the game update as separate tasks each game-loop cycle. There should be no locking at all, since they are all independent of each other. They tell a central object about their updates, and that object has a lockless queue on it to prevent too much slowdown on a shared resource.

When all entities have updated, at the end of the game loop the central object notifies every entity that cares about those entities of the changes. It also updates the visual-side representation of the objects, saying "this object is now at this position", and so on...
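The scheme above might look like this as a Python sketch (Entity, update_entity, and run_update_phase are hypothetical names; a thread-safe queue.Queue stands in for the lockless queue): each entity task touches only its own data and reports its change to the central object, which applies everything once all tasks have finished.

```python
import threading, queue

class Entity:
    def __init__(self, eid, pos):
        self.eid, self.pos = eid, pos

def update_entity(entity, changes):
    # Each task touches only its own entity: no shared writes, no locks.
    entity.pos += 1
    changes.put((entity.eid, entity.pos))  # report to the central object

def run_update_phase(entities):
    changes = queue.Queue()  # stand-in for the lockless queue
    threads = [threading.Thread(target=update_entity, args=(e, changes))
               for e in entities]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # End of the game loop: the central object applies and broadcasts
    # the accumulated changes (here just collected into a dict).
    applied = {}
    while not changes.empty():
        eid, pos = changes.get()
        applied[eid] = pos
    return applied
```

A real engine would use a pool of worker threads rather than one thread per entity, but the independence of the per-entity tasks is the point.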
I don't see a problem with traversing a tree in parallel. Updating it with new state is the issue of course. With "lock step" threading I can see how it would work. Jobs run, store their results, all results are applied to the tree at the end. I'd be sceptical about the resulting performance improvements of course. Anyway will check out the BF presentation later.
[quote name='RobinsonUK' timestamp='1329137896' post='4912571']
I don't see a problem with traversing a tree in parallel. Updating it with new state is the issue of course. With "lock step" threading I can see how it would work. Jobs run, store their results, all results are applied to the tree at the end. I'd be sceptical about the resulting performance improvements of course. Anyway will check out the BF presentation later.
[/quote]

On the surface, yes, you're bang on the money. You can certainly have multiple threads be traversing the 'physical' tree in memory without incident. Actually doing work on it, however, is a bit murky. Consider the case of arrays for a moment. You can do a blanket segregation at the beginning, maybe partitioning the whole thing out into n chunks (where n is defined as being at or around the number of physical processor cores on the target machine for simplicity's sake) and then have all your threads go in guns blazing. If your computations are otherwise embarrassingly parallel or you're doing some super scheduling cleverness, you can actually get away with no additional synchronization; threads can just do computation without ever needing to interact with one another. You can actually handle array modifications in a similar way by trying to provide some upper bound on work that each thread would do, segregating output slots ahead of time, then having all threads write their (potentially variable-width) output within that set of bounds. You can then coalesce everything in later (possibly also parallel) passes over that data-- this is actually a fairly common approach (though not often directly seen) used in graphics and it actually has a name: stream compaction. There's quite a bit of research dedicated to doing this effectively and I'd suggest some Google-fu if it sounds interesting, as it's a very broad topic mostly outside the scope of discussion.

So, bringing that massive (bi)tangent back to heel: we're back to the idea of work distribution for arrays vs. trees. This is actually somewhat more subtle, but the speed advantage of trees for searching comes directly from their hierarchical skipping of regions based on previous sorting. Since we're in a hierarchy, we don't actually know what region(s) we need to search next until we run the current set of computations; this really, really kills the parallelism factor, since it introduces hard synchronization points, and frequently. You can task multiple queries through the structure and see some speedup there, but it's still the same sort of 'bolted-on' concurrency dead end as things like a 'physics/render thread'. It helps in edge cases, but it's very likely those edge cases won't be a limiting factor, or even present at all, for most of your processing time. As an example, parallel view queries won't help you if you only have one camera, but parallel queries in and of themselves will benefit one camera or 10,000 equally.
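The stream-compaction approach mentioned above - pre-reserving an upper-bound output region per chunk, then coalescing the survivors in a later pass - can be sketched as follows (a simplified Python illustration; parallel_compact and filter_chunk are invented names):

```python
import threading

def filter_chunk(data, lo, hi, out, counts, idx, keep):
    # Each thread filters its own chunk [lo, hi) and writes survivors
    # only into its pre-reserved output region, so no synchronization
    # is needed during the filtering pass.
    n = 0
    for i in range(lo, hi):
        if keep(data[i]):
            out[lo + n] = data[i]
            n += 1
    counts[idx] = n  # each thread owns exactly one slot of counts

def parallel_compact(data, num_chunks, keep):
    out = [None] * len(data)  # upper bound: everything might survive
    counts = [0] * num_chunks
    step = (len(data) + num_chunks - 1) // num_chunks
    threads = []
    for idx in range(num_chunks):
        lo, hi = idx * step, min((idx + 1) * step, len(data))
        t = threading.Thread(target=filter_chunk,
                             args=(data, lo, hi, out, counts, idx, keep))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Compaction pass: gather each chunk's survivors into a dense list.
    result = []
    for idx in range(num_chunks):
        lo = idx * step
        result.extend(out[lo:lo + counts[idx]])
    return result
```

On a GPU the compaction pass would itself be parallel (typically via a prefix sum over counts), but the chunked, write-within-your-bounds structure is the same.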
