realh

Multithreading vs variable time per frame

29 posts in this topic

Most existing game engines seem to be based around a monolithic single-threaded main loop which serially handles input, AI etc and rendering. I think separating rendering from game logic with threads has major advantages:

 

Rendering has the biggest influence on performance. With a single thread the game logic has to cope with being polled anywhere from under 20 times per second to several hundred, and that can be difficult to manage in some games. With a separate game logic loop you can set it to "tick" at a fixed rate, independent of the achieved rendering frame rate, and in most cases assume the system can sustain that frequency unless the whole thing is so bogged down that slowing the gameplay is acceptable or even advantageous.

 

The cons of multithreading are the notorious difficulty of thread-safe programming and of tracing bugs when you get it wrong. You have to use mutexes etc to make sure the rendering thread doesn't try to render an object while the logic thread is halfway through updating its coordinates, and there is a trade-off between the amount of blocking and rendering each frame from a consistent snapshot of the game state.

 

Have skilled programmers examined these pros and cons over the years and decided that single-threading is still the way to go? Or are things changing now that all the major platforms have multicore CPUs and decent threading implementations? Android, for example, considers it important not to block the input thread and uses a dedicated rendering thread by default, although native_app_glue apparently favours forwarding input events to the rendering thread.


I think you're talking about a different situation/problem. You're describing a sort of load balancing but I want to decouple rendering from game logic so that the latter isn't affected by the speed of rendering. In the game I'm currently writing the logic is simple with a relatively low CPU demand, and doesn't really justify being split up into threads, and I think even a low-end phone should be able to maintain a consistent 60 ticks per second. But the rendering engine might not be able to keep up with that, and I'd like to prevent the game logic having to deal with being called at varying intervals.

 

As you say, some platforms are restricted to performing all OpenGL calls on a particular thread, or at least strongly advise against spreading your OpenGL calls across threads, so I thought a rendering thread and a logic thread would be a good way to do it. Is this a bad idea full stop because of the need for mutexes? I don't think they can be safely eliminated whenever more than one thread has access to the same object.

 

I think there's another possible approach, which is to use a single thread, but design the game loop to use ticks of fixed duration for AI/logic. Each time round the loop you check the time, and call the tick handler as many times as necessary to keep the average tick rate constant. Or if the frame rate has dropped too much you can allow the gameplay to slow down (like when Quake shows the tortoise icon). The disadvantages of this are that the game logic still has to be separated from the rendering in case you need to tick more than once per frame, and it's more likely to suffer from laggy response to input than separate threads.
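As a sketch of that single-threaded approach (the helper name, the 16 ms tick, and the 5-tick catch-up cap are illustrative assumptions, not from any particular engine), the loop might look like:

```cpp
#include <algorithm>
#include <chrono>

// How many fixed logic ticks to run this frame, given the accumulated lag.
// Capping the count means a long render stall slows gameplay down (the
// Quake "tortoise" case) instead of spiralling into an ever-growing backlog.
int catch_up_ticks(long lag_ms, long dt_ms, int max_ticks) {
    return static_cast<int>(std::min<long>(lag_ms / dt_ms, max_ticks));
}

void tick_logic() { /* AI, physics, input handling for one fixed tick */ }
void render()     { /* draw the current state */ }

void game_loop(bool& running) {
    using clock = std::chrono::steady_clock;
    const long dt_ms = 16;                  // ~60 logic ticks per second
    auto previous = clock::now();
    long lag_ms = 0;

    while (running) {
        auto now = clock::now();
        lag_ms += std::chrono::duration_cast<std::chrono::milliseconds>(now - previous).count();
        previous = now;

        // Tick zero or more times so the average tick rate stays constant.
        int n = catch_up_ticks(lag_ms, dt_ms, 5);
        for (int i = 0; i < n; ++i) tick_logic();
        lag_ms -= n * dt_ms;
        if (n == 5) lag_ms = 0;             // hit the cap: drop the backlog

        render();                           // once per loop iteration
    }
}
```

This is exactly the structure where logic must already be separated from rendering, since `tick_logic` may run zero or several times per rendered frame.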


Thanks for all your input, I've learnt a lot from this thread and by reading the articles mark ds linked to. I think there are still some reasons I could stick to having a separate rendering thread though:

 

  1. I've already invested some time writing it that way.
  2. It mirrors the way Android seems to favour.
  3. I've thought of an easy way to eliminate all those mutexes. I can give every object and the camera etc a pair of matrices and whatever else needs to be accessed in both threads and use them in much the same way as double buffering. Then I only need one global flag to make the rendering thread use one copy while the game thread is using the other, then flip them over at each tick, and the only thread-safe constraint it needs is to be volatile. How does that sound?

[Edit] That's no good after all: the global flag wouldn't protect against one thread flipping the flag and accessing a matrix while the other thread is still in the middle of using it. But I think it could work with a triple-copy setup and/or an extra flag to indicate whether one thread has completely processed one bank and it's safe for the other to flip and reuse it. Such a flag would need a condition variable though.

Edited by realh

I've thought of an easy way to eliminate all those mutexes. I can give every object and the camera etc a pair of matrices and whatever else needs to be accessed in both threads and use them in much the same way as double buffering. Then I only need one global flag to make the rendering thread use one copy while the game thread is using the other, then flip them over at each tick, and the only thread-safe constraint it needs is to be volatile. How does that sound?


(assuming C++ and not C#) There's no need to be volatile. Use atomics to track the pointers, or just a regular mutex, as volatile doesn't really do anything. You should spend some time reading up on atomics, memory barriers, and locking primitives. You'll also want to be careful with having only a flag, as that leaves room for bad race conditions if you aren't very sure about your memory operation ordering (and what the compiler or CPU might do behind your back).
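A minimal sketch of the atomic alternative (assuming C++11; the `Matrices` struct and buffer names are made up for illustration): the release store publishes the back buffer only after its contents are written, and the acquire load pairs with it — guarantees `volatile` does not provide. Note this alone still doesn't stop the sim rewriting a buffer the renderer is mid-way through using.

```cpp
#include <atomic>

struct Matrices { float m[16]; };           // illustrative per-object state

Matrices buffers[2];                        // double-buffered copies
std::atomic<Matrices*> visible{&buffers[0]};

// Sim thread: fill the back buffer, then publish it. The release store
// guarantees the buffer's contents are written before the pointer swap
// becomes visible to other threads.
void sim_publish(int back) {
    buffers[back].m[0] = 1.0f;              // ...update transforms...
    visible.store(&buffers[back], std::memory_order_release);
}

// Render thread: the acquire load pairs with the release store above, so
// everything written before the publish is visible after this returns.
Matrices* render_acquire() {
    return visible.load(std::memory_order_acquire);
}
```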


I've thought of an easy way to eliminate all those mutexes. I can give every object and the camera etc a pair of matrices and whatever else needs to be accessed in both threads and use them in much the same way as double buffering. Then I only need one global flag to make the rendering thread use one copy while the game thread is using the other, then flip them over at each tick, and the only thread-safe constraint it needs is to be volatile. How does that sound?

 

As Sean points out, it doesn't need to be volatile, and volatile is actually a bad thing to use anyway. You don't even need it to be atomic. Give the sim and renderer a working index and use three threads: control, sim and renderer. Control starts off by issuing a sim tick to get the initial set of matrices; when the sim completes, control tells the renderer to render using index 0. Control increases the index for the sim to 1 and tells it to compute the next frame while the renderer is going.

At this point you can completely decouple rendering and simulation. If the renderer finishes before the next simulation tick, you can re-render the same data or wait. If the sim has finished another frame and is working on a third, the renderer can extrapolate from the last frame through the current frame. If the renderer is slow, the sim keeps alternating between two matrices that it fills with the most recent data.

The key point is that the sim and renderer notify the control thread and wait until told what to do; a single mutex/blocking state per game frame is completely viable and won't pose any significant performance issue. If the logic controlling the indexes in use lives in the control thread, you don't need any thread safety on the indexes, since the other threads can't contend for the data at that point.

 

Using shadow data of this form is fairly common. Deciding what to do when one side or the other is slow is really up to you and how you want to deal with it. In general though, I believe all the possible fast/slow combinations require three stored matrices, with only the control thread allowed to change the index which the other threads use.
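The index bookkeeping described above might look something like this (a hypothetical sketch with made-up names; in this scheme only the control thread calls these methods, which is why the indexes themselves need no locking):

```cpp
// Three shared buffers: one being written by the sim, one being drawn by
// the renderer, and one holding the most recently completed frame. All
// three indexes are owned by the control thread -- the sim and renderer
// just receive an index and a go signal.
struct TripleBuffer {
    int writing = 0;    // buffer the sim is currently filling
    int ready   = -1;   // newest completed frame (-1: none yet)
    int reading = -1;   // buffer the renderer is drawing from

    // Control thread, when the sim reports a finished tick: its buffer
    // becomes the newest frame, and the sim is handed whichever buffer is
    // neither the newest frame nor on screen.
    void sim_done() {
        ready = writing;
        for (int i = 0; i < 3; ++i)
            if (i != ready && i != reading) { writing = i; break; }
    }

    // Control thread, when the renderer asks for work: hand it the newest
    // completed frame. If the sim is slow, this re-renders the same frame.
    int render_begin() {
        if (ready >= 0) reading = ready;
        return reading;
    }
};
```

The invariant the rotation maintains is that `writing` never equals `reading`, so the sim can never scribble over a frame mid-draw.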


Thanks for the advice. TBH I was expecting to be told this is a silly idea, just rewrite with a single thread :)

 

About using volatile, I thought that was necessary. I thought it was there to make sure any writes to a variable get written to memory immediately rather than allowing the optimiser to cache intermediate values in a register. The caching can prevent changes propagating to other threads. The sim and rendering threads only need to read the index (of which set of matrices etc to work with) once per tick, so they can copy the volatile master index to a non-volatile thread-local variable to avoid the overhead of volatile. Alternatively the control thread can pass the indexes as arguments to the other threads' tick functions.

 

Atomic needs C++11 doesn't it? I've been conservative and used older-style C++. The main reason is that I'm a Linux geek and want to support MinGW in case I open source parts of my engine/framework, and I haven't been won over by MSVC either. Unfortunately MinGW doesn't support the C++11 threading API for some reason (as of a few months ago anyway), and I suspect that might imply lack of support for atomic too. I've used a wrapper API around pthreads or WinAPI as appropriate.



Volatile is only needed if you're trying to reinvent mutexes yourself -- in most projects, use of the volatile keyword is simply banned: if you try to use it, the team will just assume that your code is buggy, because it most likely is

I need to find a good "volatile considered harmful" article, don't I?


If you end up with something like a mutex per object, which is there to allow multiple threads to randomly share it.... Then you're completely on the wrong track ;-)

The standard model in use by engines these days is not to have a fixed number of threads that each run their own system. Instead you have a pool of threads (sized depending on the number of cores in the system) which are all pretty much equal. The game is then decomposed into a directed-acyclic-graph of small "jobs". A job is a function that reads some inputs and writes to some outputs (no global shared state allowed!). If the output of one job is the input of another, that's a dependency that affects scheduling (the dependent job cannot start until its input actually exists). From there a dependency graph can be constructed, so each job can tell whether it's allowed to start yet. Every thread then runs the same loop, trying to pop jobs from a job-queue and executing them (if allowed).

There's no error-prone mutex locking required, there's no shared state to create potential race-conditions, and it takes full advantage of 1, 2, 4, 8, or more core CPUs.

That's the ideal version. Often engines will still use one of those threads as a "main thread", still in a typical single-threaded style, but anything computationally expensive will be diced up into jobs and pushed into the job queue.
Some platforms have restrictions like you mention, so you may have to restrict all "rendering API jobs" to a particular thread too.
But overall, the "job queue" or "job graph" (or flow-based) approach has become the de facto standard in game engines, rather than mutexes (shared state) or message-passing styles of concurrency.

Also, message-passing should always be a default choice over shared-state too ;-)
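A bare-bones sketch of that job model (hypothetical names; a production queue would use lock-free structures and worker wake-ups rather than a plain mutex): each job counts down its unfinished inputs, and whichever worker finishes the last input pushes the dependent job onto the shared queue.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

// A job is runnable once every input it depends on has been produced.
// 'pending' counts unfinished dependencies; the thread that completes the
// last one pushes the job onto the shared ready-queue.
struct Job {
    std::function<void()> run;
    std::atomic<int> pending{0};
    std::vector<Job*> dependents;   // jobs that consume this job's output
};

class JobQueue {
    std::queue<Job*> ready_;
    std::mutex m_;
public:
    void push(Job* j) { std::lock_guard<std::mutex> g(m_); ready_.push(j); }
    Job* pop() {
        std::lock_guard<std::mutex> g(m_);
        if (ready_.empty()) return nullptr;
        Job* j = ready_.front(); ready_.pop(); return j;
    }
};

// Body of every worker thread: pop a job, run it, release its dependents.
void worker_drain(JobQueue& q) {
    while (Job* j = q.pop()) {
        j->run();
        for (Job* d : j->dependents)
            if (d->pending.fetch_sub(1) == 1)   // we finished its last input
                q.push(d);
    }
}
```

With a dependency a → b expressed as `a.dependents = {&b}; b.pending = 1;`, draining the queue always runs `a` before `b`, no matter which worker picks each job up.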

 

Do you by any chance have any pointers to papers or presentations that go into concrete implementations of these systems? I read tidbits about task-based parallelism here and there, but I've never been able to find a comprehensive overview of such a system and the gritty details of a practical application (e.g. what to do about context switching, how you actually schedule [implicitly/manually, or with some kind of runtime scheduling algorithm], etc.).

 

Also, I could swear I recently saw a presentation about a PS4 game where someone said that overall only 20-30% of the available processor capacity is used at any time, i.e. 70-80% of the potential performance isn't being exploited. But that's probably to be expected; parallelism is a bitch, and I guess being at 30% with 6 cores is still better than 17% (i.e. single-core performance).

Edited by agleed

I don't know if it's considered good form to post just to say this, but that is the most interesting read I've had in a long while, and it made me reconsider how I will approach multi-threading in my future projects (it's a bit too late for the current one). So, thank you all :)


About using volatile, I thought that was necessary. I thought it was there to make sure any writes to a variable get written to memory immediately rather than allowing the optimiser to cache intermediate values in a register.


In some cases, it literally does nothing. In other cases, it doesn't do nearly enough. Most of the time, volatile just causes needless de-optimization without solving any real problems.

You need more complex memory barriers to ensure proper ordering of reads and writes (especially since either the compiler or the CPU itself can do instruction reordering or memory store/fetch reordering). For some types of values or some CPU architectures, you also need explicit atomic instructions just to ensure that other threads don't see half of a variable change (not a problem most of the time on x86, but it can be on other architectures).
 

Atomic needs C++11 doesn't it?


If you want to write purely ISO-conforming C++ with absolutely no extensions or libraries, sure. GCC and Clang support intrinsics and other extensions for atomic values, portable across different OSes and architectures. Many game libraries provide their own platform-neutral APIs for threading and atomics (and some even offer higher-level abstractions that are very useful). You can very easily use threads and atomics portably across Linux, OSX, Android, Windows+MinGW, Windows+VC++, iOS, etc. using pure-C APIs like SDL2.

https://wiki.libsdl.org/APIByCategory#Threads - SDL threading/atomics support, and I'd guess that you're probably already using SDL (or something equivalent) anyway
https://www.threadingbuildingblocks.org/ - Intel Threaded Building Blocks, which is a high-level concurrency library that supports Windows and Linux


In some cases, [volatile] literally does nothing. In other cases, it doesn't do nearly enough. Most of the time, volatile just causes needless de-optimization without solving any real problems.

You need more complex memory barriers to ensure proper ordering of reads and writes (especially since either the compiler or the CPU itself can do instruction reordering or memory store/fetch reordering). For some types of values or some CPU architectures, you also need explicit atomic instructions just to ensure that other threads don't see half of a variable change (not a problem most of the time on x86, but it can be on other architectures).

 

The important point for me was not why I shouldn't use volatile, but why I didn't need to. I didn't understand that function calls act as memory barriers, ensuring that any variables cached in registers are made consistent with memory at that point.

 

I am using SDL, at least for the PC versions, but I've had some problems with the Android version and will probably use Android's API directly for that. It advises strict caution when using its atomic support.


What I read implies that POSIX demands most function calls act as memory barriers. If the called function's definition is in a separate source file the compiler can't know its content and whether it in turn locks or unlocks a mutex so I think the only safe option is to assume it might and act in a thread-safe way.


What I read implies that POSIX demands most function calls act as memory barriers. If the called function's definition is in a separate source file the compiler can't know its content and whether it in turn locks or unlocks a mutex so I think the only safe option is to assume it might and act in a thread-safe way.

yeah, it's very likely that the compiler will act in this way... But the problem is that your CPU hasn't been told about the fence!!
Modern CPUs -- in the never ending quest for finding more performance by adding more transistors/complexity -- have the capability to actually re-order the stream of ASM commands, and re-order their reads/writes of memory. They're allowed to do as they like, as long as the results that they produce are the same in the end. This is only an issue with multi-core software.
In a single threaded program, whether you write "data=5; readyToReadData=true;" or "readyToReadData=true; data=5;" doesn't matter.
But in a multi-threaded program, this ordering really can matter -- the latter bit of code might mean that another thread sees the Boolean as true, and tries to read the "data" variable before the number "5" has actually been written to it.

Even if your compiler is nice enough to not reorganize your code, you still need to tell the CPU that it's not allowed to reorganize these crucial two steps -- the data must reach RAM before the Boolean does.
To ensure the CPU doesn't mess this ordering up, you need to use (expensive) memory barrier instructions at the right places. The compiler won't do this automatically for every function call because it would make your code 10-100x slower! It only emits these instructions where you ask it to - either automatically when you use mutexes/etc, or manually.

Doing this manually is often called "lock free" programming, but it's extremely dangerous and error prone -- you really need to be very familiar with the hardware architecture. Alternatively, proper use of standard synchronization primitives, such as mutexes, also ensures that CPU memory-barrier instructions are placed in all the key places.

If you're doing shared-memory concurrency (where a mutable variable is used by more than one thread), either you use the standard synchronization primitives, or you use your expert knowledge of the CPU and memory architecture at an assembly level to hand-code the CPU-level synchronization instructions (using compiler-specific intrinsics, raw ASM, or the C++11 std lib or similar) and write some tests to be sure -- anything else surely has subtle bugs.
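The data/readyToReadData example from earlier in the post can be sketched with the C++11 std lib just mentioned; the release store and acquire load place exactly the barriers described, so the consumer can never see the flag without also seeing the data.

```cpp
#include <atomic>
#include <thread>

// The producer writes 'data' and then publishes with a release store; the
// consumer's acquire load guarantees that once it sees 'ready == true',
// the earlier write to 'data' is visible too. Without the ordering, the
// CPU or compiler could make the flag visible before the data.
int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 5;                                      // plain write
    ready.store(true, std::memory_order_release);  // publish
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) { }  // spin until published
    return data;   // guaranteed to observe 5
}
```

A minimal usage would be `std::thread t(producer); int v = consumer(); t.join();` -- the spin-wait is for illustration only; a real engine would block on a condition variable or semaphore instead.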
