when to use concurrency in video games

17 comments, last by Syntac_ 7 years, 11 months ago

Ok, now to clarify: I'm not asking "what is concurrency?" I am asking when to use concurrency when developing a video game. I have been thinking about it and I have only been able to come up with one use case, which is AI. By AI I mean systems that control pathfinding and that tell the AI what to do: when to attack, when to retreat, and so on.

Things like telling one group of AI characters to shoot while the others gather resources, for example.

I would like to know where concurrency is useful in video games ... and why?

Also, could you be kind and differentiate concurrency from parallelism?


To further clarify, do you mean when to use concurrent/parallel algorithms?

If yes, then I'd agree AI is one of the few places you can use it. If not, and you mean concurrency in general, you can use it in loads of places: input, game logic, rendering, audio, file I/O; the sky is the limit, really. There are a lot of things in a game which don't need to happen in a serialized manner and could be run in parallel to one another.

Concurrency and parallelism are different names for basically the same thing. You want to do multiple things at the same time. Various people have their own individual definitions but they are not standardized. Someone might think the work between tasks must be interrelated or independent. Others will want the tasks to be run by a particular pattern. Some might require that two different pieces of hardware are working at the same time versus an operating system running a task scheduler that could potentially put multiple processes on the same hardware in time slices. The difference between concurrency and parallelism is a religious war more than anything.

To get to when and the why you need to understand a lot of details first.

You need to know the machines you are on and their parallel abilities. If you know your machine has three parallel x86 processors, that's important. If you know your machine has 6 SPE processors each with 6 parallel execution units, that is important too. You need to know what the hardware looks like. If you want to do something in parallel that is beyond what your target hardware supports, you will get terrible performance. If you write code expecting 8 hyperthreaded processors and you only have 2 non-hyperthreaded processors, that is not good. If you are working on parallel code for a graphics card, you need to understand what hardware is available there instead. Understand your minimum hardware.

You need to know what problems you are interested in doing in parallel. Parallel searches, parallel sorts, parallel effects, parallel physics, parallel simulations, you need to understand the problem. Understanding the problem also includes understanding the size of the problem. A sort of 100 data elements can be quite different from a sort of 100,000 data elements. Know your problems.

Then you need to break down your processing in your game. You have core work that must get done. You must run your simulation. You must run physics. You also have optional work that is not necessary but can be nice. Giant particle systems, visual effects, audio effects. They are nice to have, but not critical to your game.

You mentioned AI, but most AI systems should be core work. You don't want someone on a high-end machine having a more powerful AI than someone on a low power machine.

Parallel processing for core architecture works great only as far as your minimum spec goes. If your minimum spec is for two processing threads it must work for two processing threads and any extra threads are a bonus. If your minimum spec is 6 SPE threads then it must work for that, instead. The minimum spec is important, and the difference between the minimum spec and the target ideal spec is important.

Once you've identified those, you need to look at algorithms that fit your architecture. Some problems have specific parallel algorithms. Parallel searching, parallel sorting, parallel independent tasks: these have straightforward parallel algorithms that are well-studied. Other concurrency, like asynchronous calls for I/O, is also well-studied. But not everything can be concurrent; some algorithms are inherently serial, like the numerical methods used in physics. Each individual physics interaction is serial, but you can do many of them side by side. You need to partition your problem into pieces that can be studied and eventually mapped to a target number of processors. Then typically you break down communications patterns between them, bundle them into groups of work, and map them to the actual CPUs for work. But that is more how rather than when.

We are closer to figuring out when. With this information you can figure out how to partition, agglomerate, and map the work, and you can estimate the kind of speedup you will get. Problems that can be done in parallel either with a superlinear speedup (like parallel search) or with a roughly linear speedup (like compartmentalized work tasks) are great for parallel processing. If your problem can be readily broken down into either one, and the problems are large enough that the effort makes sense, it is a good candidate.

That leads to asking: how much should it be broken down? You should break down what you must do into at least as many minimum processing nodes as your system has, but you also can break it into more of them if that makes sense for your architecture. If you are developing on a game console where your hardware is fixed you know exactly the number of processors you will be mapping to. You can fine-tune to run on six parallel processors or whatever is in the console box.

That is what makes it tricky for PC development. If your minimum spec is 2 processing threads but you happen to be running on an Intel 8870 with 18 CPUs (36 HT processors), you'll have far more processing power available but no additional core work to do. You can write algorithms that take advantage of the core work you must do, but once that is done, you'll have time for optional work. If you write code that naturally breaks work down into too many tiny pieces it will add extra overhead. If you break the work down into hundreds of tiny pieces and only have two workers you add overhead for each tiny piece. But on the other hand, if you break it down to many pieces and have many workers available they can do the task more quickly.

So after you've figured out all those things, what you must do and what is optional, and you know the hardware you are doing it on, and you know the scope of the problems and what tasks they could be broken down into, you have enough information to make a business decision and answer the question of when to do it. You need to look at the cost of implementing it and the benefit you get from doing it, and also at the costs and benefits of not implementing it. If the effort is large and the benefit is small, it probably is not worth it. If the cost is small but the benefit is big, it makes sense to do it. When the cost is big and the benefit is moderate, you need to decide on your own whether it is worthwhile. Whatever that business decision says is your answer for when to do it or not.

The things I use multithreading for are whatever takes a lot of processor time that I don't want the player to wait on. My current project is a space sim where the player can travel between stars and visit planets. I don't want a "load screen" that the player has to wait on while I generate the planet textures. I use it for other things, but that's the big one.

If all you are doing with your threading is AI, then it shouldn't be a problem. Most systems will have at least a dual core processor now unless it's "grandma's internets computer for the face books"..... (sorry to offend anyone if they roll like that)

The problem with multithreading is knowing how to do it. Whenever you have a thread manipulating a variable or data structure, it makes them unavailable to other threads unless you declare it with "volatile".

I looked up this guy to learn how.


Multithreading is a powerful tool, but it's easy to lock up your code if you do it wrong.

The problem with multithreading is knowing how to do it. Whenever you have a thread manipulating a variable or data structure, it makes them unavailable to other threads unless you declare it with "volatile".

Multithreading is a powerful tool, but it's easy to lock up your code if you do it wrong.

If you use volatile for multi-threading, you're doing it wrong.

The problem with multithreading is knowing how to do it. Whenever you have a thread manipulating a variable or data structure, it makes them unavailable to other threads unless you declare it with "volatile".

That's not how it works. All memory in a process is available to all of its threads. This means any code you write can read and write any part of that memory at the same time.

The volatile keyword is really just a hint that may cause what you write into that memory location to be flushed to RAM from the CPU cache (heavy emphasis on may; volatile is poorly supported).

To share memory between two threads you need to protect it with a mutex (mutual exclusion). A mutex is basically a variable that you use library functions on to lock it and unlock it. You have to be sure to lock it before you touch the shared variable. When your second thread tries to lock the mutex so it can also touch that variable, it will block until the first thread unlocks it.

This is a big bottleneck so make sure you share the bare minimum amount of memory to accomplish the task and don't share very often.

This is usually accomplished by having separate working copies of data and syncing them when other tasks are done as well as other ways.

One programming pattern that I like to use enforces the locking and unlocking of the mutex. You create the main object and make everything protected. Then you have a "lock" method that creates a locked version of that object (another object that contains a reference to the first object and has all of the same methods made public and they simply wrap the methods of the first object). When the locked version is deleted it calls the method on the first object that unlocks the mutex.

I am asking when to use concurrency when developing a video game.

Whenever you've got a function that has to operate on more than one object. Most game systems have more than one object in them... So, everywhere.

Also, could you be kind and differentiate concurrency from parallelism?


First, for (b), unlike previously stated these are _not_ the same thing. Parallelism is when two threads of execution are being evaluated simultaneously. This requires hardware and OS support in the form of a multi-core CPU or multiple CPUs and OS support for exposing these hardware features to applications.

Concurrency is when two threads of execution can both make progress. Basically, concurrency is the equivalent of "is there more than 1 thread" while parallelism is "can more than 1 thread actually run at the same time." Parallelism is for performance while concurrency is for correctness.

This is important because in a multi-tasking OS you can have concurrency without parallelism. For instance, the OS might be _cooperatively_ multi-threaded (meaning the threads have to explicitly say when another thread can run) or the OS might pre-emptively multitask on a single core (e.g., how threads work when you have more threads than you have cores).

For game developers particularly, this distinction is important because naive programmers will make too many threads, not realizing that they're over-saturating the hardware. A multi-threaded algorithm is not magically faster than a single-threaded algorithm; in fact, the overhead of trying to support multiple threads usually makes code _less_ efficient. Multi-threading only helps raw performance when (a) you have at least as many cores as you have busy threads and (b) the cost of transmitting data and signaling threads is swamped by the cost of the work actually performed by the threads.

Basically, don't use the "one thread per task" model. Instead, use a worker thread (aka "job system" or "task system") model, and be sure to only create jobs/tasks for non-trivial work. For instance, if you have a parallel_for algorithm (execute a block of code N times), you should only kick off jobs for other threads if N is greater than the number of threads times some threshold. Otherwise, the cost of creating the data structures and signaling the other threads will take longer than just running a plain loop in the original thread.

I would like to know where concurrency is useful in video games ... and why?


The "why" mostly comes down to being "pretty much all CPUs today have multiple cores and you're throwing all that processing power away if you aren't using threads."

The "where" boils down to several broad categories. One is: anywhere that you have a lot of non-interdependent data to process. For instance, your AI example. If you have 1,000 Goombas in a level and all they have to do is decide whether to walk left or right, you could split up that big job of "update 1,000 Goombas" into two separate jobs of "update 500 Goombas" and run those on two cores, potentially doubling the speed of your AI (in practice you'll never actually scale that perfectly).

Note that I said non-interdependent. _Synchronizing_ threads is very, very expensive: mutexes will stall (stop) a thread and even atomics have a measurable overhead. You want to _avoid synchronization_. If object A can affect object B, it is necessary for object B to receive inputs from object A. If they're being updated on two separate threads, that requires some form of synchronization.

Now, there are _many_ ways to solve these kinds of problems that still work with threading (e.g. multi-pass algorithms), but those solutions are _more difficult_. If you're a hardcore game developer who's set out to be excellent at their craft, difficulty won't scare you away... but if you're just trying to ship a fun game, you have to prioritize which difficult problems you're going to spend your time on. It may be that your AI or rendering or the like is more valuable for your game than a fancy job/task system architecture.

That said, such architectures are popular because there are _many_ places they're useful. Take rendering, for instance: object culling is mostly non-interdependent. You "just" have to iterate through some spatial data structure and test each branch to see if it's within the frustum and not obscured. If you have potentially tens of thousands of objects, splitting those up into some larger clumps and distributing those tests over 4 cores is a huge win.

Physics can likewise gain, though parallelizing physics is _hard_. Granted, most people don't write their own physics engine and just use Havok or Box2d or whatever, so you don't have to worry too much about that.

AI can gain, as you conjectured. UI rendering is often very complicated and there's a lot of "easy" gains there with parallelism (e.g., calculating UI layout while other tasks in the engine are executing).

The "where" question does come down to some other broad categories. Latency is a big one where concurrency can be a huge win even if you lack parallelism (e.g., using threads is good even if you don't have enough cores). Audio is the classic example, as audio buffers have to be processed frequently independent of what else the game is doing. If you have particularly complicated algorithms that cannot easily be broken down into smaller tasks, it might be beneficial to move those off to their own threads in order to avoid pauses or hitches in your game.

Really, though, the best (and much shorter) answer is probably that all CPUs are gaining cores faster than they're gaining hertz, so multi-threading is mandatory to squeeze more performance out of newer hardware. Resource IO is an example here; many OSes force a thread that makes an IO request to stall and wait, so in order to keep your game presenting frames smoothly you have to make those IO requests on a separate thread. This can be a case where concurrency without parallelism still wins (because the extra threads aren't actually _working_ or taking up time on your CPU).

For example, the XBox 360 has three hyper-threaded cores that run at 3.2 GHz, while the XBox One has two four-core APU modules running at 1.7 GHz, yet there's little debate that the XBone is the more powerful hardware. Granted, there are _many_ other factors at play and direct comparisons are very very complicated (buses, memory, instruction set, quality of architecture, etc.) but the "8"-core XBone handily beats the "3"-core XB360 (quotes because both CPUs share various computation resources between threads; the XBone's CPU "shares less" though so I'm considering its two quad-core modules as eight full cores, while the XB360's hyper-threaded cores share far more so I'm not counting each core as two like some marketing material likes to do).

Sean Middleditch – Game Systems Engineer – Join my team!

The problem with multithreading is knowing how to do it. Whenever you have a thread manipulating a variable or data structure, it makes them unavailable to other threads unless you declare it with "volatile".

Multithreading is a powerful tool, but it's easy to lock up your code if you do it wrong.

If you use volatile for multi-threading, you're doing it wrong.

That isn't necessarily true, granted using it everywhere or for everything which is shared between two -or more- threads would be morally reprehensible.

If all you need to do is read some memory, and you don't truly care if SNAFU is the word of the day.. then feel free to use volatile.

I used volatile on certain data members I wanted to draw to screen as text. Health, ammo, score. Single values which get updated *maybe* once a frame, so if a couple frames got fubar'd I didn't care. In practice, every single frame was A-OK. Perhaps my loads weren't heavy enough.

Also, could you be kind and differentiate concurrency from parallelism?

Concurrency is when two threads of execution can both make progress. Basically, concurrency is the equivalent of "is there more than 1 thread" while parallelism is "can more than 1 thread actually run at the same time." Parallelism is for performance while concurrency is for correctness.

....

For instance, if you have a parallel_for algorithm (execute a block of code N times), you should only kick off jobs for other threads if N is greater than the number of threads times some threshold. Otherwise, the cost of creating the data structures and signaling the other threads will take longer than just running a plain loop in the original thread.

First of all, thanks for that (what should have been obvious) explanation; I hadn't considered the intent communicated by each word.

Second. I'd kill a man for some additional elaboration on that last point.

You seem to have a science worked out for evaluating the cost/benefit ratio of a concurrent or parallel algorithm. I got a bit hung up in the middle of that sentence, though: I am not sure what the quantity of threads refers to (i.e., the value you compare N against).

How many threads the job is being broken up into?

Or perhaps, is it the number of threads available to run in parallel?

Concurrency is when two threads of execution can both make progress. Basically, concurrency is the equivalent of "is there more than 1 thread" while parallelism is "can more than 1 thread actually run at the same time." Parallelism is for performance while concurrency is for correctness.

While this can be true, for modern environments it is generally not compelled to be true.

If I launch four threads to do a task and I have four CPUs on my machine, I generally presume the operating system will put one thread on each CPU. But unless I take specific steps to ensure they are mapped that way, the OS has many scheduling options available.

The OS can run them sequentially on a single CPU and swap them out when they do blocking operations, or run them sequentially on multiple CPUs, or it can time-slice them on a single CPU, or time slice them on any number of CPUs. It can move them from one CPU to another CPU mid-execution. It can schedule them among other processes in any way the scheduler chooses.

You can take steps in most systems to require that the OS map a processing thread to a specific CPU, but even then the scheduler won't guarantee they are run simultaneously on different hardware at the same instant except on a small number of real-time operating systems for hardware that doesn't apply here on gamedev.net.

The part about performance and correctness, I feel, is nonsense. Code should always be correct or you are writing bugs.

This topic is closed to new replies.
