Practical Multithreading for Game Performance
Posted March 13 11:16 PM by Richard Fine
In 2006, 70% of the processors Intel sold were dual core. Leigh Davies estimates that by the end of 2007, 90% will have two or more cores.

We're dealing with four components to a chip stack: thread state, the execution unit, cache, and the bus. Basic processors, such as the original Pentium, have one of each. The Hyper-Threading Pentium 4 introduced a second thread state to be used by the idle half of the single execution unit. In 2005 the Pentium D put two entire stacks on one chip; then in 2006 the Core 2 Duo had two thread states and two execution units sharing a single cache and bus. Most recently, the Quad Core chips have four thread states and four execution units sharing two caches and buses. So the hardware is most definitely there.

Crytek have seen upwards of a 60% performance boost from optimizing Crysis for multicore, so it's definitely plausible for real-world applications. Bear in mind that many of the regular optimizations that apply to single-threaded applications don't apply in multithreaded situations. When you have multiple things happening in parallel, the frame time is determined by the length of the critical path, so optimizing anything that doesn't lie on the critical path won't improve the frame time.
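To make the critical path point concrete, here is a minimal sketch that computes the longest chain of dependent tasks in a frame. The Task struct, the function names, and the millisecond costs are all invented for illustration; it also assumes there are enough cores for independent tasks to overlap fully.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One unit of per-frame work. All names and numbers here are hypothetical.
struct Task {
    double ms;                      // measured cost in milliseconds
    std::vector<std::size_t> deps;  // indices of tasks that must finish first
};

// Length of the critical path: the longest chain of dependent tasks.
// Assumes tasks are listed so that every dependency has a lower index
// than the task that depends on it, and that independent tasks can
// always run in parallel.
double critical_path_ms(const std::vector<Task>& tasks) {
    std::vector<double> finish(tasks.size(), 0.0);
    double longest = 0.0;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        double start = 0.0;
        for (std::size_t d : tasks[i].deps) {
            assert(d < i);  // dependency must appear earlier in the list
            start = std::max(start, finish[d]);
        }
        finish[i] = start + tasks[i].ms;
        longest = std::max(longest, finish[i]);
    }
    return longest;
}

// Example frame: input -> physics -> render, with audio running alongside.
std::vector<Task> example_frame(double audio_ms) {
    return {
        {1.0, {}},        // 0: input
        {6.0, {0}},       // 1: physics, needs input
        {audio_ms, {0}},  // 2: audio, needs input, off the critical path
        {8.0, {1}},       // 3: render, needs physics
    };
}
```

With these made-up costs, critical_path_ms(example_frame(4.0)) and critical_path_ms(example_frame(2.0)) both come out at 15 ms: halving the audio cost changes nothing, because audio is not on the input, physics, render chain.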

Leigh's six main points:


  1. Design is critical.

  2. Bad multithreading is worse than no multithreading. If your multiple threads are still executing in a serial manner, then all you've done is add the overhead of context switching into the mix.

  3. Profiling is critical. (Not least because without it you won't be able to identify the critical path).

  4. It's likely to affect the entire architecture of your game.

  5. It needs to be flexible enough to cope with varying hardware - both scaling down to single-core machines, and scaling up to quad-core (and beyond! There was talk of an 80-core processor prototype Intel have in their labs somewhere).

  6. The five key things you need to consider in your design: Efficiency, latency, throughput, concurrency (and Amdahl's law), and bottlenecks.
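Amdahl's law, mentioned in the last point, puts a ceiling on what extra cores can buy you: if a fraction p of the frame can run in parallel on n cores, the speedup is 1 / ((1 - p) + p / n). A small sketch, with a hypothetical function name:

```cpp
#include <cassert>

// Amdahl's law: the best-case speedup when a fraction of the work
// (parallel_fraction, between 0 and 1) is spread perfectly over `cores`
// cores and the rest stays serial.
double amdahl_speedup(double parallel_fraction, unsigned cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

If half the frame is parallelizable, amdahl_speedup(0.5, 4) is 1.6, and no number of cores will ever push it past 2. That is why the serial part of the frame deserves as much attention as the parallel part.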


There are four main 'concurrency design patterns' one might use to shorten the critical path:

  • Data decomposition - splitting a large dataset into pieces and processing each piece in parallel. Frameworks like OpenMP can assist with this. This is usually simple to do, but limited in scope.

  • Functional decomposition or "pipelining" - splitting your application up by subsystem and running different subsystems on different threads. This is one of the conceptually nicer ways to approach things, but suffers from latency issues, i.e. user input might not produce a visible effect for a greater number of frames.

  • The consumer/producer model: one thread produces data that the other thread consumes as soon as it is available, overlapping the processes. This really only works when the dependencies run one way.

  • "Work Crew" - split your application up into 'tasks' and let threads take them on as soon as they become free.
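As a rough illustration of the "Work Crew" pattern, the sketch below lets each worker thread claim the next unclaimed task as soon as it becomes free. It uses standard C++ threads rather than anything shown in the talk, and run_work_crew is a hypothetical name. The tasks are assumed to be independent of each other.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task exactly once, using a crew of threads that each claim
// the next unclaimed task as soon as they become free.
void run_work_crew(const std::vector<std::function<void()>>& tasks,
                   unsigned worker_count = std::thread::hardware_concurrency()) {
    // hardware_concurrency() may return 0 when the core count is unknown.
    worker_count = std::max(1u, worker_count);

    std::atomic<std::size_t> next{0};  // index of the next unclaimed task
    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) return;  // nothing left to claim
            tasks[i]();
        }
    };

    // The calling thread counts as one worker, so a single-core machine
    // spawns no extra threads at all.
    std::vector<std::thread> crew;
    for (unsigned w = 1; w < worker_count; ++w) crew.emplace_back(worker);
    worker();
    for (std::thread& t : crew) t.join();

    assert(next.load() >= tasks.size());  // every task was claimed
}
```

Sizing the crew from the reported core count is one way to get the scaling up and scaling down behaviour asked for in point 5, without changing the task code.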



In general it would seem that you'll want to use a combination of some or all of those to get the best result.
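For the consumer/producer model in particular, a minimal sketch might look like the following. The Channel class and produce_and_consume are hypothetical names, it needs C++17 for std::optional, and a real engine would probably want a bounded or lock-free queue instead.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// A minimal blocking queue connecting one producer to one consumer.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Tells the consumer that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available. Returns an empty optional once
    // the channel has been closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// The calling thread produces 1..count while a second thread sums the
// values as they arrive, so the two activities overlap.
long long produce_and_consume(int count) {
    assert(count >= 0);
    Channel<int> channel;
    long long total = 0;
    std::thread consumer([&] {
        while (auto item = channel.pop()) total += *item;
    });
    for (int i = 1; i <= count; ++i) channel.push(i);
    channel.close();
    consumer.join();  // after this, total is safe to read
    return total;
}
```

Data only flows from producer to consumer here, which is the one-way dependency the pattern relies on.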

Leigh also spoke a bit about the importance of understanding how Windows thread scheduling works, illustrating how a background thread that doesn't yield could inadvertently end up blocking the critical path. He recommended avoiding SetThreadAffinityMask(), as it reduces the freedom the scheduler has to manoeuvre. He also recommended looking into the Win32 API function InitializeCriticalSectionAndSpinCount() for short locks.
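A sketch of how InitializeCriticalSectionAndSpinCount() might be wrapped for a short lock follows. ShortLock and count_with_threads are hypothetical names, and the spin count of 4000 is only an assumed starting point to be tuned by profiling. The non-Windows branch is a plain std::mutex stand-in so the sketch builds elsewhere; it does not reproduce the spinning behaviour.

```cpp
#include <cassert>
#include <thread>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
#include <mutex>
#endif

// Wrapper for a lock that is only ever held for a few instructions.
class ShortLock {
public:
#ifdef _WIN32
    // On a multiprocessor machine a contended thread spins up to
    // spin_count times before falling back to a kernel wait.
    // 4000 is an assumption, not a recommendation from the talk.
    explicit ShortLock(unsigned long spin_count = 4000) {
        InitializeCriticalSectionAndSpinCount(&cs_, spin_count);
    }
    ~ShortLock() { DeleteCriticalSection(&cs_); }
    void lock() { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
#else
    explicit ShortLock(unsigned long = 0) {}
    void lock() { m_.lock(); }
    void unlock() { m_.unlock(); }
#endif
    ShortLock(const ShortLock&) = delete;
    ShortLock& operator=(const ShortLock&) = delete;

private:
#ifdef _WIN32
    CRITICAL_SECTION cs_;
#else
    std::mutex m_;  // stand-in so the sketch builds off Windows
#endif
};

// Each thread bumps a shared counter, holding the lock only briefly.
long count_with_threads(unsigned threads, long increments_each) {
    assert(threads >= 1);
    ShortLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (long i = 0; i < increments_each; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return counter;
}
```

The idea is that for a lock held this briefly, spinning for a moment is often cheaper than putting the waiting thread to sleep and waking it again.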

Lastly, one major piece of profiling advice: a repeatable rolling demo that you can use to benchmark performance and make comparisons is priceless.
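One way such a repeatable demo could be structured is sketched below: a fixed seed drives the simulation so that every run does identical work, and the per-frame wall-clock times are recorded for comparison. simulate_frame is a hypothetical stand-in for a real game update, and the other names are likewise invented.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

struct BenchmarkResult {
    std::vector<double> frame_ms;  // wall-clock cost of each frame
    std::uint64_t checksum = 0;    // fingerprint of the simulated state

    double mean_ms() const {
        return std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0) /
               static_cast<double>(frame_ms.size());
    }
    double worst_ms() const {
        return *std::max_element(frame_ms.begin(), frame_ms.end());
    }
};

// Hypothetical stand-in for one frame of game logic. All randomness
// comes from the seeded generator, so the work is identical every run.
std::uint64_t simulate_frame(std::mt19937& rng, std::uint64_t state) {
    for (int i = 0; i < 10000; ++i) state = state * 31 + rng();
    return state;
}

// Plays the same scripted demo for a fixed number of frames.
BenchmarkResult run_demo(int frames, std::uint32_t seed) {
    assert(frames > 0);
    using clock = std::chrono::steady_clock;
    std::mt19937 rng(seed);
    BenchmarkResult result;
    result.frame_ms.reserve(static_cast<std::size_t>(frames));
    for (int f = 0; f < frames; ++f) {
        auto start = clock::now();
        result.checksum = simulate_frame(rng, result.checksum);
        auto end = clock::now();
        result.frame_ms.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return result;
}
```

Two runs with the same seed should produce the same checksum; if they don't, the demo isn't repeatable and timing comparisons between builds can't be trusted. Comparing mean_ms() and worst_ms() before and after a change then gives a like-for-like measurement.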


 
 