Ripiz

Multithreading in games


I've implemented a fixed-time-step loop based on L. Spiro's article ([url="http://lspiroengine.com/?p=378"]http://lspiroengine.com/?p=378[/url]); that approach forces you to separate updating and rendering completely. I thought this would be a good place to split them into 2 threads:[list=1]
[*]update thread: input, physics, AI, etc.; doesn't touch rendering at all;
[*]rendering thread: not allowed to write shared data at all, only reads it; takes care of rendering only.
[/list]
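For reference, the single-threaded core of what I have looks roughly like this (a sketch; the step size and names are illustrative, not taken from the article):

[code]
// Minimal fixed-time-step loop (sketch; stubs stand in for real logic).
#include <chrono>

void Update(std::chrono::microseconds) {} // game logic step (stub)
void Render() {}                          // draws last completed state (stub)

void GameLoop() {
    using Clock = std::chrono::steady_clock;
    const std::chrono::microseconds kStep(33333); // ~30 updates per second

    auto prev = Clock::now();
    std::chrono::microseconds accumulator(0);

    for (;;) {
        auto now = Clock::now();
        accumulator += std::chrono::duration_cast<std::chrono::microseconds>(now - prev);
        prev = now;

        // Run as many fixed updates as needed to catch up with real time.
        while (accumulator >= kStep) {
            Update(kStep);
            accumulator -= kStep;
        }

        Render();
    }
}
[/code]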

However, there was another multithreading thread here where some people said this is a bad approach because of the many dependencies involved, and that tasks (e.g. resource loading) are better. Does that still apply even though the threads are quite separate and the dependencies aren't heavy (the GUI seems to have the heaviest dependency)?

Thank you in advance.

This type of decoupling is different from multi-threaded rendering. Both logic and rendering happen on the same thread, so it is implicit that, when rendering, nothing else is happening to the vertex buffers or object states, etc.

Actual multi-threaded rendering is done by keeping a synchronous command buffer of your own design which takes render commands from the game thread and executes them in order on the render thread.
Any resources, such as vertex buffers and index buffers, that need to be modified on the game thread while there is a chance they are being used on the render thread need to be double-buffered, as mentioned by Hodgman.
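A minimal sketch of what I mean (an illustrative toy design, not code from any particular engine; a real engine would record a compact POD command stream rather than std::function objects, but the hand-off pattern is the same):

[code]
// Double-buffered render command queue (illustrative sketch).
#include <condition_variable>
#include <functional>
#include <mutex>
#include <vector>

class RenderQueue {
public:
    // Game thread: record a command for the next frame.
    void Push(std::function<void()> cmd) {
        m_write.push_back(std::move(cmd));
    }

    // Game thread: hand the recorded frame to the render thread.
    void Submit() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_read.empty(); }); // last frame drawn?
        m_read.swap(m_write);
        m_cv.notify_one();
    }

    // Render thread: execute one frame's worth of commands in order.
    void Execute() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_read.empty(); });
        for (auto& cmd : m_read) cmd();   // issue the actual draw calls
        m_read.clear();
        m_cv.notify_one();
    }

private:
    std::vector<std::function<void()>> m_write, m_read;
    std::mutex m_mutex;
    std::condition_variable m_cv;
};
[/code]

While the render thread drains one buffer, the game thread records the next frame into the other, so the two overlap by one frame.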


L. Spiro

Hm, thank you.
That does make sense, but I wonder where I could create tasks to make use of at least 2 cores; resource loading doesn't happen every frame.

[quote name='Ripiz' timestamp='1346864007' post='4976888']
Hm, thank you.
That does make sense, but I wonder where I could create tasks to make use of at least 2 cores; resource loading doesn't happen every frame.
[/quote]

Wherever you have N items to update. For example, 200 particle effects can be split into 4 tasks, where each one updates 50 of them.
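A minimal sketch of that kind of split using std::async (illustrative; a real engine would use a persistent thread pool rather than spawning tasks every frame):

[code]
// Split N particle updates across a fixed number of tasks.
#include <algorithm>
#include <future>
#include <vector>

struct Particle { float x, y, vx, vy; };

void UpdateParticle(Particle& p, float dt) {
    p.x += p.vx * dt;   // placeholder integration step
    p.y += p.vy * dt;
}

void UpdateParticles(std::vector<Particle>& particles, float dt,
                     size_t taskCount = 4) {
    const size_t chunk = (particles.size() + taskCount - 1) / taskCount;
    std::vector<std::future<void>> tasks;

    for (size_t t = 0; t < taskCount; ++t) {
        const size_t begin = t * chunk;
        const size_t end   = std::min(begin + chunk, particles.size());
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (size_t i = begin; i < end; ++i)
                UpdateParticle(particles[i], dt); // chunks never share writes
        }));
    }
    for (auto& task : tasks) task.get();          // join before the next stage
}
[/code]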

[quote]Also input should be on its own thread (on Windows it has to be the main thread since input is sent as window messages) since that is the only reliable way to time-stamp the input values you get. This is necessary for smooth processing of input data.[/quote]

L. Spiro, why don't you make use of [url="http://msdn.microsoft.com/en-us/library/windows/desktop/ms644939(v=vs.85).aspx"]GetMessageTime()[/url] in the window procedure? This seems easier than handling input on a separate thread. Is it not accurate enough? What sort of precision between keystrokes do you typically look for?
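Something like this is what I have in mind (a rough sketch):

[code]
// Sketch: timestamping key presses in the window procedure.
#include <windows.h>

LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam) {
    switch (msg) {
    case WM_KEYDOWN: {
        // Milliseconds since system start, taken when the message was posted.
        LONG postTimeMs = GetMessageTime();
        // e.g. push { wParam, postTimeMs } onto a queue the game thread drains.
        return 0;
    }
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}
[/code]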

I personally prefer to handle all of my times in microseconds rather than milliseconds, but milliseconds are acceptable for input timestamps.
If you also handle the other issues related to using GetMessageTime(), then it is an acceptable alternative.
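For reference, one way to take microsecond timestamps on Windows is the performance counter (a minimal sketch, not code from my engine):

[code]
// Sketch: a microsecond timestamp from the Windows performance counter.
#include <windows.h>

unsigned long long TimeMicros() {
    static LARGE_INTEGER freq = [] {
        LARGE_INTEGER f;
        QueryPerformanceFrequency(&f);
        return f;
    }();
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    // Split the division so the multiply can't overflow 64 bits at long uptimes.
    return (unsigned long long)(now.QuadPart / freq.QuadPart) * 1000000ULL
         + (unsigned long long)(now.QuadPart % freq.QuadPart) * 1000000ULL
           / (unsigned long long)freq.QuadPart;
}
[/code]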


L. Spiro

[quote name='Hodgman' timestamp='1346750504' post='4976351']
[quote name='Ripiz' timestamp='1346748762' post='4976340']rendering thread; not allowed to write shared data at all, only reads it[/quote]Assuming no synchronisation, this is still a race condition. So you need two copies of your shared data: the update thread is reading the previous state from B and writing a new game state to A, while the render thread is also reading the previous game state B and drawing it. When they're both finished, you swap the A/B pointers around and start both threads going again.
There's a Microsoft presentation called "[i]Multicore Programming, Two Years Later[/i]" that explains this technique quite well, but all my links are dead.

N.B. this design only scales up to dual-core CPUs, and is only of any use if your CPU usage is fairly well split between "update" and "render" tasks.
[/quote]

To add to this: Hodgman's method (from my understanding) means that you can potentially stall the update thread while waiting for the render thread to complete its job. Another approach is to use one set of data and two draw buffers. The update thread can continuously modify the data set, and when it's done it checks whether a buffer is ready to be written into (if not, the frame is essentially dropped); if the buffer is ready, it writes the data and marks it as swappable (this flag is what determines whether the buffer is writable or not). The renderer then comes along, swaps the draw buffers if the flag is set, clears the swappable flag, and continues to draw the same data until it sees the swappable flag again; something like the sketch below.

In essence, this doesn't tie your two threads together at all, and you can still do time-stepping code without worrying about potential live-locks.

This of course only works for two threads; any more would require some other method of synchronization, such as thread pools.
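A rough sketch of that swappable-flag scheme (illustrative only; the names and the use of std::atomic are mine):

[code]
// Sketch of the flag-based buffer swap described above.
#include <atomic>

struct FrameData { /* snapshot of everything the renderer needs */ };

static FrameData g_buffers[2];
// g_writeIndex is only touched by whichever thread currently "owns" the
// hand-off, so the acquire/release pairs on g_swappable order its accesses.
static int g_writeIndex = 0;                  // buffer the update thread fills next
static std::atomic<bool> g_swappable(false);  // true: a finished frame is waiting

// Update thread: publish a frame if the last one was consumed, else drop it.
void TryPublishFrame(const FrameData& state) {
    if (!g_swappable.load(std::memory_order_acquire)) {
        g_buffers[g_writeIndex] = state;
        g_swappable.store(true, std::memory_order_release);
    }
}

// Render thread: swap to the new frame if one is ready, else redraw the old one.
const FrameData& AcquireFrame() {
    if (g_swappable.load(std::memory_order_acquire)) {
        g_writeIndex ^= 1;                    // the renderer takes the fresh buffer
        g_swappable.store(false, std::memory_order_release);
    }
    return g_buffers[g_writeIndex ^ 1];
}
[/code]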

[quote name='L. Spiro' timestamp='1346932640' post='4977152']
My recommended setup for Windows would be:
Main thread dedicated to input.
2nd thread runs game logic.
3rd thread does rendering.
4th thread does sound processing (and runs on a lower priority, often sleeping).

These 4 threads are dedicated (always running).
Then the thread pool can be used to send more threads out to handle resource loading etc.[/quote]
Does "Networking" count as input, and must be done on the main thread, or is that a fifth thread?

Further, you say your "setup for [i]Windows[/i]": aside from specific platforms (like game consoles) where the hardware is always known ahead of time and can be optimized for, is there some reason to lay out the threads differently on Linux or Mac OS X?

Networking would be on another thread.
Quad-core systems are fairly standard today so if we assume 4 cores, my recommended layout would be:

Sound takes medium resources and networking low-to-medium, so those two could share a core.
Input takes few resources, and the game thread should only fire once every 30 milliseconds or so, so those two could be on one core.
Then rendering would have its own core.
This leaves one core entirely free for whatever else you need, especially for the thread pool and resource loading.


I singled out Windows just because I guessed it would apply to him or her, but it would work just as well on Linux and Macintosh OS X.


L. Spiro

How are you telling Windows to run a thread on a certain core? Can we assume the OS will automatically distribute your threads across cores for you in an appropriate manner? Does this information show up in the task manager/resource monitor for verification?

Yeah, you can assume that Windows will do a decent job of distributing your threads over the cores.
You [i]can[/i] override its decisions with the [font=courier new,courier,monospace]SetThreadAffinityMask[/font] function, but this can be harmful to performance if you're not as clever as Windows. E.g. maybe the user is encoding a video in the background, which is fully using up one core; Windows knows that, but your game doesn't.
I provide an option in my config file that's off by default; if it's enabled, it specifically binds the threads to cores. Users can turn on this option at their own risk.
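For reference, pinning a thread to one core looks roughly like this (a minimal sketch, with error handling omitted):

[code]
#include <windows.h>

// Sketch: pin the calling thread to core `coreIndex` (0-based).
// Returns the previous affinity mask, or 0 on failure.
DWORD_PTR PinCurrentThreadToCore(unsigned coreIndex) {
    DWORD_PTR mask = DWORD_PTR(1) << coreIndex;
    return SetThreadAffinityMask(GetCurrentThread(), mask);
}
[/code]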

[quote name='Ripiz' timestamp='1346864007' post='4976888']
Hm, thank you.
That does make sense, but I wonder where I could create tasks to make use of at least 2 cores; resource loading doesn't happen every frame.
[/quote]

Just check out the open-source version of Intel Threading Building Blocks. It runs with VC++/GCC in 32/64-bit. You simply have to redistribute tbb.dll with your setup, or compile it statically.

http://threadingbuildingblocks.org/ver.php?fid=188

Use the tbb41_20120718oss_win.zip file and compile it with Visual C/C++; it's compiled and ready in just under 2 minutes.
There are a lot of samples in the package where you can see how task-based multicore operations can be defined.

But be honest: this is nothing if you compare it with OpenCL or CUDA. There is no open-source implementation of those out there right now, though, so you have to make sure the specific closed-source CL driver is present, which unfortunately comes bundled only with the specific video adapter. Bottom line: for CL you need at least an ATI or NVIDIA card installed and the vendor-specific driver for each piece of graphics hardware. My advice: stay with TBB and OpenGL 3.2 and use shaders where possible; it's less of a headache.

Peter

Hmm,
[quote name='L. Spiro' timestamp='1347013235' post='4977552']
Quad-core systems are fairly standard today so if we assume 4 cores, my recommended layout would be:
[/quote]

Back in 2009 Intel was already talking about the possibility of integrating 1024 cores on a single chip. There will be a lot more cores in a relatively short time, so in general any code should be prepared to scale across a huge number of cores. The OpenCL, DirectCompute, and CUDA general-purpose units are also growing every day; modern graphics cards already ship over 2000 shader cores to customers, far beyond the power of today's multicore CPUs. I think that is important. The other trend is cloud-based, server-side rendering for small mobile devices in an interconnect scenario.

[quote]Sound takes medium resources and networking low-to-medium, so those two could share a core.[/quote]

For networking you have to deal with lag, and you cannot blindly run such an important part in an async thread; but synchronizing threads and locks are a first-class performance killer. I think using Intel TBB's task-based approach is easier than dealing with the platform-specific native thread subsystems (POSIX/Mac/Win threads).

Peter

[quote name='pmvstrm' timestamp='1347036888' post='4977719']
But be honest: this is nothing if you compare it with OpenCL or CUDA.[/quote]

And nor is it meant to do the same thing.

Despite what NV might want you to believe, the GPU isn't "BEST AT EVERYTHING!!!!!!", and the CPU still has plenty of work to do when it comes to things you need to get the result back from quickly.

Using the GPU is good when you aren't too worried about the latency involved in getting the data back, but it's not the be-all and end-all of parallel development.

Not if you use CUDA or OpenCL. In that scope any CPU/GPU is only a core. In most cases you are not forced to transfer data from main memory to GPU memory and vice versa; you can write an OpenCL kernel which accesses the video RAM directly without waiting on the CPU cores or system RAM. Anyway: if the future is no longer multicore but general-purpose cores for everything, then in a few years 16 cores per CPU will be standard, like quad-core is today. GPUs are at 500 to 2000 cores today and counting. We will have to deal with thousands of cores in the future, and code which is not limited to a single core is future-ready.

Yes, even if you use CUDA or OpenCL (more so if you use CUDA, as you are locked onto NV hardware and can't even fall back to the CPU).

Not all workloads are going to parallelise well onto a GPU and use it effectively; at that point you need alternative solutions.

GPUs are good at highly parallel workloads where you can get good occupancy and don't need to worry about the latency involved. However, there is a point of diminishing returns when it comes to the occupancy issue: if you issue too little work then the GPU starts to stall out waiting around for memory, and your thousands of cores go to waste. Dispatching less than 64 threads' worth of work on a modern GPU is going to bite you in the efficiency stakes. GPUs also don't deal well with branching: with 64 threads all moving in lock step, you need to ensure branch coherency is good or you'll start wasting time and resources. If you had an 'if...else' block on a GPU where both paths are approximately equal in cost, all it would take for your GPU code to run slowly would be one thread going down the 'else' path, doubling your run time.

CPUs, on the other hand, are very good at low-latency, branchy code where you have a few diverging paths you can take. While OpenCL can deal with this, it isn't always going to be the best way of dealing with the problem, which is where libraries such as TBB and MS's TPL come into play. Expressing a parallel 'for' loop is trivial in TBB/TPL; not so in OpenCL.
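For instance (a sketch against TBB's documented lambda API; header names per the TBB 4.x layout):

[code]
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include <vector>

// Scale every element; TBB splits the range into tasks across the cores.
void ScaleAll(std::vector<float>& data, float s) {
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, data.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                data[i] *= s;   // each task owns a private sub-range
        });
}
[/code]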

As for The Future: right now AMD have the right plan, a mixed approach where a CPU has both conventional cores and ALU arrays (a GPU, in other words), each of which can take the workloads it does well at. The conventional core race has hit a wall; notice how we haven't increased core counts recently? (I bought a 4C/8T i7 back in 2008; just recently I bought a 4C/8T Ivy Bridge i7.) The future is mixed cores, and even with OpenCL around you need to place your workload and pick your API accordingly.

So, once again: they do not do the same thing. The GPU isn't best at everything. Don't depend on increasing core counts to fix performance issues. There is no 'one API to rule them all' when it comes to this kind of work.

N.B. GPU 'cores' aren't the same as CPU cores. GPU manufacturers greatly exaggerate their numbers by multiplying by hardware-thread count and SIMD width.
Using the same definition, some quad-core CPUs would actually have 32 cores...

The behaviour is also completely different. A CPU core that supports 2 hardware threads is called 'hyperthreaded' and gains a small performance boost by switching threads during CPU stalls. GPU cores, on the other hand, are designed to run the same kernel many times with known stall points, saving the state of execution to a huge register bank, which allows for hundreds of hardware threads per core (hence the inflated numbers). The number of "hardware threads" per physical core isn't fixed like on a CPU either; it varies depending on how many registers the current kernel requires to store its state.
It's a very different design, which makes it hard to do an apples-to-apples comparison.

If I wrote software to cleverly round-robin a batch of kernel executions on a CPU core by switching them in/out of L1 in the same way that a GPU does, then I'd again get to multiply that '32' by another constant. Using their definitions, I could make a quad-core CPU actually be a 512-core GPU...

TBB, as mentioned by Peter, has been used by a number of AAA PC games, and they've added functionality to pin (affinitize) certain tasks to a given hardware thread for OpenGL/DX, where you need to use the main thread. There are a few good examples of using TBB for games from Intel (full disclosure: I used to work there); see [url="http://software.intel.com/en-us/vcsource/samples"]http://software.intel.com/en-us/vcsource/samples[/url], in particular the tasking update. Intel's [url="http://software.intel.com/en-us/vcsource/tools/intel-gpa"]GPA Platform Analyzer[/url] is handy for multithreading optimizations as well.

