tom_mai78101

How many threads does a game need, on average, regardless of simplicity?

20 posts in this topic

You're only required to have one. However, it may be beneficial to split parts of the game up into separate threads.
[quote name='tom_mai78101' timestamp='1341140665' post='4954489']
If a basic game needs to be able to do multitasking stuff, how many threads does it require? Thanks in advance.
[/quote]

You only need 1 thread. Computers are fast, and multiple tasks executed one after another will appear to run at the same time; the vast majority of games on the market are single-threaded.

I'd strongly recommend against writing multithreaded code if you are just starting out; it adds quite a bit of complexity.

If you are going to go the multithreaded route anyway, I'd recommend looking at thread pools. Don't split the game into one physics thread, one AI thread, one renderer thread, etc. (it doesn't help much at all). Instead, split, for example, the AI into multiple smaller units that can be processed independently, and have a pool of worker threads that processes these work units for you (you can also more easily vary the number of worker threads based on the hardware it runs on). A semi-functional approach to your state updates will make it easier to avoid excessive synchronization (just write to an old state object/structure rather than returning a new one, to avoid unnecessary allocations).
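The worker-pool idea above can be sketched in Python. The entity structure and `update_ai` function here are hypothetical, and CPython's GIL limits true CPU parallelism, so treat this as a sketch of the structure rather than a performance claim:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-entity AI update: each work unit is independent,
# so the pool can process them without any locking.
def update_ai(entity):
    entity["heading"] = (entity["heading"] + entity["turn_rate"]) % 360
    return entity

entities = [{"heading": h, "turn_rate": 5} for h in range(0, 360, 90)]

# The pool size can be tuned to the host hardware (e.g. os.cpu_count()).
with ThreadPoolExecutor(max_workers=4) as pool:
    updated = list(pool.map(update_ai, entities))

print([e["heading"] for e in updated])  # [5, 95, 185, 275]
```

The key design point, as the post says, is that the work units share no mutable state, so the number of workers can vary per machine without changing the game code.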
Physics could be a good candidate for a bit of parallelisation, right? I.e. if you have 1000 objects, process say 1-250 on core 1, 251-500 on core 2, 501-750 on core 3, and 751-1000 on core 4. Or is that impractical when you're not sure which batch will finish first and have to wait for the others (a race condition, perhaps?)
The problem with threads is timing: you have to ensure only one thread can access a piece of data at a time, so you lock the data, and if another thread needs that data it has to wait. You could end up with long delays in each loop, which would make your code even slower than single-threaded code.
Games generally run in a set order: check for user activity, calculate the logic, and then render the scene. If one stage is delayed, the other stages are delayed.

One usual use is to show the user something while the game loads, or to load more portions of the level while the player is still busy in a different area: situations where one thread will not require data another thread may potentially access at any time.

It may sound simple enough to use locks and unlocks to control access to data, but once you get a few layers into the locks and unlocks it's easy to get caught in a jam, where one thread is waiting for an unlock while the thread that holds that lock is in turn waiting on a lock the first thread holds.
My knowledge of threads is pretty limited, so I may be talking rubbish at this stage, and things may have changed since I last read about threads :P
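The jam described above, where each thread holds a lock the other needs, is a classic deadlock. One common remedy, sketched here in Python with hypothetical "physics" and "render" workers, is to always acquire locks in one fixed global order, so the circular wait can never form:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

# Both threads take lock_a before lock_b. With a consistent global
# ordering, neither thread can hold one lock while waiting on a lock
# the other thread holds, so the deadlock cycle cannot form.
def worker(name):
    with lock_a:
        with lock_b:
            results.append(name)

t1 = threading.Thread(target=worker, args=("physics",))
t2 = threading.Thread(target=worker, args=("render",))
t1.start(); t2.start()
t1.join(); t2.join()

print(sorted(results))  # ['physics', 'render']
```

If the two workers instead took the locks in opposite orders, a sufficiently unlucky schedule would hang both threads forever, which is exactly the scenario the post describes.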
[quote name='SimonForsman' timestamp='1341143321' post='4954506']
You only need 1 thread (computers are fast and multiple tasks executed after eachother will appear to run at the same time (The vast majority of games on the market are singlethreaded)).[/quote]
Not true for AAA games on the market. Though most of them don't use more than 2 threads, and only a very small number of games run with 4+.

[quote]Physics could be a good candidate for a bit of parallellisation right?[/quote]
Yes.

[quote]I.e. If you have 1000 objects to process say 1-250 on core 1 251-500 on core 2, 501-750 on core 3, and 751-1000 on core 4. Or is that impractical when you're not sure which batch will finish first and wait for the others (race condition perhaps?)[/quote]
The basic principle works like that. But it's more complex because... what happens if object #150 interacts in any way with object #551? There are two ways to solve it:
1) Use locks so the interactions are resolved safely. (SLOW)
2) Process them on the same thread. You're going to need an algorithm that determines in advance which objects are likely to interact with which objects, so that you can split them into groups of bodies that are isolated from each other and update those groups in different threads. This is how it's usually done (google "Physics Simulation Island").

It's easier for graphics scene-graph culling and render-queue parsing, because there objects don't usually interact with other objects (except for node inheritance, which is a trivial case), which makes it a good candidate for the "parse 1-250 on core 1; 251-500 on core 2" approach.
Unless your game has performance problems which can be effectively resolved by offloading work to additional threads, you should keep it single threaded.

The [url="http://en.wikipedia.org/wiki/KISS_principle"]KISS[/url] principle is generally useful.

[quote name='Matias Goldberg' timestamp='1341171571' post='4954622']
You're gonna need an algorithm that previously determines which objects are likely to interact with which objects
[/quote]

I think that AABBs (Axis-Aligned Bounding Boxes) are typically used to make that determination.

I mention this because googling "Physics Simulation Islands" doesn't seem to provide a treasure trove of information (as one would expect). Googling "Axis Aligned Bounding Box", however, seems like a better starting point.
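For what it's worth, the overlap test on two AABBs comes down to a few comparisons. A minimal sketch, where the `(min_x, min_y, max_x, max_y)` tuple layout is my own assumption:

```python
# Axis-aligned bounding box as (min_x, min_y, max_x, max_y).
def aabbs_overlap(a, b):
    # Two boxes overlap only if their intervals overlap on every axis.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

box1 = (0, 0, 2, 2)
box2 = (1, 1, 3, 3)   # overlaps box1
box3 = (5, 5, 6, 6)   # far away from box1

print(aabbs_overlap(box1, box2))  # True
print(aabbs_overlap(box1, box3))  # False
```

The 3D version just adds a z-axis comparison; the cheapness of this test is why AABBs are the usual broadphase primitive.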
"Island" is a standard term for a group of simulated bodies that don't interact with anything else.

AABBs are a far more general-purpose concept than a physics island, so it stands to reason that there's more reference material talking about them.

Also, AABBs are insufficient for determining island membership. You need to do a shape cast to get correct results in general. There [i]are[/i] spatial partitioning techniques which are useful for culling the casts needed to do island detection, but they are generally nontrivial. (Some are even carefully protected trade secrets.)
OpenGL is perfectly happy with multiple contexts sharing the same handle space via sharelisting. (However, not all flavors of OpenGL currently support sharelisting; OpenGL ES, for example, does not, though it might in the future, according to the folks at Khronos.)

For 'desktop' OpenGL, it is often beneficial to have 'prep' threads and a main render thread that consumes resources prepared by those prep threads. For that to happen, the contexts in each thread must share the same handle space. Prep threads can isolate disk access and other latency involved in preparing resources from the main render thread, which should only ever deal with prepared resources.

A resource pool manager (one that delivers available handles and accepts freed handles), plus sharelisted threads isolated by thread-safe FIFOs, is more than adequate to guarantee collision-free operation without expensive locks. (The only locks required are in the low-duty-cycle updates to FIFO state and pool manager state; the prep threads and main render thread spend most of their duty cycle prepping and rendering, and little time changing FIFO state, which is simply a matter of updating a couple of integers for head and tail.)

Heads-up with sharelisting: make sure all contexts that are going to be sharelisted are requested before any of them is selected as a current OpenGL context, and certainly before any resource handles are allocated among the contexts that will be sharelisted. Sharelisted means 'sharing the same resource handle space', which is required for multithreaded OpenGL.

Heads-up with the design of the thread-safe FIFO: it must use a two-step allocate-and-release model, because there is finite execution time between when a handle is pulled and when it is prepped or consumed. But that is easily done. A FIFO object basically tracks a head and a tail in a circular fashion, with some maximum FIFO size. The FIFO should provide booleans for IsFull, IsEmpty, etc.
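A minimal sketch of that circular head/tail bookkeeping (single-threaded here for clarity; a version shared between threads would need synchronization on top, and the class and method names are my own):

```python
class RingFifo:
    """Fixed-size FIFO tracking a head and a tail circularly."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0   # next slot to pop
        self.tail = 0   # next slot to push
        self.count = 0

    def is_full(self):
        return self.count == self.capacity

    def is_empty(self):
        return self.count == 0

    def push(self, item):
        if self.is_full():
            return False          # caller must handle a full FIFO
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1
        return True

    def pop(self):
        if self.is_empty():
            return None           # caller must handle an empty FIFO
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item

f = RingFifo(2)
f.push("a"); f.push("b")
print(f.is_full())  # True
print(f.pop())      # 'a'
```

Note that push and pop each touch only a couple of integers, which is the "trivial FIFO state change" the post relies on to keep locking cheap.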

You don't have to do any of that when you write a game. It adds complexity. But it provides performance and behavior you can't achieve in a single-threaded model.

as in --------Please wait....scene loading---------...
You can still benefit from more threads than cores; here is an extreme example: a single core/single lane machine.

Is there ever any need for multithreading on such a machine?

Sure; suppose you have a prep thread that is waiting on I/O or some other condition, like a FIFO being less than half full. It can yield to another thread while waiting. If you are single-threaded, then that single thread eats all the latency in your model, and latency can't be hidden at all.

Another approach would be to put that thread in a tight loop, constantly polling to see if the FIFO was less than half full, but avoiding that is the point of using FIFOs. Let's assume it's a filling or prep thread; its job is to make sure that its draining thread never sees an empty FIFO. The prep thread can periodically detect 'half empty', wake up and process some number of resources, and then yield again.

Same for multi-core/multi-lane machines. As long as threads can intelligently yield when possible (when they are waiting for some condition, like a FIFO being less than half full), the time yielded can be used by another thread.

If you have threads that never yield at all, then each thread will try to consume a complete core. (Depending on the OS, it will still occasionally lose bandwidth and be parked, but it will always be crying for attention.) Sometimes that is a necessity, but if such tight polls or freewheeling threads are minimized, a modern OS can manage lots of well-behaved yielding threads far in excess of the number of available cores/lanes.
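A sketch of a well-behaved filler/drainer pair that blocks on a condition variable instead of tight-polling (the half-full hysteresis described above is omitted for brevity, and the capacity and item counts are arbitrary):

```python
import threading
from collections import deque

CAPACITY = 4
TOTAL = 10
fifo = deque()
cond = threading.Condition()
consumed = []

def filler():
    # Blocks inside wait_for() when the FIFO is full, yielding the core
    # instead of spinning in a tight poll.
    for i in range(TOTAL):
        with cond:
            cond.wait_for(lambda: len(fifo) < CAPACITY)
            fifo.append(i)
            cond.notify_all()

def drainer():
    # Blocks when the FIFO is empty; the wait itself is the "yield".
    for _ in range(TOTAL):
        with cond:
            cond.wait_for(lambda: len(fifo) > 0)
            consumed.append(fifo.popleft())
            cond.notify_all()

t1 = threading.Thread(target=filler)
t2 = threading.Thread(target=drainer)
t1.start(); t2.start()
t1.join(); t2.join()

print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Neither thread ever spins: each sleeps inside `wait_for()` until the other's `notify_all()` makes progress possible, which is the cooperative behavior the post argues a modern OS handles well.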
[quote name='ApochPiQ' timestamp='1341185161' post='4954689']
Also, AABBs are insufficient for island membership determinacy.
[/quote]

Could you elaborate on that?

I'm pretty sure that AABBs can be used to determine if two objects could potentially collide (I think this is basically what "broadphase" is all about), and if they should therefore be on the same "island" (for a hypothetical multi-threaded "narrowphase" that would follow).
This is another one of those "no absolute answer" questions and reminds me of your (OP's) earlier "how many books should you read?" thread.

A game needs as many threads as it needs, no more, no less. Sometimes one is enough, sometimes you need more, and sometimes your driver(s) and/or the D3D runtime (if using D3D) will spawn extra threads for you even without asking. If a game is doing work that it might be useful to send off to another thread, and if there are no major design consequences from doing so, it can do so. The author may or may not choose to do so depending on other priorities, performance considerations (it may already be fast enough) and a whole host of different and messy factors. Things are not so neatly compartmentalized as to have any kind of "you should have X threads" rule.
[quote name='Goran Milovanovic' timestamp='1341266280' post='4955074']
I'm pretty sure that AABBs can be used to determine if two objects could potentially collide (I think this is basically what "broadphase" is all about), and if they should therefore be on the same "island" (for a hypothetical multi-threaded "narrowphase" that would follow).
[/quote]
I think he's referring to continuous collision detection, which is an advanced topic. The AABB in that case is not enough, because you would need to know whether the body may collide with any other body along all of its movement during a single frame, which can be pretty large.

Anyway, he may also be referring to the fact that, although AABBs are used for determining islands, if A interacts with B, and B interacts with C, then A, B, and C need to be put in the same island.
Additionally, you need a spatial algorithm so that islands are lazily updated. If you regenerate every island on each frame, it's going to be slow and adds a lot of single-threaded overhead when you're trying to run things concurrently.
A spatial structure will only update the islands when the system flags that a body is potentially about to leave an island, or about to cause two islands to be merged. Meanwhile, the islands don't need to be recreated every frame, so only the concurrent side of the code runs (speeding things up).

Recreating the islands every frame using brute force is slow; in the worst case every object interacts with every other object, which has O(n^2) complexity. I guess that's why ApochPiQ said AABBs aren't enough.
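The transitive rule (A touches B, B touches C, so A, B, and C share an island) is what a union-find pass over the broadphase pairs computes. A minimal sketch, where the interaction pairs are assumed to come from an AABB broadphase:

```python
# Union-find: bodies connected by any chain of interactions end up
# in the same island, e.g. pairs (0,1) and (1,2) merge 0, 1 and 2.
def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path compression
        i = parent[i]
    return i

def build_islands(pairs, n):
    parent = list(range(n))
    for a, b in pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb            # merge the two islands
    islands = {}
    for i in range(n):
        islands.setdefault(find(parent, i), []).append(i)
    return list(islands.values())

# Hypothetical broadphase output: 0-1 and 1-2 interact; body 3 is alone.
print(build_islands([(0, 1), (1, 2)], 4))  # [[0, 1, 2], [3]]
```

Each resulting island can then be stepped on its own worker thread, since by construction no body in one island interacts with a body in another.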
[quote name='Goran Milovanovic' timestamp='1341182473' post='4954673']I mention this because googling "Physics Simulation Islands" doesn't seem to provide a treasure trove of information (as one would expect). However, googling for "Axis Aligned Bounding Box" ... Seems like a better starting point.
[/quote]
Sorry for that. Download the [url="http://www.havok.com/try-havok"]Havok SDK[/url] (it's free for developing games), go straight to the 1000+ page manual, and look for "simulation island".
[quote name='Hodgman' timestamp='1341293591' post='4955170']
[quote name='FGBartlett' timestamp='1341197175' post='4954738']
You can still benefit from more threads than cores; here is an extreme example: a single core/single lane machine.

Is there ever any [b]need[/b] for multithreading on such a machine?[/quote]No, there isn't very often a [b]need[/b] for it -- it may be one way to solve the problem, but I assure you there's probably a single-threaded solution as well.
[quote]But it provides performance and behavior you can't achieve in a single threaded model.
as in --------Please wait....scene loading---------...[/quote]For example, background loading of scenes definitely is possible in a single-threaded game...
[quote]Sure; suppose you have a prep thread that is waiting on I/O or some other condition, like a FIFO being less than half full. It can yield to another thread while waiting. If you are single threaded, then that single thread eats all the latency in your model, and latency can't be hidden at all.[/quote]Threading should not be used for I/O-bound tasks (only processing-bound tasks). The OS is already designed to handle I/O-bound tasks asynchronously without using extra threads -- use the OS's functionality instead of reinventing it with extra threads.
If a single-threaded program wanted something to occur when a FIFO was half full, it would likely use a callback triggered by the push function.
[quote]If you have threads that never yield at all, then each thread will try to consume a complete core.[/quote]But if you're writing a [i]high-performance real-time system[/i] ([i]such as a modern game engine[/i]), then you want a small number of threads that hardly ever yield, to get predictable performance. Yielding a thread on Windows is totally unpredictable in length, with your only guarantee being that it's unlikely to be longer than 5 seconds ([i]although yes: that case shouldn't occur unless you're massively oversubscribed[/i])...
[quote]Heads-up with the design of the thread-safe FIFO; it must use a two-step allocate and release model, because there is finite execution time between when a handle is pulled and when it is prepped or consumed. But that is easily done.[/quote]What's this "two-step allocate and release model"? Is it specific to your OpenGL resource FIFO, or are you talking about thread-shared FIFOs in general?
[quote]A FIFO object is basically tracking a head and a tail in a circular fashion, with some maximum FIFO size. The FIFO should provide booleans for IsFull, IsEmpty, etc.[/quote]
Functions like IsFull and IsEmpty are nonsensical in the context of a shared structure like a multiple-producer/multiple-consumer FIFO -- they can never be correct. It makes sense for Push and Pop to be able to fail ([i]if the queue was full or empty[/i]), but simply querying some state of the structure, such as IsFull, is useless, because by the time you've obtained your return value it may well be incorrect, and any branch you make on that value is very dubious.
[/quote]

Re: FIFO half full. There is no need for this to be precise; the point is that if you respond to the event FIFO EMPTY, it is too late. The assumption is that 'about half a FIFO' is enough latency margin to respond and keep the FIFO not empty, which is all the draining thread cares about. You never want to starve the draining thread, or you get a stall.

Re: two-step FIFO accessors. They can always be used safely; the single-step STL variants can sometimes be used, with more care, and if your resource model changes you have to review each usage. So I always use the two-step scheme. In other words, if the two-step variants are used, they are always thread safe; if the single-step STL variants are used, they are sometimes thread safe. It depends on your resource model, and on how and whether resources are cycled, reused or shared.

[i]If you are using a pooled resource model[/i] (resources cycled/reused), where both the filling thread and the draining thread take finite time between accessing a FIFO member and doing something with it, you don't want the act of accessing the FIFO member to change the state of the FIFO; you want the act of releasing that FIFO member to change the state of the FIFO. The two-step FIFO usage makes that explicit. (It can usually be handled implicitly without the two-step process, and that will work as long as it works; the explicit two-step process makes it harder to get this wrong.) It can be, and usually is, arranged so that every filling thread is done with the resource before touching the FIFO state, and the same considerations apply to the draining thread.

The assumption is that only the filling thread adds to the FIFO and only the draining thread pulls from it. So the filling thread a] gets the next FIFO slot, b] does something with the associated resource (even if just to define it), and c] releases the FIFO slot, changing the FIFO state. Ditto the draining thread. Otherwise, if the act of accessing the FIFO slot simultaneously changed the FIFO state, you could have a condition where an EMPTY FIFO immediately changes state to NOT EMPTY, the waiting draining thread accesses the resource in process, and the filling thread is not finished prepping the resource.
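A hedged sketch of that two-step filling scheme (single producer/single consumer; the class, names, and counter layout are my own, and the draining side is shown single-step for brevity):

```python
class TwoStepFifo:
    """SPSC ring where pushing is split into acquire (reserve a slot)
    and publish (make it visible), so the drainer can never see a
    half-prepped resource."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.reserved = 0    # slots handed out to the filler
        self.published = 0   # slots made visible to the drainer
        self.consumed = 0    # slots released by the drainer

    def acquire_slot(self):
        # Step a]: reserve a slot. The state seen by the drainer
        # (published) does not change yet.
        if self.reserved - self.consumed == self.capacity:
            return None  # full
        idx = self.reserved % self.capacity
        self.reserved += 1
        return idx

    def publish_slot(self):
        # Step c]: the resource is fully prepped; now expose it.
        self.published += 1

    def pop(self):
        if self.consumed == self.published:
            return None  # empty; reserved-but-unpublished slots stay hidden
        item = self.buf[self.consumed % self.capacity]
        self.consumed += 1
        return item

f = TwoStepFifo(4)
slot = f.acquire_slot()
f.buf[slot] = "texture-handle"   # step b]: prep happens between the two steps
print(f.pop())                   # None: reserved but not yet published
f.publish_slot()
print(f.pop())                   # 'texture-handle'
```

The first `pop()` returning nothing is exactly the point: until the filler explicitly publishes, the drainer sees an empty FIFO, which prevents the EMPTY-to-NOT-EMPTY race described above.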

The above assumes that some pool of resources is being reused/cycled, not continuously allocated anew by the filling thread. In the latter case, a filling thread could create the resource completely and then change the state of the FIFO, and no harm done; the draining thread can do likewise, because the scheme is not re-using resources (like a buffer, FBO, VBO, or texture handle) but continuously allocating and destroying them. But if you switch to a rotating pool of resources (to eliminate the constant creation/destruction), then you might run into this need for 'two-step' FIFOs.

These two-step thread-safe FIFOs are actually pretty simple. They track a few integers (head, tail, max, count) and maybe maintain a few booleans (FIFO empty, FIFO full, FIFO half full, etc.) to drive events.

The models I usually use don't push resource objects themselves through the FIFOs; the resources come from a pool and aren't copied, but recycled. A pool manager allocates a new resource if a free one of the requested flavor/size isn't available. When the first filling thread in a chain needs a resource, it requests it from the pool manager. When the last draining thread in a chain is done with a resource, it returns it to the pool manager. Most of the time, the pool manager is simply changing a state value on the resource, marking it served or available. I also usually wrap the resource with some pool attributes so I can trace which thread is currently banging on a particular resource.

What the FIFOs carry are handles to resources. Because the locks are on the FIFOs and not the resources, because thread access to the FIFOs is a low percentage of total thread bandwidth (the beginning and end of each thread's process cycle), and because modifications to the FIFO are trivial, you really have to work at it to serialize your threads using this model. The FIFO itself isolates resources between the threads. The locks are not on the resources but on the objects that isolate them; in that sense the resources themselves are never locked, yet they are isolated just the same. The things that are locked are seldom accessed (on a percentage basis), so no thread is ever left starved by a resource conflict while a lengthy process runs.

A draining thread only cares about FIFO EMPTY. If its source FIFO is not EMPTY, it can process; if it is EMPTY, only its filling thread can change that state. Same with the filling thread: it only cares about FIFO FULL. If its output FIFO is not FULL, it can process; if it is FULL, only its draining thread can change that state.

In most chained thread models, the gating thread is the final compositor thread that pulls resources from its source FIFOs at whatever frame rate is required. The filling threads that service the FIFOs either keep up or the render thread is starved and frames are skipped. But that is always the case, even in a single-threaded model; the output is usually driven by some target frame rate.

This gets hairy in real-time video processing models, where there is both an input contract (sampled video input frames) and an output contract (output video frames). This is a 'two clock' gating problem, even if it is the same clock, and in this case the function of all those FIFOs in the process is to provide compliance for latency. This is why video processors almost always have a processing delay: there is significant compliance in the streaming model to accommodate latency. Video processors must lag the output video accordingly; you can always tell when they don't, because the audio will be ahead of the video by the amount of the processing lag.

This is why, in the old days with DirectShow, you always saw canned examples of video-to-disk and disk-to-video, but never video-to-video: it was a largely rigid model tolerant of only one gating sink or gating source. You can always cache disk access ahead, smooth it out, and gate video out, or gate video in and cache it to disk, but gating both video in and video out in a streaming model is a challenge, and FIFOs as caches are critical elements there. Nothing like that happens in a single-threaded model. If live video input (not from storage, but live) ever becomes a significant part of game processing, this will become apparent. Games might tolerate glitched/missed/stuttering frames in playback, but broadcasters definitely do not.

I also disagree about multithreading I/O, even async. If your process spends any time at all waiting for an async I/O to complete, that waiting time can be put to better use. I just completed a project that demanded the highest possible throughput to disk; it was a streaming model that was not only async but multithreaded. In practical terms, it was the difference between a disk access light that blinked and one that was solid, running at full bus bandwidth. (This also required aligning write sizes with multiples of the target sector size, which is unrelated to threading.) While that is occurring, a 400 Hz streaming waterfall plot is being handled as well, as part of the same streaming chain: the GUI thread isn't updated at 400 Hz, but the FBOs are updated in the background at 400 Hz and presented to the foreground GUI at a reduced frame rate as a freezable/scrollable waterfall plot, without interrupting the continuous stream to disk. I don't think anything close to that is possible in a single-threaded model; not only would you have maybe 1/8th the available bandwidth, but any time spent waiting for async I/O to complete would be lost.
Sorry, I meant lag the audio: the audio input must be delayed to match the video lag. These delays are easy to build: continuously running samplers feed a ring buffer, with an offset driving the output sampling. The offset determines the audio delay.