Multithreading madness

Started by
4 comments, last by Antheus 14 years, 10 months ago
Hello guys! I've finally moved to multithreaded programming... I know this is the present and also the future, but I can't find good resources on HOW engines actually use multithreading. I mean, the granularity of the tasks done by threads can vary a lot, and so can the performance. I found a very good article here: http://www.gamasutra.com/view/feature/3941/sponsored_feature_designing_the_.php?page=6 but even though it talks about multithreading an engine, it doesn't say enough (for me) about the granularity of the threads. Actually, I don't like the multithreading approach based on dividing the work as Rendering - (physics + AI) and stuff like that... I've always thought about more granular kinds of tasks. How do you handle multithreading and its granularity in your engines? Do you double-buffer data? I'm really curious about this! Thanks guys!
---------------------------------------
http://badfoolprototype.blogspot.com/
What do you mean by granularity of tasks?
If you mean what level of tasks can be threaded: you can do as you please, but you will always end up writing some sort of thread controller to handle synchronization and, eventually, thread spawning and despawning.
For the extreme version, look into CSP (Communicating Sequential Processes), a technique that more or less says one entity -> one process (thread); each process then uses an alphabet to communicate its actions via channels to related processes.
There exist CSP libraries for Java/C#/C++/Python and more.
I have worked some with CSP and it is truly an exciting area, and you can do some amazing things with it.
I can also say that implementing an engine using this technique would be somewhat unorthodox but I do see possibilities.

I don't have a good article myself, but before you consider using threads I find that you need to make some more general decisions first.
Are you prepared to spend more time debugging and tracing?
Is your engine large/complex enough to justify threading?
If not you can easily do it single threaded.
Unless you do some explicit processor affinity masking (which should be handled with care), your threads may well end up executing on one core anyway, so the performance gain can be disappointing.

Personally I divide my engine somewhat similar to what your article suggests.
I try to set the local workspace as large as possible to minimize synchronizations (which can kill performance). This leads to increased memory usage but better performance and smoother flow.
There are several reasons but the most important one is that they can work independently at individual frequencies.
The downside is securing shared resources, which can lead to deadlocks, race conditions, etc.
If not done properly you are better off not using threads.
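One common way to get that large local workspace (and one answer to the double-buffering question in the original post) is to keep two copies of shared state: readers see last frame's copy while writers fill the next one, and the only synchronization point is a swap at the frame boundary. A minimal sketch, where `WorldState` is just a placeholder for whatever data the stages share:

```cpp
#include <array>

// Placeholder for whatever per-frame data the engine stages share.
struct WorldState {
    float positions[128];
};

// Double-buffered state: one stage reads last frame's data while another
// writes this frame's, so neither needs a lock in the middle of a frame.
class DoubleBuffer {
public:
    const WorldState& read() const { return buffers_[readIndex_]; }
    WorldState& write() { return buffers_[readIndex_ ^ 1]; }
    // Called once per frame, after all workers have reached the barrier.
    void swap() { readIndex_ ^= 1; }
private:
    std::array<WorldState, 2> buffers_{};  // zero-initialized
    int readIndex_ = 0;
};
```

The memory usage doubles, exactly as described above, but within a frame the reader and writer never contend.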
Quote:Original post by JorenJoestar

it doesn't talk enough (for me) about the granularity of the threads.


Smoke is as granular as it gets (perhaps even too granular, some task splitting parts of source code were commented out last I checked). Did you read the technical articles on Intel's site?

Quote: Actually I don't like the multithreading approach based on dividing the work as Rendering - (physics + ai) and stuff like that...I've always thought about more granular kind of tasks.


It's not that easy. Some problems can be decomposed via vectorization, others into tasks. Smoke combines both approaches.
OK, back the truck up a bit.

a) multithreading things for the hell of it, and not for actual performance reasons, will make your code more complex, more difficult to debug, and generally more of a PITA for no apparent reason.

If you're not really planning on pushing the boundaries of, well, something, then it's really a waste of your time.

b) prototype multithreading using OpenMP, to see if there will likely be any benefit from multithreading things.

Personally, I favor a job approach. A game is a linear set of steps which must be completed in order; the trick is to go as wide as you need to at each step to attain the required level of performance. So, for example, you would run physics as a step, then render-list generation as a step, and so on.

If you're really mad, have a read about CSP (Communicating Sequential Processes). I don't recommend following it entirely, but the idea of dependency-free programming is something worth thinking about.

Oh, on that topic, if you want to write good MT code:
- no globals
- no static functions
- no singletons
- and NO GOD DAMN GLOBALS.

Did I mention globals?
The whole trick is avoiding dependencies: the fewer dependencies you have, the easier your code is to run on multiple cores.
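As a tiny illustration of what removing a dependency buys you (the summing example is mine, just to show the pattern): instead of having every worker fight over one global accumulator, give each worker its own slice of input and its own output slot, and combine at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// With a global accumulator, every worker would contend on the same
// data and need a lock. Giving each job its own slice and its own
// output slot removes the shared dependency entirely.
void sumSlice(const std::vector<int>& data, std::size_t begin,
              std::size_t end, long long& out) {
    long long local = 0;  // purely thread-local working state
    for (std::size_t i = begin; i < end; ++i) local += data[i];
    out = local;          // single write to a slot nobody else touches
}

long long parallelSum(const std::vector<int>& data, unsigned workers) {
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t b = std::min<std::size_t>(w * chunk, data.size());
        std::size_t e = std::min<std::size_t>(b + chunk, data.size());
        pool.emplace_back(sumSlice, std::cref(data), b, e,
                          std::ref(partial[w]));
    }
    for (auto& t : pool) t.join();
    long long total = 0;
    for (long long p : partial) total += p;  // combine after the barrier
    return total;
}
```

No globals, no statics, no locks: every piece of state is either read-only or owned by exactly one thread.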

your never as good as they say you were, never as bad as they say you was.
I use a job system similar to this.


Each entity is a job. At the start of a frame (not 'frame' as in rendering, just a processing 'cycle'), the main thread schedules all the active jobs into one list per core. Each core then starts up and goes through its list of jobs. Rinse and repeat.


Because each entity is updated in parallel, inter-entity communication is not thread-safe, so I use a kind of message passing that I call "deferred function calls".

When an entity calls a function in another entity, the parameters of the function (and a pointer to the entity and a pointer-to-member-function) are automagically pushed into a wait-free queue of 'actions' to be performed next frame.

At the end of the frame, the action queues are merged together (each core has its own queue to eliminate sharing). Each entity (which is also a job) that has actions to be performed is marked as 'active' (i.e. the job will be executed next frame) and the actions are pushed into the entity.

When an entity (job) is executed, it loops through its actions from the previous frame and uses the pointer-to-member-function + stored parameters to actually run the "deferred function".
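The deferred-call mechanism might be sketched as follows. Note the simplifications: I've used `std::function` and a plain per-entity vector for clarity, where the post describes storing the raw pointer-to-member plus parameters in per-core wait-free queues that are merged at end-of-frame.

```cpp
#include <functional>
#include <vector>

// Sketch of "deferred function calls": because entities update in
// parallel, calling into another entity directly is not thread-safe.
// Instead, bind the member-function pointer and arguments now, and
// run the call at the start of the target's next update.
struct Entity {
    int health = 100;
    void damage(int amount) { health -= amount; }

    // Actions queued for this entity, to run next frame.
    std::vector<std::function<void()>> pendingActions;

    void update() {
        for (auto& action : pendingActions) action();  // replay last frame's calls
        pendingActions.clear();
    }
};

// Caller side: instead of (target.*fn)(args...), queue the call.
template <typename T, typename... Args>
void deferCall(T& target, void (T::*fn)(Args...), Args... args) {
    target.pendingActions.push_back(
        [&target, fn, args...] { (target.*fn)(args...); });
}
```

Usage: `deferCall(enemy, &Entity::damage, 30);` records the call; `enemy.health` is untouched until `enemy.update()` runs in the next cycle.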


[EDIT] I haven't had to tweak thread affinities yet; on a quad core, 4 threads get spread over the 4 cores by the OS.

P.S. Never use mutexes - they absolutely kill performance and scalability, and they're a common source of bugs. It's much better to schedule away the sharing problems instead. If you really do need a lock, write your own futex instead of using the OS's mutexes and it will perform much faster.

Also, if you're serious about designing a parallel programming system, then I highly recommend reading all of the Effective Concurrency articles!
Quote:Original post by Hodgman
I use a job system similar to this.


Each entity is a job. At the start of a frame (not 'frame' as in rendering, just a processing 'cycle'), the main thread schedules all the active jobs into one list per core. Each core then starts up and goes through its list of jobs. Rinse and repeat.


Further reading: Actor Model, Stackless Python and Active Object.

