Multiple thread fun, what goes where, how many is too many? +questions

So I have this beast of a computer which set me back about $2000 (Xeon X3350, 4GB RAM, 9800 GTX, 2x600GB HDD in RAID0). Suffice to say, it can run anything. I got it for the sole purpose of being able to develop games with DX10, using all the bells and whistles. NB: the X3350 is a quad core... and I'm using ONE of the cores. This seems like a huge waste of resources on my part, lol. Seeing how much I spent, it's annoying me that I'm using just the one core.

Since I've only been writing my game for a little over a month, and I have a semi-working engine (I'm trying to write it generically while I make the actual game), I haven't bothered with multi-threading, since I didn't want to fall into the trap of writing in useless features from the start. But I'm now at a point where there's a clear place a thread would fit nicely, and over the production of my game/engine the FPS count has slowly been decreasing from over 3000 FPS to now ~700-800 FPS (yes, I know, FPS is a bad measurement, but I want that back up!).

TL;DR version: What should I put into its own thread? I figure updates as one, and rendering as another. What else do people put into their own threads? Sound? Input? Script interpreter (Lua) (considering this one)?

Also, how sensitive is shared data? I know at the very least I'll need some sort of mutex/critical section for when a thread wants to WRITE to something, but what about when they're just reading? Can two or more threads read from the same space at the same time? Note that they really could both read something at exactly the same instant, since I have more than one core.

And finally, what is the effect of having too many locks/critical sections? Is locking/unlocking a heavy process? I'd really like to have Lua in its own thread, since it has a lot of update-related code in it, but I'm not sure how to protect the (possibly another thread) renderer from trying to read an entity's details while Lua is trying to update it. There are so many possible points where it could be editing something, and it seems pointless to just lock everything (might as well just have one thread, lol), but sprinkling locks all over the place would be bad... right? Help! Ty, lol.
For busy threads, the rule of thumb is two per core.
For some tasks, the optimal number of threads will be n_cores - 1.
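A minimal C++ sketch of sizing a worker pool from the core count (assuming std::thread::hardware_concurrency is available to query it; the cores-minus-one figure is just the rule of thumb above):

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>

int main()
{
    // hardware_concurrency() may return 0 if the count is unknown,
    // so fall back to a sane default.
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0)
        cores = 1;

    // Leave one core for the "main" thread that feeds the workers.
    unsigned workers = std::max(1u, cores - 1);

    std::printf("cores: %u, worker threads: %u\n", cores, workers);
}
```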

Quote: Also, how sensitive is shared data?
Very.
Quote:I know at the very least I'll need some sort of Mutex/Critical Section for when a thread wants to WRITE to something, but what about when they're just reading?

Reading is fine. Just make sure that everyone who was writing to the memory has finished.

Quote:Can two or more threads read from the same space at the same time?
Yes.
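A rough sketch of multiple-reader/single-writer locking, assuming something like C++17's std::shared_mutex (a Win32 SRWLock would serve the same role): any number of readers can hold the shared lock at once, while a writer takes it exclusively.

```cpp
#include <shared_mutex>
#include <string>

struct EntityTable
{
    mutable std::shared_mutex mutex;
    std::string playerName = "hero";

    // Any number of threads may hold the shared lock at the same time.
    std::string name() const
    {
        std::shared_lock<std::shared_mutex> lock(mutex);
        return playerName;
    }

    // A writer takes the exclusive lock; readers wait until it is released.
    void rename(const std::string& newName)
    {
        std::unique_lock<std::shared_mutex> lock(mutex);
        playerName = newName;
    }
};
```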

Quote:And finally, what is the effect of having too many locks/critical sections?
The question is mostly irrelevant.

If you end up with a heavily contended resource (a global lock, for example), your application is likely to degrade to single-threaded performance (plus overhead).

Quote:Is locking/unlocking a heavy process?
Very, and worse, it's a redundant process.

For ideal and optimal concurrency, you should replicate everything you can across threads, and only exchange minimal information as needed. Read-only access is fine.

While sharing and synchronizing data usually cannot be avoided, the panacea of concurrent programming is doing things in parallel and not waiting for something. Locks make you wait.

Quote:What should I put into its own thread?
It's not really that simple. Sometimes you split a single operation across n threads, sometimes you run tasks in whichever thread is available, sometimes you have chains of tasks passing data through lock-less queues, etc.

Google for Herb Sutter's articles and lectures, which cover some of the current approaches to general concurrency, as well as the costs and issues involved. Despite everything said above, the answer is far from simple.
Seconded on Sutter's work.

There seem to be two ideas on how to parallelize games. The first is to give each task its own thread (renderer, AI, ...). This is good because it is easy. It is bad because it doesn't scale beyond about five threads.

The second is what I call branch and bound (I haven't seen a term used consistently for games, but the idea is data-level parallelism, whereas the above is called task-level parallelism). In this model the game loop is the same as the single-threaded version, but once a task has been entered, threads branch out and do as much in parallel as possible. It scales much better, since with 50 cores you could do pathfinding for 50 guys at the same time, with a 50x improvement in performance.
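A rough sketch of that data-level approach, with a hypothetical Agent type and findPath function: each thread walks its own chunk of the agent list, so there are no shared writes and no locks.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Agent { /* position, goal, path, ... */ };

// Placeholder for real per-agent work (A*, navmesh query, ...).
void findPath(Agent& agent)
{
    (void)agent;
}

// Split the agent list into one contiguous chunk per thread and let each
// thread path-find its chunk independently.
void findPathsParallel(std::vector<Agent>& agents, unsigned threadCount)
{
    std::vector<std::thread> threads;
    const std::size_t chunk = (agents.size() + threadCount - 1) / threadCount;

    for (unsigned t = 0; t < threadCount; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min(agents.size(), begin + chunk);
        if (begin >= end)
            break;

        threads.emplace_back([&agents, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                findPath(agents[i]);
        });
    }

    for (std::thread& th : threads)
        th.join();
}
```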

Quote:Also, how sensitive is shared data? I know at the very least I'll need some sort of Mutex/Critical Section for when a thread wants to WRITE to something, but what about when they're just reading?

Typically you lock on both read and write, since you don't want to read inconsistent state. If you can guarantee that you are reading what you need without the lock, then don't lock reads.
Quote:
Quote: And finally, what is the effect of having too many locks/critical sections?

The question is mostly irrelevant.


Not really -- the effects of too many locks/critical sections is often "your program deadlocks and stops working". :) Of course the other way ends up with random memory corruption, which is worse!

Generally, the problem of learning to program multi-threading is about as hard as the problem of learning to program. There are idioms and patterns you can use to prevent everything from blowing up and dying a few months/years down the road, when your initial hacks stop scaling and everything falls apart.

Shared state is really expensive. For rendering, you'll want to send data off to the Renderer as rapidly as possible, for the Renderer to own.

A simple method would be an A-B-C message queue.

The state engine updates states A, B, and C, except the one that the render queue is using.

The render queue locks the next state of A, B, or C that isn't being worked on by the state engine.

This results in extremely minimal contention on this resource, but it requires lots of extra memory and lots of copying of memory from one place to another.

Of course, this gets extremely tricky when you end up working on the details.
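Here is one possible sketch of that A-B-C scheme as a triple buffer (names and layout are just illustrative): the updater and renderer each own a slot at all times and only swap indices under a very short lock.

```cpp
#include <mutex>
#include <utility>

// Three copies of the frame state. The updater always writes into a free
// slot, the renderer always reads the most recently published slot, and
// the only shared work is swapping two indices under a very short lock.
template <typename State>
class TripleBuffer
{
public:
    // Called from the update thread: fill the write slot, then publish it.
    State& writeSlot() { return slots[writeIndex]; }

    void publish()
    {
        std::lock_guard<std::mutex> lock(mutex);
        std::swap(writeIndex, readyIndex);
        hasNewState = true;
    }

    // Called from the render thread: grab the latest published slot.
    const State& acquireForRead()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (hasNewState)
        {
            std::swap(readIndex, readyIndex);
            hasNewState = false;
        }
        return slots[readIndex];
    }

private:
    State slots[3];
    int writeIndex = 0;   // being filled by the update thread
    int readyIndex = 1;   // last complete state, waiting for the renderer
    int readIndex  = 2;   // currently being rendered
    std::mutex mutex;
    bool hasNewState = false;
};
```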
Quote:Original post by NotAYakk

Not really -- the effects of too many locks/critical sections is often "your program deadlocks and stops working". :) Of course the other way ends up with random memory corruption, which is worse!


If you hit this limit, you have a problem elsewhere, in the very way the application manages concurrency, and that's the first place to start solving the problem.
Quote:Original post by Antheus
Quote:Original post by NotAYakk

Not really -- the effects of too many locks/critical sections is often "your program deadlocks and stops working". :) Of course the other way ends up with random memory corruption, which is worse!


If you hit this limit, you have a problem elsewhere, in the very way the application manages concurrency, and that's the first place to start solving the problem.
QFT.

If you have any deadlocks or memory corruption, it is due to bugs in your program and not inherent to multithreading.


Writing concurrency-safe code is not much harder than writing exception-safe code. It takes some thought and planning, but the end result is much better for the effort.
I beg to differ. Writing concurrency-safe code is extremely hard.

Writing it to be "good enough", so it deadlocks only rarely, isn't that hard. But writing any decent-sized project with high amounts of concurrency and blocking locks that never deadlocks is very difficult.

The essence of the problem is that every use of every lock interacts with every other lock in the entire program in a potentially deadlock-causing manner. Each function carries with it the implicit lock state of every context in which it can be called, and interacts with every other function that locks any of the same resources.

There are patterns that help avoid this problem -- restrict the use of locks to tightly defined places doing tightly defined things. But that basically reduces locks to a means of implementing some other multi-threaded pattern, like message passing.

It is true you can avoid this -- but much like using global variables, hooking state in far distant parts of the program together is dangerous.
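To make that "every lock interacts with every other lock" point concrete, here's a sketch (names are made up) of two functions that take the same two mutexes in opposite order and can deadlock, plus one way out: acquire both at once (std::scoped_lock here, or equivalently a fixed global lock order).

```cpp
#include <mutex>

std::mutex worldMutex;
std::mutex audioMutex;

// Deadlock-prone: thread A calls updateWorld() while thread B calls
// playFootsteps(). A holds worldMutex and waits for audioMutex, B holds
// audioMutex and waits for worldMutex, and both wait forever.
void updateWorld()
{
    std::lock_guard<std::mutex> w(worldMutex);
    std::lock_guard<std::mutex> a(audioMutex);   // opposite order below
    // ... update entities, trigger sounds ...
}

void playFootsteps()
{
    std::lock_guard<std::mutex> a(audioMutex);
    std::lock_guard<std::mutex> w(worldMutex);
    // ... query entity positions, mix audio ...
}

// One fix: acquire both locks in a single deadlock-free operation.
void playFootstepsSafe()
{
    std::scoped_lock both(worldMutex, audioMutex);
    // ... query entity positions, mix audio ...
}
```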
I haven't actually implemented anything yet, mostly just reading up on it. But how would a "State"-like system work?
What I mean is, basically, duplicating everything... Kind of...

I've pretty much come to the conclusion that if I want it to be truly efficient and really take advantage of the entire machine, and all cores, all the time, I'll need to make it so that there are no "dedicated-type" threads. So once I figure out how it's all going to work (what needs to update where and whatnot), I'll have threads where I send each one a block of data/events to be processed, and it does it (I think someone else mentioned something similar earlier).
The problem with this method, though, is that you'll either need a lot of locks for areas that can be modified by a lot of other areas (in my game Lua *can* (user-defined) have a lot of "events" like OnUpdate, OnClick, OnKey, OnDrawBegin, etc.), or some serious thread-safe coding (atomic ints, etc., likely to add just as heavy an overhead in the end).

So I was thinking, what about essentially holding multiple copies of everything? Yeah, bad on memory, but seriously, I have 4GB of RAM, and that's quickly becoming the norm; RAM really isn't something I'm concerned about.

So on a very simplified scale, take a simple entity (like an NPC): we've done our updates to it, and it's ready to be rendered. The problem is, what happens if the thread that is required to render it attempts to render it while another thread tries to update it again? Why not just duplicate that NPC's data, shove it off to the render thread, and let it delete it when it's done, thus negating any need for a lock (except during the copy, I guess)?
I'm not really sure how heavy the effect of duplicating lots of memory is, but it has to be better than locking it in the render thread.
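Something like this, as a sketch with a made-up Npc type: hold the entity's lock only long enough to copy out a small render snapshot, and the render thread then works purely on its own copy, with no further locking.

```cpp
#include <mutex>

// A trimmed-down copy of just the data the renderer needs.
struct RenderSnapshot
{
    float x, y, z;
    int   meshId;
};

struct Npc
{
    float x = 0, y = 0, z = 0;
    int   meshId = 0;
    std::mutex mutex;

    // Take the lock only for the duration of the copy; the renderer then
    // owns its private snapshot and never touches the live Npc.
    RenderSnapshot snapshot()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return RenderSnapshot{x, y, z, meshId};
    }
};
```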

If this isn't feasible, what other ways could I try?
Quote:Original post by NotAYakk
I beg to differ. Writing concurrency-safe code is extremely hard.


Not really...

Quote:Writing it as "good enough" so it deadlocks only rarely isn't that hard. But writing any decent size project with high amounts of concurrency and blocking locks that never deadlocks is very difficult.


... the idea being that you stop using low-level locks around things and use queues and message passing instead. Unfortunately too many people think concurrency just means sticking a few mutexes around stuff they've decided they want to share.
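For example, a minimal blocking message queue (mutex plus condition variable, purely illustrative) that the game thread pushes into and the render or script thread pops from, so the only lock in sight is around the queue itself:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Producers push messages; the consumer pops them. The lock is held only
// while touching the queue, never while a message is being processed.
template <typename Message>
class MessageQueue
{
public:
    void push(Message msg)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            messages.push_back(std::move(msg));
        }
        ready.notify_one();
    }

    Message pop()
    {
        std::unique_lock<std::mutex> lock(mutex);
        ready.wait(lock, [this] { return !messages.empty(); });
        Message msg = std::move(messages.front());
        messages.pop_front();
        return msg;
    }

private:
    std::deque<Message> messages;
    std::mutex mutex;
    std::condition_variable ready;
};
```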

