AoS    935

So I am working on various strategy game stuff and I tried to figure out how to use multiple cores to speed up my game. From what I gathered the best way to use threads is to run a single main thread and pass off tasks from the same subsystem rather than splitting off each subsystem into a thread.

Like I run this:

Simulation

AI

Physics

Graphics

Sequentially and get my performance from having each step use as many cores as the system has available. So during the AI step I would assign a main AI thread that would make high level decisions and then spin off low level tasks in a hierarchy to separate threads/cores.

I was given to understand that this method scales better than separate threads for subsystems. For instance, a particle engine has a ton of tasks it can run separately, and you can pretty much split them up arbitrarily. So you look at the number of cores available and split the tasks more or less evenly among them, and if you have more cores to use you just split the tasks so each core does less.
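A minimal sketch of that even split, assuming C++11 `std::thread`. The names `Particle`, `UpdateRange` and `UpdateParticlesParallel` are made up for illustration and do not come from any particular engine:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Particle { float pos; float vel; }; // hypothetical 1D particle

// Advance the particles in [begin, end). Each thread gets a disjoint slice,
// so no locking is needed.
void UpdateRange(std::vector<Particle>& particles, std::size_t begin, std::size_t end, float dt)
{
    assert(begin <= end && end <= particles.size());
    for (std::size_t i = begin; i < end; ++i)
        particles[i].pos += particles[i].vel * dt;
}

// Split the array into roughly equal contiguous chunks, one per thread.
void UpdateParticlesParallel(std::vector<Particle>& particles, float dt, unsigned numThreads)
{
    numThreads = std::max(1u, numThreads); // hardware_concurrency() may report 0
    const std::size_t chunk = (particles.size() + numThreads - 1) / numThreads; // round up
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        const std::size_t begin = std::min(particles.size(), t * chunk);
        const std::size_t end = std::min(particles.size(), begin + chunk);
        if (begin == end) break; // more threads than work
        workers.emplace_back(UpdateRange, std::ref(particles), begin, end, dt);
    }
    for (std::thread& w : workers) w.join(); // the step ends when every slice is done
}
```

It would be called as `UpdateParticlesParallel(particles, dt, std::thread::hardware_concurrency());`. Creating threads every frame like this is only for brevity; keeping a fixed set of worker threads alive would likely be cheaper.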

Is that the best way to use multi-threading? Do I have something wrong? I understand that advice can really only be general since you don't have detailed information on my project.

Hodgman    51324
Yeah that's the way that I utilize multiple cores, and it's the same method that the last few console games that I've worked on have used too.

To really simplify how the "job" type model works, a simple API might look something like this:
struct Job { void (*function)(void*); void* argument; };
JobHandle PushJob( Job* );      // queue up a job for execution
void WaitForJob( JobHandle );   // pause calling thread until a specific job has been completed
bool RunJob();                  // pick a job from the queue and run it, or just return false if the queue is empty

The main thread adds jobs to the queue using PushJob. If the main thread wants to access data that's generated by a job, then it must call WaitForJob (with the handle it got from PushJob) to ensure that the job has been executed before using those results.
Your worker threads simply call RunJob over and over again, trying to perform the work that's being queued up by the main thread.
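To make that concrete, here is one possible way to back such an API with a mutex and a condition variable. This is only a sketch under assumptions: the `JobQueue` class, the counter-based `JobHandle`, the `WorkerLoop` function and its `quit` flag are all hypothetical, and a real engine would likely use something more elaborate (e.g. lock-free queues, or letting the waiting thread run jobs instead of sleeping).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_set>

struct Job { void (*function)(void*); void* argument; };
using JobHandle = std::uint64_t; // assumption: a handle is just a running counter

class JobQueue
{
public:
    JobHandle PushJob(const Job& job) // queue up a job for execution
    {
        assert(job.function != nullptr);
        std::lock_guard<std::mutex> lock(m_mutex);
        const JobHandle handle = m_nextHandle++;
        m_pending.push_back({handle, job});
        m_unfinished.insert(handle);
        return handle;
    }

    bool RunJob() // run one queued job, or return false if the queue is empty
    {
        Entry entry;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_pending.empty()) return false;
            entry = m_pending.front();
            m_pending.pop_front();
        }
        entry.job.function(entry.job.argument); // executed outside the lock
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_unfinished.erase(entry.handle);
        }
        m_finished.notify_all();
        return true;
    }

    void WaitForJob(JobHandle handle) // block until that job has completed
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_finished.wait(lock, [&] { return m_unfinished.count(handle) == 0; });
    }

private:
    struct Entry { JobHandle handle; Job job; };
    std::mutex m_mutex;
    std::condition_variable m_finished;
    std::deque<Entry> m_pending;
    std::unordered_set<JobHandle> m_unfinished;
    JobHandle m_nextHandle = 0;
};

// What each worker thread runs: call RunJob over and over until told to stop.
void WorkerLoop(JobQueue& queue, const std::atomic<bool>& quit)
{
    while (!quit.load())
        if (!queue.RunJob()) std::this_thread::yield();
}
```

Note that in this sketch `WaitForJob` only sleeps, so at least one worker thread must be running `WorkerLoop` or the main thread would wait forever.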

I also use another similar pattern, where I'll have all of my threads call a single function, but passing in a different thread ID, which is used to select a different range of the data to work on, e.g.
inline void DistributeTask( uint workerIndex, uint numWorkers, uint items, uint* begin, uint* end )
{
    uint perWorker = items / numWorkers;
    *begin = perWorker * workerIndex;
    *end = (workerIndex==numWorkers-1)
         ? items                      //ensure last thread covers whole range
         : *begin + perWorker;
    *begin = perWorker ? *begin : min(workerIndex, (u32)items);   //special case,
    *end   = perWorker ? *end   : min(workerIndex+1, (u32)items); //less items than workers
}

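void UpdateWidgets( uint threadIdx, uint numThreads )
{
    uint begin, end;
    //select this thread's slice; m_numWidgets is an assumed name for the item count
    DistributeTask( threadIdx, numThreads, m_numWidgets, &begin, &end );
    for( uint i=begin; i!=end; ++i )
    {
        m_widgets[i].Update();
    }
}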
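One thing worth noting about the split above: the last worker picks up the whole remainder, so with 10 items and 4 workers the ranges come out as [0,2), [2,4), [4,6), [6,10). If that imbalance matters, a possible variant (not the code from this post; `DistributeTaskBalanced` is a made-up name) hands the leftover items out one each to the first few workers:

```cpp
#include <cassert>

// Variant sketch: the first (items % numWorkers) workers get one extra item,
// so no worker has more than one item more than any other.
inline void DistributeTaskBalanced(unsigned workerIndex, unsigned numWorkers, unsigned items,
                                   unsigned* begin, unsigned* end)
{
    assert(numWorkers > 0 && workerIndex < numWorkers);
    const unsigned perWorker = items / numWorkers;
    const unsigned remainder = items % numWorkers;
    // Number of extra items already handed to workers before this one.
    const unsigned extraBefore = workerIndex < remainder ? workerIndex : remainder;
    *begin = perWorker * workerIndex + extraBefore;
    *end = *begin + perWorker + (workerIndex < remainder ? 1u : 0u);
}
```

With 10 items and 4 workers this gives [0,3), [3,6), [6,8), [8,10), and the case of fewer items than workers needs no special handling because the extra workers simply get empty ranges.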

KnolanCross    1974

The problem with using multiple threads for AI is that it is very hard to avoid race conditions.

I will use pathfinding as an example because it is the most common use of AI I have seen on this forum. Say you are running A* on your graph: if one thread has processed a node and another thread changes that same node, the result won't be reliable. Worse things may happen, such as referencing a node that no longer exists, resulting in a segmentation fault.

Using a task system is a good way to have your system execute several independent parts of the code at once (for instance, physics simulation, rendering, sound and particle effects math), but when it comes to AI it is not that simple. Of course, this depends a lot on how your system works; if, for instance, your game doesn't allow path blocking (common in tower defense games), you may run pathfinding algorithms at the same time.

So, this is the best way to use threads in a game, but it is hardly likely to be the best one for AI. Also, always keep in mind that you must not introduce race conditions.

AoS    935

Okay thanks. Just wanted to make sure I was starting at the right place. It would be horrible to do a lot of work and learn a bunch of stuff only to later realize I picked a poor method.

Now I just have to really dig into implementation.

I do have another question. How much benefit is possible here? Say, single-threaded vs. multi-threaded with 2/3/4 cores available? Obviously the specific implementation affects this value, but on average how much performance gain is there for an RTS? Assuming you would have some way to know that.

I did google for info about multi-threaded RTS but there are only a couple of relevant results.

I know that most open source RTS games aren't multi-threaded, although I believe the Spring devs recently had a HUGE fight over making it multi-core compatible. But Glest stuff, 0AD, most other people don't seem to be doing it. So there isn't a lot of help I can get there.

Do you know of any good resources to help with this stuff?

AoS    935

I was thinking there might be a problem with AI. But I was thinking that, AI-wise, maybe I could use it for decision making rather than pathfinding. And I am planning to add more particle stuff, so I was hoping it could add speedups there.

I am going to have quite a complicated AI I think decision wise, but I am trying to come up with other places it might improve stuff.

Also I was thinking of adding some less traditional stuff to the game which might help. Basically part of my plans involves more than one map, although probably I'll represent maps you aren't using as abstract rather than running all the code. I wanted to make a game where you manage multiple villages/cities that evolve into a kingdom with trade and technology exchange, but still having the actual 3D map, as opposed to something like Total War or Paradox games. I was thinking that even the abstract representation of a lot of cities might be heavy duty enough cpu wise to benefit from multicore.

wodinoneeye    1689

One thing to consider about parallelization is the data locking needed to keep the data coherent (i.e. state changing because different threads are making smaller changes, and the data isn't independent of what other threads also do).

Lots of data locks for fine-grained processing often have a lot of overhead and can result in inefficiencies, to the point that they can be beaten by a single thread doing the entire pass of processing (with more cores and smaller individual cores this is becoming less so).

Independent processing which simultaneously acts on different sets of data can be assigned to different cores with little need for any locking mechanisms.

E.g. pathfinding working off a locked-down current state of the map (many searches may run simultaneously and independently because the data is read-only),

and at the same time planners for next/future actions can be working from each object's context, again simultaneously.

The outputs of these are directives for future actions, which can go through a single atomic queuing process, but the locks are used so infrequently that they don't amount to much overhead (and cause little waiting that would stall other threads).
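A rough sketch of that arrangement, under assumptions: `MapSnapshot`, `Directive`, `DirectiveQueue` and `PlanUnits` are hypothetical names, the "planning" is a trivial stand-in for real pathfinding, and a plain mutex is just one way to get the single queuing point. The workers only read the frozen map, so the queue is the only place a lock is taken:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct MapSnapshot { std::vector<int> cost; };       // frozen, read-only map state
struct Directive { std::size_t unit; int action; };  // output: what a unit should do next

class DirectiveQueue // the single locked point where results are collected
{
public:
    void Push(const Directive& d)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(d);
    }
    std::vector<Directive> TakeAll() // called later, e.g. by the main thread
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<Directive> out;
        out.swap(m_items);
        return out;
    }
private:
    std::mutex m_mutex;
    std::vector<Directive> m_items;
};

// Plans for units [begin, end). The map is const, so many of these can run
// at once without locking it; only the final Push takes a (brief) lock.
void PlanUnits(const MapSnapshot& map, std::size_t begin, std::size_t end, DirectiveQueue& out)
{
    assert(begin <= end);
    for (std::size_t unit = begin; unit < end; ++unit)
    {
        std::size_t best = 0; // stand-in for real planning: pick the cheapest cell
        for (std::size_t c = 1; c < map.cost.size(); ++c)
            if (map.cost[c] < map.cost[best]) best = c;
        out.Push({unit, static_cast<int>(best)});
    }
}
```

Each worker would be started with something like `std::thread(PlanUnits, std::cref(map), begin, end, std::ref(queue))`, and the directives are applied only after all workers have finished, so the snapshot never changes underneath them.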

The entire game loop has phases where the current data is transformed into the next current data, which is then locked down so the AI can work on it.

Other tasks like graphics/network communications/inputs can simultaneously be working on the AI/object state data previously created, pipelined in big independent data lumps, with entire copies of the state data set 'buffered' (unchangeable by the other processing).
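One common way to get such a frozen copy is double buffering: readers see the "current" state while the simulation writes the "next" one, and the two are swapped once per frame. The sketch below is only an illustration of that idea with made-up names (`UnitState`, `DoubleBufferedWorld`, `Simulate`), not a description of any specific engine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct UnitState { float x; float vx; }; // hypothetical per-object state
using WorldState = std::vector<UnitState>;

class DoubleBufferedWorld
{
public:
    explicit DoubleBufferedWorld(const WorldState& initial) : m_buffers{initial, initial} {}
    const WorldState& Current() const { return m_buffers[m_current]; } // frozen; safe to read from many threads
    WorldState& Next() { return m_buffers[1 - m_current]; }            // written by the simulation step
    void Flip() { m_current = 1 - m_current; }                         // call only when no reader or writer is active
private:
    WorldState m_buffers[2];
    int m_current = 0;
};

// Transforms the current state into the next one without modifying the current.
void Simulate(const WorldState& current, WorldState& next, float dt)
{
    assert(current.size() == next.size());
    for (std::size_t i = 0; i < current.size(); ++i)
    {
        next[i].vx = current[i].vx;
        next[i].x = current[i].x + current[i].vx * dt;
    }
}
```

A frame would then be roughly: run AI, rendering and so on against `Current()` while `Simulate(world.Current(), world.Next(), dt)` fills the other buffer, wait for everything to finish, and call `Flip()`.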

We are talking specifically about AI here, and if you have multiple objects (or even competing solutions for each object's state) you have parallelization opportunities as long as you can freeze the state being considered (to avoid huge data-lock-related problems).
