Sign in to follow this  
miggs

multithreading approach

Recommended Posts

while reading some stuff about the boost::thread lib i found http://www.paulbridger.com that seems to be well known around programmers.

thanks to that article i came up with a few ideads/possible approaches for the engine a few friends and me are developing.

I wanted to ask some you guys here that are more experienced with multithreading in games about the solutions you came up with and if my ideas are going in the correct way.

i either want...
1)... my game components to contain a boost::thread as a member to execute their seperated update functions (for basic logic, physics, ai..) by themselfes
my fear is that this might create to much overhead

2)... have a threadpool like classes for ai/rendering/physics..., similar to the activeobjectproxy in paul bridger's example, and each game component adds function pointers to the appropriate threadpool. if possible then even the whole class
(i think i prefer this, if my assumption is correct: that all functions/classes in one threadpool can access each other without locking as they are in the same thread - but please correct me, i think i mix up some stuff)


3)...
i have a tree setup of gamecomponents (classes) that have functions for ai/phyics and more -
maybe i should run the a game loop multiple times for ai, physics, basic logic and others, that all have access to my root gamecomponent, and have alle the loops for physics/ai.. proccessing the belonging gamecomponents function and locking/unlocking the needed members?



i also have another question: if multiple threads access a class to only read variables while no other class is writing to that classes members, is that possible? or can only one thread at the time read?


thanks for your advice in advance

Share this post


Link to post
Share on other sites
Don't make too many threads as it is bad for performance. Don't create/destroy threads on the fly. Use a thread pool instead. Lock all access to shared data, but if you are having to lock something, rethink the problem.


Edit: I'm mostly rehashing stuff from THIS link.

The first step would be to put your big odd jobs into threads. Many APIs will do this for you. FMod sound is worked on in a thread. FFMpeg video would be decoded in a thread. Bullet physics has a multithreaded solver. Graphics could be run in a thread separate from the game logic. Something like boost::asio could be used to load files in another thread.

Then take all your other things and build up some dependency graphs. You'll end up with several stages of independence, like "object update", "animation", "physics" and "effect update". For each stage, throw everything at a thread pool, then wait for the results before moving onto the next stage.

As for dependencies within one stage, double buffering can help there. If all objects read their shared state from last frames buffer and write to this frames buffer then you don't have to lock anything. If objects have to communicate with one another, send messages with thread-safe queues, or group up objects into units that get updated together on one thread. For instance, damage would be sent through messages. Or each individual "flock" in a boids AI could update as a group to keep the flocking code simple (flocks would be limited in size, but you could have lots of individual flocks). Other times, even if everything is trivially parallel (like a particle system), you still should update in minimum sized batches. One thread job per particle will kill performance, one thread job per system would be better. (sure particles could be on the GPU for even more parallel stuff, but it is easier to do complex motion affectors and such on the cpu imho)

Multithreading things is a different mindset.

As to your last question. You have to lock on both the read side and the write side. You can have multiple readers and no writers without a lock. You can have a single writer without a lock. You need a lock for multiple writers. You always need a lock for any number of readers if there are any writers. Lock as much code as needed to be safe, but minimize the number of locks and the amount of code executed while a lock is active. Ie, lock->pop->unlock->process instead of lock->pop->process->unlock.

Share this post


Link to post
Share on other sites
thanks!

your reply and that link helped me a lot, but i'm also a litte confused right now.


do you maybe have a link to a code sample how i could realize adding those stages to a thread pool?
from what i've read now, the thread pool launches a thread for each gameobject member function thrown at it, doesn't it? wouldn't that be kind of slow?
am i wrong here, or do i have to set up a whole parallel/asynchronus loop for each seperate stage ai/physics/networking..?

[Edited by - miggs on September 28, 2010 5:41:42 AM]

Share this post


Link to post
Share on other sites
Hi,

I've written a thread pool tutorial for the win32 api, however you can use those concepts for developing a boost thread pool as well

(PS - The page kinda looks like crap right now, because I'm in the process of cleaning things up, and posting up a variant of the threadpool as a lib)

http://keithmaggio.wordpress.com/code/c-win32-thread-pool-manager/

Share this post


Link to post
Share on other sites
Quote:

from what i've read now, the thread pool launches a thread for each gameobject member function thrown at it, doesn't it? wouldn't that be kind of slow?

No, a good thread pool launches several threads that idle around for work. When you give the thread pool a function to run, it is added to a job queue. All the active threads can then wake up and start popping items off the queue and working on them. Yes, there is overhead, but much less overhead compared to creating and destroying a thread for each task. Like I mentioned before, you'll want to batch up your updates for anything trivial.

Quote:

whole parallel/asynchronus loop for each seperate stage ai/physics/networking..?

You'd have to setup loops all over the place to launch thread-pool jobs. For some type of computation, you can make the code a little simpler by using Intel's Tread Building Blocks or OpenMP.

Also, by "stage" I was trying to get away from simply "ai" or "physics". Depending on how you have things setup, you could launch every single game object's update at once. But usually, to keep down the number of locks, and to keep down game latency, you'd break it up into parts. (i mention latency, because double buffering, message queues, etc. all delay actions by a frame.) You'd update all player weapons, then ai weapons, then players, then enemy ai, , then normal game objects, then update your animation system and physics, then update all game objects that needed to know about the bone positions (like attaching a weapon in your character's hand, or effects). All items at any one stage could be launched as one or more batches of work for a thread pool, but you'd waste some time waiting around so that you can still get order in the frame (ie weapons damage THIS frame not next). If your game can stand having everything delay by a frame or two, you could update all objects at once with buffers and queues like I mentioned.

(sorry I don't have any good samples for any of this)

[Edited by - KulSeran on September 28, 2010 10:29:42 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by KulSeran
You'd have to setup loops all over the place to launch thread-pool jobs. For some type of computation, you can make the code a little simpler by using Intel's Tread Building Blocks or OpenMP.

Also, by "stage" I was trying to get away from simply "ai" or "physics". Depending on how you have things setup, you could launch every single game object's update at once. But usually, to keep down the number of locks, and to keep down game latency, you'd break it up into parts. (i mention latency, because double buffering, message queues, etc. all delay actions by a frame.) You'd update all player weapons, then ai weapons, then players, then enemy ai, , then normal game objects, then update your animation system and physics, then update all game objects that needed to know about the bone positions (like attaching a weapon in your character's hand, or effects). All items at any one stage could be launched as one or more batches of work for a thread pool, but you'd waste some time waiting around so that you can still get order in the frame (ie weapons damage THIS frame not next). If your game can stand having everything delay by a frame or two, you could update all objects at once with buffers and queues like I mentioned.

(sorry I don't have any good samples for any of this)



so you mean i should have job-queues that contain the function pointers to all my gameobjects' update functions and have those organized in an intelligent way to have as few locking as possible and a correct order of launching them
-
and then a thread pool with as many threads as there are cpu-cores that pop one job after the other?

did i get it now ^^?

Share this post


Link to post
Share on other sites
Hi there,

You can use my thread pool class.
It's simple but very fast since it uses only critical sections for synchronization.



#pragma once

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// MyThreadPool.h
//
// Copyright (c) Ivica Kolic 2010 (Ivica.Kolic@gmail.com)
//
// Description:
// This is very simple, but very fast thread pool class implemented using critical sections (for speed).
//
// Usage:
//
// pool.Init(THREAD_NUMBER); // This will create THREAD_NUMBER threads in the pool. Called once per application.
// ...
// pool.Start(func1, param1); // Execute func1(param1) as a thread (one of the worker threads will do this)
// pool.Start(func2, param2); // Execute func1(param1) as a thread
// pool.Start(funcN, paramN); // Execute funcN(paramN) as a thread (N should be smaller than THREAD_NUMBER
// pool.Wait(); // This will wait all the threads to finish working - used for chaining.
// pool.Start(funcA, paramA); // Execute funcA(paramA) as a thread
// pool.Start(funcB, paramB); // Execute funcB(paramB) as a thread
// pool.Start(funcC, paramC); // Execute funcC(paramC) as a thread
// pool.Wait(); // This will wait all the threads to finish working - used for chaining.
// ...
// pool.Exit(); // This will wait until every thread is dead and clear the pool. Called at the end.
//
//
// Theory:
//
// Initial idea that didn't work:
// Basic idea was that worker thread should infinitely loop with the EnterCriticalSection(allowWorkStart) at the
// beggining of the loop. Start function would release this critical section for the thread so that thread could
// do work until it reaches EnterCriticalSection(allowWorkStart) again and locks itself once more. We would also
// have one other critical section that indicates work (set by thread) and expected by the Wait command.
// The problem was that Start (which is in the main thread) can't unlock critical section for some other thread.
// Also, worker theread can't lock itsef with the same EnterCriticalSection line (recursion count is just increased).
//
// Idea that worked:
// Since two critical sections wouldn't do, I had to get more creative and ended up using four critical sections:
// One to prevent thread from starting work (csAllowWorkStart)
// One to prevent thread from ending work (csAllowWorkEnd)
// One to force Wait function to wait (csAllowWait)
// One to force Start function to wait until thread is done with previous work (csAllowStart)
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////

#include <windows.h>
#include <process.h>

class CMyThreadPool
{
private: // Data used to control every worker thread:
struct SThreadData
{
CRITICAL_SECTION csAllowWorkStart; // We allow worker thread to start working (but not to end working - loop back at the start and takes this CS again).
CRITICAL_SECTION csAllowWait; // Wait function waits for this (that is set when worker thread finished the job).
CRITICAL_SECTION csAllowWorkEnd; // While waiting for threads to finish we allow work end, but not to start again.
CRITICAL_SECTION csAllowStart; // Stops from Start being used again until thread finishes. This prevent going from Wait to Start and skipping thread that needs to loop back to the beggining.
void (*pFunc)(void*); // Current function that worker thread must execute
void* pParam; // Parameters to the function
};
private: // Worker thread:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// WorkerThread waits for jobs from Start and blocks Wait command until work is done. It allso prevents Start
// to issue new work until old one is done.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
static void __cdecl WorkerThread(LPVOID pParam)
{
SThreadData& td = *(SThreadData*)pParam;
while(1)
{
EnterCriticalSection(&td.csAllowWait); // This prevents main program to give us right to end work and then takes it right away (if Wait is called imediately after Start)
LeaveCriticalSection(&td.csAllowStart); // Now that we can finish (we can't start) - we don't need that information any more.

EnterCriticalSection(&td.csAllowWorkStart); // We can't start work until main program has something for us.
LeaveCriticalSection(&td.csAllowWorkStart); // Now that we know that we can work, we don't need that flag any more.

if( td.pFunc ) // If there is something to do (Start has given us work)
{
td.pFunc(td.pParam); // Execute task
EnterCriticalSection(&td.csAllowStart); // Prevent Start to give us new work untill we loop at the beggining.
LeaveCriticalSection(&td.csAllowWait); // We are not working any more (Wait command expects this)
EnterCriticalSection(&td.csAllowWorkEnd); // We can't finish before main program is ok with that (it needs time to prevent new work - this is all done in Wait commant)
LeaveCriticalSection(&td.csAllowWorkEnd); // Now that we can finish (we can't start) - we don't need that information any more.
}
else break; // If there is no new work - exit the thread. This is set by Exit command.
}
LeaveCriticalSection(&td.csAllowWait); // Time to say goodbye.
}
private:
unsigned int m_nThreadNumber; // Number of worker threads
SThreadData* m_pThreadData; // Control data for each worker thread
public:
CMyThreadPool():m_nThreadNumber(0),m_pThreadData(0){}
~CMyThreadPool(){ Exit(); }
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Init creates all the control data for the worker threads and starts them.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
void Init(unsigned int nThreadNumber)
{
Exit();
m_nThreadNumber = nThreadNumber;
m_pThreadData = new SThreadData[m_nThreadNumber];
for( int t=0; t<m_nThreadNumber; ++t )
{
InitializeCriticalSection(&m_pThreadData[t].csAllowStart);
InitializeCriticalSection(&m_pThreadData[t].csAllowWorkStart);
InitializeCriticalSection(&m_pThreadData[t].csAllowWait);
InitializeCriticalSection(&m_pThreadData[t].csAllowWorkEnd);
EnterCriticalSection(&m_pThreadData[t].csAllowWorkStart); // This will prevent thread from starting to work until we have some work for it.
EnterCriticalSection(&m_pThreadData[t].csAllowStart); // This will prevent Wait Start loop that skips thread.
m_pThreadData[t].pFunc = NULL;
m_pThreadData[t].pParam = NULL;
_beginthread(CMyThreadPool::WorkerThread, 0, &m_pThreadData[t]);// Start the thread that will wait at
}
Sleep(2000); // Now we sleep to give worker threads time to aquire csAllowWait flag (critical section).
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Start gives work to the worker thread and allows threads to work (csAllowWorkStart) but prevents them from
// stopping (csAllowWorkEnd). Start will wait until thread is done with its previous work (csAllowStart is locked
// by the thread).
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
bool Start(void( *pFunc )( void * ) , void* pParam)
{
if( pFunc )
{
for( int t=0; t<m_nThreadNumber; ++t )
{
if( !m_pThreadData[t].pFunc ) // Thread is not working
{
EnterCriticalSection(&m_pThreadData[t].csAllowStart); // Until thread is done with the previous work, Start can't issue a new work
LeaveCriticalSection(&m_pThreadData[t].csAllowStart); // No need for this info any more
m_pThreadData[t].pFunc = pFunc;
m_pThreadData[t].pParam = pParam;
EnterCriticalSection(&m_pThreadData[t].csAllowWorkEnd); // Before we allow work, we must prevent end of work.
LeaveCriticalSection(&m_pThreadData[t].csAllowWorkStart); // Now thread will be able to work but not to stop working.
return true;
}
}
}
return false; // If there are no available slots - Start will fail to assign job.
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Wait waits untill every thread is done with the work.
// Once that is done, threads are allowed to end the work but not to beggin a new work.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
void Wait()
{
for( int t=0; t<m_nThreadNumber; ++t )
{
if( m_pThreadData[t].pFunc ) // Active thread
{
EnterCriticalSection(&m_pThreadData[t].csAllowWait); // Wait until it stops working
LeaveCriticalSection(&m_pThreadData[t].csAllowWait); // OK we know that thread finished with work and we don't need that information any more.
m_pThreadData[t].pFunc = NULL; // Set empty data - no work
m_pThreadData[t].pParam = NULL; // Set empty data - no work
EnterCriticalSection(&m_pThreadData[t].csAllowWorkStart); // Prevent new work (until Start has something new to do)
LeaveCriticalSection(&m_pThreadData[t].csAllowWorkEnd); // Allow threads to end current work

}
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Exit waits till every thread is done and then orders them to finish.
// After 1s threads should be dead and memory is then released.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
void Exit()
{
Wait();
for( int t=0; t<m_nThreadNumber; ++t )
{
m_pThreadData[t].pFunc = NULL; // To mark end
LeaveCriticalSection(&m_pThreadData[t].csAllowWorkStart); // Kick one more time to force threads to exit
}
Sleep(1000); // Sleeping 1s should do it
for( int t=0; t<m_nThreadNumber; ++t )
{
DeleteCriticalSection(&m_pThreadData[t].csAllowWorkStart);
DeleteCriticalSection(&m_pThreadData[t].csAllowStart);
DeleteCriticalSection(&m_pThreadData[t].csAllowWait);
DeleteCriticalSection(&m_pThreadData[t].csAllowWorkEnd);
}
m_nThreadNumber = 0;
delete [] m_pThreadData;
m_pThreadData = NULL;
}
};





How to initialize core number of threads:


SYSTEM_INFO sysInfo;
GetSystemInfo( &sysInfo );
m_nCoreNumber = sysInfo.dwNumberOfProcessors;
m_ThreadPool.Init(m_nCoreNumber);





How to give work to the pool:


m_ThreadPool.Start(&Function1, (void*)&params1);
m_ThreadPool.Start(&Function1, (void*)&params2);
...
m_ThreadPool.Start(&FunctionN, (void*)&paramsN);
m_ThreadPool.Wait(); // Wait until all the work is done - chaining

m_ThreadPool.Start(&FunctionA, (void*)&paramsA);
m_ThreadPool.Start(&FunctionB, (void*)&paramsB);
m_ThreadPool.Wait(); // Wait until all the work is done - chaining

Share this post


Link to post
Share on other sites
thanks for all the ideas, it's been quite some time, but i finally came to write something.

i know i'm asking a lot, but i can't see why my threadpool destructor gets stuck after the frist ->join and the game doesn't quit clean, visual studio closes the window, but the process is still running. without threads evertyhing terminates well. as i said, the ThreadPool destructor calls join on the first thread and doesn't do anything anymore.

i guess i made some conceptional mistakes

here's my code:


typedef boost::shared_ptr<boost::thread> ThreadPtr;
typedef std::vector<ThreadPtr> ThreadArrayList;

template <typename JobType> class JobQueue : boost::noncopyable
{
public:
JobQueue() : bExit(false)
{}
~JobQueue()
{
{
boost::mutex::scoped_lock lock(MonitorMutex);
bExit = true;
}
}

void push(const JobType& newData)
{
boost::mutex::scoped_lock lock(MonitorMutex);
jobTypeQueue.push(newData);
NextAvailable.notify_one();
}

JobType pop()
{
boost::mutex::scoped_lock lock(MonitorMutex);
if (bExit)
{
return 0;
}

if(jobTypeQueue.empty())
{
NextAvailable.wait(lock);
}

JobType temp(jobTypeQueue.front());
jobTypeQueue.pop();

return temp;
}

private:
std::queue<JobType> jobTypeQueue;
boost::mutex MonitorMutex;
boost::condition_variable NextAvailable;
bool bExit;
};

typedef Framework::Utility::JobQueue<GameComponent*> GameComponentJobQueue;
typedef boost::shared_ptr<GameComponentJobQueue > GameComponentJobQueuePtr;

class WorkerThread;

typedef boost::shared_ptr<WorkerThread> WorkerThreadPtr;
typedef std::vector<WorkerThreadPtr> WorkerArrayList;

class WorkerThread
{
public:
WorkerThread(GameComponentJobQueuePtr gameJobQueue) : bExit(false), jobQueue(gameJobQueue)
{
}

~WorkerThread()
{
{
boost::mutex::scoped_lock lock(exitMutex);
bExit = true;
}
}

void ProcessJobs()
{
while(true)
{
{
boost::mutex::scoped_lock lock(exitMutex);
if (bExit)
return;
}

GameComponent* component = jobQueue->pop();

if (component == 0)
{
return;
}
else
{
component->Proceed();
}
}
this->~WorkerThread();
}

private:
GameComponentJobQueuePtr jobQueue;

boost::mutex exitMutex;
bool bExit;
};


class ThreadPool
{
public:
ThreadPool()
{
size = 0;
}

~ThreadPool()
{
if (threads.size() != 0)
{
for (unsigned int i=0; i < threads.size(); ++i)
{
if(threads[i] != 0)
{
threads[i]->join();
}
}
}
}

void Initialize(unsigned int numWorkerThreads, GameComponentJobQueuePtr gameJobQueue)
{
threads.resize(numWorkerThreads);

jobQueue = gameJobQueue;

for (unsigned int i=0; i < numWorkerThreads; ++i)
{
threads[i].reset(new boost::thread(boost::bind(&WorkerThread::ProcessJobs, new WorkerThread(gameJobQueue))));
}

size = numWorkerThreads;
}

unsigned int GetSize()
{
return this->size;
}

void AddJob(GameComponent* component)
{
this->jobQueue->push(component);
}

private:

unsigned int size;
ThreadArrayList threads;
GameComponentJobQueuePtr jobQueue;
};




and thats a sample of how it's launched:


boost::shared_ptr<Utility::ThreadPool> threadPool;
GameComponentJobQueuePtr jobQueue;

void init()
{
SYSTEM_INFO sysInfo;
GetSystemInfo( &sysInfo );

this->jobQueue.reset(new Framework::Utility::GameComponentJobQueue());
this->threadPool.reset(new ThreadPool());
this->threadPool->Initialize(sysInfo.dwNumberOfProcessors, this->jobQueue);
}

int main()
{
while(blabla gameloop)
{
someone { this->jobQueue->push(GameComponentThatNeedsProcessing) }
if (atsomepoint)
return 0;
}
}


Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this