Sign in to follow this  
maya18222

Threadpool design

Recommended Posts

First off, Im not trying to reinvent the wheel, I'm just doing this as an exercise.

Below is my ThreadPool class and first of all I was looking for some feedback on it. Secondly, you'll notice that queued tasks can only be dispatched to available threads in 2 places, when they are queued and then in the wait call. How would I go about implementing it so that tasks get dispatched as soon as a thread is free? Im thinking I'll need to have the ThreadPool operating on a thread of its own.

Thread Pool Code
[code]

#pragma once

#include <windows.h>
#include <vector>

template< int N >
class ThreadPool
{
public:
ThreadPool()
{
for( int i = 0; i < N; ++i )
{
m_Threads[i] = CreateThread( 0, 0, Loop, &m_ThreadStates[i], 0, 0 );
}
}

~ThreadPool()
{
for( int i = 0; i < N; ++i )
{
TerminateThread( m_Threads[i], 0 );
CloseHandle( m_Threads[i] );
}
}

void Queue( LPTHREAD_START_ROUTINE func, void* pParams )
{
Task t = { func, pParams };
m_Tasks.push_back(t);

for( int i = 0; i < N; ++i )
{
if( !m_ThreadStates[i].Working )
{
const Task& t = m_Tasks.back();
m_ThreadStates[i].SetTask( t );
m_Tasks.pop_back();
return;
}
}
}

void WaitForAll()
{
while(true)
{
bool WorkStillinProcess = false;
for( int i = 0; i < N; ++i )
{
if( m_ThreadStates[i].Working )
{
WorkStillinProcess = true;
}
else if( !m_Tasks.empty() )
{
const Task& t = m_Tasks.back();
m_ThreadStates[i].SetTask( t );
m_Tasks.pop_back();
WorkStillinProcess = true;
}
}

Sleep(0);

if( !WorkStillinProcess )
return;
}
return;
}

private:

static DWORD WINAPI Loop( void* pParams )
{
ThreadState* pts = (ThreadState*)pParams;
while(true)
{
if( pts->HasTask() )
{
pts->DoTask();
pts->SetAvaliable();
}

Sleep(0);
}
}

struct Task
{
LPTHREAD_START_ROUTINE Workfunc;
void* pWorkParams;
};

struct ThreadState
{
ThreadState() : Working(0), Workfunc(0), pWorkParams(0) {}

bool HasTask()
{
return Workfunc != 0;
}

void SetTask( const Task& t )
{
Working = 1;
Workfunc = t.Workfunc;
pWorkParams = t.pWorkParams;
}

void DoTask()
{
Workfunc( pWorkParams );
}

void SetAvaliable()
{
Working = 0;
Workfunc = 0;
pWorkParams = 0;
}

int Working;
LPTHREAD_START_ROUTINE Workfunc;
void* pWorkParams;
};

ThreadState m_ThreadStates[N];
HANDLE m_Threads[N];
std::vector<Task> m_Tasks;
};[/code]

Usage
[code]

#include <iostream>
using namespace std;

#include "ThreadPool.h"

const unsigned int COUNT = 100000000;
const unsigned int THREADS = 4;

struct WorkParams
{
float begin;
float end;
unsigned int count;
float result;
float pad[16];
};

DWORD WINAPI ComputeAreaUnderCurve( void* pParams )
{
WorkParams* plParams = (WorkParams*)pParams;
double Area = 0.0f;
double dx = ( plParams->end - plParams->begin ) / plParams->count;
for( unsigned int i = 0; i < plParams->count; ++i )
{
double x = plParams->begin + (i * dx);
Area += dx * x * x;
}
plParams->result = (float)Area;
return 0;
}

int main()
{
ThreadPool<THREADS> threadpool;

while(true)
{
WorkParams wp[THREADS];
float Result = 0.0f;

for( int i = 0; i < THREADS; ++i )
{
wp[i].begin = i * (1.0f / THREADS);
wp[i].end = (i+1) * (1.0f / THREADS);
wp[i].count = (COUNT / THREADS);
threadpool.Queue( ComputeAreaUnderCurve, &wp[i] );
}

threadpool.WaitForAll();

for( int i = 0; i < THREADS; ++i )
{
Result += wp[i].result;
}

cout << "Result : " << Result << "\n";
}

cin.get();
}
[/code]



Share this post


Link to post
Share on other sites
Leaving everything else aside, its seems folly to use template parameters to determine the number of threads in the pool. Remember that templates are instantiated at compile time, one copy for each combination of template arguments. You don't really need separate copies for 1,2,4,8... capacity pools, and since you're kicking off (presumably) large, or at least non-trivial, jobs there's no argument against "overhead" involved in managing a dynamic number of threads.

From an API perspective, how does one set job priority? Thread affinity?

As for processing jobs as soon as a thread is able, you could use a job complete callback which would call on the job manager to kick off the next job as one way -- its non-deterministic though, which might be a problem in environments that expect less fluctuation (games, especially on consoles). The second approach is to just accept that the job manager operates on a fixed time step -- either along with some other fixed timestep (eg, framerate or physics time-step) or as a result of a timer-based callback or thread.

Share this post


Link to post
Share on other sites
The ThreadPool class needs to be non-copyable (e.g. add a private, non-implemented copy-constructor and assignment-operator).
The destructor should wait for the threads to finish instead of calling TerminateThread.

The Queue function is not thread-safe -- only one thread can call this safely, but this fact is not documented with a comment.

The ThreadState logic contains dangerous race-conditions and is currently not safe to use:

[code] Main thread calls:
m_ThreadStates[i].SetTask( t );
Which expands to:
Workfunc = t.Workfunc;//Has Task now begins returning true
pWorkParams = t.pWorkParams;//But DoTask is not safe to call until this line is finished

Worker Thread:
...
if( pts->HasTask() ) // returns Workfunc != 0
{
pts->DoTask(); // calls Workfunc( pWorkParams )[/code]

Share this post


Link to post
Share on other sites
Thanks for the comments.

Ravyne - Yeah, the template is unnecessary, that was actually being used at an earlier point for something else. I hadn't really thought about job priority as I was thinking more along the lines of important jobs will need to be added first.

_swx_ - Thanks for the link

Hodgeman - Sorry, I should have mentioned how the ThreadPool is to be used. The threads never exit as they contain an infinite loop, by calling "wait" in the "main thread" before the end of the ThreadPool objects scope the threads can be terminated safely as all work is complete. If the threads exit when no more tasks are *currently* available then the threadpool cant be used again . And yes, the Queue function is designed to only be called on the thread the ThreadPool is created in. And thanks for pointing out the race condition.

phantom - I'm curious now as to what was said.

Share this post


Link to post
Share on other sites
So... You create a bunch of threads (jobs) that never die until the entire pool does? And a pool dies when it runs out of work, and presumably you create new pools over time as needed? If that's all you're doing, what's the advantage of this over individual thread wrappers? Otherwise, what are we missing in your explanation?

Maybe I'm not well-read on the subject but my understanding is that most threadpools just manage a list of jobs, and that the jobs end. When one job is finished, another gets to take the slot -- also, jobs are typically managed in a priority-queue (usually determined by priority assignment, deadline, or both). Generally, a thread-pool is a persistent thing -- Otherwise, what if you create 2+ threadpools that overlap in time? Now you've got 2 times or more the "optimal" number of threads operating. The whole point of a thread-pool, as I understand it, is to have global knowledge about how many threads are allocated to job-processing, synchronize between them (dependency graph) and to prioritize the work.

Share this post


Link to post
Share on other sites
No. The ThreadPool dies when it goes out of scope, which could have the lifetime of the application. The threads stay alive until the ThreadPool is destructed. Tasks get queued. If there are tasks they get consumed by the threads. A thread gets a new task if its finished its current task and other tasks exist, else the threads just sleep.

Share this post


Link to post
Share on other sites
One issue I have with your design is that while you 'sleep(0)' you are still effectively busy waiting as all that will happen is the OS will give up the quantum and then, if you are the only active app, give it right on back again.

A better solution might be to use events of some sort to signal to a thread 'wake up and work'; you could even use three events;

- 'wake up and work'
- 'time to die'
- 'exiting'

So your thread pool class posts some work to a thread and then raises the 'wake up and work' event which the thread is waiting on, the thread then goes off and does the work as before.

On shut down your thread pool posts 'time to die' and the thread wakes up, optionally clears any work in the queue (depending on design), then exits it's loop and signals 'exiting'. In the mean time, having signalled all the threads to shut down your main thread is waiting on all their 'exiting' events and when they are all signalled it knows they have shut down and it can continue to shut down as well.

And by 'events' I mean OS primatives or some abstraction there of, not checking for bools etc :)

Share this post


Link to post
Share on other sites
What would you gain from using events though? If theres no work to be done, then you'd still just be iterating over some message loop, waiting for the WAKE_UP message.

[quote][color=#1C2837][size=2]And by 'events' I mean OS primatives or some abstraction there of, not checking for bools etc [/size][/color][/quote]

Can you give an example? And whats wrong with checking bools?

Share this post


Link to post
Share on other sites
Events let the OS put the thread to sleep, without burning CPU power, which can save energy and depending on the usage make your overall system more responsive. Basically instead of spinning up 3 thread and having them peg 3 cores of a 4 core system at 100% without doing any useful work they will sleep until woken up.

By 'OS primatives' I mean using something like, on Win32, CreateEvent which gives you back a handle you can wait on via a function such as WaitForSingleObject or WaitForMultipleObjects depending on your use case (in this case you'd want the latter as you'll be waiting for multiple thread events to signal).

This is better because, by using the OS functionality, you know it will do The Right Thing™. In threads you can get problems with non-atomic operations causing subtle and hard to track down bugs, such as situations where one thread starts to check something, gets swapped out, that value gets changed in memory but the thread doesn't see it.

Basically this is 'the right way' to do it, certainly when starting out as a common mistake people new to threading make is trying to create all manner of 'safety' features using bools, and operations on bools are not atomic and thus not thread safe :)

Windows sync functionality can be found here on MSDN; [url="http://msdn.microsoft.com/en-us/library/ms686360(v=VS.85).aspx"]http://msdn.microsoft.com/en-us/library/ms686360(v=VS.85).aspx[/url]

Share this post


Link to post
Share on other sites
[quote name='maya18222' timestamp='1317500191' post='4868066']
And whats wrong with checking bools?
[/quote]

Where is the bool in your computer's memory hierarchy? Or more to the point, which copy of the bool is being checked in the conditional? Depending on the OS scheduler, the processor's memory model, the number of physical cores in your system, the compiler and optimizations employed, the load of the system and the direction the wind is blowing, this may or may not work as you intended.

The methods suggested by phantom ensure proper synchronisation and visibility.

Share this post


Link to post
Share on other sites
[quote name='maya18222' timestamp='1317489934' post='4868015']The threads never exit as they contain an infinite loop, by calling "wait" in the "main thread" before the end of the ThreadPool objects scope the threads can be terminated safely as all work is complete. If the threads exit when no more tasks are *currently* available then the threadpool cant be used again . And yes, the Queue function is designed to only be called on the thread the ThreadPool is created in. And thanks for pointing out the race condition.[/quote]I'd still try to avoid using TerminateThread -- it's a very "last resort" kind of function, the threading equivalent of ctrl+alt+delete. Using it can cause resource leaks ([i]on XP it's a guaranteed leak of ~1MB RAM[/i]), which at worst can lead to deadlock bugs ([i]e.g. if the thread has locked a critical section and is then terminated without unlocking it[/i]).
I'd recommend changing your worker's loop check a thread-shared boolean ([i]and break/return if it's true[/i]) and have the pool's destructor set the bool to true before waiting for the threads to finish.

Regarding the race condition, you can't fix that bug by rearranging lines of code either. The easiest way is to wrap the accessing of all those variables in a critical section. e.g.[code]lock( t.mutex );
assert( t.workFunc == NULL );
t.workFunc = f;
t.workParams = p;
unlock( t.mutex );
...
lock( t.mutex );
bool isAvailable = (t.workFunc == NULL);
bool hasWork = (t.workFunc != NULL);
unlock( t.mutex );[/code][quote]How would I go about implementing it so that tasks get dispatched as soon as a thread is free? Im thinking I'll need to have the ThreadPool operating on a thread of its own.[/quote]To do this, you could share the queue between the workers, so they can do something like:[code]lock( pool.mutex );
const Task& t = pool.m_Tasks.back();
this->SetTask( t );
pool.m_Tasks.pop_back();
unlock( pool.mutex );
this->DoTask();[/code]

Share this post


Link to post
Share on other sites
Im going to redo the design using atomic operations as suggested from _swx_'s link. Phantom, I'm looking at the Wait() win32 calls and they don't mention anything about putting threads to sleep, but rather they just enter a wait state. However, at the moment I'm not really concerned with power efficiency.

Im going to have a good read through the win32 sync docs tonight.

Share this post


Link to post
Share on other sites
'Putting threads to sleep' and 'entering a wait state' is, in this case, the same thing... and well, bad use of terms on my part as it could be confused with 'sleep'.

Basically when a thread enters a 'wait state' it is no longer considered for scheduling by the kernel until either a) an event it is waiting for happens, b) a time out occurs and, iirc, c) a RPC is performed or an IOCP completes.

In this instance we are intrested in reason 'a'; the thread won't be considered for scheduling until an event it is waiting for is signaled. This is all handed by windows under the hood.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this