• Create Account

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

6 replies to this topic

### #1Craig_jb  Members

Posted 07 May 2012 - 12:04 AM

I've been working on a simple, cross-platform task/job scheduler, and I would like some help reviewing the code. Any feedback on general pitfalls or problems is appreciated!

I've posted the code on github, and below:
https://gist.github.com/2625036

/// This is not written to compile. I just copied everything into one file to make viewing easy
/// Here is a sample usage (task definitions and implementations omitted)
int main()
{
BOOL running = true
while (running)
{
scheduler.runUntilDone();
}
}
/// Abstract interface for all tasks run by scheduler
{
public:
/** Checks if all dependencies are met and the task can run
*  @returns true if scheduler can run task, false otherwise
*/
/// User task code goes here
virtual void run() = 0;
/// Checks whether the task is marked complete
BOOL complete() { return m_complete; }
/** Sets the complete status of the task
*  @param v Completetion status
*/
void setComplete(BOOL v) { m_complete = v; }
private:
BOOL m_complete;
};
{
public:
/** Create scheduler with default number of worker threads.
*  Default is 1 thread per 1 logical core (core or HT unit)
*/
/** Create scheduler with specified number of worker threads
*  @param n Number of worker threads (must be less than MAX_THREADS)
*/
/** Schedules a task for execection
*  The task is assigned to the first idle work thread if there is one;
*  execute immediately.
*/
void runUntilDone();
/** Gets number of worker threads
*  @return Number of worker threads
*/
private:
U32 m_workerCount;
BOOL m_running;
U32 m_nextPushIndex;
};
*  blocked by task dependencies. If a WorkerThread is truly out of tasks, it sleeps on a
*  condition variable until new tasks are available.
*/
{
public:
*  The system thread is started and enters idle, waiting on a condition variable.
*/
*  This method is called by the TaskScheduler from the main thread and uses
*  a spinlock to access task queue. Execution of the task can begin immediately after
*  the lock mutex is released.
*/
/** Checks whether the worker thread is currently idling
*  @return true if idling, false otherwise
*/
BOOL idling();
/** Blocks execution until the worker thread has completed all tasks in its queue
*  and can no longer find tasks to steal
*/
void blockUntilDone();
/** Wakes the thread and joins execution
*  TaskScheduler must have m_running = false or method will block indefinitely
*/
void stop();
private:
BOOL run();
void idle();
BOOL steal();
boost::mutex m_workerMutex;
boost::condition_variable m_wakeUp;
boost::condition_variable m_done;
BOOL m_idling;
BOOL m_dependencyBlocked;
};
//// IMPLEMENTATIONS ////
{
m_running = true;
// use number of 'logical cores' as default number of worker threads
// on CPUs with Hyperthreading, each core counts as 2
for (U32 i = 0; i < m_workerCount; i++)
m_nextPushIndex = 0;
}
{
m_workerCount = n;
for (U32 i = 0; i < m_workerCount; i++)
m_nextPushIndex = 0;
}
{
m_running = false;
for (U32 i = 0; i < m_workerCount; i++)
m_workers[i]->stop();
}
{
// try to find an idling worker thread
for (U32 i = 0; i < m_workerCount; i++)
{
if (m_workers[i]->idling())
{
return;
}
}

// push it onto the next worker thread round-robin style
// if there isn't an idling thread
if (m_nextPushIndex >= m_workerCount)
m_nextPushIndex = 0;
}
{
for (U32 i = 0; i < m_workerCount; i++)
m_workers[i]->blockUntilDone();
}
{
return m_workerCount;
}
{
m_scheduler = scheduler;
m_idling = false;
m_dependencyBlocked = false;
}
{
}
{
while (!m_workerMutex.try_lock());
m_idling = false;
m_workerMutex.unlock();

// wake up the thread if idling
m_wakeUp.notify_all();
}
{
return m_idling;
}
{
if (m_idling)
return;
// the m_done condition variable is notified when the thread enters idle,
// which only happens when it is out of tasks and can't find any to steal
boost::unique_lock<boost::mutex> lock(m_workerMutex);
while(!m_idling)
m_done.wait(lock);
}
{
// always idle when thread first starts because
// not all other worker threads are gauranteed to
// be initialized yet, and steal() relies on that.
// Also, there shouldn't be any tasks yet anyway.
idle();
for(;;)
{
if (!m_scheduler->m_running)
break;
// try to run a task. If there are not tasks available,
// try to steal tasks.  If there are no tasks to steal
// and the task queue is empty, then idle. If there are
if (!run())
{
if (!steal() && m_taskCount <= 0)
idle();
}
}
}
{
return false;

// Try to find a task to run. If all tasks in the queue
// have been checked, then return false so we can try to steal.
// If we have tasks waiting on dependencies, set m_dependencyBlocked
// so other WorkerThreads know not to steal from this one.
m_workerMutex.lock();
m_dependencyBlocked = false;
U32 numPops = 0;
{
{
m_workerMutex.unlock();
t->run();
return true;
}
numPops++;
}
if (numPops > 0)
m_dependencyBlocked = true;
m_workerMutex.unlock();
return false;
}
{
m_workerMutex.lock();
m_idling = true;
m_workerMutex.unlock();
// If another thread is waiting on us to finish execution,
// notify that this thread is done now
// Used by blockUntilDone()
m_done.notify_all();
boost::unique_lock<boost::mutex> lock(m_workerMutex);
while (m_idling)
m_wakeUp.wait(lock);
}
{
// check each worker thread for extra work
U32 workerCount = m_scheduler->m_workerCount;
for (U32 i = 0; i < workerCount; i++)
{
if (wt == this || wt->m_dependencyBlocked)
continue;
{
if (stealFromWorker(wt))
return true;
}
}
return false;
}
{
while (!wt->m_workerMutex.try_lock());

// rounding up. If they don't have tasks
// (the check in steal() is not guaranteed because,
// it does not lock the mutex), then return false so
// we can try another worker thread.
U32 numToSteal;
{
wt->m_workerMutex.unlock();
return false;
}
else
numToSteal = (wt->m_taskCount + 1) / 2;
m_workerMutex.lock();
for (U32 i = 0; i < numToSteal; i++)
{
}
wt->m_workerMutex.unlock();
m_workerMutex.unlock();
return true;
}
{
m_workerMutex.lock();
m_idling = false;
m_workerMutex.unlock();
m_wakeUp.notify_all();
}
//// IMPLEMENTATIONS ////


### #2ApochPiQ  Moderators

Posted 07 May 2012 - 12:26 AM

There are several race conditions surrounding your use of state variables (e.g. m_idle). Also, spinning on try_lock() seems a bit pointless; you should instead use a blocking lock() call so that your thread can be turned off by the OS scheduler.
Wielder of the Sacred Wands

### #3Craig_jb  Members

Posted 07 May 2012 - 02:10 AM

I've changed the WorkerThread implementation to avoid the race conditions. It passes testing through Helgrind at least.

https://gist.github.com/2625036

Edited by Craig_jb, 07 May 2012 - 02:40 AM.

### #4Oolala  Members

Posted 07 May 2012 - 05:07 AM

These sorts of things really aren't easy to get running cleanly. There are however a whole lot of existing implementations that you might want to just use, rather than deal with this whole can of worms [unless you're doing it for practice].

Lets say you have a processor with 4 threads. You push 5 tasks into your system, where each task N blocks on a task being done by task N+1, except for the last task which is free to complete unimpeded. Your system assigns the first 4 tasks, which in turn eventually block on the completion of the 5th, which never gets scheduled.

Posted 07 May 2012 - 05:40 AM

@oolala:
Deadlocks are always a problem of multithreaded applications anyway how you manage the actiivites.

@Craig_jb:
I would maintain a special running list besides the list of tasks.
You need not searching for the running tasks anymore. On state change they move into and out of the list.
I do that for a timer list, because I have to handle some thousands of timers in parallel.

### #6Oolala  Members

Posted 07 May 2012 - 06:06 AM

@oolala:
Deadlocks are always a problem of multithreaded applications anyway how you manage the actiivites.

True, but this is an instance where the user of the library has designed parallel code that does not contain a deadlock, but the implementation of the scheduler induces deadlock. That's a big, big distinction.

### #7Antheus  Members

Posted 07 May 2012 - 08:13 AM

I'm not a fan of such design or current implementation thereof.

There's still race conditions. Race conditions occur when two atomic operations occur one after another, yet they should be atomic as a unit (or transaction).

This, for example:
m_workerMutex.lock();
m_workerMutex.unlock();

<-- I steal a task here

m_idleMutex.lock();
m_idling = false;
m_idleMutex.unlock();
<-- I have no tasks, but I'm not idle

There is too much interlocked or duplicated state:

Then there are m_workerMutex, m_idleMutex, m_wakeUp, m_done, m_idling, m_dependencyBlocked.

A thread is a state machine. All of the above can be converted into a single variable or a synchronization primitive. Draw a graph on paper on how worker works, which states it can be in and under which conditions it can transition. Such formalization also allows proof of correctness.

Querying for idle() is highly undesirable. It's serial action ("if (idle()) "). Conditions and serial execution harm the throughput. Worker knows exactly when it's idle (it's either executing a task, or it's waiting on a condition variable).

To manage dependencies, look into topological sort. Single responsibility principle and all that - workers work and no more. Deciding what and how to run is the job of scheduler. They only submit tasks once they know they can run.

Reason:
Tasks a, b;

}
}

As far as concurrency goes, here's a hard rule: Always acquire shared resources in same order.

If scheduler is supposed to resolve dependencies, it should do so at task level. Or, instead of opaque ready(), do something like this:
struct Task {
}