Thread pools and exception handling

Started by
8 comments, last by Kitt3n 17 years, 12 months ago
I'm currently working on a thread pool implementation and I've now reached the point where I want to start handling exceptions. Here's a quick overview of how it works so far.

A ThreadPool instance is constructed with a min/max thread count and a thread expiration time (the time a thread sits idle before being destroyed). The min thread count is automatically started. Each thread runs a loop that pulls the next available job off a queue and runs it (a job is 'available' when all of the jobs it depends on have completed). If no jobs are available the thread will block until the expiration time has elapsed or a job becomes available. If the expiration time elapses and there are more than the min thread count running, the thread is destroyed. Jobs can depend on 0 or more other jobs, and since dependencies can only be specified when the job is queued, circular dependencies are impossible.

Now that I have all this working, I'd like to gracefully handle exceptions that occur while executing a job and notify the ThreadPool's client that the exception has occurred. So far I've thought of the following approaches:

1) The user queries the job handle after the job has completed to see if an exception was thrown. The exception is handled and a status flag is set in the job handle to indicate the error. If the exception is derived from std::exception then a copy of the exception object is made for later reference.

2) When the job is queued the user specifies a callback that is called when the exception occurs. This occurs on the same thread that the job was executing on, and would then require handling the possibility that the callback could also throw. I'm not sure how I'd handle this.

3) Same as (2), except the callback is called on a different thread. This is in case the job's thread has been trashed somehow by the exception. This shouldn't be a problem with software exceptions, but hardware exceptions (eg. access violation) may have corrupted thread-specific data. Of course, it may have also corrupted memory *anywhere* in the app so perhaps there's no point in creating the new thread.

4) Just let the poor thing die. I've seen some people advocate this over having one big try/catch block in main() since there's probably not much you can do anyway. This probably isn't applicable here though, since the rest of an application may be able to continue even if 1 job dies. At the very least I would still have to handle the exception and rethrow it: other threads could be waiting on a job that throws and they still need to be signalled regardless.

I'm using boost::threads to implement the thread pool, which wraps the thread function in a try/catch block that doesn't rethrow the exception (PITA when debugging). Unless someone can think of a good reason otherwise, this immediately rules out (4) since the thread would die silently and the rest of the app would unwittingly continue on its perilous journey.

I was wondering what approaches other people have taken and how they've panned out, as well as any suggestions, criticisms or even preferences as to which they think would make for a better API.

Also as an aside, I'm having a small issue where some threads could still be running when main() returns and the CRT starts cleaning up (obviously does bad things to the threads still running). The problem stems from the fact that the ThreadPool object can be destroyed, but any partially complete jobs are left to complete and the threads clean themselves up immediately afterwards. This is a deliberate design choice as (A) just 'killing' a thread would leave whatever the job was doing in an inconsistent state, and (B) I don't want the thread destroying the pool to have to block until the remaining jobs are completed. The problems arise when the thread pool is destroyed just before main() returns, the CRT starts cleaning up and some necessary bits (eg. synchronisation objects) get destroyed. If this might happen the client could simply call waitAll(), but if anyone knows how I could do this automatically (ie. tell the compiler NOT to clean up until all other threads have completed) that'd be even better.

[Edited by - joanusdmentia on April 30, 2006 6:35:12 PM]
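For what it's worth, approach (1) could look roughly like the sketch below. It assumes a C++11 compiler so that std::exception_ptr can keep the thrown object around (the post itself predates that and only talks about copying std::exception objects). JobHandle, JobStatus, runJob and describeFailure are made-up names, not anything from the pool described here, and the locking a real handle would need is left out. It only covers C++ exceptions, not hardware faults.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical job handle: the names are illustrative only.
enum class JobStatus { Pending, Succeeded, Failed };

struct JobHandle {
    JobStatus status = JobStatus::Pending;
    std::exception_ptr error;  // set only when status == Failed
};

// What a worker thread would do for each job it pulls off the queue.
// Nothing escapes this function, so the worker loop can carry on.
// A real pool would guard the handle with a mutex or atomics.
inline void runJob(const std::function<void()>& work, JobHandle& handle) {
    assert(handle.status == JobStatus::Pending);
    try {
        work();
        handle.status = JobStatus::Succeeded;
    } catch (...) {
        handle.error = std::current_exception();  // keeps the dynamic type
        handle.status = JobStatus::Failed;
    }
    // This is where the pool would signal any jobs waiting on this one.
}

// Client side: turn a failed job back into a readable message.
inline std::string describeFailure(const JobHandle& handle) {
    if (handle.status != JobStatus::Failed) return "";
    try {
        std::rethrow_exception(handle.error);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

The client can also just call std::rethrow_exception itself and handle the error with its usual catch blocks, which keeps the pool out of the business of interpreting exceptions.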
"Voilà! In view, a humble vaudevillian veteran, cast vicariously as both victim and villain by the vicissitudes of Fate. This visage, no mere veneer of vanity, is a vestige of the vox populi, now vacant, vanished. However, this valorous visitation of a bygone vexation stands vivified, and has vowed to vanquish these venal and virulent vermin vanguarding vice and vouchsafing the violently vicious and voracious violation of volition. The only verdict is vengeance; a vendetta held as a votive, not in vain, for the value and veracity of such shall one day vindicate the vigilant and the virtuous. Verily, this vichyssoise of verbiage veers most verbose, so let me simply add that it's my very good honor to meet you and you may call me V.".....V
*bump*
I'm assuming C++ due to boost.
Quote:
Also as an aside, I'm having a small issue where some threads could still be running when main() returns and the CRT starts cleaning up (obviously does bad things to the threads still running).

Very much so, and the threads could be holding a sync lock.
Quote:
The problem stems from the fact that the ThreadPool object can be destroyed, but any partially complete jobs are left to complete and the threads clean themselves immediately afterwards.

So the object (thread pool) does not create the threads, or at least keep a record of them?
Quote:
This is a deliberate design choice as (A) just 'killing' a thread would leave whatever the job was doing in an inconsistent state,

true true
Quote:
and (B) I don't want the thread destroying the pool to have to block until the remaining jobs are completed.

Why not? Is this trying to stop a dependency?

Quote:
The problems arise when the thread pool is destroyed just before main() returns, the CRT starts cleaning up and some necessary bits (eg. synchronisation objects) get destroyed. If this might happen the client could simply call waitAll(), but if anyone knows how I could do this automatically (ie. tell the compiler NOT to cleanup until all other threads have completed) that'd be even better.

Quote:
...If this might happen the client could simply call waitAll()...

How do you mean "client"?

I've got a very similar thread-pooling system - putting tasks in the
taskmanager-singleton, and assigning tasks (interface-class) to taskthreads (deriving from a thread-interface class).

Concerning the exceptions, I'm not really an expert on them so take
this for what it's worth; imho exceptions should be thrown when something
really bad happens (as opposed to using exceptions as return values), so
when I throw an exception I let my program die gracefully with a nice error-box.

I'm not sure if I would let the thread die, since it's more or less
independent of the task - and if a task can't complete for whatever reason,
the thread can just move on to the next task (and maybe do a callback 'failed' or set a value which can be retrieved by the client-app later).
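A minimal sketch of the 'failed' callback idea, assuming C++11. The names runWithCallback and FailureCallback are invented for illustration. The point is only that the callback gets its own try/catch, so a throwing callback cannot take the worker thread down with it:

```cpp
#include <cassert>
#include <exception>
#include <functional>

// Hypothetical callback type: receives whatever the job threw.
using FailureCallback = std::function<void(std::exception_ptr)>;

// Runs one job on the current (worker) thread.
// Returns true if the job completed, false if it threw.
inline bool runWithCallback(const std::function<void()>& work,
                            const FailureCallback& onFailed) {
    assert(work != nullptr);  // a job must have something to run
    try {
        work();
        return true;
    } catch (...) {
        if (onFailed) {
            try {
                onFailed(std::current_exception());
            } catch (...) {
                // The callback itself threw. Swallowed here so the worker
                // survives; a real pool might log this or flag the job.
            }
        }
        return false;
    }
}
```

Whether swallowing a second exception is acceptable is a policy decision; the alternative is to record it next to the first one so the client can see both.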

>I'm having a small issue where some threads could still be running when main()
>returns and the CRT starts cleaning up (obviously does bad things to the
>threads still running)
Every task I have is derived from my task-interface-class, which has
a function 'abort'. Default behaviour is to force the task to finish (ie wait until it completes), but derived classes could override it to do something
more special (in the calculation part, return false to the thread
running the task, telling it to stop processing this task and execute the next).
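As a rough illustration of that abort scheme (my reading of the description, not the actual taskmanager code): Task, CountingTask and runTask are hypothetical names, and the work is split into step() calls so the worker has a point at which it can be told to stop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical task interface.
class Task {
public:
    virtual ~Task() = default;
    // Does one slice of work. Returns false when the worker should stop
    // processing this task (finished or aborted) and fetch the next one.
    virtual bool step() = 0;
    // Default: no early exit, so the task simply runs to completion
    // and whoever called abort() has to wait for it.
    virtual void abort() {}
};

// A task that supports stopping between slices.
class CountingTask : public Task {
public:
    explicit CountingTask(int total) : total_(total) { assert(total > 0); }
    bool step() override {
        if (stop_.load()) return false;  // abort requested: bail out
        ++done_;                         // one slice of "work"
        return done_ < total_;
    }
    void abort() override { stop_.store(true); }
    int done() const { return done_; }
private:
    std::atomic<bool> stop_{false};  // may be set from another thread
    int total_;
    int done_ = 0;  // only touched by the worker running the task
};

// What the worker thread does with each task it is given.
inline void runTask(Task& task) {
    while (task.step()) {}
}
```

A task that never checks the flag still works with this interface; it just falls back to the wait-until-done behaviour.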

When the program ends, it calls the taskmanager's destructor, which
will make sure all queued tasks are removed and the running tasks are ended
in whatever way they support it.

In my case, tasks might access other subsystems (eg singleton-manager classes),
so my taskmgr is more or less the first thing which gets killed before any
other subsystem is destroyed.

Regards
visit my website at www.kalmiya.com

Quote:
I'm currently working on a thread pool implementation and I've now reached the point where I want to start handling exception. ... ...This is a deliberate design choice

Did you really design the thread pool?
I mean, did you draw pretty UML diagrams, state diagrams, flows... or just sit down and write it? This is not meant to sound insulting in any way.
Quote:Original post by Anonymous Poster
I'm assuming C++ due to boost.

Yep. Whenever I mention the client, I mean client code. That is, code that uses this class.

Quote:Very much so and the threads could hold a sync lock.

Not really. The first issue is that the thread might be in the middle of a job and doesn't have a chance to lock until it's too late. The other issue is that if the ThreadPool object has been destroyed, what is the client going to sync against?

Quote:So the object (thread pool) does not create the threads / or have at least a record of them?

While it doesn't explicitly keep track of the individual threads (although I could if it turned out to be necessary), the threads all have access to a common set of variables and are self-managing in that they will destroy themselves when no longer required. This is all working: once the ThreadPool object has been destroyed, all the threads will terminate immediately after completing their current job (or immediately if they aren't working on a job).

Quote:
Quote:and (B) I don't want the thread destroying the pool to have to block until the remaining jobs are completed.

Why not? is this trying to stop a dependency?

Not at all, jobs that haven't started executing will be cancelled (and flagged as such) so it doesn't matter if their dependencies aren't met. I decided to go this path to allow the greatest flexibility. The client code still has access to the jobs themselves, so it can see the job results and poll them for completion (jobs are managed using a boost::shared_ptr and are self-sufficient in that they remain valid after the ThreadPool is destroyed), so there's no reason to force the thread destroying the ThreadPool to be blocked until the jobs complete.
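To make the lifetime argument concrete, here is a deliberately single-threaded sketch, with std::shared_ptr standing in for boost::shared_ptr. MiniPool, Job, JobState and runOne are invented names and there are no worker threads at all; the only thing shown is that handles stay valid after the pool is gone and that unstarted jobs end up flagged as cancelled.

```cpp
#include <cassert>
#include <deque>
#include <memory>

enum class JobState { Queued, Done, Cancelled };

// Hypothetical job record shared between the pool and the client.
struct Job {
    JobState state = JobState::Queued;
    int result = 0;
};

class MiniPool {
public:
    // Queues a job and hands the client a shared handle to it.
    std::shared_ptr<Job> queue() {
        auto job = std::make_shared<Job>();
        pending_.push_back(job);
        return job;
    }

    // Stand-in for a worker thread picking up and finishing one job.
    bool runOne() {
        if (pending_.empty()) return false;
        std::shared_ptr<Job> job = pending_.front();
        pending_.pop_front();
        assert(job->state == JobState::Queued);
        job->result = 42;  // placeholder for real work
        job->state = JobState::Done;
        return true;
    }

    // Destroying the pool cancels whatever never started. The Job objects
    // live on for as long as the client keeps its handles.
    ~MiniPool() {
        for (auto& job : pending_) job->state = JobState::Cancelled;
    }

private:
    std::deque<std::shared_ptr<Job>> pending_;
};
```

With real threads the state field would need to be atomic or protected by a lock, since the client polls it while a worker writes it.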
Quote:Original post by Kitt3n
Concerning the exceptions I'm not really an expert on exceptions so take
this for what it's worth; imho exceptions should be thrown when something
really bad happens (as opposed to using exceptions as return values), so
when I throw an exception let my program die gracefully with a nice error-box.

I'm not sure if I would let the thread die, since it's more or less
independent of the task - and if a task can't complete for whatever reason,
the thread can just move on to the next task (and maybe do a callback 'failed' or set a value which can be retrieved by the client-app later).


Because of the way boost handles threads, letting things die gracefully isn't an option: boost will just eat the exception and let the thread die silently. I'm fairly sure I'll go with the threads continuing on to the next job if the current one throws, but the part I'm not sure about is how best to report this back to the client code.

Quote:Every task I have is derived from my task-interface-class, which has
a function 'abort'. Default-behaviour is to force the task to finish (ie wait until it completes), but derived classes could override it to do something
more special (in the calculation part, return a false to the thread
running the task, telling it to stop processing this task and execute the next).


Interesting, at the moment I only allow jobs to be cancelled before execution starts. I don't know if I'd make the default wait for the task to complete (it wouldn't really be cancelling/aborting then, would it), but asking the job itself if it can be cancelled mid-execution seems like a good idea.

Quote:When the program ends, it calls the taskmanager's destructor, which
will make sure all queued tasks are removed and the running tasks are ended
in whatever way they support it.


See my reply to the AP above as to why I'd rather let the jobs complete when the ThreadPool is destroyed. Of course, if someone can provide a compelling reason why this is a bad idea then I'm all ears....

Quote:In my case, tasks might access other subsystems (eg singleton-manager classes), so my taskmgr is more or less the first thing which gets killed before any other subsystem is destroyed.


This is why I'm not overly concerned about the bug where threads are still executing while the CRT is cleaning up. If I make it obvious in the documentation then the client code has the option of calling waitAll() just before destroying the ThreadPool if there's a possibility that this might happen.
Quote:Original post by Anonymous Poster
Did you really design the thread pool?
I mean, did you draw pretty UML diagrams, state diagrams, flows... or just sit down and write it? This is not meant to sound insulting in any way.


Design doesn't have to mean drawing pretty pictures..... sure, I use UML when I think it's appropriate for larger-scale design issues (eg. engine architecture), but I find it to be serious overkill for designing single classes.
Hmm.. so if I get it right, you want the user to have the option to destroy
the threadpool in the middle of execution, with client code still having access
to tasks which continue being executed.
Another option the user has would be to destroy the threadpool at the end
of the program, where however you run into a problem because threads are
still running and accessing stuff which is already destroyed.

>If I make it obvious in the documentation then the client code has the
>option of calling waitAll() just before destroying the ThreadPool if there's
>a possibility that this might happen.
I would make it the other way around, by default tasks are ended (either
forced or just calculated to the end) when your mgr shuts down - so that
your average dumb user can use your pool easily.
This would also solve the problem on crt-end (end the tasks/threads in
the taskmgr-destructor).

If your advanced user wants to shut down the pool in the middle of runtime
(which is rather a special case imo), I would give him a
function "continueTasksAfterShutdown()" which he has to call prior to
destroying the threadpool, and warn him explicitly about the dangers
(eg don't do this right before the program shuts down).
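A rough sketch of that default, using C++11 std::thread instead of boost::thread and one thread per job instead of a real pool (both are simplifications of mine). JoiningPool and run() are made-up names; continueTasksAfterShutdown() is the name suggested above.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

class JoiningPool {
public:
    // Simplification: starts a dedicated thread for every job.
    void run(std::function<void()> work) {
        assert(work != nullptr);
        threads_.emplace_back(std::move(work));
    }

    // Opt-in for advanced users: running jobs are allowed to outlive the
    // pool. Dangerous right before program exit, since detached threads
    // may still be running while statics are being torn down.
    void continueTasksAfterShutdown() { detachOnDestroy_ = true; }

    // Safe default: block until every started job has finished.
    ~JoiningPool() {
        for (auto& t : threads_) {
            if (detachOnDestroy_) {
                t.detach();
            } else {
                t.join();
            }
        }
    }

private:
    std::vector<std::thread> threads_;
    bool detachOnDestroy_ = false;
};
```

With the default in place, a pool destroyed at the end of main() cannot leave threads running into CRT cleanup; only callers who explicitly opt out take on that risk.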

>Interesting, at the moment I only allow jobs to be cancelled before execution
>starts. I don't know if I'd make the default wait for the task to complete (it
>wouldn't really be cancelling/aborting then, would it), but asking the job
True - but it's meant as a failsafe way to ensure a task is 'cancelled' -
it's the task's responsibility to provide better behaviour (if it can) in
the derived function.

>itself if it can be cancelled mid-execution seems like a good idea.
The other way around is also interesting, I have a "forceTask" member.
In my case, I'm doing some prediction of what will probably be needed
in the not-too-far future (now take the situation where eg my camera
jumped forward and I need the result of the task _now_).
Here I force/wait-for the task to be finished so I can get the results
I need (obviously this should be avoided if possible :)
Actually now that I think a bit more about it... all you need
is to automagically call a function on exit (didn't stl provide
a function for that?)...

Anyway, you could always make a helper-class, sth like this:

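// taskmanager cpp
class ShutdownHelper
{
public: // must be public, otherwise the static instance below won't compile
    ShutdownHelper() {}
    // Runs during static destruction, after main() has returned.
    // waitAll() stands for the pool's wait-for-all-jobs call.
    ~ShutdownHelper() { waitAll(); }
};
// create one instance
static ShutdownHelper helperInstance;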

You'd have to ensure that your sync-objects aren't destroyed
before your waitAll is called though (your shutdown-helper
could manage the sync-objects to avoid that)
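The function half-remembered above is most likely std::atexit from <cstdlib> (it comes from the C standard library rather than the STL). A minimal sketch, where waitForOutstandingJobs and installExitWait are placeholder names of mine and the handler body is left empty because the real pool isn't shown here:

```cpp
#include <cstdlib>

// Hypothetical stand-in for the pool's blocking wait. A real version
// would call something like waitAll() on the live pool.
inline void waitForOutstandingJobs() {}

// Registers the wait to run at normal program exit, ie when main()
// returns or std::exit() is called. std::atexit returns 0 on success.
inline bool installExitWait() {
    return std::atexit(waitForOutstandingJobs) == 0;
}
```

Handlers registered with std::atexit and destructors of static objects run in reverse order of their registration/construction, so the same caveat applies: the sync objects have to be set up before the handler is registered if they are to outlive it.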

Anyway, it's getting really late now - off to Sleep I go :)
