Thread pool (basic idea)

8 comments, last by Laval B 11 years, 10 months ago
Hello everyone

I'm going to need some sort of worker thread pool in my project so I can queue independent jobs and let the pool take care of finding a thread to execute each job. The pool would run a fixed number of threads, though it could have a minimum and a maximum number of threads (I have no idea how to manage growing the pool).

What I'm thinking of so far is this:

- A FIFO queue that will receive the jobs to be executed (from external threads).
- A mutex (a critical section on Windows or a pthread_mutex_t on POSIX systems) to synchronize access to the queue.
- An event that would be managed by the Pop/Push operations of the queue (the event API on Windows or a condition variable via pthread_cond_wait on POSIX systems).

The algorithm would be something like this:

All threads are sitting on the event. When a job is added to the queue, the event is fired. All the threads (that aren't busy executing some job) will try to dequeue the item (only one will succeed because of the mutex), and then the event is reset.
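
In POSIX terms, I picture the worker loop looking roughly like this (just a sketch; Job stands in for my real work item):

[code]
// Sketch of the plan above (POSIX, pre-C++11). Job stands in
// for my real work item.
#include <pthread.h>
#include <queue>

struct Job { void run(); };

std::queue<Job*> jobs;
pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  notEmpty = PTHREAD_COND_INITIALIZER;

void push(Job* j) {
    pthread_mutex_lock(&lock);
    jobs.push(j);
    pthread_cond_broadcast(&notEmpty);   // "fire the event": wake everyone
    pthread_mutex_unlock(&lock);
}

void* worker(void*) {
    for (;;) {
        pthread_mutex_lock(&lock);
        while (jobs.empty())                     // losers loop back to waiting
            pthread_cond_wait(&notEmpty, &lock); // releases lock while asleep
        Job* j = jobs.front();
        jobs.pop();
        pthread_mutex_unlock(&lock);
        j->run();                                // run outside the lock
    }
    return 0;
}
[/code]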

That's the basic idea, but a lot of things seem wrong with this design. All the threads waking up at the same time doesn't sound good. I'm also worried about contention.

I would be interested in having your ideas.

Thanks a lot in advance.

P.S.

I am working with C++ and the code must compile on non-C++11 compilers. Furthermore, it would be nice to avoid using Boost (not that it's bad, quite the contrary).
We think in generalities, but we live in details.
- Alfred North Whitehead
Assuming you've ruled out things like TBB for good reason...

There are many things you can experiment with. I'd recommend going ahead with your plan but designing the interface in such a way as to allow alternative implementations. If your tasks are relatively meaty, you might find that contention isn't much of an issue.

One common way of avoiding contention in lock-based queues is to use a mutex at each end and a dummy node, as described by this paper (PDF). Anthony Williams' new book, C++ Concurrency in Action, also describes the technique.
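
For illustration, the core of that two-lock design looks something like this (a pre-C++11 sketch; as in the paper, it assumes aligned pointer loads/stores are atomic, and error handling is omitted):

[code]
// Sketch of the two-lock queue from the paper: a permanent dummy
// node lets push touch only the tail and pop touch only the head,
// so producers and consumers take different locks.
#include <pthread.h>

struct Job { void run(); };             // stand-in for the work item

struct Node { Job* job; Node* next; };

class TwoLockQueue {
    Node* head;                         // always points at the dummy
    Node* tail;
    pthread_mutex_t headLock, tailLock;
public:
    TwoLockQueue() {
        head = tail = new Node();       // dummy node
        head->next = 0;
        pthread_mutex_init(&headLock, 0);
        pthread_mutex_init(&tailLock, 0);
    }
    void push(Job* j) {
        Node* n = new Node();
        n->job = j; n->next = 0;
        pthread_mutex_lock(&tailLock);
        tail->next = n;                 // link at the tail...
        tail = n;                       // ...and swing the tail pointer
        pthread_mutex_unlock(&tailLock);
    }
    Job* pop() {                        // returns 0 if the queue is empty
        pthread_mutex_lock(&headLock);
        Node* first = head->next;
        if (first == 0) { pthread_mutex_unlock(&headLock); return 0; }
        Job* j = first->job;
        Node* old = head;
        head = first;                   // first node becomes the new dummy
        pthread_mutex_unlock(&headLock);
        delete old;
        return j;
    }
};
[/code]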

If you're on Windows, you could experiment with IO Completion Ports. They're the Windows-native way of spreading work arriving via a single queue across multiple threads.
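
A sketch of what that looks like when the port is used purely as a job queue, with Job standing in for your work item:

[code]
// Sketch: an I/O completion port used purely as a job queue
// (no sockets or files involved). Job stands in for the work item.
#include <windows.h>

struct Job { void run(); };

HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

void submit(Job* job) {
    // Smuggle the job pointer through the completion key.
    PostQueuedCompletionStatus(port, 0, (ULONG_PTR)job, NULL);
}

DWORD WINAPI worker(LPVOID) {
    DWORD bytes; ULONG_PTR key; LPOVERLAPPED ov;
    for (;;) {
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE))
            continue;
        if (key == 0) break;            // convention: 0 key means shut down
        ((Job*)key)->run();
    }
    return 0;
}
[/code]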

Another simple approach is to give each thread its own queue and assign work in a round-robin fashion (you'll need an atomic index to decide which queue to insert work into next). If the work being enqueued is heterogeneous (i.e. high variance in task duration), you could also experiment with a simple work-stealing mechanism for threads that run out of work (the simplest being to try to steal a task from a randomly chosen queue N times before waiting).
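
A rough sketch of that arrangement; JobQueue stands for any mutex-protected queue with a non-blocking try_pop, and atomic_increment() for whatever your platform provides (_InterlockedIncrement, __sync_add_and_fetch):

[code]
// Sketch: per-thread queues, round-robin dispatch, naive stealing.
// JobQueue and atomic_increment() are placeholders as noted above.
#include <cstdlib>                       // rand()

const unsigned kNumQueues = 4;
JobQueue queues[kNumQueues];
long nextQueue = -1;                     // shared dispatch counter

void submit(Job* job) {
    unsigned n = (unsigned)atomic_increment(&nextQueue);
    queues[n % kNumQueues].push(job);    // round-robin assignment
}

Job* find_work(unsigned self) {
    Job* j = queues[self].try_pop();     // own queue first
    if (j) return j;
    for (int i = 0; i < 8; ++i) {        // then try a few random victims
        unsigned victim = (unsigned)rand() % kNumQueues;
        if (victim != self && (j = queues[victim].try_pop()) != 0)
            return j;
    }
    return 0;                            // nothing found; caller can block
}
[/code]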

[quote]Assuming you've ruled out things like TBB for good reason...[/quote]


I haven't ruled out the Intel TBB library, but I still need to study it; it looks like a big thing at first glance.

Thank you very much for the reference on the two-lock queue.

I find the idea of using a queue for each thread very interesting, though more complicated, especially the part about stealing tasks for load balancing (but it is very appealing). I must admit that programming lock-free stuff scares me a bit, even after reading Herb Sutter's articles in Dr. Dobb's.

As for I/O completion ports, I already use them for multiplexing network I/O. I also use a fixed number of threads to manage those asynchronous I/O completions, on Windows of course (on Solaris, I'll probably use /dev/poll). I want to use another thread pool mostly to process the requests (database queries, file lookups or transfers, etc.) in order to keep the I/O threads from blocking when a request takes too long to process.

So I'll experiment with what I have now and see how it goes.
We think in generalities, but we live in details.
- Alfred North Whitehead
In case you didn't know, it appears Solaris 10 and later also has completion ports.
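
The feature is called event ports there. If I'm reading the man pages right, using one as a job queue would look roughly like this (untested sketch; Job stands in for the work item):

[code]
// Sketch: Solaris event ports (Solaris 10+) used as a job queue,
// delivering user-defined events that carry a job pointer.
#include <port.h>

struct Job { void run(); };

int port = port_create();               // analogous to an IOCP handle

void submit(Job* job) {
    port_send(port, 1, job);            // queues a PORT_SOURCE_USER event
}

void* worker(void*) {
    port_event_t ev;
    for (;;) {
        if (port_get(port, &ev, NULL) == 0 && ev.portev_user)
            ((Job*)ev.portev_user)->run();
    }
    return 0;
}
[/code]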

[quote]In case you didn't know, it appears Solaris 10 and later also has completion ports.[/quote]


Well, I didn't know, and the article I had only talked about /dev/poll. Thank you again, much appreciated. The code will have to work on Solaris 10.
We think in generalities, but we live in details.
- Alfred North Whitehead
[quote]As for I/O completion ports, I already use them for multiplexing network I/O. I also use a fixed number of threads to manage those asynchronous I/O completions, on Windows of course (on Solaris, I'll probably use /dev/poll). I want to use another thread pool mostly to process the requests (database queries, file lookups or transfers, etc.) in order to keep the I/O threads from blocking when a request takes too long to process.[/quote]

Consider that boost::asio does exactly that, using the same API, on all of these platforms including Solaris; it has been tested over the years by tens of thousands of people and is available without restriction...

That includes asynchronous file IO, asynchronous callbacks, strands for interoperability with non-threaded APIs, asynchronous networking, thread pooling, ...
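
For instance, the whole pool comes down to a few lines (a sketch against the pre-C++11 Boost API; handle_request stands in for your job function):

[code]
// Sketch: boost::asio::io_service as a generic thread pool.
// The work object keeps run() from returning while the queue
// is momentarily empty.
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <memory>

void handle_request() { /* stand-in for a queued job */ }

int main() {
    boost::asio::io_service service;
    std::auto_ptr<boost::asio::io_service::work>
        keepAlive(new boost::asio::io_service::work(service));

    boost::thread_group pool;
    for (int i = 0; i < 4; ++i)
        pool.create_thread(boost::bind(&boost::asio::io_service::run,
                                       &service));

    service.post(&handle_request);   // any callable; a worker runs it

    keepAlive.reset();               // let run() return once the queue drains
    pool.join_all();
    return 0;
}
[/code]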

[quote]The code will have to work on Solaris 10.[/quote]

Boost and asio are supported on Solaris.


The thread scheduler in asio went through several rewrites, as did other parts. It really is a hard problem, even when the foremost experts in platforms and C++ come together.

[quote]The thread scheduler in asio went through several rewrites, as did other parts. It really is a hard problem, even when the foremost experts in platforms and C++ come together.[/quote]


You are totally right. Looking at asio (the non-Boost version) is in my plans, but I think getting my hands dirty with these things will help me understand how it works better, even if I end up using a framework like asio, especially since I have enough time allocated for it.

Thank you for the advice, I appreciate it.
We think in generalities, but we live in details.
- Alfred North Whitehead

[quote]you'll need an atomic index to decide which queue to insert work into next[/quote]


I can always get away with incrementing the index atomically using _InterlockedIncrement on Windows (or __sync_add_and_fetch on gcc), but since there would be a fixed number of queues, the index needs to go back to zero when it is incremented from the value size - 1. I'm not sure how to do it with the atomic primitives I have ... there must be a way, though.
We think in generalities, but we live in details.
- Alfred North Whitehead
Just use the result of the atomic increment modulo the number of queues. There may be a tiny blip in the uniformity of the work distribution once every 4 billion-ish tasks if you don't have a number of queues that's a power of two, but that's probably not going to cause you much trouble in the long run.

If you do decide to do it this way, it might be worth checking that those intrinsics do indeed wrap to 0. I'm pretty sure they do, though.

[quote]If you do decide to do it this way, it might be worth checking that those intrinsics do indeed wrap to 0. I'm pretty sure they do, though.[/quote]


Yes, even if the task assignment isn't exact once in a while, the overall task distribution shouldn't be too bad.

I did a quick test on Windows (I don't have a Solaris box at home), and yes, _InterlockedIncrement wraps. The only problem is that all the atomic increment functions (at least on Windows) work on signed integer types, which wrap to a negative value.



Edit:

Stupid me, I just have to cast the signed result to an unsigned int. The function then wraps to zero.
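
i.e. something like this (Windows flavour; __sync_add_and_fetch would be the same idea):

[code]
// _InterlockedIncrement works on a signed long, but casting the
// result to unsigned before the modulo gives a clean wrap to zero.
#include <intrin.h>

volatile long counter = -1;              // first increment yields index 0

unsigned next_queue(unsigned numQueues) {
    unsigned n = (unsigned)_InterlockedIncrement(&counter);
    return n % numQueues;
}
[/code]
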
By the way, the paper you referred me to earlier is really interesting.
We think in generalities, but we live in details.
- Alfred North Whitehead

This topic is closed to new replies.
