
Thread pool (basic idea)



#1 Laval B   GDNet+   -  Reputation: 4550


Posted 26 May 2012 - 09:59 AM

Hello everyone

I'm going to need some sort of worker thread pool in my project so I can queue independent jobs and let the pool take care of finding a thread to execute each job. The pool would run a fixed number of threads, though it could have a minimum and a maximum number of threads (I have no idea how to manage growing the pool).

What I have in mind so far is this:

- A FIFO queue that will receive the jobs to be executed (from external threads).
- A mutex (a critical section on Windows or a pthread_mutex_t on POSIX systems) to synchronize access to the queue.
- An event that would be managed by the Pop/Push operations of the queue (the event API on Windows or pthread_cond_wait on POSIX systems).

The algorithm would be something like this:

All threads are waiting on the event. When a job is added to the queue, the event is fired. All the threads that aren't busy executing a job will try to dequeue the item (only one will succeed, because of the mutex), and then the event is reset.

That's the basic idea, but a lot of things seem wrong with this design. All the threads waking up at the same time doesn't sound good. I'm also worried about contention.
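To make this concrete, here is a minimal sketch of the POSIX side of what I have in mind (Job, push_job and worker are just placeholder names, and the event is replaced by a condition variable):

```cpp
#include <pthread.h>
#include <queue>

typedef void (*Job)();   // placeholder task type

std::queue<Job> jobs;
pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  queue_cond  = PTHREAD_COND_INITIALIZER;

void push_job(Job j)
{
    pthread_mutex_lock(&queue_mutex);
    jobs.push(j);
    // Wakes a single sleeping worker rather than all of them, which
    // might already sidestep the "all threads fire at once" problem.
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_mutex);
}

void* worker(void*)
{
    for (;;) {
        pthread_mutex_lock(&queue_mutex);
        // Loop, because pthread_cond_wait can wake spuriously and
        // another worker may have grabbed the job first.
        while (jobs.empty())
            pthread_cond_wait(&queue_cond, &queue_mutex);
        Job j = jobs.front();
        jobs.pop();
        pthread_mutex_unlock(&queue_mutex);
        j();   // run the job outside the lock
    }
    return 0;
}
```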

I would be interested in hearing your ideas.

Thanks a lot in advance.

P.S.

I am working with C++ and the code must compile on non-C++11 compilers. Also, it would be nice to avoid using Boost (not that it's bad, quite the contrary).
We think in generalities, but we live in details.
 
Alfred North Whitehead


#2 edd   Members   -  Reputation: 2105


Posted 26 May 2012 - 11:22 AM

Assuming you've ruled out things like TBB for good reason...

There are many things you can experiment with. I'd recommend going ahead with your plan but designing the interface in such a way as to allow alternative implementations. If your tasks are relatively meaty, you might find that contention isn't much of an issue.

One common way of reducing contention in lock-based queues is to use a mutex at each end and a dummy node, as described by this paper (PDF). Anthony Williams' new book, C++ Concurrency in Action, also describes the technique.
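In outline, the two-lock queue looks like this (Job is a stand-in for whatever task type you end up using):

```cpp
#include <pthread.h>

typedef void (*Job)();   // stand-in task type

struct Node {
    Job job;
    Node* next;
};

// Two-lock queue: push only ever touches tail_mutex, pop only ever
// touches head_mutex. The dummy node keeps the two ends on different
// nodes almost all of the time, so producers and consumers rarely
// contend with each other.
class TwoLockQueue {
public:
    TwoLockQueue() {
        head_ = tail_ = new Node();   // dummy node
        head_->next = 0;
        pthread_mutex_init(&head_mutex_, 0);
        pthread_mutex_init(&tail_mutex_, 0);
    }

    void push(Job j) {
        Node* n = new Node();
        n->job = j;
        n->next = 0;
        pthread_mutex_lock(&tail_mutex_);
        tail_->next = n;
        tail_ = n;
        pthread_mutex_unlock(&tail_mutex_);
    }

    bool pop(Job& out) {
        pthread_mutex_lock(&head_mutex_);
        Node* first = head_->next;          // first real element, if any
        if (first == 0) {
            pthread_mutex_unlock(&head_mutex_);
            return false;                   // queue empty
        }
        out = first->job;                   // copy out before it becomes the dummy
        Node* old_dummy = head_;
        head_ = first;                      // 'first' is the new dummy
        pthread_mutex_unlock(&head_mutex_);
        delete old_dummy;
        return true;
    }

private:
    Node* head_;
    Node* tail_;
    pthread_mutex_t head_mutex_;
    pthread_mutex_t tail_mutex_;
};
```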

If you're on Windows, you could experiment with I/O Completion Ports. They're the Windows-native way of spreading work arriving via a single queue across multiple threads.
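Used purely as a work queue (no actual file or socket I/O), the pattern is roughly this (handle_job is a hypothetical dispatcher for your task type):

```cpp
#include <windows.h>

void handle_job(void* job);   // hypothetical: runs one unit of work

// A completion port with no file handles attached, used purely as a
// thread-safe multi-consumer work queue.
HANDLE g_port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

void post_job(void* job)
{
    // Smuggle the job pointer through the completion key; the byte
    // count and OVERLAPPED pointer are unused since no real I/O occurs.
    PostQueuedCompletionStatus(g_port, 0, (ULONG_PTR)job, NULL);
}

DWORD WINAPI worker(LPVOID)
{
    DWORD bytes;
    ULONG_PTR key;
    OVERLAPPED* overlapped;
    for (;;) {
        // Blocks until a packet is posted; the kernel decides which
        // waiting thread to wake.
        if (GetQueuedCompletionStatus(g_port, &bytes, &key,
                                      &overlapped, INFINITE))
            handle_job((void*)key);
    }
    return 0;
}
```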

Another simple approach is to give each thread its own queue and assign work in round-robin fashion (you'll need an atomic index to decide which queue to insert work into next). If the work being enqueued is heterogeneous (i.e. high variance in task duration), you could also experiment with a simple work-stealing mechanism for threads that run out of work (the simplest being to try to steal a task from a randomly chosen queue N times before waiting).
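The consumer side of the stealing scheme can be as dumb as this sketch (NUM_THREADS and STEAL_TRIES are whatever your pool defines; any thread-safe queue with a non-blocking pop works, here I'm assuming the two-lock queue above):

```cpp
#include <cstdlib>

typedef void (*Job)();            // stand-in task type

const int NUM_THREADS = 4;        // illustrative sizes
const int STEAL_TRIES = 3;

extern TwoLockQueue queues[NUM_THREADS];   // one queue per worker

// Check our own queue first, then make a few steal attempts against
// random victims before giving up; the caller blocks/waits on false.
// (rand() is fine for a sketch; use a per-thread RNG in practice.)
bool get_work(int self, Job& out)
{
    if (queues[self].pop(out))
        return true;                          // fast path: own queue

    for (int attempt = 0; attempt < STEAL_TRIES; ++attempt) {
        int victim = std::rand() % NUM_THREADS;
        if (victim != self && queues[victim].pop(out))
            return true;                      // stole a task
    }
    return false;
}
```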

Edited by edd², 26 May 2012 - 11:27 AM.


#3 Laval B   GDNet+   -  Reputation: 4550


Posted 26 May 2012 - 11:53 AM

Assuming you've ruled out things like TBB for good reason...


I haven't ruled out the Intel TBB library, but I still need to study it; it looks like a big thing at first glance.

Thank you very much for the reference on the two-lock queue.

I find the idea of using a queue per thread very interesting, though more complicated, especially the part about stealing tasks for load balancing (but it is very appealing). I must admit that programming lock-free stuff scares me a bit, even after reading Herb Sutter's articles in Dr. Dobb's.

As for I/O completion ports, I already use them for multiplexing network I/O. I also use a fixed number of threads to manage those asynchronous I/O completions, on Windows of course (on Solaris, I'll probably use /dev/poll). I want to use another thread pool mostly to process the requests (database queries, file lookups or transfers, etc.) so the I/O threads don't block when a request takes too long to process.

So I'll experiment with what I have now and see how it goes.

#4 edd   Members   -  Reputation: 2105


Posted 26 May 2012 - 12:31 PM

In case you didn't know, it appears Solaris 10 and later also has completion ports.

#5 Laval B   GDNet+   -  Reputation: 4550


Posted 26 May 2012 - 12:38 PM

In case you didn't know, it appears Solaris 10 and later also has completion ports.


Well, I didn't know, and the article I had only talked about /dev/poll. Thank you again, much appreciated. The code will have to work on Solaris 10.

#6 Antheus   Members   -  Reputation: 2397


Posted 26 May 2012 - 01:21 PM

As for I/O completion ports, I already use them for multiplexing network I/O. I also use a fixed number of threads to manage those asynchronous I/O completions, on Windows of course (on Solaris, I'll probably use /dev/poll). I want to use another thread pool mostly to process the requests (database queries, file lookups or transfers, etc.) so the I/O threads don't block when a request takes too long to process.


Considering that boost::asio does exactly that, using the same API on all these platforms, including Solaris, has been tested for years by tens of thousands of people, and is available without restriction...

That includes asynchronous file I/O, asynchronous callbacks, strands for interoperability with non-threaded APIs, asynchronous networking, thread pooling, ...
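For illustration, the usual asio thread-pool skeleton is only a few lines (some_job stands for any copyable nullary functor you want to run):

```cpp
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

void some_job() { /* any unit of work */ }

// A plain function avoids ambiguity over io_service::run's overloads.
void run_service(boost::asio::io_service* s) { s->run(); }

int main()
{
    boost::asio::io_service service;

    // 'work' keeps run() from returning while the queue is empty.
    boost::asio::io_service::work work(service);

    // Every thread that calls run() becomes a worker in the pool.
    boost::thread_group threads;
    for (int i = 0; i < 4; ++i)
        threads.create_thread(boost::bind(run_service, &service));

    // post() hands the job to whichever worker thread is free.
    service.post(some_job);

    // stop() makes run() return as soon as possible; in real code you
    // would instead destroy 'work' and let queued handlers drain.
    service.stop();
    threads.join_all();
    return 0;
}
```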

The code will have to work on Solaris 10.


Boost and asio are supported on Solaris.


The thread scheduler in asio went through several rewrites, as did other parts. It really is a hard problem, even when the foremost experts in platforms and C++ come together.

Edited by Antheus, 26 May 2012 - 01:22 PM.


#7 Laval B   GDNet+   -  Reputation: 4550


Posted 26 May 2012 - 01:37 PM

The thread scheduler in asio went through several rewrites, as did other parts. It really is a hard problem, even when the foremost experts in platforms and C++ come together.


You are totally right. Looking at asio (the non-Boost version) is in my plans, but I think getting my hands dirty with these things will help me better understand how it works, even if I end up using a framework like asio, especially since I have enough time allocated for it.

Thank you for the advice, I appreciate it.

Edited by Laval B, 26 May 2012 - 01:38 PM.


#8 Laval B   GDNet+   -  Reputation: 4550


Posted 26 May 2012 - 04:15 PM

you'll need an atomic index to decide which queue to insert work into next


I can always get away with incrementing the index atomically using _InterlockedIncrement on Windows (or __sync_add_and_fetch on gcc), but since there would be a fixed number of queues, the index needs to go back to zero when it is incremented past the value size - 1. I'm not sure how to do that with the atomic primitives I have ... there must be a way, though.

Edited by Laval B, 26 May 2012 - 04:18 PM.


#9 edd   Members   -  Reputation: 2105


Posted 26 May 2012 - 04:26 PM

Just use the result of the atomic increment modulo the number of queues. There may be a tiny blip in the uniformity of the work distribution once every 4-billion-ish tasks if the number of queues isn't a power of two, but that's probably not going to cause you much trouble in the long run.

If you do decide to do it this way, it might be worth checking that those intrinsics do indeed wrap to 0. I'm pretty sure they do, though.

Edited by edd², 26 May 2012 - 04:27 PM.


#10 Laval B   GDNet+   -  Reputation: 4550


Posted 26 May 2012 - 04:45 PM

If you do decide to do it this way, it might be worth checking that those intrinsics do indeed wrap to 0. I'm pretty sure they do, though.


Yes, even if one of the task assignments isn't exact once in a while, the overall distribution shouldn't be too bad.

I did a quick test on Windows (I don't have a Solaris box at home), and yes, _InterlockedIncrement wraps. The only problem is that all the atomic increment functions (at least on Windows) work on signed integer types, which wrap to a negative value.


Edit:

Stupid me, I just have to cast the result to an unsigned int; the value then wraps to zero.
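In code, it amounts to something like this (the names are just illustrative):

```cpp
#include <windows.h>

static volatile LONG counter = 0;   // shared round-robin counter

unsigned next_queue_index(unsigned num_queues)
{
    // _InterlockedIncrement returns a signed LONG that goes negative
    // once it passes 0x7FFFFFFF; reinterpreting the result as unsigned
    // gives a counter that wraps cleanly through zero instead.
    return (unsigned)_InterlockedIncrement(&counter) % num_queues;
}
```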
By the way, the paper you referred me to earlier is really interesting.

Edited by Laval B, 26 May 2012 - 05:14 PM.




