[C++] At what point do threads become helpful?

Started by
9 comments, last by TheBuzzSaw 13 years, 10 months ago
It's hard to word the question correctly. Here is my dilemma: I understand what threads do, how they work, and how to create them. I understand how to protect data using mutexes, semaphores, etc. The only thing holding me back from using threads right now is knowing whether a particular task is big enough to warrant its own thread. In other words, at what point does the overhead from creating/managing a thread exceed the benefits gained from concurrent processing?

Sometimes, it is obvious. For instance, many have recommended putting the audio engine into its own thread. AI and pathfinding can be threaded easily enough. But let's break it down a bit. Imagine I am multiplying two large matrices (a classic "threadable" problem). At what size is it helpful to use threads? How many instructions make a thread worth creating?

My problem stems from the fact that I often can pick odd jobs here and there in my code that could be put into a thread, but I cannot bring myself to believe it would be worth it because the thread would start and then stop fairly quickly. I generally try to hunt for big jobs that can run independently of other code (which usually results in not threading them because of too much data dependency on the main thread).

Help? Thoughts? Insights? Guidelines? Links to epic articles?
Amateurs practice until they do it right. Professionals practice until they never do it wrong.
Well, I could imagine problems like raytracing receive great benefit from threading.

Instead of casting for 800x600 pixels in a single thread, you could give each of 4 threads an 800x150 strip.

Then again, I have never done a practical ray-tracer, and since the shader approach seems to be the latest fad on the topic, this particular problem might have better solutions to it than threading.
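The strip-per-thread split described above could be sketched like this with std::thread. The `shade` function here is a hypothetical stand-in for casting one ray; everything else (names, the strip partitioning) is just an illustrative sketch, not a full renderer:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-pixel work; stands in for casting a ray.
float shade(std::size_t x, std::size_t y) {
    return static_cast<float>(x + y) * 0.5f;
}

// Render by splitting the image into horizontal strips, one per thread.
// Each thread writes a disjoint range of rows, so no locking is needed.
std::vector<float> render(std::size_t width, std::size_t height,
                          unsigned threadCount) {
    std::vector<float> image(width * height);
    std::vector<std::thread> workers;
    std::size_t rowsPerThread = height / threadCount;
    for (unsigned t = 0; t < threadCount; ++t) {
        std::size_t yBegin = t * rowsPerThread;
        // The last thread absorbs any leftover rows.
        std::size_t yEnd = (t + 1 == threadCount) ? height
                                                  : yBegin + rowsPerThread;
        workers.emplace_back([&image, width, yBegin, yEnd] {
            for (std::size_t y = yBegin; y < yEnd; ++y)
                for (std::size_t x = 0; x < width; ++x)
                    image[y * width + x] = shade(x, y);
        });
    }
    for (auto& w : workers) w.join();
    return image;
}
```

With 800x600 and 4 threads, each worker shades 150 full rows.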

Given the future direction of computing, threads will be playing an ever larger part in programming of any sort.
Overhead costs of threading are very design dependent. Multiplying large matrices could be a good multithreaded task or a complete horror (overhead that makes it slower than the single-threaded version); it entirely depends on how you code it. For an idea of fine-grained threading, check out TBB or similar libraries. Coarse-grained threading (e.g. AI as its own thread) works reasonably well, but it doesn't scale up or down: with fewer processors than threads you get overhead without benefit, and with more processors than threads the extra cores go unused. That may not matter for your case, though. Dr. Dobb's has several good articles, as does Herb Sutter's website.
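One way to frame the "at what size is threading worth it" question for the matrix example is to gate the parallel path behind a size threshold. The cutoff of 64 and the thread count of 4 below are illustrative guesses, not measured values; you would profile on your own hardware to pick them:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Multiply two n x n row-major matrices. Below `threshold` rows, the
// thread-creation overhead tends to outweigh the win, so stay serial.
// `threshold` and `threadCount` are illustrative; measure to tune them.
std::vector<double> multiply(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t n,
                             std::size_t threshold = 64,
                             unsigned threadCount = 4) {
    std::vector<double> c(n * n, 0.0);
    auto rows = [&](std::size_t r0, std::size_t r1) {
        for (std::size_t i = r0; i < r1; ++i)
            for (std::size_t k = 0; k < n; ++k)
                for (std::size_t j = 0; j < n; ++j)
                    c[i * n + j] += a[i * n + k] * b[k * n + j];
    };
    if (n < threshold || threadCount < 2) {
        rows(0, n);  // small job: threads would cost more than they save
        return c;
    }
    std::vector<std::thread> pool;
    std::size_t chunk = n / threadCount;
    for (unsigned t = 0; t < threadCount; ++t) {
        std::size_t r0 = t * chunk;
        std::size_t r1 = (t + 1 == threadCount) ? n : r0 + chunk;
        pool.emplace_back(rows, r0, r1);  // disjoint band of rows per thread
    }
    for (auto& th : pool) th.join();
    return c;
}
```

Because each thread writes only its own band of output rows, no synchronization is needed inside the hot loop.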
The cost of thread creation can be reduced by the usage of thread pools. These consist of a number of threads that are suspended until needed.
My main problem is usually the synchronization part. The job should be able to grind on its own data for a while, or at least be isolated enough that mutexes aren't needed within the time-critical code. Another problem to keep in mind is cache trashing, which means that threads should not write to memory addresses next to each other (like N threads writing to every Nth element of an array).
So in the end, the decision is not a matter of instructions, but whether or not it can be isolated and whether or not you can do something useful while waiting for the jobs to finish.
Quote:Original post by Ohforf sake

Another problem to keep in mind is cache trashing, which means that threads should not write to memory addresses next to each other (like N threads writing to every Nth element of an array).


That is false sharing. Cache thrashing is anything that makes poor use of the cache.
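The usual fix for the false-sharing pattern described above is to give each thread's data its own cache line. A minimal sketch, assuming a typical 64-byte cache line (the `PaddedCounter` name and the workload are made up for illustration):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Per-thread counters. Without the alignas padding, several counters
// would share one cache line, and the threads would invalidate that
// line on every increment (false sharing). 64 bytes is a typical but
// not universal cache-line size.
struct alignas(64) PaddedCounter {
    long value = 0;
};

long countInParallel(unsigned threadCount, long perThread) {
    std::vector<PaddedCounter> counters(threadCount);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
        workers.emplace_back([&counters, t, perThread] {
            for (long i = 0; i < perThread; ++i)
                ++counters[t].value;  // each thread touches only its own line
        });
    for (auto& w : workers) w.join();
    long total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

The result is the same either way; the padding only changes how often the cores have to bounce a cache line between them. (Over-aligned types in std::vector require C++17.)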
I like the idea of passing "jobs" to a persistent thread. Basically, I can optimize my app for dual core functionality by having the worker thread idle (watching for abstract jobs in the queue) until a job class is inserted.

The way I currently idle my app is by sleeping for 1 millisecond. Is that ideal for this worker thread (when it doesn't have a job)?
Amateurs practice until they do it right. Professionals practice until they never do it wrong.
What you'd want the worker thread to do is:
* Lock the job queue mutex and look for a job.
* If there is one, unlock the queue and run the job.
* If there isn't one, use a condition variable plus the locked mutex to wait.

When adding a job to the queue, you'd lock the mutex, add the job, then unlock it and signal the CV to wake up your idle worker thread.

In a more complicated case, you'd want to look into "work stealing" or other load balancing schedulers. Multiple threads fighting over the job queue quickly limits the scalability of your worker threads. The smaller your chunks of work, the fewer threads you need before you start to see the problem.
I like that design! In my first iteration, I'd focus on having only one worker thread (outside the main program thread). That one thread would just do jobs as it receives them. While I could certainly add threads all over the place, I'm focusing on a "dual core design".
Amateurs practice until they do it right. Professionals practice until they never do it wrong.
Or you could just use something like Intel's Threading Building Blocks or, on VS2010/Windows only, Microsoft's Concurrency Runtime to take care of all the threading magic, leaving you to focus on task-based design.
TheBuzzSaw, I think you will like that setup. I use it in my application. The only tricky thing is handling things when they are done, if this is necessary. Since I'm working on a GUI app, getting the results to the GUI thread from the worker thread was kind of a pain. But otherwise, task-based parallelism is great. Look up Java's Executor framework to get an idea of the various features that exist.

Implementation-wise, if you're on Windows, you can use I/O completion ports to actually implement the queue. Writing a good work queue on Win32 is not a trivial task due to the lack of condition variables. Larry Osterman has a blog entry about this that demonstrates I/O completion ports. Alternatively, you can use a thread message queue. On Linux or OS X, you can use a condition variable when writing the queue.

This topic is closed to new replies.
