How to program in parallel using C++?

7 comments, last by forsandifs 13 years, 6 months ago
I have an enormous, heavy-duty for loop that I would like to parallelize.

How can I do this in C++?

I've read that there is a Parallel class with a For method, which would run the loop iterations in parallel automatically?

I've also read that having more threads than cores is bad because you get sequential inefficiency creeping in. So parallelizing every element in my for loop would be a bad idea... So, I guess using a parallel.for loop would not be efficient? Or does parallel.for automatically optimize the parallelization by assigning the appropriate number of elements to each thread so that there are not more threads than cores?

Further, would it really be as simple as:

Parallel.For(0, NumElementsInForLoop, delegate(int i)
{
    // Insert code in body of for loop here
});


?
What you've got there in your code example is C#, not C++.

For simple parallelization in C++ (assuming you didn't really mean C#), OpenMP is a straightforward, fairly widely applicable solution.
Quote:Original post by Sneftel
What you've got there in your code example is C#, not C++.

For simple parallelization in C++ (assuming you didn't really mean C#), OpenMP is a straightforward, fairly widely applicable solution.


Ah right thanks. Yeah, I do mean C++.

I'd rather not use another library API to do parallel programming with. Aren't there any methods or functions that are native to C++ or part of the STL? If not I will look into OpenMP. Thanks again.
The closest to the C# code you posted is probably Microsoft's PPL.

Portable solutions include OpenMP as Sneftel suggested and Intel's Threading Building Blocks.

Quote:I've also read that having more threads than cores is bad because you get sequential inefficiency creeping in

I'm not sure what you mean by sequential inefficiency, but having more *busy* threads than cores can lead to performance degradation due to additional context switching.

However, a sensible parallel library/runtime will not create a thread for each iteration of a loop, as thread creation is typically rather expensive. Instead, it will submit tasks (perhaps one task per K loop iterations) to an execution service where those tasks are distributed over the active threads.
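To make the "one task per K loop iterations" idea concrete, here is a minimal sketch of my own. It assumes a compiler with C++11 std::thread and std::atomic, and the function name chunked_parallel_for and the parameter k are made up for illustration. It is not how any particular library does it: a fixed set of threads repeatedly claims the next block of k iterations from a shared counter, so no thread is created per iteration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of any library): calls body(i) for every i in
// [0, count). Work is handed out in blocks of k iterations to a fixed number
// of worker threads.
void chunked_parallel_for(std::size_t count, std::size_t k,
                          const std::function<void(std::size_t)>& body)
{
    assert(k > 0);

    // Shared counter: index of the first iteration that nobody has claimed yet.
    std::atomic<std::size_t> next(0);

    auto worker = [&] {
        for (;;) {
            // Atomically claim the next block of k iterations.
            const std::size_t begin = next.fetch_add(k);
            if (begin >= count) return;  // nothing left to do
            const std::size_t end = std::min(count, begin + k);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };

    // hardware_concurrency() may return 0 when the core count is unknown.
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n; ++t) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();
}
```

A call would look like chunked_parallel_for(data.size(), 16, [&](std::size_t i) { data[i] *= 2; }); where data is whatever container you are processing. A small k balances uneven iterations better, while a large k reduces contention on the shared counter.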

In fact, I've found Microsoft's OpenMP implementation to be a little bit naive in this respect, as it tends to generate a number of coarse grained tasks equal to the number of cores available. So if one iteration takes a longer amount of time than the others, the system can end up waiting around with N-1 idle cores at the end of a parallel-for, for example.

Quote:
Further, would it really be as simple as:
[...]


That depends on whether you have any data dependencies between subsequent loop iterations.
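To illustrate what I mean by a data dependency, here is a small sketch with two made-up functions (add_elementwise and running_total are names I picked for the example). Both are written serially; the comments say which loop could be handed to a parallel-for as it stands and which could not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on a[i] and b[i], so the
// iterations can run in any order, or concurrently, with the same result.
std::vector<int> add_elementwise(const std::vector<int>& a,
                                 const std::vector<int>& b)
{
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}

// Loop-carried dependency: iteration i needs the total produced by
// iteration i - 1. Running these iterations concurrently as written would
// be a data race on `total` and would give wrong results.
std::vector<int> running_total(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    int total = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        total += in[i];
        out[i] = total;
    }
    return out;
}
```

If your loop looks like the first function, dropping the body into a parallel-for is usually fine. If it looks like the second, you would need to restructure the algorithm first.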

EDIT:

Quote:I'd rather not use another library API to do parallel programming with. Aren't there any methods or functions that are native to C++ or part of the STL?

No. Not yet, anyway. C++ doesn't even have a standardized notion of threads. That will be delivered with the next standard and will follow the Boost threads library quite closely. But an actual parallelization framework is unlikely to be delivered in the standard in the near future.
Quote:Original post by forsandifs
I'd rather not use another library API to do parallel programming with. Aren't there any methods or functions that are native to C++ or part of the STL? If not I will look into OpenMP. Thanks again.
OpenMP isn't really a library, at least not a separate one. It's built into the compiler.
Quote:Original post by the_edd
In fact, I've found Microsoft's OpenMP implementation to be a little bit naive in this respect, as it tends to generate a number of coarse grained tasks equal to the number of cores available. So if one iteration takes a longer amount of time than the others, the system can end up waiting around with N-1 idle cores at the end of a parallel-for, for example.


I have only used gcc's implementation of OpenMP, but I believe dynamic scheduling is available in all implementations of OpenMP, and it fixes the problem you are describing.
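For reference, this is roughly what I have in mind. It is only a sketch: uneven_work and fill_results are made-up names, and the chunk size of 8 is an arbitrary choice. With schedule(dynamic, 8), a thread asks for another 8 iterations whenever it runs out of work, instead of the range being split once up front.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical workload whose cost grows with i, so iterations are uneven.
long long uneven_work(int i)
{
    long long sum = 0;
    for (int j = 0; j <= i; ++j) sum += j;
    return sum;
}

void fill_results(std::vector<long long>& results)
{
    // The loop index is a signed int, which older OpenMP versions require.
    assert(results.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int n = static_cast<int>(results.size());

    // Each thread grabs 8 iterations at a time as it becomes free.
    #pragma omp parallel for schedule(dynamic, 8)
    for (int i = 0; i < n; ++i)
        results[i] = uneven_work(i);
}
```

You need to enable OpenMP when compiling (-fopenmp with gcc, /openmp with MSVC). Without the flag the pragma is ignored and the loop simply runs serially.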
OpenMP is great, and it makes implementing a parallel for loop EASY.
It really only requires you to add a preprocessor directive right before your for loop.
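Something like this, as a minimal sketch (scale_in_place is just a name I made up, and it assumes the iterations are independent of each other):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Multiplies every element by factor, sharing the iterations among threads.
void scale_in_place(std::vector<double>& values, double factor)
{
    // The loop index is a signed int, which older OpenMP versions require.
    assert(values.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int n = static_cast<int>(values.size());

    // The only OpenMP-specific line: split the loop below across threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        values[i] *= factor;
}
```

Build with OpenMP enabled (-fopenmp for gcc, /openmp for MSVC). If the flag is missing, the compiler ignores the pragma and you get an ordinary serial loop.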
Quote:Original post by alvaro
I have only used gcc's implementation of OpenMP, but I believe dynamic scheduling is available in all implementations of OpenMP, and it fixes the problem you are describing.


Ah yes, you're probably right.

I think I had heard that before and was meaning to revisit my experiments. I'd obviously since forgotten about it until now. Thanks for the reminder.
Thank you very much guys!

I settled for PPL because it seemed to have the simplest intro to parallelising a for loop that I could see (one page on msdn as opposed to a pdf or two covering many subjects).

Parallelising my for loop was as simple as:

#include <ppl.h>

Concurrency::parallel_for(int(0), size, [&](int i)
{
    // Body of for loop goes here, where i is the loop index.
});


The performance gain was huge. Makes my CPU go like mad though, kinda scary D: It's like it's running Prime95 or something O.o

Thanks again!

