forsandifs

How to program in parallel using C++?


I have an enormous, heavy-duty for loop that I would like to parallelize.

How can I do this in C++?

I've read that there is a Parallel class with a For method, which would run the loop in parallel automatically?

I've also read that having more threads than cores is bad because you get sequential inefficiency creeping in. So spawning a thread for every element of my for loop would be a bad idea... So I guess using a Parallel.For loop would not be efficient? Or does Parallel.For automatically optimize the parallelization by assigning an appropriate number of elements to each thread, so that there are no more threads than cores?

Further, would it really be as simple as:

Parallel.For(0, NumElementsInForLoop, delegate(int i)
{
    // Insert code in body of for loop here
});


?

What you've got there in your code example is C#, not C++.

For simple parallelization in C++ (assuming you didn't really mean C#), OpenMP is a straightforward, fairly widely applicable solution.
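For example, a minimal sketch of what an OpenMP parallel loop might look like, assuming each iteration is independent (the process function, the vector and the doubling are just placeholders for your own code; enable OpenMP with /openmp on MSVC or -fopenmp on GCC):

#include <vector>

void process(std::vector<double>& data)
{
    const int size = static_cast<int>(data.size());

    // The pragma asks the compiler/runtime to split the iterations
    // across a team of threads, typically one per core.
    #pragma omp parallel for
    for (int i = 0; i < size; ++i)
    {
        data[i] *= 2.0; // placeholder for the real loop body
    }
}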

Quote:
Original post by Sneftel
What you've got there in your code example is C#, not C++.

For simple parallelization in C++ (assuming you didn't really mean C#), OpenMP is a straightforward, fairly widely applicable solution.


Ah right, thanks. Yeah, I do mean C++.

I'd rather not use another library or API for parallel programming. Aren't there any methods or functions that are native to C++ or part of the STL? If not, I will look into OpenMP. Thanks again.

The closest to the C# code you posted is probably Microsoft's PPL.

Portable solutions include OpenMP as Sneftel suggested and Intel's Threading Building Blocks.
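As a rough illustration, a TBB parallel loop might look something like this minimal sketch (again, the process function, the vector and the doubling are only placeholders for your own code):

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstddef>
#include <vector>

void process(std::vector<double>& data)
{
    // parallel_for splits the range into chunks and hands them to
    // worker threads from TBB's internal thread pool.
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, data.size()),
        [&](const tbb::blocked_range<std::size_t>& range)
        {
            for (std::size_t i = range.begin(); i != range.end(); ++i)
                data[i] *= 2.0; // placeholder for the real loop body
        });
}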

Quote:
I've also read that having more threads than cores is bad because you get sequential inefficiency creeping in

I'm not sure what you mean by sequential inefficiency, but having more *busy* threads than cores can lead to performance degradation due to additional context switching.

However, a sensible parallel library or runtime will not create a thread for each iteration of a loop, as thread creation is typically rather expensive. Instead, it will submit tasks (perhaps one task per K loop iterations) to an execution service, where those tasks are distributed over the active threads.

In fact, I've found Microsoft's OpenMP implementation to be a little bit naive in this respect, as it tends to generate a number of coarse-grained tasks equal to the number of cores available. So if one iteration takes longer than the others, the system can end up waiting around with N-1 idle cores at the end of a parallel for, for example.

Quote:

Further, would it really be as simple as:
[...]


That depends on whether you have any data dependencies between subsequent loop iterations.

EDIT:

Quote:
I'd rather not use another library or API for parallel programming. Aren't there any methods or functions that are native to C++ or part of the STL?

No. Not yet, anyway. C++ doesn't even have a standardized notion of threads. That will be delivered with the next standard and will follow the Boost threads library quite closely. But an actual parallelization framework is unlikely to be delivered in the standard in the near future.
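For what it's worth, the Boost interface in question is quite small; here's a minimal sketch of launching and joining a single thread with Boost.Thread (the work function is just a placeholder):

#include <boost/thread.hpp>
#include <iostream>

void work()
{
    std::cout << "running on another thread\n"; // placeholder workload
}

int main()
{
    boost::thread t(work); // launch work() on a new thread
    t.join();              // wait for it to finish
    return 0;
}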

Quote:
Original post by forsandifs
I'd rather not use another library or API for parallel programming. Aren't there any methods or functions that are native to C++ or part of the STL? If not, I will look into OpenMP. Thanks again.
OpenMP isn't really a library, at least not a separate one. It's built into the compiler.

Quote:
Original post by the_edd
In fact, I've found Microsoft's OpenMP implementation to be a little bit naive in this respect, as it tends to generate a number of coarse-grained tasks equal to the number of cores available. So if one iteration takes longer than the others, the system can end up waiting around with N-1 idle cores at the end of a parallel for, for example.


I have only used gcc's implementation of OpenMP, but I believe dynamic scheduling is available in all implementations of OpenMP, and it fixes the problem you are describing.
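Requesting it is just a matter of adding a schedule clause to the loop pragma; a minimal sketch (the chunk size of 16, the process function and the loop body are arbitrary, illustrative placeholders):

#include <vector>

void process(std::vector<double>& data)
{
    const int size = static_cast<int>(data.size());

    // With schedule(dynamic, 16), idle threads grab the next chunk of 16
    // iterations as they finish, instead of being handed one fixed slice
    // up front, which mitigates the load imbalance described above.
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < size; ++i)
    {
        data[i] *= 2.0; // placeholder for the unevenly-costed loop body
    }
}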

Quote:
Original post by alvaro
I have only used gcc's implementation of OpenMP, but I believe dynamic scheduling is available in all implementations of OpenMP, and it fixes the problem you are describing.


Ah yes, you're probably right.

I think I had heard that before and was meaning to revisit my experiments. I'd obviously since forgotten about it until now. Thanks for the reminder.

Thank you very much guys!

I settled on PPL because it had the simplest intro to parallelising a for loop that I could find (one page on MSDN, as opposed to a PDF or two covering many subjects).

Parallelising my for loop was as simple as:

#include <ppl.h>

Concurrency::parallel_for(int(0), size, [&](int i)
{
    // Body of for loop goes here, where i is the loop index.
});

The performance gain was huge. Makes my CPU go like mad though, kinda scary D: It's like it's running Prime95 or something O.o

Thanks again!
