The closest to the C# code you posted is probably Microsoft's PPL.
Portable solutions include OpenMP, as Sneftel suggested, and Intel's Threading Building Blocks.
Quote:I've also read that having more threads than cores is bad because you get sequential inefficiency creeping in
I'm not sure what you mean by sequential inefficiency, but having more *busy* threads than cores can lead to performance degradation due to additional context switching.
However, a sensible parallel library/runtime will not create a thread for each iteration of a loop, as thread creation is typically rather expensive. Instead, it will submit tasks (perhaps one task per K loop iterations) to an execution service, where those tasks are distributed over the active threads.
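To make the "one task per K loop iterations" idea concrete, here is a minimal sketch of how an iteration range might be cut into tasks. The helper name make_chunks is hypothetical and not taken from any particular library; a real runtime would then hand each range to its worker threads rather than just returning the list.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from any library): splits the iteration space
// [0, n) into half-open ranges [begin, end) of at most k iterations each.
// Each range stands for one task a runtime could give to a worker thread.
std::vector<std::pair<std::size_t, std::size_t> >
make_chunks(std::size_t n, std::size_t k)
{
    assert(k > 0); // a chunk must contain at least one iteration

    std::vector<std::pair<std::size_t, std::size_t> > chunks;
    for (std::size_t begin = 0; begin < n; begin += k) {
        // The last chunk may be shorter than k.
        std::size_t end = (n - begin < k) ? n : begin + k;
        chunks.push_back(std::make_pair(begin, end));
    }
    return chunks;
}
```

For example, make_chunks(10, 4) yields the three ranges [0, 4), [4, 8) and [8, 10), so ten iterations become three tasks instead of ten threads.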
In fact, I've found Microsoft's OpenMP implementation to be a little bit naive in this respect, as it tends to generate a number of coarse-grained tasks equal to the number of cores available. So if one task takes longer than the others, the system can end up waiting at the end of a parallel-for with N-1 cores idle, for example.
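One way to work around that kind of imbalance in OpenMP is to request a dynamic schedule with a smaller chunk size, so that threads which finish early pick up more work. The following is only a sketch: uneven_work and run_dynamic are made-up names, the workload is artificial, and whether it actually helps depends on the implementation and on how uneven the iterations really are.

```cpp
#include <cassert>
#include <vector>

// Hypothetical workload: the cost grows with i, so iterations are uneven.
long uneven_work(int i)
{
    long sum = 0;
    for (int j = 0; j < i * 1000; ++j)
        sum += j % 7;
    return sum;
}

// Hypothetical wrapper around a parallel-for with a dynamic schedule.
std::vector<long> run_dynamic(int n, int chunk)
{
    assert(n >= 0 && chunk > 0);
    std::vector<long> results(n);

    // schedule(dynamic, chunk): iterations are handed out in blocks of
    // `chunk`, and a thread that finishes its block asks for the next one.
    // Without OpenMP enabled, the pragma is ignored and the loop runs
    // sequentially with the same results.
    #pragma omp parallel for schedule(dynamic, chunk)
    for (int i = 0; i < n; ++i)
        results[i] = uneven_work(i); // each iteration writes its own element

    return results;
}
```

Smaller chunks balance the load better but add scheduling overhead, so the chunk size is something to measure rather than guess.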
Quote:
Further, would it really be as simple as:
[...]
That depends on whether you have any data dependencies between subsequent loop iterations.
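To illustrate the difference, here is a small sketch using OpenMP. Both function names are invented for the example. In the first loop every iteration touches only its own element, so the iterations can run in any order; in the second, each iteration needs the result of the previous one, so simply adding a parallel-for would give wrong answers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so the loop can be
// parallelized as-is.
std::vector<int> squares(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    const int n = static_cast<int>(in.size());
    assert(n >= 0); // sketch assumes the size fits in an int

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * in[i];

    return out;
}

// Loop-carried dependency: iteration i needs the total from iteration i-1.
// This loop is deliberately left sequential.
std::vector<int> running_sum(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    int total = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        total += in[i];
        out[i] = total;
    }
    return out;
}
```

A loop like the second one can sometimes be restructured into a parallel form, but that takes a different algorithm, not just a pragma.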
EDIT:
Quote:I'd rather not use another library API to do parallel programming with. Aren't there any methods or functions that are native to C++ or part of the STL?
No. Not yet, anyway. C++ doesn't even have a standardized notion of threads. That will be delivered with the next standard and will follow the Boost threads library quite closely. But an actual parallelization framework is unlikely to make it into the standard in the near future.