Is C++11's library a magic bullet for ALL multicore/multithread programming?

8 comments, last by JoeJ 7 years, 10 months ago

I am under the impression that std::async and std::mutex can solve every basic-to-intermediate multi-threading problem.

In this scope, the goals of multi-threading are to:

1. increase performance on multi-core CPUs (use all CPU cores)

2. avoid CPU stalls, e.g. L2 cache misses, especially when executing a lot of indirection (a->b->c->d)

Here is an example:


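   #include <future>
   #include <vector>

   // Assumes A::doThis() and B::doThis() return void and take no arguments.
   std::vector<std::future<void>> v;
   v.push_back(std::async(&A::doThis, pointerToAInstance));
   v.push_back(std::async(&B::doThis, pointerToBInstance));
   v[0].wait();   // block until A's work is done
   v[1].wait();   // block until B's work is done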

This can easily be applied to a scenario where I have a long array of objects that each need expensive processing.

If I want to split the jobs further, I can call std::async within std::async within std::async ... which looks nice.
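Splitting a long array "again and again" with nested std::async calls can be sketched as a recursive divide-and-conquer. This is a minimal sketch, assuming the per-element work is independent; the function name `parallelSum` and the cutoff `kGrain` are hypothetical. The cutoff matters: without it, every level of recursion creates another task, and an implementation may well spawn one thread per task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical cutoff: ranges at or below this size are processed serially,
// so the recursion does not create an unbounded number of tasks.
constexpr std::size_t kGrain = 10000;

// Recursively splits [first, last) in half, hands the left half to
// std::async and processes the right half on the current thread.
long long parallelSum(const std::vector<int>& data, std::size_t first, std::size_t last)
{
    assert(first <= last && last <= data.size());
    if (last - first <= kGrain) {
        // Small enough: do the "expensive" work serially.
        return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(first),
                               data.begin() + static_cast<std::ptrdiff_t>(last), 0LL);
    }
    const std::size_t mid = first + (last - first) / 2;
    // std::launch::async forces the left half onto another thread instead of
    // letting the implementation defer it until get() is called.
    std::future<long long> left =
        std::async(std::launch::async, parallelSum, std::cref(data), first, mid);
    const long long right = parallelSum(data, mid, last);
    return left.get() + right;
}
```

Usage: `parallelSum(v, 0, v.size())`. The summation only stands in for real per-object processing; the splitting pattern is the point.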

By the way, I heard that OpenMP is a popular library for multicore programming (https://en.wikipedia.org/wiki/OpenMP).

However, while researching it, I also found that most of its features can be replaced by the simpler std::async and std::mutex.

OpenMP is a heavyweight library that just provides syntactic sugar, right?

I feel that I am missing something.
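For comparison, here is a hedged sketch of the same loop written both ways. The `#pragma omp parallel for` line is standard OpenMP; if OpenMP is not enabled (e.g. GCC's `-fopenmp`), the pragma is ignored and the loop simply runs serially. The std::async version has to choose the chunk count, compute the ranges and join the futures by hand. OpenMP hides that bookkeeping, and typical OpenMP runtimes also keep their worker threads alive between parallel regions and offer clauses such as `schedule(dynamic)`, so it is arguably more than syntactic sugar. The functions `process`, `processAllOpenMP` and `processAllAsync` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the expensive per-object work.
inline double process(double x) { return std::sqrt(x) * 2.0; }

// OpenMP version: the compiler and runtime split the iterations.
void processAllOpenMP(std::vector<double>& data)
{
    // Signed loop index: older OpenMP versions require one.
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        data[static_cast<std::size_t>(i)] = process(data[static_cast<std::size_t>(i)]);
}

// std::async version: chunking and joining are done by hand.
void processAllAsync(std::vector<double>& data, std::size_t chunks)
{
    assert(chunks > 0);
    const std::size_t n = data.size();
    std::vector<std::future<void>> futures;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t first = n * c / chunks;
        const std::size_t last = n * (c + 1) / chunks;
        futures.push_back(std::async(std::launch::async, [&data, first, last] {
            for (std::size_t i = first; i < last; ++i)
                data[i] = process(data[i]);
        }));
    }
    for (auto& f : futures)
        f.wait();  // join all chunks before returning
}
```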

Edit

@Josh Petrie and @SeanMiddleditch, thanks for the useful information.

Here is my specific experiment.

My application:

I created a game engine with a component-based architecture.

I have little experience with multi-threading, but the architecture tends to support it inherently: it divides the work among many systems (about 13).

Each system manages and accesses at most 2-3 component types (the whole game has 20).

Therefore, most of the systems can be executed in parallel.

Step 1

After learning a bit about std::async and std::mutex, I used these two to parallelize it.

I created a std::future for each system, then called wait() on them at the end of each timestep.

The change is very conservative and mindful of concurrent modification and execution order, e.g. physics cannot be grouped with graphics.

Even this rough change (4 chosen systems run in parallel; the other systems still execute one by one as before) increased my program's performance from 40 to 50 fps.
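Step 1 can be sketched roughly like this. It is a minimal sketch, assuming the chosen systems share no components with each other or with the systems that stay sequential; the `System` interface, its `update(float)` signature and `CounterSystem` are hypothetical, not the actual engine.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical system interface; a real engine will differ.
struct System {
    virtual ~System() = default;
    virtual void update(float dt) = 0;
};

// Hypothetical example system that only touches its own data.
struct CounterSystem : System {
    int ticks = 0;
    void update(float) override { ++ticks; }
};

// Launches one future per independent system, runs the remaining systems
// one by one on the calling thread, and waits for everything at the end of
// the timestep. Assumes the sequential systems do not touch the components
// used by the parallel ones.
void runTimestep(const std::vector<System*>& parallel,
                 const std::vector<System*>& sequential, float dt)
{
    std::vector<std::future<void>> pending;
    for (System* s : parallel) {
        assert(s != nullptr);
        pending.push_back(std::async(std::launch::async, [s, dt] { s->update(dt); }));
    }
    for (System* s : sequential)
        s->update(dt);
    for (auto& f : pending)
        f.wait();  // end-of-timestep barrier
}
```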

Step 2

When I found a bottleneck in a certain system, I divided all components of that type into several groups (about 4), then processed each group on its own thread.

If a group has to access shared data, which is not very often, I use std::mutex.

Again, after modifying only the single most CPU-intensive function, overall performance went from 50 to 65 fps.
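Step 2 (split one component type into about four groups, lock only for the rare shared access) might look like the following. This is a minimal sketch; `Particle`, `SharedStats` and `updateParticles` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical component type.
struct Particle { float x = 0.0f, vx = 1.0f; };

// Hypothetical shared data that several groups may write to; guarded by a mutex.
struct SharedStats {
    std::mutex m;
    int outOfBounds = 0;
};

// Splits the components into `groups` contiguous ranges and updates each
// range on its own task. Per-element work touches only that element, so it
// needs no lock; only the rare write to shared data takes the mutex.
void updateParticles(std::vector<Particle>& ps, SharedStats& stats,
                     float dt, std::size_t groups)
{
    assert(groups > 0);
    const std::size_t n = ps.size();
    std::vector<std::future<void>> tasks;
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t first = n * g / groups;
        const std::size_t last = n * (g + 1) / groups;
        tasks.push_back(std::async(std::launch::async, [&ps, &stats, dt, first, last] {
            for (std::size_t i = first; i < last; ++i) {
                ps[i].x += ps[i].vx * dt;
                if (ps[i].x > 100.0f) {
                    std::lock_guard<std::mutex> lock(stats.m);  // rare shared access
                    ++stats.outOfBounds;
                }
            }
        }));
    }
    for (auto& t : tasks)
        t.wait();
}
```

A common alternative is to give each group its own local counter and add them up after wait(), which removes the lock entirely.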

These two experiments gave me the feeling that the speedup is free. It is so easy that it looks like a trap.

The reason might be that I had spent a few weeks optimizing my program with pooling and gained only 5%, while this multi-core work took only one day to learn and code.

Edit 2: Yay, thanks everyone for the great knowledge! I should have posted this sooner. Spamming +1.


No.

Nothing is a magic bullet for all classes of any problem domain, especially something designed to be fairly general-purpose, like a good chunk of the C++ standard library. Specialized libraries and solutions will always exist for niche scenarios, and they will often outperform general-purpose tools in those scenarios. You will need to profile and analyze to be sure; you haven't provided enough information about your specific problem domain.

Just to provide some details, here are some of the very important things that C++'s current standard library is missing with regard to concurrency and parallelism:

- thread pools / worker threads
- concurrent data structures
- smart parallel-for
- coroutines / cooperative scheduling
- message channels
- semaphores / benaphores
- smart spinlocks
- heterogeneous architecture dispatch
- stack size control
- processor affinity
- thread debug naming
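As an illustration of one item on that list: C++11 has no semaphore (C++20 later added std::counting_semaphore), so one is often built from std::mutex and std::condition_variable. This is a minimal sketch; the class name `Semaphore` is hypothetical, and it is not tuned for performance, which is exactly the concern raised in this post.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore built only from C++11 primitives.
class Semaphore {
public:
    explicit Semaphore(int initial = 0) : count_(initial) { assert(initial >= 0); }

    // Increments the count and wakes one waiter, if any.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Blocks until the count is positive, then decrements it.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Decrements the count if it is positive; never blocks.
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};
```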

I'm sure there are others that I've missed. The above are just the things that a "typical" AAA game in 2016 might require, by the way.

While you can use std::async or the like to emulate several of the above features, you'll find it rather difficult to do so with acceptable performance.
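To give a feel for what emulating the first item involves, here is a minimal fixed-size thread pool, assuming a single shared queue guarded by one mutex and, to stay short, tasks that return int. `ThreadPool` and `submit` are hypothetical names; a production pool (work stealing, per-thread queues, affinity, debug naming) needs considerably more.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool: N workers pull tasks from one shared queue.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue, then exit
    }

    // Queues a callable returning int and returns a future for its result.
    std::future<int> submit(std::function<int()> fn) {
        // packaged_task is move-only, but std::function needs a copyable
        // callable, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<int()>>(std::move(fn));
        std::future<int> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

Unlike calling std::async per task, the threads here are created once and reused, and the concurrency level is fixed by the constructor argument.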

Sean Middleditch – Game Systems Engineer – Join my team!

:) Thanks! I have edited the OP to provide more detail.

https://www.threadingbuildingblocks.org/ is a pretty cool library for parallelism. A year ago there was a post stating that the library is no longer under the GPL and can be freely used in commercial projects; however, I cannot find a version of the library on the internet with such a license.

Otherwise I really like TBB ^_^

They have added the libstdc++ runtime exception. As I understand it, that should allow you to link everything together (even completely statically) without any license side effects on your code.

Regarding std::async, be aware that with the default launch policy it does not guarantee that the function executes on a separate thread. The function is executed asynchronously, which only means that it does not necessarily run at the point where the call appears in the source code, and that the call does not stall the program's flow until the function has completed. That's all. It does not necessarily mean that execution takes place concurrently with the main thread or with any other async functions. Strictly speaking, it is legal for std::async to call the function synchronously when you call wait() (which does stall the program's flow); only passing std::launch::async explicitly rules that out.

Indeed, that is a good thing, too. Naively throwing a near-infinite number of threads at a problem is almost never a good solution (it works on a GPU, which is explicitly made for that purpose), and it often reduces performance rather than improving it. Context switches and caches are two things to think of here, as is the cost of thread creation if threads are not taken from a pool.

A good implementation of std::async would arguably work off its functions from a queue using a pool of workers at a concurrency level less than the number of cores. But you don't have a guarantee of that: it might just as well spawn a thread every time, spawn one extra thread regardless of how many tasks there are, execute the function inside wait(), or any combination of these.
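The difference between launch policies can be observed directly. In this small sketch, a task launched with std::launch::deferred runs lazily on the thread that calls get() or wait(), while std::launch::async requires it to run as if on a new thread; the default policy (async | deferred) lets the implementation pick either. The helper names `runsOnCallerThread` and `isDeferred` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns true if a task launched with `policy` executed on the calling thread.
bool runsOnCallerThread(std::launch policy)
{
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    return f.get() == caller;
}

// Returns true if the future reports its task as deferred, i.e. nothing has
// run yet and the task will only run inside wait() or get().
bool isDeferred(std::launch policy)
{
    std::future<void> f = std::async(policy, [] {});
    const bool deferred =
        f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
    f.wait();
    return deferred;
}
```

With std::launch::deferred both helpers return true; with std::launch::async both return false. For the default policy neither outcome is guaranteed, which is the point made above.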

Also be aware that just because a library provides threading primitives does not mean that it takes care of all the little gotchas of multithreaded programming.

I started a similar thread a while back: http://www.gamedev.net/topic/679252-multithreading-c11-vs-openmp/

The little job system I made there may have some advantages over your current approach:

You divide stuff into four 'groups', but what if one group finishes early and another takes very long? The job system balances itself here (if the jobs are small enough).

Also, if those groups do very different things, they need to access different memory. With the job system it's easy to parallelize a single workload, which makes better use of the cache.
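The self-balancing behaviour of a job system can be sketched with a shared atomic job counter: each worker repeatedly claims the next small batch, so a worker that finishes early simply claims more. This is a minimal sketch under that assumption, not the job system from the linked thread; `parallelForDynamic` and its `batch` parameter are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Calls body(i) for every i in [0, count) using `workers` threads.
// Work is handed out in batches from a shared atomic counter, so fast
// threads claim more batches and the load balances itself.
void parallelForDynamic(std::size_t count, std::size_t batch, unsigned workers,
                        const std::function<void(std::size_t)>& body)
{
    assert(batch > 0 && workers > 0);
    std::atomic<std::size_t> next(0);
    auto worker = [&] {
        for (;;) {
            const std::size_t first = next.fetch_add(batch);
            if (first >= count) return;  // no batches left
            const std::size_t last = std::min(first + batch, count);
            for (std::size_t i = first; i < last; ++i)
                body(i);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 1; t < workers; ++t)
        threads.emplace_back(worker);
    worker();  // the calling thread participates too
    for (auto& th : threads)
        th.join();
}
```

The batch size trades scheduling overhead (more fetch_add calls for small batches) against balance (large batches bring back the "one group finishes late" problem).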

JoeJ, great link! (I am printing your code.)

I agree that my approach (in the OP) is sub-optimal; it is only a prototype to get a taste of multicore power. :D

Edit: I took a look at JoeJ's topic http://www.gamedev.net/topic/679252-multithreading-c11-vs-openmp/

"... Using system-asynchronous calls is typically the easiest route to go, and the least error prone. ..." (by frob)

Does that mean std::async?

