hyyou

Is C++11's library a magic bullet for ALL multicore/multithread programming?


Recommended Posts

I am under the impression that std::async and std::mutex can solve every basic-to-intermediate multithreading problem.

 

The multithreading (in this scope) is meant to

 

1. increase performance on multi-core CPUs (i.e. use every core)

2. avoid CPU stalls, e.g. L2 cache misses, especially when executing a ton of indirection (a->b->c->d)

 

Here is an example :-

   std::vector<std::future<void>> v;
   v.push_back(std::async( &A::doThis, pointerToAInstance ));
   v.push_back(std::async( &B::doThis, pointerToBInstance ));
   v[0].wait();
   v[1].wait();

This can easily be applied to a scenario where I have a long array of objects that need expensive processing.
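To make that concrete, here is a minimal sketch of the long-array scenario. The names `processChunk`/`processAll` and the chunk count are purely illustrative, not from any real engine:

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Expensive per-element work (a stand-in for the real processing).
long long processChunk(const std::vector<int>& data, std::size_t begin, std::size_t end)
{
    long long sum = 0;
    for (std::size_t i = begin; i < end; ++i)
        sum += static_cast<long long>(data[i]) * data[i];
    return sum;
}

// Split the array into numTasks chunks and process each via std::async.
long long processAll(const std::vector<int>& data, std::size_t numTasks)
{
    std::vector<std::future<long long>> futures;
    const std::size_t chunk = (data.size() + numTasks - 1) / numTasks;
    for (std::size_t t = 0; t < numTasks; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min(data.size(), begin + chunk);
        if (begin >= end) break;
        // std::launch::async forces each task onto its own thread.
        futures.push_back(std::async(std::launch::async, processChunk,
                                     std::cref(data), begin, end));
    }
    long long total = 0;
    for (auto& f : futures)
        total += f.get();  // get() waits for and returns each chunk's result
    return total;
}
```

Note that `std::cref` is needed so the vector is passed by reference; `std::async` copies its arguments by default.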

 

If I want to split jobs again and again, I can nest std::async within std::async within std::async ...... looks nice.

 

By the way, I heard that OpenMP is a popular library for multicore programming (https://en.wikipedia.org/wiki/OpenMP).

However, as I researched, I found that most of its features can be replaced by the simpler std::async and std::mutex.

OpenMP is a heavyweight library that just provides syntactic sugar, right?
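For comparison, here is a hedged sketch of what that sugar buys you (`sumSquares` is an illustrative name). The `reduction` clause handles the per-thread accumulation and final combine that you would otherwise have to code by hand with std::async or a std::mutex. Compile with `-fopenmp`; without it the pragma is simply ignored and the loop runs serially:

```cpp
#include <vector>

// Parallel sum-of-squares as an OpenMP work-sharing loop.
// Each thread accumulates into its own private copy of `total`;
// the reduction(+:total) clause combines them at the end.
long long sumSquares(const std::vector<int>& data)
{
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
        total += static_cast<long long>(data[i]) * data[i];
    return total;
}
```

Beyond this, OpenMP also offers scheduling policies (static, dynamic, guided) and a managed thread pool, which plain std::async does not guarantee.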

 

I feel that I missed something.

 

Edit

 

@Josh Petrie and @SeanMiddledtich , thanks for the useful information.

Here is my specific experiment :-

 

My specific application is:-

 

I created my component-based-architecture game-engine.  

I have little experience with multi-threading, but the architecture tends to support it inherently - it divides the work into many systems (about 13).

Each system manages and accesses at most 2-3 components (the whole game has 20 components).

Therefore, most of the systems can be executed in parallel.

 

Step 1

 

After I learned a bit of std::async and std::mutex, I used these two facilities to parallelize it.

I created a std::future for each system, then wait() on them at the end of each timestep.

 

The modification is very conservative and highly aware of concurrent modification and the order of execution, e.g. physics cannot be grouped with graphics.

Just this rough modification (4 chosen systems run in parallel; the other systems are executed one by one as before) increased my program's performance from 40 to 50 fps.
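The per-timestep pattern described in step 1 might look roughly like this (a minimal sketch, not the actual engine code; `runTimestep` and the use of `std::function` are assumptions for illustration):

```cpp
#include <functional>
#include <future>
#include <vector>

// Launch each independent system as a future, then join them all,
// so nothing from the next timestep starts before this one finishes.
void runTimestep(const std::vector<std::function<void()>>& independentSystems)
{
    std::vector<std::future<void>> pending;
    for (const auto& system : independentSystems)
        pending.push_back(std::async(std::launch::async, system));
    for (auto& f : pending)
        f.wait();  // barrier at the end of the timestep
}
```

Systems that must run in a fixed order (e.g. physics before graphics) would simply be called sequentially outside this helper.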

 

Step 2

 

When I found a bottleneck in a certain system, I divided all components of a certain type into groups (about 4), then processed each group on its own thread.

When a group has to access some shared data, which is not often, I used std::mutex.

Again, after modifying the single most CPU-intensive function, overall performance went from 50 to 65 fps.
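Step 2 could be sketched as follows, under stated assumptions: the "components" are reduced to plain ints, and `processGroup`/`processInGroups` are made-up names. The key idea is doing the bulk of the work lock-free per thread and taking the mutex only briefly for the shared data:

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

struct Shared
{
    long long  total = 0;
    std::mutex m;      // guards total
};

// Thread-local work first, then one short critical section.
void processGroup(const std::vector<int>& group, Shared& shared)
{
    long long local = 0;
    for (int c : group)
        local += c;                          // no lock needed here
    std::lock_guard<std::mutex> lock(shared.m);
    shared.total += local;                   // brief shared access
}

long long processInGroups(const std::vector<int>& components, std::size_t numGroups)
{
    // Round-robin the components into groups.
    std::vector<std::vector<int>> groups(numGroups);
    for (std::size_t i = 0; i < components.size(); ++i)
        groups[i % numGroups].push_back(components[i]);

    Shared shared;
    std::vector<std::future<void>> futures;
    for (auto& g : groups)
        futures.push_back(std::async(std::launch::async, processGroup,
                                     std::cref(g), std::ref(shared)));
    for (auto& f : futures)
        f.wait();
    return shared.total;
}
```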

 

These two experiments created the feeling that this speedup is free.   It is so easy ... it looks like a trap.

The reason might be that I had spent a few weeks trying to optimize my program with pooling for only a 5% performance increase, while this multi-core approach took only one day to learn and code.

 

Edit2: Yay, thanks everyone for the great knowledge! I should have posted this sooner.  Spamming +1s.

Edited by hyyou


https://www.threadingbuildingblocks.org/ - this is a pretty cool library for parallelism. A year ago there was a post stating that the library is no longer under the GPL license and can be freely used for commercial projects; however, I cannot find a version of the library on the internet with such a license.

Otherwise I really like tbb ^_^

Edited by imoogiBG

They have added the libstdc++ runtime exception. As I understand it, that should allow you to link everything together (even completely statically) without any licensing side effects on your code.

Regarding std::async, you should be aware that it does not guarantee execution in a separate thread. The function is executed asynchronously - that is, not at the point where its name appears in the source code, and it does not stall the program's flow until it has completed. That's all. It does not necessarily mean that execution takes place concurrently with the main thread or with any other async functions. Strictly speaking, it is legal for std::async to synchronously call the function when you call wait() - which does stall the program's flow.

Indeed, that is a good thing, too. Naively throwing a near-infinite number of threads at a problem is almost never a good solution (it works on a GPU, which is explicitly made for that purpose), and it often reduces performance rather than improving it. Context switches and caches are two things to think of here, but also thread creation if you do not pull threads from a pool.

A good implementation of std::async would arguably work off its functions from a queue using a pool of workers at a concurrency level no greater than the number of cores. But you don't have a guarantee for that: it might as well simply spawn a thread every time, it might spawn one extra thread regardless of how many tasks there are, it might execute inside wait(), or any other combination.
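The launch-policy point can be demonstrated directly. This is a small sketch (`runOn` is an illustrative helper, not a standard facility): with `std::launch::deferred` the task is guaranteed to run on the thread that calls get(), while `std::launch::async` guarantees a new thread. The default policy (`async | deferred`) lets the implementation pick either.

```cpp
#include <future>
#include <thread>

// Run a trivial task under the given launch policy and report
// which thread it actually executed on.
std::thread::id runOn(std::launch policy)
{
    auto fut = std::async(policy, [] { return std::this_thread::get_id(); });
    return fut.get();  // with deferred, the lambda runs here, on the caller
}
```

Comparing the returned id against `std::this_thread::get_id()` shows whether the work stayed on the calling thread.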


Also be aware that just because a library provides threading primitives doesn't mean it takes care of all the little gotchas of multithreaded programming.


I've had a similar thread a while back: http://www.gamedev.net/topic/679252-multithreading-c11-vs-openmp/

 

The little job system I made there may have some advantages over your current approach:

You divide the work into four 'groups' - but what if one group finishes early and another takes very long? The job system balances itself here (if the jobs are small enough).

Also, if those groups do very different things, they access different memory. With the job system it is easy to parallelize a single workload to utilize the cache better.
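The self-balancing idea can be sketched in a few lines (this is in the spirit of the linked job system, not its actual code; `runJobs` is a made-up name). Workers pull the next job index from a shared atomic counter, so a worker stuck on a slow job never leaves the others idle:

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each worker repeatedly claims the next unprocessed job via an
// atomic counter, so the load balances itself across workers.
void runJobs(const std::vector<std::function<void()>>& jobs, unsigned numWorkers)
{
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (std::size_t i = next++; i < jobs.size(); i = next++)
            jobs[i]();   // claim and run one job at a time
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < numWorkers; ++w)
        pool.emplace_back(worker);
    for (auto& t : pool)
        t.join();
}
```

A real job system would reuse a persistent worker pool instead of spawning threads per batch, but the claiming mechanism is the same.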


JoeJ, great link (I am printing your code).

I agree that my approach (OP) is sub-optimal; it is only a prototype to get a taste of the multicore power.   :D

Edit: I took a look at JoeJ's topic http://www.gamedev.net/topic/679252-multithreading-c11-vs-openmp/

.... Using system-asynchronous calls is typically the easiest route to go, and the least error prone.   .... (by frob)

Does that mean std::async?

Edited by hyyou
