vector push_back creates error when multi threading

Recommended Posts

noatom    927
void doit(std::vector<int>& d){
	std::mutex m;
	{
		std::lock_guard<std::mutex> guard(m);
		for (int i = 0; i < 10000; i++)
			d.push_back(15);
	}
}

If I execute that function on 2 threads, passing a vector from the main app thread by reference, then I get all kinds of heap & memory errors.

But why?  The vector d is locked when that push_back takes place. 

 

BitMaster    8651
No. Every time you call doit, the function creates its own mutex. As written, it is completely useless and just wastes CPU cycles. No blocking will ever happen.

You need to do something like this:

std::mutex my_global_doit_mutex;

void doit(std::vector<int>& d)
{
   std::lock_guard<std::mutex> guard(my_global_doit_mutex);
   for (int i = 0; i < 10000; i++)
      d.push_back(15);
}
This is the simplest possible example.

SmkViper    5396

I'm not too much of an expert on threading, but essentially it would be better in your example above to handle the lock like this:



void doit(std::vector<int>& d){
    static std::mutex m;

    // Build the data outside the lock so other threads aren't blocked
    std::vector<int> dTemp;
    for (int i = 0; i < 10000; i++)
        dTemp.push_back(15);

    // std::vector has no operator+=; append with insert instead
    std::lock_guard<std::mutex> guard(m);
    d.insert(d.end(), dTemp.begin(), dTemp.end());
}

Side note on this - if you're using an MS compiler older than 2015, or any other compiler that doesn't conform to C++11, this will not work: the initialization of the static mutex won't be thread-safe if two threads call the function for the first time at the same time.

(Look for your compiler's implementation of "Magic Statics")

On non-conforming compilers you'll want to move the variable out of the function entirely.

swiftcoder    18437


It would all depend on how much contention there is for the doit function in real code. The threads could be doing a billion other things not controlled by that (or any) lock.

Yes, but the point is that the caller won't know whether or not anyone else is calling (because they can't control who has a reference to the mutex). And even beyond that, your minimum unit of parallelisation is now 1.

 

If you really have a solitary resource (and very few resources are actually singletons in practice), then a more nuanced approach would be to provide a queue of requests, and a worker thread executing on the queue, so that at least you can let N threads past the call site (where N is the maximum length of the queue).

iMalc    2466

And ideally, you pass the mutex as a parameter.

 

Statics/globals are the enemy of multi-threading, and having that singular mutex pretty much ensures that you are not, in fact, parallelising much of anything.

Or the better option would be for this doit function to be a member function instead, and the mutex is a member variable.

That way the caller can't pass in the wrong mutex.

swiftcoder    18437

Or the better option would be for this doit function to be a member function instead, and the mutex is a member variable.
That way the caller can't pass in the wrong mutex.

(we're pretty far off-topic at this point, and I should probably create a new thread, but with apologies to the OP...)

You've now gone the classic OOP route, and encapsulated the mutex such that the caller can never know who else has access to it. This might be fine for an inner class, or a class which is used by a single consumer. For any shared object though, this is about as bad as the global mutex.

Data is important, people. Always know where your data is, who has access to it, what operations they are performing on it. Hiding it all away inside a shared object is like sticking your head in the sand.

SmkViper    5396

Or the better option would be for this doit function to be a member function instead, and the mutex is a member variable.
That way the caller can't pass in the wrong mutex.

(we're pretty far off-topic at this point, and I should probably create a new thread, but with apologies to the OP...)

You've now gone the classic OOP route, and encapsulated the mutex such that the caller can never know who else has access to it. This might be fine for an inner class, or a class which is used by a single consumer. For any shared object though, this is about as bad as the global mutex.

Data is important, people. Always know where your data is, who has access to it, what operations they are performing on it. Hiding it all away inside a shared object is like sticking your head in the sand.


I'm confused.

If you already are sharing the data, why aren't you bundling the mutex with it? I can't think of any legitimate use for keeping them separate, since memory protected by two separate mutexes might as well not be locked at all.

Unless I've misread something entirely.

swiftcoder    18437


If you already are sharing the data, why aren't you bundling the mutex with it? I can't think of any legitimate use for keeping them separate since protecting the same memory location with two separate mutexes might as well not be locked at all.

If you bundle both the lock and the data together in one object, you've just said that "anyone with a reference to this object can access this at any time, but you can block indefinitely depending on the number of references".

 

That's perfectly fine if that is in fact your intent. But it's a very weird intent, and probably not what most people think of when they are trying to parallelise their system.

 

As you correctly surmise, the way out of this quandary is to stop sharing your data with the world. And what's the best way to enforce that? Quit sharing the mutex with the world.

 

If you encapsulate the mutex in the object, it's very easy to make a mistake and lock the mutex from unexpected places. If you have to explicitly pass both the object and the mutex, and the client has to explicitly lock the mutex... Suddenly you have a clear audit trail of who is taking locks and why.

SmkViper    5396

If you already are sharing the data, why aren't you bundling the mutex with it? I can't think of any legitimate use for keeping them separate since protecting the same memory location with two separate mutexes might as well not be locked at all.

If you bundle both the lock and the data together in one object, you've just said that "anyone with a reference to this object can access this at any time, but you can block indefinitely depending on the number of references".
 
That's perfectly fine if that is in fact your intent. But it's a very weird intent, and probably not what most people think of when they are trying to parallelise their system.
 
As you correctly surmise, the way out of this quandary is to stop sharing your data with the world. And what's the best way to enforce that? Quit sharing the mutex with the world.
 
if you encapsulate the mutex in the object, it's very easy to make a mistake and lock the mutex from unexpected places. If you have to explicitly pass both the object and the mutex, and the client has to explicitly lock the mutex... Suddenly you have a clear audit trail of who is taking locks and why.


I'm still not understanding you - though we're off-topic, so I'll take it to PMs

Juliean    7077


I'm still not understanding you - though we're off-topic, so I'll take it to PMs

 

I don't think this is too far OT - and at the least, someone coming across this later should be able to understand it.

 

So (if I even understand it correctly) what he means is that since the function already operates on independent data, the mutex should not be shared. Meaning that you can do stuff like:

void functionA()
{
    std::vector<int> vSomeData;
    std::thread worker(doit, std::ref(vSomeData));
    worker.join();
}

void functionB()
{
    std::vector<int> vSomeOtherData;
    std::thread worker(doit, std::ref(vSomeOtherData));
    worker.join();
}

Now say you have the mutex static/global/a class instance, whatever. functionA and functionB might not know about each other; they might lie in completely different parts of the program, or in a different module for all that matters. With a "shared" mutex, the thread of A and B could still block because the other thread is holding the shared mutex. Now if you do this:

void functionA()
{
    std::mutex someMutex;
    std::vector<int> vSomeData;
    std::thread worker(doit, std::ref(vSomeData), std::ref(someMutex));
    worker.join(); // join before the local mutex and vector go out of scope
}

void functionB()
{
    std::mutex otherMutex;
    std::vector<int> vSomeOtherData;
    std::thread worker(doit, std::ref(vSomeOtherData), std::ref(otherMutex));
    worker.join();
}

A and B run entirely independently of each other without blocking. You need a shared mutex if the function operates on shared data, but apparently that is better avoided. Please correct me if I'm wrong - I also heard of this here first (though it makes sense to me).

SmkViper    5396

A and B run entirely independently of each other without blocking. You need a shared mutex if the function operates on shared data, but apparently that is better avoided. Please correct me if I'm wrong - I also heard of this here first (though it makes sense to me).


Yeah, that's basically what I asked for clarification on.

Avoiding locking is great if you can get away with it, but I prefer to bundle locks with the shared data they lock rather than trying to make sure I passed the right parameter. (Assuming you can figure out the correct lock granularity)
