vector push_back creates error when multi threading

Started by
13 comments, last by SmkViper 8 years, 6 months ago

void doit(std::vector<int>& d){
	std::mutex m;
	{
		std::lock_guard<std::mutex> guard(m);
		for (int i = 0; i < 10000;i++)
			d.push_back(15);
	}
	
	
}

If I execute that function on 2 threads, with a vector from the main app thread as a reference, then I get all kinds of heap & memory errors.

But why? The vector d is locked when that push_back takes place.

Advertisement

Make the mutex global/static. The way you have it, the mutex exists in the scope of this function only, which is not shared between different threads, meaning there is no actual locking happening at all cross-threads.

Also, unless this is an example for how to produce this kind of errors - having a lock like this makes threads literally worthless. Any one thread that grabs the lock the first time will push its 10000 elements, while the other one has to wait. You could have literally used sequential coding in this case. Just though I add, don't know if you are aware of this.

To further explain, mutex is not a magical thing that exists to solve all threading problems, but a simple language construct. Essentially, mutex is equivalent to:


void doit(std::vector<int>& d){
    static bool locked = false;
    {
        while(locked)
        {
            sleep(1);
        }
        
        locked = true; // atomic
        for (int i = 0; i < 10000;i++)
            d.push_back(15);
        locked = false;
    }

Only that it is implemented more efficiently, for example by having the lock be atomic, so that two threads cannot interfere. This should make it pretty clear already why first of all you have to share the mutex explicitely between threads. Also you have to make sure you do not deadlock your other threads by locking too often or in the wrong places. I'm not too much of an expert on threading, but essencially it should be better in your above example to handle the lock like this:


void doit(std::vector<int>& d){
    static std::mutex m;
    {
        std::vector<int> dTemp;
        for (int i = 0; i < 10000;i++)
            dTemp.push_back(15);
            
        std::lock_guard<std::mutex> guard(m);
        d += dTemp;
    }
}

This allows all threads to do their work in parallel, and only write their output once the thread is finished. It might not be better, depending on how the appending of the array is implemented, but in more real-world applications (ie. the more complicated the work you do inside the threads is becoming), from my understanding this should yield better results. Someone correct me if I'm wrong.

No. Every time you call doit, the function creates its own mutex. As written, it is completely useless and just wastes CPU cycles. No blocking will ever happen.

You need to do something like this:


std::mutex my_global_doit_mutex;

void doit(std::vector<int>& d)
{
   std::lock_guard<std::mutex> guard(my_global_doit_mutex);
   for (int i = 0; i < 10000;i++)
      d.push_back(15);
}
This is the simplest possible example.

thanks for the answer. And no, im not using this code, it was just a very simple replication of the problem.

I'm not too much of an expert on threading, but essencially it should be better in your above example to handle the lock like this:




void doit(std::vector<int>& d){
    static std::mutex m;
    {
        std::vector<int> dTemp;
        for (int i = 0; i < 10000;i++)
            dTemp.push_back(15);
            
        std::lock_guard<std::mutex> guard(m);
        d += dTemp;
    }
}

Side note on this - if you're using a MS compiler older than 2015, or any other not-C++11-conforming compiler this will not work as the initialization of the mutex will not be correctly handled if two threads call the function for the first time at the same time.

(Look for your compiler's implementation of "Magic Statics")

On non-conforming compilers you'll want to move the variable out of the function entirely.

And ideally, you pass the mutex as a parameter.

Statics/globals are the enemy of multi-threading, and having that singular mutex pretty much ensures that you are not, in fact parallelising much of anything.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

having that singular mutex pretty much ensures that you are not, in fact parallelising much of anything


It would all depend on how much contention there is for the doit function in real code. The threads could be doing a billion other things not controlled by that (or any) lock.

Sean Middleditch – Game Systems Engineer – Join my team!


It would all depend on how much contention there is for the doit function in real code. The threads could be doing a billion other things not controlled by that (or any) lock.

Yes, but the point is that the caller won't know whether or not anyone else is calling (because they can't control who has a reference to the mutex). And even beyond that, your minimum unit of parallelisation is now 1.

If you really have a solitary resource (and very few resources are actually singletons in practice), then a more nuanced approach would be to provide a queue or requests, and a worker thread executing on the queue, so that at least you can let N threads past the call site (where N is the maximum length of the queue).

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

And ideally, you pass the mutex as a parameter.

Statics/globals are the enemy of multi-threading, and having that singular mutex pretty much ensures that you are not, in fact parallelising much of anything.

Or the better option would be for this doit function to be a member function instead, and the mutex is a member variable.

That way the caller can't pass in the wrong mutex.

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

Or the better option would be for this doit function to be a member function instead, and the mutex is a member variable.
That way the caller can't pass in the wrong mutex.

(we're pretty far off-topic at this point, and I should probably create a new thread, but with apologies to the OP...)

You've now gone the classic OOP route, and encapsulated the mutex such that the caller can never know who else has access to it. This might be fine for an inner class, or a class which is used by a single consumer. For any shared object though, this is about as bad as the global mutex.

Data is important, people. Always know where your data is, who has access to it, what operations they are performing on it. Hiding it all away inside a shared object is like sticking your head in the sand.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

This topic is closed to new replies.

Advertisement