Thread safe array

I've been trying to learn about multithreaded programming and I came to an interesting problem. How would you make a thread-safe array, similar to the standard library's vector? I tried wrapping it in locks like this:
[source lang="cpp"]template <typename T>
class SafeArray : public IObject
{
public:

    void pushBack(const T& obj)
    {
        m_lock.lock();
        m_memory.push_back(obj);
        m_lock.unlock();
    }

    void pushFront(const T& obj)
    {
        m_lock.lock();
        m_memory.insert(m_memory.begin(), obj); // std::vector has no push_front
        m_lock.unlock();
    }

    T popBack()
    {
        m_lock.lock();

        T ret = m_memory.back();
        m_memory.pop_back();

        m_lock.unlock();

        return ret;
    }

    ///@brief This function attempts to pop an element from the back of the array. It returns false if the element couldn't be popped, and true otherwise.
    bool tryPop(T& out)
    {
        if ( m_lock.tryLock() )
        {
            out = m_memory.back();
            m_memory.pop_back();

            m_lock.unlock();

            return true;
        }

        return false;
    }

    ///@brief Element access by index.
    T operator[](int idx)
    {
        m_lock.lock(); // Might be able to get rid of these locks
        T ret = m_memory[idx];
        m_lock.unlock();
        return ret;
    }

    ///@brief Attempts to access the object at the index.
    ///@param idx The index of the object you want to access
    ///@param out A pointer to storage that receives a copy of the object
    ///@return Returns false if the array was locked and could not be accessed;
    ///in this case *out is left untouched. True if the memory could be accessed.
    bool tryAccess(int idx, T* out)
    {
        if ( m_lock.tryLock() )
        {
            *out = m_memory[idx];

            m_lock.unlock();

            return true;
        }

        return false;
    }

    ///@brief Returns the size of the array.
    ///Don't use this to iterate over the array if elements might be removed while iterating.
    unsigned int size()
    {
        m_lock.lock();
        unsigned int ret = m_memory.size();
        m_lock.unlock();
        return ret;
    }

    ///@brief Deletes all elements in the array.
    void clear()
    {
        m_lock.lock();
        m_memory.clear();
        m_lock.unlock();
    }

    ///@brief Erases one element from the array at the given index.
    void erase(int idx)
    {
        m_lock.lock();
        m_memory.erase(m_memory.begin() + idx);
        m_lock.unlock();
    }

    ///@return True if the array contains the value passed in. False otherwise.
    bool contains(T val)
    {
        bool ret = false;

        m_lock.lock();

        for (unsigned int i = 0; i < m_memory.size(); ++i)
        {
            if ( m_memory[i] == val )
            {
                ret = true;
                break;
            }
        }

        m_lock.unlock();

        return ret;
    }

    ///@brief Searches through the array and removes the passed value.
    ///@param val The value to search the array for and remove.
    ///@return True if successful. False otherwise.
    bool remove(T val)
    {
        bool ret = false;

        m_lock.lock();

        typename std::vector<T>::iterator i;
        for (i = m_memory.begin(); i != m_memory.end(); ++i)
        {
            if ( *i == val )
            {
                ret = true;
                m_memory.erase(i);
                break;
            }
        }

        m_lock.unlock();

        return ret;
    }

private:

    ThreadLock m_lock;

    std::vector<T> m_memory;
};[/source]
But obviously that has major problems. If an object is added to or removed from the array while a thread is iterating over it, it could end up causing problems. I know about thread-safe queues and how they work (mostly). But if you have objects stored in an array that can't just be popped off a queue every time you use them, what do you do? I did read something about an array that kept an internal copy that threads could read from, so they effectively got a snapshot of the array's contents at the time of the copy, and the array could still be updated while other threads were iterating.
It's very tough to build data structures that are thread safe in every conceivable use case and allow a wide variety of functionality.

If you really need to iterate over the array data while other threads might be modifying it, then you have two options:
First, like you already said, make a copy and iterate over that. This won't work in all cases though; just imagine an array that stores pointers or references to objects, which might not be valid anymore. Actually, this is also an issue with all the methods that return elements of the array.

Secondly, acquire the array lock from the outside, iterate, release the lock. This works in any case, but requires the users of that array to know what they are doing...
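To make the two options concrete, here's a minimal sketch of what the container side might look like, using std::mutex and std::lock_guard from C++11. The method names snapshot() and withLock() are made up for this example:

[source lang="cpp"]#include <mutex>
#include <vector>

template <typename T>
class SafeArray
{
public:
    // Option one: copy the contents under the lock and iterate over the copy.
    // Pointers/references stored in the copy can still be invalidated elsewhere.
    std::vector<T> snapshot()
    {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_memory; // copied while the lock is held
    }

    // Option two: run arbitrary code (e.g. a whole iteration) while the lock is held.
    template <typename Func>
    void withLock(Func f)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        f(m_memory); // the caller works on m_memory directly; the lock is held the whole time
    }

private:
    std::mutex m_lock;
    std::vector<T> m_memory;
};[/source]

With the second method, deadlock avoidance becomes the caller's problem: calling back into another locking method of the same array from inside f would deadlock on a non-recursive mutex.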
Some solutions in order of preference:
A) Restructure the problem so you don't have different threads reading and writing to the array at the same time. Have each thread read/write its own array.
B) Break the problem into passes, where many threads write at once, then there's a clear break, then many threads read once, etc... Then you don't have to worry about the array changing while someone is iterating through it.
C) Also add a lock to each element of the array. When iterating, you've got to lock the currently visited item before reading it, and other threads are unable to remove an item while it's locked for reading. You can use a "readers/writer lock" for this, where either multiple readers can lock it at once, or only one writer can lock it (a rough sketch follows this list).
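For illustration, here's roughly what the readers/writer idea looks like using C++17's std::shared_mutex, applied at the whole-container level for brevity (the per-element version adds a lock to each slot in the same way). The class and method names are made up for this sketch:

[source lang="cpp"]#include <shared_mutex>
#include <vector>

template <typename T>
class RWArray
{
public:
    void pushBack(const T& obj)
    {
        std::unique_lock<std::shared_mutex> writeLock(m_lock); // exclusive: blocks readers and writers
        m_memory.push_back(obj);
    }

    // Many threads may iterate concurrently; writers wait until all readers are done.
    template <typename Func>
    void forEach(Func f) const
    {
        std::shared_lock<std::shared_mutex> readLock(m_lock); // shared: readers don't block each other
        for (const T& item : m_memory)
            f(item);
    }

private:
    mutable std::shared_mutex m_lock;
    std::vector<T> m_memory;
};[/source]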

As above though, a general structure like this isn't a very useful piece of code. You should deal with specific problems rather than reach for general solutions. In parallel programming, general-purpose structures that try to serve every problem are always ugly and slow.
Do you have a specific problem in mind for this array?
You could try to see how Intel TBB/Visual Studio 2012 has done it. It's a huge mess of templates, but concurrency::concurrent_vector<> has never failed me. push_back, pop_back, erase, iterators, nothing ever breaks.

[quote]You could try to see how Intel TBB/Visual Studio 2012 has done it. It's a huge mess of templates, but concurrency::concurrent_vector<> has never failed me. push_back, pop_back, erase, iterators, nothing ever breaks.[/quote]


Are you sure it has pop_back() and erase()? If they are implemented, they likely aren't thread-safe in any meaningful fashion. Many C++ lock-free 'vector's are implemented as ragged arrays, where each sub-array is twice as large as its predecessor (or similar). The other approach I've seen is a tree with a very high branch factor (e.g. 32).

Implementing concurrent element removal is fundamentally incompatible with these designs as far as I can tell, especially in C++ where value-based programming is the default.

There are also additional constraints on element types such as no-throw copy construction. Obviously there are other differences compared to regular vectors, such as non-contiguous elements (though locality is still good).

So, make sure you're using the container correctly!
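For illustration, here's roughly how indexing into such a ragged array can work (a sketch of the general idea only, not TBB's or Microsoft's actual code). Because the existing sub-arrays never move or reallocate when the container grows, concurrent readers can keep indexing while other threads push_back:

[source lang="cpp"]#include <cstddef>

// Bucket sizes are 1, 2, 4, 8, ...; bucket b starts at flat index 2^b - 1 and
// holds 2^b elements, so growing appends a new bucket and never touches old ones.
struct BucketPos
{
    std::size_t bucket;
    std::size_t offset;
};

inline BucketPos locate(std::size_t index)
{
    std::size_t bucket = 0;
    // find b such that 2^b - 1 <= index < 2^(b+1) - 1
    while ((std::size_t(1) << (bucket + 1)) - 1 <= index)
        ++bucket;
    return { bucket, index - ((std::size_t(1) << bucket) - 1) };
}[/source]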
If a thread manipulates the array (popping elements, freeing memory), you must ensure this can't affect another thread (for example by handing out already-freed data)!

Lock the array with a locking construct (e.g. a lock keyword/critical section), so that a thread blocks until the other thread has left its locked block.

This way you synchronize manipulation of the data between parallel threads.
Some additional notes about your existing implementation:

  • It's not exception safe. Using the scoped-locking idiom would solve most of the problems.
  • There's no way of asking it to 'atomically' pop the back element if there is one, else return false. tryPop() almost does this, except where there's contention. I also can't do "if (size() != 0) pop_back()", as I may be racing with another thread which does exactly the same thing.
  • Another example: "sz = size(); if (sz) erase(sz - 1);". There's no way I can ever make this code safe with your interface.

In general, you can't really make a data structure thread safe just by internally locking all methods. It's almost always the case that the interface must change, or at the very least additional assumptions/constraints on usage must be documented.

The problems that arise are related to the issues surrounding lock granularity. For example, even though you take a lock in each method, there's still no way of 'atomically' transferring an element from one SafeVector to another while keeping the sum of their size()s constant, which might be an invariant required for the correct implementation of a client class. In that case a lock would need to be shared between both vectors. And I'm not advocating passing in a mutex parameter to the constructor.
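For example, the size()/pop race above goes away if the check and the pop are a single locked operation. A minimal sketch of that kind of interface change (tryPopBack is a made-up name, and this assumes a std::mutex rather than the original ThreadLock):

[source lang="cpp"]#include <mutex>
#include <vector>

template <typename T>
class SafeArray
{
public:
    // Atomically: pop the back element if there is one, else report failure.
    bool tryPopBack(T& out)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_memory.empty())
            return false;          // the emptiness check and the pop share one lock acquisition
        out = m_memory.back();
        m_memory.pop_back();
        return true;
    }

private:
    std::mutex m_lock;
    std::vector<T> m_memory;
};[/source]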

I'm in agreement with Hodgman that restructuring code/algorithms to make them unnecessary is often a better idea. But if/when the need for locks arise, I prefer to use something like this:


[source lang="cpp"]guarded<std::vector<X> > gx;
{
    scoped_lock_ptr<std::vector<X> > p(gx); // the vector in gx can only be accessed through a scoped_lock_ptr

    // while in this scope, the lock in gx is held.
    p->push_back(X());
}[/source]


Now, I can just as easily create a structure containing two vectors and put one of those inside a guarded<>, allowing me to protect invariants spread across multiple data structures, if needed. In other words, we have taken granularity control away from the data structure and moved it to the algorithm, where it usually should be.
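For reference, here's one way guarded<> and scoped_lock_ptr<> from the snippet above could be written. This is just a guess at a minimal implementation, not any particular library's code:

[source lang="cpp"]#include <mutex>

template <typename T> class scoped_lock_ptr;

// Owns the data and its mutex; the data is only reachable through a scoped_lock_ptr.
template <typename T>
class guarded
{
    template <typename U> friend class scoped_lock_ptr;
private:
    std::mutex m_mutex;
    T m_data;
};

// Locks on construction, unlocks on destruction, and acts as a pointer to the data.
template <typename T>
class scoped_lock_ptr
{
public:
    explicit scoped_lock_ptr(guarded<T>& g)
        : m_guard(g.m_mutex), m_data(&g.m_data) {} // the lock is held for this object's lifetime

    T* operator->() const { return m_data; }
    T& operator*() const { return *m_data; }

private:
    std::lock_guard<std::mutex> m_guard;
    T* m_data;
};[/source]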
@Hodgman I'm trying to make sure that the objects in my scene graph can be accessed from other threads and have objects added to them. But I want to use locks as little as possible. Although, I guess adding child objects might be something that doesn't happen very often.

I don't ever use exceptions. I replaced my SafeArray with a queue class in all the places I could. I've got a thread pool system that uses it. It has a schedule function where you can schedule tasks, which are added to whichever thread's queue has the fewest tasks. Then the worker thread just pops from its queue until it's empty, then sleeps until it's woken for more work.
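The kind of queue described here, where the worker sleeps until it's woken for more work, is usually built from a mutex plus a condition variable. A rough sketch (the names are illustrative, not the actual code from the thread pool above):

[source lang="cpp"]#include <condition_variable>
#include <deque>
#include <mutex>

template <typename Task>
class WorkQueue
{
public:
    void push(Task task)
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_tasks.push_back(std::move(task));
        }
        m_wake.notify_one(); // wake one sleeping worker
    }

    Task pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_wake.wait(lock, [this] { return !m_tasks.empty(); }); // sleep until work is available
        Task task = std::move(m_tasks.front());
        m_tasks.pop_front();
        return task;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::deque<Task> m_tasks;
};[/source]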
If you don't use exceptions then your class has a fundamental problem: it has no way of reporting errors from the majority of your member functions. In that case you shouldn't be using std::vector as the underlying layer for your container, as std::vector uses exceptions for its error signalling. It's hard to call a container "safe" if you can't be sure whether any member function you call on it succeeded.
Oh, I hadn't thought about the std::vector exceptions... I guess I'd better write a scoped lock class.
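A scoped lock wrapping the ThreadLock used above can be very small; here's a minimal sketch (this is just one way to write it, and std::lock_guard does the same job for std::mutex):

[source lang="cpp"]// The constructor takes the lock, the destructor releases it, so the lock is
// released even if an exception is thrown or an early return happens in between.
class ScopedLock
{
public:
    explicit ScopedLock(ThreadLock& lock) : m_lock(lock) { m_lock.lock(); }
    ~ScopedLock() { m_lock.unlock(); }

private:
    ScopedLock(const ScopedLock&);            // non-copyable
    ScopedLock& operator=(const ScopedLock&);

    ThreadLock& m_lock;
};

// Usage inside SafeArray:
// void clear()
// {
//     ScopedLock guard(m_lock); // unlocks automatically when leaving the scope
//     m_memory.clear();
// }[/source]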

