Thread safe array

3DModelerMan    1173
I've been trying to learn about multithreaded programming and I came across an interesting problem: how would you make a thread-safe array, similar to the standard library's vector? I tried wrapping one in locks like this:
[source lang="cpp"]template <typename T>
class SafeArray : public IObject
{
public:

void pushBack(const T& obj)
{
m_lock.lock();
m_memory.push_back(obj);
m_lock.unlock();
}

void pushFront(const T& obj)
{
m_lock.lock();
m_memory.push_front(obj);
m_lock.unlock();
}

T popBack()
{
m_lock.lock();

T ret = m_memory.back();
m_memory.pop_back();

m_lock.unlock();

return ret;
}

///@brief This function attempts to pop an element from the back of the array. It returns false if the element couldn't be popped, and true otherwise.
bool tryPop(T& out)
{
if ( m_lock.tryLock() )
{
//Nothing to pop; report failure instead of reading past the end.
if ( m_memory.empty() )
{
m_lock.unlock();
return false;
}

out = m_memory.back();
m_memory.pop_back();

m_lock.unlock();

return true;
}

return false;
}

///@brief Element access by index.
T operator[](int idx)
{
m_lock.lock();//Might be able to get rid of these locks
T ret = m_memory[idx];
m_lock.unlock();
return ret;
}

///@brief Attempts to access the object at the index.
///@param idx The index of the object you want to access
///@param out A pointer that receives a copy of the object
///@return Returns false if the array was locked and could not be accessed;
///in that case *out is left unchanged. True if the element was copied.
bool tryAccess(int idx, T* out)
{
if ( m_lock.tryLock() )
{
*out = m_memory[idx];//Copy the element out; handing back a reference wouldn't be safe once the lock is released.

m_lock.unlock();

return true;
}

return false;
}

///@brief Returns the size of the array.
///Don't use this to iterate over the array if elements might be removed while you're iterating.
unsigned int size()
{
m_lock.lock();
unsigned int ret = m_memory.size();
m_lock.unlock();
return ret;
}

///@brief Deletes all elements in the array.
void clear()
{
m_lock.lock();
m_memory.clear();
m_lock.unlock();
}

///@brief Erases one element from the array at the given index.
void erase(int idx)
{
m_lock.lock();
m_memory.erase(m_memory.begin() + idx);
m_lock.unlock();
}

///@return True if the array contains the value passed in. False otherwise.
bool contains(T val)
{
bool ret=false;

m_lock.lock();

for (unsigned int i=0; i<m_memory.size(); ++i)
{
if ( m_memory[i] == val )
{
ret = true;
break;
}
}

m_lock.unlock();

return ret;
}

///@brief Searches through the array and removes the passed value.
///@param val The value to search the array for and remove.
///@return True if successful. False otherwise.
bool remove(T val)
{
bool ret=false;

m_lock.lock();

typename std::vector<T>::iterator i;//'typename' is required because the iterator type depends on T.
for (i = m_memory.begin(); i!=m_memory.end(); ++i)
{
if ( *i == val )
{
ret = true;
m_memory.erase(i);
break;
}
}

m_lock.unlock();

return ret;
}

private:

ThreadLock m_lock;

std::vector<T> m_memory;
};[/source]
But obviously that has major problems. If an object is added to or removed from the array while another thread is iterating over it, things can go wrong. I know about thread-safe queues and (mostly) how they work. But if you have objects stored in an array that can't just be popped off a queue every time you use them, what do you do? I did read about an array that keeps its own internal copy that threads take a snapshot from, so each thread sees the array's contents as they were at the time of the copy, while the array can still be updated by other threads that are iterating.

Hodgman    51237
Some solutions in order of preference:
A) Restructure the problem so you don't have different threads reading and writing to the array at the same time. Have each thread read/write its own array.
B) Break the problem into passes, where many threads write at once, then there's a clear break, then many threads read at once, etc... Then you don't have to worry about the array changing while someone is iterating through it.
C) Also add a lock to each element of the array. When iterating, you've got to lock the currently visited item before reading it, and other threads are unable to remove an item while it's locked for reading. You can use a "[i]readers/writer lock[/i]" for this, where either multiple readers can lock it at once, or only one writer can lock it.
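As an illustration of option C, here's a minimal sketch of a per-element readers/writer lock (assuming C++17's std::shared_mutex rather than whatever lock class your engine provides):

[code]
#include <mutex>
#include <shared_mutex>

struct Element
{
    mutable std::shared_mutex lock; // readers/writer lock for this one element
    int value;
};

int readElement(const Element& e)
{
    std::shared_lock<std::shared_mutex> guard(e.lock); // many readers may hold this at once
    return e.value;
}

void writeElement(Element& e, int v)
{
    std::unique_lock<std::shared_mutex> guard(e.lock); // only one writer, and no readers
    e.value = v;
}
[/code]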

As above though, a general structure like this isn't a very useful piece of code. You should deal with specific problems rather than reach for general solutions. In parallel programming, general structures that can be used by every problem are always ugly and slow.
Do you have a specific problem in mind for this array?

Ripiz    539
You could look at how Intel TBB/Visual Studio 2012 has done it. It's a huge mess of templates, but concurrency::concurrent_vector<> never failed me. push_back, pop_back, erase, iterators, nothing ever breaks.
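For instance, basic usage looks roughly like this (a minimal sketch assuming the TBB header; the VS2012 version lives in <concurrent_vector.h> under the concurrency namespace and is used the same way):

[code]
#include <tbb/concurrent_vector.h>

tbb::concurrent_vector<int> values;

// Many threads may call push_back concurrently; existing elements never move.
void producer(int n)
{
    for (int i = 0; i < n; ++i)
        values.push_back(i);
}
[/code]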

the_edd    2109
[quote name='Ripiz' timestamp='1350067545' post='4989550']
You could look at how Intel TBB/Visual Studio 2012 has done it. It's a huge mess of templates, but concurrency::concurrent_vector<> never failed me. push_back, pop_back, erase, iterators, nothing ever breaks.
[/quote]

Are you sure it has pop_back() and erase()? If they are implemented, they likely aren't thread-safe in any meaningful fashion. Many C++ lock-free 'vectors' are implemented as ragged arrays, where each sub-array is twice as large as its predecessor (or similar). The other approach I've seen is a tree with a very high branching factor (e.g. 32).
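To make the ragged-array idea concrete, here's a minimal sketch of the index math (my own illustration, not TBB's code) for bucket sizes 1, 2, 4, 8, ...: growth only ever appends a new bucket, so existing elements never move and push_back can coexist with concurrent reads.

[code]
#include <cstddef>

// Bucket b holds the 2^b elements with global indices [2^b - 1, 2^(b+1) - 2].
void locate(std::size_t index, std::size_t& bucket, std::size_t& offset)
{
    std::size_t b = 0;
    while (((std::size_t(1) << (b + 1)) - 1) <= index) // find the bucket containing 'index'
        ++b;
    bucket = b;
    offset = index - ((std::size_t(1) << b) - 1);
}
[/code]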

Implementing concurrent element removal is fundamentally incompatible with these designs as far as I can tell, especially in C++ where value-based programming is the default.

There are also additional constraints on element types such as no-throw copy construction. Obviously there are other differences compared to regular vectors, such as non-contiguous elements (though locality is still good).

So, make sure you're using the container correctly!

JohnnyCode    1046
If a thread manipulates the array (popping, erasing, freeing), you must ensure it won't affect another thread (for example by handing that thread freed data)!

Lock the array in one thread with a lock keyword/critical section, so another thread will block until the first thread leaves its locked block.

This way you synchronize the threads' parallel manipulation of the data.

the_edd    2109
Some additional notes about your existing implementation:
[list]
[*]It's not exception safe. Using the scoped-locking idiom would solve most of the problems (a sketch follows this list).
[*]There's no way of asking it to 'atomically' pop the back element if there is one, else return false. tryPop() almost does this, except where there's contention. I also can't do "if (size() != 0) pop_back()", as I may be racing with another thread which does exactly the same thing.
[*]Another example: "sz = size(); if (sz) erase(sz - 1);". There's no way I can ever make this code safe with your interface.
[/list]
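For reference, a minimal sketch of the scoped-locking idiom applied to pushBack (this assumes a C++11 std::mutex in place of the original ThreadLock, whose interface I'm only guessing at):

[code]
#include <mutex>

void pushBack(const T& obj)
{
    std::lock_guard<std::mutex> guard(m_lock); // unlocked automatically, even if push_back throws
    m_memory.push_back(obj);
}
[/code]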
In general, you can't really make a data structure thread safe just by internally locking all methods. It's almost always the case that the interface must change, or at the very least additional assumptions/constraints on usage must be documented.

The problems that arise are related to the issues surrounding lock granularity. For example, even though you take a lock in each method, there's still no way of 'atomically' transferring an element from one SafeVector to another while keeping the sum of their size()s constant, which might be an invariant required for the correct implementation of a client class. In that case a lock would need to be shared between both vectors. And I'm not advocating passing in a mutex parameter to the constructor.

I'm in agreement with Hodgman that restructuring code/algorithms to make locks unnecessary is often a better idea. But if/when the need for locks arises, I prefer to use something like this:

[code]
guarded<std::vector<X> > gx;
{
scoped_lock_ptr<std::vector<X> > p(gx); // the vector in gx can only be accessed through a scoped_lock_ptr

// while in this scope, the lock in gx is held.
p->push_back(X());
}
[/code]

Now, I can just as easily create a structure containing two vectors and put one of those inside a guarded<>, allowing me to protect invariants spread across multiple data structures, if needed. In other words, we have taken granularity control away from the [i]data structure[/i] and moved it to the [i]algorithm[/i], where it usually should be.
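For illustration only, here is a minimal sketch of what such a guarded<>/scoped_lock_ptr pair could look like (an assumption built on std::mutex, not the_edd's actual implementation):

[code]
#include <mutex>

template <typename T>
struct guarded
{
    T value;            // only meant to be touched through a scoped_lock_ptr
    std::mutex mutex;
};

template <typename T>
class scoped_lock_ptr
{
public:
    explicit scoped_lock_ptr(guarded<T>& g) : m_guard(g) { m_guard.mutex.lock(); }
    ~scoped_lock_ptr() { m_guard.mutex.unlock(); }

    T* operator->() { return &m_guard.value; }
    T& operator*()  { return m_guard.value; }

private:
    scoped_lock_ptr(const scoped_lock_ptr&);            // non-copyable
    scoped_lock_ptr& operator=(const scoped_lock_ptr&);

    guarded<T>& m_guard;
};
[/code]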

3DModelerMan    1173
@Hodgman I'm trying to make sure that the objects in my scene graph can be accessed from other threads and have objects added to them. But I want to use locks as little as possible. Although, I guess adding child objects might be something that doesn't happen very often.

I don't ever use exceptions. I replaced my SafeArray with a queue class everywhere I could. I've got a thread pool system that uses it. It has a schedule function where you can schedule tasks, which are added to whichever thread's queue has the fewest tasks. And then the worker thread just pops from its queue until it's empty, then sleeps until it's woken for more work.

SiCrane    11839
If you don't use exceptions then your class has a fundamental problem: the majority of its member functions have no way of reporting errors. In that case you shouldn't be using std::vector as the underlying layer for your container, as std::vector uses exceptions for its error signalling. It's hard to call a container "safe" if you can't be sure whether any member function you call on it succeeded.

Hodgman    51237
[quote name='3DModelerMan' timestamp='1350138980' post='4989777']
@Hodgman I'm trying to make sure that the objects in my scene graph can be accessed from other threads and have objects added to them. But I want to use locks as little as possible. Although, I guess adding child objects might be something that doesn't happen very often.
[/quote]If objects can only be added, but not removed, then things are a bit simpler. You can allocate the new object from a thread-safe pool, initialize the new object, and then atomically set a pointer to it in the parent object.
...however, now if someone is iterating the graph at the same time that someone is adding nodes, it's random as to whether the new nodes will be iterated or not. So I'd still recommend you break your program into different passes/stages, e.g. a read stage and a modify stage.
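A rough sketch of that last "atomically set a pointer" step (my illustration; the Node layout and the pool are hypothetical, and std::atomic is assumed):

[code]
#include <atomic>

struct Node
{
    std::atomic<Node*> firstChild;
    Node*              nextSibling;
    // ... node data ...
};

// Prepend a fully-initialized child to the parent's child list.
// Readers only see the child once the compare_exchange succeeds.
void addChild(Node& parent, Node* child)
{
    Node* head = parent.firstChild.load(std::memory_order_relaxed);
    do
    {
        child->nextSibling = head;
    } while (!parent.firstChild.compare_exchange_weak(
                 head, child, std::memory_order_release, std::memory_order_relaxed));
}
[/code]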

3DModelerMan    1173
Well if I break it up into different stages then wouldn't that be basically the same as having a serial program? Or do you mean something like: a node has a list of children, but when you call addChild it would instead add the child to another list that gets merged with the main "update" list in the beginning of the node's update function? And then I could do the same thing for removals too right? So any modifications to the list would be queued up and deferred until the beginning of the object's update, before any iterating was done in the frame. Or by breaking it into stages do you mean I need to radically alter the entire architecture of my engine?

Hodgman    51237
[quote name='3DModelerMan' timestamp='1350224511' post='4990038']
Well if I break it up into different stages then wouldn't that be basically the same as having a serial program? Or do you mean something like: a node has a list of children, but when you call addChild it would instead add the child to another list that gets merged with the main "update" list in the beginning of the node's update function? And then I could do the same thing for removals too right? So any modifications to the list would be queued up and deferred until the beginning of the object's update, before any iterating was done in the frame. Or by breaking it into stages do you mean I need to radically alter the entire architecture of my engine?
[/quote]Sorry I missed this reply.
Yes, queueing up modifications instead of performing them immediately is a good way to break up processing into several stages and reduce the amount of communication between threads.

Also, breaking algorithms into serial stages isn't the same as a serial program -- often many threads can contribute to each stage, and different threads can be working on different problems at the same time.
e.g. say we've got a single-threaded function, C, and two functions A & B that can be completed by parallel worker threads. Let's also say that A & B can also be split into 2 stages, and the code we're trying to execute looks like:
result = C( A(), B() )
Given 3 worker threads, their progress over time (vertical) could look like:
[code]#0 #1 #2
A1 A1 B1
B1 B1 A1
A2 A2 B2
B2 B2 A2
C .wait.[/code]
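To make the "queue up modifications" idea concrete, here's a minimal sketch (the Node class and member names are made up for illustration):

[code]
#include <mutex>
#include <vector>

class Node
{
public:
    // Safe to call from any thread at any point in the frame.
    void addChild(Node* child)
    {
        std::lock_guard<std::mutex> guard(m_pendingLock);
        m_pendingAdds.push_back(child);
    }

    // Called at the start of the update stage, before anyone iterates
    // m_children this frame.
    void flushPendingChildren()
    {
        std::lock_guard<std::mutex> guard(m_pendingLock);
        m_children.insert(m_children.end(), m_pendingAdds.begin(), m_pendingAdds.end());
        m_pendingAdds.clear();
    }

private:
    std::vector<Node*> m_children;    // read freely during the update/iterate stage
    std::mutex         m_pendingLock;
    std::vector<Node*> m_pendingAdds; // modifications deferred until the next flush
};
[/code]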

iMalc    2466
I once read a wise statement on a forum saying that a container generally cannot make itself threadsafe on behalf of its client.

The thread safety generally needs to be provided by the code using the container, because that code inevitably needs to hold the lock while performing more than one action on the container. Thus this is a flawed endeavour, a "fool's errand" so to speak.

RDragon1    1205
To write scalable parallel code, the answer isn't to take serial code and replace the data structures with 'thread-safe' versions that do the same operations. If you're resorting to using locks, then you're already down the wrong path. The right path is to create algorithms that don't need read+write access to shared data, or to constrain those stages of your algorithm to as small a piece as possible, but still extract parallelism where you can. The data transform you're performing dictates the data structures and algorithms, and a std::vector with locks in every member function is likely a terrible structure.
