Thread safe array



#1 3DModelerMan   Members   -  Reputation: 1001


Posted 12 October 2012 - 09:01 AM

I've been trying to learn about multithreaded programming and I came to an interesting problem. How would you make a thread safe array? Similar to the standard library's vector. I tried wrapping it in locks like this:
[source lang="cpp"]
#include <vector>

template <typename T>
class SafeArray : public IObject
{
public:
    void pushBack(const T& obj)
    {
        m_lock.lock();
        m_memory.push_back(obj);
        m_lock.unlock();
    }

    void pushFront(const T& obj)
    {
        m_lock.lock();
        m_memory.insert(m_memory.begin(), obj); // std::vector has no push_front
        m_lock.unlock();
    }

    T popBack()
    {
        m_lock.lock();
        T ret = m_memory.back();
        m_memory.pop_back();
        m_lock.unlock();
        return ret;
    }

    ///@brief Attempts to pop an element from the back of the array.
    ///Returns false if the element couldn't be popped, and true otherwise.
    bool tryPop(T& out)
    {
        if ( m_lock.tryLock() )
        {
            out = m_memory.back();
            m_memory.pop_back();
            m_lock.unlock();
            return true;
        }
        return false;
    }

    ///@brief Element access by index.
    T operator[](int idx)
    {
        m_lock.lock(); // Might be able to get rid of these locks
        T ret = m_memory[idx];
        m_lock.unlock();
        return ret;
    }

    ///@brief Attempts to access the object at the index.
    ///@param idx The index of the object you want to access
    ///@param out Receives a copy of the object at that index
    ///@return False if the array was locked and could not be accessed
    ///(out is left unchanged); true if the memory could be accessed.
    bool tryAccess(int idx, T* out)
    {
        if ( m_lock.tryLock() )
        {
            *out = m_memory[idx];
            m_lock.unlock();
            return true;
        }
        return false;
    }

    ///@brief Returns the size of the array.
    ///Don't use this to iterate over the array if elements might be removed while iterating.
    unsigned int size()
    {
        m_lock.lock();
        unsigned int ret = m_memory.size();
        m_lock.unlock();
        return ret;
    }

    ///@brief Deletes all elements in the array.
    void clear()
    {
        m_lock.lock();
        m_memory.clear();
        m_lock.unlock();
    }

    ///@brief Erases one element from the array at the given index.
    void erase(int idx)
    {
        m_lock.lock();
        m_memory.erase(m_memory.begin() + idx);
        m_lock.unlock();
    }

    ///@return True if the array contains the value passed in, false otherwise.
    bool contains(T val)
    {
        bool ret = false;
        m_lock.lock();
        for (unsigned int i = 0; i < m_memory.size(); ++i)
        {
            if ( m_memory[i] == val )
            {
                ret = true;
                break;
            }
        }
        m_lock.unlock();
        return ret;
    }

    ///@brief Searches through the array and removes the passed value.
    ///@param val The value to search the array for and remove.
    ///@return True if successful, false otherwise.
    bool remove(T val)
    {
        bool ret = false;
        m_lock.lock();
        typename std::vector<T>::iterator i;
        for (i = m_memory.begin(); i != m_memory.end(); ++i)
        {
            if ( *i == val )
            {
                ret = true;
                m_memory.erase(i);
                break;
            }
        }
        m_lock.unlock();
        return ret;
    }

private:
    ThreadLock m_lock;
    std::vector<T> m_memory;
};
[/source]
But obviously that has major problems. If an object is added to or removed from the array while a thread is iterating over it, it could end up causing problems. I know about thread-safe queues and how they work (mostly). But if you have objects stored in an array that can't just be popped off a queue every time you use them, what do you do? I did read something about an array that worked by keeping its own internal array that threads copied from, so they got a kind of snapshot of the array's contents at the time of the copy, and the array could still be updated while other threads were iterating.

#2 Rattenhirn   Crossbones+   -  Reputation: 1749


Posted 12 October 2012 - 09:17 AM

It's very tough to build data structures that are thread safe in every conceivable use case and allow a wide variety of functionality.

If you really need to iterate over the array data while other threads might be modifying it, then you have two options:
First, like you already said, make a copy and iterate over that. This won't work in all cases, though: just imagine an array that stores pointers or references to objects; those might no longer be valid by the time you read the copy. Actually, this is also an issue with all the methods that return elements of the array.

Second, acquire the array lock from the outside, iterate, then release the lock. This works in every case, but requires the users of the array to know what they are doing...
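For illustration, a minimal sketch of both options, assuming a plain std::vector guarded by a std::mutex that callers lock from the outside (the names here are made up, not from any particular library):

[source lang="cpp"]
#include <mutex>
#include <vector>

struct SharedList
{
    std::mutex mutex;          // callers lock this around any compound operation
    std::vector<int> items;
};

// Option 1: take a snapshot under the lock, then iterate the copy lock-free.
// Only safe if the elements themselves stay valid (values, not raw pointers).
void iterateCopy(SharedList& list)
{
    std::vector<int> snapshot;
    {
        std::lock_guard<std::mutex> guard(list.mutex);
        snapshot = list.items;    // copy while holding the lock
    }
    for (int value : snapshot)
    {
        // ... work with value; other threads may modify list.items meanwhile
        (void)value;
    }
}

// Option 2: hold the lock for the whole iteration.
// Nothing can change underneath us, but writers are blocked until we finish.
void iterateLocked(SharedList& list)
{
    std::lock_guard<std::mutex> guard(list.mutex);
    for (int value : list.items)
    {
        // ... work with value
        (void)value;
    }
}
[/source]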

#3 Hodgman   Moderators   -  Reputation: 30351


Posted 12 October 2012 - 09:48 AM

Some solutions in order of preference:
A) Restructure the problem so you don't have different threads reading and writing the array at the same time. Have each thread read/write its own array.
B) Break the problem into passes, where many threads write at once, then there's a clear break, then many threads read at once, etc... Then you don't have to worry about the array changing while someone is iterating through it.
C) Also add a lock to each element of the array. When iterating, you've got to lock the currently visited item before reading it, and other threads are unable to remove an item while it's locked for reading. You can use a "readers/writer lock" for this, where either multiple readers can lock it at once, or only one writer can lock it (see the sketch below).
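For illustration, a simplified sketch of option C that uses a single readers/writer lock over the whole container rather than one per element, assuming C++17's std::shared_mutex (a custom read/write lock works the same way):

[source lang="cpp"]
#include <shared_mutex>
#include <vector>

class RWGuardedVector
{
public:
    // Many readers may hold the shared lock at the same time.
    void forEach(void (*visit)(const int&)) const
    {
        std::shared_lock<std::shared_mutex> readLock(m_mutex);
        for (const int& value : m_items)
            visit(value);
    }

    // A writer needs exclusive access; it waits until all readers are done.
    void pushBack(int value)
    {
        std::unique_lock<std::shared_mutex> writeLock(m_mutex);
        m_items.push_back(value);
    }

private:
    mutable std::shared_mutex m_mutex;
    std::vector<int> m_items;
};
[/source]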

As above though, a general structure like this isn't a very useful piece of code. Instead, you should deal with specific problems rather than reach for general solutions. In parallel programming, general structures that can be used for every problem are always ugly and slow.
Do you have a specific problem in mind for this array?

#4 Ripiz   Members   -  Reputation: 529


Posted 12 October 2012 - 12:45 PM

You could try looking at how Intel TBB / Visual Studio 2012 has done it. It's a huge mess of templates, but concurrency::concurrent_vector<> has never failed me. push_back, pop_back, erase, iterators, nothing ever breaks.

#5 edd   Members   -  Reputation: 2105


Posted 12 October 2012 - 12:59 PM

You could try looking at how Intel TBB / Visual Studio 2012 has done it. It's a huge mess of templates, but concurrency::concurrent_vector<> has never failed me. push_back, pop_back, erase, iterators, nothing ever breaks.


Are you sure it has pop_back() and erase()? If they are implemented, they likely aren't thread-safe in any meaningful fashion. Many C++ lock-free 'vector's are implemented as ragged arrays, where each sub-array is twice as large as its predecessor (or similar). The other approach I've seen is a tree with a very high branch factor (e.g. 32).
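For illustration, a hypothetical sketch of the index arithmetic such a ragged array might use, with bucket sizes 1, 2, 4, 8, ... so that buckets are only ever appended and existing elements never move when the array grows:

[source lang="cpp"]
#include <cstddef>
#include <utility>

// Map a flat index to (bucket, offset within bucket).
inline std::pair<std::size_t, std::size_t> locate(std::size_t index)
{
    std::size_t bucket = 0;
    std::size_t bucketSize = 1;
    std::size_t first = 0;            // flat index of the bucket's first element
    while (index >= first + bucketSize)
    {
        first += bucketSize;
        bucketSize *= 2;
        ++bucket;
    }
    return std::make_pair(bucket, index - first);
}

// Example: index 5 lives in bucket 2 (buckets of size 1 and 2 cover indices
// 0..2), at offset 2 within that bucket: locate(5) == (2, 2).
[/source]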

Implementing concurrent element removal is fundamentally incompatible with these designs as far as I can tell, especially in C++ where value-based programming is the default.

There are also additional constraints on element types such as no-throw copy construction. Obviously there are other differences compared to regular vectors, such as non-contiguous elements (though locality is still good).

So, make sure you're using the container correctly!

#6 JohnnyCode   Members   -  Reputation: 214


Posted 12 October 2012 - 10:26 PM

If a thread manipulates the array (popping elements, freeing memory), you must ensure it will not affect another thread (for example, by handing it freed data)!

Lock the array in a thread with a locking construct, so that other threads block until the locking thread leaves its locked block.

This way you synchronize manipulation of the data by parallel threads.

#7 edd   Members   -  Reputation: 2105


Posted 13 October 2012 - 08:10 AM

Some additional notes about your existing implementation:
  • It's not exception safe. Using the scoped-locking idiom would solve most of the problems.
  • There's no way of asking it to 'atomically' pop the back element if there is one, or else return false. tryPop() almost does this, except when there's contention. I also can't do "if (size() != 0) pop_back()", as I may be racing with another thread doing exactly the same thing. (One possible interface is sketched after this list.)
  • Another example: "sz = size(); if (sz) erase(sz - 1);". There's no way I can ever make this code safe with your interface.
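For illustration, a sketch of the kind of interface that closes these races: the emptiness check and the pop happen under one lock, and RAII (std::lock_guard) keeps it exception safe. The names are illustrative, not from the original code.

[source lang="cpp"]
#include <mutex>
#include <vector>

template <typename T>
class SafeStack
{
public:
    void pushBack(const T& value)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_items.push_back(value);
    }

    // Pops the back element if there is one; returns false if the container
    // was empty. The check and the pop are a single atomic step to callers.
    bool popBack(T& out)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (m_items.empty())
            return false;
        out = m_items.back();
        m_items.pop_back();
        return true;
    }

private:
    std::mutex m_mutex;
    std::vector<T> m_items;
};
[/source]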
In general, you can't really make a data structure thread safe just by internally locking all methods. It's almost always the case that the interface must change, or at the very least additional assumptions/constraints on usage must be documented.

The problems that arise are related to the issues surrounding lock granularity. For example, even though you take a lock in each method, there's still no way of 'atomically' transferring an element from one SafeVector to another while keeping the sum of their size()s constant, which might be an invariant required for the correct implementation of a client class. In that case a lock would need to be shared between both vectors. And I'm not advocating passing in a mutex parameter to the constructor.

I'm in agreement with Hodgman that restructuring code/algorithms to make locks unnecessary is often a better idea. But if/when the need for locks arises, I prefer to use something like this:

[source lang="cpp"]
guarded<std::vector<X> > gx;
{
    scoped_lock_ptr<std::vector<X> > p(gx); // the vector in gx can only be accessed through a scoped_lock_ptr

    // while in this scope, the lock in gx is held.
    p->push_back(X());
}
[/source]

Now, I can just as easily create a structure containing two vectors and put one of those inside a guarded<>, allowing me to protect invariants spread across multiple data structures, if needed. In other words, we have taken granularity control away from the data structure and moved it to the algorithm, where it usually should be.
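For reference, one possible sketch of how such a guarded<> / scoped_lock_ptr<> pair could be implemented, assuming a std::mutex held inside guarded<>; the actual implementation referred to above may differ:

[source lang="cpp"]
#include <mutex>
#include <utility>

template <typename T> class scoped_lock_ptr;

template <typename T>
class guarded
{
public:
    template <typename... Args>
    explicit guarded(Args&&... args) : m_value(std::forward<Args>(args)...) { }

private:
    friend class scoped_lock_ptr<T>;
    std::mutex m_mutex;
    T m_value;
};

template <typename T>
class scoped_lock_ptr
{
public:
    explicit scoped_lock_ptr(guarded<T>& g) : m_lock(g.m_mutex), m_value(&g.m_value) { }

    T* operator->() const { return m_value; }
    T& operator*() const { return *m_value; }

private:
    std::lock_guard<std::mutex> m_lock;   // held for the lifetime of this object
    T* m_value;
};

// Usage, matching the snippet above:
// guarded<std::vector<int> > gx;
// {
//     scoped_lock_ptr<std::vector<int> > p(gx);
//     p->push_back(42);
// }
[/source]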

#8 3DModelerMan   Members   -  Reputation: 1001


Posted 13 October 2012 - 08:36 AM

@Hodgman I'm trying to make sure that the objects in my scene graph can be accessed from other threads and have objects added to them. But I want to use locks as little as possible. Although, I guess adding child objects might be something that doesn't happen very often.

I don't ever use exceptions. I replaced my SafeArray with a queue class in all the places I could. I've got a thread pool system that uses it. It has a schedule function where you can schedule tasks, which are added to whichever thread's queue has the fewest tasks. Then the worker thread just pops from its queue until it's empty and sleeps until it's woken for more work.
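For illustration, a rough sketch of that worker-queue pattern, assuming C++11 threading primitives; this is not the poster's actual code, and shutdown handling is omitted for brevity:

[source lang="cpp"]
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class WorkQueue
{
public:
    void schedule(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_tasks.push_back(std::move(task));
        }
        m_wake.notify_one();   // wake the worker if it was sleeping
    }

    // Worker loop: pop tasks until the queue is empty, then wait for more.
    void run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return !m_tasks.empty(); });
                task = std::move(m_tasks.front());
                m_tasks.pop_front();
            }
            task();   // run outside the lock so schedule() isn't blocked
        }
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::deque<std::function<void()> > m_tasks;
};
[/source]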

#9 SiCrane   Moderators   -  Reputation: 9594


Posted 13 October 2012 - 09:02 AM

If you don't use exceptions then your class has a fundamental problem: it has no way of reporting errors in the majority of your member functions. In that case you shouldn't be using std::vector as the underlying layer for your container, as std::vector uses exceptions for its error signalling. It's hard to call a container "safe" if you can't be sure whether any member function you call on it succeeded.

#10 3DModelerMan   Members   -  Reputation: 1001


Posted 13 October 2012 - 12:05 PM

Oh, I hadn't thought about the std::vector exceptions... I guess I'd better write a scoped lock class.
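For illustration, a minimal sketch of such a scoped lock, assuming the ThreadLock interface (lock()/unlock()) from the original post:

[source lang="cpp"]
class ScopedLock
{
public:
    explicit ScopedLock(ThreadLock& lock) : m_lock(lock) { m_lock.lock(); }
    ~ScopedLock() { m_lock.unlock(); }   // released even if an exception propagates

private:
    ScopedLock(const ScopedLock&);            // non-copyable
    ScopedLock& operator=(const ScopedLock&);

    ThreadLock& m_lock;
};

// Usage inside SafeArray:
// void pushBack(const T& obj)
// {
//     ScopedLock guard(m_lock);
//     m_memory.push_back(obj);   // unlock happens automatically on return or throw
// }
[/source]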

#11 Hodgman   Moderators   -  Reputation: 30351


Posted 13 October 2012 - 08:26 PM

@Hodgman I'm trying to make sure that the objects in my scene graph can be accessed from other threads and have objects added to them. But I want to use locks as little as possible. Although, I guess adding child objects might be something that doesn't happen very often.

If objects can only be added, but not removed, then things are a bit simpler. You can allocate the new object from a thread-safe pool, initialize the new object, and then atomically set a pointer to it in the parent object.
...however, now if someone is iterating the graph at the same time that someone is adding nodes, it's random as to whether the new nodes will be iterated or not. So I'd still recommend you break your program into different passes/stages, e.g. a read stage and a modify stage.
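For illustration, a sketch of the "build first, publish atomically" idea: a node's children form a singly linked list headed by an atomic pointer, and new children are pushed with a compare-and-swap. This is illustrative only; an actual scene-graph layout will differ.

[source lang="cpp"]
#include <atomic>

struct Node
{
    // ... node data ...
    std::atomic<Node*> firstChild;
    Node* nextSibling;

    Node() : firstChild(nullptr), nextSibling(nullptr) { }
};

// Fully initialize 'child' before this call; readers (using acquire loads of
// firstChild) only ever see it once the compare-exchange has published it.
void addChild(Node& parent, Node* child)
{
    Node* head = parent.firstChild.load(std::memory_order_relaxed);
    do
    {
        child->nextSibling = head;
    }
    while (!parent.firstChild.compare_exchange_weak(
        head, child, std::memory_order_release, std::memory_order_relaxed));
}
[/source]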

Edited by Hodgman, 13 October 2012 - 08:44 PM.


#12 3DModelerMan   Members   -  Reputation: 1001


Posted 14 October 2012 - 08:21 AM

Well if I break it up into different stages then wouldn't that be basically the same as having a serial program? Or do you mean something like: a node has a list of children, but when you call addChild it would instead add the child to another list that gets merged with the main "update" list in the beginning of the node's update function? And then I could do the same thing for removals too right? So any modifications to the list would be queued up and deferred until the beginning of the object's update, before any iterating was done in the frame. Or by breaking it into stages do you mean I need to radically alter the entire architecture of my engine?
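For illustration, a sketch of that deferred-modification idea: addChild()/removeChild() only queue the change under a small lock, and the node merges the pending lists at the start of its update, before any iteration happens that frame. The names are made up.

[source lang="cpp"]
#include <algorithm>
#include <mutex>
#include <vector>

class SceneNode
{
public:
    void addChild(SceneNode* child)
    {
        std::lock_guard<std::mutex> guard(m_pendingMutex);
        m_pendingAdds.push_back(child);
    }

    void removeChild(SceneNode* child)
    {
        std::lock_guard<std::mutex> guard(m_pendingMutex);
        m_pendingRemovals.push_back(child);
    }

    void update()
    {
        // Merge queued modifications before anything iterates m_children this frame.
        {
            std::lock_guard<std::mutex> guard(m_pendingMutex);
            for (SceneNode* child : m_pendingAdds)
                m_children.push_back(child);
            for (SceneNode* child : m_pendingRemovals)
                m_children.erase(std::remove(m_children.begin(), m_children.end(), child),
                                 m_children.end());
            m_pendingAdds.clear();
            m_pendingRemovals.clear();
        }

        for (SceneNode* child : m_children)
            child->update();   // safe: the list cannot change until the next merge
    }

private:
    std::mutex m_pendingMutex;
    std::vector<SceneNode*> m_children;
    std::vector<SceneNode*> m_pendingAdds;
    std::vector<SceneNode*> m_pendingRemovals;
};
[/source]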

#13 Hodgman   Moderators   -  Reputation: 30351


Posted 04 November 2012 - 08:22 PM

Well if I break it up into different stages then wouldn't that be basically the same as having a serial program? Or do you mean something like: a node has a list of children, but when you call addChild it would instead add the child to another list that gets merged with the main "update" list in the beginning of the node's update function? And then I could do the same thing for removals too right? So any modifications to the list would be queued up and deferred until the beginning of the object's update, before any iterating was done in the frame. Or by breaking it into stages do you mean I need to radically alter the entire architecture of my engine?

Sorry I missed this reply.
Yes, queueing up modifications instead of performing them immediately is a good way to break up processing into several stages and reduce the amount of communication between threads.

Also, breaking algorithms into serial stages isn't the same as a serial program -- often many threads can contribute to each stage, and different threads can be working on different problems at the same time.
e.g. say we've got a single-threaded function, C, and two functions A & B that can be completed by parallel worker threads. Let's also say that A & B can also be split into 2 stages, and the code we're trying to execute looks like:
result = C( A(), B() )
Given 3 worker threads, their progress over time (vertical) could look like:
#0 #1 #2
A1 A1 B1
B1 B1 A1
A2 A2 B2
B2 B2 A2
C  .wait.
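For illustration, a much-simplified sketch of the same shape using std::async: it shows only the result = C( A(), B() ) structure, not the two-stage split or the three-worker interleaving above. A, B and C here are trivial stand-ins.

[source lang="cpp"]
#include <future>

int A() { return 1; }                    // stand-in for parallel-friendly work
int B() { return 2; }                    // stand-in for parallel-friendly work
int C(int a, int b) { return a + b; }    // the single-threaded combining step

int runFrame()
{
    std::future<int> fa = std::async(std::launch::async, A);
    std::future<int> fb = std::async(std::launch::async, B);
    return C(fa.get(), fb.get());        // join point: block only once both are done
}
[/source]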


#14 iMalc   Crossbones+   -  Reputation: 2306


Posted 05 November 2012 - 01:25 AM

I once read a wise statement on a forum saying that a container generally cannot make itself thread-safe on behalf of its client.

The thread safety generally needs to be handled by the code using the container, because that code inevitably needs to lock the container whilst performing more than one action on it. Thus this is a flawed endeavour, a "fool's errand" so to speak.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

#15 rdragon1   Crossbones+   -  Reputation: 1200


Posted 05 November 2012 - 02:16 AM

To write scalable parallel code, the answer isn't to take serial code and replace the data structures with 'thread-safe' versions that do the same operations. If you're resorting to using locks, then you're already down the wrong path. The right path is to create algorithms that don't need read+write access to shared data, or to constrain those stages of your algorithm to as small a piece as possible, but still extract parallelism where you can. The data transform you're performing dictates the data structures and algorithms, and a std::vector with locks in every member function is likely a terrible structure.



