Parallelizing animation blending

Thanks for the replies, guys!

Quote:Original post by the_edd
The parallel_for_each code doesn't require any locks (assuming the actors are stored in something like a std::vector where random access is both available and thread-safe).

Of course, there might still be locks in the thread-pool, but it has a better chance at avoiding bottlenecks when spreading the load (through work stealing and having distinct queues for each thread, for example).

Also, the scoped locking idiom is highly recommended.

Actually I'm trying to make a thread-pool that can be re-used for different purposes. I need to lock when I get a new actor. How could it possibly work without?
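Concretely, that locked per-actor fetch with the scoped locking idiom might look like this minimal sketch (Actor and blend_animations are placeholder names; std::mutex assumed, though boost::mutex works the same way):

#include <cstddef>
#include <mutex>
#include <vector>

struct Actor { /* animation state, etc. */ };
void blend_animations(Actor &);              // placeholder for the real work

std::vector<Actor> actors;                   // shared between worker threads
std::size_t next_actor = 0;                  // index of the next unclaimed actor
std::mutex actors_mutex;                     // guards next_actor

// Returns the next unclaimed actor, or 0 when all have been handed out.
Actor *fetch_next_actor()
{
    // Scoped locking: the mutex is released automatically on scope exit,
    // even if an exception is thrown.
    std::lock_guard<std::mutex> lock(actors_mutex);
    if (next_actor == actors.size())
        return 0;
    return &actors[next_actor++];
}

// Each worker thread runs this loop, blending outside the lock.
void worker()
{
    while (Actor *a = fetch_next_actor())
        blend_animations(*a);
}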

The best thing to do probably is to consult a profiler and maybe try to use TBB for comparison.

Just a small update:

I increased the number of animations that are blended per actor, without increasing the number of actors. That way the difference between 2 and 3 threads became clearly visible, although it's still not as big a jump as the one from 1 to 2. But I guess it could mean that I am hitting another bottleneck here.
Quote:Original post by B_old
Actually I'm trying to make a thread-pool that can be re-used for different purposes. I need to lock when I get a new actor. How could it possibly work without?

Get a new actor from where? What do you mean?

Quote:The best thing to do probably is to consult a profiler and maybe try to use TBB for comparison.

A good plan.
Quote:Original post by the_edd
Get a new actor from where? What do you mean?

What I meant is that I have a std::vector of actors and enter a critical section when I pick an actor from the list and increment the iterator. Then I process that actor's animation, while other threads are free to get an actor.

That isn't a great way to do it; ideally each thread in your thread group is allocated a non-overlapping range from that vector to work on, so that you don't have to lock and unlock things in order to get at the next chunk of data.

So, if you had 4 threads and 100 actors in your vector, then each thread would work on a range of 25 actors, so they can access the vector directly without worrying about locking or unlocking.

It gets more complex when you introduce work stealing, but the above removes the need for locks.
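In sketch form, the static split might look like this (assuming C++11's std::thread; Actor and blend_animations are placeholders):

#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Actor { /* ... */ };
void blend_animations(Actor &);              // placeholder

// Each thread walks its own non-overlapping [begin, end) range, so the
// shared vector is read without any locking.
void process_range(std::vector<Actor> &actors,
                   std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i != end; ++i)
        blend_animations(actors[i]);
}

void process_all(std::vector<Actor> &actors, std::size_t num_threads)
{
    const std::size_t chunk = actors.size() / num_threads;
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t != num_threads; ++t)
    {
        const std::size_t begin = t * chunk;
        // The last thread also picks up any remainder from the division.
        const std::size_t end = (t + 1 == num_threads) ? actors.size()
                                                       : begin + chunk;
        workers.push_back(std::thread(process_range, std::ref(actors),
                                      begin, end));
    }
    for (std::size_t t = 0; t != workers.size(); ++t)
        workers[t].join();
}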
Quote:Original post by phantom
That isn't a great way to do it; ideally each thread in your thread group is allocated a non-overlapping range from that vector to work on, so that you don't have to lock and unlock things in order to get at the next chunk of data.

So, if you had 4 threads and 100 actors in your vector, then each thread would work on a range of 25 actors, so they can access the vector directly without worrying about locking or unlocking.

It gets more complex when you introduce work stealing, but the above removes the need for locks.

Hmm, I think I understand.

How does one handle cases where the actors are stored in a non-random-access container (std::list maybe)?

One advantage of acquiring one actor at a time is that the load is more or less automatically spread over the threads (if there are a few particularly "hefty" actors around...). I guess that is when work stealing comes into play, but I have yet to look that term up.
Quote:Original post by B_old
How does one handle cases where the actors are stored in a non-random-access container (std::list maybe)?

With more difficulty :)

It depends on the relative cost of incrementing the iterator versus processing the task associated with each element. There's no one hard-and-fast rule for this, unfortunately.

If you have bidirectional iterators it can sometimes even be worth creating a vector of pointers to the elements so that you have a random access range. EDIT: actually, forward iterators should work too.
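For example, something along these lines (a sketch; Actor is a placeholder):

#include <list>
#include <vector>

struct Actor { /* ... */ };

// One O(n) pass turns the non-random-access container into a random-access
// view. The pointers stay valid as long as the list itself isn't modified
// while the threads are working.
std::vector<Actor *> make_random_access_view(std::list<Actor> &actors)
{
    std::vector<Actor *> view;
    view.reserve(actors.size());
    for (std::list<Actor>::iterator it = actors.begin();
         it != actors.end(); ++it)
        view.push_back(&*it);
    return view;
}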

Quote:
One advantage of acquiring one actor at a time is that the load is more or less automatically spread over the threads (if there are a few particularly "hefty" actors around...).

You also get this when you split the range into N chunks for the N threads in your thread pool, so there's no advantage to doing this one element at a time for a random access range.

Actually, if the range is particularly large and you're not using work stealing, it's often a good idea to split the range into k*N pieces instead, which can spread the load more evenly if some tasks are much heftier than others.
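As a sketch, the chunk size calculation is just (k being a tuning parameter you'd profile for):

#include <algorithm>
#include <cstddef>

// With k*N chunks (k > 1), a thread that finishes a chunk early grabs
// another, so a few hefty elements don't stall the whole pool.
// k = 1 degenerates to the plain N-way static split.
std::size_t chunk_size(std::size_t total, std::size_t num_threads,
                       std::size_t k)
{
    return std::max<std::size_t>(1, total / (k * num_threads));
}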

I think perhaps you have this idea in your head that your container of actors should itself act as a queue from which threads can grab stuff to work on. If that's the case, I don't think it's a particularly good idea. A queue may not be the best way to think of the data. A pool is more appropriate in this specific case, I think.

[Edited by - the_edd on February 20, 2010 6:35:20 AM]
Quote:Original post by the_edd
Quote:Original post by B_old
One advantage of acquiring one actor at a time is that the load is more or less automatically spread over the threads (if there are a few particularly "hefty" actors around...).

You also get this when you split the range into N chunks for the N threads in your thread pool, so there's no advantage to doing this one element at a time for a random access range.

If the first (or any) thread gets all the heftiest tasks, the other threads will have to wait for it?

Quote:Original post by the_edd
I think perhaps you have this idea in your head that your container of actors should itself act as a queue from which threads can grab stuff to work on. If that's the case, I don't think it's a particularly good idea. A queue may not be the best way to think of the data. A pool is more appropriate in this specific case, I think.

My idea is this: I have an object that provides an interface to fetch tasks. I pass that object to the thread pool and the threads start fetching tasks and processing them. Is that what you mean by the idea in my head? :)
I like the idea of a pool as well, I just needed a way to ensure that different threads never fetch the same task.

Quote:Original post by B_old
If the first (or any) thread gets all the heftiest tasks, the other threads will have to wait for it?

Yes, that's definitely a worry. But see my comment about k*N chunks. Also, now's a good time to look up work stealing :)

It all comes down to balancing the costs of:

- task creation
- sequence iteration
- task submission
- task execution
- per-task processing overhead

Different strategies will work better for different kinds of work. If you know that some tasks will take much more time than others, then small range-chunk sizes are certainly something to try.
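Short of full work stealing, a simpler dynamic scheme gets much of the same load balancing: threads claim small chunks from a shared atomic counter instead of taking fixed ranges. A sketch (Actor and blend_animations are placeholders):

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

struct Actor { /* ... */ };
void blend_animations(Actor &);              // placeholder

std::vector<Actor> actors;
std::atomic<std::size_t> next_index(0);      // shared claim counter

// Claiming a chunk is a single atomic increment (no mutex), and a thread
// that draws light actors simply comes back for more chunks.
void worker(std::size_t chunk)
{
    for (;;)
    {
        const std::size_t begin = next_index.fetch_add(chunk);
        if (begin >= actors.size())
            return;
        const std::size_t end = std::min(begin + chunk, actors.size());
        for (std::size_t i = begin; i != end; ++i)
            blend_animations(actors[i]);
    }
}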

Quote:
My idea is this: I have an object that provides an interface to fetch tasks. I pass that object to the thread pool and the threads start fetching tasks and processing them. Is that what you mean by the idea in my head? :)

More or less :)

Quote:I like the idea of a pool as well, I just needed a way to ensure that different threads never fetch the same task.

Express it as an algorithm and let the algorithm worry about that, e.g. parallel_for_each().

The parallel_for_each function we use at work looks something like this (pseudo code):

// Ignoring any special cases for non-random-access iterators
template< /* blah */ >
void parallel_for_each(thread_pool threads, RAIter begin, RAIter end,
                       const UnaryFunctor &f,
                       std::size_t granularity = deduce_granularity)
{
    task_pool tasks;
    range_chunker rc(begin, end, threads.size(), granularity);

    while ((chunk = rc.next()))
        tasks.add(make_task(chunk.begin, chunk.end, f));

    future<void> completion = threads.take(tasks);
    completion.check();
}
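At the call site, the whole animation pass then reduces to something like this (hypothetical names, matching the sketch above):

// The algorithm owns all chunking and distribution; the caller only
// supplies the range and the per-element work.
parallel_for_each(threads, actors.begin(), actors.end(), blend_animations);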


The algorithm itself ensures that tasks are distributed correctly. Surely you don't want to have to do that manually every time?
Thanks for the reply!

My approach goes in the general direction you outlined, apart from being less general/flexible, and the locking, that is...

I'd say I wasn't that far off the mark, and you guys gave me some useful inspiration. Thanks!

