Thread safe to read only?

8 comments, last by hrmmm 10 years, 7 months ago

Hi

I am having trouble with animating thousands of skeletons using a single thread. It just takes too much time. I have megabytes of animation keys and each skeleton may want random access to any key as some skeletons use the same track.

I want to spread the work across a thread pool. To simplify, I want to give each thread direct read access to the same data (the key frames).

Is it safe to allow multiple threads to read from the same data location without any use of mutexes?


As long as no-one is writing at the same time and the data isn't moving around then yes.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

+1 on what Paradigm Shifter said. No writers and only readers is always safe.
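To make the "only readers" case concrete, here is a minimal sketch. The Key struct and the sumKeysOnWorkers function are made up for illustration, since the thread does not show the real key-frame layout. Every worker reads the same keys through a const reference with no mutex, and each worker writes only to its own output slot, after accumulating locally.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical key-frame type; the real layout is not shown in the thread.
struct Key { float value; };

// Every worker sums all key values. All workers read the same `keys`
// concurrently without a lock; nobody writes to `keys` while they run.
std::vector<float> sumKeysOnWorkers(const std::vector<Key>& keys,
                                    std::size_t workerCount)
{
    assert(workerCount > 0);
    std::vector<float> out(workerCount, 0.0f);
    std::vector<std::thread> workers;
    workers.reserve(workerCount);

    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&keys, &out, w] {
            float sum = 0.0f;                 // accumulate locally first
            for (const Key& k : keys) {
                sum += k.value;               // read-only access, no mutex
            }
            out[w] = sum;                     // only this worker writes out[w]
        });
    }
    for (std::thread& t : workers) {
        t.join();                             // join before anyone reads `out`
    }
    return out;
}
```

The join at the end is what makes it safe for the caller to read the results afterwards. On some toolchains you need to link with the platform thread library (for example -pthread).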

However, it is not necessarily faster, or not necessarily as much faster as you may think. Not by just throwing threads at the problem, anyway. A bit of consideration is advisable.

First, many random accesses in a huge data set from several threads will cause more cache misses on a shared-cache architecture. It may be worthwhile to sort the skeletons in this case (so accesses to the same memory cell stay "close together"). Adding extra work by sorting may seem nonsensical, but depending on how many cache misses you have, this may be very much worth it. Sorting may also work in favour of branch prediction, if you have a lot of branches.

Second, on NUMA architectures (think for example Opteron servers), reading from a location that doesn't belong to your node is much slower. In such a scenario, you will want to make a copy of your data, one for each NUMA node.

Third, you need to be sure that the data which you write out does not compete for cache lines, with respect to both true and false sharing.
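One common way to avoid false sharing on small per-thread outputs is to pad each slot to a full cache line. A minimal sketch, assuming a 64-byte line (typical on x86, but not guaranteed everywhere); kCacheLine and PerThreadResult are names invented for this example:

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size; check your target hardware.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread result slot. alignas rounds the struct's size and
// alignment up to one cache line, so two threads' slots never share a line.
struct alignas(kCacheLine) PerThreadResult {
    int bonesProcessed;
};

static_assert(sizeof(PerThreadResult) == kCacheLine,
              "each slot occupies exactly one cache line");
static_assert(alignof(PerThreadResult) == kCacheLine,
              "each slot starts on a cache-line boundary");
```

With this layout, an array of PerThreadResult places consecutive slots 64 bytes apart, at the cost of some wasted memory per slot.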

What Paradigm said is true, but there are some tricks here. Animation is a two-step process. The first step builds a potentially large number of transform matrices to represent a key in an animation (potentially a tween). The second step uses those matrices to modify all the vertex data in the model appropriately from the T-pose. All of this is exceptionally viable for threading but, as the others implied, you need to control your data access properly.

Overall though, computing matrices should be done in a fully 'const' manner, which means you can thread this across as many cores as are available. The output of the computations is also const, read-only data while it is being rendered, so it is thread compatible as well.

It is a complicated subject but overall, animation is inherently threadable if you understand the basic concepts.

The animation tracks are scattered in RAM. Would it be wise to merge them all in a huge continuous array?

Is it wise to have all the skeletons 4096*128*sizeof(D3DXMATRIX) saved in a big array and let the threads write to that array simultaneously?

If any skeleton can play any animation track, it is probably not wise to move megabytes of animation data around on a per frame basis?

Also, you don't need to update the animations for every drawn frame. You could update the animation data at 10-30 fps, depending on the distance from the camera, and then interpolate between those updates. This already saves a lot of work, since interpolating animation frames is pretty fast.

[edit] You could store rotations as quaternions plus a position vector (and maybe scale if needed). This saves some memory, and interpolating quaternions is simple.
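As a rough illustration of how simple the quaternion interpolation can be, here is a sketch using normalized linear interpolation (nlerp). That choice is an assumption on my part, as no specific method is named above; slerp is the usual alternative when constant angular speed matters. Quat and nlerp are names made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Minimal quaternion type for illustration; a real engine would use its own.
struct Quat { float x, y, z, w; };

// Normalized linear interpolation between two unit quaternions, t in [0, 1].
Quat nlerp(const Quat& a, Quat b, float t)
{
    // q and -q describe the same rotation; flip b to take the shorter arc.
    const float dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    if (dot < 0.0f) {
        b.x = -b.x; b.y = -b.y; b.z = -b.z; b.w = -b.w;
    }

    // Plain component-wise lerp...
    Quat r{ a.x + (b.x - a.x) * t,
            a.y + (b.y - a.y) * t,
            a.z + (b.z - a.z) * t,
            a.w + (b.w - a.w) * t };

    // ...followed by renormalization back to unit length.
    const float len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z + r.w * r.w);
    assert(len > 0.0f);
    r.x /= len; r.y /= len; r.z /= len; r.w /= len;
    return r;
}
```

Positions (and scale, if stored) can be blended with an ordinary component-wise lerp using the same t.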

Cheers!

The animation tracks are scattered in RAM. Would it be wise to merge them all in a huge continuous array?

Since an animation track is probably much, much larger than a cache line, and you'll access different positions inside each track, it is unlikely that you gain much from that. You do gain some if you sort the skeletons by track and time, though.

That way, the animation tracks are still scattered "randomly", but you access them in a quite non-random pattern, and there is a chance that the next access will be on the same cache line. Also, if there aren't thousands of different animation tracks that you hop between, there's a chance that the automatic prefetcher kicks in as you scan over them. Certainly the auto-prefetcher will not pick up all of them, but maybe for some if you're lucky.
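Sorting by track and time could look roughly like this. SkeletonInstance and its fields are hypothetical, since the thread does not show the real skeleton data; the point is only the sort key.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical per-skeleton playback state.
struct SkeletonInstance {
    int   trackId;     // which animation track this skeleton plays
    float time;        // current playback time within that track
    int   skeletonId;  // identifies the skeleton itself
};

// Orders skeletons by track first, then by time within the track, so that
// consecutive skeletons tend to read nearby keys of the same track.
void sortByTrackAndTime(std::vector<SkeletonInstance>& skeletons)
{
    std::sort(skeletons.begin(), skeletons.end(),
              [](const SkeletonInstance& a, const SkeletonInstance& b) {
                  return std::tie(a.trackId, a.time) <
                         std::tie(b.trackId, b.time);
              });
}
```

Whether the sort pays for itself depends on how many cache misses you have today, so measure before and after.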

Is it wise to have all the skeletons 4096*128*sizeof(D3DXMATRIX) saved in a big array and let the threads write to that array simultaneously?

It probably won't do wonders, but it doesn't hurt, and it may save you some memory due to alignment, which in turn reduces the number of page faults.

For the cache, it probably makes little or no difference, since a single D3DXMATRIX is 64 bytes, so packing them versus not packing them is the same thing.

For more about the performance impact, read about the Single Writer Principle:

The Single Writer Principle is that for any item of data, or resource, that item of data should be owned by a single execution context for all mutations.

http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html
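Applied to the big output array of matrices, one way to follow this principle is to give each worker its own contiguous, non-overlapping slice, so every element has exactly one writer. A minimal sketch; Matrix4 is a stand-in for D3DXMATRIX, and fillPartitioned and the placeholder write are invented for the example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for D3DXMATRIX: 16 floats.
struct Matrix4 { float m[16]; };
static_assert(sizeof(Matrix4) == 64, "16 floats of 4 bytes each");

// Fills `out` with `workerCount` threads. Worker w owns [begin, end) and no
// other thread writes to that range, so no locking is needed.
void fillPartitioned(std::vector<Matrix4>& out, std::size_t workerCount)
{
    assert(workerCount > 0);
    const std::size_t n = out.size();
    const std::size_t chunk = (n + workerCount - 1) / workerCount;  // round up

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end   = std::min(n, begin + chunk);
        workers.emplace_back([&out, begin, end, w] {
            for (std::size_t i = begin; i < end; ++i) {
                // Placeholder for the real pose computation.
                out[i].m[0] = static_cast<float>(w);
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
}
```

Because each slice is contiguous and each matrix is 64 bytes, a cache line can only be shared between two workers at a slice boundary, and only if the array itself is not 64-byte aligned.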

I am having trouble with animating thousands of skeletons using a single thread.

Why?

You can get a pointer to the matrices needed to render a given frame at a cost of just 3-4 operations, down from the milliseconds you currently spend getting to the needed matrices.

This topic is closed to new replies.
