Tispe

Thread safe to read only?


Hi

 

I am having trouble with animating thousands of skeletons using a single thread. It just takes too much time. I have megabytes of animation keys and each skeleton may want random access to any key as some skeletons use the same track.

 

I want to spread the work across a thread pool. To simplify, I want to grant each thread direct read access to the same data (key frames).

 

Is it safe to allow multiple threads to read from the same data location without any use of mutexes?


+1 on what Paradigm Shifter said. No writers and only readers is always safe.
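
A minimal sketch of the readers-only case, assuming C++17 and a hypothetical `Key` struct (the thread does not show the real key layout). The keys must be fully written before the workers start; after that every worker only reads them and writes to its own output slot, so there is no mutex anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical key type, used only for illustration.
struct Key { float time; float value; };

// Every worker reads the shared keys through a const reference and writes
// only to its own slot in `sums`, so no locking is required.
std::vector<float> sumKeysInParallel(const std::vector<Key>& keys, unsigned workerCount) {
    assert(workerCount > 0);
    std::vector<float> sums(workerCount, 0.0f);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back([&keys, &sums, w, workerCount] {
            float local = 0.0f;
            for (std::size_t i = w; i < keys.size(); i += workerCount)
                local += keys[i].value;   // read-only access to the shared data
            sums[w] = local;              // single write to a slot no other thread touches
        });
    }
    for (auto& t : workers) t.join();     // after join, the caller may read the results
    return sums;
}
```

The moment one thread starts modifying the keys while others read them, this stops being safe and needs synchronization.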

 

However, it is not necessarily faster, or not necessarily as much faster as you may think. Not by just throwing threads at the problem, anyway. A bit of consideration is advisable.

 

First, many random accesses in a huge data set from several threads will cause more cache misses on a shared-cache architecture. It may be worthwhile to sort the skeletons in this case (so accesses to the same memory cell stay "close together"). Adding extra work by sorting may seem nonsensical, but depending on how many cache misses you have, this may be very much worth it. Sorting may also work in favour of branch prediction, if you have a lot of branches.

 

Second, on NUMA architectures (think for example Opteron servers), reading from a location that doesn't belong to your node is much slower. In such a scenario, you will want to make a copy of your data, one for each NUMA node.

 

Third, you need to be sure that the data which you write out does not compete on cache lines, both in respect of true and false sharing.
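
To illustrate the false sharing point, here is a rough sketch that gives each thread's output its own cache line. The 64-byte line size is an assumption (common on x86, not universal), `PaddedCounter` is a made-up name, and C++17 is assumed so that `std::vector` honours the over-alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; check your target hardware.
constexpr std::size_t kCacheLine = 64;

// Padding each counter to a full line keeps two threads from writing to the same line.
struct alignas(kCacheLine) PaddedCounter {
    unsigned long value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

std::vector<PaddedCounter> countInParallel(unsigned workerCount, unsigned long iterations) {
    assert(workerCount > 0);
    std::vector<PaddedCounter> counters(workerCount);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back([&counters, w, iterations] {
            for (unsigned long i = 0; i < iterations; ++i)
                ++counters[w].value;   // stays on this thread's own cache line
        });
    }
    for (auto& t : workers) t.join();
    return counters;
}
```

Without the `alignas`, several counters would share one line and the cores would keep invalidating each other's copy of it, even though no two threads ever touch the same variable.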

Guest Hiwas

What Paradigm said is true, but there are some tricks here.  Animation is a two-step process. The first step builds a potentially large number of transform matrices to represent a key in an animation (potentially a tween).  The second step uses those matrices to modify all the vertex data in the model appropriately from the T pose.  All of this is exceptionally viable for threading but, as implied by the others, you need to control your data access properly.

 

Overall though, computing matrices should be done in a fully 'const' manner, which means you can thread this across as many cores as are available.  The output of the computations is also const, read-only data while it is being rendered, so it is thread compatible as well.
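
As a rough sketch of the 'const' approach, assuming C++17: the shared keys are only ever seen through a const reference, and each worker fills its own disjoint range of the output. `Matrix4` is a stand-in for `D3DXMATRIX`, and `computeBoneMatrix` is a placeholder, not real key sampling.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Matrix4 { float m[16]; };   // stand-in for D3DXMATRIX (16 floats)

// Placeholder for the real sampling code: reads shared data, has no side effects.
Matrix4 computeBoneMatrix(const std::vector<float>& keys, std::size_t skeleton) {
    Matrix4 out{};
    out.m[0] = out.m[5] = out.m[10] = out.m[15] = 1.0f;   // identity
    out.m[12] = keys[skeleton % keys.size()];             // fake translation from a shared key
    return out;
}

void animateAll(const std::vector<float>& keys, std::vector<Matrix4>& poses, unsigned workerCount) {
    assert(workerCount > 0 && !keys.empty());
    const std::size_t n = poses.size();
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = n * w / workerCount;
        const std::size_t end = n * (w + 1) / workerCount;
        workers.emplace_back([&keys, &poses, begin, end] {
            for (std::size_t s = begin; s < end; ++s)
                poses[s] = computeBoneMatrix(keys, s);   // ranges never overlap between workers
        });
    }
    for (auto& t : workers) t.join();   // poses are read-only for rendering from here on
}
```

A real engine would hand these ranges to an existing thread pool instead of spawning threads every frame; plain `std::thread` just keeps the sketch short.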

 

It is a complicated subject but overall, animation is inherently threadable if you understand the basic concepts.


The animation tracks are scattered in RAM. Would it be wise to merge them all in a huge continuous array?

 

Is it wise to have all the skeletons 4096*128*sizeof(D3DXMATRIX) saved in a big array and let the threads write to that array simultaneously?

 

If any skeleton can play any animation track, it is probably not wise to move megabytes of animation data around on a per-frame basis?


Also, you don't need to update the animations for every drawn frame. You could update the animation data at a rate of 10-30 Hz, depending on the distance from the camera, and then interpolate between those updates. This already saves a lot of work, since interpolating between animation frames is pretty fast.
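
One way this could look, as a sketch only, assuming C++17: the 10-30 Hz range is from the suggestion above, while the near/far distances and all function names are made-up tuning values for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Maps camera distance to an animation update rate: 30 Hz up close, 10 Hz far away.
// nearDist and farDist are hypothetical tuning values.
float updateRateHz(float distance, float nearDist = 10.0f, float farDist = 100.0f) {
    assert(farDist > nearDist);
    const float t = std::clamp((distance - nearDist) / (farDist - nearDist), 0.0f, 1.0f);
    return 30.0f + t * (10.0f - 30.0f);
}

// Fraction of the way from the previous animation update to the next one.
float blendFactor(float secondsSinceLastUpdate, float rateHz) {
    return std::clamp(secondsSinceLastUpdate * rateHz, 0.0f, 1.0f);
}

// Linear blend between the two most recent animation updates.
float mixValue(float previous, float next, float t) {
    return previous + (next - previous) * t;
}
```

Each drawn frame then only pays for `mixValue` (or the matrix or quaternion equivalent) per bone, while the full key sampling runs at the lower rate.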

 

[edit] you could store rotations as quaternions and a position vector (and maybe scale if needed). This saves some memory and interpolation of quaternions is simple. 
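
A small sketch of what such a key could look like. The struct and function names are hypothetical, and it uses normalized linear interpolation (nlerp) as a cheap stand-in for slerp, which is usually close enough when neighbouring keys are not far apart.

```cpp
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// 7 floats per key instead of 16 for a full 4x4 matrix.
struct BoneKey { Quat rotation; Vec3 position; };

// nlerp: blend component-wise, then renormalize. Expects unit quaternions.
Quat nlerp(Quat a, Quat b, float t) {
    // Flip b when needed so the blend follows the shorter arc.
    const float dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    const float sign = dot < 0.0f ? -1.0f : 1.0f;
    const Quat q{a.x + (sign * b.x - a.x) * t,
                 a.y + (sign * b.y - a.y) * t,
                 a.z + (sign * b.z - a.z) * t,
                 a.w + (sign * b.w - a.w) * t};
    const float len = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    assert(len > 0.0f);
    return {q.x / len, q.y / len, q.z / len, q.w / len};
}

Vec3 lerpVec3(Vec3 a, Vec3 b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

BoneKey interpolate(const BoneKey& a, const BoneKey& b, float t) {
    return {nlerp(a.rotation, b.rotation, t), lerpVec3(a.position, b.position, t)};
}
```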

 

Cheers!

Edited by kauna

The animation tracks are scattered in RAM. Would it be wise to merge them all in a huge continuous array?

 

 

Since an animation track is probably much, much larger than a cache line, and you'll access different positions inside each track, it is unlikely that you gain much from that. You do gain some if you sort the skeletons by track and time, though.

That way, the animation tracks are still scattered "randomly", but you access them in a quite non-random pattern, and there is a chance that the next access will be on the same cache line. Also, if there aren't thousands of different animation tracks that you hop between, there's a chance that the automatic prefetcher kicks in as you scan over them. Certainly the auto-prefetcher will not pick up all of them, but maybe for some if you're lucky.
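
Sorting by track and time could be as simple as the sketch below. `SkeletonState` and its fields are hypothetical; the real per-skeleton playback state is not shown in this thread. Times are assumed to be finite (no NaN), otherwise the comparison is not a valid ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-skeleton playback state.
struct SkeletonState {
    std::uint32_t trackId;     // which animation track this skeleton plays
    float time;                // current playback position within the track
    std::uint32_t skeletonId;  // index of the skeleton this state belongs to
};

// Orders first by track, then by time within the track.
bool byTrackThenTime(const SkeletonState& a, const SkeletonState& b) {
    return std::tie(a.trackId, a.time) < std::tie(b.trackId, b.time);
}

// After sorting, neighbouring skeletons read neighbouring keys of the same track.
void sortByTrackAndTime(std::vector<SkeletonState>& states) {
    std::sort(states.begin(), states.end(), byTrackThenTime);
    assert(std::is_sorted(states.begin(), states.end(), byTrackThenTime));
}
```

The tracks themselves stay wherever they are in memory; only the order in which the skeletons are processed changes.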

 

Is it wise to have all the skeletons 4096*128*sizeof(D3DXMATRIX) saved in a big array and let the threads write to that array simultaneously?

 

It probably won't do wonders, but it doesn't hurt, and it may save you some memory due to alignment, which in turn reduces the number of page faults.

For the cache, it probably makes little or no difference, since a single D3DXMATRIX is 64 bytes, so packing them versus not packing them is the same thing.
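
For reference, a sketch of the arithmetic, assuming C++17. It reads 4096*128 as 4096 skeletons with 128 bones each, which is a guess, and uses a stand-in `Matrix4` instead of `D3DXMATRIX`. The `alignas(64)` is an addition for the example: a matrix only sits on exactly one (assumed 64-byte) cache line if its storage is aligned to 64 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for D3DXMATRIX: 16 floats = 64 bytes, aligned so each matrix fills one line.
struct alignas(64) Matrix4 { float m[4][4]; };
static_assert(sizeof(Matrix4) == 64, "one matrix per 64-byte cache line");

constexpr std::size_t kSkeletons = 4096;        // assumed meaning of "4096"
constexpr std::size_t kBonesPerSkeleton = 128;  // assumed meaning of "128"
constexpr std::size_t kTotalBytes = kSkeletons * kBonesPerSkeleton * sizeof(Matrix4);
static_assert(kTotalBytes == 32u * 1024u * 1024u, "the whole array is 32 MiB");

// Flat index into the big array: all bones of one skeleton are adjacent.
std::size_t poseIndex(std::size_t skeleton, std::size_t bone, std::size_t bonesPerSkeleton) {
    assert(bone < bonesPerSkeleton);
    return skeleton * bonesPerSkeleton + bone;
}
```

With this layout two threads working on different skeletons never write to the same cache line, since each skeleton owns a run of whole lines.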


You can get a pointer to the matrices needed to render a given frame at a cost of just 3-4 operations, down from the milliseconds it currently takes to produce the needed matrices.
