Multithreaded loading messed (very slightly) with my floating point

Started by
8 comments, last by polyfrag 10 years, 10 months ago

I had one of the worst debugging sessions of my life.

I am currently working on animations and, wanting optimal performance, I moved loading onto a separate thread. When I moved my refpose loading onto that thread, however, something weird happened: my animated model collapsed after a few animation cycles and was completely distorted.

I tried to change back to non-multithreaded, and there was no error. Then I wrote out everything to files: Animations, boneweights, animationpose, refpose - everything. And I could see no difference between the multithreaded and the non-multithreaded.

After a long time of digging to isolate the problem, I finally found that the bones of the final pose, which are in matrix form, had their _44 members distorted. All the math takes place in quaternion+position form, except for an inverse matrix that I pre-calculate for each refpose bone. The final bone matrices would therefore never have their _44 members overwritten during the animation; the only thing that touches them is the refpose inverse matrix, which is continuously multiplied into them and slowly skews them.

I could solve this by resetting the final bones every iteration before using them, but I found it a bit unsatisfactory, that I would even have to do that, because it works fine when loading single-threaded.

Another solution was to set all the _44-values in the main thread once it gets delivered from the loading-thread.
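A rough sketch of that second workaround, for illustration only. `Mat4` and `restoreAffineColumn` are hypothetical names, not the original code, and the layout assumes the D3D row-vector convention, where a rotation+translation matrix has (0, 0, 0, 1) as its fourth column.

```cpp
// Hypothetical 4x4 matrix with D3D-style member names (_rowcolumn).
struct Mat4 {
    float _11, _12, _13, _14;
    float _21, _22, _23, _24;
    float _31, _32, _33, _34;
    float _41, _42, _43, _44;
};

// Force the fourth column back to exactly (0, 0, 0, 1).
// Assumption: the matrix is meant to be affine (rotation + translation),
// so anything else in that column is rounding noise.
inline void restoreAffineColumn(Mat4& m) {
    m._14 = 0.0f;
    m._24 = 0.0f;
    m._34 = 0.0f;
    m._44 = 1.0f;
}
```

Calling this once on each delivered inverse matrix leaves the rotation and translation untouched and only rewrites the column that should be constant.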

- is it a normal problem, that floats created on one thread lose accuracy when transferred to another?

- should I reset the finalbones-matrices from iteration to iteration no matter what? Is it too unsafe to rely on a 100% good float?

PS: This debugging session must have taken me some 20 hours in all, what a nightmare.


- is it a normal problem, that floats created on one thread lose accuracy when transferred to another?

Huh? That makes no sense.

It sounds like you have multiple threads reading/writing from the same chunk of memory at the same time. What techniques are you using to serialize access to your matrices?

Floating point values change from CPU-precision (80 bits) to float precision (32 bits) for many reasons. However, they should not drop below 32-bit precision except through an error in your code.
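A minimal sketch of how intermediate precision alone can change a float result, with no threading involved. The function names and input values are made up for illustration, and `volatile` is there only to force the intermediate to be rounded to 32 bits.

```cpp
// The same expression, (a + b) - a, with a 32-bit and a 64-bit intermediate.

inline float diffViaFloat(float a, float b) {
    volatile float sum = a + b;  // intermediate rounded to 32 bits here
    return sum - a;
}

inline float diffViaDouble(float a, float b) {
    // Intermediate kept at 64 bits, rounded to float only at the end.
    double sum = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<float>(sum - static_cast<double>(a));
}
```

With a = 16777216.0f (2^24) and b = 1.0f, `diffViaFloat` returns 0 and `diffViaDouble` returns 1, because 2^24 + 1 is not representable as a 32-bit float.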

I agree; it looks very strongly like a data synchronization bug. The fact that it is the last entry that gets corrupted suggests quite a few families of bugs.

Multiprocessing can be very hard. It takes planning and forethought, then it takes extreme care in execution, then come the nightmarish bugs like the one described above, where either the code broke the rules or the rules were flawed.

The thing is just, that the change in value was so extremely subtle. The value still showed as 1.0000 in both file output as well as in my VS-environment.
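One way such a difference can hide is in the output formatting. The sketch below (hypothetical helper names, assuming IEEE 754 32-bit floats) formats a value both with four decimals and with nine significant digits, and exposes the raw bit pattern so two dumps can be compared exactly.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format a float the way a typical 4-decimal dump would.
inline std::string formatShort(float v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.4f", v);
    return buf;
}

// Format with 9 significant digits, enough to round-trip any 32-bit float.
inline std::string formatExact(float v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.9g", v);
    return buf;
}

// Raw bit pattern of the float, useful for diffing two dumps.
inline std::uint32_t bitsOf(float v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits;
}
```

The float one step below 1.0f prints as 1.0000 with `formatShort` but as 0.99999994 with `formatExact`, and its bit pattern is 0x3F7FFFFF rather than 0x3F800000.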

I have created my own framework for threaded loading and I use that framework everywhere, loading many different kinds of assets, and I have never encountered a problem. I will look through my program again, but I don't think it is a synchronization problem, as I have been quite aggressive in protecting with mutexes. And as I said, since this is standardized (a base class takes care of it across all my loaders), I believe I would have encountered the problem before.

Right now there is only one class that asks for the resource, and once asked, it will have to sit there and wait until the library gets back to it. This means, that all communication takes place between the library and the thread, where the library plays a passive role, until the thread delivers the asset. The library will run on the CPU when it delivers the asset, so no outsiders need to think about syncing.

I will have another look at my structure and see if I can find anything. Thanks for the input so far.

phil_t:

I guess I was a bit imprecise in my expression. The memory is shared, so me talking about "transferring" doesn't make sense, you are right about that.

So I guess the question would have to be rephrased to something like: can there be a difference in how threads load?

It could be that your original thread and your new thread have different FPU flags set. This is something that shouldn't happen, but sometimes a library will fiddle with _controlfp without you knowing, which affects the floating-point behaviour of all your code on that thread...

e.g. if you create a D3D9 device without passing D3DCREATE_FPU_PRESERVE, then it disables double-precision on that thread...
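A hedged sketch of how every thread could be pinned to the same x87 precision, so the loader and the main thread round identically. `pinFpuPrecisionTo24Bits` is a hypothetical helper; it assumes a 32-bit x86 MSVC build, where `_controlfp_s`, `_PC_24` and `_MCW_PC` from `<float.h>` are available. On any other target it does nothing and returns false.

```cpp
#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>
#endif

// Hypothetical helper: call first thing in every thread that does bone math.
// Returns true if the x87 precision control was actually set.
inline bool pinFpuPrecisionTo24Bits() {
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int current = 0;
    // _MCW_PC selects the precision-control field; _PC_24 is single precision,
    // matching what D3D9 sets when D3DCREATE_FPU_PRESERVE is not passed.
    return _controlfp_s(&current, _PC_24, _MCW_PC) == 0;
#else
    // Assumption: other targets do float math in SSE registers,
    // where this x87 setting does not apply.
    return false;
#endif
}
```

The opposite choice is equally valid: pass D3DCREATE_FPU_PRESERVE when creating the device and leave every thread at its default precision. What matters is that all threads agree.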

I have dug a bit deeper, and I think I have some new information. The values for the inverse matrix are calculated during loading. The values are in quaternion+position form. From that a matrix is built, and then the matrix is set to its inverse.

The SetInverse()-method does math on all members, including _44, and this math may have skewed the value slightly.

I tested by running the D3DXMatrixInverse() method instead, just to see if it did it better, but it didn't. Although the problem was opposite: _44 was slightly higher than 1.0f, not slightly lower as with my method.
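An alternative that avoids computing _44 at all is to build the inverse directly from the quaternion and position. This is neither SetInverse() nor D3DXMatrixInverse(); it is a sketch that assumes the refpose bone is a pure rotation plus translation (unit quaternion, no scale) in the D3D row-vector layout. All type and function names are hypothetical.

```cpp
struct Quat { float x, y, z, w; };  // assumed to be unit length
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };     // m[row][col], row-vector convention

// Refpose matrix: rotation in the upper 3x3, translation in the fourth row.
inline Mat4 makeBoneMatrix(const Quat& q, const Vec3& p) {
    Mat4 r = {{
        {1 - 2*(q.y*q.y + q.z*q.z), 2*(q.x*q.y + q.w*q.z), 2*(q.x*q.z - q.w*q.y), 0.0f},
        {2*(q.x*q.y - q.w*q.z), 1 - 2*(q.x*q.x + q.z*q.z), 2*(q.y*q.z + q.w*q.x), 0.0f},
        {2*(q.x*q.z + q.w*q.y), 2*(q.y*q.z - q.w*q.x), 1 - 2*(q.x*q.x + q.y*q.y), 0.0f},
        {p.x, p.y, p.z, 1.0f}
    }};
    return r;
}

// Inverse of a rigid transform: transposed rotation, translation -p * R^T.
inline Mat4 makeInverseBoneMatrix(const Quat& q, const Vec3& p) {
    const Mat4 f = makeBoneMatrix(q, p);
    Mat4 inv;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c)
            inv.m[r][c] = f.m[c][r];  // transpose of the rotation block
        inv.m[r][3] = 0.0f;           // fourth column written, not computed
    }
    for (int c = 0; c < 3; ++c)
        inv.m[3][c] = -(p.x*inv.m[0][c] + p.y*inv.m[1][c] + p.z*inv.m[2][c]);
    inv.m[3][3] = 1.0f;               // exactly 1 regardless of FPU precision
    return inv;
}
```

Because the fourth column is assigned rather than derived, _44 cannot end up slightly above or below 1.0f no matter which thread runs the code.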

Anyway, it must be my loading thread (I'm using boost::thread) that creates a slightly different result compared to the main thread.

Taking Hodgman's post into consideration, is it likely that the loading thread was created with different values than the main, and that is what alters the result?

I need to guard against floating point errors in any case, so it was good to catch this one (who knows what hardware the program is going to run on anyway?). Still, I like to understand what went wrong.

should I reset the finalbones-matrices from iteration to iteration no matter what?

If you don't keep the original points and instead continually transform the same points, then you can expect some degree of inaccuracy to creep in. It's really just a question of how many iterations it takes before the cumulative inaccuracy is relevant, i.e. noticeable.
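To put a rough number on that, here is a small sketch. It relies on the fact that for matrices whose fourth column is (0, 0, 0, w), the _44 of a product is just the product of the two w values, so a single float is enough to model the drift. The function names and the frame count are illustrative assumptions.

```cpp
// Accumulating: the previous frame's result is multiplied again every frame.
inline float w44Accumulated(float inverseW, int frames) {
    float w = 1.0f;
    for (int i = 0; i < frames; ++i)
        w *= inverseW;
    return w;
}

// Resetting: every frame starts again from an exact 1.0f.
inline float w44Reset(float inverseW, int frames) {
    float w = 1.0f;
    for (int i = 0; i < frames; ++i) {
        w = 1.0f;       // reset before use
        w *= inverseW;  // error is bounded by a single multiplication
    }
    return w;
}
```

With inverseW = 0.99999994f (one step below 1.0f) and a million iterations, the accumulated value ends up at roughly 0.94, while the reset version never moves further than that one step from 1.0f.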

Did you read Hodgman's response? I've been burnt by that sort of thing before.


Yeah, I experienced the same issue in shadow mapping: on successive frames shadow maps would change slightly (few pixels here and there).

The reason was that the shadow map camera setup task was sometimes executed on the main thread and sometimes on worker threads. The main thread had double-precision disabled by Direct3D9; the worker threads didn't. Personally I don't need double-precision, so now I explicitly disable FPU double-precision on each thread I create, but I don't necessarily recommend doing that.
