Floats and determinism

Started by
9 comments, last by alvaro 10 years, 3 months ago

Hi,

I'm experimenting some with determinism. I know the basics of how floats work and why they are unable to represent certain numbers, as well as why you should never use equality to compare floats due to rounding errors etc. in computation. What I'm not as certain about is how deterministic they are. If I do the same computation using floats/doubles on two different machines or even different architectures, will the output be the same?

There's no guarantee that floats will have the same physical representation on a different architecture. The point is not to rely on its representation. If you need to be absolutely sure of an operation's result, then you can use an arbitrary precision library, but beware that there will be numbers it cannot represent, just like in any numeric system. For instance, how do you represent (10 / 3) in decimal?

If you do the same computations on different machines, they'll be in the ballpark of each other; 2 * 2 won't suddenly become 5. However, if you're relying on it being the same number down to the last bit, then you're doing something wrong.

The problem is that I need to do hashing on the binary data, so I'm indeed relying on the results being equal down to the last bit. I don't technically need to use floats, it's just that it would make my life easier in some cases.
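
For what it's worth, hashing "the binary data" of floats could look roughly like the sketch below. It is only an illustration: hash_floats is a made-up helper name, FNV-1a is just one convenient hash, and the code assumes a 32-bit float. The bits are copied out with memcpy and fed to the hash in a fixed byte order, so at least the host's endianness does not change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// FNV-1a (64-bit) over the raw bit patterns of an array of floats.
// hash_floats is a hypothetical helper, not something from this thread.
std::uint64_t hash_floats(const float* values, std::size_t count) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &values[i], sizeof bits);  // well-defined way to read the bits
        // Feed the four bytes lowest-first, independent of host endianness.
        for (int shift = 0; shift < 32; shift += 8) {
            hash ^= (bits >> shift) & 0xFFu;
            hash *= 0x100000001b3ULL;  // FNV-1a prime
        }
    }
    return hash;
}
```

Note that a bitwise hash is stricter than ==: 0.0f and -0.0f compare equal but have different bit patterns, so they hash differently, and NaNs with different payloads do too.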

IEEE floating point is well defined and deterministic. On a conforming implementation, any specific computation has one specific result.

High level languages often do not require IEEE floating point, however the IEEE standard for floating point numbers does dominate actual computer architectures.

ARM with NEON does not fully comply with IEEE. It uses a faster floating point implementation with slightly different rules (32-bit NEON flushes denormals to zero, for example). As far as I know, this is the only modern outlier.

The inaccuracy of floating point numbers comes from the fact that they effectively encode every number as 1.x * 2^y, so the result of any particular operation may have to be rounded to the nearest representable value. And you rarely do only one operation on a number, so the error from a series of operations quickly grows large enough to make the equality operator useless for floating point numbers.
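
A tiny sketch of that accumulation, assuming IEEE doubles evaluated at double precision (repeated_sum is just an illustrative name): 0.1 has no exact 1.x * 2^y encoding, so each addition rounds, and ten of them do not land exactly on 1.0. 0.25 is a power of two, so the same loop is exact for it.

```cpp
#include <cassert>

// Adds `step` to zero `count` times. Every += rounds the running total
// to the nearest representable double.
double repeated_sum(double step, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += step;
    }
    return total;
}
```

On a conforming implementation, repeated_sum(0.1, 10) != 1.0 while repeated_sum(0.25, 4) == 1.0. The rounding is an error relative to real arithmetic, but it is the same error on every conforming machine.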

In fact the determinism of floating point calculations has a consequence: it means compilers cannot reorder floating point calculations as freely as they can integer calculations. This is doubly true when floating point variables can be non-finite.

[edit] I missed the ".NET" tag on this originally. My summary changes thusly: don't fucking bother. You can't even reliably control the machine control-word register in .NET code, which means you can't guarantee consistency of any significant nature. I'll leave the rest of this here for posterity; read it under the assumption that I'm talking about real systems languages like C, C++, Rust, etc.[/edit]


Floating-point determinism is such a tangled problem that even some of the best experts in the field disagree on what is possible. I'm sure you've already done your Google homework and found the dozens of archived conversations on this subject, so I'll just distill things down to my personal experience:

You can get determinism on modern hardware if you stick to the same CPU architectures and bitness. You also need to write code very, very carefully to achieve this result.

Changing between 32-bit and 64-bit is a recipe for disaster, as is swapping between x87 FPU instructions and SIMD instruction sets such as SSE. Not carefully guiding the compiler is going to introduce subtle and incredibly hard-to-pinpoint desyncs.
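
One cheap guard against the x87-versus-SSE problem is to check how the compiler evaluates intermediates. The sketch below is an assumption-laden illustration, not a complete solution: strict_float_build is a hypothetical name, and it relies on the standard FLT_EVAL_METHOD macro from <cfloat>, which your toolchain may or may not report accurately. A value of 0 means intermediates are evaluated in their declared type rather than in wider x87 registers.

```cpp
#include <cassert>
#include <cfloat>
#include <limits>

// True when the build looks suitable for bit-exact float work:
// IEEE 754 formats for float and double, and no extra-precision
// intermediates (FLT_EVAL_METHOD == 0).
constexpr bool strict_float_build() {
    return std::numeric_limits<float>::is_iec559 &&
           std::numeric_limits<double>::is_iec559 &&
           FLT_EVAL_METHOD == 0;
}
```

You could put this in a static_assert so that a build with the wrong settings fails early. It says nothing about fast-math style flags, which still have to be kept off by hand.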

You need to understand non-associative arithmetic. You need to understand a fair bit of numerical analysis to help isolate sources of error. You need to know a lot of assembly-level instructions for whatever processor architecture you're choosing, and you need to be very, very comfortable writing and debugging assembly language. You also need to take a proactive stance against desyncs, such as interleaving your program with "sync checks" that hash all existing state of the program and compare it across running instances to ensure they all hash to the same value. You need to be exceedingly careful with what external libraries and code you use, as poorly-designed code can easily destroy determinism. Last but not least, you need to make sure to avoid known implementation-dependent functions and operations, such as estimated reciprocal, reciprocal square root, all your trigonometric transcendental functions, and other things that IEEE does not precisely specify.
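
To make the arithmetic point concrete: floating-point addition is not associative, so the grouping of operations changes the result. A minimal sketch (sum_left and sum_right are illustrative names, and IEEE doubles are assumed):

```cpp
#include <cassert>

// Same three operands, two different groupings.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e20, b = -1e20, c = 1.0, sum_left gives 1.0 because the big terms cancel first, while sum_right gives 0.0 because the 1.0 is rounded away when it is added to -1e20. Any compiler flag or library that regroups your expressions can therefore change your bits.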

In short: it is possible, but good fucking luck :-)

If you have the luxury, I'd strongly recommend an arbitrary-precision library or fixed-point arithmetic. It's far easier to get determinism from those models.
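
As a rough idea of what the fixed-point route looks like, here is a minimal Q16.16 sketch. Fixed16, from_int, add and mul are all hypothetical names, and there is no overflow checking, so it assumes the caller keeps values in range. Everything is integer arithmetic, which is why it behaves identically across compilers and CPUs.

```cpp
#include <cassert>
#include <cstdint>

// Q16.16 fixed point: the represented value is raw / 65536.
struct Fixed16 {
    std::int32_t raw;
};

// Convert a small integer (must fit in 16 bits) to fixed point.
constexpr Fixed16 from_int(std::int32_t n) { return Fixed16{n * 65536}; }

constexpr Fixed16 add(Fixed16 a, Fixed16 b) { return Fixed16{a.raw + b.raw}; }

// Widen to 64 bits so the intermediate product cannot overflow, then
// divide out the extra 16 fractional bits (truncating toward zero).
constexpr Fixed16 mul(Fixed16 a, Fixed16 b) {
    return Fixed16{static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a.raw) * b.raw) / 65536)};
}
```

For example, mul(from_int(3), from_int(4)) has the same raw value as from_int(12), and multiplying Fixed16{32768} (one half) by itself gives raw 16384 (one quarter).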

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

.Net requires strict width restrictions on floating point mathematical operations. Or, in other words, it will always force a truncation of a floating point operation to the minimum bit size required to store it in the destination. For 64-bit .Net code, this really has no meaning, as it's all done using SIMD registers. For 32-bit .Net though, where you're dealing with the FPU, it means that most operations are accompanied by a store and load.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Yes, but .Net doesn't guarantee (AFAIK) how certain floating-point code is lowered to machine instructions, especially when JITting, so you can still get drift because of order-of-operations when those truncations occur.

We do binary hashing of assets (which includes lots and lots of floating point data) as part of our content build system, and we usually don't have any problems since it's restricted to x64 builds. The only problems we've had have been with debug builds emitting different code due to optimizations, which resulted in different results compared to the release version in some rare cases. We ended up just coming up with a simple test framework to test for differences, and then fixed the problem cases manually using SSE intrinsics. Of course this is all in C++, so I don't know how you could fix such problems in a .NET language.

The real problem we had was with compiler-inserted padding in structures. That happened all over the place, and fixing it was a nightmare.
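
A sketch of the padding problem and one possible way around it, not necessarily what was done here: Record and serialize are made-up names for illustration. On common ABIs the compiler puts three padding bytes after kind, and their contents are unspecified, so hashing the raw struct bytes can differ between runs even when every field matches. Copying the fields one by one into a buffer keeps the padding out of the hash input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical asset record. Typically sizeof(Record) is 8, not 5,
// because `scale` must be 4-byte aligned.
struct Record {
    char  kind;
    float scale;
};

// Copy only the named fields, in a fixed order, into a packed buffer.
// Bytes are in host order, so this assumes all builders share endianness.
std::vector<unsigned char> serialize(const Record& r) {
    std::vector<unsigned char> out(sizeof r.kind + sizeof r.scale);
    std::memcpy(out.data(), &r.kind, sizeof r.kind);
    std::memcpy(out.data() + sizeof r.kind, &r.scale, sizeof r.scale);
    return out;
}
```

Two records with equal fields then serialize to identical bytes no matter what garbage sits in their padding, and the packed buffer is what gets hashed.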

FWIW, I achieved deterministic floating point computations in .net between x86 PCs and the Xbox 360 (Power PC architecture). There were a few gotchas that were difficult to diagnose. But in the end, I was able to record player inputs for my physics-based game on the Xbox, and "replay" them perfectly on the PC. So it's possible.

There is a fairly lengthy and detailed article on the subject at http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ which is C/C++ based.

Here's a wonderful demonstration from there of one of the potential pitfalls:


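#include <stdio.h>
#include <xmmintrin.h>

// CPUID is for wimps:
// _mm_rcp_ps is only an approximate reciprocal, so some of its result
// bits differ between CPU vendors. (m128_u32 is an MSVC-specific member.)
__m128 input = { -997.0f };
input = _mm_rcp_ps(input);
int platform = (input.m128_u32[0] >> 16) & 0xf;
switch (platform)
{
   case 0x0: printf("Intel.\n"); break;
   case 0x7: printf("AMD Bulldozer.\n"); break;
   case 0x8: printf("AMD K8, Bobcat, Jaguar.\n"); break;
   default: printf("Dunno\n"); break;
}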

I suspect .NET would be even more tricky to make deterministic. For example the jitter might decide to generate different instructions based on what instructions your CPU supports, which may give slightly different results.
