Floating point consistency

7 comments, last by Rockoon1 17 years, 1 month ago
I'm under the impression that floating-point operations on different processors can yield slightly varying results. Is this true, or am I misguided? And how does this apply to C#?
You are right. Not only do different computers produce different results, but executing the same program, with the same values, on a single computer can yield different values based on when the kernel interrupts execution and flushes the FP registers.

The problem is that this comes from the differing hardware approaches to floating-point computation. One alternative is to emulate floating point yourself in a deterministic way. C# provides decimal, which has precise and portable rules for precision and rounding, but it comes at a performance cost.

Another possibility is to give up floating-point arithmetic and use fixed-point instead, if it suits your problem.
It is possible for two processors to produce floating-point results that differ in the lowest-order bits; however, most x86 processors use the same FPU design, so the chance of their results differing by a significant delta is extremely small.
As far as .NET goes, the CLR standard has this to say about floating point operations:
Quote:Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either R4 or R8, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented. An implicit widening conversion to the internal representation from float32 or float64 is performed when those types are loaded from storage. The internal representation is typically the native size for the hardware, or as required for efficient implementation of an operation. The internal representation shall have the following characteristics:

· The internal representation shall have precision and range greater than or equal to the nominal type.

· Conversions to and from the internal representation shall preserve value.

[Note: This implies that an implicit widening conversion from float32 (or float64) to the internal representation, followed by an explicit conversion from the internal representation to float32 (or float64), will result in a value that is identical to the original float32 (or float64) value. end note]

[Rationale: This design allows the CLI to choose a platform-specific high-performance representation for floating-point numbers until they are placed in storage locations. For example, it might be able to leave floating-point variables in hardware registers that provide more precision than a user has requested. At the same time, CIL generators can force operations to respect language-specific rules for representations through the use of conversion instructions. end rationale]

When a floating-point value whose internal representation has greater range and/or precision than its nominal type is put in a storage location, it is automatically coerced to the type of the storage location. This can involve a loss of precision or the creation of an out-of-range value (NaN, +infinity, or ‑infinity). However, the value might be retained in the internal representation for future use, if it is reloaded from the storage location without having been modified. It is the responsibility of the compiler to ensure that the retained value is still valid at the time of a subsequent load, taking into account the effects of aliasing and other execution threads (see memory model section). This freedom to carry extra precision is not permitted, however, following the execution of an explicit conversion (conv.r4 or conv.r8), at which time the internal representation must be exactly representable in the associated type.

[Note: To detect values that cannot be converted to a particular storage type, a conversion instruction (conv.r4, or conv.r8) can be used, followed by a check for a non-finite value using ckfinite. Underflow can be detected by converting to a particular storage type, comparing to zero before and after the conversion. end note]

[Note: The use of an internal representation that is wider than float32 or float64 can cause differences in computational results when a developer makes seemingly unrelated modifications to their code, the result of which can be that a value is spilled from the internal representation (e.g., in a register) to a location on the stack. end note]


You can also see some results of what the above means by checking out my last few entries on .NET performance in my journal.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

So I'm making an RTS which will have to run completely synchronously across computers, which means the same commands will have to produce the exact same results. So basically I can't use floating point for this, right? Because I'm planning to send a hash of the game state every once in a while to ensure that everyone is synced.
Quote:Original post by Axiverse
So I'm making an RTS which will have to run completely synchronously across computers, which means the same commands will have to produce the exact same results. So basically I can't use floating point for this, right? Because I'm planning to send a hash of the game state every once in a while to ensure that everyone is synced.


Yikes, I have the exact same problem right now. I'm also leaning towards using fixed-point math, just to be safe.
It's entirely possible to run in lockstep on multiple computers even with floating point. At least one major development studio that makes RTS games does things this way.

Perhaps you'd run into issues across entirely different CPU architectures (older G5-based Macs playing in the same game as x86 Windows machines), but as long as all your players are on one architecture/OS and you are careful to keep the floating-point settings the same across all the machines playing, it will work.

If I remember correctly, most x86 floating-point units work with 80-bit extended precision internally, so there would have to be a *very* significant error before the 32 bits you care about differed between machines.

j
It's well known in numerical analysis that perfectly innocuous floating-point operations can produce results with zero digits of precision. For example, a perfectly innocuous-looking matrix with all elements within epsilon of 1 can be virtually impossible to invert. Also, smart compiler optimizers making use of well-known mathematical identities such as associativity and distributivity will change the result in great or small ways, given the appropriate "worst case" inputs.

As a rule of thumb, never, ever use = to compare two floating-point numbers, and always prefer forgiving constructions in which "any other case" is the last case, rather than asserting that several computations will independently cover all the cases.

---visit my game site http://www.boardspace.net - free online strategy games

Quote:Original post by ddyer
As a rule of thumb, never, ever use = to compare two floating point numbers

And don't use '==' either [wink].

Admiral
Ring3 Circus - Diary of a programmer, journal of a hacker.
Quote:Original post by ToohrVyk
You are right. Not only do different computers produce different results, but executing the same program, with the same values, on a single computer can yield different values based on when the kernel interrupts execution and flushes the FP registers.


Are you sure? You are implying that a context switch stores 64-bit doubles instead of 80-bit extended doubles.

Since these switches are actually rare (fewer than one hundred times per second), there is no performance consideration that would lead me to believe the full registers are not stored.

I do not have any technical info to refute your claim, but I find it highly dubious nonetheless.

This topic is closed to new replies.