Floating point accuracy across computers?


I read the "1500 Archers" paper on Age of Empires, which describes a networking model for RTS games known as lockstep, where only the actions and mouse positions each player issues are sent over the network. Unlike the client-server model, this requires every client to be able to simulate the game in exactly the same way.
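To be concrete about what "only sending actions" means, here is roughly what I have in mind (the struct layout and field names are my own illustration, not from the paper):

#include <cstdint>
#include <vector>

// Hypothetical per-player command for a lockstep RTS: instead of unit
// positions, only the player's input is broadcast, stamped with the
// simulation tick on which every client must apply it.
struct Command
{
    uint32_t tick;     // simulation turn this command executes on
    uint8_t  playerId;
    uint8_t  action;   // e.g. MOVE, ATTACK, BUILD (game-defined enum)
    int32_t  targetX;  // world coordinates in integer units
    int32_t  targetY;
};

// Every client runs the same loop: apply all commands scheduled for this
// tick, then advance the simulation deterministically.
void advanceTick(uint32_t tick, const std::vector<Command>& scheduled)
{
    for (const Command& c : scheduled)
        if (c.tick == tick)
        {
            // dispatch c.action deterministically here
        }
}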

How viable is it to use floating point numbers for things like unit positions?

Do all computers (including different operating systems) compute floats the same way, so that every client gets the same floating point error after every operation? What are potential pitfalls to look out for if trying to synchronise floating point arithmetic over multiple computers?

Would it be wise to go down the floating point route, or do you think it would be safer to keep all critical variables as integers?

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

This page appears to be a great resource with lots of information collected from different sources (including GameDev.net): http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/

The gist appears to be that if you're very very very careful, yes, you can get a deterministic floating point simulation across multiple machines. Using the same compiler, instruction set, and runtime settings seems to be critical, though, so I suspect it would be a significant pain if you ever expected an instance of the game running on Linux or Mac to connect with one running on Windows, for example. (Depending on your build environment, of course; some setups, like using GCC, might be a bit less maddening for cross-platform consistency.)

Myself personally? I just decided to use a fixed point representation built on top of a 64-bit integer and avoid having to worry about the floating point mess. We'll see how that goes (if I ever get to the point of implementing multiplayer with a lockstep model like I envision), but I suspect that there would at least be far fewer surprises and subtle gotchas.
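For what it's worth, here is a minimal sketch of the kind of type I mean: a Q32.32 value stored in a signed 64-bit integer (the format split and the use of __int128 for the multiply are my own choices, not from any particular library):

#include <cstdint>

// Minimal Q32.32 fixed point: 32 integer bits, 32 fractional bits,
// stored in a signed 64-bit integer. All arithmetic is plain integer
// arithmetic, so results are bit-identical on every conforming platform.
struct Fixed64
{
    int64_t raw;

    static Fixed64 fromInt(int32_t i) { return { int64_t(i) << 32 }; }
    double toDouble() const { return double(raw) / 4294967296.0; } // display only

    Fixed64 operator+(Fixed64 o) const { return { raw + o.raw }; }
    Fixed64 operator-(Fixed64 o) const { return { raw - o.raw }; }

    // Multiply through a 128-bit intermediate so the product doesn't
    // overflow; __int128 is a GCC/Clang extension (MSVC needs _mul128).
    Fixed64 operator*(Fixed64 o) const
    {
        return { int64_t((__int128(raw) * o.raw) >> 32) };
    }
};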

"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke

I would do everything with integers. Read this if you want to know why: http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/

EDIT: Ninja'd.

It's possible, but it's a fucking nightmare. If you absolutely need high-precision determinism, go fixed point or similar, it'll save you a ton of pain.

If you're insane and VERY good at low-level debugging, PM me and we can swap horror stories about how to do it with floats.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

A 32-bit float gives you about six decimal digits of precision, and you can count on error accumulating erratically in the low-order bits. Those bits of accumulated error can spread quickly, as described in the article linked twice above.

Note that spanning processors isn't even necessary; you can get different results on the same computer.

You can have math operations in one location and exactly the same math operations somewhere else, run the code, and get different results. The compiler might treat them differently even when the programmer sees no difference. It might inline one of the functions, which means the variables don't get truncated and passed as parameters. Or optimizations might be done differently when the compiler generates code, perhaps spilling values from FPU registers out to memory and loading them back (truncating wider intermediates in the process), or register allocation might keep fewer values in registers in one instance, or other tiny little things might differ.

There are many good documents explaining floating point numbers. The article on Gaffer on Games is a good one. So is "What Every Computer Scientist Should Know About Floating-Point Arithmetic."



Always remember: Floating point is an approximation.

Do not rely on floating point when an approximation is unacceptable.

Do not rely on floating point when the error that accumulates in the approximation can grow to become significant.

Do not rely on floating point to give you an exact answer. Results are inherently inaccurate to within 1/2 ULP (unit in the last place), and that is the ideal case. In real-world cases they are frequently off by more than one ULP, and they lose accuracy quickly when you mix numbers of differing magnitude.

You can rely on floating point to drift, and for the last-bit error to accumulate in all subsequent operations.

You can rely on different results for the same values used in the same code. Even identical code that is compiled in different locations of a file can be optimized differently.

You can rely on functions that use floating point to be valid only within their limits. Don't use trig functions or other functions beyond their stated boundaries.

You can rely on floating point to propagate errors and unexpected answers, including NaN. If a function can possibly return NaN, INF, or other special results, handle them properly.
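A tiny demo of two of those points, magnitude mixing and NaN propagation (the printed results assume IEEE 754 arithmetic; verify on your own machine):

#include <cmath>
#include <cstdio>

int main()
{
    // Mixing magnitudes: the small addend falls entirely below the
    // ~6-7 significant decimal digits a 32-bit float can hold.
    float big = 1.0e8f;
    std::printf("%.1f\n", big + 1.0f - big);   // prints 0.0, not 1.0

    // 0.1 and 0.2 are not exactly representable in binary, so the
    // "obvious" identity fails in double precision.
    std::printf("%d\n", 0.1 + 0.2 == 0.3);     // prints 0

    // NaN propagates through subsequent operations and compares
    // unequal even to itself.
    float nan = std::sqrt(-1.0f);
    std::printf("%d\n", nan == nan);           // prints 0
    return 0;
}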


If you're insane and VERY good at low-level debugging, PM me and we can swap horror stories about how to do it with floats.

We had to do something like that on a non-game project I was on about 12 years ago. We realized the flaw early, then took the smart route and downloaded a software-based floating point implementation. No fancy FPU optimizations, no hardware floating point instructions, but also no mismatch between FPU register width and in-memory size, no automatic truncation in the middle of math formulas, no other processes switching FPU flags. We named the class "Real", for real numbers. To help ensure it didn't face any accidental optimizations, it was interface-only headers with the implementation details safely locked away in a separate library, with only integers in the interface.

It is also important to point out that both the IEEE floating-point standards and processor vendors like Intel are quick to point out that certain operations are not guaranteed to be exact. The IEEE FPU standards include an "inexact" flag, and trig operations like sine and cosine are well documented as having results that are within the numerical tolerances but not identical.
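That "inexact" flag is even queryable from standard C++ via <cfenv> (C++11), assuming the implementation supports the floating-point environment; a minimal sketch:

#include <cfenv>
#include <cstdio>

// Strictly, #pragma STDC FENV_ACCESS ON should be in effect; many
// compilers ignore it, which is itself a portability caveat.

int main()
{
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double x = 1.0, y = 3.0;  // volatile blocks constant folding
    volatile double q = x / y;         // 1/3 is not representable in binary
    if (std::fetestexcept(FE_INEXACT))
        std::printf("%.17g was rounded (inexact)\n", q);
    return 0;
}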

You are correct that relying on floats for anything beyond an approximation is insane. Like 'lock that person in a rubber room and a straitjacket' type of insane. I don't know if it is even theoretically possible to rely on floats being identical across multiple machines, since so many FPU operations take assorted undocumented shortcuts based on FPU state, but even if it is theoretically possible, it is something you should never ever do if you value your life.

A certain PC RTS title of years past tried this, including sending raw floats over the wire. They hit issues between Intel and AMD, and after sorting some of those out, between Debug and Release builds. They tried the usual compiler options and floating-point control word magic (which still needed resetting after every D3D call).
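(For reference, the "control word magic" on MSVC was typically _controlfp_s from <float.h>; a hedged, MSVC-specific sketch of the reset that had to be repeated, since D3D9 could silently knock the FPU down to single precision:)

#include <float.h>  // MSVC: _controlfp_s, _PC_24, _MCW_PC, _RC_NEAR, _MCW_RC

// Force the x87 FPU to 24-bit (single) precision and round-to-nearest so
// intermediates aren't computed at 80 bits. Anything that touches the FPU
// control word behind your back (e.g. a D3D9 device call) undoes this,
// hence re-applying it constantly.
void resetFpuState()
{
    unsigned int current;
    _controlfp_s(&current, _PC_24, _MCW_PC);   // precision control
    _controlfp_s(&current, _RC_NEAR, _MCW_RC); // rounding control
}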

We got to port it to Linux, and tried very hard to keep it netplay compatible. All of the above applied, plus the fun of Visual Studio vs GCC when it came to floating-point codegen behavior. Rounding everything to ~3 decimal places mostly dealt with it, but not all of it. In particular, the AI code had some float comparisons lying around, on data that was never sent over the wire, that could change the number of calls to the RNG, and that *was* state that was tracked closely.

I managed to come up with a method that definitively solved the compiler issues: eyeball the VC output assembly, then reimplement the function on the GCC side by translating the VC floating point code to AT&T syntax, pasting it in, and adding some shim code around it to fix up differences in the calling convention. This is not how one should define C++ class methods, but such was life.

It even worked, and solved it definitively for that case. The next case that came up was the same sort of thing, two steps higher on the callstack. At that point I gave up, because we did not have the time to rewrite the entire AI system in assembly, as that was clearly going to be the end result.

This way lies madness. Stick to fixed point for anything that actually matters to the game simulation. You should probably also make sure your system is set up to be able to detect synchronization loss as immediately as possible, and even better, have a mechanism for resynchronizing. Otherwise you're in for debugging issues that only happen in 5+ player games, after 2 hours, with the bulk of the useful data being gigs upon gigs of value logs and callstack traces.
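A minimal sketch of that kind of desync detection: hash the authoritative simulation state every tick and compare hashes across clients (the FNV-1a hash and the fields hashed here are illustrative assumptions):

#include <cstdint>
#include <vector>

// Illustrative unit state; in practice, hash every field that feeds the
// simulation, and nothing that doesn't (e.g. no render-only state).
struct Unit { int32_t x, y, hp; };

// FNV-1a over the raw state. Any integer-only hash works; the point is
// that every client computes it over identical bytes.
uint64_t stateChecksum(const std::vector<Unit>& units)
{
    uint64_t h = 0xcbf29ce484222325ull;
    auto mix = [&](uint32_t v) {
        for (int i = 0; i < 4; ++i) {
            h ^= (v >> (i * 8)) & 0xff;
            h *= 0x100000001b3ull;
        }
    };
    for (const Unit& u : units) {
        mix(uint32_t(u.x));
        mix(uint32_t(u.y));
        mix(uint32_t(u.hp));
    }
    return h;
}

// Each tick, every client broadcasts (tick, stateChecksum(...)); the first
// tick on which the hashes diverge is where the desync happened.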

Thanks for the great information thus far!

@ApochPiQ - If I ever knock on your PM front door with a wall of text on how awful floats are, you will know I have failed, and you can safely assume I was weeping in a fetal position in the dustiest corner of my room as I typed it.

@Frob - That "Real" class you talked about, is that publicly available?

From what I gather, the safest approach would be to use a fixed point implementation for all critical data. I would have done this with two integers representing a "value" and a "factor". I suppose converting to and from floating point is safe so long as the floating point error stays insignificant. Is something like the following a good approach?


#include <iostream>

template <class I, class F>
class Real
{
public:

    inline Real(const I factor) : m_Value(0), m_Factor(factor)
    {
    }
    inline Real(const I factor, const F value) : m_Value(static_cast<I>(factor * value)), m_Factor(factor)
    {
    }
    inline Real(const I factor, const I value) : m_Value(factor * value), m_Factor(factor)
    {
    }

    // assignment -- m_Factor is const, so instead of copying the factor,
    // rescale the incoming value into this object's factor
    inline Real& operator=(const Real& other)
    {
        m_Value = other.m_Value * m_Factor / other.m_Factor; // intermediate can overflow I
        return *this;
    }
    inline Real& operator=(const F f)
    {
        m_Value = static_cast<I>(m_Factor * f);
        return *this;
    }
    inline Real& operator=(const I i)
    {
        m_Value = m_Factor * i;
        return *this;
    }

    // conversion back to floating point (display only -- don't feed the
    // result back into the simulation)
    inline operator F() const
    {
        return static_cast<F>(m_Value) / m_Factor;
    }

    // binary arithmetic
    inline Real& operator+=(const Real& rhs)
    {
        m_Value += rhs.m_Value * m_Factor / rhs.m_Factor; // intermediate can overflow I
        return *this;
    }
    inline Real& operator+=(const F f)
    {
        m_Value += static_cast<I>(f * m_Factor);
        return *this;
    }
    inline Real& operator+=(const I i)
    {
        m_Value += m_Factor * i;
        return *this;
    }
    inline friend Real operator+(Real lhs, const Real& rhs)
    {
        lhs += rhs;
        return lhs;
    }
    inline friend Real operator+(Real lhs, const F rhs)
    {
        lhs += rhs;
        return lhs;
    }
    inline friend Real operator+(Real lhs, const I rhs)
    {
        lhs += rhs;
        return lhs;
    }

    // ... got lazy

    friend std::ostream& operator<<(std::ostream& os, const Real& r)
    {
        os << "Real(" << static_cast<F>(r) << ")";
        return os;
    }
private:

    I m_Value;
    const I m_Factor;
};

Example usage:


#include <fpp.hxx>

#define REAL Real<unsigned long, float>

int main()
{
    REAL test1(1024, 6.0f); // two numbers with different fixed-point factors
    REAL test2(2048, 6.5f);

    test1 += test2;  // test1 = 12.5
    test1 += 7.7f;    // test1 = 20.2

    std::cout << test1 << std::endl;
    return 0;
}

Lots of overloads for the operators, though...

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

Never mind, I found a nice looking fixed-point math library: http://www.codeproject.com/Articles/37636/Fixed-Point-Class

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty
