Sign in to follow this  
JL4453

Deterministic simulation using floats?

Recommended Posts

Is this feasible/possible? I read in the gamesutra article "1500 archers" (googlable) that the way Age of Empires did multiplayer is to have every player simulate the game separately & just transmit player orders/commands to the other clients. No synchronization at all so far as I can tell. This relies 100% on a completely deterministic simulation, with very strict regulation of calls to random number generator etc. But I have also read that floating point numbers are represented differently on different architectures, with hidden precision bits & stuff. Is it possible to get a completely deterministic sim with floats? Or is there some other way to do this? Thanks for your advice.

Share this post


Link to post
Share on other sites
If you need to support multiple architectures then you're pretty much screwed. You might try a software floating point implementation, and some systems have nice language support for this kind of thing (e.g. Java's strictfp modifier) but the performance will still be atrocious on many platforms.

Share this post


Link to post
Share on other sites
Tecnically a computer is a deterministic state machine.
Isn't there an IEEE standard for float?

Besides that, the internal representation of the value should not affect the way it behaves from external viewpoint.

Like two different objects performing a same task. Both have same accessories but both are completely differently implemented.

I would put my uneducated guess on yes.


Edit: Java of course would offer emulated solution. Are we by the way talking about real implementations, or is this just a theoretical discussion?

Share this post


Link to post
Share on other sites
Actually, most platforms are compliant with IEEE 754, which is the standard for floating point representations. So regardless of what you're running on, if it does IEEE 754, you know the layout of your floats and the available precision, what rules are being followed to do the math, etc.

Despite all that, you may get inconsistent results due to mismatched rounding modes, slight implementation differences, etc.

If you're on weird embedded devices, they may not have IEEE float support, which will be a problem.

In general, though, what you need to do is add a mechanism to detect inconsistencies and resynchronizes with the server or other players. If you're doing an RTS then minor floating point imprecisions aren't going to be a big deal, since most game logic is pretty granular. If you design your game logic to avoid finicky floating point calculations then you should be able to get 100% consistent results. Contrariwise, if you do weird things like update arrow positions per-frame in tiny increments, then you'll get tons of indeterminate results.

You don't need to write your own math routines to solve this problem. Just write smart, robust code and don't push the limits of the system's floating point implementation.

Share this post


Link to post
Share on other sites
Unfortunatly Stoo, that just isn't the case. The internal precision of the float determines the outcome of the answer that gets
rounded into your 32bit value. This means that error accumulates into your floatingpoint calculations faster on some machines than others.
Differing operations (add, multiply etc) will all cause rounding errors to show up at different rates on different machines.
All in all this leads to your simulation quickly diverging between different computers.
Mostly this means you need to switch to fixpoint math for any calculation that the simulation relies on.
(there was a thread just recently though that mentioned some assembly op you could put in to force the floatingpoint unit into
some compatability mode, but I never looked into what that really does)

Share this post


Link to post
Share on other sites
There is synchronization:

Each order is delayed by some amount. If the order doesn't get to every client in that delay period, maintaining the simulation properly is very difficult, and often you get things like "out of sync". Similar things happen if people cheat in certain ways.

Note that this approach has other advantages: replays are a matter of recording user input.

This also lets you record a game session that (say) causes an infinite loop in your game logic, which makes many things far easier to track down during development!

Share this post


Link to post
Share on other sites
I was afraid of that... I might just see if I can rely on fixed point arithmetic using integers. I don't want to rely on resynchronization because first of all it would be a lot of data & have already worked into my design a way to do a deterministic sim, & I want to be able to rely on checksums in the deterministic sims to detect hacking/cheating.

I found a library called STREFLOP which claims to standardize the rounding/representation of floating point formats in c++, but at first pass it seems overly complicated for what I'm after (seems more aimed at scientific experiments which need pinpoint reproducible accuracy) & I might just rather use fixed point... I don't need accuracy just reproducibility :)

Thanks for the commentary guys.

Share this post


Link to post
Share on other sites
I'd put my money on IEEE 754. Most platforms follow this standard, so you'll have optimal speed on them. You can worry about emulation when you actually need to port your game to obscure platform XY, but I doubt that need will ever arise.

Still, I've always wondered how exactly these games manage to keep the states in sync.

Of course, you need to use fixed time frames (eg. always advance the game state in 1/100th of a second steps) so that the rounding errors are the same for all players. Otherwise, due to different frame deltas between fast and slow PCs, the game states would inevitably drift apart.

But how do you ensure a command is executed in the same time frame on all connected players? Does the peer issuing a command ask everyone "I'd like to send this tank there in time frame 142345" (where 142345 is calculated as now + network latency * 2 or something) and then everyone queues up the command for execution at exactly that time frame?

Share this post


Link to post
Share on other sites
Even if two PCs are running on the exact same hardware, and are executing the exact same simulation, you can still get different floating point results from them.

Floats can be represented differently in CPU registers than in RAM, which means that errors may be introduced when moving between these two types of memory.

Initially, you wouldn't think that this is a problem, as both computers are going to be moving between CPU and RAM at the same times right?
However, factors outside of your control may affect when data is copied from CPU and RAM. For example, a virus-scanner might be running on the computer, causing it to perform CPU context-switches more often, increasing the chances of CPU registers having to be copied to RAM, increasing the chances that you might lose a few bits of accuracy in one of your variables....

For example, the value of z in this following snippet depends on how much work happens at the ... section! If neither x or y have been swapped out of the CPU yet, z will likely be true, but if only one was swapped out to RAM and back then z will likely be false.
float x = 100.0f / 9.0f;
float y = x;
...
bool z = (y==x);



Despite all of this, I've heard that RTS games have shipped before using floats for unit positions with absolutely no code written to take care of floating-point errors, or to correct "lost synch" errors. Apparently in some games the errors are too small to actually impact the simulation. YMMV.

Share this post


Link to post
Share on other sites
Quote:
Original post by Hodgman
Even if two PCs are running on the exact same hardware, and are executing the exact same simulation, you can still get different floating point results from them.

Floats can be represented differently in CPU registers than in RAM, which means that errors may be introduced when moving between these two types of memory.

That sounds highly unlikely, and I can't replicate your scenario. Got a source? It doesn't matter how many times you copy a floating point number; as long as no operations are made to actually change the value of the number, it is not going to change. Every floating point number has one and exactly one representation in computer memory - one and only one sequence of bits. That exact sequence can be copied back and forth as much as you want but as long as no extraordinary error occurs, it is going to stay unchanged.

The problem with floating point precision is not that values magically change when you copy them, it's that not all numbers are possible to represent with floating point data. Sometimes (often, in fact) the number you want will fall somewhere between two numbers that are possible to represent, and rounding will occur. That's the problem, and that doesn't get affected in the least by how often you copy the data.

It also shouldn't be a problem as long as the platforms conform to the same standard as mentioned. In practice, however, there are often control variables introduced for speed, flexibility or whatever that can make the environment violate the standard. Such variables are usually documented, but you have to know what to look for to disable them. Once any such special circumstances have been eliminated, each connected system will give identical result to every floating point operation.

Share this post


Link to post
Share on other sites
Quote:
Original post by Cygon
I'd put my money on IEEE 754. Most platforms follow this standard, so you'll have optimal speed on them. You can worry about emulation when you actually need to port your game to obscure platform XY, but I doubt that need will ever arise.
It doesn't take obscure hardware. The classic x87 FPU is one such oddball platform as it uses an 80-bit floating point format internally, and few other platforms do as well such as the Itanium. So for code to be consistent the result of every operations has to be spilled to memory to truncate the result (or SSE/3DNow! needs to be used throughout). The compiler will also have to disable most floating-point optimizations, such as reordering associative expressions, replacing constant division by reciprocal multiplication or otherwise exploiting algebraic identities. Now as far as I can tell the "/fp:strict" option in recent MSVC versions ought to handle this, though I have to admit that the documentation is still somewhat unclear on whether you can expect full consistency across platforms.
In addition to needing working support for strict math on all compilers involved (not something I would expect of immature console tools) you'll have to pray that there aren't any bugs in the implementations (such as the Pentium FDIV bug, or Pentium II/Pro FIST bug), and that they've actually set out to be IEEE compliant in the first place and not save cycles and silicon with non-standard flush-to-zero behavior and such.
Furthermore you most definitely can't expect any support math functions beyond the core operations to be consistent so you'll have to provide your own methods for everything from pow() to formatted IO.
Emulators almost certainly won't be able to play the game either but whether that's a good thing or a bad thing is debatable. I suppose the most likely negative scenario is a attempting to play on backwards-compatible consoles or some retro gamer in ten years from now when we've all moved on to fancy new 1024-bit über-processors.

Developing and debugging a deterministic simulation is hard enough without having to worry about floating-point consistency. The last thing you want is to have to worry about some unforeseen issue cropping up towards the end of development. So do yourself a favor and stick with fixed-point arithmetic for the game logic, or at least don't expect networking to work between different platforms.

By the way has anyone successfully managed to use floats in a multi-platform RTS?

Share this post


Link to post
Share on other sites
Would Fixed Point integer maths should be OK? Say 32Bit for integers and 16bit for fractions?

Is there a GOOD (i.e. not the first one on Google) fixed point maths library?

There also must be a fast way to convert back to a float/double?!?

Share this post


Link to post
Share on other sites
Quote:
Original post by implicit
Developing and debugging a deterministic simulation is hard enough without having to worry about floating-point consistency. The last thing you want is to have to worry about some unforeseen issue cropping up towards the end of development. So do yourself a favor and stick with fixed-point arithmetic for the game logic.

QFE. Fixed-point isn't too tricky once you've got the idea behind it and a handful of functions to do the dirty work, and you'll save yourself a whole load of headaches.

Share this post


Link to post
Share on other sites
Quote:
Original post by ROBERTREAD1
Would Fixed Point integer maths should be OK? Say 32Bit for integers and 16bit for fractions?

Is there a GOOD (i.e. not the first one on Google) fixed point maths library?

There also must be a fast way to convert back to a float/double?!?


Converting back to a double/float is easy.


struct fixed_16_16 {
int raw;
// operator += and -= are trivial

fixed_16_16& operator*=( fixed_16_16 const& other ) {
__int64 raw_ = raw;
raw_ *= other;
raw_ >> 16;
raw = raw_;
return *this;
}

fixed_16_16& operator/=( fixed_16_16 const& other ) {
__int64 a,b;
a = raw;
b = other.raw;
a << 32;
a /= b;
a >> 16;
raw = (int)a;
return *this;
}

fixed_16_16(int whole):raw(whole<<16){};
explicit fixed_16_16(double value); // hmm, to do
// copy constructors, operator swap, etc

static fixed_16_16 fractional_part(int fraction) {
fixed_16_16 retval;
retval.raw = fraction;
return retval;
}
int whole_part() const;
int round_up() const;
int round_down() const;
double approx_value() const;
};
// non-member operator* and operator/ in terms of *= and /= go here



Do you have room for a 32/32 bit fixed point, or is 16/16 enough? :)

Share this post


Link to post
Share on other sites
Quote:
Original post by Hodgman
Floats can be represented differently in CPU registers than in RAM, which means that errors may be introduced when moving between these two types of memory.

Say what? I get the feeling that you think the scheduler uses FLD and FST on a context switch to save/restore state. It does not. It uses FSAVE/FRSTOR, which preserve the full internal precision.

Share this post


Link to post
Share on other sites
Quote:
Original post by Hnefi
Quote:
Original post by Hodgman
Floats can be represented differently in CPU registers than in RAM....
That sounds highly unlikely ... Every floating point number has one and exactly one representation in computer memory - one and only one sequence of bits.
It's fairly common for them to be 32/64-bit in memory and 80-bit in the CPU, which to me implies that at from 16 to 48 bits of precision will be lost when copying from registers to RAM:
Quote:
Original post by implicit
The classic x87 FPU is one such oddball platform as it uses an 80-bit floating point format internally, and few other platforms do as well such as the Itanium.


Quote:
Original post by Sneftel
I get the feeling that you think the scheduler uses FLD and FST on a context switch to save/restore state. It does not. It uses FSAVE/FRSTOR, which preserve the full internal precision.
Thanks for the correction. So the bitwise representation (i.e. all 80 bits) *will* be preserved during a context switch?

But, it's still possible for the CPU to use a different internal representation than what will be stored in RAM? Meaning that the value of the boolean value z in my previous code snippet still depends on how much code is inserted at "..."? (i.e. if there's just enough code to force the compiler to move one (and only one) of the variables back to RAM temporarily then it may be false)

Share this post


Link to post
Share on other sites
Yes. That means that on subsequent runs of your program on identical hardware, the results should always be the same.

But if you recompile your code (eg. with slightly different optimisation settings), then the results before recompilation could be different depending on how the compiler issued/ordered instructions.

Share this post


Link to post
Share on other sites
Quote:
Original post by HodgmanIt's fairly common for them to be 32/64-bit in memory and 80-bit in the CPU, which to me implies that at from 16 to 48 bits of precision will be lost when copying from registers to RAM:
Quote:
Original post by implicit
The classic x87 FPU is one such oddball platform as it uses an 80-bit floating point format internally, and few other platforms do as well such as the Itanium.


Quote:
Original post by Sneftel
I get the feeling that you think the scheduler uses FLD and FST on a context switch to save/restore state. It does not. It uses FSAVE/FRSTOR, which preserve the full internal precision.
Thanks for the correction. So the bitwise representation (i.e. all 80 bits) *will* be preserved during a context switch?

But, it's still possible for the CPU to use a different internal representation than what will be stored in RAM? Meaning that the value of the boolean value z in my previous code snippet still depends on how much code is inserted at "..."? (i.e. if there's just enough code to force the compiler to move one (and only one) of the variables back to RAM temporarily then it may be false)

C/C++ is not assembler. Unless your compiler says otherwise, a float is 32 bits and a double 64 bits. The specialized 80-bit FPU registers on some platforms probably deal with such variables by padding or similar techniques. To make use of 80 bit floats, you need to specify that explicitly - typically by assigning a long double variable. At least, that's my understanding.

According to Wikipedia, the 82 bit registers on the Itanium are also used to "preserve precision for intermediate results". I'd guess that this means that the extra 2 bits are used to reduce rounding errors when performing multiple calculations in sequence, but this should definitely not have any effect when simply copying a number.

Share this post


Link to post
Share on other sites
Quote:
Original post by Hnefi
C/C++ is not assembler. Unless your compiler says otherwise, a float is 32 bits and a double 64 bits. The specialized 80-bit FPU registers on some platforms probably deal with such variables by padding or similar techniques. To make use of 80 bit floats, you need to specify that explicitly - typically by assigning a long double variable. At least, that's my understanding.

According to Wikipedia, the 82 bit registers on the Itanium are also used to "preserve precision for intermediate results". I'd guess that this means that the extra 2 bits are used to reduce rounding errors when performing multiple calculations in sequence, but this should definitely not have any effect when simply copying a number.

Yes, my compiler says a float is 32/64bits... in RAM.
The compiler tells the CPU to load a 32/64bit float from RAM ... then once the CPU fetches that data *the CPU* can do whatever it wants, such as converting the 32/64bit value an intermediate 82bit representation. The compiler may be completely oblivious that this is happening.

A padded 32-bit value won't work in an 82-bit register, so I assume 32bit values are either always converted to higher-precision, or that the CPU must also have a bunch of 32-bit registers lying around as well as the new 82-bit ones.

They way that I interpret "The floating point registers are [>64] bits long to preserve precision for intermediate results" is that the internal representation is more precise than the in-memory representation. Hence moving data from intermediate storage (registers) back to RAM will cause a loss of accuracy.

Share this post


Link to post
Share on other sites
It depends on how the FPU is implemented. Since it must be able to differentiate between 32 and 64 bit floating point representations anyway - which means it must be able to deal with differently sized mantissas and exponents without affecting the accuracy of either representation - it is not unreasonable to expect the processor to only use the full 80/82 bits when given a long double (or equivalent) variable. Whether the Itanium actually behaves this way, I don't know. I don't have one available at the moment.

Share this post


Link to post
Share on other sites
It seems that whenever the Itanium fetches a float/double, it will always convert it to the internal 82-bit format. However, only 1/2 of the FPUs actually work with all 82-bits, the other FPUs only use 32 out of the 82-bits - I assume the compiler gets to choose which FPU-type to use when it's writing the assembly?

It definitely doesn't sound as scary as I was making it out to be, but still, in theory moving a float from the CPU to RAM (and even vica versa) can change it's value bitwise representation, which could end up changing it's value.

Share this post


Link to post
Share on other sites
Quote:
Original post by Hodgman
It definitely doesn't sound as scary as I was making it out to be, but still, in theory moving a float from the CPU to RAM (and even vica versa) can change it's bitwise representation, which could end up changing it's value.
That's correct. Now, the dominant calling conventions push floating point numbers on the (execution) stack, rather than leaving them in the FPU, and the FPU register file is pretty roomy compared to x86, so this will often happen at easy-to-anticipate times. However, the bottom line (as sc4freak mentioned) is that you can't expect consistency across builds... both because of this, and because for best efficiency you want to allow the compiler to mess with your floating-point algebra. (Thanks to the lack of control flow and the large amount of past research, compilers are really, really good in this area... but changing (a*b)+(a*c) to a*(b+c) can produce different results, even without wacky extra internal precision.)

Now, RTS games have the advantage of patches being seen as mandatory. If you have 1.08 installed, you probably can't play with someone with 1.07 installed.... but that guy wouldn't be allowed online anyway, because of the "imploding narwhal" exploit in 1.07. So it works out pretty well.

Share this post


Link to post
Share on other sites
It's basically impossible to guarantee reproducible results
when using floats to influence any control path. Even though
the floats themselves have deterministic behavior, the overall
program does not. An easy way to see this is if you cache any
results, or sort any arrays, then the next time through the "same"
code you'll take a different path, and if any floating point operations
were skipped or added, you'll upset the deterministic floating point
apple cart. Remember that floating point numbers are not associative
or distributive either.

The same issue arises in alpha-beta, if evaluators use floating point
numbers.

Share this post


Link to post
Share on other sites
Quote:
Original post by ddyer
It's basically impossible to guarantee reproducible results
when using floats to influence any control path. Even though
the floats themselves have deterministic behavior, the overall
program does not. An easy way to see this is if you cache any
results, or sort any arrays, then the next time through the "same"
code you'll take a different path, and if any floating point operations
were skipped or added, you'll upset the deterministic floating point
apple cart.

The "basically impossible" does not follow. The solution to such caching problems is to not use such caching systems. Also, can you give an example of what you're talking about WRT alpha-beta pruning? Floating point operations may not be associative, but they are certainly strictly ordered.

Share this post


Link to post
Share on other sites
By "basically impossible" I mean impossible to verify or guarantee.
Results from simulations are hard enough to trust without adding
the additional qualifier "and assuming my floating point logic is
flawless".

In alpha-beta searching, it's invaluable to be able to re-run
the same search to investigate a suspected bug in some heuristic.
You won't be able to do so if there are floats in your evaluator.
Evaluators are especially likely to have caches or sorts that
induce floating point divergence.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this