NathanRidley

Procedurally-regenerating identical content on different machines (floating point determinism)


I've been designing the architecture for a distributed procedural content simulation engine for some time now, and it has recently become apparent that floating point determinism is going to be a thorn in my side in enabling distributed setups. For those of you with experience in (or reasonable knowledge of) floating point determinism issues, I'd really like to know how you'd approach this scenario:

 

Imagine you have two machines. The machines can be reasonably guaranteed to use modern, common hardware and operating systems (PowerPCs, IBM mainframes and other gotchas are not relevant here). In other words, both machines will either be modern commodity x64 architectures on a server rack, or the relatively modern gaming machines of two friends sharing the workload I'll describe. The operating systems can be constrained to the standard three (Windows/OSX/Linux).

 

Anyway, imagine we have two machines, and one of them is going to procedurally generate some content based on known inputs. Let's say the content is a simple Perlin noise-based terrain. Taking this a step further, let's say we procedurally generate a set of physical objects (for brevity, perhaps they're simple wooden cubes of varying sizes and weights). We then use physics algorithms to scatter them around the terrain and allow them to settle based on physical characteristics such as gravity and the shape of the terrain.
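For concreteness, this is the sort of "known inputs" scheme I have in mind (a minimal sketch, not final code; the function name is a placeholder): every per-cell value is derived from a world seed and integer grid coordinates using only integer operations, so the result is bit-identical on any conforming machine. The mixing steps are the splitmix64 finalizer.

```cpp
#include <cstdint>

// Derive a reproducible per-cell value from a world seed and integer grid
// coordinates using only integer operations; bit-identical everywhere.
// Mixing constants/shifts are the splitmix64 finalizer (public domain).
uint64_t cellHash(uint64_t worldSeed, int32_t x, int32_t y)
{
    uint64_t h = worldSeed
               ^ (static_cast<uint64_t>(static_cast<uint32_t>(x)) << 32)
               ^  static_cast<uint64_t>(static_cast<uint32_t>(y));
    h ^= h >> 30; h *= 0xBF58476D1CE4E5B9ULL;
    h ^= h >> 27; h *= 0x94D049BB133111EBULL;
    h ^= h >> 31;
    return h;
}
```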

 

Here's the catch:

  1. We don't know in advance which of the two machines will do the work.
  2. The machine that generates the content will not be able to guarantee that the other machine is available at the time that the work needs to be performed.
  3. The other machine will, later on, have to generate the exact same outputs under the same conditions.
  4. We don't want to store the overall result as there will likely be too much data.
  5. It'd be nice to be able to offload some of the compute work (where relevant and reasonable) to the GPU (whatever is available on the machine in question).

Some places that I am willing to compromise:

  1. Reducing floating point precision
  2. Storing a limited subset of source data, as long as bulkier derivative data can be regenerated as needed
  3. Using large integers instead of floats
  4. Using fixed point calculations (see the sketch after this list)
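To make compromise 4 concrete, this is the sort of thing I have in mind (a minimal 16.16 sketch; the struct layout and names are just placeholders): all arithmetic stays in integers, so the results should be bit-identical across machines.

```cpp
#include <cstdint>

// Minimal 16.16 fixed-point number: 16 integer bits, 16 fractional bits.
// Every operation is plain integer arithmetic, hence fully deterministic.
struct Fixed {
    int32_t raw; // stored as value * 65536

    static Fixed fromInt(int32_t v) { return { v * 65536 }; }

    Fixed operator+(Fixed o) const { return { raw + o.raw }; }
    Fixed operator-(Fixed o) const { return { raw - o.raw }; }
    Fixed operator*(Fixed o) const {
        // Widen to 64 bits so the intermediate product cannot overflow.
        return { static_cast<int32_t>((static_cast<int64_t>(raw) * o.raw) >> 16) };
    }
    Fixed operator/(Fixed o) const {
        return { static_cast<int32_t>((static_cast<int64_t>(raw) << 16) / o.raw) };
    }
};
```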

Some additional questions:

  1. Is it possible to write integer-only versions of the popular coherent noise algorithms (Perlin/simplex/Brownian/etc.)? (See the sketch after this list.)
  2. Can I get away with forcing IEEE754 floating point accuracy or will that compromise speed too much?
  3. Are 64-bit integers a thing on GPUs yet, or likely to become a thing any time soon?
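Regarding question 1, this is roughly the integer-only construction I'm imagining (a 1D sketch with hypothetical names, just to show the shape of it): an integer lattice hash plus a fixed-point smoothstep fade gives value noise, and the gradient (Perlin-style) variant would follow the same pattern.

```cpp
#include <cstdint>

// Hypothetical integer-only 1D value noise using 16.16 fixed point.
static uint32_t hash32(uint32_t x, uint32_t seed)
{
    uint32_t h = x * 0x9E3779B1u ^ seed;
    h ^= h >> 15; h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    return h;
}

// pos is a 16.16 fixed-point coordinate; returns noise in [-0.5, 0.5) as 16.16.
int32_t valueNoise1D(int32_t pos, uint32_t seed)
{
    int32_t cell = pos >> 16;    // integer lattice cell
    int32_t t    = pos & 0xFFFF; // fractional part, 0..65535

    // Lattice values mapped into [-0.5, 0.5) in 16.16 representation.
    int32_t a = static_cast<int32_t>(hash32(static_cast<uint32_t>(cell),     seed) >> 16) - 32768;
    int32_t b = static_cast<int32_t>(hash32(static_cast<uint32_t>(cell + 1), seed) >> 16) - 32768;

    // Smoothstep fade 3t^2 - 2t^3, computed in 64 bits to avoid overflow.
    int64_t t64  = t;
    int64_t fade = (3 * t64 * t64 - ((2 * t64 * t64 * t64) >> 16)) >> 16;

    // Linear interpolation between the two lattice values.
    return static_cast<int32_t>(a + (((b - a) * fade) >> 16));
}
```

Extending this to 2D/3D would mean hashing each lattice corner and interpolating along each axis with the same fade, all still in integer math.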

As an addendum to this, I'm considering the possibility that this problem doesn't really have a good solution at this point in time, and that perhaps I need to be willing to simply store a lot of data and ensure that it is available when a machine needs to generate new content.


Thanks for the reply. What you've said sounds pretty much like the problem I've been anticipating, particularly given that my design aims to facilitate a significant number of community-driven add-ons.

 



> The safest route is to use a software-based numerics library that guarantees deterministic results and does not rely on CPU-implemented functionality. Such libraries can potentially even be faster if the optimizer is allowed to work on them, whereas using the processor's floating point requires many pessimizations and constant validation that the system is still in the proper state.

 

Any you'd recommend?


Sometimes you can get away with dropping the requirement for bit-for-bit identical results in the later stages of generation, as long as the early stages are perfectly identical.  Depending on the particular kind of procedural generation you're doing, this can possibly make things much easier.  Perhaps most of the numerical instability exists during the early/mid stages of generation, and so you can deal with that by using fixed point math, while a bunch of the heavy number crunching happens during the later stages, so you can switch to floating point and perhaps offload some of the work to the GPU.  Even if different GPUs produce different results, if the late-stage algorithms are numerically stable, they might still be close enough to work.
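As a sketch of what that split might look like (everything here is hypothetical, just to illustrate the boundary): the canonical stage uses only integer math and must match bit-for-bit, while the presentation stage converts to float, where small divergence is harmless.

```cpp
#include <cstdint>
#include <vector>

// Canonical stage: integer-only, so every machine reproduces it exactly.
// The constants are the standard Knuth/MMIX LCG parameters.
std::vector<int32_t> generateCanonicalHeights(uint64_t seed, size_t count)
{
    std::vector<int32_t> heights(count);
    uint64_t state = seed;
    for (size_t i = 0; i < count; ++i) {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        heights[i] = static_cast<int32_t>(state >> 40); // deterministic integer height
    }
    return heights;
}

// Presentation stage: float is fine here, since the result is cosmetic
// and need not be bit-identical across machines or GPUs.
std::vector<float> toRenderHeights(const std::vector<int32_t>& canonical)
{
    std::vector<float> out(canonical.size());
    for (size_t i = 0; i < canonical.size(); ++i)
        out[i] = canonical[i] * (1.0f / 256.0f);
    return out;
}
```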


This is something to consider. I have taken some time to peek through Unreal's code to see what they did on a number of things. There is an item that may very well be deprecated by now, but the comments described what it was used for.

The guys over at Epic actually created custom floating point classes for both 16 and 32 bits. Their reasoning, noted in the comments, was that floating point results were not always the same across systems.

StarCraft did the same, as mentioned in one of their GDC talks about AI. They use a mix of floating point arithmetic and ints to do the computation.



Thanks for the replies.

 


> STREFLOP is worth a look.
>
> Though I don't see any particular barrier to designing new noise functions using integer math, and it's certainly possible to write them with a fixed-point math library.

 

Cheers, I'll take a look at that, and will investigate creating modified noise algorithms.

 


> Have you considered using fixed point numbers rather than floating point?

 

Yep. I don't have a sense of how that would affect performance though. I think I'm going to have to simply experiment with this and come to my own conclusions.

 


> Sometimes you can get away with dropping the requirement for bit-for-bit identical results in the later stages of generation, as long as the early stages are perfectly identical. Depending on the particular kind of procedural generation you're doing, this can possibly make things much easier. Perhaps most of the numerical instability exists during the early/mid stages of generation, and so you can deal with that by using fixed point math, while a bunch of the heavy number crunching happens during the later stages, so you can switch to floating point and perhaps offload some of the work to the GPU. Even if different GPUs produce different results, if the late-stage algorithms are numerically stable, they might still be close enough to work.

 

Yeah, I'd considered that, and it is certainly applicable to the projection of source data into client-side game assets. The problem is that the "server" component must be able to canonically represent all game state, including the results of positional changes due to physics. Because those tend to compound floating point operations, I have a feeling they're going to be a problem for me. Perhaps there's an opportunity to get creative with physics code...
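By "creative" I mean something like moving the canonical physics step onto fixed point, so that the compounding itself is exact. A toy sketch of the idea (16.16 format, hypothetical names): if every step is integer arithmetic, then N steps compound identically on every machine.

```cpp
#include <cstdint>

// A body with vertical position and velocity in 16.16 fixed point.
struct Body {
    int32_t posY; // 16.16
    int32_t velY; // 16.16
};

// One integration step; gravity and dt are also 16.16 fixed point.
// Intermediates are widened to 64 bits so the products cannot overflow.
void step(Body& b, int32_t gravity, int32_t dt)
{
    b.velY += static_cast<int32_t>((static_cast<int64_t>(gravity) * dt) >> 16);
    b.posY += static_cast<int32_t>((static_cast<int64_t>(b.velY) * dt) >> 16);
}
```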

 


> The guys over at Epic actually created custom floating point classes for both 16 and 32 bits. Their reasoning, noted in the comments, was that floating point results were not always the same across systems.

 

Sounds interesting, I'll look further into that, thanks.

Interesting fact: Microsoft compilers will automatically use SSE on x64 builds and will not emit FPU instructions. This is something I've exploited in the past to great benefit, because SSE determinism is actually a lot easier and less fussy than x87 determinism.

NB this does not work on Linux or (I *think*) OSX.
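If you can't rely on the compiler's defaults, one portable way to get the same effect (a sketch only, not from any particular codebase) is to route the hot math through SSE scalar intrinsics yourself; on GCC/Clang, -msse2 -mfpmath=sse requests the same behavior for ordinary float code in 32-bit builds.

```cpp
#include <xmmintrin.h>

// Force a multiply-add through SSE scalar instructions via intrinsics,
// avoiding x87's 80-bit intermediates regardless of compiler defaults.
float madd_sse(float a, float b, float c)
{
    __m128 r = _mm_add_ss(_mm_mul_ss(_mm_set_ss(a), _mm_set_ss(b)),
                          _mm_set_ss(c));
    return _mm_cvtss_f32(r);
}
```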



> Interesting fact: Microsoft compilers will automatically use SSE on x64 builds and will not emit FPU instructions.

I can see how that would work for simple arithmetic and logic operations, but not so much for other operations.

 

Operations like exponents, logarithms, trig functions, and a few others are not available in SSE form but are implemented as intrinsics or direct FPU calls by most compilers, including Microsoft's. Those also tend to be the operations with high variation between chips (still within tolerance, but with bit-for-bit differences).
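The usual workaround, and roughly what software libraries like STREFLOP do in spirit, is to evaluate those functions as fixed polynomials built only from basic IEEE754 operations, which are required to round identically everywhere. A toy sketch (plain Taylor coefficients; a real implementation would use minimax coefficients and proper range reduction):

```cpp
// Deterministic sin approximation built from add/multiply only.
// Assumes x has already been range-reduced to roughly [-pi, pi].
double sin_poly(double x)
{
    double x2 = x * x;
    // x - x^3/6 + x^5/120 - x^7/5040 + x^9/362880, factored (Horner form).
    return x * (1.0 + x2 * (-1.0 / 6.0
               + x2 * (1.0 / 120.0
               + x2 * (-1.0 / 5040.0
               + x2 * (1.0 / 362880.0)))));
}
```

Because every step is a single correctly-rounded IEEE754 add or multiply, the result is bit-identical on any machine honoring the same rounding mode.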
