I've been designing the architecture for a distributed procedural content simulation engine for some time now, and it has recently become apparent that floating point determinism is going to be a thorn in my side when it comes to distributed setups. So, for those of you with experience in (or reasonable knowledge of) floating point determinism issues, I'd really like to know how you'd approach this scenario:
Imagine you have two machines. The machines can reasonably be guaranteed to use modern, common hardware and operating systems (PowerPCs, IBM mainframes and other gotchas are not relevant here). In other words, both machines are either modern commodity x64 boxes on a server rack, or the relatively modern gaming machines of two friends sharing the workload I'll describe. The operating systems can be constrained to the standard three (Windows/OS X/Linux).
Now, one of those two machines is going to procedurally generate some content based on known inputs. Let's say the content is simple Perlin-noise-based terrain. Taking this a step further, let's say we also procedurally generate a set of physical objects (for brevity, perhaps simple wooden cubes of varying sizes and weights). We then use physics algorithms to scatter them around the terrain and let them settle based on physical characteristics such as gravity and the shape of the terrain.
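To make the "known inputs" part concrete, here's a minimal sketch (all names and constants are my own choices, nothing standardized) of how I imagine deriving per-region seeds. It uses splitmix64-style integer mixing, so the derivation itself is trivially deterministic on any platform:

```cpp
#include <cstdint>
#include <cstdio>

// splitmix64 finalizer: well-known public-domain mixing constants.
uint64_t mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

// Derive a region seed from the world seed and region coordinates;
// everything downstream (noise, scattering, physics) keys off this.
uint64_t regionSeed(uint64_t worldSeed, int32_t rx, int32_t ry) {
    return mix(mix(worldSeed ^ uint64_t(uint32_t(rx))) ^ uint64_t(uint32_t(ry)));
}

int main() {
    // Same inputs on any machine -> same 64-bit seed, bit for bit.
    printf("%llu\n", (unsigned long long)regionSeed(42, 3, -7));
}
```

The hard part, of course, is everything downstream of that seed, and particularly the physics settling step.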
Here's the catch:
- We don't know in advance which of the two machines will do the work.
- The machine that generates the content will not be able to guarantee that the other machine is available at the time that the work needs to be performed.
- The other machine will, later on, have to generate the exact same outputs under the same conditions.
- We don't want to store the overall result, as there will likely be too much data.
- It'd be nice to be able to offload some of the compute work (where relevant and reasonable) to the GPU (whatever is available on the machine in question).
Some places that I am willing to compromise:
- Reducing floating point precision
- Storing a limited subset of source data, as long as bulkier derivative data can be regenerated as needed
- Using large integers instead of floats
- Using fixed point calculations (see the sketch after this list)
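To illustrate the last two compromises, here's a minimal Q16.16 fixed-point sketch (my own throwaway code, not a library) of the kind of arithmetic I'd be willing to switch to, since integer operations are bit-identical across machines regardless of FPU or compiler settings:

```cpp
#include <cstdint>
#include <cstdio>

using fixed = int32_t;                 // Q16.16: 16 integer bits, 16 fractional
constexpr fixed FP_ONE = 1 << 16;

fixed fromInt(int32_t v)  { return v * FP_ONE; }
double toDouble(fixed v)  { return v / 65536.0; }

fixed fmul(fixed a, fixed b) {
    // Widen to 64 bits so the intermediate product can't overflow.
    return fixed((int64_t(a) * int64_t(b)) >> 16);
}

fixed fdiv(fixed a, fixed b) {
    return fixed((int64_t(a) << 16) / b);
}

int main() {
    fixed half = FP_ONE / 2;                                  // 0.5
    fixed g    = fdiv(fromInt(981), fromInt(100));            // ~9.81
    printf("%f\n", toDouble(fmul(g, half)));                  // ~4.905 after truncation
    return 0;
}
```

The obvious trade-offs are range and precision: 16 fractional bits gives a resolution of about 1.5e-5, and intermediate products need widening to 64 bits to avoid overflow.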
Some additional questions:
- Is it possible to write integer-only versions of the popular coherent noise algorithms (Perlin/simplex/fractional Brownian motion/etc.)? I've sketched an attempt at one below.
- Can I get away with forcing strict IEEE 754 floating point behaviour, or will that compromise speed too much?
- Are 64-bit integers a thing on GPUs yet, or likely to become a thing any time soon?
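On the first question, here's my own attempt at the kind of thing I mean: integer-only 2D value noise (not true gradient Perlin) built on the same Q16.16 representation as above. The lattice hash and its constants are arbitrary choices of mine:

```cpp
#include <cstdint>
#include <cstdio>

using fixed = int32_t;                 // Q16.16
constexpr fixed FP_ONE = 1 << 16;

fixed fmul(fixed a, fixed b) { return fixed((int64_t(a) * int64_t(b)) >> 16); }

// Integer lattice hash -> pseudo-random value in [0, FP_ONE).
fixed latticeValue(uint64_t seed, int32_t x, int32_t y) {
    uint64_t h = seed ^ (uint64_t(uint32_t(x)) * 0x9E3779B97F4A7C15ull)
                      ^ (uint64_t(uint32_t(y)) * 0xC2B2AE3D27D4EB4Full);
    h ^= h >> 33; h *= 0xFF51AFD7ED558CCDull; h ^= h >> 33;
    return fixed(h & 0xFFFF);          // keep 16 fractional bits
}

// Smoothstep fade t*t*(3-2t), all in Q16.16.
fixed fade(fixed t) { return fmul(fmul(t, t), 3 * FP_ONE - 2 * t); }

fixed lerp(fixed a, fixed b, fixed t) { return a + fmul(t, b - a); }

// Value noise at fixed-point coordinates (Q16.16 in, Q16.16 out).
fixed valueNoise(uint64_t seed, fixed x, fixed y) {
    int32_t xi = x >> 16, yi = y >> 16;          // integer lattice cell
    fixed tx = fade(x & 0xFFFF), ty = fade(y & 0xFFFF);
    fixed v00 = latticeValue(seed, xi,     yi);
    fixed v10 = latticeValue(seed, xi + 1, yi);
    fixed v01 = latticeValue(seed, xi,     yi + 1);
    fixed v11 = latticeValue(seed, xi + 1, yi + 1);
    return lerp(lerp(v00, v10, tx), lerp(v01, v11, tx), ty);
}

int main() {
    // Sample at (1.25, 2.5): identical output bits on every machine.
    printf("%d\n", valueNoise(42, FP_ONE + FP_ONE / 4, 2 * FP_ONE + FP_ONE / 2));
}
```

Upgrading this to proper gradient noise (Perlin/simplex) would mean fixed-point gradient dot products rather than a straight lattice value lookup, but the interpolation machinery stays the same.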
As an addendum to all this, I'm entertaining the possibility that this problem doesn't really have a good solution at this point in time, and that perhaps I need to be willing to simply store a lot of data and ensure it's available whenever a machine needs to generate new content.