Reproducibility of random numbers

Started by
9 comments, last by MarkS_ 12 years, 5 months ago
I've come to the conclusion that I need to bump up random number generation a bit for my system (especially WRT save/restore, interoperability and such).
I've gone through this document and I've somehow got the feeling random number generators might be extremely portable. The distribution functions are specified. Random engines are pretty well understood. They even mandate the value to be returned by the 10000th call to some random engines. I'm not quite sold on the way they support state restore but... well, I guess I'll have to go through [font="Courier New"]discard[/font].

Nonetheless, thinking I can just trust the generators to be there and perform as expected is totally alien to me. For the time being, I'll surely use the standard engines as I only support Windows.
I'd like to know about other implementations, Apple, GCC, Android...

The main problem is that I want to be able to provide fine grained control over the seed: of course this makes sense only if the implementations will reasonably converge. So far, I didn't have the need to trust specific features of a certain random sequence produced, nonetheless I feel the need to consider output coherence across implementation. If this cannot be trusted, I'll perhaps just drop seed control completely (because yes, I do realize I might be worrying too much).

Your opinions?

Previously "Krohm"

Advertisement
My opinion, based on writing an implementation of this standard library, is that the sequence generated by the engines is required to be exactly reproducible (at least if you stick to one of the standard supplied pseudorandom engines; creating your own may yield some portability issues across some architectures), given the same seed. The state save/restore should work as specified, that is to say you should be able to rely on it.

The distributions are less strongly specified, since only the PDF is required by the standard, not the implementation. I can guarantee that given an identical engine (algorithm+initialization vector) they will generate the same sequence on the same platform, but I can not guarantee the same sequence across differing implementations.

Stephen M. Webb
Professional Free Software Developer

2 + 2 = 4 across any platform ever existed*. Random number generation is usually based off a mathematical algorithm that is... well, mathematical**. wink.gif
Since math is a universal language, the same algorithm can't possibly give different answers on two different machines... math is math.

The only problems come when there is some pre-existing RNG that you're using, but it uses a different algorithm for each platform. Such is the case for rand() since I believe the algorithm was pretty much left up to the compiler designers (as long as certain conditions are met), and each compiler might have used a different method. I might be mistake n here, though.

The solution: Just use your own algorithm, and you can be 100% certain the same algorithm is used on every platform, since you're compiling it yourself. Math is math, so your algorithm will produce the same results regardless of the compiler compiling it, platform it's running on, time of the year, or hardware the platform is executed on.
So just find a random number generator algorithm that works for you, and use it instead of the standard library's one. Maybe Mersenne twister or some other one.

* [size="1"]Any platform where that doesn't hold true is not one you want to be working on, even if it did exist. biggrin.gif
** [size="1"]Most of the time, anyway. You can use static garbage for random values, but that's not what you're looking for here.
A computer is deterministic and an algorithm run on said computer will, given the same parameters, will produce identical results. This is why pseudo is often prepended to the phrase random number generator. The authenticator key fobs you get from Paypal and Blizzard rely this fact, they're little more than a ultra precision real time clock fed into an encryption algorithm (essentially a keyed PRNG) the company knows the key and where the clock is with in a few ticks and can reproduce the value on authenticator fob for comparison at login.
Patrick
The pseudo random algoritm used by the random function is usually (somewhat) documented.
Yes, the algorithm itself does not depend on any system calls, and is highly portable.
Use a well known algoritm directly if you want perfect control, if you want really good random (huge interval before it repeats, good random distribution), maybe try Mersenne Twister (free) or WELL (licence for commercial use)
If it is a game, it's probably enough with a simpler (and faster) algorithm though
The simple ones work by multiplying the seed with a carefully chosen number and mod with a carefully chosen prime number, a google will turn up lots of examples

So just find a random number generator algorithm that works for you, and use it instead of the standard library's one. Maybe Mersenne twister or some other one.

The Mersenne twister, in both a 32-bit and a 64-bit form, is among the 9 pseudorandom engines mandated by the C++11 standard. Which is the "one" you're referring to?

You assertion that math is math does not apply when it comes to the distributions. There is often more than one way to implement even basic floating-point algebra such that the same result is not obtained by two different implementations. How would you choose to implement a Gaussian distribution? Would you choose Box-Meuller? Would you implement it using basic or polar form rejection? Which is best on your hardware and why?

The designers of the standard random library had a requirement of making the pseudorandom sequences repeatable with the same initialization vector for the purposes of repeat runs simulations. They had no mandate to ensure those same sequences were repeatable across different library implementations. The philosophy of good library design is to restrict only as much as necessary and no more.

Stephen M. Webb
Professional Free Software Developer


[quote name='Servant of the Lord' timestamp='1320676634' post='4881409']
So just find a random number generator algorithm that works for you, and use it instead of the standard library's one. Maybe Mersenne twister or some other one.

The Mersenne twister, in both a 32-bit and a 64-bit form, is among the 9 pseudorandom engines mandated by the C++11 standard. Which is the "one" you're referring to?[/quote]
By 'standard' I meant the C++98 standard's rand(), forgetting that C++11 has been officially made the standard.

You assertion that math is math does not apply when it comes to the distributions. There is often more than one way to implement even basic floating-point algebra such that the same result is not obtained by two different implementations.[/quote]
That's true, I wasn't thinking about floating point inaccuracies. You can still make a RNG using integers, however, such that it is guaranteed to be the same across all platforms (as long as you take of the differences in endian, you should be fine).

The designers of the standard random library had a requirement of making the pseudorandom sequences repeatable with the same initialization vector for the purposes of repeat runs simulations. They had no mandate to ensure those same sequences were repeatable across different library implementations. [/quote]
Isn't that exactly what I said? "[color="#1C2837"]The only problems come when there is some pre-existing RNG that you're using, but it uses a different algorithm for each platform.[color="#1C2837"]"[color="#1C2837"]



Perhaps my explanation isn't clear enough, or maybe I chose the improper wording... but what you just said, and what I already said, mean the exact same thing (at least to me).
The distributions are less strongly specified, since only the PDF is required by the standard, not the implementation. I can guarantee that given an identical engine (algorithm+initialization vector) they will generate the same sequence on the same platform, but I can not guarantee the same sequence across differing implementations.
Thank you very much, you have nailed the point.

I think I might leave out distributions completely for the time being. After all, [font="Courier New"]rand()[/font] did it so far. Hopefully uniform distributions will be reasonably coherent across implementations, I'll see how it goes later.

Considering you have written a library, would you comment on the use of [font="Courier New"]discard(count)[/font]?
I'm currently considering keeping the count of generated numbers and use the count to restore state, mostly because I just hate to deal with text representation. It seems like [font="Courier New"]operator>>[/font] and [font="Courier New"]operator<<[/font] might be used to do the same, presumably in a more efficient manner. I don't have access to the spec. Is this textual representation guaranteed to be portable across implementations? If not, I will go for [font="Courier New"]discard(count)[/font].
The solution: Just use your own algorithm
Well, I considered that. But I was hoping in somewhat more of a plug-in, ready to go solution... damn, it's almost 2012. :P

Previously "Krohm"


The only problems come when there is some pre-existing RNG that you're using, but it uses a different algorithm for each platform. Such is the case for rand() since I believe the algorithm was pretty much left up to the compiler designers (as long as certain conditions are met), and each compiler might have used a different method. I might be mistake n here, though.

You are correct. I performed some testing after discovering that rand() doesn't return the same sequence across all platforms, and came to the conclusion that its behaviour sometimes varies even between different versions of the same OS.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


Considering you have written a library, would you comment on the use of [font="Courier New"]discard(count)[/font]?
I'm currently considering keeping the count of generated numbers and use the count to restore state, mostly because I just hate to deal with text representation. It seems like [font="Courier New"]operator>>[/font] and [font="Courier New"]operator<<[/font] might be used to do the same, presumably in a more efficient manner. I don't have access to the spec. Is this textual representation guaranteed to be portable across implementations? If not, I will go for [font="Courier New"]discard(count)[/font].

The purpose of the [font="Courier New"]discard()[/font] member function is to bring the engine to a known state from a known starting position. It will do exactly what you describe, but possibly at considerable cost since it just calls the generator function z times.

You can rely on [font="Courier New"]operator<<[/font] and [font="Courier New"]operator>>[/font] to save and restore state of a standard engine, but no there is no guarantee of portability of that saved state across implementations.

BTW, if all you need is a uniform distribution, just using the engine directly is probably good enough. I doubt any implementation would choose anything but the obvious way to implement [font="Courier New"]std::uniform_int_distribution[/font] or [font="Courier New"]std::uniform_real_distribution[/font], so those are probably OK for cross-implementation use, as are the sampling distributions.

Stephen M. Webb
Professional Free Software Developer

This topic is closed to new replies.

Advertisement