Links to Pseudo-random Theory?

Can anyone link me to some pseudo-random papers or websites? Google and Wikipedia helped me a bit, but it's mostly other people's implementations of algorithms with little info as to why they work. I'm fascinated with noise and would like to spend some time exploring this exciting field.

Actually, maybe this belongs in a different branch of the forums?

EDIT:
My findings thus far for any future programmers interested in RNG:
http://6502.org/sour...dom/random.html
"The Art Of Computer Programming" - D. Knuth
I bet the Math and Physics folks can point you in the right direction ;-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I haven't encountered any good papers that I specifically recall, but I agree that it's an interesting field. My personal theory is that random numbers and cryptography are very closely related. In cryptography you want as little relationship as possible between a plaintext byte and the corresponding encrypted byte; in pseudo-random number generation you want as little relationship as possible between the next number generated and the previous one. Papers on cryptography say that, ideally, changing a single bit of the plaintext should give every bit of the output a 50/50 chance of changing. I think that's one of the reasons that algorithms for both often combine operations such as multiplication and modulus (which have little to no relationship to specific bits if you use prime numbers) with operations such as XORing, rotating, or lookups (which are non-algebraic, and so prevent any simple formula relating terms from being derived).
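For a concrete taste of that combination, here is a minimal sketch borrowed from the well-known MurmurHash3 64-bit finalizer (my choice of illustration, not something from this thread): alternating multiplications by large odd constants with xor-shifts gives strong avalanche, i.e. flipping one input bit flips each output bit with probability close to 1/2.

```c
#include <stdint.h>
#include <stdio.h>

/* MurmurHash3's 64-bit finalizer: multiplications mixed with xor-shifts.
 * Every step is a bijection, so no information is lost along the way. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL; /* large odd constant */
    x ^= x >> 33;
    x *= 0xC4CEB9FE1A85EC53ULL; /* another large odd constant */
    x ^= x >> 33;
    return x;
}

int main(void)
{
    /* inputs differing in a single bit produce wildly different outputs */
    printf("%016llx\n%016llx\n",
           (unsigned long long)mix64(1),
           (unsigned long long)mix64(3));
    return 0;
}
```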

I would just note that some algorithms exhibit non-random properties, such as never repeating a value until the full set of values has been enumerated. This is a nice property for avoiding degenerate cases (such as repeating the same 4 numbers over and over again), but it stands out from true random numbers. I think this is most easily avoided if you keep some internal state. For example, you could generate a non-repeating series internally, but for the xth number return random[2*x] XOR random[2*x + 1], as in the sketch below. Internal state also potentially allows a longer period before repetition (although not the way I stated above).
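A minimal sketch of that pair-XOR idea (my illustration, using the common Numerical Recipes LCG constants): the internal generator has full period 2^32, so it never repeats a value until all 2^32 have appeared, but XORing consecutive pairs hides that tell-tale property in the output.

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t lcg_state = 1;

/* Full-period 32-bit LCG: c is odd and (a - 1) is divisible by 4, so by
 * the Hull-Dobell theorem every 32-bit value appears exactly once per cycle. */
static uint32_t lcg_next(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Output = random[2x] XOR random[2x + 1]: repeats become possible again. */
static uint32_t next_output(void)
{
    uint32_t a = lcg_next();
    uint32_t b = lcg_next();
    return a ^ b;
}

int main(void)
{
    for (int i = 0; i < 5; ++i)
        printf("%08x\n", (unsigned)next_output());
    return 0;
}
```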
Cool, so I could build an algorithm that picks every number in a set in a pseudo-random order, but then only iterate it a random number of times from a random location. That seems like it could still get stuck in a loop, eh?

I'll look into cryptography - sounds like I could find some useful algorithms there as well.
Look up information theory, entropy, compression, (pseudo)randomness, PRNGs, all those keywords, and peruse them on the net. They pretty much sum up what pseudorandomness is about. Cryptography uses pseudorandom numbers too (because they are faster and cheaper to produce than true random numbers) but with additional constraints and properties, which often makes cryptographic pseudorandom number generators unsuitable for anything other than cryptography because of their comparatively poor performance.

It is not trivial to create a pseudorandom number generator. Anyone can devise a formula that *looks* pseudorandom, but humans are extremely bad at identifying randomness because we tend to see patterns in everything (that's how the brain works). So as soon as we can't see an obvious pattern, we immediately label the sequence "random", which is often not true. There are statistical tests to determine "how" random a sequence of numbers is, but they are probabilistic, so they only give you a rough idea, not a definitive answer (look up the Diehard tests, which are not the best but are easy to understand and very detailed).

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

I am not an expert in this field, but I have had the need to generate relatively high-quality pseudo-random numbers, and there are a few things I have learned about what you need to do to get them:

  • Keep a large state. If you only have something like 32 bits of state, you will probably have cycles that are much shorter than 2^32. You can keep something like 2KB of state, and you don't need to access all of it at each step.
  • Combine arithmetic operations and bit-wise operations (this may not be mandatory, but it works well).
  • Make sure you combine low bits into the high bits and also high bits into the low bits.
  • Make sure the operations you use to mix bits around don't lose information (i.e., make sure they are bijections).
  • Use arithmetic (in particular, multiplication) with numbers as large as possible. (A sketch combining most of these ideas follows this list.)
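As an illustration, here is a minimal sketch that follows most of these rules at once; it is essentially the public-domain xorshift64* generator (it keeps only 64 bits of state, so it ignores the first rule, but it shows bijective xor-shifts mixing high bits into low bits and back, finished with a multiplication by a large odd constant):

```c
#include <stdint.h>
#include <stdio.h>

static uint64_t state = 0x9E3779B97F4A7C15ULL; /* any nonzero seed works */

static uint64_t next64(void)
{
    uint64_t x = state;
    x ^= x >> 12;  /* high bits into low bits (a bijection) */
    x ^= x << 25;  /* low bits into high bits (a bijection) */
    x ^= x >> 27;  /* more high-to-low mixing               */
    state = x;
    return x * 0x2545F4914F6CDD1DULL; /* odd multiplier: bijective mod 2^64 */
}

int main(void)
{
    for (int i = 0; i < 5; ++i)
        printf("%016llx\n", (unsigned long long)next64());
    return 0;
}
```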

If you are interested in PRNGs, you may also want to look into the very closely related subject of hash functions.

Another key thing to know about random numbers is that you can combine several independent sources of randomness by XORing them together. As long as the sources are independent, the result is at least as random as the most random of the ingredients. That might come in handy at some point.
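A toy illustration of such a combiner (rand() is only a stand-in here, and the guarantee requires genuinely independent sources):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* XOR a badly biased source with a better one: the combined stream is
 * no worse than the better source, provided the two are independent. */
int main(void)
{
    for (int i = 0; i < 8; ++i) {
        uint8_t weak = (uint8_t)(rand() & 0x0F); /* biased: top bits always 0 */
        uint8_t good = (uint8_t)(rand() & 0xFF); /* stand-in "good" source    */
        printf("%02x\n", weak ^ good);
    }
    return 0;
}
```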
Interesting - I assumed XOR would be useful here. Not quite sure how I knew that!

So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

And 2KB?? Seriously? Isn't that a bit extreme and would it not be very slow?
"So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?"
No: XOR is a binary operation with the property that A XOR A = 0, so repeatedly XORing a pair of numbers together won't produce a very interesting sequence!
The same goes for addition: if you keep applying it, there will be correlation between successive values, which means the sequence won't be very random either.
Example: say A = 6 and B = 9, so A + B = 15. Without knowing A and B, A + B would seem random. But if you keep adding B, you get 15, 24, 33, 42, 51, etc., which is clearly not random.

There are ways to "stretch" an initial number into many pseudorandom numbers, but the entropy of the whole sequence is no greater than the entropy of the initial state (since the whole sequence can be described by the initial state plus the algorithm used). This shows up in practice: if the algorithm is known, it is often possible to predict the next number from any given one (unless you use cryptographic algorithms, of course, since they are designed to prevent exactly that).

"And 2KB?? Seriously? Isn't that a bit extreme and would it not be very slow?"
It may seem so, but it's actually very fast: when you have a 2KB internal state, you generally produce 2KB of random data at each step of the algorithm (in non-cryptographic generators, that is), so the total cost is fully amortized. The idea is that the bigger the state, the more entropy you have available to play with, shuffle around, etc., which, loosely put, makes it harder to screw up. It's just simpler to use big states.

But it is possible to produce pseudorandom numbers with a very small state. For instance, a suitable cryptographic block cipher (assume it is an ideal random permutation, i.e. a "perfect" cipher) can generate 2^256 blocks of 256-bit random numbers (an unimaginably large quantity) from only 256 bits of state. Of course it will be slower, which is why we instead use large states for better performance. And the cipher may have flaws that could reduce its quality significantly, but that is what peer review is for.

There are number-theoretic constructs which have provably random characteristics, but these yield pretty slow generators in general (because we are then working with very large integers).

Note that when I say cryptographic generators are slow, this isn't always true. I've had success generating cryptographic-grade pseudorandom numbers on a mainstream GPU at speeds of up to 13GB/s (and 1.5GB/s on the CPU), so it's not infinitely slower than general-purpose algorithms. But in almost every situation you don't need such numbers, so it's best to stick with simpler algorithms.

"Use arithmetic (in particular, multiplication) with numbers as large as possible."
This is actually debatable. If you stick to small words (say, an 8-bit byte) and work on arrays of those words, you can mix logical operations (XOR, NOT, etc.) and arithmetic operations (addition, subtraction, ...) in a much less predictable way and maximize diffusion. Of course you can also use both methods, but it's important to keep it simple.
Also, multiplication needs to be done carefully, as it is not a bijection unless the modulus is prime (or rather, unless the inputs to the multiplication are all coprime to the modulus; but since the inputs are probably going to be pseudorandom, you want a prime modulus). That is tricky to get right because it reduces your field size (a power of two is not prime). It's been done, though. In fact, it can be done with a power-of-two modulus if you ensure the inputs are always odd, as in the sketch below. In general, multiplication is rather unpopular because of those restrictions and because it's still somewhat slow in hardware, but it has some nice properties.
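A minimal sketch of that odd-inputs trick (my illustration): multiplication by a fixed odd constant is a bijection mod 2^32, and its inverse can be computed with a few Newton-Raphson steps, which demonstrates that no information is lost.

```c
#include <stdint.h>
#include <stdio.h>

/* Multiplicative inverse of an odd number modulo 2^32.
 * x = a is already correct to 3 bits (odd^2 = 1 mod 8), and each
 * Newton step x = x * (2 - a * x) doubles the number of correct bits,
 * so four steps reach 48 >= 32 bits. */
static uint32_t inverse_mod_2_32(uint32_t a) /* a must be odd */
{
    uint32_t x = a;
    for (int i = 0; i < 4; ++i)
        x *= 2 - a * x;
    return x;
}

int main(void)
{
    uint32_t mult = 2654435761u; /* odd; Knuth's multiplicative-hash constant */
    uint32_t inv  = inverse_mod_2_32(mult);
    uint32_t v    = 123456789u;
    uint32_t enc  = v * mult;    /* the "mix" step: bijective, loses nothing */
    printf("%u -> %u -> %u\n", (unsigned)v, (unsigned)enc, (unsigned)(enc * inv));
    return 0;
}
```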

As you've probably noticed by now I'm more into cryptography than pseudorandom numbers but if you have any questions on anything I'm happy to help.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

This is great, I'm learning so much! Very fun.

The web is a little frustrating, however. It's hard to understand what I find because I'm somewhat new to programming.

So for my RNG it's faster to use a ridiculous number of bytes than to use multiplication? Very counter-intuitive, but I can roll with it.
If I did multiply, would (2^n)-1 generally be a good choice? Or primes of that form?

And using a modulus is good? Isn't it as slow as multiplication, though?
And I need a large prime modulus?

(And as for XORing, I didn't mean the same numbers over and over; I meant taking a few random numbers and XORing them together.)
As others mentioned, multiplication and modulus have their downsides, one being that computers natively work with data sizes that are powers of 2. Imagine if one of the operations I used was (x * 17) mod 257. 17 and 257 are both primes, sounds good, right? But if I'm generating a stream of bytes, the range 0-256 can't be represented by a single byte. If I do mod 256 afterwards so it fits in my byte, there are two values that get turned into 0 (0 and 256). Therefore the number 0 would appear twice as often as expected in my "random" sequence.
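A tiny demo of that bias (my code, not the poster's): push every possible byte through the mapping and count the outputs.

```c
#include <stdio.h>

/* Map every byte x through (x * 17) mod 257, force the result back into
 * a byte with mod 256, and count how often each output value occurs. */
int main(void)
{
    int counts[256] = {0};
    for (int x = 0; x < 256; ++x)
        counts[(x * 17) % 257 % 256]++;
    for (int v = 0; v < 256; ++v)
        if (counts[v] != 1)
            printf("value %d occurs %d times\n", v, counts[v]);
    return 0;
}
```

It reports that 0 occurs twice and 240 never occurs: the stream is biased even though both constants are prime.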

I tend to break it up into 5 steps:
1. Get some initial source of randomness or pseudo-randomness.
2. Distribute entropy within each byte.
3. Distribute entropy between bytes.
4. Repeat 2 and 3 as required.
5. Produce an output based on the internal state.

For step one you can get something random-ish from a wide variety of sources, such as the current time, mouse movement, or network packets. In the past I got mine from Windows thread scheduling, e.g. launching 100 threads that all modified the same byte array with no locking. I never got the same sequence twice, even before applying a single pseudo-random algorithm. Note that the raw bytes may not have passed a randomness test, but they were a good source of entropy.

For step two you can use whatever works for you, e.g. rotating the byte, (x + 17) mod 256, (x * 3) mod 256, or my personal favourite: doing a perfect shuffle of 256 bytes and a simple lookup into that array. What I like about the lookup is that there's literally no algebraic relationship between the before and after values (on a per-byte basis, anyway), so each bit has a 50/50 chance of affecting every other bit within the byte.

For step three you can do things like swapping bytes, adding each byte to the next byte, XORing each byte with the next byte, or, once again my favourite, doing a lookup (perhaps using the next byte as an index for which byte to swap with the current byte). A sketch of steps two and three follows.
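Here is a minimal sketch of those two steps (my illustration, not the poster's exact code), using a Fisher-Yates shuffle to build the lookup table:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint8_t sbox[256];

/* Build the lookup table: identity, then a Fisher-Yates shuffle.
 * rand() with a fixed seed is only a stand-in; step one would supply real
 * entropy (and rand() % (i + 1) is slightly biased, fine for a demo). */
static void build_sbox(void)
{
    for (int i = 0; i < 256; ++i)
        sbox[i] = (uint8_t)i;
    for (int i = 255; i > 0; --i) {
        int j = rand() % (i + 1);
        uint8_t t = sbox[i]; sbox[i] = sbox[j]; sbox[j] = t;
    }
}

static void mix_pass(uint8_t *state, int n)
{
    for (int i = 0; i < n; ++i)          /* step 2: per-byte substitution  */
        state[i] = sbox[state[i]];
    for (int i = 0; i < n; ++i) {        /* step 3: next byte picks a swap */
        int j = state[(i + 1) % n] % n;
        uint8_t t = state[i]; state[i] = state[j]; state[j] = t;
    }
}

int main(void)
{
    uint8_t buf[16];
    for (int i = 0; i < 16; ++i)
        buf[i] = (uint8_t)i;             /* deliberately low-entropy input */
    srand(42);                           /* fixed seed keeps the demo reproducible */
    build_sbox();
    for (int round = 0; round < 4; ++round)  /* step 4: repeat as required */
        mix_pass(buf, 16);
    for (int i = 0; i < 16; ++i)
        printf("%02x", buf[i]);
    printf("\n");
    return 0;
}
```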

Note that many approaches effectively combine my steps two and three by performing complex operations on one large number rather than working byte by byte. The downside is that you are limited by how large a number your CPU/programming language can operate on efficiently.

Step four is... well, step four. Repeat as often as you like to get more random-looking numbers.

Step five is often simplified by people to "return the number" or "return the last number". If you're doing what I do and generating a big buffer of random numbers in one go, you probably don't need to worry about this. But if you care about the no-repeats problem I mentioned and the rest of your algorithm is very simple, I would suggest doing something like returning the current number plus the previous number XOR the number before that (a sketch follows).
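In code, that last suggestion might look like this (a sketch; treating the XOR as happening before the addition is my reading of the sentence):

```c
#include <stdint.h>
#include <stdio.h>

/* Output-step sketch: combine the current value with the two previous
 * ones, so the visible stream can repeat values even if the internal
 * sequence never does. XOR-before-add is my assumption. */
static uint8_t output(const uint8_t *s, int i) /* requires i >= 2 */
{
    return (uint8_t)(s[i] + (s[i - 1] ^ s[i - 2]));
}

int main(void)
{
    const uint8_t s[] = {10, 20, 30, 40};
    printf("%u\n", output(s, 3)); /* 40 + (30 ^ 20) = 50 */
    return 0;
}
```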

