**1**

# Links to Pseudo-random Theory?

###
#1
Members - Reputation: **355**

Posted 23 April 2012 - 06:10 PM

Actually, maybe this belongs in a different branch of the forums?

EDIT:

My findings thus far for any future programmers interested in RNG:

http://6502.org/sour...dom/random.html

"The Art Of Computer Programming" - D. Knuth

###
#2
Moderators - Reputation: **18214**

Posted 23 April 2012 - 06:16 PM

[Work - ArenaNet] [Epoch Language] [Scribblings] [Journal - peek into my shattered mind]

###
#3
Crossbones+ - Reputation: **2455**

Posted 23 April 2012 - 07:08 PM

I would just note that some algorithms exhibit non-random properties such as never repeating until the full set of values has been enumerated. This is a nice property to ensure no degenerate cases (such as repeating the same 4 numbers over and over again), but stands out from true random numbers. I think this is most easily avoided if you keep some internal state. For example, you could generate a non-repeating series internally, but for the xth number return random[2*x] XOR random[2*x + 1]. Internal state also potentially allows a longer period before repetition (although not the way I stated above).

###
#4
Members - Reputation: **355**

Posted 23 April 2012 - 07:37 PM

I'll look into cryptography - sounds like I could find some useful algorithms there as well.

###
#5
Crossbones+ - Reputation: **11962**

Posted 23 April 2012 - 07:51 PM

It is not trivial to create a pseudorandom number generator. Anyone can devise a formula that *looks* pseudorandom but humans are extremely bad at identifying randomness because we tend to see patterns in everything (that's how the brain works). So as soon as we can't see an obvious pattern we immediately label the sequence as "random", which is often not true. There are statistical tests to determine "how" random a sequence of numbers is, but they are probabilistic, so they only give you a rough idea, not a definitive answer (look up the Diehard test which is not the best but easy to understand, very detailed too).

*“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”*

###
#6
Crossbones+ - Reputation: **16809**

Posted 23 April 2012 - 08:35 PM

- Keep a large state. If you only have something like 32 bits of state, you will probably have loops that are much shorter than 2^32. You can keep something like 2KB of state, and you don't need to access it all at each step.
- Combine arithmetic operations and bit-wise operations (this may not be mandatory, but it works well).
- Make sure you combine low bits into the high bits and also high bits into the low bits.
- Make sure the operations you use to mix bits around don't lose information (i.e., make sure they are bijections).
- Use arithmetic (in particular, multiplication) with numbers as large as possible.

Another key thing to know about random numbers is that you can combine several sources of randomness by XORing them together. The result is at least as random as the most random of the ingredients. That might come in handy at some point.

###
#7
Members - Reputation: **355**

Posted 23 April 2012 - 10:03 PM

So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

And 2KB?? Seriously? Isn't that a bit extreme and would it not be very slow?

###
#8
Crossbones+ - Reputation: **11962**

Posted 23 April 2012 - 11:55 PM

No, XOR is a binary operation and has the property that a xor a = 0, so repeatedly XORing a pair of numbers together won't produce a very interesting sequence!So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

And same for addition, if you keep doing it then there will be correlation between successive values generated which means the sequence won't be very random either.

Example: say A = 6 and B = 9. So A + B = 15. Without knowing A and B, A + B would seem random. But if you keep adding B, you get 15, 24, 33, 42, 51, etc... which clearly is not random.

There are ways to "stretch" an initial number to obtain many pseudorandom numbers but the entropy of the whole sequence is no greater than the entropy of the initial state (since the whole sequence can then be described by the initial state + the algorithm used). In fact this can be shown in that often, it is in fact possible to predict the next number to be generated by knowing any given one, if the algorithm used is known (unless you use cryptographic algorithms, of course, since they are designed to prevent that).

It may seem so but in fact, it's actually very fast because when you have a 2KB internal state, you generally produce 2KB of random data at each step of the algorithm (in non-cryptographic generators, that is), so the total cost is fully amortized. The idea is that the bigger the state, the more entropy you have available to play with, shuffle around, etc... which, literally put, makes it harder to screw up. It's just simpler to use big states.And 2KB?? Seriously? Isn't that a bit extreme and would it not be very slow?

But it is possible to produce pseudorandom numbers with a very small state. For instance, a suitable cryptographic block cipher (assume it is an ideal random permutation = it is a "perfect" cipher) can generate 2^256 blocks of 256-bit random numbers (an unimaginably large quantity) using only 256 bits of state. Of course, it will be slower, which is why instead we use large states for better performance. And the cipher may have flaws which could reduce its quality significantly, but this is what peer review is for.

There are number-theoretic constructs which have provably random characteristics, but these yield pretty slow generators in general (because we are then working with very large integers).

Note that when I say cryptographic generators are slow, this isn't always true. I've had success generating such cryptographic-grade pseudorandom numbers on a mainstream GPU, reaching speeds of up to 13GB/s (and 1.5GB/s on the CPU), so it's not like it's infinitely slower than general-purpose algorithms. But in almost every situation you don't need such numbers so it's best to stick with simpler algorithms.

This is actually debatable, if you stick to small words (say, an 8-bit byte) and work on arrays of those words then you are able to mix logical (XOR, NOT, etc...) and arithmetic (addition, subtraction, ...) operations in a much less predictable way and maximize diffusion. Of course you can also use both methods, but it's important to keep it simple.Use arithmetic (in particular, multiplication) with numbers as large as possible.

Also multiplication needs to be done carefully as it is not a bijection unless the modulus is a prime (or rather, the inputs to the multiplication are all coprime to the modulus, but since the inputs are probably going to be pseudorandom you want a prime modulus) which is tricky to get right because you are then reducing your field size (as a power of two is not a prime). It's been done, though. In fact it can be done with a power of two modulus if you ensure the inputs are always odd. In general multiplication is pretty unpopular though because of those restrictions and the fact that it's still kind of slow in hardware, but it has some nice properties.

As you've probably noticed by now I'm more into cryptography than pseudorandom numbers but if you have any questions on anything I'm happy to help.

*“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”*

###
#9
Members - Reputation: **355**

Posted 24 April 2012 - 12:26 AM

The web is a little frustrating, however. Hard to understand what I find because I'm somewhat new to programming.

So for my RNG it's faster to use ridiculous numbers of bytes than to use multiplication? Very counter intuitive but I can roll with it.

If I did multiply would (2^n)-1 generally be a good choice? Or prime numbers of that?

And using a modulus if good? Isn't is as slow as multiplication though?

And I need a large prime modulus?

(And as for XORing I didn't mean the same numbers over and over I meant taking a few random numbers and XORing them together.)

###
#10
Crossbones+ - Reputation: **2455**

Posted 24 April 2012 - 01:30 AM

I tend to break it up into 5 steps:

1. Get some initial source of randomness or pseudo-randomness.

2. Distribute entropy within each byte.

3. Distribute entropy between bytes.

4. Repeat 2 and 3 as required.

5. Produce an output based on the internal state.

For step one you can get something that seems randomish from a wide variety of sources, such as the current time, mouse movement, network packets, whatever. In the past I got mine from Windows thread scheduling, e.g. launched 100 threads that were modifying the same byte array with no locking. I never got the same sequence twice, even before applying a single pseudo-random algorithm. Note that these may not have passed a randomness test, but they were a good source of entropy.

For step two you can use whatever works for you, e.g. rotate the byte, (x + 17) mod 256, (x * 3) mod 256, or my personal favourite of doing a perfect shuffle of 256 bytes and do a simple lookup into that array. What I like about the lookup is there's literally no relationship between the before and after values (on a per byte basis anyway). Therefore each bit has a 50/50 chance of affecting every other bit within the byte.

For step three you can do things like swap bytes, add each byte to the next byte, xor each byte with the next byte, or once again my favourite of doing a lookup (perhaps use the next byte as an index for which byte to swap with the current byte).

Note that many approaches would effectively combine my steps two and three by performing complex operations on a large number rather than working with bytes. The downside is that it is limited by how large a number your CPU/programming language can perform operations on efficiently.

Step four is... well, step four. Repeat as often as you like to get more random-looking numbers.

Step five is often simplified by people to "return the number" or "return the last number". If you're doing what I do and generating a big buffer of random numbers in one go, you probably don't need to worry about this. But if you care about the problem I mentioned regarding no repeats and the rest of your algorithm is very simple, I would suggest doing something like return the current number plus the previous number XOR the number before that.

###
#11
Crossbones+ - Reputation: **16809**

Posted 24 April 2012 - 06:32 AM

No, XOR is a binary operation and has the property that a xor a = 0, so repeatedly XORing a pair of numbers together won't produce a very interesting sequence!So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

And same for addition, if you keep doing it then there will be correlation between successive values generated which means the sequence won't be very random either.

Example: say A = 6 and B = 9. So A + B = 15. Without knowing A and B, A + B would seem random. But if you keep adding B, you get 15, 24, 33, 42, 51, etc... which clearly is not random.

The feature of the XOR operation I was talking about probably needs to be explained in more detail, but the idea is that if a sequence of bits is hard to predict, XORing it with numbers from other sources won't destroy that feature. Of course if another method comes up with the same number, their bits will annihilate each other, but that means that the bits were not that hard to guess.

Imagine a project needs 1MB of random bits. Each member of the team could come up with their 1MB block and then we'll XOR them together. One particular member might have used a crappy PRNG, another one might have used the first bits of a compressed Linux kernel, another one might have used /dev/urandom... If at least one of the methods is hard to predict, the XOR of all of them will be hard to predict.

About multiplication, what I had in mind is multiplying by an odd constant modulo 2^64 (if your machine supports that operation) as a method to mix low bits into higher bits. If you have access to the full 128 bits of the result of multiplying two 64-bit integers, the middle bits have a good mix of the original bits, that for instance can help you achieve good avalanche effect in a hash function. Modern 64-bit CPUs support this operation and perform it in something like 3 clock cycles, so it is a very fast way of getting bit mixing. Anyway, mixing bits can be done in a number of ways, and this is just one idea.

###
#12
Moderators - Reputation: **10625**

Posted 24 April 2012 - 06:36 AM

For games in general, you are mostly going to be looking for speed, as opposed to high-quality. Especially in noise applications, where you are layering many layers of noise together, you want your PRNG hash to be a fast as possible while still providing decent output. The reference implementation of Perlin noise provides a hash algorithm that makes use of a look-up table to "fold" bytes together to get the lattice hash for each point. The benefit of a look up table in this manner is relatively fast performance. As well, for a noise implementation you want to avoid any generator that requires internal state. Mersenne Twister, for example, stores internal tables and is complicated to seed, thus making it unsuitable for noise generation.

The link in my signature goes to the sourceforge page for my personal noise library. It's still (perpetually) under construction. As well, I have written quite a few journal posts regarding noise generation, seamless noise, etc... a few of which were published in Game Developer Magazine last year.

###
#13
Crossbones+ - Reputation: **6497**

Posted 24 April 2012 - 07:27 AM

For simple noise generation in a game you certainly don't need cryptographic quality PRNGs. You want something fast with broad spectral characteristics and possibly a long sequence. A lagged Fibonacci generator with block rejection is probably the best time-space tradeoff. The classic linear congruential generator has very poor spectral characteristics and makes a poor choice for noise. The Mersenne Twister has excellent spectral characteristics at the expense of greatly increased computation and memory footprint.

If you're using C++, the standard library (<random> in the current standard, or <tr1/random> for ancient compilers) provides a really excellent selection of pseudorandom number generators and useful distribution adapters [disclaimer: I wrote an implementation that comes with a popular compiler]. Easy to use, well documented, and well tested.

*Stephen M. Webb*

Professional Free Software Developer

###
#14
Members - Reputation: **389**

Posted 24 April 2012 - 09:34 AM

Oh and this is fun, though obviously platform dependent:

http://spectrum.ieee.org/computing/hardware/behind-intels-new-randomnumber-generator/0

###
#15
Members - Reputation: **355**

Posted 24 April 2012 - 12:31 PM

I'm not familiar with Hashes at all - I'll check my book and google for more info, of course.

So I could do this with two or three tables of different lengths and XOR the results and get a lot of randomness right?

And it would still be fast?

Or should I just create one really, really big table at the start?

###
#16
Crossbones+ - Reputation: **11962**

Posted 24 April 2012 - 03:08 PM

I suppose it would be faster in a sense but it's not practical, imagine you need trillions of random numbers for a very long-running simulation (this is not uncommon), you aren't gonna be able to precalculate them all at the beginning. And the performance gain is probably not worth the increased memory use.Alright so it's faster to generate all of the random numbers at the start and then iterate through them one by one?

I'm not familiar with Hashes at all - I'll check my book and google for more info, of course.

No you don't need to do this. You just call the PRNG each time you need a pseudorandom number, it will give you one quickly. Remember that while memory probably abounds on your laptops and desktops, embedded devices also need some form of pseudorandom number generator, and it probably can't afford kilobytes of memory. Pseudorandom theory has improved enough to let us generate a virtual infinity of very high quality pseudorandom numbers very efficiently on a computer in O(1) memory, we should use this to our advantage.So I could do this with two or three tables of different lengths and XOR the results and get a lot of randomness right?

And it would still be fast?

Or should I just create one really, really big table at the start?

The basic idea is to start with an initial "seed" and mix/diffuse the state's bits around in a very unpredictable, nonrepeating (hopefully) way, and use that as the next random number. Then you start again with the new state, ad infinitum (or until the sequence starts repeating, anyway). There are other methods but this is pretty much the gold standard.

A fun exercise is to create your own PRNG with a small state (say, 32 bits) to facilitate analysis, and then run various tests on it (e.g. what it its period, etc...). You might be surprised! Then, modify some workings of the algorithm used and see how this modifies the results. It's quite fascinating.

EDIT: you can combine the outputs of two generators by XOR if you want but remember this will only result in a sequence that's as pseudorandom as the most pseudorandom of the two, and your speed is also halved, so in general it's best to just use a single, good pseudorandom number generator. There are exceptions, particularly in hardware (combining multiple shift registers, which are very fast, for instance), but these are special cases.

*“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”*

###
#17
Members - Reputation: **355**

Posted 24 April 2012 - 10:50 PM

And thank you for reminding me that XOR requires two decent random numbers so it's twice the effort.

Would it typically be a bad idea to XOR x{n} with x{n-1} or x{n-2}? Then I won't need to generate more numbers but since they have something in common that's probably bad right?

So basically right now I'm only working with 4 bits(!) at a time and I'm trying to stay away from multiplication and modulus.

Looks like I don't have the math skills to grasp much of the theory, though. That's a shame.

###
#18
Crossbones+ - Reputation: **11962**

Posted 24 April 2012 - 11:20 PM

No because then you'd have the sequence:Would it typically be a bad idea to XOR x{n} with x{n-1} or x{n-2}? Then I won't need to generate more numbers but since they have something in common that's probably bad right?

x{n}

x{n - 1}

x{n - 2}

x{n - 3}

...

And:

x{n} xor x{n - 1}

x{n} xor x{n - 2}

x{n} xor x{n - 3}

...

These two sequences are clearly correlated because if you XOR one with the other, you always obtain the same number: x{n}. Such a characteristic means that the numbers you thought were pseudorandom actually exhibit nonrandom behaviour, which makes them unsuitable for use (they don't satisfy all properties of random numbers so using them in a simulation or something would completely flaw it, as the correctness of the simulation relies on the numbers being random).

Here is a simple example of a very simple pseudorandom number generator (an algorithm). It takes as an input two bytes A and B and outputs two bytes A2 and B2 (and you can iterate it to generate a whole sequence of numbers):

A2 = A + B

B2 = (B <<< 3) xor A

Where <<< represents bit rotation (here, rotation three bits to the left). Try and feed it with two arbitrary numbers, say 44 and 171, and see what the output is. Then keep repeating it on the previous output and see when it eventually settles into a cycle (it will, but how big will the cycle be?). Does it produce the same output for two distinct inputs? Now what if you work on 16-bit words instead of bytes, or what if you change the rotation value to something else? Etc...

Yeah the math around pseudorandom numbers is pretty tough but much of it can be understood intuitively, and it's not like you need to know the theory to enjoy playing around with algorithmsLooks like I don't have the math skills to grasp much of the theory, though. That's a shame.

###
#19
Members - Reputation: **510**

Posted 25 April 2012 - 02:10 AM

Can anyone link me to some pseudo-random papers or websites?

http://people.seas.h...cnotes/list.htm

Harvard graduate course on the subject, taught by a expert, with free book/lecture notes! Kabam!

###
#20
Members - Reputation: **355**

Posted 28 April 2012 - 05:51 PM

I know some of this stuff might be overkill but I'm going to be generating a fair amount of random numbers.

Besides, I'm only really doing this to learn and have fun. But I am using it in my work later on, if it's any good lol.

I want my RNG to be nearly as fast as a congruential generator like:

{ R = ((aR + b) mod c) }.

It doesn't have to be as lightweight or as good, of course.

1) How fast, relatively is addition and subtraction compared to and/or/xor?

How fast, relatively is multiplication, division, modulus compared to addition and subtraction?

1.5) Could someone ball park a number of bit operations that would be acceptable to meet my goals for speed?

I'm building the RNG entirely out of and, or, xor, not, <<, >>... so I need to know a good maximum number of these I'm allowed to have.

2) Are two bitshifts as fast as one of equal size?

Like is { a << 5 } faster than { a << 2; a << 3; } or is it done one at a time and thus equally fast?

Thoughts?

**Edited by Expert Novice, 28 April 2012 - 07:32 PM.**