

Links to Pseudo-random Theory?


50 replies to this topic

#1 Blind Radish   Members   -  Reputation: 355


Posted 23 April 2012 - 06:10 PM

Can anyone link me to some pseudo-random papers or websites? Google and Wikipedia helped me a bit, but it's mostly other people's implementations of algorithms with little information as to why they work. I'm fascinated with noise and would like to spend some time exploring this exciting field.

Actually, maybe this belongs in a different branch of the forums?

EDIT:
My findings thus far for any future programmers interested in RNG:
http://6502.org/sour...dom/random.html
"The Art Of Computer Programming" - D. Knuth


#2 ApochPiQ   Moderators   -  Reputation: 16377


Posted 23 April 2012 - 06:16 PM

I bet the Math and Physics folks can point you in the right direction ;-)

#3 jefferytitan   Crossbones+   -  Reputation: 2242


Posted 23 April 2012 - 07:08 PM

I haven't encountered any good papers that I specifically recall, but I agree that it's an interesting field. My personal theory is that random numbers and cryptography are very closely related. In cryptography you want as little relationship as possible between a plaintext byte and the corresponding encrypted byte; in pseudo-random number generation you want as little relationship as possible between the next number generated and the previous one. Papers on cryptography say that, ideally, changing a single bit of the plaintext should give a 50/50 chance of changing each bit of the output. I think that's one of the reasons that algorithms for both often combine operations such as multiplication and modulus (which have little to no relationship to specific bits if you use prime numbers) with operations such as XORing, rotating or lookups (which are non-algebraic, and so prevent any simple formula relating successive terms from being derived).

I would just note that some algorithms exhibit non-random properties, such as never repeating a value until the full set of values has been enumerated. This is a nice property for ensuring there are no degenerate cases (such as repeating the same 4 numbers over and over again), but it stands out from true random numbers. I think this is most easily avoided if you keep some internal state. For example, you could generate a non-repeating series internally, but for the xth output return random[2*x] XOR random[2*x + 1]. Internal state also potentially allows a longer period before repetition (although not with the scheme I just described).
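A minimal sketch of that pairing idea, assuming a full-period 32-bit linear congruential generator as the internal non-repeating series (the constants are the widely used Numerical Recipes ones, and the function names are just for illustration):

    #include <cstdint>

    static uint32_t lcg_state = 12345u;  // internal seed

    // Full-period LCG: it visits every 32-bit value exactly once per period,
    // so on its own it never repeats until the whole set has been enumerated.
    static uint32_t lcg_next() {
        lcg_state = 1664525u * lcg_state + 1013904223u;  // mod 2^32 is implicit
        return lcg_state;
    }

    // Return the XOR of two consecutive internal outputs, which hides the
    // "no repeats until exhaustion" property from the caller.
    uint32_t paired_next() {
        uint32_t a = lcg_next();
        uint32_t b = lcg_next();
        return a ^ b;
    }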

#4 Blind Radish   Members   -  Reputation: 355


Posted 23 April 2012 - 07:37 PM

Cool, so I could build an algorithm that picks every number in a set in a pseudo-random order, but then only iterate it a random number of times from a random starting point. That seems like it could still get stuck in a loop, eh?

I'll look into cryptography - sounds like I could find some useful algorithms there as well.

#5 Bacterius   Crossbones+   -  Reputation: 9261


Posted 23 April 2012 - 07:51 PM

Look up information theory, entropy, compression, (pseudo)randomness, PRNGs, all those keywords, and peruse them on the net. They pretty much sum up what pseudorandomness is about. Cryptography uses pseudorandom numbers too (because they are faster and cheaper to produce than true random numbers) but with additional constraints and properties (which makes cryptographic pseudorandom number generators often unsuitable for anything other than cryptography, because of their comparatively poor performance).

It is not trivial to create a pseudorandom number generator. Anyone can devise a formula that *looks* pseudorandom, but humans are extremely bad at identifying randomness because we tend to see patterns in everything (that's how the brain works). So as soon as we can't see an obvious pattern we immediately label the sequence as "random", which is often not true. There are statistical tests to determine "how" random a sequence of numbers is, but they are probabilistic, so they only give you a rough idea, not a definitive answer (look up the Diehard tests, which are not the best but are easy to understand and very detailed).

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#6 Álvaro   Crossbones+   -  Reputation: 13879


Posted 23 April 2012 - 08:35 PM

I am not an expert in this field, but I have had the need to generate relatively high-quality pseudo-random numbers, and there are a few things I have learned about what you need to do to get them:
  • Keep a large state. If you only have something like 32 bits of state, you will probably have loops that are much shorter than 2^32. You can keep something like 2KB of state, and you don't need to access it all at each step.
  • Combine arithmetic operations and bit-wise operations (this may not be mandatory, but it works well).
  • Make sure you combine low bits into the high bits and also high bits into the low bits.
  • Make sure the operations you use to mix bits around don't lose information (i.e., make sure they are bijections).
  • Use arithmetic (in particular, multiplication) with numbers as large as possible.
If you are interested in PRNGs, you may also want to look into the very closely related subject of hash functions.

Another key thing to know about random numbers is that you can combine several sources of randomness by XORing them together. The result is at least as random as the most random of the ingredients. That might come in handy at some point.
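A small illustration of that combining trick (the three sources below are placeholders chosen for the example; std::random_device quality varies by platform, and the constant is arbitrary):

    #include <cstdint>
    #include <ctime>
    #include <random>

    // Combine several independent sources by XOR; the result is at least
    // as hard to predict as the hardest-to-predict ingredient.
    uint64_t combined_seed() {
        std::random_device rd;                       // source 1: OS-provided entropy
        uint64_t a = (uint64_t(rd()) << 32) | rd();
        uint64_t b = uint64_t(std::time(nullptr));   // source 2: current time (weak on its own)
        uint64_t c = 0x9E3779B97F4A7C15ull;          // source 3: an arbitrary fixed constant
        return a ^ b ^ c;
    }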

#7 Blind Radish   Members   -  Reputation: 355


Posted 23 April 2012 - 10:03 PM

Interesting - I assumed XOR would be useful here. Not quite sure how I knew that!

So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

And 2KB?? Seriously? Isn't that a bit extreme and would it not be very slow?

#8 Bacterius   Crossbones+   -  Reputation: 9261


Posted 23 April 2012 - 11:55 PM

So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

No, XOR is a binary operation and has the property that a xor a = 0, so repeatedly XORing a pair of numbers together won't produce a very interesting sequence!
And same for addition, if you keep doing it then there will be correlation between successive values generated which means the sequence won't be very random either.
Example: say A = 6 and B = 9. So A + B = 15. Without knowing A and B, A + B would seem random. But if you keep adding B, you get 15, 24, 33, 42, 51, etc... which clearly is not random.

There are ways to "stretch" an initial number to obtain many pseudorandom numbers, but the entropy of the whole sequence is no greater than the entropy of the initial state (since the whole sequence can be described by the initial state plus the algorithm used). In fact, if the algorithm is known, it is often possible to predict the next number to be generated from any given one (unless you use cryptographic algorithms, of course, since they are designed to prevent exactly that).

And 2KB?? Seriously? Isn't that a bit extreme and would it not be very slow?

It may seem so, but it's actually very fast, because when you have a 2KB internal state you generally produce 2KB of random data at each step of the algorithm (in non-cryptographic generators, that is), so the total cost is fully amortized. The idea is that the bigger the state, the more entropy you have available to play with, shuffle around, etc., which, simply put, makes it harder to screw up. It's just easier to use big states.

But it is possible to produce pseudorandom numbers with a very small state. For instance, a suitable cryptographic block cipher (assuming it is an ideal random permutation, i.e. a "perfect" cipher) can generate 2^256 blocks of 256-bit random numbers (an unimaginably large quantity) using only 256 bits of state. Of course, it will be slower, which is why we use large states instead for better performance. And the cipher may have flaws which could reduce its quality significantly, but this is what peer review is for.

There are number-theoretic constructs which have provably random characteristics, but these yield pretty slow generators in general (because we are then working with very large integers).

Note that when I say cryptographic generators are slow, this isn't always true. I've had success generating such cryptographic-grade pseudorandom numbers on a mainstream GPU, reaching speeds of up to 13GB/s (and 1.5GB/s on the CPU), so it's not like it's infinitely slower than general-purpose algorithms. But in almost every situation you don't need such numbers so it's best to stick with simpler algorithms.

Use arithmetic (in particular, multiplication) with numbers as large as possible.

This is actually debatable: if you stick to small words (say, an 8-bit byte) and work on arrays of those words, then you can mix logical (XOR, NOT, etc.) and arithmetic (addition, subtraction, ...) operations in a much less predictable way and maximize diffusion. Of course you can also use both methods, but it's important to keep it simple.
Also, multiplication needs to be done carefully, as it is not a bijection unless the modulus is prime (or rather, unless the inputs to the multiplication are all coprime to the modulus; but since the inputs are probably going to be pseudorandom, you want a prime modulus). That is tricky to get right because you are then reducing your field size (a power of two is not prime). It's been done, though. In fact it can be done with a power-of-two modulus if you ensure the inputs are always odd. In general multiplication is pretty unpopular because of those restrictions and because it's still somewhat slow in hardware, but it has some nice properties.

As you've probably noticed by now I'm more into cryptography than pseudorandom numbers but if you have any questions on anything I'm happy to help.



#9 Blind Radish   Members   -  Reputation: 355


Posted 24 April 2012 - 12:26 AM

This is great, I'm learning so much! Very fun.

The web is a little frustrating, however. Hard to understand what I find because I'm somewhat new to programming.

So for my RNG it's faster to use a ridiculous number of bytes than to use multiplication? Very counter-intuitive, but I can roll with it.
If I did multiply, would (2^n)-1 generally be a good choice? Or prime numbers of that form?

And using a modulus is good? Isn't it as slow as multiplication, though?
And I need a large prime modulus?

(And as for XORing I didn't mean the same numbers over and over I meant taking a few random numbers and XORing them together.)

#10 jefferytitan   Crossbones+   -  Reputation: 2242


Posted 24 April 2012 - 01:30 AM

As others mentioned, multiplication and modulus do have their downsides, for example because computers natively work with data sizes that are powers of 2. Imagine one of the operations I used was (x * 17) mod 257. 17 and 257 are both primes, sounds good, right? But if I'm generating a stream of bytes, the range 0-256 can't be represented by a single byte. If I take the result mod 256 afterwards so it fits in my byte, there are two values that get turned into 0 (namely 0 and 256). Therefore the number 0 would appear twice as often as expected in my "random" sequence.
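A quick way to see that bias empirically (a rough sketch: it runs every residue mod 257 through the same pipeline once and counts how often each byte value comes out):

    #include <cstdio>

    int main() {
        int count[256] = {0};
        for (int x = 0; x < 257; ++x) {
            int y = (x * 17) % 257;   // a permutation of 0..256, since 17 and 257 are coprime
            count[y % 256]++;         // squeeze into a byte: both 0 and 256 land on 0
        }
        printf("count[0] = %d, count[1] = %d\n", count[0], count[1]);  // prints 2 and 1
        return 0;
    }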

I tend to break it up into 5 steps:
1. Get some initial source of randomness or pseudo-randomness.
2. Distribute entropy within each byte.
3. Distribute entropy between bytes.
4. Repeat 2 and 3 as required.
5. Produce an output based on the internal state.

For step one you can get something that seems random-ish from a wide variety of sources, such as the current time, mouse movement, network packets, whatever. In the past I got mine from Windows thread scheduling, e.g. I launched 100 threads that all modified the same byte array with no locking. I never got the same sequence twice, even before applying a single pseudo-random algorithm. Note that these sequences may not have passed a randomness test, but they were a good source of entropy.

For step two you can use whatever works for you, e.g. rotating the byte, (x + 17) mod 256, (x * 3) mod 256, or my personal favourite: a simple lookup into a perfect shuffle of the 256 byte values. What I like about the lookup is that there's literally no algebraic relationship between the before and after values (on a per-byte basis, anyway). Therefore each bit has a 50/50 chance of affecting every other bit within the byte.

For step three you can do things like swapping bytes, adding each byte to the next byte, XORing each byte with the next byte, or, once again my favourite, doing a lookup (perhaps using the next byte as an index for which byte to swap with the current byte).
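A rough sketch of steps two and three using the lookup-table approach (the shuffled table here is built with std::shuffle as a stand-in for whatever perfect shuffle you prefer; the function name and seeding are just for illustration):

    #include <algorithm>
    #include <cstdint>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    // Step 2: substitute each byte through a shuffled 256-entry table.
    // Step 3: mix entropy between bytes by swapping, using the next byte as an index.
    void diffuse(std::vector<uint8_t>& buf, uint32_t table_seed) {
        uint8_t sbox[256];
        std::iota(std::begin(sbox), std::end(sbox), 0);                            // 0, 1, ..., 255
        std::shuffle(std::begin(sbox), std::end(sbox), std::mt19937(table_seed));  // shuffled lookup table

        for (size_t i = 0; i < buf.size(); ++i)
            buf[i] = sbox[buf[i]];                                 // step 2: per-byte substitution

        for (size_t i = 0; i + 1 < buf.size(); ++i)
            std::swap(buf[i], buf[buf[i + 1] % buf.size()]);       // step 3: swap with a data-chosen byte
        // Step 4 from the post is simply: call diffuse() again as many times as you like.
    }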

Note that many approaches would effectively combine my steps two and three by performing complex operations on a large number rather than working with bytes. The downside is that it is limited by how large a number your CPU/programming language can perform operations on efficiently.

Step four is... well, step four. Repeat as often as you like to get more random-looking numbers.

Step five is often simplified by people to "return the number" or "return the last number". If you're doing what I do and generating a big buffer of random numbers in one go, you probably don't need to worry about this. But if you care about the problem I mentioned regarding no repeats, and the rest of your algorithm is very simple, I would suggest doing something like returning the current number plus the previous number XOR the number before that.

#11 Álvaro   Crossbones+   -  Reputation: 13879


Posted 24 April 2012 - 06:32 AM

So I can XOR two random numbers together and get a number at least as random as the less random one? And I can keep doing that as much as I want to get more and more random numbers?

No, XOR is a binary operation and has the property that a xor a = 0, so repeatedly XORing a pair of numbers together won't produce a very interesting sequence!
And same for addition, if you keep doing it then there will be correlation between successive values generated which means the sequence won't be very random either.
Example: say A = 6 and B = 9. So A + B = 15. Without knowing A and B, A + B would seem random. But if you keep adding B, you get 15, 24, 33, 42, 51, etc... which clearly is not random.


The feature of the XOR operation I was talking about probably needs to be explained in more detail, but the idea is that if a sequence of bits is hard to predict, XORing it with numbers from other sources won't destroy that feature. Of course if another method comes up with the same number, their bits will annihilate each other, but that means that the bits were not that hard to guess.

Imagine a project needs 1MB of random bits. Each member of the team could come up with their 1MB block and then we'll XOR them together. One particular member might have used a crappy PRNG, another one might have used the first bits of a compressed Linux kernel, another one might have used /dev/urandom... If at least one of the methods is hard to predict, the XOR of all of them will be hard to predict.

About multiplication, what I had in mind is multiplying by an odd constant modulo 2^64 (if your machine supports that operation) as a method to mix low bits into higher bits. If you have access to the full 128 bits of the result of multiplying two 64-bit integers, the middle bits contain a good mix of the original bits, which for instance can help you achieve a good avalanche effect in a hash function. Modern 64-bit CPUs support this operation and perform it in something like 3 clock cycles, so it is a very fast way of getting bit mixing. Anyway, mixing bits can be done in a number of ways, and this is just one idea.
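A sketch of that mixing step (note that unsigned __int128 is a GCC/Clang extension rather than standard C++, and the odd constant here is just one commonly used multiplier, not necessarily the one Álvaro has in mind):

    #include <cstdint>

    // Mix the bits of x with a 64x64 -> 128-bit multiply by an odd constant,
    // then take bits from the middle of the product, where low and high input bits meet.
    uint64_t mix64(uint64_t x) {
        const uint64_t k = 0x9E3779B97F4A7C15ull;        // odd constant
        unsigned __int128 p = (unsigned __int128)x * k;  // full 128-bit product
        return (uint64_t)(p >> 32);                      // 64 bits from the middle of the product
    }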

#12 JTippetts   Moderators   -  Reputation: 8645


Posted 24 April 2012 - 06:36 AM

George Marsaglia posted several relatively simple generators to the old newsgroups many, many years ago. They're decent generators and simple to implement. Alternatively, the Boost library offers good choices for high-quality generators, including lagged Fibonacci and the Mersenne Twister. Knuth, of course, as you noted in your edit, is always a good authority on PRNGs.

For games in general, you are mostly going to be looking for speed, as opposed to the highest quality. Especially in noise applications, where you are combining many layers of noise, you want your PRNG hash to be as fast as possible while still providing decent output. The reference implementation of Perlin noise provides a hash algorithm that uses a look-up table to "fold" bytes together to get the lattice hash for each point. The benefit of a look-up table used in this manner is relatively fast performance. As well, for a noise implementation you want to avoid any generator that requires internal state. The Mersenne Twister, for example, stores internal tables and is complicated to seed, making it unsuitable for noise generation.
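Loosely in the spirit of that table-folding lattice hash (this is a sketch, not the reference Perlin implementation; the struct name is made up, and the permutation table is built with std::shuffle instead of the fixed reference permutation):

    #include <algorithm>
    #include <cstdint>
    #include <numeric>
    #include <random>

    struct LatticeHash {
        uint8_t perm[512];                                    // permutation table, doubled to avoid wrapping

        explicit LatticeHash(uint32_t seed) {
            std::iota(perm, perm + 256, 0);
            std::shuffle(perm, perm + 256, std::mt19937(seed));
            std::copy(perm, perm + 256, perm + 256);          // duplicate so perm[a + b] stays in bounds
        }

        // Stateless hash of an integer lattice point, folding each coordinate
        // through the table in turn.
        uint8_t operator()(int x, int y, int z) const {
            return perm[perm[perm[x & 255] + (y & 255)] + (z & 255)];
        }
    };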

The link in my signature goes to the sourceforge page for my personal noise library. It's still (perpetually) under construction. As well, I have written quite a few journal posts regarding noise generation, seamless noise, etc... a few of which were published in Game Developer Magazine last year.

#13 Bregma   Crossbones+   -  Reputation: 5352


Posted 24 April 2012 - 07:27 AM

I would recommend you read Knuth for a solid background on pseudorandom number generation from a theoretical perspective. There is no source better.

For simple noise generation in a game you certainly don't need cryptographic quality PRNGs. You want something fast with broad spectral characteristics and possibly a long sequence. A lagged Fibonacci generator with block rejection is probably the best time-space tradeoff. The classic linear congruential generator has very poor spectral characteristics and makes a poor choice for noise. The Mersenne Twister has excellent spectral characteristics at the expense of greatly increased computation and memory footprint.

If you're using C++, the standard library (<random> in the current standard, or <tr1/random> for ancient compilers) provides a really excellent selection of pseudorandom number generators and useful distribution adapters [disclaimer: I wrote an implementation that comes with a popular compiler]. Easy to use, well documented, and well tested.
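For example, a minimal use of those facilities might look like this (the engine and distribution choices here are just for illustration):

    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 engine(42);                       // Mersenne Twister engine, fixed seed for reproducibility
        std::uniform_int_distribution<int> d6(1, 6);   // distribution adapter: a fair six-sided die
        for (int i = 0; i < 5; ++i)
            std::cout << d6(engine) << ' ';
        std::cout << '\n';
        return 0;
    }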
Stephen M. Webb
Professional Free Software Developer

#14 taby   Members   -  Reputation: 336


Posted 24 April 2012 - 09:34 AM

http://en.wikipedia....g_(mathematics)

Oh and this is fun, though obviously platform dependent:
http://spectrum.ieee.org/computing/hardware/behind-intels-new-randomnumber-generator/0

#15 Blind Radish   Members   -  Reputation: 355


Posted 24 April 2012 - 12:31 PM

Alright so it's faster to generate all of the random numbers at the start and then iterate through them one by one?
I'm not familiar with Hashes at all - I'll check my book and google for more info, of course.

So I could do this with two or three tables of different lengths and XOR the results and get a lot of randomness right?
And it would still be fast?
Or should I just create one really, really big table at the start?

#16 Bacterius   Crossbones+   -  Reputation: 9261


Posted 24 April 2012 - 03:08 PM

Alright so it's faster to generate all of the random numbers at the start and then iterate through them one by one?
I'm not familiar with Hashes at all - I'll check my book and google for more info, of course.

I suppose it would be faster in a sense, but it's not practical: imagine you need trillions of random numbers for a very long-running simulation (this is not uncommon); you aren't going to be able to precalculate them all at the beginning. And the performance gain is probably not worth the increased memory use.


So I could do this with two or three tables of different lengths and XOR the results and get a lot of randomness right?
And it would still be fast?
Or should I just create one really, really big table at the start?

No, you don't need to do this. You just call the PRNG each time you need a pseudorandom number, and it will give you one quickly. Remember that while memory probably abounds on your laptops and desktops, embedded devices also need some form of pseudorandom number generator, and they probably can't afford kilobytes of memory. Pseudorandom theory has improved enough to let us generate a virtually infinite supply of very high quality pseudorandom numbers very efficiently on a computer in O(1) memory; we should use this to our advantage.

The basic idea is to start with an initial "seed", mix/diffuse the state's bits around in a very unpredictable, (hopefully) non-repeating way, and use the result as the next random number. Then you do the same again with the new state, ad infinitum (or until the sequence starts repeating, anyway). There are other methods, but this is pretty much the gold standard.

A fun exercise is to create your own PRNG with a small state (say, 32 bits) to facilitate analysis, and then run various tests on it (e.g. what is its period, etc.). You might be surprised! Then modify some workings of the algorithm and see how that changes the results. It's quite fascinating.
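One way to set up that exercise (the generator below is Marsaglia's xorshift32, chosen only because its state fits in 32 bits; note the brute-force period search can take a while, since it may walk through nearly 2^32 states):

    #include <cstdint>
    #include <cstdio>

    // One step of a 32-bit-state generator (Marsaglia's xorshift32).
    uint32_t step(uint32_t s) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        return s;
    }

    int main() {
        const uint32_t seed = 1u;
        uint32_t s = seed;
        uint64_t period = 0;
        do { s = step(s); ++period; } while (s != seed);   // walk until the starting state recurs
        printf("period starting from seed 1: %llu\n", (unsigned long long)period);
        return 0;
    }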

EDIT: you can combine the outputs of two generators by XOR if you want, but remember this will only result in a sequence that's as pseudorandom as the more pseudorandom of the two, and your speed is also halved, so in general it's best to just use a single, good pseudorandom number generator. There are exceptions, particularly in hardware (combining multiple shift registers, which are very fast, for instance), but these are special cases.



#17 Blind Radish   Members   -  Reputation: 355


Posted 24 April 2012 - 10:50 PM

Yeah I thought that using an algorithm was right but it seemed like everyone was telling me I should use tables.

And thank you for reminding me that XOR requires two decent random numbers so it's twice the effort.
Would it typically be a bad idea to XOR x{n} with x{n-1} or x{n-2}? Then I won't need to generate more numbers but since they have something in common that's probably bad right?


So basically right now I'm only working with 4 bits(!) at a time and I'm trying to stay away from multiplication and modulus.

Looks like I don't have the math skills to grasp much of the theory, though. That's a shame.

#18 Bacterius   Crossbones+   -  Reputation: 9261


Posted 24 April 2012 - 11:20 PM

Would it typically be a bad idea to XOR x{n} with x{n-1} or x{n-2}? Then I won't need to generate more numbers but since they have something in common that's probably bad right?

No because then you'd have the sequence:
x{n}
x{n - 1}
x{n - 2}
x{n - 3}
...

And:
x{n} xor x{n - 1}
x{n} xor x{n - 2}
x{n} xor x{n - 3}
...

These two sequences are clearly correlated: if you XOR one with the other, you always obtain the same number, x{n}. Such a characteristic means that the numbers you thought were pseudorandom actually exhibit nonrandom behaviour, which makes them unsuitable for use (they don't satisfy all the properties of random numbers, so using them in a simulation or similar would completely invalidate it, since the correctness of the simulation relies on the numbers being random).

Here is an example of a very simple pseudorandom number generator (an algorithm). It takes two bytes A and B as input and outputs two bytes A2 and B2 (and you can iterate it to generate a whole sequence of numbers):
A2 = A + B
B2 = (B <<< 3) xor A

Where <<< represents bit rotation (here, a rotation of three bits to the left). Try feeding it two arbitrary numbers, say 44 and 171, and see what the output is. Then keep applying it to the previous output and see when it eventually settles into a cycle (it will, but how big will the cycle be?). Does it produce the same output for two distinct inputs? Now what if you work on 16-bit words instead of bytes, or change the rotation amount to something else? Etc.
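A direct transcription of that toy generator, with a brute-force check for when the starting pair recurs (a proper cycle detection would use something like Floyd's algorithm; this simpler check is capped at 2^16 steps because there are only 2^16 possible (A, B) pairs):

    #include <cstdint>
    #include <cstdio>

    // Rotate an 8-bit value left by r bits (0 < r < 8).
    static uint8_t rotl8(uint8_t x, int r) {
        return (uint8_t)((x << r) | (x >> (8 - r)));
    }

    // One step of the toy generator: (A, B) -> (A2, B2).
    static void step(uint8_t& a, uint8_t& b) {
        uint8_t a2 = (uint8_t)(a + b);            // A2 = A + B (mod 256)
        uint8_t b2 = (uint8_t)(rotl8(b, 3) ^ a);  // B2 = (B <<< 3) xor A
        a = a2;
        b = b2;
    }

    int main() {
        uint8_t a = 44, b = 171;                  // the example inputs from the post
        const uint8_t a0 = a, b0 = b;
        uint32_t n = 0;
        do { step(a, b); ++n; } while (!(a == a0 && b == b0) && n < 65536);
        printf("stopped after %u steps\n", n);    // either the starting pair recurred or we hit the cap
        return 0;
    }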

Looks like I don't have the math skills to grasp much of the theory, though. That's a shame.

Yeah the math around pseudorandom numbers is pretty tough but much of it can be understood intuitively, and it's not like you need to know the theory to enjoy playing around with algorithms :)



#19 Nick Alger   Members   -  Reputation: 486


Posted 25 April 2012 - 02:10 AM

Can anyone link me to some pseudo-random papers or websites?


http://people.seas.h...cnotes/list.htm

Harvard graduate course on the subject, taught by an expert, with a free book and lecture notes! Kabam!

#20 Blind Radish   Members   -  Reputation: 355


Posted 28 April 2012 - 05:51 PM

Okay, I'm getting along with my RNG! It's kind of bloated, so I'm shaving it down.
I know some of this stuff might be overkill, but I'm going to be generating a fair number of random numbers.
Besides, I'm only really doing this to learn and have fun. But I'll use it in my work later on, if it's any good, lol.

I want my RNG to be nearly as fast as a congruential generator like:
{ R = ((aR + b) mod c) }.
It doesn't have to be as lightweight or as good, of course.

1) How fast, relatively, are addition and subtraction compared to AND/OR/XOR?
How fast, relatively, are multiplication, division, and modulus compared to addition and subtraction?

1.5) Could someone ballpark a number of bit operations that would be acceptable to meet my goals for speed?
I'm building the RNG entirely out of AND, OR, XOR, NOT, <<, >>, so I need to know a good maximum number of these I'm allowed to have.

2) Are two bit shifts as fast as one shift of equal total size?
Like, is { a << 5 } faster than { a << 2; a << 3; }, or is each done one at a time and thus equally fast?
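On question 2, the two forms compute the same thing for unsigned values (assuming the second form actually stores its intermediate results, i.e. a = a << 2; a = a << 3;), and a compiler will typically fold a chain of constant shifts into a single shift instruction anyway. A tiny check of the equivalence:

    #include <cassert>
    #include <cstdint>

    int main() {
        for (uint32_t a = 0; a < 1000; ++a)
            assert(((a << 2) << 3) == (a << 5));   // shifting by 2 then by 3 equals shifting by 5
        return 0;
    }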

Thoughts?

Edited by Expert Novice, 28 April 2012 - 07:32 PM.




