How to Unit-Test a Random Number Generator

Started by
8 comments, last by OrangyTang 15 years, 11 months ago
The subject already says it, how would you go about unit-testing a random number generator? Specifically, I've got some algorithms that generate random uniformly distributed points on and in various geometric shapes (spheres, discs, rectangles, boxes, cylinders and so on). From visual inspection, the results seem sound, but I can't think of a proper way to write a deterministic unit test for these algorithms. I thought about just generating 1000 points and checking whether each one it actually on/in the shape of choice, then averaging all the points to check whether it's within a certain tolerance of the shape's actual center. But that wouldn't protect from certain errors like the sphere generator distributing more points on the sphere's poles than on its equator. Any ideas?
Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.
Advertisement
Generate a lot of results (coordinates) and plot them in 3d space using something like OpenGL. Make sure the entire geometric shape is represented as a solid by exploring it, this will show you that you have equal distribution. Then save this long random sequence to a file and compress it with bzip2. If the compression factor is extremely low you have good entropy as well.
To start with, look at this discussion, as it relates a bit to the topic.

Donald Knuth describes in his Art of Computer Programming series several methods to test PRNGs. He describes two common goodness-of-fit tests, Chi squared and Kolmogorov-Smirnov. These tests can be used to evaluate how well your random samples follow a given probability distribution. He also describes several good probability distributions to test against (Runs Up/Down, Gaps, N-dimensional Bucketing, Poker test to name a few).

I warn you though, there's no simple fail/pass criteria for the randomness of PRNGs. It can be an endless pit to try to improve more and more tests to your suite, and at the end of the day the best you'll get is "this PRNG satisfies this distribution with 51% probability" kind of information.

These tests are good to show that a PRNG has a flaw, but they won't prove they'd be satisfyingly random for all applications.
I tend to plot histograms of varying size for functions that don't really count. For things that are a little more important, I compare the actual distribution to the expected distribution and average the squares of the normalized difference for every value.

What I consider the most important test is periodicity; how long before you can expect your pseudorandom sequence to reoccur given different starting points.
We''re sorry, but you don''t have the clearance to read this post. Please exit your browser at this time. (Code 23)
Quote:Original post by asp_
Generate a lot of results (coordinates) and plot them in 3d space using something like OpenGL. Make sure the entire geometric shape is represented as a solid by exploring it, this will show you that you have equal distribution. Then save this long random sequence to a file and compress it with bzip2. If the compression factor is extremely low you have good entropy as well.


Are you recommending to drop the random number generator and replace it with a large set of pregenerated random points from which I randomly choose?

I have several generators that work on dynamic shapes (eg. triangle meshes) where this wouldn't work. Also, I could just as well use the same random seed again to generate this data set on the fly instead of saving it to a file.

Visual inspection already shows that my algorithms are resulting in uniform distribution. I'd just like to have an automated test (one I can run on my code each time I change something to prove it's still generated the right results).

And I don't quite see how compression factor could have anything to do with the entropy? The amount of candidate points wouldn't change.
Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.
The reason for compressing the results is to test entropy: the amount of randomness in the data. If the data is predictable it will compress easily and you will have a high compression ratio, truly random data will compress very badly, giving a low compression ratio.

Saving the results to a file is, in this case, not so you can load them up again, only as a way to measure how random the data is.
Diehard RNG tests.txt
--"I'm not at home right now, but" = lights on, but no ones home
Quote:Original post by clb
To start with, look at this discussion, as it relates a bit to the topic.

Donald Knuth describes in his Art of Computer Programming series several methods to test PRNGs. He describes two common goodness-of-fit tests, Chi squared and Kolmogorov-Smirnov. These tests can be used to evaluate how well your random samples follow a given probability distribution. He also describes several good probability distributions to test against (Runs Up/Down, Gaps, N-dimensional Bucketing, Poker test to name a few).
[...]


Quite interesting!
Maybe I can take some ideas from that.

My actual struggle is with the distribution of random points on various geometrical shapes:

Eg. from my previous example, to generate a random number on the surface of a sphere, you could naively generate two random numbers using any exchangeable RNG and use the first for yaw and the second for pitch. But that wouldn't result in uniform distribution because near the poles, changes in yaw equal less and less distance between the points. Thus, you modulate the RNG output for pitch with a curve so it is more likely to generate points near the equator than at the pole.

I think I can apply some of those ideas by taking the diameter of a shape on various points of each coordinate axis and comparing it to the distribution curve of a large number of test samples along that axis. Still, that's quite a lot of effort for a unit test :)
Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.
Quote:Original post by sphen_lee
The reason for compressing the results is to test entropy: the amount of randomness in the data. If the data is predictable it will compress easily and you will have a high compression ratio, truly random data will compress very badly, giving a low compression ratio.

Saving the results to a file is, in this case, not so you can load them up again, only as a way to measure how random the data is.


Ah, now I understand.

If it had any patterns, repeating sequences, common numbers or such, resulting from bad entropy in the random input, a good compressor would be able to find those.

Thanks for clearing that up!
Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.
I'd be tempted to break it down into two problems and test them individually - the first half is a regular random number generator, generating numbers between your random min and max values. That should be easyish to test by generating a whole load and checking the distribution is normal.

The second half is converting from a range (or two) into a uniform distribution over a shape. For something like a rectangle this is trivial, as you get something like:

Point MapToRect(float r1, float r2)
{
// r1 and r2 are in the range [0, 1)
return Point(r1*width, r2*height);
}

Since this is deterministic, you can unit test this with known inputs and outputs. For more complex shapes you can either use a single range factor and map that linearly over the whole shape, or use multiple input ranges and use them to generate a uniform distribution (a circle for example might use one for an angle and one for a distance from the center, where the distance would have to be mapped non-linearly to get a uniform distribution).

This topic is closed to new replies.

Advertisement