Archived

This topic is now archived and is closed to further replies.

Randomness in games

This topic is 5780 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi guys, I am working on a game using some dice, but the Rand function, even when seeded with Srand, just doesn''t seem random enough. I have heard of a better function to do the random number generation, but nobody I know, knows what it is. Can anyone help me?

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
as far as i know the standard rand() function should provide a good distribution. though you can easily implement your own using either "linear" or "additive congruence", or try a search for "mersenne twister", which provides very good random results at good speed.

Share this post


Link to post
Share on other sites
rand() isn't that bad. Some ideas on how to improve it:

result = rand() * rand() * rand() + rand() (this will wrap around on 32bit, but will be slow)

Or use a hashing table:

  
static short HashTab[1024];

init()
{
for(i=0;i<1024;i++) HashTab[i] = rand();
}

short random()
{
int p = rand()&1023; // get random table index

int r = HashTab[p]; // result value

HashTab[p] = rand(); // update table entry

return( r );
}


the hashing thing works rather well for me. And there are lots of more complex hashing functions (for eg. encryption), but for a simple game rand(), this simple one should be enough.

[Edit: forum tags don't like me...]


Edited by - Yann L on February 19, 2002 11:58:50 AM

Share this post


Link to post
Share on other sites
I think that

result = rand() * rand() * rand() + rand();

is a complete waste of cpu cycles beause if only one of the first three rand() calls equals to zero the result will be the same as result = rand(); but it will have taken 4 times as many cycles...

------------------------------------------------------------
"To a computer, chaos is just another kind of order."

Share this post


Link to post
Share on other sites
quote:

is a complete waste of cpu cycles beause if only one of the first three rand() calls equals to zero the result will be the same as result = rand(); but it will have taken 4 times as many cycles...



Well, it''s an accepted fact in digital signal processing, that if you combine (multiply, add, whatever) two unrelated noise signals, the result will be even more random than before. So it''s definitely not a waste of cycles, if you''re after good randomness. It''s a not so good solution, because to be optimal, the noise sources would need to be unrelated, and that is definitely not the case. But it''s simple and quick to implement. I have the source to a blazing fast hashing PRNG, with a distribution that makes it suitable for very strong encryption. This is a totally different league, of course, but it''s very complex to implement, and takes a lot of RAM (for tables). And the initialization takes up to 30 seconds, because it uses harddisk access timing as noise source to initialize the table pointers. It''s always a question of want your needs are. For a simple dice game, it''s a definitely a bit overkill... I think my hashing function above would be suitable for that kind of task.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Why does this question come up so often? What are you trying to do that requires something so random that, at the moment, the user can tell it isn''t?

What evidence have you got that your random dice throws are not random enough?

Share this post


Link to post
Share on other sites
quote:
result = rand() * rand() * rand() + rand();



Hmmmm. Let''s assume rand() returns a value between 0 and 1.

Rand()
min = 0
max = 1
average = 0.5

Rand()*rand()*rand()+rand()
min = 0
max = 2
average = 0.5*0.5*0.5 + 0.5 = 0.625


So this completely fcks up the distribution.

- seb

Share this post


Link to post
Share on other sites
I faced a problem like this with a roulette game I was working on a while back. I had gone to a casino and written down the results of some 200 spins, then went home and compared these results to a simple random number series generated by rand(). The difference between the two sets was interesting. The distribution was roughly the same, but the casino set displayed certain patterns that the generated series did not. Well, perhaps pattern is not the right word. What I mean is that some stats, like the average number of odd numbers in a row for example, were very different. As I wanted to accuratly simulate the sort of odds that you get in a casino, I needed to find something better than rand().
What I came up with did the job very nicely. At the start of the program I created an array with 37 rows and 100 columns. 37 being the number of possible outcomes of a single spin. 100 was chosen arbitrarily, it could have been anything really. Each column was filled with the numbers 0-36, and then shuffled using something like a bubble sort in reverse. Then later, when I wanted to simulate a spin, I used two calls to rand(), one to choose a column, the other to choose a row, and returned the value found at that position of the array.
I don''t know the mathematical basis for how it worked any better than a simple rand(), but it gave results very similar to the data from the casino.

Share this post


Link to post
Share on other sites
quote:

Hmmmm. Let''s assume rand() returns a value between 0 and 1.
(snip)
So this completely fcks up the distribution.



That is mathemtically incorrect, since rand() is in the 0-32767 range, *and* (most important), the number will wrap over at 32bit limits. That''s the entire trick getting it more random. You essentially need a 46 bit number to represent the result, but you''ll only use the lowest 32 bits in each calculation. It will *not* influence the distribution in any way. The only effect will be, that the result is more random.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:
Original post by Plasmadog
I faced a problem like this with a roulette game I was working on a while back. I had gone to a casino and written down the results of some 200 spins, then went home and compared these results to a simple random number series generated by rand(). The difference between the two sets was interesting. The distribution was roughly the same, but the casino set displayed certain patterns that the generated series did not. Well, perhaps pattern is not the right word. What I mean is that some stats, like the average number of odd numbers in a row for example, were very different. As I wanted to accuratly simulate the sort of odds that you get in a casino, I needed to find something better than rand().
What I came up with did the job very nicely. At the start of the program I created an array with 37 rows and 100 columns. 37 being the number of possible outcomes of a single spin. 100 was chosen arbitrarily, it could have been anything really. Each column was filled with the numbers 0-36, and then shuffled using something like a bubble sort in reverse. Then later, when I wanted to simulate a spin, I used two calls to rand(), one to choose a column, the other to choose a row, and returned the value found at that position of the array.
I don''t know the mathematical basis for how it worked any better than a simple rand(), but it gave results very similar to the data from the casino.


How many roulette wheels did you try? I don''t think 200 spins is enough to be able to see patterns myself. Surely the basis of the game is that each number is equally likely. I might be able to under this kind of generator for a coin flipping game (not a fun game, I know, but they normally have a bias) but not for this.

I mean, your not going to get someone playing a game and have the user saying "this game is cheating, I''ve spun x many times I that common pattern has not appeared" are you?

Share this post


Link to post
Share on other sites
I''m pretty sure that multiplicative formula produces a skewed result, but I can''t prove it.

By the way, the range varies from implementation to implementation. You can''t guarantee it''s from 0 to 32767. You should use RAND_MAX (or the C++ equivalent) in your calculations.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost ]

Share this post


Link to post
Share on other sites
> I''m pretty sure that multiplicative formula produces a skewed result, but I can''t prove it.

Well, it''s far from optimal (esp. in terms of performance), but it''s usable. The distribution, at least, is not ''skewed''. But it might also depend on the actual rand() implementation in the c/c++ lib. Though I would stay far away from such tricks, if the ''randomness'' of a PRNG is crucial, eg. for a cipher.

> By the way, the range varies from implementation to implementation. You can''t guarantee it''s from 0 to 32767. You should use RAND_MAX (or the C++ equivalent) in your calculations.

Yes, that''s correct. But it''s minimum 15bit. That''s why 3 multiplies, if you want a 32bit result, so that the wrap-around is almost forced. So RAND_MAX doesn''t have any importance, we don''t care about the exact range, since we actually want it to overflow. The formula is even better, if you want less bits in your result, eg. 16bit.

Share this post


Link to post
Share on other sites
200 spins was enough to see the sort of patterns I was interested in. When I broke down the data into evens/odds, blacks/reds, low/high, column, row, etc, and then looked at the length of runs of any of these properties, the two sets of data looked very different. As I said, the distibution was roughly the same, each number came up roughly the same number of times, but there were differences in the way each number followed another. Perhaps this is evidence that a Roulette wheel is not truly random. I can''t explain it better than that, I suggest you try it and see what I mean.
As for the reality of it, I agree that most human players are not likely to see any difference, but I was not working on a game at the time. I had been asked to write a simulation that could be used to test betting strategies. As such, I needed to accurately reproduce the sort of statistics that would be seen in the casino. And, as it happened, the guy that asked me to do this was a frequent casino-goer, and actually COULD (in a blind test) tell which set of data was generated by a series of rand() calls. A second test, using the array method, stumped him.
Like I said, I don''t know how it worked, but it did.

Share this post


Link to post
Share on other sites
If you need high quality random numbers, your best bet is to download a good random number generator and use it instead of the rand() C function. The rand() function is written to be fast and small, not to produce excellent randomness.

I wrote a program to test the bit patterns generated by various RNGs, and the Mersenne Twister in particular had excellent bit distribution and randomness (nearly as good as CryptoAPI-generated random numbers), while also being somewhat faster than the CRT rand(). Actually, all of the 3rd party RNGs I tried had better bit distribution than rand().

If you need cryptographically random numbers on Windows platforms, go with the CryptoAPI''s CryptGenRandom() function.

See Bruce Schneier''s excellent _Applied Cryptography_, 2nd Edition, section 2.8 (pages 44-46) to see the criteria my program used for evaluating RNG bit distribution.

--
Eric

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
A short search with Google would have solved the problem easily (with keyword ''Mersenne Twister'' it was the first link, actually). Anyways:

http://www.math.keio.ac.jp/~matumoto/emt.html

Share this post


Link to post
Share on other sites
one would think that rand () << 15 + rand () would give you a 30-bit uniform distribution. I'm 99% sure that multiplying two or more random variables of a uniform distribution about the same mean will not give you a larger uniform distribution. If memory serves, the new random variable will become Gaussian (or was that for sums?), but I'd have to dig up my random processes book (which is at home) to tell. Everything becomes Gaussian in the long run. =) I'll try to give you a better answer later tonight.

edit - get my terminology correct. haven't slept in 30 hrs.

Edited by - Stoffel on February 19, 2002 9:41:45 PM

Share this post


Link to post
Share on other sites
> one would think that rand () << 15 + rand () would give you a 30-bit uniform distribution

Good question. But you oversee the wrap around. Is (rand()*rand()) & 0x7FFF uniform ? I'm not sure, but I think so. When speaking in terms of DSP, you are effectively modulating a noise signal by another and quantizing the result. IMO, it really depends on the internals of the rand() implementation. If it is gaussian in the first place, then the whole idea won't work.

Edited by - Yann L on February 19, 2002 9:57:26 PM

Share this post


Link to post
Share on other sites
quote:
Original post by Stoffel
one would think that rand () << 15 + rand () would give you a 30-bit uniform distribution. I''m 99% sure that multiplying two or more random variables of a uniform distribution about the same mean will not give you a larger uniform distribution. If memory serves, the new random variable will become Gaussian (or was that for sums?), but I''d have to dig up my random processes book (which is at home) to tell. Everything becomes Gaussian in the long run. =) I''ll try to give you a better answer later tonight.

edit - get my terminology correct. haven''t slept in 30 hrs.

Edited by - Stoffel on February 19, 2002 9:41:45 PM


Multiplying two random variables does indeed give a gaussian distribution.

Adding random variables also skews the distribution so that it is psuedo-gaussian.

This is the theory behind D&D die rolling. When you roll a character (is it 3d6? I forget), the 6-sided die is rolled 3 times. this gives a minimum of 3, and a maximum of 18, but these two values are the least frequent. It is FAR more likely that you will get a result at the mean value, 10.

Also known as the Bell Curve.

Share this post


Link to post
Share on other sites
OK, I started working it all out and got stuck. I''ll take you as far as I got.

Some nomenclature:
fx(x) = probability density function of random variable y (for example).

Definitions:
It is known that for a random variable X, the function Y = a*X (where a is a scalar) has the pdf:
fy(y) = 1/abs(a) * fx(y/a)
..where abs = absolute value.

The conditional pdf of Z is the pdf of Z given Y, or fz(z|y). The pdf of Z from the conditional is:
fz(z) = integal (-inf to inf) of { fz(z|y'')fy(y'')dy'', where y'' is the placeholder variable of integration.

And finally, two variables are independent if their joint probability (fx,y (x,y)) is equal to the product of their individual probabilities (= fx(x)*fy(y)).

The random number generator in ANSI C generates a pseduo-random sequence that will hit every value in the range 0 to 0x7FFF before repeating. Asymptotically, this is a uniform random variable from 0 to R (where R is 0x7FFF). The pdf of a uniform random variable is fx = { 1/R for 0<=x<=R, 0 otherwise }.

Let''s evaluate the pdf of rand () * rand ():
Z = X * Y, where X and Y are uniform random variables, both with range R. Assume Y takes some value Y = y; in this case, the conditional probability density of z is:
fz(z|y) = 1/(abs(y)) * fx(z/y|y)
..by definition of pdf for Y = a*X.

By the definition of the conditional pdf, we can find fz(z):
fz(z) = integral (-inf to inf) of { fz(z|y'') * fy(y'') * dy'' }
= integral (0 to inf) { 1/y'' * fx(z/y''|y'') * fy(y'') * dy'' }

(note that I dropped abs and changed the range of integration because we''re evaluating unsigned numbers)
fx(x''|y'') * fy(y'') is the definition of the joint pdf, therefore this is:
fz(z) = integral (0 to inf) { 1/y'' * fx,y(z/y'',y'') * dy'' }
But remember that our X and Y are independent, therefore the joint pdf is equal to the product of the individual pdfs. The uniform pdf is:
1/R for 0 <= x <= R,
0 otherwise.
In other words, a unit block of height 1/R from 0 <= x <= R.
Using the unit step function u(x), we can describe the pdf as:
fz(z) = 1/R^2 integral (0 to inf)
{ 1/y'' *
(u(z/y'') - u(z/y'' - R)) *
(u(y'') - u(y'' - R)) *
dy'' }

OK, at this point I gave up. I don''t want to use Laplace analysis to perform this integration. So, I did the next best thing; I cheated with Matlab:
X = rand(10000,1) * 10; % my range is 10
Y = rand(10000,1) * 10;
Z = (X .* Y); % element-by-element multiplication
hist (Z,100); % plot a histogram of 100 bins

Histograms approximate the pdf for large sample sizes (which 10000 is). This is basically cheating.

The histogram clearly shows that the probability distribution is a decaying exponential, with values around 0 being the most probable, and values around 100 (R^2) the least probable, with an extremely concave slope.

So, what would rand * rand * rand be? It would be an even more concave slope, i.e. values around 0 would be extremely more likely, and values around R^3 (not R^2 now) extremely less likely.

Well, what about the + rand term? I actually know that one. For Z = sum of any two independent random variables X and Y, the pdf of Z is the convolution of the two pdfs of the random variables. The first pdf is an exponential decay with a very high exponent, and the second pdf is a rectangle (uniform distribution). Any good electrical engineer knows this convolution: it''s an exponential increase until the halfway point, and then an exponential decreases till the end of the full pdf. In other words, it looks like a capacitor charge/discharge cycle.

This doesn''t even take into account any overflow you might be getting; since the range of rand * rand * rand + rand is 0x7FFF^3 + 0x7FFF, I''m sure you''re overflowing 32 bits, so goodness only knows what aliasing does. Suffice to say, you don''t just get a larger uniform distribution.

Share this post


Link to post
Share on other sites