Probability calculation

4 comments, last by kirkd 12 years, 8 months ago
I'm working on a probability calculation problem and just cannot get the computed probability to match an empirical observation. I hope I can describe it, below, and no, it is not homework. The problem is very general, but here's a specific description.

Let's say I choose 10 numbers at random between 1 and 50 with replacement. What is the probability that the number 13 will occur 3 times in that sequence of 10?

I've approached it this way:

For 3 of the selections, I want a specific number, so that gives me (1/50)^3.
The other 7 selections can be anything except my specific number, which gives (49/50)^7.
I don't care what the order of occurrence is, so there are 10 choose 3 different ways to place the same number 3 times in that series of 10, or combin(10,3).

Putting it all together, I should have (1/50)^3 * (49/50)^7 * combinations(10 choose 3)

This always over-estimates the actual rate of occurrence that I see in a simulation of the problem. Where have I gone wrong??

the formula should be:

P= nCx * p^x * q^(n-x)

Where p is the probability of getting 13 and q is the probability of not getting 13, x is the number of 13s you want and n the number of "dice"

thus P = 10C3 * (1/50)^3 * (49/50)^7

10C3 should be 10! / (3!*7!) or:

3628800 / (6*5040) = 120,

so 120 * (1/50)^3 * (49/50)^7

120 * 0.000008 * 0.86812553324672 = 0.0008334005119168512 or 0.08334%

Given the fairly low probability (around 8 tries in ten thousand will give you exactly 3x13), I have a hard time seeing how it can overestimate anything :D

How are you generating your random numbers?
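
The arithmetic above can be double-checked with a few lines of C++. This is only a sketch; the function names choose and binomial_pmf are made up for illustration and do not come from any library.

```cpp
#include <cassert>
#include <cmath>

// Binomial coefficient "n choose k", built up multiplicatively so that
// no large factorials are ever formed. (Illustrative helper name.)
double choose(int n, int k) {
    assert(0 <= k && k <= n);
    double c = 1.0;
    for (int i = 1; i <= k; ++i)
        c = c * (n - k + i) / i;
    return c;
}

// Probability of exactly k successes in n independent trials,
// each succeeding with probability p: nCk * p^k * (1-p)^(n-k).
double binomial_pmf(int n, int k, double p) {
    return choose(n, k) * std::pow(p, k) * std::pow(1.0 - p, n - k);
}
```

Calling binomial_pmf(10, 3, 1.0 / 50) should give roughly 0.000833, matching the hand calculation.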

This always over-estimates the actual rate of occurrence that I see in a simulation of the problem. Where have I gone wrong??


I think your simulation is probably wrong.

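#include <cstdlib>
#include <ctime>
#include <iostream>

// Returns a uniform integer in [0, 49]. Values of rand() at or above the
// largest multiple of 50 are rejected to avoid modulo bias.
int rand_50() {
    static const int M = 50 * (RAND_MAX / 50);
    int r;
    do {
        r = std::rand();
    } while (r >= M);
    return r % 50;
}

int main() {
    std::srand(static_cast<unsigned>(std::time(0)));

    long long m = 0;  // trials so far that contained exactly three 13s

    for (long long n = 1; ; ++n) {
        // One trial: draw 10 numbers and count how many equal 13.
        int k = 0;
        for (int j = 0; j < 10; ++j)
            k += (rand_50() == 13);
        m += (k == 3);
        // Print the running frequency every 2^20 (1,048,576) trials.
        if ((n & 0xfffff) == 0)
            std::cout << (double(m) / n) << '\n';
    }
}
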
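
For comparison, here is a variant of the same experiment that stops after a fixed number of trials and uses the standard <random> header instead of std::rand. It is a sketch under assumptions: the function name simulate, its parameters, and the choice of std::mt19937 are illustrative and not taken from the thread.

```cpp
#include <cassert>
#include <random>

// Fraction of `trials` in which exactly `hits` of `draws` uniform picks
// from 1..sides equal `target`. All names and defaults are illustrative.
double simulate(long long trials, unsigned seed,
                int draws = 10, int sides = 50, int target = 13, int hits = 3) {
    assert(trials > 0);
    std::mt19937 gen(seed);                             // seeded generator, repeatable runs
    std::uniform_int_distribution<int> pick(1, sides);  // inclusive range 1..sides
    long long matches = 0;
    for (long long t = 0; t < trials; ++t) {
        int count = 0;
        for (int j = 0; j < draws; ++j)
            count += (pick(gen) == target);
        matches += (count == hits);
    }
    return static_cast<double>(matches) / trials;
}
```

With a few million trials, simulate should return a value close to the computed 0.00083; with only a few thousand it can be noticeably off in either direction.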
SimonForsman - it looks like we have exactly the same formula. As for the overestimate, it is consistently over the observed when I do 16,000 simulations.

Alvaro - I'm guessing you're correct. Maybe the RNG I'm using isn't behaving as expected?

Thanks for verifying that I'm not crazy! I _can_ do probability problems. 8^)
With 16,000 simulations you expect to see 13.33 hits. How many do you see? And how do you define "consistently"? (i.e., how many times did you run your 16,000 simulations to decide that there was a problem?)
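
To put a number on how much a run is expected to wander, the hit count can be treated as binomial with n trials and hit probability p. The sketch below uses made-up helper names (expected_hits, hits_stddev, relative_error) and assumes the trials are independent.

```cpp
#include <cassert>
#include <cmath>

// Expected number of hits in n trials with per-trial hit probability p.
double expected_hits(double n, double p) { return n * p; }

// Standard deviation of the hit count for a binomial: sqrt(n * p * (1 - p)).
double hits_stddev(double n, double p) {
    assert(n >= 0 && p >= 0 && p <= 1);
    return std::sqrt(n * p * (1.0 - p));
}

// Typical spread of the observed frequency, as a fraction of p itself.
double relative_error(double n, double p) {
    return hits_stddev(n, p) / expected_hits(n, p);
}
```

For n = 16,000 and p = 0.000833 this gives about 13.33 expected hits with a standard deviation of about 3.65, so a relative spread near 27%. At 5 million trials the relative spread drops to roughly 1.5%.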
On average, I see about 10 hits in 16,000. I ran this simulation a number of times - maybe 30 or so - and the observed frequency bounced around a bit, but it stayed very close around .0002 to .0004. The computed probability is .0008, hence my concern that the calculation was over-estimating the actual.

What I see now using your code is that it takes about 4-5 million runs to really see the observed frequencies settle down to .00083. I clearly didn't have enough samples.

-Kirk
