Probability calculation

I'm working on a probability calculation problem and just cannot get the computed probability to match an empirical observation. I hope I can describe it, below, and no, it is not homework. The problem is very general, but here's a specific description.

Let's say I choose 10 numbers at random between 1 and 50 with replacement. What is the probability that the number 13 will occur 3 times in that sequence of 10?

I've approached it this way:

For 3 of the selections, I want a specific number, so that gives me (1/50)^3.
The other 7 selections can be anything except my specific number, which gives (49/50)^7.
I don't care what the order of occurrence is, so there are 10 Choose 3 different ways to get the same number 3 times in that series of 10, or combin(10,3).

Putting it all together, I should have (1/50)^3 * (49/50)^7 * combinations(10 choose 3)

This always over-estimates the actual rate of occurrence that I see in a simulation of the problem. Where have I gone wrong??

[quote name='kirkd' timestamp='1312930618' post='4846942']
I'm working on a probability calculation problem and just cannot get the computed probability to match an empirical observation. I hope I can describe it, below, and no, it is not homework. The problem is very general, but here's a specific description.

Let's say I choose 10 numbers at random between 1 and 50 with replacement. What is the probability that the number 13 will occur 3 times in that sequence of 10?

I've approached it this way:

For 3 of the selections, I want a specific number, so that gives me (1/50)^3.
The other 7 selections can be anything except my specific number, which gives (49/50)^7.
I don't care what the order of occurrence is, so there are 10 Choose 3 different ways to get the same number 3 times in that series of 10, or combin(10,3).

Putting it all together, I should have (1/50)^3 * (49/50)^7 * combinations(10 choose 3)

This always over-estimates the actual rate of occurrence that I see in a simulation of the problem. Where have I gone wrong??
[/quote]
The formula should be:

P = nCx * p^x * q^(n-x)

where p is the probability of getting a 13, q = 1 - p is the probability of not getting a 13, x is the number of 13s you want, and n is the number of "dice".

Thus P = 10C3 * (1/50)^3 * (49/50)^7.

10C3 is 10! / (3! * 7!), or:

3628800 / (6 * 5040) = 120,

so P = 120 * (1/50)^3 * (49/50)^7:

120 * 0.000008 * 0.86812553324672 = 0.0008334005119168512, or about 0.08334%.
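
The same arithmetic can be checked in code. This is only a minimal sketch of the binomial formula above; the function names `choose` and `binomial_pmf` are illustrative and do not come from the thread.

```cpp
#include <cmath>

// Number of ways to choose x items out of n, computed multiplicatively
// so that no large factorials are formed.
double choose(int n, int x) {
    double c = 1.0;
    for (int i = 1; i <= x; ++i)
        c = c * (n - x + i) / i;
    return c;
}

// Probability of exactly x successes in n independent trials,
// each succeeding with probability p (binomial distribution).
double binomial_pmf(int n, int x, double p) {
    return choose(n, x) * std::pow(p, x) * std::pow(1.0 - p, n - x);
}
```

Calling `binomial_pmf(10, 3, 1.0 / 50)` should reproduce the value of roughly 0.000833 worked out by hand.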

Given the fairly low probability (around 8 tries in ten thousand will give you exactly three 13s), I have a hard time seeing how it can overestimate anything :D

How are you generating your random numbers?

[quote name='kirkd' timestamp='1312930618' post='4846942']
This always over-estimates the actual rate of occurrence that I see in a simulation of the problem. Where have I gone wrong??
[/quote]

I think your simulation is probably wrong.

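[code]#include <cstdlib>
#include <ctime>
#include <iostream>

// Uniform integer in [0, 49]. Values of std::rand() at or above the largest
// multiple of 50 are rejected so that the modulo introduces no bias.
int rand_50() {
    static const int M = 50 * (RAND_MAX / 50);
    int r;
    do {
        r = std::rand();
    } while (r >= M);
    return r % 50;
}

int main() {
    std::srand(static_cast<unsigned>(std::time(nullptr)));

    long long m = 0;  // sequences containing exactly three 13s

    // Runs until interrupted, printing the running frequency periodically.
    for (long long n = 1; ; ++n) {
        int k = 0;  // number of 13s in this sequence of 10 draws
        for (int j = 0; j < 10; ++j)
            k += (rand_50() == 13);
        m += (k == 3);
        if ((n & 0xfffff) == 0)  // every 2^20 = 1,048,576 sequences
            std::cout << (double(m) / n) << '\n';
    }
}
[/code]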

SimonForsman - it looks like we have exactly the same formula. As for the overestimate, the computed value is consistently above the observed rate when I do 16,000 simulations.

Alvaro - I'm guessing you're correct. Maybe the RNG I'm using isn't behaving as expected?

Thanks for verifying that I'm not crazy! I _can_ do probability problems. 8^)
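
One way to rule out the generator is to repeat the experiment with a different one. The following is a minimal sketch, assuming the C++11 `<random>` facilities are available; `simulate_frequency` and its parameters are illustrative names, not taken from the thread.

```cpp
#include <cstdint>
#include <random>

// Fraction of `runs` simulated sequences in which `target` appears exactly
// `wanted` times among `draws` uniform picks from 1..50.
double simulate_frequency(std::int64_t runs, unsigned seed,
                          int draws = 10, int target = 13, int wanted = 3) {
    std::mt19937 gen(seed);                          // Mersenne Twister engine
    std::uniform_int_distribution<int> pick(1, 50);  // both bounds inclusive
    std::int64_t hits = 0;
    for (std::int64_t n = 0; n < runs; ++n) {
        int count = 0;
        for (int j = 0; j < draws; ++j)
            count += (pick(gen) == target);
        hits += (count == wanted);
    }
    return static_cast<double>(hits) / static_cast<double>(runs);
}
```

If two unrelated generators give frequencies that agree with each other over a large number of runs, the generator is unlikely to be the cause of a mismatch.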

With 16,000 simulations you expect to see 13.33 hits. How many do you see? And how do you define "consistently"? (i.e., how many times did you run your 16,000 simulations to decide that there was a problem?)
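
To put that expectation in context, the spread of the hit count can be estimated as well. This is a minimal sketch, assuming each simulation is an independent trial that succeeds with the probability computed earlier in the thread; the function names are illustrative.

```cpp
#include <cmath>

// Expected number of hits in `runs` independent simulations,
// each hitting with probability p.
double expected_hits(double runs, double p) {
    return runs * p;
}

// Standard deviation of the hit count, which is binomially distributed:
// sqrt(runs * p * (1 - p)).
double hits_stddev(double runs, double p) {
    return std::sqrt(runs * p * (1.0 - p));
}
```

For 16,000 runs and p of about 0.000833 this gives an expectation of about 13.3 hits with a standard deviation of about 3.65, so the count from a single batch can plausibly fall anywhere from roughly 6 to 21.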

On average, I see about 10 hits in 16,000. I ran this simulation a number of times - maybe 30 or so - and the observed frequency bounced around a bit, but it stayed very close around .0002 to .0004. The computed probability is .0008, hence my concern that the calculation was over-estimating the actual.

What I see now using your code is that it takes about 4-5 million runs to really see the observed frequencies settle down to .00083. I clearly didn't have enough samples.

-Kirk
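
The effect of sample size on the observed frequency can be quantified with the standard error of a proportion. This is a rough sketch under the assumption of independent runs; `relative_std_error` and `runs_needed` are hypothetical helper names.

```cpp
#include <cmath>

// Relative standard error of the observed frequency after `runs` independent
// simulations with success probability p: sqrt((1 - p) / (runs * p)).
double relative_std_error(double runs, double p) {
    return std::sqrt((1.0 - p) / (runs * p));
}

// Number of runs needed for the relative standard error to drop to `rel`.
double runs_needed(double p, double rel) {
    return (1.0 - p) / (p * rel * rel);
}
```

With p of about 0.000833, the relative standard error is about 27% at 16,000 runs and about 1.7% at 4 million runs. Reaching 1% would take on the order of 12 million runs.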
