Whats the probability (of getting an answer)?

Started by
10 comments, last by eq 16 years, 11 months ago
Assume that we have 100 balls in a bucket, they are marked with the letters A-E with the following counts: A = 30 B = 25 C = 20 D = 15 E = 10 If we randomly pick a ball, we'd get ball A 30 times out of a hundred etc... very basic stuff. Now I need to pick 5 different balls, i.e: Pick a ball, got an A, write A down, put the ball back. Pick a new ball, got an B, write B down, put the ball back. Pick a new ball, got an A, reject since we'd alreay picked an A, put the ball back. Pick a new ball, got an C, write C down, put the ball back. And so on.. Now to the problem, how do we calculate the probability that the A is one of the two first letters written down? I think it is something like this: A[0] = 30/100 A[0-1] = A[0] + (25/100)*(30/75) + (20/100)*(30/80) + (15/100)*(30/85) + (10/100)*(30/90) Is this correct? How do we extend it to A beeing one of the three first lettes, and so on? A[0-2] = A[0-1] + (25/100)*(20/75)*(30/55) + (25/100)*(15/75)*(30/60) + (25/100)*(10/75)*(30/65) + (20/100)*(25/80)*(30/55) + (20/100)*(15/80)*(30/65) + (20/100)*(10/80)*(30/70) + (15/100)*(25/85)*(30/60) + (15/100)*(20/80)*(30/65) + (15/100)*(10/85)*(30/75) + (10/100)*(25/90)*(30/65) + (10/100)*(20/90)*(30/70) + (10/100)*(15/90)*(30/75) Is the above correct? How could you implement this in an efficient way? I mean, saying "give me the probability of ball X appearing in the first 10 picks out of 100 different balls", would require very much work, or?
Advertisement
Quote:Pick a new ball, got an A, reject since we'd alreay picked an A, put the ball back.

I'm finding this comment a little confusing... do you mean that if we draw A a second time, we would act as if it hadn't happened and repeat the same step until we get a non-A match, or would we count it as not A and move on?
Quote:do you mean that if we draw A a second time, we would act as if it hadn't happened and repeat the same step until we get a non-A match, or would we count it as not A and move on?

Repeat the procedure until you doesn't hit a ball with a mark that we've already written down.
I.e
loop:  pick random ball  if ball marking has been written down     put ball back     goto loop  write ball marking down  put ball back  if not all ball markings have been picked     goto loop

Since we're all adults here, I post a piece of working C++ code:
	std::vector<char> balls;	balls.insert(balls.end(), 30, 'A');	balls.insert(balls.end(), 25, 'B');	balls.insert(balls.end(), 20, 'C');	balls.insert(balls.end(), 15, 'D');	balls.insert(balls.end(), 10, 'E');	std::set<char> found;	std::string order;	do {		int v = rand() % m_balls.size(); // Please no comments about this beeing biased, this serves as just an example!		char ball = balls[v];		if (found.find(ball) != found.end())			continue;		found.insert(ball);		order += ball;   	}while (found.size() < 5);	std::cout << order << std::endl;

This is how the "picking" is done, now how to calculate the probabilities.
double prob(const int amongstTheFirstN, const char ball, const std::vector<int>& ballCounts); // How to make an efficient (correct) function....double d3 = prob(3, 'D', counts); // returns the probability of ball D beeing amongst the three first picked.double a2 = prob(2, 'A', counts); // returns the probability of ball A beeing amongst the two first picked....
You have a 30% propability of A being the first letter you write down.
If the first letter is not A, which would happen 70% of the time, you have 30% of an A ball being drawn the second time, so wouldn't the percentage you ask for be 30% + 70% * 30% == 51%?

-EDIT: The possibility of A being not one of the 1st 2 is 100 % - (30% + 70% * 30%) == 49%, so the possibility of it being within one of the three would be
51% + 49% * 30% == 0.66 etc. Not quite sure this... at least it has a bit of logic in it :P
Quote:If the first letter is not A, which would happen 70% of the time, you have 30% of an A ball being drawn the second time, so wouldn't the percentage you ask for be 30% + 70% * 30%?

That would be the case if you didn't have the constraint that a previously seen ball would be rejected and a new ball wasn't drawn.
I.e
First ball, chance of getting an A as the second:
B 30/(100-25) = 30/75 (Edit: Since there are now only 75 valid balls left, 25 balls was invalidated when B was picked, due to the constraint mentioned above)
C 30/(100-20) = 30/80
D 30/(100-15) = 30/85
E 30/(100-10) = 30/90

The second probability depends on what was drawn first.
I'm fairly sure that the equations I presented in the OP is valid, I did some statistical testing (i.e ran code millions of times and they seem to be ok).
The reasons for asking anyway are:
A) I get a small error (difference between theory and simulation) that seems to grow when I increase the number of letters, i.e one, two, three and so on.
This might be rounding errors.
B) A smarter way of determening this since there's a lot of loops going on, i.e the complexity seems to be O(N^p) where N is the number of different letters and p is the number of locations. (Edit: to determine the probability for all N's).
In the example above to determine all probabilities for a letter to be among the three first it's: N=5, p=3


From what I understand of the problem:

subsequent ball picking is independant of previous ball picks because the ball is always replaced.
you want to know the chance of a particular ball occuring in n picks.

The chance of A is 30% what is the chance of at least one A in n picks:

Lets turn that around to: what is the chance of no A in n picks.

noA in one pick is: 100% -30% = 70%
noA in two picks is: 70% in pick one * 70% in pick two
etc.. so

chance of A in n picks is: p =1.0 -( 0.7^n )


That's what i erroneously thought at the beginning too but as ep explained to me after my reply, if you pick for example B as the first ball, then , while you 're picking balls to write down your second letter, you won't count B as a pick , you just reject it, so its like having only A, C, D, E balls in the box (so its ike having 30 A balls in a box with 100 - 25 balls), so eps original formulas are correct. What he is asking for now is a more efficient way to calculate this, if it exists.

edit: eq have you considered using a LUT so you can have precalculated terms for your complex equations in the OP (and save many expensive divisions too)?

[Edited by - D_Tr on May 16, 2007 10:35:24 AM]
To put this in a different notation, let n be a m-sized set of weights such that Σ(i=0..m) ni=100. The set is ordered in the order of appearance. The probability of any particular set can be expressed as:

P(n)=Π(i=1..m-1) [ni/(100-Σ(j=0..i-1)nj)]

The possibility of getting A in the first number is P({A})=.3

The possibility of getting A in the second number is the sum of all P({x,A}) where x is in {B,C,D,E} times 1-P({A}).

You can continue to iterate through a function P(a,b)=ab-a2b, where a is some probability of an event, and b is the probability of an event that is dependent on a not happening.

Lastly, it's only worth exploring what the first four possibilities are, since the last result has only one possibility in every scenario.
We''re sorry, but you don''t have the clearance to read this post. Please exit your browser at this time. (Code 23)

This topic is closed to new replies.

Advertisement