How to randomly select an item, taking probability into account

Started by
16 comments, last by LilBudyWizer 18 years, 7 months ago
Say I have 5 items.

ITEM      PROBABILITY
 A            50/100
 B            25/100
 C            10/100
 D            10/100
 E             5/100    
What I'm doing right now to build an array of size 100, and filling it with 50 A's, 25 B's, 10 C's, 10 D's, 5 E's. Then I generate a random number from 0 to 99, and I pick the value from the array based on that random number. The most repeated element (the one with higher probability) is selected more often. Is there some other (less expensive) way of selecting an element based on it's probability of being selected? Thank you in advance.
[size="2"]I like the Walrus best.
Advertisement
I usually just repeatedly subtract percentages from a random number, until it ends up less than 0. As in:

int rand = rand() % 100;int picked;for (picked=0; picked < num_items; picked++){  rand -= prob[picked];  if (rand < 0) break;}
Sure...in this case, generate a random integer between 1 and 100. If it's 1-50, choose A; if it's 51-75, choose B; etc. and so forth.

This of course doesn't hold up under systems where your probabilities don't all add up to 100%, but it would work fine in the scenario you suggest.

Cheers,
Twilight Dragon
{[JohnE, Chief Architect and Senior Programmer, Twilight Dragon Media{[+++{GCC/MinGW}+++{Code::Blocks IDE}+++{wxWidgets Cross-Platform Native UI Framework}+++
I'd say this method is fast when used in a tight loop, but has the drawback of costing some memory.

I'd do something like this:

int a = rand() / ( RAND_MAX >> 8 );

if( a < 128 )
{
// 50% probability
}
else if( a < 192 )
{
// 25% probability
}
else
{
// 25% probability
}
What about just picking a number from 1 - 100 and depending on the range it falls in, which are aligned to the probabilities, then you have your answer?

Ex:
A : 0-50
B : 50-75
C : 75-85
D : 85-95
E : 95-100

Pseudo Code:
Var x = rand[0-100];
// x == 39

The selected value is then A.

Of course my number ranges aren't right, but you get the idea [wink]. Would that work?
Just get a random number between 0 and 100 then set up an if/else:
if( random < 5 )    //do Eelse if( random < 15 )    //do Delse if( random < 25 )    //do Celse if( random < 50 )    //do Belse    //do A
I got beat bad...
I'm sorry, the example I posted isn't completely accurate.

The list of probabilities I have isn't sorted, it changes from time to time and it's A LOT bigger than that.

PS: and the probability is in float (1.7542)
[size="2"]I like the Walrus best.
Then the method of pinacolada (the first answer) is what you need. You only need the list of objects and the prob of each one. Then examine your list until the random <=0.

If your prob are a float, compute a number high enoug and compute a float. Considering the max value you get from random() is RAND_MAX (or is it MAX_RAND?) which value is 65535, then your float is:
p = (float)random()/65535.0f;

That give you a 0-1 value... multiply it by your delta in order to get a number in 0-100 range:
p = 100 * (float)random()/65535.0f;

Now, a variant may be to use an accumulated prob. So your list:
ITEM PROBABILITY
A 50/100
B 25/100
C 10/100
D 10/100
E 5/100

Becomes:
A 50/100
B 75/100
C 85/100
D 95/100
E 100/100

Then test your value until it is greater then the accumulated probability.
for(int i=0; i<maxID; i++)
if(list>p) break;

i holds your index.

This may perform better as you dont need to do any operation except a comparison which maps to ASM: JG.

Now if you want faster... go for assembly.

Luck!
Guimo



The way I'd go about solving this problem depends heavily on how often the probabilities are changing, how many there are, and what sort of pre-processing you can do before hand.

If there's not a lot of different categories, or if there's a few that make the bulk of all possible choices (for example, if only one of three choices will be picked 95% of the time), then the simple approach (like pinacolada's code) will be adequate (and that's what I've used when faced with a similar problem). It would be best if you could sort your choices in this case, so the most probable ones are first in the chain.

Other approaches would be to precalc. a look up table or a tree, but this will depend on exactly what you are doing. Can you give us some more information as to why the simple approach won't be adequate?

This topic is closed to new replies.

Advertisement