random selection of array elements

Recommended Posts

Hello,

I have a bit of programming dilemma which I cannot seem to solve my self.

What if I have an array of numbers and I want to randomly select entry from the array so that the higher the number in the array the more likely it gets picked?

For example array like this:

[1000,800,400,200]

Should have highest change of selecting the highest number (1000) and lowest change for the lowest number.

 

How to calculate this?

 

thanks!

 

Share this post


Link to post
Share on other sites
Posted (edited)

A memory intensive way to go about it is:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

#include <cstdlib>

int main(void)
{
    vector<long unsigned int> numbers;
    
    for(size_t i = 0; i < 1000; i++)
        numbers.push_back(1000);
    
    for(size_t i = 0; i < 800; i++)
        numbers.push_back(800);
    
    for(size_t i = 0; i < 400; i++)
        numbers.push_back(400);
    
    for(size_t i = 0; i < 200; i++)
        numbers.push_back(200);
    
    if(numbers.size() > RAND_MAX)
    {
        cout << "Need a better PRNG" << endl;
        return 0;
    }
    
    long unsigned int pseudo_random_number = numbers[rand() % numbers.size()];
    
    cout << pseudo_random_number << endl;
    
    return 1;
}
Edited by sjhalayka

Share this post


Link to post
Share on other sites

sjhalayka: You don't need to shuffle the array if you're picking from it randomly.

The main information missing from the question is a rigorous definition of the desired distribution. How much more likely should one number be picked over another, and what decides that?

Usually the simple approach here is to have an explicitly-weighted random selection. The standard algorithm for this is roulette wheel/fitness proportionate selection. (The Java 'rouletteSelect' function on that page is probably the most sensible implementation.)

Share this post


Link to post
Share on other sites

It's still not clear - are the numbers you're selecting also the exact proportions that you want them to be picked in? Or are the proportions different to the numbers?

Share this post


Link to post
Share on other sites
2 minutes ago, Kylotan said:

It's still not clear - are the numbers you're selecting also the exact proportions that you want them to be picked in? Or are the proportions different to the numbers?

proportions are different to the numbers. all I need is select just one number from the array as long as it's random.

Share this post


Link to post
Share on other sites

Right, so what I'm saying (and what Lactose said) is that you need to specify those proportions somehow. 'Higher' isn't enough information - how much higher, in each case? The roulette wheel selection method I linked to allows you to choose those proportions explicitly.

Share this post


Link to post
Share on other sites
Posted (edited)

Ok here's the deal, I have been working on code to do this but it doesn't seem right, I'm unsure if I got the math right.

I made sample program where there's no arrays but I'm trying to calculate the probability for each number. 100 / 100 for example should have 50% probability.

 

Or in another example max number being 100 the number 10 should have 10% change of being picked.

(max number just means the highest number found in array)

 

here's the program:

 

// Example program
#include <iostream>
#include <string>

using namespace std;

int maxNum = 100;

int main()
{
 for(float i = 0; i <= maxNum; i += 10)
 {
     float change = i / ((float)maxNum) / 2.0;
     
     cout << (int)i << " / " << maxNum << " = " << change << " " << ( int(100 * change)) << "% " << endl;
 }
}

 

Which outputs:

 

0 / 100 = 0 0% 
10 / 100 = 0.05 5% 
20 / 100 = 0.1 10% 
30 / 100 = 0.15 15% 
40 / 100 = 0.2 20% 
50 / 100 = 0.25 25% 
60 / 100 = 0.3 30% 
70 / 100 = 0.35 35% 
80 / 100 = 0.4 40% 
90 / 100 = 0.45 45% 
100 / 100 = 0.5 50% 

 

where the final (100 / 100) seems to be correct result but the five first calculations doesn't seem right. 10 / 100 should be 10% but I'm not sure if I got that part right.

 

any ideas how to make this work?

 

Edited by heh65532

Share this post


Link to post
Share on other sites

If you're using C++, the standard library provides just such a distribution.

 
#include <iostream>
#include <random>

int
main()
{
  std::random_device           rd;
  std::minstd_rand             gen(rd());
  std::discrete_distribution<> d({1000,800,400,200});
  
  for(int n=0; n<10; ++n)
  {
    std::cout << d(gen) << "\n";
  }
}

 

Share this post


Link to post
Share on other sites
Posted (edited)
3 minutes ago, Bregma said:

If you're using C++, the standard library provides just such a distribution.



 

 

 

Not using C++ just running tests with it. thanks though!

Edited by heh65532

Share this post


Link to post
Share on other sites
10 minutes ago, heh65532 said:

I'm unsure if I got the math right.

I made sample program where there's no arrays but I'm trying to calculate the probability for each number. 100 / 100 for example should have 50% probability.

 

Or in another example max number being 100 the number 10 should have 10% change of being picked.

(max number just means the highest number found in array)

It's hard to see how these 2 examples represent the same mathematics.

In the first example, it's 50/50 because each number is equal. Another way to view it is that the total weighting of the numbers is 200, and each of those numbers is 100, so each has a 100/200 chance, i.e. 0.5, which equals 50%.

In the second example, we know there is a number 100, and a number 10 - but that's a total weight of at least 110. If those are the only 2 numbers, then you'd expect the 10 to be picked 9% of the time (i.e. 10/110). If there were a few other numbers, that proportion could go down.

So I think you need to come up with a more rigorous definition of your weighting formula.

Share this post


Link to post
Share on other sites
Posted (edited)
28 minutes ago, heh65532 said:

I made sample program where there's no arrays but I'm trying to calculate the probability for each number. 100 / 100 for example should have 50% probability.

Do you mean to say you have a set S of integers and you wish to choose some s in S such that the probability P(s) of s being chosen is \(P(s) = \frac{s}{\sum(S)}\) (ie. a discrete distribution just like in the C++ standard library)?

Or, from your example, do you mean to say the largest value in S should always have a probability of 0.5 and all remaining values should be proportional to that (which is going to give you painful math).

Edited by Bregma
wretched GameDev.net editor eats baby kittens for breakfast

Share this post


Link to post
Share on other sites
Posted (edited)

Actually I think the code I posted earlier works like supposed to.

Here's my final code with array (I'll let the code show what I'm trying to archive):

 

#include <iostream>
#include <string>

using namespace std;



int main()
{
 srand (time(NULL));
    
    // Array already sorted
 int array[] = {200,300,700,1000,1500};
 int asize = 5;
 
 int maxNum = array[asize - 1];
    
 bool picked = false;
    

 for(int i = 0; i < asize; i++)
 {
     float change = (float)array[i] / ((float)maxNum) / 2.0;
     int r = rand() % 100;
     
     int ichange = (int(change * 100));
         
     cout << " change: " << ichange << "% random: " << r << "%" << endl;
     
     if( r <= ichange )
     {
         cout << " Picked random array element: " << array[i] << " with change of " << ichange << "%" << endl;
         picked = true;
         break;
     }
     
 }
 
 
 if(!picked) 
 {
  cout << "Max picked: " << array[asize - 1] << endl;
 }
 
}

 

Not 100% sure if it really works mathematically correct so please let me know what you think.

Edited by heh65532
Code updated

Share this post


Link to post
Share on other sites

Whether it is mathematically correct depends on whether you've defined what "correct" is. If you're using it for a game (presumably), then if it "feels good" then that is probably enough.

That said, the code you've posted may not very general, I suspect, as it depends on the largest number to calculate the "weight", so I imagine you might be less happy with the results if the largest number were either extremely large or not very large compared with the others. Generating the random numbers inside the inner loop could result in surprising distributions, particularly as the array size increases. That said, I cannot really visualise the resulting distribution, I could be wrong, these are just intuitive guesses.

Without knowing your use case, from your description you probably do want the "Fitness proportionate selection" algorithm.

Share this post


Link to post
Share on other sites
Posted (edited)

If you want to produce the numbers 0, 1, 2, 3 with probabilities proportional to 1000, 800, 600, 400 respectively, make an array with the partial sums of the non-normalized probabilities, like this:

a[4] = {0, 1000, 1800, 2400};

total_sum = 2800

Now, pick a random number between 0 and 2800 (including 0 and not including 2800) and find the highest index i such that a[i] is less or equal than your number. You can do this part with binary search, if you want the procedure to be fast even when using a large table.

 

Edited by alvaro
Remove accidental text style

Share this post


Link to post
Share on other sites

Thank you all for the replies.

@alvaro that's just what I needed, you approach is so much better than mine.

here's the resulting code, I think this is what you meant:

 

#include <iostream>
#include <string>

using namespace std;

int main()
{
 srand (time(NULL));
     
  int asize = 4;
  int array[asize] = {0, 1000, 2000, 3000};
  int total = 6000;
  
  int ra = rand() % total;
  
  for(int i = asize - 1; i >= 0; i--)
  {
      if( array[i] <= ra )
      {
          cout << "selected " << i << " --- " << array[i] << " <= " << ra << endl;
          break;
      }
  }
  
  cout << "finished" << endl;
}

 

Share this post


Link to post
Share on other sites
Posted (edited)

You should include <cstdlib> to have access to srand and rand, <ctime> for time, and you don't need anything from <string>. But other than that, your code seems OK to me.

There are a few issues with your use of pseudo-random numbers that would go away if you were to use C++11's <random> library, and then you may as well use std::discrete_distribution instead of rolling your own.

 

Edited by alvaro

Share this post


Link to post
Share on other sites

The code wrongly uses the relative frequencies in the array rather than the cumulative frequencies, so that it's possible to generate a random value that will never be less than or equal to any of the values in the array.

And the iteration should be forwards, not backwards. Currently you'd exit the loop on the first iteration every time (if the previous bug was fixed).

Once those are fixed, this would be a reasonable implementation of roulette wheel selection.

Share this post


Link to post
Share on other sites
6 minutes ago, Kylotan said:

The code wrongly uses the relative frequencies in the array rather than the cumulative frequencies, so that it's possible to generate a random value that will never be less than or equal to any of the values in the array.

And the iteration should be forwards, not backwards. Currently you'd exit the loop on the first iteration every time (if the previous bug was fixed).

Once those are fixed, this would be a reasonable implementation of roulette wheel selection.

 

The code uses relative frequencies (1000, 1000, 1000, 3000). It would be more clear if he actually wrote that and then some code that did the accumulation, or if he had stuck to the original example so we know for sure what he means.

The iteration seems fine to me.

 

Share this post


Link to post
Share on other sites

I ran to a problem with the latest code  I posted. While arrays like this work well:

int array[asize] = {0, 1000, 2000, 3000};

 

it doesn't work anymore if the values are closer to each other like this:

 

int array[asize] = {0, 1000, 2000, 2001};

The 2001 has only small change of being picked while it should have about same change of being picked than the 2000.

 

So not sure how to handle that.

Share this post


Link to post
Share on other sites
Posted (edited)

Someone please correct me if I'm wrong or if this is a terrible idea. It seemed to work in excel. Largest number gets 50% chance and each following number gets half the chance of the previous. The actual numbers in the array are not important. You would reach a limit to how many potential numbers can be selected quite quickly though.

Count the number of items in your list.
MaxRandValue = 2^arrayElementCount
ra = rand()%MaxRandValue
index = log10(ra)/log10(2)
return array[index]

 

Edited by kseh

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now