Sign in to follow this  

Unity Non-uniform random numbers

This topic is 4561 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

This is a programming challenge I proposed to my development team today, and we had fun with it, so I thought I'd present it to you. The challenge is to create a random number generation function, similar to rand() which will have a higher probability of returning low numbers than high numbers. The declaration for the function is
int nuRand(const int nUpper)
where nUpper is the upper boundary of the random numbers the function should return (non-inclusive) So calling "nuRand(100)" will return random numbers of the range 0 to 99. The rules:
  1. The function is to be written in C/C++ (no assembly)
  2. The code must be 100% yours.. no 3rd party libraries Edit: You may use the STL and CRT, so rand() and abs() are fine
    • No plagiarism please.
  3. The probability curve of the returned values may be anything you wish, it may even be linear, just so long as there is a much higher probability of getting a '0' returned than a '99'. edit: I'd like to see a nice linear distribution though [smile]
  4. Bonus points awarded for speed and memory usage.
  5. new: I don't have a linux box, so please make sure your functions will compile under MSVC6
  6. new: Please no anonymous posting! I will run your functions, but you won't get the leadership.
  7. new: Please place lengthy code exerpts in [ source ] [/ source] brackets. See the faq for more info.
If you post your functions here, I will test them when I get home from work, and update this post with the current leader(s) in the categories of execution speed and memory usage. Good luck.
Results: smart_idiot curently holds the speed record of 1196.1585ms in the speed test. The function posted by Mr. "Anonymous Poster" was actually faster (979.9690), but anonymous posters can't get the credit, I'm afraid. The curve was also VERY steep.. out of 10000000 function calls, 9970000 returned '0'. Wyrframe: 1484.7239ms (btw, your rand2() function will throw a div/0 error, so this is rand2_edited's time) Zahlman: 2073.2392ms The distribution from your function was nice and linear.. I like :) Conner McCloud, I wasn't able to actually get your function to return. It was getting stuck in an infinite loop somewhere. I'm not going to bother with testing memory usage.. They're all so close... I was afraid someone would do something like:
int nuRand(const int nUpper)
{
  int *aNumbers = new int[nUpper];
  for(int n=nUpper; n > 0; n--)
  {
    aNumbers[n-1] = rand()%n;
  }
  int nRet = aNumbers[rand()%nUpper];
  delete [] aNumbers;
  return nRet;
}



My test function: A preliminary loop of 10,000,000 function calls is performed to test the output of the function, which I then graph in excel to see the curve. After that, I do a number of 1000000-call iterations with various powers of two for the speed test.
int main(int argc, char* argv[])
{
   LARGE_INTEGER liFreq, liStart, liStop;
   double dTicks;
   QueryPerformanceFrequency(&liFreq);

   int aCounts[100];
   memset(aCounts, 0, sizeof(int)*100);

   int nTemp, n;
   for(n=0; n <10000000; n++)
   {
      nTemp = nuRand(100);
      if(nTemp >= 100 || nTemp < 0)
      {
         printf("Function returned a random # outside of parameters! (%d)\r\n", nTemp);
         break;
      }
      else
         aCounts[nTemp]++;
   }

   for(n=0; n < 100; n++)
   {
      printf("%3d = %8d\r\n",n, aCounts[n]);
   }

   QueryPerformanceCounter(&liStart);

   int max=0x40000000;
   for(int i=1; i < max; i<<=1)
   {
      for(n=0; n < 1000000; n++)
      {
         nuRand(i);
      }
   }
   

   QueryPerformanceCounter(&liStop);
   dTicks = liStop.QuadPart-liStart.QuadPart;
   printf("Elapsed = %6.4fms\r\n",(dTicks/liFreq.QuadPart)*1000);

   printf("\r\nDone! Press [enter] to quit...");
   getchar();
   return 0;
}


[Edited by - pragma Fury on June 23, 2005 11:45:33 AM]

Share this post


Link to post
Share on other sites

int nuRand(const int nUpper)
{
return 0;
}


...MUCH higher probability of low numbers than high numbers. Fast and efficient, too.

Share this post


Link to post
Share on other sites
if F is your desired distribution function and X is a random variable with uniform distribution [0,1]. Y=F^-1(X) is a random variable with distribution F.
( F^-1 is of course the inverse of F).

Share this post


Link to post
Share on other sites
[0,1)^2*(max-min)+min

As C:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

int nuRand(const int nUpper)
{
double temp=(double)rand()/((double)(RAND_MAX)+1.0);
return (int)(temp*temp*(double)nUpper);
}

int main()
{
int counts[10];
memset(counts, 0, sizeof counts);

int c = 0;

for(; c < 1000000; ++c)
++counts[nuRand(100)/10];

for(c = 0; c < 10; ++c)
printf("%d-%d:\t%d\n", c*10, (c+1)*10-1, counts[c]);
}



Output:

0-9:    315897
10-19: 131099
20-29: 100221
30-39: 85241
40-49: 74799
50-59: 67635
60-69: 61860
70-79: 57666
80-89: 53991
90-99: 51591


[Edited by - smart_idiot on June 23, 2005 12:21:38 AM]

Share this post


Link to post
Share on other sites
For the inverse (and given the properties of integers, much more obvious) solution to smart_idiot's...
#include <stdlib.h>
#include <stdio.h>

unsigned rand2( int high ) {
return rand() / rand() % high;
}

unsigned rand2_edited( int high ) {
static int last = 0;
if ( ! last ) last = rand();

int next = rand();
int result = (next / last) % high;
last = next;
return result;
}


int main(int argc, char** argv) {
unsigned hits[20] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
unsigned i;

for (i=0; i<1000000; i++)
hits[rand2(100) / 5] += 1;

for (i=0; i<20; i++)
printf("[%02u,%02u) hit %u times\n", i*5, (i+1)*5, hits[i]);

return 0;
}



Output:
[wyrframe@Earth ~/Programming]$ g++ rand2.cpp

[wyrframe@Earth ~/Programming]$ ./a.out
[00,05) hit 900533 times
[05,10) hit 50164 times
[10,15) hit 16997 times
[15,20) hit 8972 times
[20,25) hit 5306 times
[25,30) hit 3596 times
[30,35) hit 2581 times
[35,40) hit 1945 times
[40,45) hit 1697 times
[45,50) hit 1333 times
[50,55) hit 1131 times
[55,60) hit 983 times
[60,65) hit 872 times
[65,70) hit 779 times
[70,75) hit 628 times
[75,80) hit 582 times
[80,85) hit 558 times
[85,90) hit 508 times
[90,95) hit 447 times
[95,100) hit 388 times



(and yes; my main desktop is called Earth. My beautiful little silver laptop is named Mercury. When I get the damned thing working, my server will be called Mars. My girlfriend's computer, if I have any say in it, will be called Venus. I'm a geek. So sue me.)

Edit - Brief examination shows mine as slower than smart_idiot's by nearly 40%. But by caching the previous rand() in a static variable instead of calling it twice every time, mine is slower than his by only 12% (0.02 ms difference with one million calls, so who cares).

[Edited by - Wyrframe on June 23, 2005 2:50:44 AM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
int uRand()
{
static int seed = 1;
seed = ( seed * 3039177861 + 1 & 0x7fffffff );

return seed;
}

int nuRand( int nUpper )
{
return (nUpper-1) * 65536 / ( 65536 + uRand() );
}

Share this post


Link to post
Share on other sites
Quote:
Original post by FlowingOoze
if F is your desired distribution function and X is a random variable with uniform distribution [0,1]. Y=F^-1(X) is a random variable with distribution F.
( F^-1 is of course the inverse of F).

As I recall [and its been a few years], F isn't the distribution, its the integral of the distribution.

I like normals, and I dislike reinventing the wheel. Therefore, Brian Flannery et al officially suggest the following method:

double getFloat()
{
return rand()*(1.0/4294967295.0);
}

double getGaussian()
{
double fac,rsq,v1,v2;
do {
v1=2.0*getFloat()-1.0;
v2=2.0*getFloat()-1.0;
rsq=v1*v1+v2*v2;
} while (rsq >= 1.0 || rsq == 0.0);
fac=sqrt(-2.0*log(rsq)/rsq);
return v2*fac;
}

double getGaussian(double mean, double stdev)
{
return getGaussian() * stdev + mean;
}

Share this post


Link to post
Share on other sites
I couldn't figure out a good way to time it, but... here's a simple way, just taking the lower of two random values each time (and caching one random result).


#include <iostream>
#include <cstdlib>
#include <ctime>

int nuRand(const int nUpper) {
static int prev = rand();
int next = rand();
int result = std::min(prev % nUpper, next % nUpper);
prev = next;
return result;
}

const int bincount = 10;
const int binsize = 10;
const int trials = 1000000;

int main() {
srand(time(0));
int results[bincount];
for (int i = 0; i < trials; ++i) {
results[nuRand(bincount * binsize) / binsize]++;
}
for (int i = 0; i < bincount; ++i) {
std::cout << i << ": " << results[i] << std::endl;
}
}




// run 1
0: 189224
1: 170299
2: 149734
3: 130418
4: 110030
5: 90169
6: 70019
7: 50055
8: 29952
9: 10100

// run 2
0: 189702
1: 169299
2: 150861
3: 129924
4: 109899
5: 89904
6: 69998
7: 50351
8: 30203
9: 9859

Share this post


Link to post
Share on other sites
Quote:
Original post by Conner McCloud
I like normals, and I dislike reinventing the wheel.


OK, now massage it into a distribution which meets the spec :)

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
Quote:
Original post by Conner McCloud
I like normals, and I dislike reinventing the wheel.


OK, now massage it into a distribution which meets the spec :)

I was going to, but when i reread the problem statement I saw "The probability curve of the returned values may be anything you wish" and missed the actual function definition.


int nuRand(const int nUpper)
{
int i = abs(int(getGaussian(0, nUpper/5.0)));
return (i < nUpper) ? i : nUpper-1;
}


*edit:
Using your test setup:

0: 383559
1: 299602
2: 184026
3: 87541
4: 32710
5: 9868
6: 2270
7: 356
8: 63
9: 5

If you change the 5.0 to 4.0, you get more of the higher values:

0: 311610
1: 264792
2: 193576
3: 120526
4: 64068
5: 28911
6: 11316
7: 3800
8: 1090
9: 311

I like the 5.0 more.
CM

Share this post


Link to post
Share on other sites
Ok... here goes... this is a modification to my general quadratic residue generator. Since everyone else is using rand() (which, btw, is a 3rd party library) I'm using GMP (mostly because I didn't feel like scaling my numbers down, it could be done, but there's really no reason to)


QRG.H-
#include <gmp.h>

class QRG
{
private:
mpz_t n;
mpz_t seed;
public:
QRG();
unsigned long GetNext();
unsigned long GetNext(int);
unsigned int GetNextBit();
~QRG();
};

QRG.CC-
#include <gmp.h>
#include <stdlib.h>
#include "qrg.h"

QRG::QRG()
{
mpz_t p, q;
mpz_init(p);
mpz_init(q);
mpz_init(n);
mpz_init(seed);

mpz_set_ui(p,2);
mpz_pow_ui(p,p,127);
mpz_sub_ui(p,p,1); // p is the Mersenne prime 2^127-1

mpz_set_ui(q,2);
mpz_pow_ui(q,q,521);
mpz_sub_ui(q,q,1); // q is the Mersenne prime 2^521-1

mpz_mul(n,p,q);

srandomdev(); // Seed the system random number generator from /dev/random

mpz_set_ui(seed,random()); // Use the system random number generator to generate our initial n
mpz_powm_ui(seed,seed,2,n); // Calculate the initial seed value

mpz_clear(p); // Delete p
mpz_clear(q); // and q
}

QRG::~QRG()
{
mpz_clear(n);
mpz_clear(seed);
}

// This function returns a random 32-bit integer
// and then calculates and stores the next seed value
unsigned long QRG::GetNext()
{
unsigned long output = 0;
mpz_t r;
mpz_init(r);

for (int i = 0; i < 32; i++)
{
unsigned long mod = mpz_mod_ui(r,seed,2);
output = output + mod;
if (i < 31)
output = output << 1;
mpz_powm_ui(seed,seed,2,n);
}
mpz_clear(r);
return output;
}

unsigned long QRG::GetNext(int bias)
{
unsigned long output = 0;
mpz_t r;
mpz_init(r);

for (int i = 0; i < 32; i++)
{
bool exiting = false;
unsigned long mod = mpz_mod_ui(r,seed,2);
output = output + mod;
if (i < 31)
output = output << 1;
mpz_powm_ui(seed,seed,2,n);
for (int i = 0; i < bias; i++)
{
if (GetNextBit() == 1)
exiting = true;
}
if (exiting == true)
break;
}
mpz_clear(r);
return output;
}


// This function returns a random 0 or 1
// and then calculates and stores the next seed value
unsigned int QRG::GetNextBit()
{
unsigned int output = 0;
mpz_t r;
mpz_init(r);

output = mpz_mod_ui(r,seed,2);
mpz_powm_ui(seed,seed,2,n);
mpz_clear(r);
return output;
}

TEST.CC-
#include <iostream>
#include "qrg.h"

int main(int argc, char** argv)
{
int trials = atoi(argv[1]);
int bias = atoi(argv[2]);
QRG* qrg = new QRG();
for (int i = 0; i < trials; i++)
std::cout << qrg->GetNext(bias) << std::endl;
delete qrg;
}


Output-
[jperalta@testarossa ~/src]$ ./biased.o 10 0
78967482
3532507987
769619069
1657007137
1493601390
318111018
1070407249
3036123774
2652192994
3264996435
[jperalta@testarossa ~/src]$ ./biased.o 10 1
0
4
12
0
6
0
0
2
2
2
[jperalta@testarossa ~/src]$ ./biased.o 10 2
0
0
0
4
0
0
2
2
2
2
[jperalta@testarossa ~/src]$ ./biased.o 10 3
2
2
0
0
0
0
0
2
0
0

Share this post


Link to post
Share on other sites
Quote:
Original post by Conner McCloud
As I recall [and its been a few years], F isn't the distribution, its the integral of the distribution.

I wasn't sure of the right term either, but distribution function is the integral of the probability function:
http://mathworld.wolfram.com/DistributionFunction.html
So it is the distribution function, which btw was called F on my math courses while f was the probability function.

Share this post


Link to post
Share on other sites
Okay, one of the early xorshift randomizers, which make up the predecessors of the Merseen Twister. Amazingly fast. Uniform distribution. ( http://www.jstatsoft.org/v08/i14/xorshift.pdf )

static unsigned int seeda;
static unsigned int seedb;
unsigned int xorranda()
{
seed^=(seed<<1); seed^=(seed>>3); seed^=(seed<<10);
return seed;
}
unsigned int xorrandb()
{
seed^=(seed<<4); seed^=(seed>>7); seed^=(seed<<19);
return seed;
}
void xorsrand( unsigned int p )
{
seeda=seedb=(p>>16)|(p<<16);
}


Now, I want to assert that somewhere exist a fast interger square root function. I'm going to call it int fastsqrt(int). I don't know an implementation for it, and I don't feel like spending 5 minutes in google.


unsigned int biased_rand()
{
return (((65536 - fastsqrt(xorranda()))<<16)+(xorrandb&65535));
}


So, the probability of 0 through 65535 is equal. The probability of 65536 through 131071 is less than that of 0 through 65535, but equal among it's range. Follows up to 4294901760 through 4294967295, each number in the range is equivalent chance, but the whole range's chance is near impossible.

Now, the follow up.

int nuRand(const int nUpper)
{
return (int) ((float)biased_rand() / 4294967296.0 ) * nUpper;
}


For the sample, nuRand(100), 99 will be near impossible, but 0 will be the most occurant. My solution falls within the range of the given parameters, but shows faults when you test it's upper extremities.

Share this post


Link to post
Share on other sites
Quote:
Original post by jperalta
Since everyone else is using rand() (which, btw, is a 3rd party library)...

No its not. Its just as much part of the language as + or & is.

FlowingOoze: Oh, right. Nevermind [grin]

pragmaFury: That's why you never you never use magic numbers. I copied that straight from my random number class, but switched the actual random number generator call to rand() for simplicity. But I didn't bother changing the to float calculation to account for the new value of RAND_MAX. Oops. Should anybody care, to get that code to work replace the 'return rand()*(1.0/4294967295.0);' in getFloat() with 'return rand()*(1.0/RAND_MAX)'. I didn't submit that to compete, just to illustrate the generation of normal distributions, so don't worry about retrying my solution, I won't feel slighted.

CM

Share this post


Link to post
Share on other sites

This topic is 4561 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this