Is this just stupid?

Started by
9 comments, last by Antheus 16 years, 6 months ago
Hello, I am working on a small random number generator. I have a "People" class and each person gets an "age of death" when they get created. Life expectancy is at 80 so i want most of my people to live to around 80 with the rest dying before 80 or after 80. I have been playing around with statistics and various distribution techniques but i couldnt get it just right. I then came up with the below novel soluton and would like to know if it is just silly or too computationally intensive. Basically, what i have done is set up an array that stores ages between 0 and 100. 0 is stored once, 1 is stored twice, 2 is stored 3 times ect up to 80. Then after 80 it starts declining by 4. So 80 gets stored 81 times 81 gets stored 77 times ect ect down to 100 which gets stored 1 time. So my array is like so: 0, 1, 1, 2, 2, 2, 3, 3, 3, 3........ 80(81 times), 81(77 times), 82(73 times)........99, 99, 99, 99, 99, 100 All up the array is of size 4101. I then get javas Random number to produce a number that is between 0 and 4100 and use that as the index of the array above to return a random number between 0 and 100. I ran it 10000 times and the distribution works out perfectly. I end up with heaps of people dying at around 80 with a gradual incline from zero up to 80 and steep decline from 80 down to 100. Now, my question is, is having an array of size 4101 to big? Is this just a convoluted way of achieving this distribution? Any tips would be great. Thanks P.S If the above was too confusing I have wrote a simplified form below. Lets say we have a pig, dog and cat. We want to select a random animal from this set but we want 3 times more cats than pigs and twice as many dogs than pigs. So its - pig:1 dog:2 cat:3 String[] animals = {pig, dog, dog, cat, cat, cat}; Then you select a random number beween 0 and 5 and you end up with the distrubution you wanted. That was probably more complicated than the first one :P
Advertisement
The array you are using is pretty trivial.

But there should be a math solution also. Give me a little bit and see if I can come up with a math solution.

theTroll
That is one way.

Of course, life expectancy follows natural distribution, so you might want to just use Random.nextGaussian, adjusted for your desired distribution.

Life expectancy doesn't rise linearly. It has a spike at age of 0, rises very slowly after that, and forms a bell curve around expected age of death.

You could simulate this using nextGaussian. If the result is less than 0.5, then you scale the value to [0, d], otherwise to [d, n], where d is expected age of death, and n is maximum age.

double getAoD( double aod, double max ){  double x = random.nextGaussian();  if (x <= 0.5) {    return x * aod * 2; // 0..aod  } else {    return aod + 2 * (x - 0.5) * (max-aod); // aod..max  }};
This should produce skewed bell curve. If you replace nextGaussian with linear function, the result should be similar to your algorithm.

But - table based approach might be just fine.
Quote:Original post by one mind
Now, my question is, is having an array of size 4101 to big? Is this just a convoluted way of achieving this distribution?

It's not too big, but it's wasteful and sure is convoluted.

If you were to generate a random number between 0 and 100, the probability that it lies below 80 is 80%. The probability that it lies above 80 is 20%, or four times less likely than it lying below 80. You can use this simple fact to set cut-off points such that you achieve your desired distribution.

Using your pig, dog and cat example to illustrate, you want three times more cats than pigs and twice as many dogs as pigs - the ratio pig:dog:cat → 1:2:3. You want a multiple of six (1 + 2 + 3) for your random spread, with either the lowest or highest one-sixth representing a pig selection, the next one-third representing a dog selection, and the remainder, a full half, representing cat.

import java.util.Random;...java.util.Random rgen = new java.util.Random();int spread = 120;int cats = 0, dogs = 0, pigs = 0;for(int i = 0; i < 60; ++i) {    int r = rgen.nextInt(spread);    if(r < spread / 6) {        ++pigs;    } else if (r < spread / 2) {        ++dogs;    } else {        +=cats;    }}System.out.printf("For a given 60 animals, %1$ are pigs, %2$ are dogs and %3$ are cats", pigs, dogs, cats);...
Thanks guys.

Yeah, i know its not a very accurate model but it should do for my purposes. All that standard deviation talk was making me nervous when i looked into gaussian distribution :P

There is a math solution, like so for example:

age =
when 0 <= x < 3321:
floor( (sqrt(8x + 1) - 1) / 2 )
when 3321 <= x < 4101:
100 - floor( (sqrt(131204 - 32x) + 2) / 8) )

However, i find using a lookup table to be much less computationally intensive.

I did a test run using 10000 random numbers and graphed the result.

Here is a pic if anyone is interested:

http://img413.imageshack.us/img413/8126/untitledzo9.gif

The top graph is obviously the linear model and the bottom are the actual results.
This one isn't particularly fast (and therefore probably suitable only for pre-generation), but it makes a nice curve.

Generate a random integer between 1 and 2 a total of 53 times and add the results.

You get a curve with an average of 79.5 and a maximum age of 106, though there is only a 1.11 x10^-16 chance, so it's just not going to happen except over the course of billions of number generation sequences.

In other words, roll 53d2. :D

EDIT: I'd go with the lookup table, really. RAM is at much less of a premium than computing power when running games.

[Edited by - caesura on September 26, 2007 10:50:56 AM]
Noticed some new posts while i was writing, reading them now :P
Ok, just read some alternative solutions so I will try them out.

Below is my actual class doing it my way. As you can see, the array is populated in the constructor so it only has to do that once so each time i need a new age i really am only generating a random number and returning an int from the array. It is really fast compared to my square root way.

    class Randomness    {        private int[] weights;        private Random rand;        public Randomness()        {            weights = new int[4101];            rand = new Random();            int count1, count2, count3;            count1 = 0;            count2 = 1;            count3 = 0;            while (count3 < 3320)            {                while (count2 > 0)                {                    weights[count3] = count1;                    count2--;                    count3++;                }                count1++;                count2 = count1 + 1;            }            count3 = 3321;            count1 = 81;            count2 = 77;            for (int temp = 77; count3 < 4101; temp -= 4)            {                while (count2 > 0)                {                    weights[count3] = count1;                    count2--;                    count3++;                }                count1++;                count2 = temp - 4;            }        }        public int getMaxAge()        {            return weights[rand.Next(0, 4101)];        }    }
What you're talking about is a normal distribution. I was going to provide a formula for calculating age-of-death, but I don't know how to calculate the area under the bell curve. (In finite math, we learned how to plot points along the curve, but I've only taken pre-calculus in high school some zillion yeas ago, and I don't remember integration very well, so I don't remember how to calculate the area beneath the curve.) If someone does, you can use this general equation:

z = (x - m) / s

or

x = z * s + m

...where z is the number of standard deviations (calculated using the area under the curve based on your given percentile — in other words, the part I don't know), m is the mean (in your case, 80 years), s is the length of one standard deviation (the values on either side of the mean where 68 percent of deaths occur . . . in your case, maybe +/-20 years?), and x will be the resultant age. Two of those will be constants: m and s.

Let me dig around the web and try to find an area formula.

EDIT: Okay, gonna need some more college for this. Sorry! If your present solution works, you can stick with it until you find something more graceful. I would consider a "working but ugly" bit of code a low-priority fix. You can generate the lookup table procedurally if you want, perhaps using a graphing calculator, to ensure the highest possible quality, but in the end it boils down to whatever works.

EDIT #2: Or you could copy the z-table from a site like this, or from a college text, and use it in your app. It would certainly be faster than generating the area algorithmically, and easier.

[Edited by - Tom on September 26, 2007 11:27:57 AM]

GDNet+. It's only $5 a month. You know you want it.

Quote:Original post by Tom
In finite math, we learned how to plot points along the curve, but I've only taken pre-calculus in high school some zillion yeas ago, and I don't remember integration very well, so I don't remember how to calculate the area beneath the curve.


The integral for this one is "elementary", meaning basically that nothing works to simplify it :) This is why there are Z-tables (generally worked out with numerical integration. Also see 'error function' on Mathworld.

This topic is closed to new replies.

Advertisement