Selfbalancing random number generation
Started by Kylotan, Aug 09 2001 06:50 PM
36 replies to this topic
#21 Members  Reputation: 122
Posted 20 August 2001  02:44 AM
Okay, maybe I''m just not getting what you''re REALLY wanting here...
We''ll use your example in your reply to me. We keep calling RandPercent(90). Your first call results in a 90% true, 10% false return. I''m good with you on that part.
However, as you mentioned, let''s pretend we get 5 true returns from this. You say the probability is still > 50%. What''s your rationale for this, because I don''t think i see it...
Unless it''s this: at 90%, we get 9 true returns out of every 10. We''ve already got back 5, so that means 4 out of the next 5 must be true, thus the odds should be 80%?
Dragonus
Thought of the Moment
If F = ma, and Work = Fd, then does Work = mad?
We''ll use your example in your reply to me. We keep calling RandPercent(90). Your first call results in a 90% true, 10% false return. I''m good with you on that part.
However, as you mentioned, let''s pretend we get 5 true returns from this. You say the probability is still > 50%. What''s your rationale for this, because I don''t think i see it...
Unless it''s this: at 90%, we get 9 true returns out of every 10. We''ve already got back 5, so that means 4 out of the next 5 must be true, thus the odds should be 80%?
Dragonus
Thought of the Moment
If F = ma, and Work = Fd, then does Work = mad?
#22 Moderators  Reputation: 3786
Posted 20 August 2001  06:02 AM
quote:
Original post by liquiddark
Kylotan, I don''t see how you expect any system to perform much better than a standard PNRG if you don''t provide a "cap" to its behaviour. If you''ve got an openended system, any sufficiently dimensionally independent PNRG (and there are lots) is in fact pretty close to the definition of proving the statistic.
Yeah, given enough samples, which it won''t always get.
Think about the Taylor series or some other approximation method. The series doesn''t know what the real answer you''re looking for is, but you can keep applying it to get better approximations. That is what I figured could be done here: there could be a mechanism that moves in the direction I want it to. You couldn''t guarantee it moves to the exact ''position'', since there is no cap, but it should be able to move in the right direction. Do you see what I''m getting at?
I still don''t think that algorithm takes enough of the data into account. I didn''t understand the "newChance = pctg  ((int)((float)trues/(float)(++totalcalls))  pctg))"  too much in one line for my little brain to follow. But it''s only taking into account previous true/false scores, rather than previous expected scores. Maybe the next paragraph will clarify.
Dragonus: For that 6th call, after retrieving 5 trues, ideally the percentage should be 80%. Because you would expect 9 trues out of 10. A normal PRNG will have a percentage of 90%, since it ignores what''s gone before. Your algorithm however has a 15% chance of returning true on that 6th call. Basically, it overcompensates, because it doesn''t take into account that I called the function with a 90% parameter. Any function that only counts trues and falses returned is only accurate when always called with a 50% parameter since it weights them equally.
#23 Members  Reputation: 122
Posted 20 August 2001  09:04 AM
Well, empirically, the final equation you want will be logistic. After a bit of work on my TI83, I figured out that a model in the forum of
tweaked = original / (1 + A * e^B)
is the type of equation you want, with an implicit logarithm attached. However, I did this based on X number of trials, and coming up with an allcase rule won't be too effective...
I also had some luck with this equation:
Granted, this does tie it down to a specific number of calls. However, you can technically cheat and force your hand, so to speak. Let maxCalls = 100 (since our inputs are 0 to 100). Every time totalCalls == 100, reset totalCalls and trueCalls to 0.
The above equation GUARANTEES beyond all doubt (I think) that you will get the exact number of true returns for the number over 100 calls to the function. For example, let's deal with 90%. If trueReturns == 90, then the numerator == 0 and tweaked == 0. If falseReturns == 10 (e.g., totalCalls == trueReturns + 10), since maxCalls == original + 10, the 10's cancel and you're left with a solid probability of 100% over the calls up until 100.
You can set maxCalls as high as you want, and you can set it extremely high if you want it to be "infinite". The only problem that you might find in doing this is that you won't get the exact results you planned on, but theoretically, you should get pretty close. The reason for this is, when totalCalls << maxCalls, the tweaked probability doesn't vary greatly. Then again, we really don't want it to from the getgo, but towards the end, you won't see the massive probability jump to 0% or 100% like you'd expect if you knew how many calls you were going to make, which has both it's negatives and positives.
I think the 2nd equation is the better of the two, even though it is tied down to the maxCalls variable, but, as I said, there are ways to get around it.
Dragonus
Thought of the Moment
If F = ma, and Work = Fd, then does Work = mad?
Edited by  Dragonus on August 20, 2001 4:07:51 PM
tweaked = original / (1 + A * e^B)
is the type of equation you want, with an implicit logarithm attached. However, I did this based on X number of trials, and coming up with an allcase rule won't be too effective...
I also had some luck with this equation:

Granted, this does tie it down to a specific number of calls. However, you can technically cheat and force your hand, so to speak. Let maxCalls = 100 (since our inputs are 0 to 100). Every time totalCalls == 100, reset totalCalls and trueCalls to 0.
The above equation GUARANTEES beyond all doubt (I think) that you will get the exact number of true returns for the number over 100 calls to the function. For example, let's deal with 90%. If trueReturns == 90, then the numerator == 0 and tweaked == 0. If falseReturns == 10 (e.g., totalCalls == trueReturns + 10), since maxCalls == original + 10, the 10's cancel and you're left with a solid probability of 100% over the calls up until 100.
You can set maxCalls as high as you want, and you can set it extremely high if you want it to be "infinite". The only problem that you might find in doing this is that you won't get the exact results you planned on, but theoretically, you should get pretty close. The reason for this is, when totalCalls << maxCalls, the tweaked probability doesn't vary greatly. Then again, we really don't want it to from the getgo, but towards the end, you won't see the massive probability jump to 0% or 100% like you'd expect if you knew how many calls you were going to make, which has both it's negatives and positives.
I think the 2nd equation is the better of the two, even though it is tied down to the maxCalls variable, but, as I said, there are ways to get around it.
Dragonus
Thought of the Moment
If F = ma, and Work = Fd, then does Work = mad?
Edited by  Dragonus on August 20, 2001 4:07:51 PM
#24 Anonymous Poster_Anonymous Poster_* Guests  Reputation:
Posted 20 August 2001  12:28 PM
quote:
Original post by Kylotan
I didn''t understand the "newChance = pctg  ((int)((float)trues/(float)(++totalcalls))  pctg))"  too much in one line for my little brain to follow.
Basically it says your new percentage is:
[the desired percentage] *minus* [the difference between] [the percentage achieved thus far] and [the desired percentage]
Thus, you get something like this:
Randomchance(80) = 1: next time the tweaked percentage is going to be .80  (1/1  .80) = .80  .2 = .60.
Next generation:
RandomChance(80) = 1: next time the tweaked percentage is going to be .80  (2/2  .80) = .80  .2 = .60
Actually, this scheme isn''t all that great, really, since you''re set on a "squeeze"type solution.
quote:
Dragonus: For that 6th call, after retrieving 5 trues, ideally the percentage should be 80%. Because you would expect 9 trues out of 10. A normal PRNG will have a percentage of 90%, since it ignores what''s gone before. Your algorithm however has a 15% chance of returning true on that 6th call. Basically, it overcompensates, because it doesn''t take into account that I called the function with a 90% parameter. Any function that only counts trues and falses returned is only accurate when always called with a 50% parameter since it weights them equally.
In an openended system this is an unrealistic request. What is your chance of getting a false in six calls? Since you''ve got a 1/10 chance per go, it''s about 6/10. Hence your adjusted percentage is 40%, right?
So I''ll try a derivation of a method to use with this insight:
Firstly, I really need a generator object to talk about this intelligently. Assume the generator has an exposed interface with:
a Generate() function,
a SetWeightsAndValues(vector weights, vector vals)
internally, it should at least have:
an OriginalWeights[] vector
an AdjustedWeights[] vector
a Values[] vector
a NumberofGenerations[] vector to keep track of how many times this value has been generated.
a NumberofGenerates integer to keep track of how many times the Generate() function has been called altogether
Once the weights and values are set, the Generate() function is called. Add the OriginalWeight of amount(A) to the AdjustedWeight of amount(A). If the total POSITIVE (ie, disregarding negative entries) AdjustedWeight provided > 100, normalize the POSITIVE ELEMENTS ONLY of the AdjustedWeights vector. Once a value is generated, subtract 100 from the AdjustedWeight of that value. Let''s go back to trues and falses for a second to give an idea of what I''m thinking here:
say we call
SetWeightsAndValues({99,1}, {0,1})
we get:
Value New probabilities:
1 99100+99 = 98 1+1 = 2
1 98100+99 = 97 2+1 = 2
1 97100+99 = 96 3+1 = 4
. . .
. . .
. . .
1 55100+99 = 54 45+1 = 46
0 54+99 = 153 46100 = 53
1 153100+99 = 152 53+1 = 52
. . .
. . .
. . .
Don''t know if this helps or not. I can prove that it functions very much as you request, and I believe the process extends to other vectorized number generators, but I don''t have a clean proof.
ld
#26 Members  Reputation: 532
Posted 20 August 2001  12:57 PM
Just a little suggestion....say for instance you only store a certain number of past values. Say 50, for example. First you start with your basic algorithm for generating a random number, then store the result in a linked list, queue, whatever. Then build the first 50 numbers by taking the number of past values that are opposite to what you want (In your case this would involve numbers less than .5, greater then .5), and using that number as the probability of generating the value you need on this one. Makes Sense? Then, when you reach 51, 11, etc., you simply discard the first value and add the new value. A linked list would be great for this.
#27 Members  Reputation: 864
Posted 20 August 2001  06:13 PM
Sorry if my wading into this thread so late causes a rehash of old stuff...
Kylotan: You have a problem in your requirements for the system that makes a solution not possible.
By biasing the coin flip (or roulette wheel for continuous variables) so that in the limit as the number of trials approaches a given number N, the apparent bias of the coin approaches the actual bias (i.e., k/n > p(success,t=0); k= number of successes, n = number of trials held so far) means that you no longer have a random process of independant trials. You''re coin flips will be correlated and what you will be producing is the equivalent of a coloured noise process.
However, you state that you desire a system that does not maintain information about prior flips. I don''t see this as being possible. You will need to maintain some information regarding the sequence generated so far. In particular, the coloured noise will be a time varying function (time is the number of trials conducted so far).
Stochastic is best defined as ''containing an element of unpredictability''. More formally, if a variable X can be explained as a (possibly nonlinear) function of a set of other variables, Y, then X is a random (or stochastic) variable if there exists a residual variance in the values of X when Y is held constant.
You would be far better off just to stick with random numbers and maintain a moving window of behaviour that broke streaks of good luck or bad luck by inserting a failure or success respectively.
The easiest way to do this is to reject any coin flip if the sequence of previous flips (over a window of length k) are identical. Simply resample from the distribution (which, if it is binary, means return the desired result). In the limit as the number of trials is large, this insertion will not appreciably harm the statistics of your sequence.
Cheers,
Timkin
Kylotan: You have a problem in your requirements for the system that makes a solution not possible.
By biasing the coin flip (or roulette wheel for continuous variables) so that in the limit as the number of trials approaches a given number N, the apparent bias of the coin approaches the actual bias (i.e., k/n > p(success,t=0); k= number of successes, n = number of trials held so far) means that you no longer have a random process of independant trials. You''re coin flips will be correlated and what you will be producing is the equivalent of a coloured noise process.
However, you state that you desire a system that does not maintain information about prior flips. I don''t see this as being possible. You will need to maintain some information regarding the sequence generated so far. In particular, the coloured noise will be a time varying function (time is the number of trials conducted so far).
quote:
Original post by Kylotan
I have seen the term ''stochastic'' used in some contexts similar to this, but have yet to find an explanation of the term that is both precise and yet not overly complex. Perhaps this is the solution to my problem, but I don''t know.
Stochastic is best defined as ''containing an element of unpredictability''. More formally, if a variable X can be explained as a (possibly nonlinear) function of a set of other variables, Y, then X is a random (or stochastic) variable if there exists a residual variance in the values of X when Y is held constant.
quote:
Original post by Kylotan
(For those who are interested, basically I just want this so that I can have a random number system in a game that allows for freak events, but takes corrective action to reduce the chance of an unlucky streak damaging a player''s chances.)
You would be far better off just to stick with random numbers and maintain a moving window of behaviour that broke streaks of good luck or bad luck by inserting a failure or success respectively.
The easiest way to do this is to reject any coin flip if the sequence of previous flips (over a window of length k) are identical. Simply resample from the distribution (which, if it is binary, means return the desired result). In the limit as the number of trials is large, this insertion will not appreciably harm the statistics of your sequence.
Cheers,
Timkin
#28 Moderators  Reputation: 3786
Posted 21 August 2001  08:08 AM
quote:
Original post by Timkin
However, you state that you desire a system that does not maintain information about prior flips. I don''t see this as being possible. You will need to maintain some information regarding the sequence generated so far. In particular, the coloured noise will be a time varying function (time is the number of trials conducted so far).
The system will not be able to store the previous sequence in its entirety due to implementational limitations. (Finite RAM and all that.) But it could happily store the number of samples, plus the mean of the returned values. Or any other measure of central tendency, or any other value that needs recording. It just can''t record the whole sequence because that will grow infinitely, and is thus impractical.
quote:
You would be far better off just to stick with random numbers and maintain a moving window of behaviour that broke streaks of good luck or bad luck by inserting a failure or success respectively.
The easiest way to do this is to reject any coin flip if the sequence of previous flips (over a window of length k) are identical. Simply resample from the distribution (which, if it is binary, means return the desired result). In the limit as the number of trials is large, this insertion will not appreciably harm the statistics of your sequence.
This is all very well for coin flips. Coin flips are trivial to deal with. However, I am not using coin flips. How do I do something like this if I get 2 true results from 10 calls, when in fact I only expected 1 true result (ie. a 10% chance)?
#31 Moderators  Reputation: 3786
Posted 21 August 2001  03:51 PM
quote:
Original post by Dragonus
Kylotan, did you see my last post?
Well, I don''t quite see how to get around the ''max calls'' problem, for starters. But the other issue is that it assumes the same percentage being passed to it each time. I don''t want to limit the procedure to only working on a given probability: I just want it to know whether it''s been scoring true more than it ''should'' or false more than it ''should'', on input values which can vary.
#32 Members  Reputation: 491
Posted 22 August 2001  12:23 AM
It''s early, I''m tired, so maybe this has already been suggested. Why not generate a number between 0 and 1. A value greater than the number of successes divided by the number of roles would be needed for a success. So if you had ten roles already and six successes you would only have a 40% chance of a success. With a bit of thought this could be expanded out to more than a true/false outcome, but as I said it is too early to think clearly.
#33 Anonymous Poster_Anonymous Poster_* Guests  Reputation:
Posted 22 August 2001  02:31 AM
Sigh.
I don''t think that you have fixed your design requirements firmly enough in your own head. First you complain that a standard PRNG won''t get enough samples, then you state that storing values will ''grow to infinity.''
I think you need to figure out how many calls you really expect this system to get, and when resets will or will not be noticable by the user. Then you''ll really know if you can cap the number of calls or not... and make no mistake, you will either have to cap the number of calls, or stick with one percentage per instance of your generator object.
I don''t think that you have fixed your design requirements firmly enough in your own head. First you complain that a standard PRNG won''t get enough samples, then you state that storing values will ''grow to infinity.''
I think you need to figure out how many calls you really expect this system to get, and when resets will or will not be noticable by the user. Then you''ll really know if you can cap the number of calls or not... and make no mistake, you will either have to cap the number of calls, or stick with one percentage per instance of your generator object.
#34 Members  Reputation: 122
Posted 22 August 2001  02:51 AM
Oops, I goofed a bit on the equation. Here's what it should be...
That's done so that the numerator will be with the same order of magnitude (roughly) as the denominator. (Also assumes original is between 0 and 100. If original is between 0 and 1, i.e., a fraction, replace the 100 with 1.)
~*~*~*~
Actually, Kylotan, it doesn't assume that it will get the same probability every time  I was just using the same probability case as an example:
If you assume maxCalls == 100, your first call to the function (let's say original == 90 the first time), trueReturns == totalCalls == 0, so you get 90%.
Assume you get a true next time you call it 50% being the original percent. trueReturns == totalCalls == 1, so the real probability is (501)/(1001) = 49/99 = 49.5%.
Assume you then get a false and call it at 30%. TrueCalls == 1 and totalCalls == 2. So your chance now is (301)/(1002) = 29.6%. A similar call to 70% would give (701)/(1002) = 70.4%. It will keep the odds you want.
As far as "getting rid of" the maxCalls number, I still think setting it "high enough" would work. All you have to do is make sure you don't call it that many times. If you set it to exactly the number of calls you'll be making, this equation will be exact. However, if you increase the maxCalls number, it won't be perfectly exact, BUT the probabilities will follow the distribution you're looking for.
Given your explanation of the problem (when you specified a number of calls), this is exactly the equation you're looking for. But if you want to make any number of calls to the function, all you have to make sure is that maxCalls is significantly bigger than the actual number of times you intend on calling the function; if you set it "high enough", you'll never run into that problem.
~ Dragonus
Edited by  Dragonus on August 22, 2001 9:56:57 AM

That's done so that the numerator will be with the same order of magnitude (roughly) as the denominator. (Also assumes original is between 0 and 100. If original is between 0 and 1, i.e., a fraction, replace the 100 with 1.)
~*~*~*~
Actually, Kylotan, it doesn't assume that it will get the same probability every time  I was just using the same probability case as an example:
If you assume maxCalls == 100, your first call to the function (let's say original == 90 the first time), trueReturns == totalCalls == 0, so you get 90%.
Assume you get a true next time you call it 50% being the original percent. trueReturns == totalCalls == 1, so the real probability is (501)/(1001) = 49/99 = 49.5%.
Assume you then get a false and call it at 30%. TrueCalls == 1 and totalCalls == 2. So your chance now is (301)/(1002) = 29.6%. A similar call to 70% would give (701)/(1002) = 70.4%. It will keep the odds you want.
As far as "getting rid of" the maxCalls number, I still think setting it "high enough" would work. All you have to do is make sure you don't call it that many times. If you set it to exactly the number of calls you'll be making, this equation will be exact. However, if you increase the maxCalls number, it won't be perfectly exact, BUT the probabilities will follow the distribution you're looking for.
Given your explanation of the problem (when you specified a number of calls), this is exactly the equation you're looking for. But if you want to make any number of calls to the function, all you have to make sure is that maxCalls is significantly bigger than the actual number of times you intend on calling the function; if you set it "high enough", you'll never run into that problem.
~ Dragonus
Edited by  Dragonus on August 22, 2001 9:56:57 AM
#35 Moderators  Reputation: 3786
Posted 22 August 2001  04:47 AM
quote:
Original post by Anonymous Poster
Sigh.
I don't think that you have fixed your design requirements firmly enough in your own head. First you complain that a standard PRNG won't get enough samples, then you state that storing values will 'grow to infinity.'
Where did I say anything about a standard PRNG not getting enough samples? *looks very confused*
EDIT: Oh, I see: you misunderstood what I was referring to. Any random sampling is subject to higher rates of error given smaller samples. A normal PRNG is going to have almost zero error, eventually, but 'eventually' is too late. All I want is a system that can calculate the error (which is theoretically possible since we always know what the 'expected' outcome is, and just how likely it is), and take action to reduce that error by biasing subsequent calls.
quote:
I think you need to figure out how many calls you really expect this system to get.
As many as it needs. There is no fixed amount, nor can there be.
Edited by  Kylotan on August 22, 2001 12:02:04 PM
#36 Moderators  Reputation: 3786
Posted 22 August 2001  05:11 AM
quote:
Original post by Dragonus
But if you want to make any number of calls to the function, all you have to make sure is that maxCalls is significantly bigger than the actual number of times you intend on calling the function; if you set it "high enough", you''ll never run into that problem.
Ok, I''m with you, and this is certainly along the right lines, but: if you make ''maxCalls'' too high, then the balancing effects become minimal.
I feel sure there must be some way of doing this that doesn''t have some fixed bound, because given enough ''errorcorrection'', what has gone previously can be considered to have been adequately compensated for. Am I making sense?
#37 Members  Reputation: 340
Posted 22 August 2001  12:32 PM
The method I (accidentally) doubleposted earlier is guaranteed to work "correctly" from the getgo, has no fixed upper bound, and I''m even surer now than I was at the time that it is extensible to any vector of weighted probabilities. I believe there are a number of ways to address its deficiencies, as well.
ld
ld