Random Arbitrary Precision Numbers

Started by
21 comments, last by iMalc 10 years, 1 month ago


All the standard random classes provide is a single unsigned integer return value at a time; should I mandate that these RNG objects have a method that fills a buffer of bytes?

Not necessarily. If they already return a single unsigned integer each, you can work with that too.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Advertisement


All the standard random classes provide is a single unsigned integer return value at a time; should I mandate that these RNG objects have a method that fills a buffer of bytes?

Not necessarily. If they already return a single unsigned integer each, you can work with that too.

Then I run into the problem where I don't know what I'm handling. I guess requiring some sort of standard interface is required. To what extent should things be required? Should RNG classes also be expected to provide things like floats in the range [0, 1], and other things?


Then I run into the problem where I don't know what I'm handling. I guess requiring some sort of standard interface is required.

Yes, if you don't know what you're handling then you can't do anything with it wink.png though if you know that they are all unsigned integer types, you might be able to use a compile-time sizeof() to work out how many bits you're getting from the RNG, and using that to piece together a digit, I don't know if such a thing is idiomatic in C++ though.


To what extent should things be required? Should RNG classes also be expected to provide things like floats in the range [0, 1], and other things?

To be perfectly honest there is no convention. Some people (like me) advocate for separating the RNG from the type (distribution) of data generated, so that the RNG provides only a stream of bits (either as a literal stream of bits, or by chunks of 32/64 bits, whichever is easier) and then another class uses that to produce floating point values, another class uses that to produce a normal distribution, another class uses that to produce integers between 0 and N, and so on. Other people prefer to couple the two together, so that you have this big "Random" object that usually doesn't expose its output directly, but is able to convert it to what you need on the fly. It's a trade-off between composability and convenience, but even if you don't separate them in code I feel it is crucially important to understand the difference between a pseudorandom bit generator and a probability distribution.

Both approaches are sufficient in terms of features you need, but of course the code needed to make use of them will not be the same in both cases. At the end of the day, you (so far) only want to generate uniform "big integers" in a specific range. As I've shown in my previous post, you can build this using only a single primitive: "give me N random bits" (or "give me N random 32-bit integers", and so on). That is what your RNG needs to be able to do. There's no way to tell which particular signature is best-suited without knowing more about your code and its potential use cases, but they are all equivalent and sufficient.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”


though if you know that they are all unsigned integer types, you might be able to use a compile-time sizeof() to work out how many bits you're getting from the RNG, and using that to piece together a digit, I don't know if such a thing is idiomatic in C++ though.

Possible, but like the rand() example before, not foolproof. I'm pretty sure that the standard PRNG classes of C++11 allow you to have an instantiation where the return type has more bits than the generated output.

I think I will make the segregation between distribution and generation; it seems like the strongest way to re-use code while still maintaining a perfectly standardized interface. Thank you, I will do that. Thus, internally, the method will generate a distribution object that suits its needs, and the distribution object will handle reading from the RNG and emitting values that fit the distribution.


Some people (like me) advocate for separating the RNG from the type (distribution) of data generated

It took me a little bit to catch on to what you meant, but having a separate object from the RNG that managed the distribution seemed to be key in this instance; the RNG exposed its types and its limits, which the distribution specifically sought: the distribution only needs to know about the RNG, and defines itself based on it and the parameters given to it. The distribution then exposes its types and limits, which have been tuned to meet the requirements of the caller. In this case, its return type is of little consequence, since the parameters used to define the distribution guarantee that it precisely matches the range of the digit.

In this way, each object passes information in one direction (an object doesn't need to know about the object that will be using it), and each is restricted to one responsibility, alleviating the problem where it was unknown how the responsibility should be divided amongst the two objects by introducing a third object in between them. Thank you for the advice, and let me know if there's something that I missed.

one that generates a random number modulo a specified number

Then it isn’t very random. Modulus is absolutely the wrong way to give a range of random numbers from 0 to N, as it favors lower numbers.

This explains why:
http://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator

However do not use a while loop to generate an unbiased range of random numbers as he suggested; by definition that could be an infinite loop, or at least be severely (order of seconds) slow. Imagine that the range of rand() is 0-268,435,456 (which RAND_MAX is allowed to be by the standard) and you looped until it picked a number below 3.

The correct way is:


unsigned int Rand( unsigned int _uiMax ) {
    double dRand = static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    return static_cast<unsigned int>(dRand * (_uiMax - 1.0)); // Remove “ - 1.0” to be inclusive instead of exclusive.
}
L. Spiro

That doesn't produce an even distribution either. That still results in some numbers being more likely than others, but the bias isn't with all the numbers at one end. Rather it might result in every third value being more likely, for example.

It is absolutely not right to call that the "correct" way. It is a way sure, but it by no means a panacea.

The rejection method actually does completely remove bias, not just smear that bias around.

In fact if there were a correct way to do it, it would be the rejection method, which yes involves a while loop. For such a technique to cause an infinite loop, the PRNG you are using would have to be braindead useless, to the point that this attempt at removing bias is futile.

If you do some analysis on how many times a generator is expected to loop around when using the rejection method, you'll find that the number of iterations follows the power law, and realistically the chance of looping much at all is completely insignificant, especially if using a 32-bit generator or larger.

Here is a previous post I have made explaining the issue very clearly:

http://cboard.cprogramming.com/c-programming/145187-how-pick-random-number-between-x-y.html

Ah what the heck, I'm finally starting to write that article on random number generation that I've been meaning to write for a long time. I shall explain everything fully in there, including precisely why the floating point technique used here is actually pretty darn awful. I'll link it in this thread when I'm done.

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

Okay, here is a rough draft of my new article, comments welcome:

http://homepages.ihug.co.nz/~aurora76/Malc/Random_Numbers.htm

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

This topic is closed to new replies.

Advertisement