Random Arbitrary Precision Numbers


I have an arbitrary precision number class, and I'm looking for a way to generate numbers in a thread-safe way by having the caller pass in the RNG state. What do you think is the best way to generate a random arbitrary precision number? I have three random functions, one that generates a random number, one that generates a random number modulo a specified number, and one that generates a random number that is a certain number of bits in size. I considered randomly generating each piece of the number separately, or filling it all with random bytes, but I haven't yet decided on what the interface for the RNG state should look like, given that the number class is templated, so each "digit" in the class can vary in size, so the either the RNG class or the method that uses it has to account for this disparity.


What does "generating a random number" mean? Is it an arbitrary precision decimal class? Otherwise, what should the maximum number be? Anyway, each of these can be achieved efficiently by simply generating each digit at random, followed by some post-processing. I don't think having a templated digit size should be a problem - the generation process is much more efficient if the digit size is a power of two, but an algorithm can handle arbitrary digit sizes (and you can always specialize as needed).

So for the interface I'd go with a function like "int<digit_size> rand_range(int<digit_size> n, random_engine engine)" or something like that, along with "rand_bits" that takes the number of bits needed (ideally as a plain unsigned integer, not an arbitrary precision integer).
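To make that concrete, here is a minimal sketch of what such an interface could look like, assuming a templated arbitrary precision class and C++11-style engines (every name below is illustrative, not taken from the actual code):

#include <cstddef>

template <typename Digit>
class big_int;  // your arbitrary precision type (hypothetical name)

// Uniform over [0, 2^bits).
template <typename Digit, typename Engine>
big_int<Digit> rand_bits(std::size_t bits, Engine &engine);

// Uniform over [0, n).
template <typename Digit, typename Engine>
big_int<Digit> rand_range(const big_int<Digit> &n, Engine &engine);

Passing the engine by reference keeps the caller in control of the RNG state, which is what makes the whole thing thread-safe.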

The RNG state itself should IMHO be separate from the way it is used in the code, it's not a "random digit generator", it's a PRNG and so should behave that way, with functions to generate random bits, a random integer, a random double, and so on... (or just random bits, if you opt for separating the underlying random generator from the distribution to generate, which has its advantages). Another alternative is to extend your RNG class with a random arbitrary precision integer function, but I don't really like that approach as it couples the two classes.

I would try and promote interoperability with other RNG states, how were you thinking of proceeding? Is your RNG state custom-made specifically for your class, or were you planning on basing it on something standard (e.g. the C++11 random engines)?

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”


What does "generating a random number" mean? Is it an arbitrary precision decimal class? Otherwise, what should the maximum number be?

I guess I could remove the most general function; it would have generated a number up to as large as allowed by the implementation. If one really wanted this behavior, they could generate a random number with a bit count of either (digit_bits * max_digits) or 2^size_type_bits - 1, whichever is less, ignoring the fact that it'd potentially fail to allocate. The other two have an upper bound defined by either a number that is the divisor, or a number of bits.


I would try and promote interoperability with other RNG states, how were you thinking of proceeding? Is your RNG state custom-made specifically for your class, or were you planning on basing it on something standard (e.g. the C++11 random engines)?

I was kind of torn between an approach where each RNG class has a unified interface, with functions to return an integer, return a normalized float, fill an array of characters... etc., and an approach where the class could have a templated get function, which could have specializations to efficiently return valid values of a given type, and fallback to writing over the value with random characters. Both have pros and cons.

The standard approach is clearly the most flexible, but also the least practical; it generally returns a single value at a time, which needs a large amount of processing and shuffling before it can be used, and that falls short when you need a large, complicated value, or a lot of such values. Worse, everything is templated everywhere, so the digit type could be larger or smaller than the RNG's generated value, requiring more complicated code to fill the digits.

Hence, why I'm leaning toward the first two mentioned approaches. Ideally, I'd like this RNG class to have the ability to be used with other classes, with ease. If possible, I'd like for the majority of the bit fiddling to be in the RNG's implementation, rather than the caller having to know too much about what will come out of it. Do you have any suggestions?

one that generates a random number modulo a specified number

Then it isn’t very random. Modulus is absolutely the wrong way to produce random numbers in a range from 0 to N, as it favors lower numbers (for example, with a generator returning 0..7, taking the result modulo 3 yields 0 and 1 three times each, but 2 only twice).

This explains why:
http://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator

However, do not use a while loop to generate an unbiased range of random numbers as he suggested; by definition that could be an infinite loop, or at least be severely (on the order of seconds) slow. Imagine that the range of rand() is 0 to 268,435,456 (which RAND_MAX is allowed to be by the standard) and you looped until it picked a number below 3.

The correct way is:

#include <cstdlib> // rand(), RAND_MAX

unsigned int Rand( unsigned int _uiMax ) {
    // Scale rand() into [0, 1], then up to the requested range.
    double dRand = static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    return static_cast<unsigned int>(dRand * (_uiMax - 1.0)); // Remove " - 1.0" to be inclusive instead of exclusive.
}


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


Then it isn’t very random.

Ordinarily, I'd agree, but how do I cast a 2048 bit number to a double?


Then it isn’t very random.

Ordinarily, I'd agree, but how do I cast a 2048 bit number to a double?

You don't. You basically do what BlueRaja suggested in the comments of that stackoverflow question, which is an improved form of rejection sampling that effectively does the floating-point division using integers. By the way, I didn't mention that because when you said "modulo" I thought you meant "generating integers in the range 0..n - 1" without referring to any specific generation algorithm. Obviously just taking the modulo tends to have a slight bias, unfortunately.
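For a fixed-width integer, the idea looks roughly like this - a hedged sketch, not BlueRaja's exact code, assuming the engine produces uniform 32-bit values over the full range (std::mt19937 qualifies):

#include <cstdint>

template <typename Engine>
uint32_t bounded_rand(Engine &engine, uint32_t n) {  // n must be nonzero
    const uint64_t range = uint64_t(1) << 32;
    const uint64_t limit = range - (range % n);  // largest multiple of n that fits in 32 bits
    uint64_t x;
    do {
        x = engine();          // reject values in the biased tail [limit, 2^32)
    } while (x >= limit);
    return static_cast<uint32_t>(x % n);
}

Every residue 0..n - 1 then occurs exactly limit / n times, so the result is unbiased, and the loop runs fewer than two iterations on average.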

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

I'm using this call mostly for the Rabin-Miller Primality Test; is a uniform distribution actually required for this task?

Moreover, I understand what skews the uniformity of a random number generator. I'm just looking to write an interface, here.


I'm using this call mostly for the Rabin-Miller Primality Test; is a uniform distribution actually required for this task?

Yes. If the base is not chosen uniformly then the worst-case bounds of the primality test no longer hold. As usual, in practice a very small bias is unlikely to make a difference in the operation of the program, but if you can get it right, you should, because the bias will screw you over eventually (it actually gets quite measurable in some cases). But, yeah, let's get this thread back on track.

In my opinion, if you'd like to keep it as "pure" as possible, then all you fundamentally need is a function that generates a uniformly distributed random integer of n digits (or bits - which unit you choose is a tradeoff between putting the digit/bit conversion into the random engine or elsewhere in the code) for arbitrary n, which makes the integer uniformly distributed between 0 and D^n - 1 (or 2^n - 1, for bits). At that point your random engine is finished, and you can specialize a distribution class to take those integers and generate any distribution you'd like from them: a more specific range, floating-point values, even numbers only, whatever.
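A minimal sketch of that transitional function, assuming the big integer is stored as a vector of 32-bit digits, least significant digit first (the storage layout and the names are assumptions, not the actual class):

#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Uniform over [0, 2^bits): fill whole digits with random bits, then mask off the excess.
std::vector<uint32_t> rand_bits(std::size_t bits, std::mt19937 &engine) {
    const std::size_t digit_bits = 32;
    std::vector<uint32_t> digits((bits + digit_bits - 1) / digit_bits);
    for (auto &d : digits)
        d = static_cast<uint32_t>(engine());  // one full random digit at a time
    const std::size_t excess = digits.size() * digit_bits - bits;
    if (excess != 0)
        digits.back() &= (uint32_t(1) << (digit_bits - excess)) - 1;  // clear the top 'excess' bits
    return digits;
}

Everything else - ranges, floating-point values, even numbers only - can then be layered on top as distributions, just as <random> does.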

So that kind of builds on top of the design of <random>, and in fact with some work it might be possible to implement your class such that it can be directly dropped into existing <random> classes so you can seamlessly leverage the functionality that's already there, but I don't know enough to tell you how to do that. And of course it's not the only approach, but it's the one I find the most logical from a design point of view (that you should try to make your integer class behave like an integer, such that you only need to implement a transitional function that generates the simplest uniform distribution possible from a bitstring, at which point all the bit-fiddling is over and you can use existing numerical methods to transform that distribution into anything you need).


I guess I could remove the most general function; it would have generated a number up to as large as allowed by the implementation. If one really wanted this behavior, they could generate a random number with a bit count of either (digit_bits * max_digits) or 2^size_type_bits - 1, whichever is less, ignoring the fact that it'd potentially fail to allocate. The other two have an upper bound defined by either a number that is the divisor, or a number of bits.

Yes, I think you should remove that function. Arbitrary precision arithmetic suggests there is no upper bound (besides available memory) and conceptually it doesn't make much sense to speak of generating a random number on an unbounded interval. The n-digit/bit version I mentioned previously is general enough.

(btw: have you considered using the Baillie-PSW primality test? it's a bit of an unsung hero, but in practice it tends to be much more efficient computationally than Miller-Rabin, though it rests somewhat on conjecture mathematically speaking - anyway, just throwing it out there)

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Two of your three functions are easy, luckily.

Generating a full 2048-bit random number isn't particularly hard, nor is generating a N-bit number. Simply generate random bytes or words (or whatever is appropriate, according to the max value generated by your PRNG) and copy the raw bits into the memory buffer where the arbitrary precision number (or 2048 bit number) lives.
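As a sketch of that idea (the buffer layout is an assumption, not tied to any particular number class), filling raw digit storage from a 64-bit engine might look like:

#include <cstdint>
#include <cstring>
#include <random>

// Fill an existing buffer with raw random bits, one 64-bit word at a time.
void fill_random(void *buffer, std::size_t bytes, std::mt19937_64 &engine) {
    auto *out = static_cast<unsigned char *>(buffer);
    while (bytes >= sizeof(uint64_t)) {
        uint64_t word = engine();
        std::memcpy(out, &word, sizeof(word));  // copy a full word
        out += sizeof(word);
        bytes -= sizeof(word);
    }
    if (bytes != 0) {                           // trailing partial word
        uint64_t word = engine();
        std::memcpy(out, &word, bytes);
    }
}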

No special treatment necessary other than choosing a quality PRNG.

Then it isn’t very random.

Ordinarily, I'd agree, but how do I cast a 2048 bit number to a double?

The only valid approach is to use the while-loop technique: generate numbers and reject the ones that don't fit. As much as you may hate that idea, if bias is not acceptable, there's hardly a way around it.

Turning a double into a 2048 bit number works of course, but is an even worse approach than modulo. Modulo will bias your distribution, but it is at least mostly pseudo-random (actually it's not just "mostly" pseudo-random, it is pseudo-random, only biased), and it contains all possible numbers.

You only have at most 64 bits of information in a double, so if you derive a 2048-bit number from it, roughly 97% of the bits in that number will be non-random, no matter what you try (multiply, hash, whatever... the entropy isn't getting any bigger).

That's 1984 bits unused, so only about one in 10^597 possible numbers can be generated (read as: you are using 0% of the possible numbers). For pretty much every application, that's catastrophic.

In order to make the while loop a bit less prone to long runtimes, you should first determine how many bits you need, and use your generate-N-bits random function rather than the full-range function. That way, the worst-case likelihood that a number must be rejected drops to about 50% (not less, unfortunately, since 50% of the numbers with N bits fall into the range of the highest order bit, so in the worst case, if you chose a limit of 2^N + 1, you will reject 49.999% of your numbers -- nasty, but still much better than rejecting 99.999%).
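In code, the generate-N-bits-then-reject idea might look like this (a sketch assuming a hypothetical BigInt type with comparison and subtraction, plus a bit_length() helper and a rand_bits() function in the spirit of the one sketched earlier in the thread):

#include <cstddef>

template <typename BigInt, typename Engine>
BigInt rand_below(const BigInt &n, Engine &engine) {  // uniform over [0, n), n > 0
    const std::size_t bits = bit_length(n - 1);       // just enough bits to represent n - 1
    BigInt candidate;
    do {
        candidate = rand_bits(bits, engine);          // worst case rejects just under 50% of draws
    } while (candidate >= n);
    return candidate;
}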

Not much more you can do about it unless it is acceptable to have a bias. If that is acceptable, you could just clear the highest bit if the generated number is above the threshold, or you could only regenerate the most significant NUM_WORD_BITS bits and try again, or you could subtract a pseudorandom constant. All of these will still produce "mostly random" values, but will be just as badly biased (or even worse!) as using a modulo would be.

I would not be tempted to use a while loop; experience says that "if it can go wrong, it will... at the most inopportune time".

I also don't like reinventing the wheel.

Have a look at this page.

https://www.openssl.org/docs/crypto/rand.html

It explains how they generate large random numbers 20 bits at a time.

I would be tempted to at least try using OpenSSL and see what happens.

