• Create Account

## Random Arbitrary Precision Numbers

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

26 replies to this topic

### #1Ectara  Members

3097
Like
0Likes
Like

Posted 16 March 2014 - 10:11 PM

I have an arbitrary precision number class, and I'm looking for a way to generate numbers in a thread-safe way by having the caller pass in the RNG state. What do you think is the best way to generate a random arbitrary precision number? I have three random functions, one that generates a random number, one that generates a random number modulo a specified number, and one that generates a random number that is a certain number of bits in size. I considered randomly generating each piece of the number separately, or filling it all with random bytes, but I haven't yet decided on what the interface for the RNG state should look like, given that the number class is templated, so each "digit" in the class can vary in size, so the either the RNG class or the method that uses it has to account for this disparity.

Edited by Ectara, 16 March 2014 - 11:40 PM.

### #2Bacterius  Members

13100
Like
3Likes
Like

Posted 16 March 2014 - 10:27 PM

What does "generating a random number" mean? Is it a arbitrary precision decimal class, otherwise what should the maximum number be? Anyway, each of these can be achieved efficiently by simply generating each digit at random followed by some post-processing. I don't think having a templated digit size should be a problem - the generation process is much more efficient if the digits are powers of two, but I think an algorithm can handle arbitrary digit sizes (and you can always specialize as needed).

So for the interface I'd go with a function like "int<digit_size> rand_range(int<digit_size> n, random_engine engine)" or something like that, along with "rand_bits" that takes the number of bits needed (ideally as a plain unsigned integer, not an arbitrary precision integer).

The RNG state itself should IMHO be separate from the way it is used in the code, it's not a "random digit generator", it's a PRNG and so should behave that way, with functions to generate random bits, a random integer, a random double, and so on... (or just random bits, if you opt for separating the underlying random generator from the distribution to generate, which has its advantages). Another alternative is to extend your RNG class with a random arbitrary precision integer function, but I don't really like that approach as it couples the two classes.

I would try and promote interoperability with other RNG states, how were you thinking of proceeding? Is your RNG state custom-made specifically for your class, or were you planning on basing it on something standard (e.g. the C++11 random engines)?

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

### #3Ectara  Members

3097
Like
0Likes
Like

Posted 16 March 2014 - 11:01 PM

What does "generating a random number" mean? Is it a arbitrary precision decimal class, otherwise what should the maximum number be?

I guess I could remove the most general function; it would have generated a number up to as large as allowed by the implementation. If one really wanted this behavior, they could generate a random number with a bit count of either (digit_bits * max_digits) or 2^size_type_bits - 1, whichever is less, ignoring the fact that it'd potentially fail to allocate. The other two have an upper bound defined by either a number that is the divisor, or a number of bits.

I would try and promote interoperability with other RNG states, how were you thinking of proceeding? Is your RNG state custom-made specifically for your class, or were you planning on basing it on something standard (e.g. the C++11 random engines)?

I was kind of torn between an approach where each RNG class has a unified interface, with functions to return an integer, return a normalized float, fill an array of characters... etc., and an approach where the class could have a templated get function, which could have specializations to efficiently return valid values of a given type, and fallback to writing over the value with random characters. Both have pros and cons.

The standard approach is clearly the most flexible, but also the least practical; it provides values in a way that need a large amount of processing and shuffling in order to be used, by generally returning a single value, which falls short when you need a large, complicated value, or a lot of such values. Worse is the fact that all of the types are templated everywhere, so there's the fact that the digit type could be larger or smaller than the RNG's generated value, requiring more complicated code to fill the values.

Hence, why I'm leaning toward the first two mentioned approaches. Ideally, I'd like this RNG class to have the ability to be used with other classes, with ease. If possible, I'd like for the majority of the bit fiddling to be in the RNG's implementation, rather than the caller having to know too much about what will come out of it. Do you have any suggestions?

### #4L. Spiro  Members

24826
Like
1Likes
Like

Posted 16 March 2014 - 11:33 PM

one that generates a random number modulo a specified number

Then it isn’t very random. Modulus is absolutely the wrong way to give a range of random numbers from 0 to N, as it favors lower numbers.

This explains why:
http://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator

However do not use a while loop to generate an unbiased range of random numbers as he suggested; by definition that could be an infinite loop, or at least be severely (order of seconds) slow. Imagine that the range of rand() is 0-268,435,456 (which RAND_MAX is allowed to be by the standard) and you looped until it picked a number below 3.

The correct way is:

unsigned int Rand( unsigned int _uiMax ) {
double dRand = static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
return static_cast<unsigned int>(dRand * (_uiMax - 1.0)); // Remove “ - 1.0” to be inclusive instead of exclusive.
}

L. Spiro

Edited by L. Spiro, 17 March 2014 - 07:44 AM.

### #5Ectara  Members

3097
Like
0Likes
Like

Posted 16 March 2014 - 11:39 PM

Then it isn’t very random.

Ordinarily, I'd agree, but how do I cast a 2048 bit number to a double?

Edited by Ectara, 16 March 2014 - 11:40 PM.

### #6Bacterius  Members

13100
Like
0Likes
Like

Posted 16 March 2014 - 11:47 PM

Then it isn’t very random.

Ordinarily, I'd agree, but how do I cast a 2048 bit number to a double?

You don't. You basically do what BlueRaja suggested in the comments of that stackoverflow question, which is an improved form of rejection sampling which sort of does the floating-point division using integers. Btw, I didn't mention that because when you said "modulo" I thought you meant "generating integers in the range 0..n - 1" without referring to any specific generation algorithm. Obviously just taking the modulo tends to have a slight bias, unfortunately.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

### #7Ectara  Members

3097
Like
0Likes
Like

Posted 17 March 2014 - 12:00 AM

I'm using this call mostly for the Rabin-Miller Primality Test; is a uniform distribution actually required for this task?

Moreover, I understand what skews the uniformity of a random number generator. I'm just looking to write an interface, here.

### #8Bacterius  Members

13100
Like
4Likes
Like

Posted 17 March 2014 - 12:40 AM

I'm using this call mostly for the Rabin-Miller Primality Test; is a uniform distribution actually required for this task?

Yes. If the base is not chosen uniformly then the worst-case bounds of the primality test no longer hold. Though as usual, in practice, a very small bias is unlikely to make a difference in the operation of the program, but if you can get it right, you should, because it will screw you over eventually (the bias actually gets quite measurable in some cases). But, yeah, let's get this thread back on its rails.

In my opinion, if you'd like to keep it as "pure" as possible then all you fundamentally need is a function that generates a random integer of n - uniformly distributed - digits (or bits - which unit you choose is a tradeoff between putting the digit/bit conversion stuff into the random engine or elsewhere in the code) for arbitrary n, which makes the integer uniformly distributed between 0 and D^n - 1 (or 2^n - 1, if bits). At that point your random engine is finished, and you can specialize a distribution class to take those integers and generate any distribution you'd like from them, such as a more specific range, floating-point values, even numbers only, etc, etc.. whatever.

So that kind of builds on top of the design of <random>, and in fact with some work it might be possible to implement your class such that it can be directly dropped into existing <random> classes so you can seamlessly leverage the functionality that's already there, but I don't know enough to tell you how to do that. And of course it's not the only approach, but it's the one I find the most logical from a design point of view (that you should try to make your integer class behave like an integer, such that you only need to implement a transitional function that generates the simplest uniform distribution possible from a bitstring, at which point all the bit-fiddling is over and you can use existing numerical methods to transform that distribution into anything you need).

I guess I could remove the most general function; it would have generated a number up to as large as allowed by the implementation. If one really wanted this behavior, they could generate a random number with a bit count of either (digit_bits * max_digits) or 2^size_type_bits - 1, whichever is less, ignoring the fact that it'd potentially fail to allocate. The other two have an upper bound defined by either a number that is the divisor, or a number of bits.

Yes, I think you should remove that function. Arbitrary precision arithmetic suggests there is no upper bound (besides available memory) and conceptually it doesn't make much sense to speak of generating a random number on an unbounded interval. The n-digit/bit version I mentioned previously is general enough.

(btw: have you considered using the Baillie-PSW primality test? it's a bit of the unsung hero but tends to be much more efficient than Miller-Rabin computationally speaking from a practical perspective, though it is somewhat of a conjecture mathematically speaking - anyway, just throwing it out there)

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

### #9samoth  Members

8927
Like
0Likes
Like

Posted 17 March 2014 - 07:38 AM

Two of your three functions are easly, luckily.

Generating a full 2048-bit random number isn't particularly hard, nor is generating a N-bit number. Simply generate random bytes or words (or whatever is appropriate, according to the max value generated by your PRNG) and copy the raw bits into the memory buffer where the arbitrary precision number (or 2048 bit number) lives.

No special treatment necessary other than choosing a quality PRNG.

Then it isn’t very random.

Ordinarily, I'd agree, but how do I cast a 2048 bit number to a double?

The only valid approach is to use the while-loop technique. Generate numbers and reject the ones that don't fit. As much as you may hate that idea, but if bias is not acceptable, there's hardly a way around.

Turning a double into a 2048 bit number works of course, but is an even worse approach than modulo. Modulo will bias your distribution, but it is at least mostly pseudo-random (actually it's not just "mostly" pseudo-random, it is pseudo-random, only biased), and it contains all possible numbers.

You only have 64 bits of information in a double, so deriving a 2048 bit number from that, 96.8% of the bits in that number will be non-random, no matter what you try (multiply, hash, whatever... the entropy isn't getting any bigger).

That's 1984 bits unused, so only about one in 10597 possible numbers can be generated (read as: you are using 0% of the possible numbers). For pretty much every application, that's catastrophic.

In order to make the while loop a bit less prone to having infinite runtime, you should first determine how many bits you need, and use your generate-N-bits random function rather than the full-range function. That way, the worst case likelihood that a number must be rejected can be reduced by about 50% (not more, unluckily, since 50% of the numbers with N bits fall into the range of the highest order bit, so in the worst case, if you chose a limit of 2n+1, you will reject 49.999% of your numbers -- nasty, but that's still much better than rejecting 99.999%).

Not much more you can do about it unless it is acceptable to have a bias. If that is acceptable, you could just clear the highest bit if the generated number is above the threshold, or you could only regenerate the most significant NUM_WORD_BITS bits and try again, or you could subtract a pseudorandom constant. All of these will still produce "mostly random" values, but will be just as badly biased (or even worse!) as using a modulo would be.

Edited by samoth, 17 March 2014 - 07:44 AM.

### #10Stainless  Members

1875
Like
0Likes
Like

Posted 17 March 2014 - 08:57 AM

I would not be tempted to use a while loop, experience says that "if it can go wrong, it will.... at the most inopportune time".

I also don't like reinventing the wheel.

https://www.openssl.org/docs/crypto/rand.html

It explains how they generate large random numbers 20 bits at a time.

I would be tempted to at least try using OpenSSL and see what happens.

### #11samoth  Members

8927
Like
0Likes
Like

Posted 17 March 2014 - 09:33 AM

I would not be tempted to use a while loop, experience says that "if it can go wrong, it will.... at the most inopportune time".

I also don't like reinventing the wheel.
[...]
It explains how they generate large random numbers 20 bits at a time.
This is not the problem, though, and they do an entirely different thing with an entirely different goal in mind. The only "special" thing on that page is that they use a cryptographic hash function as PRNG. That's exactly the same approach that e.g. the Linux kernel uses for /dev/random.

The goal there is to generate a pseudorandom sequence that extracts true random from a pool of true random entropy. This isn't needed in the OP's case (at least, there was no mention of that).

Generating large random numbers 20 bits (it's bytes, by the way, so rather 160 bits, but that does not matter) or in fact any number of bits at a time is trivial. I would use 32 or 64 bits since fast quality PRNGs that output 32/64 bits at a time are readily available, but that's only an implementation detail. Using a 160 bit cryptographic hash works just fine, too (it is however considerably slower).

The point where the problem gets absolutely not trivial is if you need random numbers in a particular range which doesn't correspond to exactly some number of bits.

Say you have a PRNG that produces 8-bit random numbers and you need a 24-bit "big" random number (works the same for 24 bits as it does for 4096 bits). Simply run the generator 3 times, and write out 8 bits every time, no problem. Want only 21 bits? Or 18 bits? No problem. Use logical AND to mask out the bits that you don't want.

Now assume you don't want numbers to range from 0 to 16777215 or another 2n-1 value, but only from 0 to 16772083. Using 23 bits won't do since that would be 8383475 numbers short, but using 24 bits will give you 5132 numbers outside the allowed range.Unluckily, that number doesn't map quite so good to "bits".

What do you do now?

### #12Ectara  Members

3097
Like
0Likes
Like

Posted 17 March 2014 - 03:37 PM

Simply generate random bytes or words (or whatever is appropriate, according to the max value generated by your PRNG) and copy the raw bits into the memory buffer where the arbitrary precision number (or 2048 bit number) lives.

No special treatment necessary other than choosing a quality PRNG.

This topic is quickly veering off course. I'm asking for the best way to use a PRNG with a templated output size to randomize a AP number class with templated digit size. The 2048 bits was an example to throw out the floating point conversion, please pay it no mind. I will eliminate bias later, that is a problem that is impossible to solve if I haven't yet begun to generate random numbers.

In my opinion, if you'd like to keep it as "pure" as possible then all you fundamentally need is a function that generates a random integer of n - uniformly distributed - digits (or bits - which unit you choose is a tradeoff between putting the digit/bit conversion stuff into the random engine or elsewhere in the code) for arbitrary n, which makes the integer uniformly distributed between 0 and D^n - 1 (or 2^n - 1, if bits). At that point your random engine is finished, and you can specialize a distribution class to take those integers and generate any distribution you'd like from them, such as a more specific range, floating-point values, even numbers only, etc, etc.. whatever.

I'm not sure that I follow. The standard random classes return a templated type, requiring significant legwork for anyone to actually make use of it unless they provide their own parameters for that particular use, essentially making their own RNG that can't be used elsewhere. I hope to alleviate this issue by having the RNG class do most of the work.

(btw: have you considered using the Baillie-PSW primality test?

Not yet, I'll look into it once this class is up and running again. Thank you for the suggestion.

Edited by Ectara, 17 March 2014 - 03:42 PM.

### #13Oolala  Members

1102
Like
0Likes
Like

Posted 19 March 2014 - 07:12 AM

Two ways I'm reading "arbitrary precision number", and perhaps a bit of clarification would help others to answer.

1- A known-at-call-time fixed precision.  Arbitrary in the sense that you can pass in a desired precision, and get a number back that satisfies that precision.
"Arbitrary" in the polymorphic sense.

2- An unknown-at-call-time variable precision number.  Arbitrary in the sense that it grows more digits if your expression requires more digits.  This is the more conventional use of "arbitrary precision math" on computers.

If case 1, your answer is clear from above posters.  You break it down to the task of generating 1 digit, which can in turn be broken down into the task of generating 1 bit, then repeat as necessary.

If case 2, you're hosed, since the probability of any particular number selected from the space of potentially infinite numbers having less than X digits is effectively zero, for really any fixed value of X.  Your computer has fixed memory in which to store the generated number, thus X is defined, thus you're either screwed or you're going to bound your number in some way which renders it no longer "arbitrary-precision".

### #14samoth  Members

8927
Like
0Likes
Like

Posted 19 March 2014 - 09:30 AM

Two ways I'm reading "arbitrary precision number", and perhaps a bit of clarification would help others to answer.
I think this line from the last post explains:

use a PRNG with a templated output size to randomize a AP number class with templated digit size

So basically, it is exactly as expected, running a simple for(...) loop for the easy case. Still, that leaves the "not so easy" case which was described as "generates a random number modulo a specified number", so basically generating a random number with some upper limit that isn't incidentially a power of two.

And as clarified later:

I'm using this call mostly for the Rabin-Miller Primality Test

an unbiased distribution is needed (so... no modulo).

### #15Ectara  Members

3097
Like
0Likes
Like

Posted 19 March 2014 - 05:51 PM

If case 2, you're hosed, since the probability of any particular number selected from the space of potentially infinite numbers having less than X digits is effectively zero, for really any fixed value of X. Your computer has fixed memory in which to store the generated number, thus X is defined, thus you're either screwed or you're going to bound your number in some way which renders it no longer "arbitrary-precision".

It grows, and there are functions that specify an upper bound. I'm not hosed in any way.

running a simple for(...) loop

No, not a simple for loop. Not only do the types not necessarily match, but who knows if all of the bits are used? For instance, rand() returns an int, often 32 bits, but RAND_MAX can be a 16 bit value, meaning that if you try to guess based on the size of the return type, you'll get garbage if you fail to take into account that not all of the bits are used.

Still, that leaves the "not so easy" case which was described as "generates a random number modulo a specified number", so basically generating a random number with some upper limit that isn't incidentially a power of two.

And as clarified later:

I'm using this call mostly for the Rabin-Miller Primality Test

an unbiased distribution is needed (so... no modulo).

I'm still trying to generate numbers, here. Can we please forget the modulus? I'm starting to become frustrated.

### #16samoth  Members

8927
Like
0Likes
Like

Posted 19 March 2014 - 06:33 PM

No, not a simple for loop. Not only do the types not necessarily match, but who knows if all of the bits are used? For instance, rand() returns an int, often 32 bits, but RAND_MAX can be a 16 bit value, meaning that if you try to guess based on the size of the return type, you'll get garbage if you fail to take into account that not all of the bits are used.

I fail to see your problem. Nobody said something about guessing. Simply use a random number generator with a known number of valid output bits. This isn't hard, these exist in many flavors (such as MT, to name one well-known generator, but a 64-bit xorshift which is like 3 lines of code would do just fine, too).

So your generator outputs, say, 32 bits, and you need a 1024 bit number? Initialize an int32_t pointer to the beginning of your AB number, and run a for() loop with 32 iterations, writing out one number every time and incrementing the pointer. Need a 2048 bit number? Well, run 64 iterations. Need a 2040 bit number? Run 63 iterations and fill the last 3 bytes either one-by-one, or something similar.

Why doesn't this work?

Edited by samoth, 19 March 2014 - 06:34 PM.

### #17Bacterius  Members

13100
Like
0Likes
Like

Posted 19 March 2014 - 06:41 PM

Technically the random generator should just return an endless stream of uniformly distributed bits, and not any particular word size. From that you can get anything you need by sampling it appropriately. As I said previously, that is what I would do, so that the PRNG itself does not need to deal with digit sizes and is as general as possible, and then have another class (or even a method of the integer class) take that PRNG and use it to generate random digits to produce integers of a given bit length, digit length, inside a range, etc...

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

### #18Ectara  Members

3097
Like
0Likes
Like

Posted 19 March 2014 - 06:50 PM

Simply use a random number generator with a known number of valid output bits.

Templated. I don't know what the RNG class will be, so I can't control it.

Technically the random generator should just return an endless stream of uniformly distributed bits, and not any particular word size.

However, it returns data in discrete words. I don't even know what data type to use to hold an endless stream of bits; at some point, I must use it piece by piece. Perhaps I am misunderstanding, can you show pseudocode?

### #19Bacterius  Members

13100
Like
0Likes
Like

Posted 19 March 2014 - 07:14 PM

However, it returns data in discrete words. I don't even know what data type to use to hold an endless stream of bits; at some point, I must use it piece by piece. Perhaps I am misunderstanding, can you show pseudocode?

It's not a data type, it's an interface. Even something as simple as a method which returns the N next bits in a buffer, but I suppose things like streams and so on can work too. That is the RNG, nothing about integers here yet. I'm still a bit confused, what exactly do you have control over? Can you change the RNG's interface?

Something like this (C++ish, bear with me):

class RNG
{
// updates the RNG and returns len uniformly distributed bits in buffer
void next_bits(void *buffer, size_t len);
}

template <size_t N> // digit size (bits? number of symbols in each digit?)
class MyInteger<N>
{
// generates an integer of length "len" bits (this could also be rand_digits, if it's easier)
MyInteger<N> rand_bits(RNG &rng, size_t len)
{
// build up your integer digit by digit using make_digit below
// and then process the last digit to pad it to the right length
}

// generates an integer between 0 and range exclusive
MyInteger<N> rand_range(RNG &rng, MyInteger range)
{
// use rand_bits to get a sufficiently large integer, and then
// sample it (rejection sampling, etc...) to convert that to a
// uniform distribution between 0 and range
}
}

// helper function to generate templated digits
// (assuming you have a digit type, adjust as needed)

template <size_t N>
private digit make_digit(RNG &rng)
{
// bit-fiddling using an RNG with the digit size N to generate a uniformly distributed digit
// for reasonable values of N, this could be as simple as taking the first N bits from the generator verbatim
}


As you can see here the integer class never directly uses the RNG, it just defers to make_digit() to do what is necessary to generate a uniformly distributed digit. Then the RNG interface can change from the one I suggested, to generating bits 64 bits at a time (requiring some modification of make_digit to handle that), and so on. The point is that the bit fiddling conversion stuff happens only in make_digit (which can be specialized, should you care to) - once you're past that, you can generate proper integers with a given digit or bit size, that are uniformly distributed up to some arbitrary range, and you are then free to do whatever you want without being dependent on the RNG.

That's just a back-of-the-napkin design, it can (and probably should) be improved, because I'm sure there are ways to make it better. The "next N bits" RNG interface is about as theoretically pure as it gets, but practically speaking it may be better for most digit sizes for make_digit() to handle them one chunk of bits at a time, so I suppose it depends on your needs. Maybe some of the functions could be moved around as well, I don't know if it's a good idea for the integer class to generate itself randomly, *shrugs*. I'm just giving some ideas here.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

### #20Ectara  Members

3097
Like
0Likes
Like

Posted 19 March 2014 - 08:34 PM

My method has a signature like this:

template <typename digit_>
class BigNumber{
template <class rng_ = ectara::Random>
static BigNumber<digit_> genRandomNumber(sizeType bits, rng_ & rand);
};


There's a RNG object of some sort that gets passed in; I haven't decided on its interface. I am trying to figure it out, in the context of generating these digits. All the standard random classes provide is a single unsigned integer return value at a time; should I mandate that these RNG objects have a method that fills a buffer of bytes?

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.