Random Arbitrary Precision Numbers

Started by
21 comments, last by iMalc 10 years ago

I would not be tempted to use a while loop, experience says that "if it can go wrong, it will.... at the most inopportune time".

I also don't like reinventing the wheel.
[...]
It explains how they generate large random numbers 20 bits at a time.
This is not the problem, though, and they do an entirely different thing with an entirely different goal in mind. The only "special" thing on that page is that they use a cryptographic hash function as PRNG. That's exactly the same approach that e.g. the Linux kernel uses for /dev/random.

The goal there is to generate a pseudorandom sequence that extracts true random from a pool of true random entropy. This isn't needed in the OP's case (at least, there was no mention of that).

Generating large random numbers 20 bits (it's bytes, by the way, so rather 160 bits, but that does not matter) or in fact any number of bits at a time is trivial. I would use 32 or 64 bits since fast quality PRNGs that output 32/64 bits at a time are readily available, but that's only an implementation detail. Using a 160 bit cryptographic hash works just fine, too (it is however considerably slower).

The point where the problem gets absolutely not trivial is if you need random numbers in a particular range which doesn't correspond to exactly some number of bits.

Say you have a PRNG that produces 8-bit random numbers and you need a 24-bit "big" random number (works the same for 24 bits as it does for 4096 bits). Simply run the generator 3 times, and write out 8 bits every time, no problem. Want only 21 bits? Or 18 bits? No problem. Use logical AND to mask out the bits that you don't want.

Now assume you don't want numbers to range from 0 to 16777215 or another 2n-1 value, but only from 0 to 16772083. Using 23 bits won't do since that would be 8383475 numbers short, but using 24 bits will give you 5132 numbers outside the allowed range.Unluckily, that number doesn't map quite so good to "bits".

What do you do now?

Advertisement

Simply generate random bytes or words (or whatever is appropriate, according to the max value generated by your PRNG) and copy the raw bits into the memory buffer where the arbitrary precision number (or 2048 bit number) lives.



No special treatment necessary other than choosing a quality PRNG.

This topic is quickly veering off course. I'm asking for the best way to use a PRNG with a templated output size to randomize a AP number class with templated digit size. The 2048 bits was an example to throw out the floating point conversion, please pay it no mind. I will eliminate bias later, that is a problem that is impossible to solve if I haven't yet begun to generate random numbers.


In my opinion, if you'd like to keep it as "pure" as possible then all you fundamentally need is a function that generates a random integer of n - uniformly distributed - digits (or bits - which unit you choose is a tradeoff between putting the digit/bit conversion stuff into the random engine or elsewhere in the code) for arbitrary n, which makes the integer uniformly distributed between 0 and D^n - 1 (or 2^n - 1, if bits). At that point your random engine is finished, and you can specialize a distribution class to take those integers and generate any distribution you'd like from them, such as a more specific range, floating-point values, even numbers only, etc, etc.. whatever.

I'm not sure that I follow. The standard random classes return a templated type, requiring significant legwork for anyone to actually make use of it unless they provide their own parameters for that particular use, essentially making their own RNG that can't be used elsewhere. I hope to alleviate this issue by having the RNG class do most of the work.


(btw: have you considered using the Baillie-PSW primality test?

Not yet, I'll look into it once this class is up and running again. Thank you for the suggestion.

Two ways I'm reading "arbitrary precision number", and perhaps a bit of clarification would help others to answer.

1- A known-at-call-time fixed precision. Arbitrary in the sense that you can pass in a desired precision, and get a number back that satisfies that precision.
"Arbitrary" in the polymorphic sense.

2- An unknown-at-call-time variable precision number. Arbitrary in the sense that it grows more digits if your expression requires more digits. This is the more conventional use of "arbitrary precision math" on computers.

If case 1, your answer is clear from above posters. You break it down to the task of generating 1 digit, which can in turn be broken down into the task of generating 1 bit, then repeat as necessary.

If case 2, you're hosed, since the probability of any particular number selected from the space of potentially infinite numbers having less than X digits is effectively zero, for really any fixed value of X. Your computer has fixed memory in which to store the generated number, thus X is defined, thus you're either screwed or you're going to bound your number in some way which renders it no longer "arbitrary-precision".

Two ways I'm reading "arbitrary precision number", and perhaps a bit of clarification would help others to answer.
I think this line from the last post explains:

use a PRNG with a templated output size to randomize a AP number class with templated digit size

So basically, it is exactly as expected, running a simple for(...) loop for the easy case. Still, that leaves the "not so easy" case which was described as "generates a random number modulo a specified number", so basically generating a random number with some upper limit that isn't incidentially a power of two.

And as clarified later:

I'm using this call mostly for the Rabin-Miller Primality Test

an unbiased distribution is needed (so... no modulo).

If case 2, you're hosed, since the probability of any particular number selected from the space of potentially infinite numbers having less than X digits is effectively zero, for really any fixed value of X. Your computer has fixed memory in which to store the generated number, thus X is defined, thus you're either screwed or you're going to bound your number in some way which renders it no longer "arbitrary-precision".

It grows, and there are functions that specify an upper bound. I'm not hosed in any way.

running a simple for(...) loop

No, not a simple for loop. Not only do the types not necessarily match, but who knows if all of the bits are used? For instance, rand() returns an int, often 32 bits, but RAND_MAX can be a 16 bit value, meaning that if you try to guess based on the size of the return type, you'll get garbage if you fail to take into account that not all of the bits are used.

Still, that leaves the "not so easy" case which was described as "generates a random number modulo a specified number", so basically generating a random number with some upper limit that isn't incidentially a power of two.

And as clarified later:

I'm using this call mostly for the Rabin-Miller Primality Test


an unbiased distribution is needed (so... no modulo).

I'm still trying to generate numbers, here. Can we please forget the modulus? I'm starting to become frustrated.


No, not a simple for loop. Not only do the types not necessarily match, but who knows if all of the bits are used? For instance, rand() returns an int, often 32 bits, but RAND_MAX can be a 16 bit value, meaning that if you try to guess based on the size of the return type, you'll get garbage if you fail to take into account that not all of the bits are used.

I fail to see your problem. Nobody said something about guessing. Simply use a random number generator with a known number of valid output bits. This isn't hard, these exist in many flavors (such as MT, to name one well-known generator, but a 64-bit xorshift which is like 3 lines of code would do just fine, too).

So your generator outputs, say, 32 bits, and you need a 1024 bit number? Initialize an int32_t pointer to the beginning of your AB number, and run a for() loop with 32 iterations, writing out one number every time and incrementing the pointer. Need a 2048 bit number? Well, run 64 iterations. Need a 2040 bit number? Run 63 iterations and fill the last 3 bytes either one-by-one, or something similar.

Why doesn't this work?

Technically the random generator should just return an endless stream of uniformly distributed bits, and not any particular word size. From that you can get anything you need by sampling it appropriately. As I said previously, that is what I would do, so that the PRNG itself does not need to deal with digit sizes and is as general as possible, and then have another class (or even a method of the integer class) take that PRNG and use it to generate random digits to produce integers of a given bit length, digit length, inside a range, etc...

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”


Simply use a random number generator with a known number of valid output bits.

Templated. I don't know what the RNG class will be, so I can't control it.


Technically the random generator should just return an endless stream of uniformly distributed bits, and not any particular word size.

However, it returns data in discrete words. I don't even know what data type to use to hold an endless stream of bits; at some point, I must use it piece by piece. Perhaps I am misunderstanding, can you show pseudocode?


However, it returns data in discrete words. I don't even know what data type to use to hold an endless stream of bits; at some point, I must use it piece by piece. Perhaps I am misunderstanding, can you show pseudocode?

It's not a data type, it's an interface. Even something as simple as a method which returns the N next bits in a buffer, but I suppose things like streams and so on can work too. That is the RNG, nothing about integers here yet. I'm still a bit confused, what exactly do you have control over? Can you change the RNG's interface?

Something like this (C++ish, bear with me):


class RNG
{
    // updates the RNG and returns len uniformly distributed bits in buffer
    void next_bits(void *buffer, size_t len);
}

template <size_t N> // digit size (bits? number of symbols in each digit?)
class MyInteger<N>
{
    // generates an integer of length "len" bits (this could also be rand_digits, if it's easier)
    MyInteger<N> rand_bits(RNG &rng, size_t len)
    {
        // build up your integer digit by digit using make_digit below
        // and then process the last digit to pad it to the right length
    }

    // generates an integer between 0 and range exclusive
    MyInteger<N> rand_range(RNG &rng, MyInteger range)
    {
        // use rand_bits to get a sufficiently large integer, and then
        // sample it (rejection sampling, etc...) to convert that to a
        // uniform distribution between 0 and range
    }
}

// helper function to generate templated digits
// (assuming you have a digit type, adjust as needed)

template <size_t N>
private digit make_digit(RNG &rng)
{
    // bit-fiddling using an RNG with the digit size N to generate a uniformly distributed digit
    // for reasonable values of N, this could be as simple as taking the first N bits from the generator verbatim
}

As you can see here the integer class never directly uses the RNG, it just defers to make_digit() to do what is necessary to generate a uniformly distributed digit. Then the RNG interface can change from the one I suggested, to generating bits 64 bits at a time (requiring some modification of make_digit to handle that), and so on. The point is that the bit fiddling conversion stuff happens only in make_digit (which can be specialized, should you care to) - once you're past that, you can generate proper integers with a given digit or bit size, that are uniformly distributed up to some arbitrary range, and you are then free to do whatever you want without being dependent on the RNG.

That's just a back-of-the-napkin design, it can (and probably should) be improved, because I'm sure there are ways to make it better. The "next N bits" RNG interface is about as theoretically pure as it gets, but practically speaking it may be better for most digit sizes for make_digit() to handle them one chunk of bits at a time, so I suppose it depends on your needs. Maybe some of the functions could be moved around as well, I don't know if it's a good idea for the integer class to generate itself randomly, *shrugs*. I'm just giving some ideas here.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

My method has a signature like this:


template <typename digit_>
class BigNumber{
    template <class rng_ = ectara::Random>
    static BigNumber<digit_> genRandomNumber(sizeType bits, rng_ & rand);
};

There's a RNG object of some sort that gets passed in; I haven't decided on its interface. I am trying to figure it out, in the context of generating these digits. All the standard random classes provide is a single unsigned integer return value at a time; should I mandate that these RNG objects have a method that fills a buffer of bytes?

This topic is closed to new replies.

Advertisement