Simple and fast random number generator

Started by
11 comments, last by LorenzoGatti 10 years, 10 months ago

Hey everyone,

I was not impressed with the quality, speed, or usability of the default C rand() function. So I took the best non-cryptographic RNG I know of and implemented it.

Here it is for your enjoyment! Usage is straightforward, and speed varies between 5 and 12 times as fast as rand(), depending on the situation (inlining, register pressure, etc.). Memory usage is low: only 16 bytes per RNG (you usually only need one).


#include <cstdint>
#include <ctime>


class FastRNG {
    /* Implementation of the Tyche-i RNG by Samuel Neves and Filipe Araujo.
       Period is expected to be 2^127, with a very good uniform distribution. */


public:
    /* If no seed is passed to the constructor it will use the current time
       in seconds as the seed. */
    FastRNG(uint64_t seed = time(0)) {
        a = uint32_t(seed >> 32);         // high 32 bits of the seed
        b = uint32_t(seed & 0xffffffffu); // low 32 bits of the seed
        c = 2654435769u;
        d = 1367130551u;


        for (int i = 0; i < 20; ++i) {
            randint();
        }
    }


    /* Returns a random integer between 0 and 2^32 - 1, inclusive. */
    uint32_t randint() {
        b = rotl32(b, 25) ^ c;
        d = rotl32(d, 24) ^ a;
        c -= d;
        a -= b;
        b = rotl32(b, 20) ^ c;
        d = rotl32(d, 16) ^ a;
        c -= d;
        a -= b;


        return b;
    }


    /* Returns a number between min and max, inclusive. Small statistical
       biases might occur using this method - expected bias is < 2^-32. */
    int32_t randrange(int32_t min, int32_t max) {
        if (min > max) {
            int32_t t = min;
            min = max;
            max = t;
        }

        /* Width and offset are computed in unsigned 64-bit arithmetic to
           avoid signed overflow on very wide ranges. */
        uint64_t range = uint64_t(uint32_t(max) - uint32_t(min)) + 1;
        return int32_t(uint32_t(min) + uint32_t((randint() * range) >> 32));
    }


    /* Returns a random double in [0.0, 1.0); 1.0 itself is never produced. */
    double rand() {
        return randint() * (1. / 4294967296.);
    }


private:
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint32_t d;


    static inline uint32_t rotl32(uint32_t x, const int n) {
        return (x << n) | (x >> (32 - n));
    }
};
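As an aside, the `randint() * range >> 32` trick inside randrange() is worth isolating, since it is what keeps the bias below 2^-32 without a division or modulo. A standalone sketch (the function name is mine, not from the class):

```cpp
#include <cstdint>

// Maps a uniform 32-bit value r into [0, range) without the usual modulo
// bias of r % range: the 64-bit product r * range partitions the 2^32
// possible inputs into range buckets, and the high 32 bits select one.
uint32_t map_to_range(uint32_t r, uint32_t range) {
    return uint32_t((uint64_t(r) * range) >> 32);
}
```

Each output value is hit by either floor(2^32 / range) or ceil(2^32 / range) distinct inputs, which is where the tiny residual bias mentioned in the comment above comes from.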

I did an implementation of a linear congruential generator. It's at least twice as fast as rand() on my machine, and the period is 2^63 instead of the 2^32 of the glibc implementation.

It's also reversible, meaning you have both int next() and int prev().

Tyche-i also seems to be reversible; how does its quality compare to an LCG?

My reversible LCG is on GitHub


#ifndef RLCG_HPP
#define RLCG_HPP

#include <cstdint>

namespace rlcg {

namespace details {

template<class T>
constexpr bool isPowerOfTwo(T x){
    return (x & (x - 1)) == 0;
}

//constexpr implementation of the extended Euclidean algorithm
constexpr uint64_t extendedEuclidY(uint64_t a, uint64_t b);
constexpr uint64_t extendedEuclidX(uint64_t a, uint64_t b){
    return (b==0) ? 1 : extendedEuclidY(b, a - b * (a / b));
}
constexpr uint64_t extendedEuclidY(uint64_t a, uint64_t b){
    return (b==0) ? 0 : extendedEuclidX(b, a - b * (a / b)) - (a / b) * extendedEuclidY(b, a - b * (a / b));
}

//modulus M, multiplicand A, increment C, least significant bits to discard D
template<uint64_t M = 1ull << 63, uint64_t A = 6364136223846793005, uint64_t C = 1442695040888963407, uint64_t D = 32>
class ReversibleLCG {
    static_assert(isPowerOfTwo(M), "M is not a power of two as it should be");
    uint64_t x;
public:
    ReversibleLCG(unsigned int seed) : x(seed){}
    unsigned int next() {
        x = (A * x + C) & (M - 1);
        return x >> D;
    }
    unsigned int prev() {
        const uint64_t ainverse = extendedEuclidX(A, M); //don't worry, this is computed at compile time
        x = ainverse * (x - C) & (M - 1);
        return x >> D;
    }
    unsigned int max() const {
        return (M - 1) >> D;
    }
};

} // end namespace details

using ReversibleLCG = details::ReversibleLCG<>;

} // end namespace rlcg

#endif // RLCG_HPP
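The forward/backward symmetry is easy to check in isolation. Here is a self-contained sketch of the same idea with M = 2^64 (so the mask disappears into ordinary wraparound); the constants match the defaults above, but the modular inverse is computed with a Newton iteration instead of the extended Euclid:

```cpp
#include <cstdint>

const uint64_t A = 6364136223846793005ull;
const uint64_t C = 1442695040888963407ull;

// Modular inverse of an odd number mod 2^64 via Newton iteration:
// x = a is already correct to 3 bits (a * a == 1 mod 8 for odd a),
// and each step doubles the number of correct low bits, so 5 steps
// cover all 64 bits.
uint64_t inverse_mod_2_64(uint64_t a) {
    uint64_t x = a;
    for (int i = 0; i < 5; ++i)
        x *= 2 - a * x;
    return x;
}

uint64_t lcg_next(uint64_t x) { return A * x + C; }                     // step forward
uint64_t lcg_prev(uint64_t x) { return inverse_mod_2_64(A) * (x - C); } // exactly undoes next
```

Because A is odd, the inverse always exists, and prev(next(x)) == x for every state.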

I have a quite fast, excellent quality crypto-based PRNG that I always use for my OpenCL projects (it lends itself well to SIMD and MIMD architectures, even both at the same time). I might as well post it here; this is OpenCL code but should be readily convertible to whatever language you use (it relies on vector types). The code below is rather barebones because it was designed to be used inside a GPU kernel, and it is a bit redundant because the "wrapper" is unfinished:


#pragma once

/** @file prng.cl
  * @brief Kernel PRNG implementation.
**/

/** This macro converts a vector of 64-bit unsigned integers to floats in [0..1).
  * Multiplying by 1/2^64 (rather than dividing by 2^64 - 1) keeps 1.0 itself
  * out of the output range. **/
#define TO_FLOAT(x) (convert_float4(x) * (1.0f / 18446744073709551616.0f))

/** This indicates how many rounds are to be used for the pseudorandom permutation
  * function, higher means greater quality but at a higher computational cost,
  * 3 recommended at a minimum, 9 is more than enough.
**/
#define ROUNDS 4

/* This is a 256-bit -> 256-bit keyed permutation function. */
/** Keyed permutation function.
  * @param state The internal state.
  * @param seed The PRNG's seed.
  * @returns A 256-bit pseudorandom output.
**/
void renew(ulong4 *state, constant ulong4 *seed)
{
    /* Retain the PRNG's state. */
    ulong4 block = *state + *seed;

    #pragma unroll
    for (ulong t = 0; t < ROUNDS; t++)
    {
        /* Round 2t + 0 (×4 mix & permutation). */
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(14, 16));
        block.hi ^= block.lo; block = block.xywz;
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(52, 57));
        block.hi ^= block.lo; block = block.xywz;
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(23, 40));
        block.hi ^= block.lo; block = block.xywz;
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)( 5, 37));
        block.hi ^= block.lo; block = block.xywz;

        /* Key addition. */
        block += *seed;

        /* Round 2t + 1 (×4 mix & permutation). */
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(25, 33));
        block.hi ^= block.lo; block = block.xywz;
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(46, 12));
        block.hi ^= block.lo; block = block.xywz;
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(58, 22));
        block.hi ^= block.lo; block = block.xywz;
        block.lo += block.hi; block.hi = rotate(block.hi, (ulong2)(32, 32));
        block.hi ^= block.lo; block = block.xywz;

        /* Key addition. */
        block += *seed;
    }
}

/** @struct PRNG
  * @brief PRNG internal state.
  *
  * This structure contains an instance of PRNG, which is enough information to
  * generate essentially infinitely many unbiased pseudorandom numbers.
**/
typedef struct PRNG
{
    /** @brief The 256-bit internal state. **/
    ulong4 state;
    /** @brief A pointer to the PRNG's seed, common to all instances. **/
    constant ulong4 *seed;
} PRNG;

/** This function creates a new PRNG instance, and initializes it to zero.
  * @param ID The ID to create the PRNG instance with, must be unique.
  * @param seed A pointer to the PRNG's seed.
  * @returns The PRNG instance, ready for use.
**/
PRNG init(ulong ID, constant ulong4 *seed)
{
    PRNG instance;
    instance.state = (ulong4)(ID);
    instance.seed = seed;
    return instance;
}

/** This function returns a vector of uniform pseudorandom numbers in [0..1).
  * @param prng A pointer to the PRNG instance to use.
  * @returns A vector of unbiased uniform pseudorandom numbers in [0, 1).
**/
float4 rand(PRNG *prng)
{
    renew(&prng->state, prng->seed);
    return TO_FLOAT(prng->state);
}

The word size and vector width (here, 64 bits and 4) can be tweaked easily. I actually have a WIP document (including better code) on this that I may turn into a Gamedev article if I have time, but it's not ready yet. Two rounds is far better than an LCG and mighty fast, three rounds is pretty much all you'll need, four if you want to be certain (it depends a bit on the other parameters). Anything above that is overkill, really.

Do not use this for cryptographic purposes. The design may be derived from a cipher, but... just don't.

It is also reversible (there is an inverse permutation). Period is expected to be 2^255 on average for this particular set of parameters, in general 2^(n - 1) where n is the internal state size in bits.

I think I was not the first to use this particular cipher (Threefish) as a performance-oriented PRNG, and I know people have been playing with Serpent and Twofish for a while, but I like having things I can rely on in any situation and that I can actually understand. This is one of them.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

I just wanted to point out that using a state as small as 16 bytes is more a liability than a feature. In my limited experience writing PRNGs and hash functions, a large state is one of the easiest ways to avoid some common pitfalls.

Have you guys tried something like the dieharder test suite to evaluate the quality of the pseudo-random numbers generated by your code?

What's wrong with well-known algorithms such as (complementary) Multiply-With-Carry, or Xor-Shift? Both are roughly 20 years old, and both, incidentally, are from the same author. Simple arithmetic operations, and few of them, and you can lag them any way you want if a period of 2^64 - 1 isn't enough for you (though in my opinion that's plenty; maybe not if you're doing particle systems that simulate global weather, but for a game it is).

Unless one needs a cryptographic RNG, I see no real reason to use anything more complicated or to even try inventing something better. The difference, if there is any, will probably not be observable.
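For reference, Marsaglia's plain 64-bit xorshift is about as few operations as a generator can get; this sketch uses the (13, 7, 17) shift triple from his paper:

```cpp
#include <cstdint>

// Marsaglia xorshift64: three shifts and three xors per draw.
// Any nonzero seed works; the state never reaches zero, and the
// period over nonzero states is 2^64 - 1.
uint64_t xorshift64(uint64_t &state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}
```

It can be lagged or combined with other generators for longer periods, as mentioned above.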

I have often used small RNGs for things like hashing functions.

say, for example:

seed=seed*4093+1;

val=(seed>>12)&4095;

or:

seed=seed*65521+1;

val=(seed>>16)&65535;

or, ...

while the period isn't necessarily great, the level of "randomness" is typically pretty usable, and when used as the basis for something like a hashing function, the deterministic behavior is pretty useful. it is also pretty fast as well.

say, for hashing a string:

s=str; h=0;

while(*s)h=(h*4093)+(*s++);
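Fleshed out into a compilable function (the 4093 multiplier is from the snippet above; the signature and name are my own):

```cpp
#include <cstdint>

// Multiplicative string hash as sketched above: h = h * 4093 + next byte.
// Deterministic and cheap; one multiply and one add per character.
uint32_t hash4093(const char *s) {
    uint32_t h = 0;
    while (*s)
        h = h * 4093u + (unsigned char)(*s++);
    return h;
}
```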

elsewhere I have an RNG that mostly just works by multiplying a large (IIRC 1024-bit) value by a very large number and adding a constant (with the high-order bits used as output); it is continuously seeded via an "entropy miner" which feeds in delta-values from the RDTSC instruction (these tend to be fairly noisy).

this is mostly used as a TRNG. I don't know if it is "cryptographically secure" or anything, but it seems to work reasonably well for generating UUID-style numbers, as well as for things like reseeding smaller/faster PRNGs.

But just why do people care so much about periods? There is no difference between a 2^64 period and a 2^64000 period. Assuming your computer runs at 2 GHz and consumes one random number every cycle (rather unlikely!), it can run for 292 years on a 2^64 period.

How many people do you know who live 292 years? How many people do you know who can remember what numbers they saw 292 years ago?
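The 292-year figure checks out; a quick sketch of the arithmetic (assuming exactly 2 GHz and one draw per cycle):

```cpp
// Years needed to exhaust a 2^64 period at one draw per 2 GHz cycle.
double years_to_exhaust() {
    double period = 18446744073709551616.0;    // 2^64
    double draws_per_second = 2.0e9;           // 2 GHz, one draw per cycle
    double seconds = period / draws_per_second;
    return seconds / (365.25 * 24.0 * 3600.0); // ~292 years
}
```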

But just why do people care so much about periods? There is no difference between a 2^64 period and a 2^64000 period. Assuming your computer runs at 2 GHz and consumes one random number every cycle (rather unlikely!), it can run for 292 years on a 2^64 period.

How many people do you know who live 292 years? How many people do you know who can remember what numbers they saw 292 years ago?

for most typical uses, a period of 2^32 or 2^64 is fine...

for a lot of PRNG algorithms, though, the average period is considerably smaller than the seed size suggests (for example, a 64-bit seed might only give a 32-bit average period, a 32-bit PRNG only a 16-bit period, and so on).

an overly short period may also interfere with some use-cases.

...

the solution then is to use a bigger value and get a larger period, granted, past a certain point it may not matter much, and be offset by the added costs of doing arithmetic with very large numbers or similar.

so, a lot depends on what is "good enough" for the given use-case.

the main other situations are things like cryptography, where you want to be fairly certain that your numbers are actually pretty random (such that people trying to crack things will be less likely to be able to guess them...), and things like UUID values, where you generally want to have at least some confidence that the randomly generated ID number isn't being used anywhere else (meaning the RNG state needs to be considerably larger than the UUID being generated, and also seeded with reasonably unique values).

for example, a 128-bit UUID would be a bit of a fail if a given implementation could only produce in-total 2^32 unique UUIDs.

and, if this happened for cryptography, then the key could likely be broken in under 1 second.

so, a lot has also has to do with the use-case...

I just wanted to point out that using a state as small as 16 bytes is more a liability than a feature. In my limited experience writing PRNGs and hash functions, a large state is one of the easiest ways to avoid some common pitfalls.

Have you guys tried something like the dieharder test suite to evaluate the quality of the pseudo-random numbers generated by your code?

I have done extensive testing with dieharder; in fact I was trying to find the minimum number of rounds needed. It turns out three rounds pass the dieharder tests with no problems. I did not invent the internal design myself: this is Bruce Schneier's cipher. I just simplified it slightly for use as a PRNG, rewrote it with vector instructions, and did some checking to ensure I didn't break anything. Inventing a PRNG from scratch is just asking for trouble, imho, but deriving one from an existing, proven construction is a hobby of mine.

But just why do people care so much about periods? There is no difference between a 2^64 period and a 2^64000 period. Assuming your computer runs at 2 GHz and consumes one random number every cycle (rather unlikely!), it can run for 292 years on a 2^64 period.

I don't really understand that either, to be honest. This is also why I dislike the MT. It just looks over-engineered with its 2^19937 - 1 period; I mean, who cares? If anything, having an exact, predictable period is more of a failure than a feature, since true random number generators do not exhibit regular periodicity. But it doesn't matter anyway, because nobody will ever reach said period. 16 bytes of internal state is cutting it a bit close if your PRNG is particularly bad, but 32 bytes is more than enough.

And then you see people using MT for CUDA.. good grief, what a waste of memory.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

This beats what you have

What's wrong with well-known algorithms such as (complementary) Multiply-With-Carry, or Xor-Shift? Both are roughly 20 years old, and both, incidentally, are from the same author. Simple arithmetic operations, and few of them, and you can lag them any way you want if a period of 2^64 - 1 isn't enough for you (though in my opinion that's plenty; maybe not if you're doing particle systems that simulate global weather, but for a game it is).

Unless one needs a cryptographic RNG, I see no real reason to use anything more complicated or to even try inventing something better. The difference, if there is any, will probably not be observable.

QFT!

XorShift beats what I see here in speed (i.e. fewer operations) and cycle length (2^128 - 1), and it passes the diehard tests.

http://en.wikipedia.org/wiki/Xorshift
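The 2^128 - 1 variant referred to here is Marsaglia's xor128: four 32-bit words of state and a handful of shifts and xors per draw. A sketch, seeded with the constants from his 2003 paper:

```cpp
#include <cstdint>

// Marsaglia's xor128 generator: 16 bytes of state, period 2^128 - 1
// over the nonzero states.
struct Xorshift128 {
    uint32_t x = 123456789u, y = 362436069u, z = 521288629u, w = 88675123u;
    uint32_t next() {
        uint32_t t = x ^ (x << 11);
        x = y; y = z; z = w;                // shift the state words down
        w = w ^ (w >> 19) ^ t ^ (t >> 8);   // mix the oldest word back in
        return w;
    }
};
```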

A reversible LCG is certainly an interesting toy, but programs don't tend to run in reverse. ;)

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

This topic is closed to new replies.
