
Links to Pseudo-random Theory?



#21 Bacterius   Crossbones+   -  Reputation: 8884


Posted 28 April 2012 - 07:59 PM

1) How fast, relatively is addition and subtraction compared to and/or/xor?

Same speed.

How fast, relatively is multiplication, division, modulus compared to addition and subtraction?

Multiplication:
- depends on the processor; it can be almost as fast as addition or up to 30 times slower. If you multiply by a constant, the compiler can emulate it with shifts and additions, so it would be roughly 3-4x slower. If you multiply by a power of two, it's as fast as addition.
Division:
- if the divisor isn't a power of two, SLOW: up to 50 times as slow as addition (if you divide by a constant, there are ways to make it faster, but it's quite difficult and often not worth it). If the divisor is a power of two, it's as fast as addition.
Modulus:
- if the modulus is a power of two, as fast as addition. Otherwise, same as division. (Note that if you need both the quotient and the remainder, you can get them in one shot, which saves some time.)
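
To make the power-of-two case concrete, here is a quick C++ sketch of the equivalences the compiler exploits (illustration only; the value of x is arbitrary):

#include <cassert>
#include <cstdint>
#include <iostream>

int main()
{
    uint32_t x = 1234567u;

    // For unsigned integers and a power-of-two divisor 2^k,
    // division is a right shift and modulus is a bit mask.
    assert(x / 8 == x >> 3);        // divide by 2^3
    assert(x % 8 == (x & 7));       // remainder mod 2^3

    // Multiplying by a power of two is a left shift.
    assert(x * 16 == x << 4);

    std::cout << "all checks passed\n";
    return 0;
}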


1.5) Could someone ball park a number of bit operations that would be acceptable to meet my goals for speed?
I'm building the RNG entirely out of and, or, xor, not, <<, >>... so I need to know a good maximum number of these I'm allowed to have.

You want to make sure the PRNG is actually good before optimizing it for speed, but in general you should aim for at most, say, 20 cycles per byte for decent speed. This is computed by taking the number of cycles your PRNG takes (on average) to calculate N bytes of data and dividing that number by N. Quick chart of cycle costs (this is just a back-of-the-envelope calculation, as CPUs are good at rearranging instructions to maximize efficiency and do things in parallel):
- addition, subtraction, logical operations: 1 cycle
- multiplication: 4 cycles
- division: 25 cycles (1 cycle if power of two)
- modulus: 25 cycles (1 cycle if power of two)

So if your PRNG outputs one 32-bit number (4 bytes) at a time, it has a budget of at most 80 cycles per output, which represents, for instance, 20 additions, 20 subtractions, five multiplications and a bunch of logical operations.
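
For a concrete point of reference, Marsaglia's xorshift32 produces a 32-bit output from just three shifts and three xors, far under that 80-cycle budget. A minimal sketch:

#include <cstdint>
#include <iostream>

// Marsaglia's xorshift32: three shifts and three xors per 32-bit output.
// The state must be seeded with a non-zero value.
static uint32_t xorshift32(uint32_t& state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

int main()
{
    uint32_t state = 2463534242u;   // any non-zero seed works
    for (int i = 0; i < 5; ++i)
        std::cout << xorshift32(state) << '\n';
    return 0;
}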

2) Are two bitshifts as fast as one of equal size?
Like is { a << 5 } faster than { a << 2; a << 3; } or is it done one at a time and thus equally fast?

The speed is the same regardless of the shift amount (so doing it twice is twice as slow as doing it once directly, unless your compiler optimizes it away). Bit rotation is also as fast as a shift (a rotation is a permutation too, and permutations are almost instant in hardware).

Also remember these operations aren't the only ones that exist. There is a very powerful concept called permutation (bit rotations are a subset of this, and since they're fast they're usually the best choice) and another called substitution (look-up tables: somewhat slow if not used properly, but they provide extremely good diffusion).
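
For illustration, here is what a rotation and a table-based substitution look like in C++; this is only a sketch of the two concepts, and the S-box contents here are an arbitrary placeholder rather than a real design:

#include <cstdint>
#include <iostream>

// Rotate a 32-bit word left by r bits (assumes 0 < r < 32 to avoid
// the undefined shift-by-32 case).
static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

// Substitution: push each byte of the word through a look-up table.
// A real design would use a carefully chosen permutation of 0..255.
static uint8_t sbox[256];

static uint32_t substitute(uint32_t x)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i)
        out |= uint32_t(sbox[(x >> (8 * i)) & 0xFF]) << (8 * i);
    return out;
}

int main()
{
    for (int i = 0; i < 256; ++i)
        sbox[i] = uint8_t(255 - i);          // placeholder permutation
    uint32_t v = 0x12345678u;
    std::cout << std::hex << rotl32(v, 7) << ' ' << substitute(v) << '\n';
    return 0;
}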

*** This is assuming a modern/half-modern x86 processor, it will not be true for specialized or embedded hardware, in which division or multiplication may not even be available at all ***

Edited by Bacterius, 28 April 2012 - 08:02 PM.

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis



#22 Álvaro   Crossbones+   -  Reputation: 13318


Posted 28 April 2012 - 08:05 PM

I want my RNG to be nearly as fast as a congruential generator like:
{ R = ((aR + b) mod c) }.
It doesn't have to be as lightweight or as good, of course.

1) How fast, relatively is addition and subtraction compared to and/or/xor?


They are all equally [and very] fast.

How fast, relatively is multiplication, division, modulus compared to addition and subtraction?

Multiplication is a bit slower (say, 3 cycles instead of 1, or something of that order), and division and modulus are significantly slower. However, if you are dividing or taking modulus with respect to a constant, the compiler might be able to optimize it by converting it into other operations.

EDIT: Of course, multiplying by a power of 2, dividing by a power of 2 and taking a number modulo a power of 2 are all very fast, since they can be implemented with trivial bit operations.

1.5) Could someone ball park a number of bit operations that would be acceptable to meet my goals for speed?
I'm building the RNG entirely out of and, or, xor, not, <<, >>... so I need to know a good maximum number of these I'm allowed to have.

This depends entirely on how slow you are willing to make your function. If you can't put that into a number, I can't give you a number of operations.

2) Are two bitshifts as fast as one of equal size?
Like is { a << 5 } faster than { a << 2; a << 3; } or is it done one at a time and thus equally fast?

Probably, but I think you are worrying about the performance of something that isn't a real concern. Why would you write it as "shift 2, then shift 3" instead of "shift 5"?

Thoughts?

Yes: Any intuition for what's fast and what's slow is bound to be obsolete in a few years, as hardware changes. So instead of trying to think of what's fast and what's slow, test your program. Ask your PRNG to generate 10 million numbers and see how long it takes. You'll get a good sense of what operations slow things down by playing with that setup.
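
A minimal C++ timing harness along those lines; the xorshift32 here is only a stand-in for whatever generator you are measuring:

#include <chrono>
#include <cstdint>
#include <iostream>

static uint32_t state = 123456789u;          // any non-zero seed

static uint32_t next_random()                // stand-in PRNG (xorshift32)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

int main()
{
    const int N = 10000000;                  // 10 million numbers
    uint32_t sink = 0;                       // keeps the compiler from removing the loop

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        sink ^= next_random();
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    std::cout << "sink = " << sink << "\n"
              << N << " numbers in " << seconds << " s ("
              << (N / seconds) / 1e6 << " million numbers/s)\n";
    return 0;
}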

Edited by alvaro, 28 April 2012 - 08:07 PM.


#23 Blind Radish   Members   -  Reputation: 355


Posted 28 April 2012 - 10:51 PM

Thanks guys, I was just wondering what it looks like down at that level, ya know for personal growth and everything.

I should easily be able to come up with something random in around 80 simple operations.
I was thinking the limit might be 30, so this is actually really good news.

#24 Blind Radish   Members   -  Reputation: 355


Posted 28 April 2012 - 10:59 PM

Thanks so much guys, you all have been so helpful!

I should easily be able to come up with something random in around 80 simple operations.
I was thinking the limit might be 30, so this is awesome news.

Why would you write it as "shift 2, then shift 3" instead of "shift 5"?.


Mainly I was just wondering what it looks like down at that level, ya know for personal growth and everything.
But I was planning on scraping a byte by using an extra shift after a xor. It's complicated lol, but I'd do anything to avoid modulus cycles.


1.67) Oh one more question I guess - are if/else branches 1 cycle as well or are those more complex?
My gut tells me they are bad for RNG lol.

Edited by Expert Novice, 28 April 2012 - 11:32 PM.


#25 Bacterius   Crossbones+   -  Reputation: 8884


Posted 28 April 2012 - 11:42 PM

1.67) Oh one more question I guess - are if/else branches 1 cycle as well or are those more complex?
My gut tells me they are bad for RNG lol.

Yeah, they aren't that slow, but they are very unpopular because they make the runtime non-deterministic, which is quite undesirable (they also make the PRNG hell to analyze, because conditionals are tricky to formalize mathematically), so branching is not usually a desirable feature.


Thanks so much guys, you all have been so helpful!

I should easily be able to come up with something random in around 80 simple operations.
I was thinking the limit might be 30, so this is awesome news.

Note my 80 simple operations assumes you are outputting 32 bits at a time. Many PRNG's kind of bypass this by maintaining massive internal states (think Mersenne Twister, or even just a 256-bit internal state), which allows them to simultaneously maximize parallelism (good!) and increase their instruction margin (even better). It also helps to work on the biggest word size your computer can handle (usually 32 or 64 bits), because it's more efficient.
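
As a sketch of the bigger-state idea, Marsaglia's xorshift128 keeps 128 bits of state and still spends only a handful of shift/xor operations per 32-bit output (the seed constants below are the ones from his paper, nothing special about them):

#include <cstdint>
#include <iostream>

// Marsaglia's xorshift128: 128 bits of state, 32 bits out per call.
// The state must not be all zeroes.
struct Xorshift128
{
    uint32_t x = 123456789u, y = 362436069u, z = 521288629u, w = 88675123u;

    uint32_t next()
    {
        uint32_t t = x ^ (x << 11);
        x = y; y = z; z = w;
        w = w ^ (w >> 19) ^ (t ^ (t >> 8));
        return w;
    }
};

int main()
{
    Xorshift128 rng;
    for (int i = 0; i < 5; ++i)
        std::cout << rng.next() << '\n';
    return 0;
}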

Edited by Bacterius, 29 April 2012 - 03:27 AM.



#26 Bacterius   Crossbones+   -  Reputation: 8884


Posted 29 April 2012 - 03:19 AM

By the way, here is a repository of many shortcuts you can use to achieve various goals when it comes to bit manipulation. These include many performance boosters (although you probably won't find much use for most of them, some are quite handy), clicky!



#27 Blind Radish   Members   -  Reputation: 355


Posted 07 May 2012 - 01:03 AM

Hey everyone! So I'm thinking I'm about done with my Random Number generator. It's looking pretty good.

I'd like to run tests on it, but I've never done that before so I'm not quite sure how.
I wanna know how fast it is, how random it is, the works.

I've heard of tests like Diehard and TestU01, but I'd like whatever is current and industry standard, or what have you.

I'm working in C#, and I was wondering how I'd go about testing it, given that it's written in C#.

Layman's terms please! This is pretty much the first real program I've done on my own.
Thanks for all your help everyone. Really, thank you.

#28 Bacterius   Crossbones+   -  Reputation: 8884


Posted 07 May 2012 - 03:25 AM

There really isn't any industry standard for testing. Pretty much, people make PRNG's, they turn out to be high-quality and efficient, other people start using them and spreading the word around. Once a few peers have done the analysis on their side and concluded the RNG was good, there's pretty much no reason to take a second look.

The language itself doesn't matter. To test the PRNG empirically, all you need is a stream of pseudorandom data from it (the exact size depends on the test; for instance DIEHARD only reads about 16MB iirc, whereas the heavier suites like TestU01's Crush batteries will happily consume gigabytes).

These are my steps to test a PRNG:
- generate a 64MB binary file from it (the file needs to be binary, i.e. literally writing the raw output bytes to the file, not their hexadecimal representation or some other text form)
- open the file with your favorite hex editor and hit Statistics, which should give you a histogram of how frequently each byte value 0 through 255 occurs in the file. This histogram should be almost flat, allowing for some variance in the data. If it isn't, the PRNG is not good and will automatically fail DIEHARD. If your hex editor doesn't have that feature, I recommend HxD (or compute the histogram yourself, see the sketch after this list). You can skip this step if you like, but it's recommended.
- fire up DIEHARD, pass it the file and run all tests. Wait till it finishes (5-20 seconds), then read the output file. Are any p-values equal to 1.0000, or do many of them look suspiciously high, like 0.9991? If so, the PRNG is not good. p-values should be uniformly distributed in the 0-1 interval, so it's normal for some of them to be high and others low (variance), but if they are consistently high or low then there is a problem.
- run all other tests, increasing the file size as needed (I think they all use the same p-value method) and do the same. Note some are quite painful to get running on Windows, as you have to compile them yourself, which can suck big time. If you can get your hands on a Linux box to run those, you'll probably be better off (VMs work, of course).
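
If your hex editor has no statistics view, the byte histogram from the second step is easy to compute yourself. A minimal C++ sketch (the file name is just an example):

#include <cstdio>
#include <fstream>
#include <vector>

int main()
{
    // Count how often each byte value 0..255 occurs in the dump.
    std::ifstream in("prng_output.bin", std::ios::binary);   // example file name
    std::vector<unsigned long> count(256, 0);

    char c;
    while (in.get(c))
        ++count[static_cast<unsigned char>(c)];

    // A good PRNG should give a nearly flat table.
    for (int v = 0; v < 256; ++v)
        std::printf("%3d: %lu\n", v, count[v]);
    return 0;
}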

If your RNG has survived all tests, good work, it seems to be pretty good. Bonus points if it is fast. Then you need to create a reference implementation for it in a few languages, write up some documentation, then release the algorithm so other people can test/verify it and ultimately use it if it's better than the alternatives.

As for testing the speed, the best approach seems to be to repeatedly write a certain amount of random bytes into memory, and then divide the number of MB output by the time taken. This gives you a rough estimate of the MB/s speed of your PRNG. It isn't definitive, and the implementation can probably be optimized significantly; just make sure your implementation is decent before jumping to conclusions. For C# you'd probably allocate an array of whatever the output size of the PRNG is, loop over the whole array a few times to fill it in, measure the total time and calculate the rate that way. You can also convert to cycles-per-byte timings using your processor's frequency, but there are a few gotchas.
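
The same measurement sketched in C++ rather than C# (the xorshift32 is again only a stand-in for your own generator):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

static uint32_t state = 2463534242u;      // any non-zero seed

static uint32_t next_random()             // stand-in PRNG (xorshift32)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

int main()
{
    const size_t words  = 1 << 20;        // 4 MB buffer of 32-bit words
    const int    passes = 64;             // 256 MB generated in total
    std::vector<uint32_t> buffer(words);

    auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p)
        for (size_t i = 0; i < words; ++i)
            buffer[i] = next_random();
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    double mb = double(words) * sizeof(uint32_t) * passes / (1024.0 * 1024.0);
    std::cout << buffer[0] << "\n"        // keep the buffer "used"
              << mb / seconds << " MB/s\n";
    return 0;
}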

Also, don't feel bad if your random number generator fails the tests, or is too slow, etc... It's not an easy task and the best way to go forward is to learn from mistakes, learn what works and what doesn't, and some theory doesn't hurt either. After all, failure is a cornerstone of the learning process - if we could never fail, why would we bother learning anything?



#29 Blind Radish   Members   -  Reputation: 355


Posted 16 May 2012 - 12:30 AM

I've decided that I want to try to do my own (monte carlo?) pi approximation using my PRNG. I basically want to eventually recreate most of the current tests.
I know this isn't the best use of my time, but I'm doing this mostly for my own learning. I know I could just grab one but where's the fun in that?


I had a few questions.


1.1) I've read that if I approximate pi, I need 100x the random numbers for 10x the accuracy. Is this true?
If so, can I say that my approximation of pi should be between
2.6414 and 3.6414 for 100 numbers
3.0914 and 3.1914 for 10,000 numbers?
3.1364 and 3.1464 for 1,000,000 numbers?
1.2) Or am I doing the math there wrong?
1.3) And what would the range be if I generated 400 numbers? I know it's something to do with logarithms but that's as far as I can get with my limited math knowledge.
Like pi ~= pi +|- 400 log(10) or something.


2.1) Secondly, if I use a sphere instead of a circle, do I need more random numbers for a fair approximation, or can I use the same number?

Because I was seriously considering using random points inside of an n-cube and finding if they fall into or out of an n-sphere.
I know it seems like a hassle to do it the hard way but I'm thinking this 2.2) might catch something a simple circle might not catch. Am I mistaken to believe this? I'm guessing this is how they proved that the Mersenne Twister works in up to 623-space, but it's just a guess.

If not, I know less points will fall into an n+1-sphere in an n+1-cube, than will fall into an n-sphere in an n-cube - I've got the equation for finding the volume of any n-sphere and the volume of any n-cube but I want to make sure I'm not overlooking something. 2.3) Will I need more random samples to maintain the same amount of accuracy or will any number always give me about the right accuracy?

Edited by Expert Novice, 16 May 2012 - 02:21 AM.


#30 Bacterius   Crossbones+   -  Reputation: 8884


Posted 16 May 2012 - 03:38 AM

Disclaimer: I most likely screwed up some of the math below, so take it with a grain of salt, but the concepts still apply.

I've read that if I approximate pi, I need 100x the random numbers for 10x the accuracy. Is this true?

Yes. Monte Carlo accuracy increases with the square root of the number of samples if you use a uniform random distribution, so 100x the samples buys you 10x the accuracy (it's possible to do better with different kinds of distributions, though).

Because I was seriously considering using random points inside of an n-cube and finding if they fall into or out of an n-sphere.
I know it seems like a hassle to do it the hard way but I'm thinking this might catch something a simple circle might not catch. Am I mistaken to believe this?

You are mistaken to believe that using n-cubes instead of just the unit cube would reveal anything, since floating-point numbers are typically obtained from PRNG's by dividing their output by the maximum possible output (which intrinsically gives a [0..1) pseudorandom distribution). So it will not change anything. As for using spheres instead of circles, yes, that is correct, since the circle test would only test correlation between every two successive outputs (first and second, third and fourth, etc...) whereas the sphere test would test correlation between every three successive outputs. You can also use higher dimensions but it tends to get messy.
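
In code, that conversion is nothing more than a scale by 2^-32; a sketch, assuming a full-range 32-bit generator:

#include <cstdint>
#include <iostream>

// Map a full-range 32-bit PRNG output to a double in [0, 1).
static double to_unit_interval(uint32_t x)
{
    return x * (1.0 / 4294967296.0);   // multiply by 2^-32
}

int main()
{
    std::cout << to_unit_interval(0u) << ' '
              << to_unit_interval(0xFFFFFFFFu) << '\n';   // 0 and a value just below 1
    return 0;
}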

It's possible to work out what kind of accuracy can be expected from a truly random sequence (for a circle here). Let's have a look. Let N be the number of points plotted, and M the number of points which fell inside the unit quadrant (we could use a full circle but quadrants are more convenient).

The probability density function of a uniform random variable between 0 and 1 is just f(x) = 1, which simplifies the next few integrals. We are interested in the quantity d = sqrt(x^2 + y^2) for two uniform random variables x and y, which represents the distance of a random point in the unit square to the origin.

The mean of d will be the double integral of sqrt(x^2 + y^2) over the unit square, which works out to (sqrt(2) + ln(1 + sqrt(2)))/3 ~= 0.765, and the variance will be E[d^2] - E[d]^2 = 2/3 - 0.765^2 ~= 0.081. So the standard deviation of d will be about 0.285.

Of course all of the above was useless, as we are interested in the proportion of points for which d <= 1 (i.e. which are in the unit circle). We know from geometry that this proportion should tend to p = pi/4, the question being how many samples are needed to reach a given accuracy. Assuming we use enough samples, we should get a normal distribution (from the CLT), which makes the underlying distribution (i.e. the integrals above) irrelevant, even though the latter comes into play with more advanced analysis.

The mean of the ratio M/N is p = pi/4 ~= 0.785, and of course the standard deviation of a single sample will be sqrt(p*(1 - p)) ~= 0.41. So after N samples the sample standard deviation would be sqrt(p*(1 - p)/N) ~= 0.41/sqrt(N).

The following interpretation is going to suck if you don't like statistics: if you use N samples, then you have a 95% chance of the ratio of points inside the unit circle to the total number of points being within 1.96*sqrt(p*(1 - p)/N) ~= 0.8/sqrt(N) of the "true" ratio pi/4 (this interpretation isn't even correct, it should be: in the long run, 95% of all trials with N samples will fall within this distance of the true ratio). You can use a different confidence interval by adjusting the z-value, though.

If you prefer to view it in terms of the number of accurate digits, you can take the base 10 logarithm of the margin of error to obtain a measure of the number of accurate digits for a given sample size (still subject to the conditions above, though). So your initial intuition was in essence correct, but required some more work to obtain the correct constants.

Secondly, if I use a sphere instead of a circle, do I need more random numbers for a fair approximation, or can I use the same number?

You can perform the analysis done above using a sphere instead of a circle, and see how this changes the margin of error. You can also try it with higher dimensions. You'll probably find that as you use higher and higher dimensions, the accuracy gets worse and worse; this is because the Euclidean metric (distance formula) is a poor metric when dealing with high-dimensional data. And of course, you will need more numbers anyway, since you'd need three numbers to plot a point instead of two (it helps if you think in terms of samples, i.e. points, instead of actual numbers).

But using this kind of test as a PRNG test is not the proper way to do it, as it doesn't tell you much and is very inefficient. It simply doesn't take a deep enough look at the underlying pseudorandom distribution to say anything, and it is extremely redundant; it's like hitting a nail with a 5-ton anvil. You want small, quick tests which each check a given property the pseudorandom sequence should have, and return a human-readable result. Does that make sense?

There is no fool-proof test that can be fed some numbers and deterministically tell you "yes, this is random" or "no, it isn't". The closest we can get is to feed our sequence to a bunch of tests which will tell you "it looks ok" or "no, this sequence definitely doesn't look random", and hope that the sequence passes many, many tests. And with PRNG's we are in luck, since we can produce pseudorandom sequences very quickly; you generally don't get that luxury when debugging a true random number generator.

I believe DIEHARD is open-source, so you can check out the source code to see the various tests it does (but you will probably need some knowledge of statistics to be able to understand them as they use concepts such as p-values, chi-squared tests, etc..). They aren't all theoretical though.

EDIT: I realise you are looking for a layman's explanation, but it really gets difficult to lay this down on paper in a way that doesn't involve math. At some point you will need to pick up the stats knowledge if you want to implement some of the common pseudorandom tests; you just can't make this stuff up.

Edited by Bacterius, 16 May 2012 - 03:47 AM.



#31 Álvaro   Crossbones+   -  Reputation: 13318


Posted 16 May 2012 - 03:48 AM

1) If you pick a random point in a 2x2 square, the probability of the point being inside the unit circle is p = pi/4 ~= 0.785398163 . If you now sample this n times, and count how many points are inside the unit circle, you'll get a probability distribution whose mean is p*n and whose variance is p*(1-p)*n. If n is large, this distribution is well approximated by a normal distribution with that mean and variance. So you can compute your number, subtract the mean, divide by the square root of the variance and the result should follow a standard normal distribution pretty closely.

2) In the procedure described above, each sample will give you the most information when p is close to 0.5. If you use a 3-dimensional sphere, p = 4/3*pi/8 ~= 0.523598776 , so things should work better than for a 2-dimensional sphere. Above that, p gets small quickly as a function of the dimensionality (0.308425138, 0.164493407, 0.0807455122...).
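
A short C++ sketch of exactly this procedure, with the standard library's Mersenne Twister standing in for whatever PRNG is being tested:

#include <cmath>
#include <iostream>
#include <random>

int main()
{
    const int n = 1000000;                       // number of sample points
    std::mt19937 gen(12345);                     // placeholder PRNG
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    int inside = 0;                              // points inside the unit quadrant
    for (int i = 0; i < n; ++i)
    {
        double x = uni(gen), y = uni(gen);
        if (x * x + y * y <= 1.0)
            ++inside;
    }

    const double p = 3.14159265358979 / 4.0;     // true probability pi/4
    double mean  = p * n;                        // expected count
    double sigma = std::sqrt(p * (1.0 - p) * n); // standard deviation of the count
    double z     = (inside - mean) / sigma;      // should look standard normal

    std::cout << "pi estimate: " << 4.0 * inside / n << "\n"
              << "z-score:     " << z << "\n";
    return 0;
}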

#32 Álvaro   Crossbones+   -  Reputation: 13318


Posted 16 May 2012 - 03:53 AM

One comment about testing for randomness: A quick and dirty test would be to try to compress (e.g. gzip) a large chunk of binary data generated by your PRNG and see if the compressed file is any smaller than the original. If it is, there is a good chance the PRNG is flawed (because the compressor found some structure in the data to exploit).
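
A sketch of that check using zlib's one-shot compress() (this assumes zlib is available and uses a stand-in generator to fill the buffer; running gzip on your dump file from the command line works just as well):

#include <cstdint>
#include <iostream>
#include <vector>
#include <zlib.h>

int main()
{
    // Pretend 'data' holds a large chunk of PRNG output (here: a stand-in xorshift32 fill).
    std::vector<unsigned char> data(16 * 1024 * 1024);
    uint32_t s = 2463534242u;
    for (size_t i = 0; i < data.size(); ++i)
    {
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        data[i] = static_cast<unsigned char>(s);
    }

    uLongf compressedSize = compressBound(data.size());
    std::vector<unsigned char> compressed(compressedSize);
    if (compress(compressed.data(), &compressedSize, data.data(), data.size()) != Z_OK)
        return 1;

    std::cout << "original:   " << data.size() << " bytes\n"
              << "compressed: " << compressedSize << " bytes\n";
    // If the compressed size is noticeably smaller, the data has exploitable structure.
    return 0;
}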

#33 Blind Radish   Members   -  Reputation: 355


Posted 16 May 2012 - 02:50 PM

Right I've been reading about some of these tests in some PDF on one of the official cryptography standards (not sure it's not the only one, just know that it is one). I'm really starting to see the benefit of analyzing the 1's and 0's and not using weird birthday tests or pi tests or such things. I still kind of wanted to try my hand at it though lol. ;]

The math really is above me, I never learned what ʃ or Σ mean, and I'm really not used to reading it. The worst part is that when people tell me what formula to use they hardly ever explain how they derived that formula, they just expect me to take it on faith lol. And when they do it's all in technical mathy terms that I have no way of understanding in my current position. I'm getting it though, it's rather slow progress but I'm getting it.

I don't intend to give up until I've learned more about randomness. It's kind of a new found passion of mine.


As always, thanks guys. I really appreciate it.

#34 Bacterius   Crossbones+   -  Reputation: 8884


Posted 16 May 2012 - 03:09 PM

The math really is above me, I never learned what ʃ or Σ mean, and I'm really not used to reading it. The worst part is that when people tell me what formula to use they hardly ever explain how they derived that formula, they just expect me to take it on faith lol. And when they do it's all in technical mathy terms that I have no way of understanding in my current position. I'm getting it though, it's rather slow progress but I'm getting it.

Ah, yes. Well if you have no notion of calculus it'll be hard to understand, but if you do it does make sense intuitively, because basically, when you want to find the mean of a sequence of numbers, you add them up, then divide the total by the number of elements, right? Well what if there are in fact infinitely many possible real numbers between 0 and 1 that the variable can take? In that case instead of adding them all up manually (which isn't possible) you integrate over the whole domain (which gives the sum of all those possible variables = the area under the curve) and divide the result by the continuous analogue of the "number of elements in the sequence", which is a construct which basically tells you what is the probability that the random variable of interest takes on a given continuous value (for a uniform distribution, the latter is in fact constant, which helps). Now what if you have a quantity derived from two independent random variables? In that case you integrate twice, once for each variable. And to obtain the variance it's the same, just using a different formula.
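
Written out for the uniform case, these are just the standard formulas behind that description (in LaTeX):

% Mean and variance of a uniform random variable X on [0, 1],
% whose density is f(x) = 1 on that interval.
\[
  E[X] = \int_0^1 x \, f(x) \, dx = \int_0^1 x \, dx = \tfrac{1}{2},
  \qquad
  \mathrm{Var}(X) = \int_0^1 x^2 \, dx - E[X]^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}.
\]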

But Alvaro's explanation is better than mine (especially the comment on compressing the algorithm's output).

Right I've been reading about some of these tests in some PDF on one of the official cryptography standards (not sure it's not the only one, just know that it is one). I'm really starting to see the benefit of analyzing the 1's and 0's and not using weird birthday tests or pi tests or such things. I still kind of wanted to try my hand at it though lol. ;]

Indeed, all possible randomness tests are a subset of one very particular statement, which asserts that if a sequence is truly random, it is unconditionally impossible to predict the next bit output with probability better than 0.5. For cryptography, we often don't get true random numbers, so for pseudorandom numbers there is a weaker (but still useful) formulation of the statement: if a sequence is output from a strong cryptographic pseudorandom number generator, it should be impossible to predict the next bit output with probability better than 0.5 given all previous outputs, in reasonable time ("reasonable" means beyond the computational reach of any attacker). If a PRNG satisfies this property, it will pass *all* other randomness tests.

But birthday tests/etc. are still very useful, because testing the above usually requires some really nasty math (involving cryptanalysis), which is beyond me anyhow! And note that many good non-cryptographic PRNG's definitely don't satisfy the above property (Mersenne Twister, etc...), yet are still used, because it's not an if-and-only-if condition: it's possible to pass many small tests and fail the cryptographic one, but it is not possible to pass the cryptographic test and subsequently fail any other. The cryptographic property is overkill anyway in most cases not involving crypto.



#35 taby   Members   -  Reputation: 336


Posted 17 May 2012 - 01:32 PM

Right I've been reading about some of these tests in some PDF on one of the official cryptography standards (not sure it's not the only one, just know that it is one). I'm really starting to see the benefit of analyzing the 1's and 0's and not using weird birthday tests or pi tests or such things. I still kind of wanted to try my hand at it though lol. ;]

The math really is above me, I never learned what ʃ or Σ mean, and I'm really not used to reading it. The worst part is that when people tell me what formula to use they hardly ever explain how they derived that formula, they just expect me to take it on faith lol. And when they do it's all in technical mathy terms that I have no way of understanding in my current position. I'm getting it though, it's rather slow progress but I'm getting it.

I don't intend to give up until I've learned more about randomness. It's kind of a new found passion of mine.


As always, thanks guys. I really appreciate it.


The US National Institute of Standards and Technology (NIST) has a suite of tests that are also popular:

http://csrc.nist.gov.../rng/index.html

Some brief descriptions of the tests are at:

http://csrc.nist.gov...tats_tests.html

The one test that seems closer than most to the compression-based test that alvaro suggested is the "Approximate Entropy Test". The basic idea is to take your bitstream and break it into m-bit chunks, where m is 1, 2, etc. and see if there is an imbalance in how frequent each chunk is. For instance, take the bitstream 01010101. If you break it into m=1 bit chunks, you'll find that the chunk 0 and the chunk 1 both occur with the same frequency, so there is no imbalance. Awesome. Next, break it into m=2 bit chunks. You'll notice right away that the chunk 01 occurs four times, but not once did the chunk 00, 10, or 11 occur. Not awesome. That's a big imbalance, and obviously the bitstream is nowhere near random. Of course, your bitstream would contain more than just 8 bits, and your m-bit chunks would get larger than just m=2.
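
A sketch of that chunk-counting idea in C++, using non-overlapping chunks and assuming the total number of bits is a multiple of m:

#include <cstdint>
#include <iostream>
#include <vector>

// Count non-overlapping m-bit chunks in a bitstream stored as bytes.
// Assumes 1 <= m <= 16 and that the total bit count is a multiple of m.
static std::vector<unsigned long> chunk_counts(const std::vector<uint8_t>& stream, int m)
{
    std::vector<unsigned long> counts(1u << m, 0);
    uint32_t chunk = 0;
    int bits_in_chunk = 0;

    for (uint8_t byte : stream)
    {
        for (int b = 7; b >= 0; --b)                 // most significant bit first
        {
            chunk = (chunk << 1) | ((byte >> b) & 1u);
            if (++bits_in_chunk == m)
            {
                ++counts[chunk];
                chunk = 0;
                bits_in_chunk = 0;
            }
        }
    }
    return counts;
}

int main()
{
    std::vector<uint8_t> stream = { 0x55 };          // the bitstream 01010101
    for (int m = 1; m <= 2; ++m)
    {
        std::cout << "m = " << m << ":";
        for (unsigned long c : chunk_counts(stream, m))
            std::cout << ' ' << c;                   // m=1 gives "4 4", m=2 gives "0 4 0 0"
        std::cout << '\n';
    }
    return 0;
}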

Essentially, a greater entropy means a greater balance in the frequency of symbols and a greater incompressibility. Get your toes wet with the Σ (summation) symbol by getting to understand how to calculate the entropy of a bunch of symbols: http://en.wikipedia....eory)#Rationale

Here's the equation for Shannon entropy in natural units. Don't think too hard about it until after you read the next few paragraphs:

H = -Σ p_i ln(p_i)

where ln is the natural logarithm (ie. log() in C++).

It's pretty straightforward. Say you have a group of 6 symbols altogether. They could be binary digits, decimal digits, characters, words, images, paragraphs, etc., but we'll use characters: 'a', 'b', 'c', 'd', 'e', 'a'.

In the end you only have n = 5 distinct symbols; the letter 'a' is repeated once. In terms of probability, the letter 'a' occurs 2/6ths of the time, 'b' occurs 1/6th of the time, 'c' occurs 1/6th, 'd' occurs 1/6th, and 'e' occurs 1/6th of the time. Just so you know, these probabilities for all of these 5 distinct symbols should add up to 1:

Σ p_i = 1

In expanded form, this is:

2/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1

Now that you have a basic idea of what Σ (summation) means, let's calculate the entropy. In expanded form, it's:

H = -(2/6 ln(2/6) + 1/6 ln(1/6) + 1/6 ln(1/6) + 1/6 ln(1/6) + 1/6 ln(1/6)) ≈ 1.56071

You can divide this number 1.56071 by ln(2) ≈ 0.693 to get the entropy in binary units: 2.25. This tells you that these 5 distinct symbols (from a set of 6 symbols total) would each require roughly 2.25 bits of information to encode, on average. It's worth noting that the total number of data (6) and the distinct number of data (5) are whole numbers, but the average information content per datum (1.56, or 2.25 in binary units) is not. Yes indeed, data are not at all the same thing as information, and the popular concept of "a piece/indivisible unit of information" is not generally valid. I just find these confusions to be so very often perpetuated by math/comp sci/physics experts, and it annoys me a little bit.

Try the group of 8 symbols '0', '1', '2', '3', '4', '5', '6', '7'. Note that there are 8 symbols and all are distinct. You should get a natural entropy of ln(8) ≈ 2.079. Divide that by ln(2) ≈ 0.693 to get the binary entropy of 3. This basically says that you need 3 bits of information to encode 8 distinct (and equally frequent) symbols. If you understand that an n-bit integer can represent 2^n symbols, then this should not be a shock to you (ie. a 16-bit integer can represent 2^16 = 65536 distinct values, which corresponds to a binary entropy of 16). Yes, the binary information content per symbol can be a whole number, but this is only a special case. Just remember that data are not the same thing as information.

Now try the group of 8 symbols '0', '0', '0', '0', '0', '0', '0', '0'. The natural entropy is simply -(8/8) × ln(8/8) = 0. No need for distinctness, no need for information. Yes, the binary information content per symbol can be a whole number, but again, this is only a special case. Just remember that data are not the same thing as information.

Essentially entropy is information, and it is a measurement taken of data. It's definitely not the same thing as data. I mean, the length of your leg is obviously not the very same thing as your leg (one's a measurement, one's bone and flesh).

Also, can you see how the group '0', '0', '0', '0', '0', '0', '0', '0' would be greatly compressible compared to the group '0', '1', '2', '3', '4', '5', '6', '7'? If not, you should look up run-length encoding (a type of compression) and then dictionary-based compression. Remember: very high entropy data is largely incompressible. In the context of that m-bit chunk test, we'd also cut the set of 8 symbols into subsets that are larger than one digit each (ie. not just subsets of single digits like '0', '1', '2', '3', '4', '5', '6', '7', but also subsets like '01', '23', '45', '67', etc.), so there would be multiple measures of entropy. I hope that isn't too confusing. If it is too confusing, then just ignore this paragraph altogether.

Have you not done any calculus at all? This is where you will see ʃ (indefinite integral / antiderivative / opposite of local slope, definite integral / area / volume / hypervolume / anything else that can be added up using infinitesimally small chunks) pop up a lot. You owe it to yourself to get informed. Upgrading your knowledge from algebra to algebra+calculus is kind of like upgrading from arithmetic to arithmetic+algebra. It's a very powerful upgrade, and there are a bajillion ways to apply that knowledge.

Edited by taby, 17 May 2012 - 03:25 PM.


#36 taby   Members   -  Reputation: 336


Posted 17 May 2012 - 02:58 PM

Got quite a few positive ratings for that post, so here's some juicy C++ code that uses std::map to calculate the entropy of a string of characters. You can use it to calculate the entropy of "abcdea" and "01234567" just as well as "hello world":

#include <map>
#include <string>
#include <iostream>
#include <cmath>   // for log()
using namespace std;

int main(void)
{
	 string stuff = "hello world";
	 map<char, size_t> symbols;

	 // For each symbol (character) in the symbol stream (string)...
	 for(string::const_iterator i = stuff.begin(); i != stuff.end(); i++)
	 {
		  // Search for symbol in map.
		  if(symbols.find(*i) == symbols.end())
		  {
			   // If it doesn't exist, insert and give it a count of 1.
			   symbols[*i] = 1;
		  }
		  else
		  {
			   // If it does exist, increment its count.
			   symbols[*i]++;
	 	 }
	 }

	 float entropy = 0;

	 cout << "Input string: \"" << stuff << "\" contains " << stuff.length() << " symbols, of which " << symbols.size() << " are distinct.\n" << endl;

	 for(map<char, size_t>::const_iterator i = symbols.begin(); i != symbols.end(); i++)
	 {
	 	 float p_i = static_cast<float>(i->second) / static_cast<float>(stuff.length());

	 	 cout << "Symbol '" << i->first << "' occurred " << i->second << '/' << stuff.length() << " = " << p_i << "th of the time." << endl;

	 	 entropy += p_i * log(p_i);
	 }

	 // Don't bother negating if already 0.
	 if(0 != entropy)
		  entropy = -entropy;

	 cout << "\nEntropy: " << entropy << " (binary: " << entropy/log(2.0) << ')' << endl;

	 return 0;
}

This spits out...

Input string: "hello world" contains 11 symbols, of which 8 are distinct.

Symbol ' ' occurred 1/11 = 0.0909091th of the time.
Symbol 'd' occurred 1/11 = 0.0909091th of the time.
Symbol 'e' occurred 1/11 = 0.0909091th of the time.
Symbol 'h' occurred 1/11 = 0.0909091th of the time.
Symbol 'l' occurred 3/11 = 0.272727th of the time.
Symbol 'o' occurred 2/11 = 0.181818th of the time.
Symbol 'r' occurred 1/11 = 0.0909091th of the time.
Symbol 'w' occurred 1/11 = 0.0909091th of the time.

Entropy: 1.97225 (binary: 2.84535)

Edited by taby, 17 May 2012 - 03:07 PM.


#37 taby   Members   -  Reputation: 336


Posted 17 May 2012 - 06:10 PM

Here's a simple introduction to some of the major words behind calculus (derivative, integral), but it's not very general, so expect to learn more on your own:

Consider the extremely simple function f(x) = x. The function is a straight line. Go to wolframalpha.com and type in "plot x from 0 to 1" to see it.

The derivative of this function is 1 (ie. the change in f(x) with respect to the change in x). Go to wolframalpha.com and type "derivative x" to see it. It may be helpful to think of the derivative as being somewhat analogous to the slope, because they measure essentially the same thing. The difference is that slope is rise over run, but the derivative technically does not have a run: it's calculated for an infinitesimally small run, which is effectively no run at all. Of course, keep in mind that functions are generally not as simple as f(x) = x, and so derivatives are not as simple as 1, but hopefully you get the idea that the derivative is related to the idea of slope.

The indefinite integral of x is x^2/2. Go to wolframalpha.com and type "integral x" to see it. Note that the derivative of x^2/2 is x, so you can say that the indefinite integral is the antiderivative.

The definite integral adds bounds to the indefinite integral (ie. it defines bounds, which is why it's called definite) and can be used to calculate the area underneath the line drawn by f(x) = x. For instance, the area under the line in the interval between x = 0 and x = 1 is given by the definite integral 1^2/2 - 0^2/2 = 1/2. Go to wolframalpha.com and type "integral x from 0 to 1". Consider that if you take a square that is 1x1 and cut it in half along the diagonal, you get a right angle triangle of area 1/2. Basically the definite integral adds up an infinite number of infinitesimally small "rectangular" regions that stretch from y = 0 to y = x in the interval between x = 0 and x = 1. Look at the concept of the Riemann sum, but imagine that the rectangles are so thin (in terms of width) that they are in fact just lines. Consider that the word integrate means to add together. The definite integral is basically fancy summing. This was mentioned in another post above.

After you've got your toes wet with this, you'll need to backtrack and see how you can go from classic slope (a large run) to derivative (an infinitesimally small run) via the concept of limits. You may wish to visit the concept of the secant line before you tackle limits; basically the derivative is analogous to the slope that you'd get by making a fancy secant line where the separation between the two points (ie. the run) is infinitesimally small. At the core of it all is the fundamental theorem of calculus. You'll need to learn your differentiation rules. When you study the very simple power rule, you'll see how the derivative (the anti-indefinite-integral, if you like) of x^2/2 is x, and how the derivative of x is 1 (it's a special case), which is precisely what we covered here in the past few paragraphs. I really did use the most extremely simple example for f(x) that still has a non-zero derivative. Consider f(x) = 1. The derivative of that is just zero (ie. no change in f(x) with respect to the change in x), which would have been too simple to be useful as an example.

Now try using something more complicated for f(x).

I learned from The Complete Idiot's Guide to Calculus, and it worked very well for me, so don't be too proud when picking out training material. Of course, by now you probably have got at least one point hammered down: it's all about making measurements using infinitesimally small things. It also helps that there are things like wolframalpha.com (based on Mathematica) to help check your work, as well as other relatively dirt cheap offline CAS apps for most smartphones too (MathStudio is a really decent one). Sage is pretty amazing if you want a free CAS for your PC. There are also free Sage sites that let you do your work using their installation -- no hassle on your part, really, and you still get to save and reload long workbooks for long complicated jobs.

Now remember your mention of the term "Monte Carlo". I'm sure you know this already, but the origin of the term is "Monte Carlo integration". For those not familiar with this, say you want to find the area under f(x) = x in the interval x = 0 to x = 1, but you don't want to go through the bother of working out the definite integral, etc. Well, randomly place n points in the 1x1 square region bound by x = 0 to x = 1, y = 0 to y = 1. Now, assign each point an area of A = 1x1/n = 1/n. Finally, count how many points lie on or below the line drawn by f(x) = x and multiply that count by A. You should get a final answer of roughly 1/2. Hence, Monte Carlo (random, as in a casino game) integration (adding). Try it for something much more complicated than f(x) = x. Also, consider that sometimes you have a plot of points but you don't actually have the f(x) that was used to generate them. You can't get the definite integral if you don't know f(x), so Monte Carlo integration (or a Riemann sum) can help you if you need to know the area under the curve in that case. Plus, there are some f(x) that simply do not have a definite integral to begin with, so you are absolutely forced to use numerical methods like Monte Carlo (or a Riemann sum) for these. I learned about this (and about the Towers of Hanoi, and, and, and...) in a wicked awesome book called "C for Scientists and Engineers" by Johnsonbaugh and Kalin. I notice that Amazon.com only gives this book a 4.5 out of 5. That's utterly ridiculous. The rating should be 5 out of 5, no questions asked.
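
That recipe in C++, with the standard library's generator standing in for a PRNG of your own:

#include <iostream>
#include <random>

int main()
{
    const int n = 1000000;                        // number of random points
    std::mt19937 gen(42);                         // stand-in PRNG
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    // Throw points into the 1x1 square and count those on or below f(x) = x.
    int below = 0;
    for (int i = 0; i < n; ++i)
    {
        double x = uni(gen), y = uni(gen);
        if (y <= x)
            ++below;
    }

    // Each point "represents" an area of 1/n, so the area under the line is:
    std::cout << "estimated area: " << double(below) / n
              << " (exact answer: 0.5)\n";
    return 0;
}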

Edited by taby, 17 May 2012 - 08:04 PM.


#38 Blind Radish   Members   -  Reputation: 355


Posted 17 May 2012 - 08:01 PM

Ah, yes. Well if you have no notion of calculus it'll be hard to understand, but if you do it does make sense intuitively, because basically, when you want to find the mean of a sequence of numbers, you add them up, then divide the total by the number of elements, right? Well what if there are in fact infinitely many possible real numbers between 0 and 1 that the variable can take? In that case instead of adding them all up manually (which isn't possible) you integrate over the whole domain (which gives the sum of all those possible variables = the area under the curve) and divide the result by the continuous analogue of the "number of elements in the sequence", which is a construct which basically tells you what is the probability that the random variable of interest takes on a given continuous value (for a uniform distribution, the latter is in fact constant, which helps). Now what if you have a quantity derived from two independent random variables? In that case you integrate twice, once for each variable. And to obtain the variance it's the same, just using a different formula.


I almost followed you, Bacterius. ;D



I think I do get SOME of the standard tests in a layman's kind of way. In all of them you're looking for two things - is the normal curve (which I know from rolling an infinite number of dice) adequately represented by the PRNG within some margin of error, and do the extremes of the normal curve occur more often than expected (like ever in a short period?).

So you basically pretend that the result is inherently random, and check for the odds of getting the result you got.
If you didn't get a result you should get 99% of the time, like a 1 ever, then it's 99% probably not random.
(same thing as above: If you did get a result you shouldn't get 99% of the time, like all 0's, then it's 99% probably not random.)
Otherwise, it cannot be proven not to be random (null hypothesis i think). You can only prove that it probably isn't random.

The key is that there's no way to know, only a way to say that a PRNG is 99% probably random, even though it absolutely is not.
I just love that thought process. I'm having a lot of fun.


So anyway for a basic PRNG you're just trying to make sure no one can intuit patterns by watching the results. For a Crypt-PRNG you want to make sure a computer can't detect patterns in a reasonable amount of computation.
The trick is to find all of the obvious patterns (which could still potentially occur in randomness) and eliminate them.

At least, that's how it goes for randomness. (At least I think that's how it goes!) There may be other factors a client is more interested in, like me, I'm somewhat interested in finding QRNGs which have more equally distributed but still unpredictable points. So I guess the more Q in the RNG, the more you can guess where a random number won't appear. ;]


Edited by Expert Novice, 17 May 2012 - 08:18 PM.


#39 Blind Radish   Members   -  Reputation: 355


Posted 17 May 2012 - 08:24 PM

The US National Institute of Standards and Technology (NIST) has a suite of tests that are also popular:

http://csrc.nist.gov.../rng/index.html

Some brief descriptions of the tests are at:

http://csrc.nist.gov...tats_tests.html


Yep, that's the one I found earlier. Thanks for the link though, I had misplaced it recently.



..."Approximate Entropy Test". The basic idea is to take your bitstream and break it into m-bit chunks, where m is 1, 2, etc. and see if there is an imbalance in how frequent each chunk is. For instance, take the bitstream 01010101. If you break it into m=1 bit chunks, you'll find that the chunk 0 and the chunk 1 both occur with the same frequency, so there is no imbalance. Awesome. Next, break it into m=2 bit chunks. You'll notice right away that the chunk 01 occurs four times, but not once did the chunk 00, 10, or 11 occur. Not awesome. That's a big imbalance, and obviously the bitstream is nowhere near random. Of course, your bitstream would contain more than just 8 bits, and your m-bit chunks would get larger than just m=2.


Hey wouldn't {0000 through 1111} automatically contain every combination of {00 through 11} exactly twice?
I mean if you didn't overlap the bits.


Essentially, a greater entropy means a greater balance in the frequency of symbols and a greater incompressibility. Get your toes wet with the Σ (summation) symbol by getting to understand how to calculate the entropy of a bunch of symbols: http://en.wikipedia....eory)#Rationale

Here's the equation for Shannon entropy in natural units. Don't think too hard about it until after you read the next few paragraphs:


Thank you for the effort here sir! Gonna need a while to mull this over though, but seriously, thank you.

Edited by Expert Novice, 17 May 2012 - 08:26 PM.


#40 taby   Members   -  Reputation: 336


Posted 17 May 2012 - 08:27 PM

If you get how the standard normal distribution applies to random variables, and that it has a definite integral from -infinity to infinity of 1, then perhaps you know more than you're letting on (or more than you consciously realize)? ;)

Perhaps the following is also a way to look for randomness, in the continuous case, but who knows if I'm right: If you have a set S of n values from the continuous interval [min(S), max(S)], and there's no repetition in the values, and there's no repetition in the first derivatives, and no repetition in the second derivatives, etc., no repetition in the (n - 1)th derivatives, then I guess it's maximally random because nothing was repetitious at all. This is to say that the entropy of S would be ln(n), the entropy of the set of first derivatives would be ln(n - 1), etc. Of course, you're looking at storing/working on an upper triangular matrix of n(n + 1)/2 non-null elements (containing eventually ubergargantuan numbers for the highest derivatives), which could get pretty nasty for very large n.

Some example matrices for some sets of n = 5 values would be something like:
Values:		 0.2  0.4  0.7  0.1  0.5
1st derivatives: ...  0.2  0.3 -0.6  0.4
2nd derivatives: ...  ...  0.1 -0.9  1.0
3rd derivatives: ...  ...  ... -1.0  1.9
4th derivatives: ...  ...  ...  ...  2.9

Values:		 0.2  0.4  0.7  0.1  0.3
1st derivatives: ...  0.2  0.3 -0.6  0.2 <-- repetition on this row!!!
2nd derivatives: ...  ...  0.1 -0.9  0.8
3rd derivatives: ...  ...  ... -1.0  1.7
4th derivatives: ...  ...  ...  ...  2.7

Values:		 0.1  0.2  0.3  0.4  0.5
1st derivatives: ...  0.1  0.1  0.1  0.1 <-- repetition on this row!!!
2nd derivatives: ...  ...  0.0  0.0  0.0 <-- repetition on this row!!!
3rd derivatives: ...  ...  ...  0.0  0.0 <-- repetition on this row!!!
4th derivatives: ...  ...  ...  ...  0.0

I guess if you wanted to get really hardcore, you could look for repetition not only in terms of each individual row, but in terms of the entire matrix. In this case, maximal randomness would only be achieved if the entropy of the non-null elements was ln(n(n + 1)/2). In this case, the first example matrix would be slightly repetitious because 0.1, 0.2, and 0.4 all appear in multiple rows, and so the entropy / randomness would not be maximal.

Then again, perhaps these results could give something other than a normal distribution, yet still be considered "maximally random" by my definition, which would clearly be an issue for those who rely on the normal distribution method. So, I'm probably entirely wrong from the start, but I thought that I'd put it out there anyway.

Edited by taby, 17 May 2012 - 10:04 PM.




