Pi decimals as a random number generator

13 comments, last by aregee 9 years, 8 months ago

Completely off-topic, but this thread is reminding me of pifs (Pi as a file storage mechanism).

Given that you could actually find every sequence of bytes in PI no matter how long, which I doubt, I wonder how much metadata you would have to write down to "store your file in PI". (You can't find PI inside PI, for instance: PI is irrational, and if its own digits reappeared somewhere later in its expansion, it would have to be rational.) :D

Yes, that is the joke, the metadata must be at least as large as the data itself, as expected from a simple pigeonhole argument (but some people still believe you can get something for nothing, which makes for some hilarious bug reports/issues) :)
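
To make that pigeonhole point a bit more concrete, here is a minimal Python sketch of the "store your file in pi" idea (just a toy, assuming the third-party mpmath package is installed; store_in_pi is a made-up name for illustration): the offset you have to record is, on average, roughly as many digits long as the data itself, because an n-digit string typically does not show up before position ~10^n.

```python
# Toy version of "store your file in pi": the data is a digit string, the
# "compressed file" is just its offset in pi's decimals. The catch (the
# pigeonhole argument above) is that an n-digit string typically does not
# appear before position ~10^n, so the offset takes about n digits to write
# down -- you saved nothing.
# Minimal sketch, assuming the third-party mpmath package; store_in_pi is a
# made-up name for illustration.
import mpmath

N_DIGITS = 100_000
mpmath.mp.dps = N_DIGITS + 10                        # working precision in decimal digits
pi_digits = mpmath.nstr(+mpmath.pi, N_DIGITS)[2:]    # drop the leading "3."

def store_in_pi(data_digits):
    """Return the offset of data_digits in pi's decimals -- the 'metadata'."""
    offset = pi_digits.find(data_digits)
    if offset < 0:
        raise ValueError(f"not found in the first {N_DIGITS} digits")
    return offset

for data in ["42", "2718", "161803", "8675309"]:
    try:
        offset = store_in_pi(data)
        print(f"{data!r}: offset {offset} "
              f"({len(str(offset))} digits of metadata for {len(data)} digits of data)")
    except ValueError as exc:
        print(f"{data!r}: {exc}")
```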

I was refreshing my weak knowledge of Huffman coding, since I am making an MP3 decoder just for "fun", and stumbled upon this quote that made me think of your mention of the pigeonhole principle in this regard:

"In general, data cannot be compressed. For example, we cannot losslessly represent all m-bit strings using (m ? 1)-bit strings, since there are 2(pow)m possible m-bit strings and only 2(pow)m?1 possible (m?1)-bit strings."

This is out of context though, so if it sounds weird that "In general, data cannot be compressed", here is the link to get the full context:

http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2012/lecture-notes/MIT6_046JS12_lec19.pdf
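
Since I was refreshing Huffman coding anyway, here is a small from-scratch Python sketch of the idea (a toy of mine, not the actual Huffman tables MP3 uses): frequent symbols get short codes, rare symbols get longer ones, which is exactly the "likely patterns map to shorter representations" trade that the counting argument above puts a hard limit on.

```python
# Toy Huffman coder: build a prefix-free binary code from symbol frequencies,
# so that the most frequent symbols get the shortest codewords.
import heapq
from collections import Counter

def huffman_codes(data):
    """Return {symbol: bitstring} built from the symbol frequencies in data."""
    freq = Counter(data)
    if len(freq) == 1:                         # degenerate one-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "abracadabra"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)                                   # 'a' (most frequent) gets the shortest code
print(f"{8 * len(text)} bits raw vs {len(encoded)} bits Huffman-coded")
```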

That is correct. All that lossless compression schemes do is look at the input to discover that some patterns are more likely than others, and then process the data so that these common patterns map to a shorter representation while less common patterns map to a (necessarily) longer representation. They feed on redundancy in the input, and operate on the assumption that the stuff human beings like to use, like text files and music, is not just white noise but has structure to it. In some abstract sense, their goal is to extract information from data. If you feed any lossless compression scheme a stream of uniformly random bits, it will never, ever succeed in compressing it on average, as it should be. This mirrors the laws of thermodynamics, which is not surprising given that thermodynamics deals with disorder and structure, which is very much what information is about (in fact, this is why information content is also called entropy).
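
You can see this directly with any off-the-shelf codec; for example, a few lines of Python around zlib (DEFLATE) show redundant text shrinking while uniformly random bytes refuse to compress:

```python
# Quick empirical check of the point above, using zlib (DEFLATE):
# redundant input shrinks a lot, uniformly random bytes do not compress
# (they typically grow slightly because of header/bookkeeping overhead).
import os
import zlib

structured = b"the quick brown fox jumps over the lazy dog\n" * 1000
random_bytes = os.urandom(len(structured))

for name, blob in [("structured text", structured), ("random bytes", random_bytes)]:
    compressed = zlib.compress(blob, 9)
    print(f"{name}: {len(blob)} -> {len(compressed)} bytes")
```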

Lossy compression schemes, on the other hand, simply trade correctness for efficiency, so that you can get better compression if you don't need 100% of the data intact. This is often the case when dealing with perceptual media like pictures and audio, because our brains have some rather powerful built-in error-correction features, but is typically unacceptable when it comes to structured data like source code or even simple text (we can also correct mangled text to some extent, but it's much more work and still relies on context being present, as usual).
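
As a deliberately crude sketch of that trade-off (not how MP3's psychoacoustic model actually works), you can quantize a toy signal down to fewer bits per sample and watch the reconstruction error grow as the storage shrinks:

```python
# Crude lossy "codec": keep fewer bits per sample, accept a reconstruction
# error, and decide how much error is perceptible for your use case.
import math

# Toy signal: a 440 Hz sine sampled at 8 kHz.
samples = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)]

def quantize(x, bits):
    levels = 2 ** bits
    return round((x + 1.0) / 2.0 * (levels - 1))   # map [-1, 1] to 0..levels-1

def dequantize(q, bits):
    levels = 2 ** bits
    return q / (levels - 1) * 2.0 - 1.0            # map back to [-1, 1]

for bits in (8, 4, 2):
    restored = [dequantize(quantize(s, bits), bits) for s in samples]
    worst = max(abs(a - b) for a, b in zip(samples, restored))
    print(f"{bits} bits/sample: worst-case reconstruction error {worst:.4f}")
```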

I know that there is a very old gamedev.net thread where the thread author claims he's come up with a lossless compression scheme which always shortens its input. It's a fun read if you are interested in the subject. Here, I dug it out for you: clicky.

Thank you for the link! It was quite entertaining. :)

My original answer got lost, so I was a bit discouraged from repeating it.

Mental note to self... Do not use Google Chrome Canary to write in discussion forums.

[image: Chrome's "Aw, Snap!" error page]
