Yes, that is the joke, the metadata must be at least as large as the data itself, as expected from a simple pigeonhole argument (but some people still believe you can get something for nothing, which makes for some hilarious bug reports/issues)
I was refreshing my weak knowledge on Huffman coding, since I am making a MP3 decoder just for "fun", and stumbled upon this quote that made me think about your mention of the pigeonhole principle in this regards:
"In general, data cannot be compressed. For example, we cannot losslessly represent all m-bit strings using (m ? 1)-bit strings, since there are 2(pow)m possible m-bit strings and only 2(pow)m?1 possible (m?1)-bit strings."
This is out of context though, if it sounds weird that "In general, data cannot be compressed". Here is the link to get the context:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2012/lecture-notes/MIT6_046JS12_lec19.pdf
That is correct. All that lossless compression schemes do is look at the input to discover that some patterns are more likely than others, and then process the data such that these common patterns map to a shorter representation while less common patterns map to a (necessarily) longer representation. They feed on redundancy in the input, and operate on the assumption that stuff that human beings like to use, like text files and music, are not just white noise but have structure to them. In some abstract sense, their goal is to extract information from data. If you feed any lossless compression scheme a stream of uniformly random bits, it will never, ever succeed in compressing it on average, as it should be. This mirrors the laws of thermodynamics, which is not surprising given that thermodynamics deal with disorder and structure, which is very much what information is (in fact, this is why information content is also called entropy).
Lossy compression schemes, on the other hand, simply trade correctness for efficiency, so that you can get better compression if you don't need 100% of the data intact. This is often the case when dealing with perceptual media like pictures and audio, because our brains have some rather powerful built-in error-correction features, but is typically unacceptable when it comes to structured data like source code or even simple text (we can also correct mangled text to some extent, but it's much more work and still relies on context being present, as usual).
I know that there is a very old gamedev.net thread where the thread author claims he's come up with a lossless compression scheme which always shortens its input. It's a fun read if you are interested in the subject. Here, I dug it out for you:
clicky.