vaneger

Binary search as compression method


Recommended Posts

I had an odd idea of trying to use a binary search as a compression algorithm. The basic idea is that given, say, a 32-bit integer, you can find it with a binary search in at most 33 iterations. You can cut this down to 31 iterations and use 2 bits to specify which of the four remaining integers is the correct one. So the information you need to know is:

1. how many iterations to take
2. what 'section' of the possible domain the number is in
3. which end of the section the number is closest to
4. which of the final 4 is the correct one

Item 4 requires 2 bits, item 3 requires 1 bit, item 2 requires 5 bits, and item 1 requires 5 bits, for a total of 13 bits.

An example: say you have the integer 2055. The section is equal to the least number of bits required to store the integer, in this case 12. This puts us in the section 2048-4096. 2055 is closer to 2048, so we mark down 0 for item 3. Now we know that the integer lies between 2048 and 3072. Doing a binary search, we can get this down to four remaining possibilities in 8 iterations. The four possibilities are 2056 (00), 2055 (01), 2054 (10), and 2053 (11). Therefore we need to store 0 (1 binary digit), 01 (2 binary digits), section number 12, and 8 iterations.

Am I messing something up, or is this a viable compression scheme?
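Here is a rough sketch of the four fields in Python (the `fields` helper is hypothetical, and the item-4 encoding is a stand-in, since the post numbers the four candidates differently):

```python
def fields(n):
    """Fields the proposed scheme would store for n (hypothetical sketch).
    Note: the left/right choices made during the search are never stored,
    only the iteration count."""
    section = n.bit_length()                  # item 2: 5 bits
    lo, hi = 1 << (section - 1), 1 << section
    closer = 0 if n - lo < hi - n else 1      # item 3: 1 bit
    width = (hi - lo) // 2                    # the half chosen by item 3
    iterations = 0
    while width > 4:                          # item 1: narrow to 4 candidates
        width //= 2
        iterations += 1
    final = n % 4                             # item 4: 2 bits (stand-in)
    return section, closer, iterations, final

print(fields(2055))  # (12, 0, 8, 3)
```

Interestingly, a nearby integer such as 2051 yields the same tuple, so these fields alone don't pin down one number.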

It's similar to arithmetic coding.

Quote:
Therefore we need to store 0 (1 binary digit), 01 (2 binary digits), section number 12, and 8 iterations.


Sadly, this information will take, on average, 32 bits to store (for 32 bit integers of equal probabilities).

In your case it would be simpler to just encode the needed bits directly. An integer can be up to 32 bits long (its length encoded in 5 bits); the remainder is the actual bits of the integer. So 5 bits + 12 bits could encode 2055 (and anything up to 4095) in 17 bits.
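A minimal sketch of that direct encoding (assuming 1 <= n < 2**31 so the length fits in the 5-bit prefix; a real codec would store length - 1 so full 32-bit values fit too):

```python
def encode(n):
    """Length-prefixed sketch: 5 bits give the bit length of n,
    then the bits of n themselves follow."""
    return format(n.bit_length(), '05b') + format(n, 'b')

def decode(bits):
    """Read the 5-bit length, then that many bits of the value."""
    length = int(bits[:5], 2)
    return int(bits[5:5 + length], 2)

bits = encode(2055)
print(bits, len(bits))  # 01100100000000111 17
```

As the post says, 2055 comes out as 5 + 12 = 17 bits, and the encoding round-trips unambiguously.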


Also, integers can be considered a form of a binary tree. Consider the following:
              |
      +-------+-------+
      |               |
  +---+---+       +---+---+
  |       |       |       |
  0       1       2       3
Parse the bits of an int one by one. If the bit is 0, move left. If the bit is 1, move right.

If the node is a leaf, print its number and stop.

2 in binary is 10. In above tree, you'd first move right, then left, then print two.
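The bit-by-bit walk can be sketched like this (the tree never has to be built; descending left or right is just an arithmetic step):

```python
def walk(bits):
    """Descend the implicit tree: 0 = go left, 1 = go right.
    The leaf reached is the integer whose binary representation is `bits`."""
    node = 0
    for b in bits:
        node = node * 2 + int(b)  # left child = 2n, right child = 2n + 1
    return node

print(walk('10'))  # 2: move right, then left
```

This is exactly why a binary search path carries no less information than the bits of the number itself: the path *is* the bits.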

You have to encode the binary search as bits.

The problem comes when you need to tell the difference between 0001 and 01. It requires extra information: basically, the bits you are eliminating.

I went through this a while back, though it was in the range of 8 bits. I found I could use at most 7 bits (basically the path you took through the binary tree) to encode most of the numbers. Yeah, there are edge cases, but I figured I'd handle those separately.

I figured if I could eliminate 1 bit per iteration of encoding, it'd be a pretty damn good recursive compression algorithm.

It just doesn't work in the end.

What I was trying sounds along the lines of what you're trying.



If you want to make a compression algorithm, make sure you understand the pigeonhole principle.

From Wikipedia: given two natural numbers n and m with n > m, if n items are put into m pigeonholes, then at least one pigeonhole must contain more than one item. This principle proves that any general-purpose lossless compression algorithm that makes at least one input file smaller will make some other input file larger. Otherwise, two files would be compressed to the same smaller file and restoring them would be ambiguous.
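A tiny counting sketch of the principle: there are simply fewer short bit strings than long ones, so any scheme that shortens every input must map two inputs to the same output.

```python
from itertools import product

# 16 four-bit inputs, but only 2 + 4 + 8 = 14 outputs of length 1 to 3,
# so at least two inputs must share a "compressed" output.
inputs = [''.join(p) for p in product('01', repeat=4)]
shorter = [''.join(p) for r in (1, 2, 3) for p in product('01', repeat=r)]
print(len(inputs), len(shorter))  # 16 14
```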

First make sure you understand what Wiggin posted above. Then notice that it doesn't mean that compression is impossible. This is how lossless compression always works: Of all the possible inputs, some are more probable than others, so if we encode the more probable ones with shorter representations and the less probable ones with longer representations, we can make the average representation length smaller than the original length of the message.

In other words, you need to understand something about what things happen often and what things almost never happen in order to be able to compress (express common things with few bits).
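A toy illustration with made-up symbol frequencies: a prefix-free code that spends fewer bits on the common symbol beats a fixed-length code on average.

```python
# Hypothetical frequencies; the common symbol 'a' gets a 1-bit code,
# rare symbols get longer codes. The code is prefix-free, so it decodes
# unambiguously.
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
freq = {'a': 0.70, 'b': 0.15, 'c': 0.10, 'd': 0.05}

avg = sum(freq[s] * len(code[s]) for s in code)
print(avg)  # 1.45 bits per symbol on average, vs 2.0 for a fixed 2-bit code
```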

[Edited by - alvaro on October 26, 2009 2:32:40 PM]
