• Content count

  • Joined

  • Last visited

Community Reputation

134 Neutral

About ragonastick

  • Rank
    Advanced Member
  1. Compression algorithm

    I don't know if this has a fancy name or anything, it is just an idea which I came up with. In terms of compression ratios it doesn't do brilliantly (it compressed the text of dracula to about 70% of its original size) but there is a lot of room for improvement. The basic idea is that you can encode a number in binary in such a way that it is self delimiting, so when you write two of these numbers next to each other, you can actually tell that there are two numbers. I won't go into full details, but instead of having a 1, 2, 4, 8, 16... "column", you have 1, 2, 3, 5, 8, 13... (Fibonacci numbers). But, the restriction is that two 1s never appear in a row except to signal that this is the end of a number. eg 1 -> 11 = 1 2 -> 011 = 2 3 -> 0011 = 3 4 -> 1011 = 1 + 3 5 -> 00011 = 5 6 -> 10011 = 1 + 5 7 -> 01011 = 2 + 5 ... So, with this encoding scheme, you can see that smaller numbers get smaller representations. My naive compression routine had a table of size 256, which initially contained the numbers 0-255. I would read a byte, and then look for that byte in the table. Once I found it, I would write down where it was in the table and then put it at the front of the table, shifting everything else back. What happens is that the common letters get put at the start, while the uncommon things don't, hence common things have shorter representations. The compression isn't brilliant, but it is easy to understand and implement. You can also extend it by various methods: 1) Rather than moving it to the front of the list, is there a better place to put something when it has been used (possibly based on its history) 2) Meta-characters - have a bigger table with things like "caps-lock" or "capitalise next value" or "store next 3 bytes to buffer" and "put buffer here" There may be a proper name for it which might help you. But since this idea was just one of my thoughts (probably unoriginal too), you'll have to find it yourself. Andy