
Russell

Compression algorithm comparison

Recommended Posts

Russell    118
I was reading about different compression algorithms tonight and came to understand the basic idea of several. Huffman is what most people suggested, and it seems easy enough to implement. I am curious, though, how the top few algorithms compare in terms of compression and efficiency. Is there any algorithm that is significantly better than the rest? Or is it more of an "it depends on the data you're compressing" issue? If I just went with Huffman encoding for general compression needs, am I making a bad decision? Thanks, Russell

EvilCrap    134
It probably depends on the data. For instance, how much you can squeeze out of the individual bits depends on what values the bytes actually take: if the first bit of every byte is always 1, that bit is predictable and carries no information, while a long string of zero bytes is compressible simply because it repeats.
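A byte whose top bit is known ahead of time really only carries seven bits of information, so you could strip that bit out yourself. A rough sketch of the idea, purely illustrative (it assumes every input byte really does have its high bit set):

#include <cstdint>
#include <vector>

// Illustrative only: if the high bit of every input byte is known to be 1,
// it carries no information, so keep just the low 7 bits of each byte.
// Every 8 input bytes pack into 7 output bytes.
std::vector<uint8_t> pack7(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    uint32_t acc = 0;   // bit accumulator
    int bits = 0;       // number of not-yet-emitted bits in acc
    for (uint8_t b : in)
    {
        acc = (acc << 7) | (b & 0x7F);  // append the 7 informative bits
        bits += 7;
        while (bits >= 8)
        {
            bits -= 8;
            out.push_back(static_cast<uint8_t>(acc >> bits));  // emit top 8 pending bits
        }
    }
    if (bits > 0)  // flush leftovers, zero-padded at the bottom
        out.push_back(static_cast<uint8_t>(acc << (8 - bits)));
    return out;
}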

LessBread    1415
It also depends on what you want to use the compression for. For example, you might be more concerned with decompression speed than with compression speed, as in the case of game bitmaps. For that situation, I've read that LZH performs quite well.

Beer Hunter    712
The usual claim I hear about Huffman is that it compresses human-readable data by an average of 30%, which is pretty good. Another compression algorithm that's good for learning is arithmetic encoding.

If you're just looking for a compression library, try UCL.

TangentZ    450

"It depends on the data".

There are two types of compression: lossless and lossy.

Lossless is for data compression where you want to decompress and get back 100% of the original data, e.g. GZ, BZ2, LZW.

Lossy is for compression where you don't need 100% of the data back, e.g. JPEG, MPEG, MP3.

Some algorithms compress fast and decompress slowly, and vice versa. So just pick what is best for your needs and stop searching for the "best" compression algorithm.

It's just like how there is no "best" sorting algorithm.
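For what it's worth, a lossless round trip through zlib (the deflate code used by GZ and pkzip) only takes a couple of calls. A rough, untested sketch:

#include <cstring>
#include <iostream>
#include <vector>
#include <zlib.h>   // link with -lz

int main()
{
    const char* text = "Hello hello hello, compression test string.";
    uLong srcLen = static_cast<uLong>(std::strlen(text)) + 1;

    // compressBound() gives the worst-case compressed size for the input.
    uLongf dstLen = compressBound(srcLen);
    std::vector<Bytef> packed(dstLen);
    if (compress(packed.data(), &dstLen,
                 reinterpret_cast<const Bytef*>(text), srcLen) != Z_OK)
        return 1;

    // Lossless: uncompress() must give back exactly what went in.
    std::vector<Bytef> restored(srcLen);
    uLongf outLen = srcLen;
    if (uncompress(restored.data(), &outLen, packed.data(), dstLen) != Z_OK)
        return 1;

    std::cout << srcLen << " bytes -> " << dstLen << " bytes, round trip "
              << (std::memcmp(text, restored.data(), srcLen) == 0 ? "OK" : "FAILED")
              << "\n";
}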


Premature optimizations can only slow down your project even more.

mossmoss    326
quote:
Original post by Russell
If I just went with Huffman encoding for general compression needs, am I making a bad decision?



You need to examine your data to determine an answer to this question.

The basic Huffman algorithm works by counting the frequencies of your input symbols. With this information, each symbol can be represented with an integral number of bits, with more frequent symbols getting shorter codes. It gives no consideration to patterns, and requires storage of your frequencies (or a known, fixed frequency table). One variation I know of, adaptive Huffman, does away with storing the table and changes the bit representation of symbols on the fly during compression/decompression.
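Just to make that concrete, the heart of the basic (static) scheme fits in a page: count frequencies, keep merging the two least frequent nodes, then read each symbol's code off the tree. A rough sketch, with no error handling or actual bit output:

#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

struct Node
{
    std::size_t freq;
    unsigned char symbol;   // only meaningful for leaves
    Node* left;
    Node* right;
};

// Min-heap ordering by frequency.
struct ByFreq
{
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Walk the tree: left = 0, right = 1. Frequent symbols end up near the root.
void assignCodes(const Node* n, const std::string& prefix,
                 std::map<unsigned char, std::string>& codes)
{
    if (!n->left && !n->right) { codes[n->symbol] = prefix.empty() ? "0" : prefix; return; }
    assignCodes(n->left,  prefix + "0", codes);
    assignCodes(n->right, prefix + "1", codes);
}

int main()
{
    std::string data = "this is an example of huffman coding";

    // 1. Count symbol frequencies.
    std::map<unsigned char, std::size_t> freq;
    for (unsigned char c : data) ++freq[c];

    // 2. Build the tree by repeatedly merging the two least frequent nodes.
    std::vector<std::unique_ptr<Node>> pool;   // owns every node
    std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
    for (const auto& f : freq)
    {
        pool.emplace_back(new Node{f.second, f.first, nullptr, nullptr});
        pq.push(pool.back().get());
    }
    while (pq.size() > 1)
    {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pool.emplace_back(new Node{a->freq + b->freq, 0, a, b});
        pq.push(pool.back().get());
    }

    // 3. Read off the variable-length codes.
    std::map<unsigned char, std::string> codes;
    assignCodes(pq.top(), "", codes);
    for (const auto& c : codes)
        std::cout << "'" << c.first << "' -> " << c.second << "\n";
}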

Arithmetic encoding is logically similar to Huffman; however, it can encode a symbol with a FRACTIONAL number of bits. For example, let's say that for your data the most optimal encoding of 'e' is 2.8 bits. Arithmetic coding can do this; Huffman would have to be satisfied with 2 or 3 bits (integral), which is slightly less optimal.
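That fractional figure comes straight from information theory: a symbol with probability p ideally costs -log2(p) bits, which is almost never a whole number. A quick illustration with a made-up probability:

#include <cmath>
#include <iostream>

int main()
{
    // Suppose 'e' makes up roughly 14% of the input. The ideal code length
    // is -log2(p) bits, which is fractional. Arithmetic coding can get
    // arbitrarily close to that; Huffman has to spend a whole number of
    // bits on each occurrence.
    double p = 0.14;                 // assumed probability of 'e'
    double ideal = -std::log2(p);    // about 2.84 bits
    std::cout << "ideal:   " << ideal << " bits per 'e'\n";
    std::cout << "huffman: a whole 2 or 3 bits per 'e'\n";
}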

But again, arithmetic and Huffman are frequency-based compression schemes; they ignore local patterns. I can imagine that even something as simple as RLE (run-length encoding) could beat them for certain kinds of data.
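For comparison, an RLE encoder really is just a dozen lines. A rough sketch that emits (count, byte) pairs, with runs capped at 255 so the count fits in one byte:

#include <cstddef>
#include <cstdint>
#include <vector>

// Run-length encoding sketch: great for long runs of identical bytes,
// actively harmful for data with no runs (every lone byte doubles in size).
std::vector<uint8_t> rleEncode(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    std::size_t i = 0;
    while (i < in.size())
    {
        uint8_t value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255)
            ++run;
        out.push_back(static_cast<uint8_t>(run));   // run length
        out.push_back(value);                       // repeated byte
        i += run;
    }
    return out;
}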

There are also dictionary coders, of which things like Pkzip and gzip are examples. These work not on the frequency of individual input symbols, but on the patterns formed by sequences of them. So something like "abcabcabcabcabcabc" would stump a Huffman encoder (because the input symbols "a", "b", and "c" all have the same frequency), but a dictionary coder would immediately see the pattern and compress it very well.
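A toy, very loosely LZ77-flavoured sketch (nothing like real deflate) shows why: on that string it emits three literals and then one long back-reference:

#include <cstddef>
#include <iostream>
#include <string>

// Toy dictionary coder: at each position, look for the longest match in the
// data already seen and emit either a literal or a (distance, length)
// back-reference. On "abcabc..." nearly everything collapses into one
// reference back to the first "abc".
int main()
{
    std::string data = "abcabcabcabcabcabc";
    std::size_t pos = 0;
    while (pos < data.size())
    {
        std::size_t bestLen = 0, bestDist = 0;
        for (std::size_t start = 0; start < pos; ++start)
        {
            std::size_t len = 0;
            while (pos + len < data.size() && data[start + len] == data[pos + len])
                ++len;
            if (len > bestLen) { bestLen = len; bestDist = pos - start; }
        }
        if (bestLen >= 3)   // only worth a reference if the match is long enough
        {
            std::cout << "(back " << bestDist << ", copy " << bestLen << ")\n";
            pos += bestLen;
        }
        else
        {
            std::cout << "literal '" << data[pos] << "'\n";
            ++pos;
        }
    }
}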

Of course, a commercial product like pkzip or similar actually encompasses several different compression algorithms and variants, and determines which is the best for any particular file. But if you know the characteristics of the data you're putting in, you can choose an appropriate algorithm.


---- --- -- -
Blue programmer needs food badly. Blue programmer is about to die!

Beer Hunter    712
I took a bunch of files (mostly text, but a few bitmaps), and stuck them together (I just used winzip with no compression). It came to 4,254,987 bytes. I tested uclpack.exe and bzip2.exe with a quick program.

uclpack:
  compression time: 10.449 seconds
  compressed into 2100547 bytes
  decompression time: 0.344 seconds
bzip2:
  compression time: 18.394 seconds
  compressed into 2013856 bytes
  decompression time: 3.240 seconds

Can anyone suggest some others for me to test?
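If anyone wants to run the same sort of test themselves, a rough harness like this is all it takes (zlib shown purely as an example codec, not one of the two above):

#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <zlib.h>   // link with -lz

int main(int argc, char** argv)
{
    if (argc < 2) { std::cerr << "usage: bench <file>\n"; return 1; }

    // Slurp the whole test file into memory.
    std::ifstream f(argv[1], std::ios::binary);
    std::vector<char> input((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());

    uLongf packedLen = compressBound(static_cast<uLong>(input.size()));
    std::vector<Bytef> packed(packedLen);

    auto t0 = std::chrono::steady_clock::now();
    compress(packed.data(), &packedLen,
             reinterpret_cast<const Bytef*>(input.data()),
             static_cast<uLong>(input.size()));
    auto t1 = std::chrono::steady_clock::now();

    std::vector<Bytef> restored(input.size());
    uLongf restoredLen = static_cast<uLongf>(input.size());
    uncompress(restored.data(), &restoredLen, packed.data(), packedLen);
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double> compressTime = t1 - t0;
    std::chrono::duration<double> decompressTime = t2 - t1;
    std::cout << input.size() << " -> " << packedLen << " bytes\n"
              << "compression time:   " << compressTime.count() << " seconds\n"
              << "decompression time: " << decompressTime.count() << " seconds\n";
}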
