Compression Technologies


Hi all, I'm researching compression, lossless compression that is, and I would like to know more about the compression techniques used "out there"... I know Run Length Encoding, which is pretty basic... I've heard about "wave compression", or something like it... What other mainstream lossless compression techniques are used in today's most popular compressors, like WinACE, WinZip, RAR, etc.? Thanks,

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

by "wave compression", perhaps you mean wavelet based encoding, which is a <gack> "lossy" coder, that is supposed to provide better quality in less space than jpg''s. I think png''s may use it. The wavelet transform is a substitution for the DCT (Discrete Cosine Transformation) phase of lossy encoding.

About all I know about lossless encoding is RLE, Huffman, and that zlib is free.

The compression in PNG is strictly lossless (deflate, same as zlib). JPEG 2000 (the new/next version of JPEG) uses wavelets for its encoding, though.


When doing lossless compression on unknown data, most algorithms can be derived from one of three compression types:

Run Length Encoding: Basically encoding a repeating run of characters as a length value and a character value. This is done either by assuming most of the file has repeating characters and simply writing a long string of count+character records, or by using an escape character to indicate that a character/pattern is about to be repeated. RLE is the fastest of the three types (for both encoding and decoding), but it produces the worst results most of the time. RLE is best for compressing things like flat bitmap images (like a screenshot of your desktop without wallpaper).
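A minimal sketch of that count+character scheme in Python (just an illustration of the idea, not the exact record format any particular tool uses; runs are capped at 255 so the count fits in one byte):

    # Each run becomes a (count, byte) pair; counts are capped at 255.
    def rle_encode(data: bytes) -> bytes:
        out = bytearray()
        i = 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i] and run < 255:
                run += 1
            out += bytes([run, data[i]])
            i += run
        return bytes(out)

    def rle_decode(packed: bytes) -> bytes:
        out = bytearray()
        for i in range(0, len(packed), 2):
            out += bytes([packed[i + 1]]) * packed[i]
        return bytes(out)

    # A flat image row shrinks nicely; data without runs would double in size.
    row = bytes([0] * 100 + [255] * 28)
    assert rle_decode(rle_encode(row)) == row
    print(len(row), "->", len(rle_encode(row)))   # 128 -> 4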

Huffman: Huffman works by storing characters/patterns in varying bit lengths. Instead of storing all characters as 8 bits it stores "e" as something like "01" and "z" as something like "0011010101010" (there is an algorithm to determine the best bit codes for the characters in a file based on their frequency). Huffman has the best overall compression of the three, but due to the bit manipulation it is usually the slowest.
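A rough sketch of that frequency-based code assignment (the standard greedy tree construction; writing the tree to the file and the actual bit packing are left out):

    import heapq
    from collections import Counter

    def huffman_codes(data: bytes) -> dict:
        # Start with one weighted leaf per symbol, then repeatedly merge the two
        # lightest nodes; rare symbols end up deeper, i.e. with longer codes.
        freq = Counter(data)
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
            next_id += 1
        codes = {}
        def walk(node, prefix=""):
            if isinstance(node, tuple):          # internal node: recurse
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                                # leaf: record its bit string
                codes[node] = prefix or "0"
        walk(heap[0][2])
        return codes

    text = b"this is an example of a huffman tree"
    codes = huffman_codes(text)
    bits = sum(len(codes[b]) for b in text)
    print(bits, "bits vs", 8 * len(text), "bits at 8 bits per character")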

Lempel-Ziv: LZ works by finding repetitive patterns in data. Whenever a repetition is found, rather than storing it again, it is stored as an offset and length reference to an earlier instance of the pattern. This type of compression gets results similar to Huffman.
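A toy LZ77-style version of that idea (real implementations use hash chains to find matches and pack the tokens into bits, but the offset/length mechanics look like this):

    def lz77_encode(data: bytes, window: int = 4096, max_len: int = 18):
        # Emit (distance, length, next_byte) tokens, reusing the longest match
        # found in the sliding window of already-seen data.
        tokens, i = [], 0
        while i < len(data):
            best_len, best_dist = 0, 0
            for j in range(max(0, i - window), i):
                length = 0
                while (length < max_len and i + length < len(data)
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            nxt = data[i + best_len] if i + best_len < len(data) else 0
            tokens.append((best_dist, best_len, nxt))
            i += best_len + 1
        return tokens

    def lz77_decode(tokens) -> bytes:
        out = bytearray()
        for dist, length, nxt in tokens:
            for _ in range(length):
                out.append(out[-dist])   # copy from earlier in the output
            out.append(nxt)
        return bytes(out)

    data = b"abcabcabcabcabcXYZ"
    assert lz77_decode(lz77_encode(data)) == data

Deflate (the method inside zip, gzip and zlib) is essentially this plus Huffman coding of the resulting tokens.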

A fourth type of compression that is not really in use (though you hear about it whenever a lab says they've found a way to compress data better than everyone else; they just have to test it on a second file...) is genetic/neural net compression. All compression algorithms seek to remove redundant information from a file (RLE and LZ are the clearer examples of this). This fourth type of compression uses AI to try to find patterns in the data that we cannot see. For instance, the file "123456789" has a pattern, but without hardcoding detection, compression and decompression for this exact type of pattern we can't compress it. Using genetic algorithms (or similar AI technology), a program can be created that regenerates the original file using any type of pattern it wants (the AI creates a special-case decompressor).

quote:
Original post by Magmai Kai Holmlor
by "wave compression", perhaps you mean wavelet based encoding, which is a "lossy" coder, that is supposed to provide better quality in less space than jpg''s.



Wavelets aren't necessarily lossy; they are simply a data transformation. However, in order to get general lossless compression using wavelets (or at least wavelet-like constructions) WITHOUT actually increasing the size of the data, you need some serious whacking and thwacking, exploiting boundary-case properties of integer arithmetic.
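For what it's worth, the usual trick is an integer-to-integer ("lifting") version of the transform, along the lines of the Haar/S-transform below; a rough sketch of the reversibility idea, not any particular codec's filter:

    def s_transform(pairs):
        # Integer Haar / S-transform: rounded-down average plus difference.
        # Because the difference carries the lost parity bit, the transform is
        # exactly invertible even though everything stays in integers.
        return [((a + b) // 2, a - b) for a, b in pairs]

    def s_inverse(coeffs):
        out = []
        for s, d in coeffs:
            a = s + (d + 1) // 2
            out.append((a, a - d))
        return out

    samples = [(120, 121), (130, 90), (7, 255), (0, 0)]
    assert s_inverse(s_transform(samples)) == samples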

Most wavelet compression is therefore lossy in practice, and it also tends to produce horrible visual artifacts when the wavelet basis is badly chosen.

There are some nice websites around demonstrating it in fact.

People might not remember what you said, or what you did, but they will always remember how you made them feel.

Well, I don't want to sound arrogant or anything, but those compression schemes seem quite weak to me.

What is the best general-purpose compressor in the world, currently? (Even if it takes 10 minutes to compress an 80 KB file...)

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

...and another question: if a new compression tool for the home user came along that compressed 10% better than zip and the like, would it have an impact, or is the market saturated?

I think you can never have "enough" compression!

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

quote:
Original post by Michalson
When doing lossless compression on unknown data, most algorithms can be derived from one of three compression types


I'm no expert at this, but isn't arithmetic coding pretty common too?

quote:
Original post by pentium3id
What is the best general-purpose compressor in the world, currently?


I assume you are talking about compression ratios. The problem is that it depends on the data you are trying to compress. All algorithms for lossless compression make assumptions about what kind of redundancy there will be in the data (removing that redundancy is the whole idea) and will fail to compress if those assumptions do not hold.
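A quick way to see that with Python's zlib (the same deflate algorithm zip uses): repetitive text shrinks a lot, while random bytes come out slightly larger than they went in.

    import os, zlib

    text = b"the quick brown fox jumps over the lazy dog " * 200
    noise = os.urandom(len(text))
    print(len(text), "->", len(zlib.compress(text)))    # large reduction
    print(len(noise), "->", len(zlib.compress(noise)))  # slightly *larger*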

quote:
Original post by pentium3id
What is the best general-purpose compressor in the world, currently? (Even if it takes 10 minutes to compress an 80 KB file...)



Something like ZIP: a hybrid combination of Huffman and LZ. If you want to use it in your program, I would suggest looking for a DLL/lib that does ZIP compression.
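For example, in Python that hybrid (deflate) is already bundled as the zlib module, and in C/C++ the zlib library gives you the same thing, so there is rarely a reason to write the LZ+Huffman machinery yourself. A minimal round trip:

    import zlib

    data = b"imagine a texture or level file here " * 500
    fast = zlib.compress(data, 1)           # level 1: favour speed
    small = zlib.compress(data, 9)          # level 9: favour ratio
    assert zlib.decompress(small) == data   # the round trip is exact (lossless)
    print(len(data), len(fast), len(small))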

spock: yes, arithmetic coding is common, especially in speech coding.

pentium3id:
quote:
...and another question: if a new compression tool for the home user came along that compressed 10% better than zip and the like, would it have an impact, or is the market saturated?


More's always better. But if you're talking about data in general, there's only so much you can do; if you improve for one type of data, you generally degrade for another.

quote:

I think you can never have "enough" compression!


You can if your compression complexity approaches infinity. This is why many Huffman codecs use a truncated Huffman code.

This is still one of the most active areas of study and experimentation in my industry (signal processing), and there are incremental improvements made annually.

quote:

Well, I don't want to sound arrogant or anything, but those compression schemes seem quite weak to me.


Yup, you sound pretty arrogant. I'm guessing you don't even know what entropy is, but would like to dismiss current technologies as weak. Thank you, I needed a chuckle after staying up all night working on my class.

Well, the thing is... I've been developing a compression algorithm for 6 years now, and it beats zip and the like right out of the water (in theory; a working beta is still to be developed), but I was sure there would be other good competitors out there...

If you guys say that the mainstream is comprised mainly of Huffman and Lempel-Ziv, then... I think I'm safe.

This is a general-purpose lossless compression algorithm.

...also, don't ask me to divulge any more info on this... you guys know how fast "miraculous" compression tools come and go, so... (please, please don't flame me, this isn't something I started doing yesterday...)

...but if I do get, in general, a better result than ARJ, RAR and zip, what should my next move be? Patenting?

Thanks for the tips...

PS: my algo compressed the string "BDBCAAAA" into 4 bytes and 4 bits... (with a 3-bit header, and not saving the filename and that kind of stuff)... can others do better?

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

quote:
Original post by pentium3id
if a new compression tool for the home user came along that compressed 10% better than zip and the like, would it have an impact, or is the market saturated?


There are quite a few tools already that will often compress common types of data (text files, binaries, etc.) a little better than zip. Most of them need a little more time and/or memory when processing, but the real reason people prefer zip (and gzip) is that those formats are well established among users; everyone knows how to deal with zip files.

quote:

I think you can never have "enough" compression!



See Stoffel's answer, and consider this: would you rather

* wait 40 seconds to download a 4 meg uncompressed file
* wait 10 seconds to download a 1 meg (exotic compression method here) compressed file and then wait 40 seconds for it to uncompress?

There are diminishing returns. Speed and convenience are usually important.

quote:

my algo compressed the string "BDBCAAAA" into 4 bytes and 4 bits... (with a 3-bit header, and not saving the filename and that kind of stuff)... can others do better?


As long as that is all it has to do, I can easily construct an algorithm compressing that string into 1 bit (no header needed).
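To make that point concrete (a deliberately silly sketch, and it spends a whole flag byte rather than a single bit): any scheme that special-cases one known input can "compress" that input to almost nothing while slightly growing everything else, which is why a result on a single hand-picked string says nothing about a compressor.

    MAGIC = b"BDBCAAAA"

    def encode(data: bytes) -> bytes:
        # One flag byte: 1 means "it was the magic string", 0 means raw data follows.
        return b"\x01" if data == MAGIC else b"\x00" + data

    def decode(blob: bytes) -> bytes:
        return MAGIC if blob[0] == 1 else blob[1:]

    assert decode(encode(b"BDBCAAAA")) == b"BDBCAAAA"             # 8 bytes -> 1 byte
    assert decode(encode(b"anything else")) == b"anything else"   # 1 byte larger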

Obviously you have NO idea what you are talking about. I'm not interested in participating in yet another compression flame war, thank you very much.

quote:
Original post by pentium3id
Well, the thing is... I've been developing a compression algorithm for 6 years now, and it beats zip and the like right out of the water (in theory; a working beta is still to be developed)




6 years, and no working beta?

*dodges all the red flags popping up*


Yes, let's not get into a flame war here... and of course you could compress that string into a bit if it was in a dictionary... but my example was for a file with only those bytes. As I said before, miraculous compressors come and go every day...

I'm more interested in developing the algorithm and seeing how it competes with other compressors. I think this one has potential. Is international patenting costly?

Thanks for the tips, keep them coming...

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

quote:
Original post by pentium3id
Well, the thing is... I've been developing a compression algorithm for 6 years now, and it beats zip and the like right out of the water


Not another MP3-beating compression thread...

"6 years and no working beta?"

Lame, huh? But true. I just like graphics programming way better than this compression theory thing... it just happens I stumbled upon an algo that may actually work.

I just lost interest over the years, and restarted this project recently, after trying to compress some textures for my 3D OpenGL engine...

The 6 years and no beta has no excuse, though...

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

This isn't about MP3; it's a general-purpose algorithm, and MP3 isn't lossless...

Also, I've read most of the other MP3-beating compression algo thread, and that author had made a mistake in his algo. I haven't.

Just give me a few tips about what to do if I actually pull it off, and what other competitors to watch out for...

[Hugo Ferreira][Positronic Dreams][Stick Soldiers]
"Redundant book title: "Windows For Dummies".
"Camouflage condoms: So they won''t see you coming".

How about this? About a month ago a friend and I thought of a good algo for compression, but it may give very high compression for some files and very low compression for others. Is it worth anything?
