Lode

Huffman encoding


Hi,

When Huffman encoding, would it give higher compression to use a single Huffman tree for the whole data stream, based on the frequency of the values in the whole stream, or is it better to calculate a separate tree for different chunks of the data stream (e.g. every 64K)?

Also, what is a good way to store such a Huffman tree in C++ code? Currently I use an array that, for each bit read, tells which node of the tree to jump to, but I was wondering if there exist better and/or faster implementations.

Thanks.
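
A minimal sketch of the kind of per-bit, array-based traversal described above (the node layout and names are hypothetical, not the poster's actual code):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node layout: each node stores, for bit 0 and bit 1, the index
// of the node to jump to; leaves carry the decoded symbol.
struct Node {
    int child[2];         // next node index for bit 0 / bit 1
    bool isLeaf;
    std::uint8_t symbol;  // valid only when isLeaf is true
};

// Decode one symbol by walking the tree one bit at a time.
// nextBit is assumed to be a callable returning the next input bit (0 or 1).
template <typename BitSource>
std::uint8_t decodeSymbol(const std::vector<Node>& tree, BitSource nextBit)
{
    int i = 0;  // start at the root
    while (!tree[i].isLeaf)
        i = tree[i].child[nextBit()];
    return tree[i].symbol;
}
```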

Guest Anonymous Poster
Quote:
Original post by Lode
Hi,

When Huffman encoding, would it give higher compression to use a single Huffman tree for the whole data stream, based on the frequency of the values in the whole stream, or is it better to calculate a separate tree for different chunks of the data stream (e.g. every 64K)?


Depends on the data. If you don't expect the character distribution to vary much with position in the stream, a single tree probably gives smaller output. If you expect the frequencies to change a lot every 64K or so, the second method might be better.

It also depends on whether you are compressing some special kind of data or just want to make a generic compressor.

Quote:

Also, what is a good way to store such a Huffman tree in C++ code? Currently I use an array that, for each bit read, tells which node of the tree to jump to, but I was wondering if there exist better and/or faster implementations.

Thanks.


I remember doing it by storing the frequency table itself (scaled to 8-bit values, of course) in a 256-byte array, from which the tree can be rebuilt.
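
A rough sketch of that idea (hypothetical code, assuming 256 byte-valued symbols): counts are squeezed into 1..255 so each entry fits in one byte, and the encoder must build its tree from these scaled counts as well so that both sides rebuild exactly the same tree.

```cpp
#include <algorithm>
#include <cstdint>

// Scale raw symbol counts into a 256-byte table that can be written to the
// output and used by the decoder to rebuild the same Huffman tree.
void scaleFrequencies(const std::uint64_t counts[256], std::uint8_t scaled[256])
{
    std::uint64_t maxCount = 1;
    for (int i = 0; i < 256; ++i)
        maxCount = std::max(maxCount, counts[i]);

    for (int i = 0; i < 256; ++i) {
        if (counts[i] == 0) {
            scaled[i] = 0;  // symbol never occurs
        } else {
            // clamp to at least 1 so a rare symbol is never lost
            std::uint64_t s = counts[i] * 255 / maxCount;
            scaled[i] = static_cast<std::uint8_t>(std::max<std::uint64_t>(1, s));
        }
    }
}
```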

I don't remember the details, but google "canonical Huffman". Unless I misunderstood it completely when I heard about it, you have to make sure you build the canonical Huffman tree for your frequencies, which can then be stored in less space. As far as I know, Zip and friends use it.

Quote:
Original post by Lode
Hi,

When Huffman encoding, would it give higher compression to use a single Huffman tree for the whole data stream, based on the frequency of the values in the whole stream, or is it better to calculate a separate tree for different chunks of the data stream (e.g. every 64K)?

If the chunks each have a different structure (meaning the characteristic symbol frequencies are different), then encoding them separately can result in better compression. If your chunks are just arbitrarily chosen (e.g. every 64 KB of a 1 MB file), then the compression ratio will probably be worse than compressing the whole stream at once.

Quote:

Also, what is a good way to store such a Huffman tree in C++ code? Currently I use an array that, for each bit read, tells which node of the tree to jump to, but I was wondering if there exist better and/or faster implementations.

Read here:
http://en.wikipedia.org/wiki/Canonical_Huffman_code
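
To make the linked idea concrete, here is a minimal sketch of the canonical code-assignment step (illustrative names, not from the article): given only a code length per symbol, encoder and decoder derive identical codes, so the lengths are all that has to be stored.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assign canonical Huffman codes from per-symbol code lengths.
// lengths[s] == 0 means symbol s does not occur.
std::vector<std::uint32_t> assignCanonicalCodes(const std::vector<int>& lengths)
{
    // Order symbols by (code length, symbol value).
    std::vector<int> symbols;
    for (int s = 0; s < static_cast<int>(lengths.size()); ++s)
        if (lengths[s] > 0)
            symbols.push_back(s);
    std::sort(symbols.begin(), symbols.end(), [&](int a, int b) {
        return lengths[a] != lengths[b] ? lengths[a] < lengths[b] : a < b;
    });

    // First code is all zeros; each next code is the previous + 1, shifted
    // left whenever the code length increases.
    std::vector<std::uint32_t> codes(lengths.size(), 0);
    std::uint32_t code = 0;
    int prevLen = 0;
    for (int s : symbols) {
        code <<= (lengths[s] - prevLen);
        codes[s] = code++;
        prevLen = lengths[s];
    }
    return codes;
}
```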

Storing the tree explicitly is surely the easiest method to understand.

For decoding, I think a more optimized version might build lookup tables. The keys would be bitstrings of up to a certain length and the values would contain information about the decoded characters and the state the decoder should enter to continue decoding.
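
A minimal sketch of the simplest form of that idea (all names are illustrative): if no code is longer than kTableBits, a single table indexed by the next kTableBits of input yields the decoded symbol and the number of bits actually consumed; the multi-table / state variant described above generalizes this to longer codes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kTableBits = 12;  // assumed upper bound on code length

struct TableEntry {
    std::uint8_t symbol;    // decoded symbol
    std::uint8_t bitsUsed;  // length of the code that produced it
};

// Build the lookup table from canonical codes and their lengths.
void buildDecodeTable(const std::vector<std::uint32_t>& codes,
                      const std::vector<int>& lengths,
                      std::vector<TableEntry>& table)
{
    table.assign(std::size_t(1) << kTableBits, TableEntry{0, 0});
    for (std::size_t s = 0; s < codes.size(); ++s) {
        if (lengths[s] == 0 || lengths[s] > kTableBits)
            continue;  // this simple variant only handles short codes
        // Every table slot whose top bits equal this code maps to its symbol.
        std::uint32_t first = codes[s] << (kTableBits - lengths[s]);
        std::uint32_t count = std::uint32_t(1) << (kTableBits - lengths[s]);
        for (std::uint32_t i = 0; i < count; ++i)
            table[first + i] = TableEntry{static_cast<std::uint8_t>(s),
                                          static_cast<std::uint8_t>(lengths[s])};
    }
}

// Decoding then peeks kTableBits of input, looks up the entry, emits
// entry.symbol, and advances the bit stream by entry.bitsUsed.
```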
