Sign in to follow this  
dcosborn

zip LZW decompression

Recommended Posts

I'm working on a zip-file loader for my game. At the moment, I'm dealing with compression method #1 (shrink) which uses a slightly modified LZW algorithm. I've never dealt with LZW and I'm wondering how to determine when I get to the end of the compressed data stream. In most cases, the "local file header" provides the size of the compressed data which follows it, but sometimes, its deferred until after the data block. So in some cases, I need to parse the LZW data without knowing its compressed size. I'm guessing there's some kind of special EOF-like marker I haven't read about. Does anyone know something about this that I don't? I realize there's mini-zip and info-zip, but AFAIK neither provides both a library interface and up-to-date zip format support, so I'm writing my own.

Share this post


Link to post
Share on other sites
Unless you're doing this purely for educational reasons, use zlib. You can't imagine how widely used this library is. If you are doing this for educational reasons, I'd suggest you design your own format based on LZ77, especially as LZW is patented. LZ77 is slow to compress, but extremely fast to decompress, which makes it quite suitable for static content in games. My implementation takes around 3 minutes to compress a sample 125MB file, and 3.6 seconds to decompress and load that same file into memory. The resulting file is 43.6MB too, compared to the 35.5MB zip file on maximim compression, which is good enough for me.

Share this post


Link to post
Share on other sites
Where have you been; the LZW patent has expired. ;)

I'm using zlib for the "deflate" compression method (#8), but I'm trying to deal with the "shrink" method ATM, which is a variation on LZW (I'm doubtful that zlib can parse it). The PKWARE zip spec details all this. Anyway, I probably going to attempt some hex-editor examination of a shrink formatted zip and see if I can find an EOF marker. Otherwise, I'll just reject any zip files that try to defer the necessary information (these should be rare anyway).

Your LZ77 implementation sounds quite fast. I'll have to look into that. Do you happen to know of any file formats that deal primarily with LZ77? I may want to add loaders for them as well. Thanks for your help.

Share this post


Link to post
Share on other sites
Thanks I hadn't seen that before. I'm still a bit confused over how to optimize the search part of the algorithm so this should help. I saw one source suggest hashing; of course I have no idea how hashes actually work. So this should all be pretty educational if I can get it to work.

Share this post


Link to post
Share on other sites
Quote:
Where have you been; the LZW patent has expired. ;)

Thank god. You'll have to excuse me, I got my grounding in compression theory back in 2001/2002.

Quote:
I'm using zlib for the "deflate" compression method (#8), but I'm trying to deal with the "shrink" method ATM, which is a variation on LZW (I'm doubtful that zlib can parse it). The PKWARE zip spec details all this. Anyway, I probably going to attempt some hex-editor examination of a shrink formatted zip and see if I can find an EOF marker. Otherwise, I'll just reject any zip files that try to defer the necessary information (these should be rare anyway).

Is there any reason you need to be able to load every possible form? At any rate, if someone makes an archive that uses a tool-specific (or at least, not widely supported) form of compression, they shouldn't expect your program to be able to load it.

Quote:
Do you happen to know of any file formats that deal primarily with LZ77? I may want to add loaders for them as well.

If you mean standard compression formats, not off the top of my head. My interest in compression centers around the individual methods and theories, not on composite "smart" formats like zip, rar, and the like.

Share this post


Link to post
Share on other sites
Quote:
Original post by dcosborn
Thanks I hadn't seen that before. I'm still a bit confused over how to optimize the search part of the algorithm so this should help. I saw one source suggest hashing; of course I have no idea how hashes actually work. So this should all be pretty educational if I can get it to work.

To speed up searches in my LZ77 compressor (searching is by far the biggest bottleneck in LZ77 compression), I used an array of linked lists, with a two-byte index. The way it works is for every location within your compression window, you read the two-byte pair at that location, and use it as an index into the table. As locations enter the window, you add that location. As locations fall out of the window, they are removed. When searching for a match, you don't check every location within the sliding window, you simply index into the table and check each stored location. This greatly accelerated the compression process.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this