Micro-compression algorithms

10 comments, last by eq 18 years, 4 months ago
Quote: The compression algorithm would be applied per record

Unless your records are of substantial size, you'll inflate your file by trying to compress each record separately. Compression works by removing repetition; if there's almost no repetition within a given sample of data, almost no gain can be made from compressing it. If your records are small, each sample will be too small to compress much. You'll also be working with a simple compression scheme, which, unlike composite formats like zip, won't fall back to storing the data uncompressed when there's nothing to be gained. Instead, it will inflate your record size: the worst case for LZ77 compression, for example, inflates your data by 12.5%.
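That 12.5% figure comes from the flag bit a simple LZ77-style coder spends on every literal. A back-of-the-envelope sketch (assuming a byte-oriented variant with a 1-bit literal/match flag per token and no header; the numbers are illustrative, not from any particular library):

/* Worst case for a 1-flag-bit-per-token LZ77 variant: no matches are ever
 * found, so every input byte costs 1 flag bit + 8 literal bits = 9 bits,
 * a 12.5% expansion on incompressible data. */
#include <stdio.h>

static size_t lz77_worst_case_bytes(size_t input_bytes)
{
    /* 9 bits per literal, rounded up to whole output bytes */
    return (input_bytes * 9 + 7) / 8;
}

int main(void)
{
    size_t record = 80;                       /* a small record */
    size_t worst  = lz77_worst_case_bytes(record);
    printf("%zu-byte record -> up to %zu bytes (%.1f%%)\n",
           record, worst, 100.0 * worst / record);   /* 90 bytes, 112.5% */
    return 0;
}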

To get substantial benefit from compression, you would need to compress sets of records together. That requires more processing power to compress and decompress the larger data block, and more memory to hold the decompressed data. I think you'd probably find compression proves more of a hindrance than a help, but it depends on a lot of details we don't have:
-How many records do you expect to have?
-What is the nature of the data within them (e.g., mostly strings, lots of small integers)?
-How big do you expect them to be, in bytes?
-How large are the tables?
-What do you expect the overall filesize of the database to be?
-What are the specs of these units in terms of storage space, RAM, and processing power?
-Will users need to modify records (I'm guessing yes) or simply view them?

At any rate, if your records are going to be less than 80 bytes, I'd say you'll need to compress sets of records, perhaps entire tables, together in one block. That will significantly increase the processing power required to access and modify those records. If storage space and CPU usage are both at a premium, I simply don't think your units are powerful enough.
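To make the trade-off concrete, here's a rough sketch of block-level compression. It's hypothetical: zlib, the record size and the records-per-page figure are placeholders for whatever your units can actually afford, not part of your design.

/* Records are grouped into pages and each page is compressed as a unit.
 * Reading or writing a single record forces the whole page to be
 * decompressed (and recompressed on write), which is where the extra CPU
 * and RAM cost comes from. */
#include <string.h>
#include <zlib.h>

#define RECORD_SIZE      64
#define RECORDS_PER_PAGE 128
#define PAGE_SIZE        (RECORD_SIZE * RECORDS_PER_PAGE)

/* Compress one page; returns the compressed size, or 0 on failure. */
static uLong page_compress(const unsigned char *page,
                           unsigned char *out, uLong out_cap)
{
    uLongf out_len = out_cap;
    if (compress2(out, &out_len, page, PAGE_SIZE, Z_BEST_COMPRESSION) != Z_OK)
        return 0;
    return out_len;
}

/* Decompress a whole page just to read a single record out of it. */
static int page_read_record(const unsigned char *comp, uLong comp_len,
                            int index, unsigned char *record)
{
    unsigned char page[PAGE_SIZE];          /* the whole page must fit in RAM */
    uLongf page_len = sizeof(page);
    if (uncompress(page, &page_len, comp, comp_len) != Z_OK)
        return -1;
    memcpy(record, page + (size_t)index * RECORD_SIZE, RECORD_SIZE);
    return 0;
}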
Made a slightly more advanced implementation that predicts the number of bits to use for length/offset based on the previous byte.
It needs 256 bytes on the stack and is a bit slower.
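Roughly, the idea is something like this (a simplified sketch of the general approach, not the actual implementation; the names are placeholders and the table is static here rather than on the stack):

#include <stdint.h>

static uint8_t bits_for_prev[256];   /* one prediction entry per previous byte */

/* Predict the field width for the token that follows byte 'prev'. */
static unsigned predict_bits(uint8_t prev)
{
    unsigned b = bits_for_prev[prev];
    return b ? b : 4;                /* default guess before any statistics */
}

/* After coding a token, nudge the prediction toward the width actually needed. */
static void update_bits(uint8_t prev, unsigned bits_used)
{
    unsigned b = predict_bits(prev);
    if (bits_used > b)
        bits_for_prev[prev] = (uint8_t)(b + 1);
    else if (bits_used < b && b > 1)
        bits_for_prev[prev] = (uint8_t)(b - 1);
}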
The same files compress like this:
Source: C:\WINDOWS\ideinit.cfg Compress: 146 to 115 bytes (78.77%) Decompress: 115 to 146 bytes (126.96%) Data verified, all ok!
Source: C:\WINDOWS\Blue Lace 16.bmp Compress: 1272 to 428 bytes (33.65%) Decompress: 428 to 1272 bytes (297.20%) Data verified, all ok!
Source: C:\WINDOWS\win.ini Compress: 1014 to 521 bytes (51.38%) Decompress: 521 to 1014 bytes (194.63%) Data verified, all ok!
Source: C:\WINDOWS\hh.exe Compress: 10752 to 4634 bytes (43.10%) Decompress: 4634 to 10752 bytes (232.02%) Data verified, all ok!
Source: C:\WINDOWS\system.ini Compress: 227 to 135 bytes (59.47%) Decompress: 135 to 227 bytes (168.15%) Data verified, all ok!
Source: C:\WINDOWS\desktop.ini Compress: 2 to 3 bytes (150.00%) Decompress: 3 to 2 bytes (66.67%) Data verified, all ok!
Source: C:\WINDOWS\Runservice.exe Compress: 2560 to 768 bytes (30.00%) Decompress: 768 to 2560 bytes (333.33%) Data verified, all ok!
Source: C:\WINDOWS\clock.avi Compress: 82944 to 42413 bytes (51.13%) Decompress: 42413 to 82944 bytes (195.56%) Data verified, all ok!

Ping me if you're interested in the source.

This topic is closed to new replies.