Micro-compression algorithms

10 comments, last by eq 18 years, 4 months ago
Quote: The compression algorithm would be applied per record

Unless your records are of substantial size, you'll inflate your file by trying to compress each record separately. Compression works by removing repetition; if there's almost no repetition within a given sample of data, almost no gain can be made from compressing it. If your records are small, each sample will be too small to compress much. You'll also be working with a simple compression scheme, which, unlike composite formats like zip, won't fall back to storing the data uncompressed when there's nothing to be gained. Instead, it will inflate your record size: the worst case for LZ77 compression, for example, inflates your data by 12.5%.
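That 12.5% figure comes from the flag bit a simple LZ77-style coder spends on every literal. A back-of-the-envelope sketch (assuming a byte-oriented variant with a 1-bit literal/match flag per token and no header; the numbers are illustrative, not from any particular library):

/* Worst case for a 1-flag-bit-per-token LZ77 variant: no matches are ever
 * found, so every input byte costs 1 flag bit + 8 literal bits = 9 bits,
 * a 12.5% expansion on incompressible data. */
#include <stdio.h>

static size_t lz77_worst_case_bytes(size_t input_bytes)
{
    /* 9 bits per literal, rounded up to whole output bytes */
    return (input_bytes * 9 + 7) / 8;
}

int main(void)
{
    size_t record = 80;                       /* a small record */
    size_t worst  = lz77_worst_case_bytes(record);
    printf("%zu-byte record -> up to %zu bytes (%.1f%%)\n",
           record, worst, 100.0 * worst / record);   /* 90 bytes, 112.5% */
    return 0;
}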

To get substantial benefit from compression, you would need to compress sets of records together. That requires more processing power to compress and decompress the larger data block, and more memory to hold the decompressed data. I think you'd probably find compression proves more of a hindrance than a help, but it depends on a lot of details we don't have:
-How many records do you expect to have?
-What is the nature of the data within them (e.g., mostly strings, lots of small integers)?
-How big do you expect them to be, in bytes?
-How large are the tables?
-What do you expect the overall filesize of the database to be?
-What are the specs of these units in terms of storage space, RAM, and processing power?
-Will users need to modify records (I'm guessing yes) or simply view them?

At any rate, if your records are going to be less than 80 bytes, I'd say you'll need to compress sets of records, perhaps entire tables, together in one block. That will significantly increase the processing power required to access and modify those records. If storage space and CPU usage are both at a premium, I simply don't think your units are powerful enough.
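To make the trade-off concrete, here's a rough sketch of block-level compression. It's hypothetical: zlib, the record size and the records-per-page figure are placeholders for whatever your units can actually afford, not part of your design.

/* Records are grouped into pages and each page is compressed as a unit.
 * Reading or writing a single record forces the whole page to be
 * decompressed (and recompressed on write), which is where the extra CPU
 * and RAM cost comes from. */
#include <string.h>
#include <zlib.h>

#define RECORD_SIZE      64
#define RECORDS_PER_PAGE 128
#define PAGE_SIZE        (RECORD_SIZE * RECORDS_PER_PAGE)

/* Compress one page; returns the compressed size, or 0 on failure. */
static uLong page_compress(const unsigned char *page,
                           unsigned char *out, uLong out_cap)
{
    uLongf out_len = out_cap;
    if (compress2(out, &out_len, page, PAGE_SIZE, Z_BEST_COMPRESSION) != Z_OK)
        return 0;
    return out_len;
}

/* Decompress a whole page just to read a single record out of it. */
static int page_read_record(const unsigned char *comp, uLong comp_len,
                            int index, unsigned char *record)
{
    unsigned char page[PAGE_SIZE];          /* the whole page must fit in RAM */
    uLongf page_len = sizeof(page);
    if (uncompress(page, &page_len, comp, comp_len) != Z_OK)
        return -1;
    memcpy(record, page + (size_t)index * RECORD_SIZE, RECORD_SIZE);
    return 0;
}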
Made a slightly more advanced implementation that predicts the number of bits to use for length/offset based on the previous byte.
It needs 256 bytes on the stack and is a bit slower.
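Roughly, the idea is something like this (a simplified sketch of the general approach, not the actual implementation; the names are placeholders and the table is static here rather than on the stack):

#include <stdint.h>

static uint8_t bits_for_prev[256];   /* one prediction entry per previous byte */

/* Predict the field width for the token that follows byte 'prev'. */
static unsigned predict_bits(uint8_t prev)
{
    unsigned b = bits_for_prev[prev];
    return b ? b : 4;                /* default guess before any statistics */
}

/* After coding a token, nudge the prediction toward the width actually needed. */
static void update_bits(uint8_t prev, unsigned bits_used)
{
    unsigned b = predict_bits(prev);
    if (bits_used > b)
        bits_for_prev[prev] = (uint8_t)(b + 1);
    else if (bits_used < b && b > 1)
        bits_for_prev[prev] = (uint8_t)(b - 1);
}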
The same files compress like this:
Source: C:\WINDOWS\ideinit.cfg Compress: 146 to 115 bytes (78.77%) Decompress: 115 to 146 bytes (126.96%) Data verified, all ok!
Source: C:\WINDOWS\Blue Lace 16.bmp Compress: 1272 to 428 bytes (33.65%) Decompress: 428 to 1272 bytes (297.20%) Data verified, all ok!
Source: C:\WINDOWS\win.ini Compress: 1014 to 521 bytes (51.38%) Decompress: 521 to 1014 bytes (194.63%) Data verified, all ok!
Source: C:\WINDOWS\hh.exe Compress: 10752 to 4634 bytes (43.10%) Decompress: 4634 to 10752 bytes (232.02%) Data verified, all ok!
Source: C:\WINDOWS\system.ini Compress: 227 to 135 bytes (59.47%) Decompress: 135 to 227 bytes (168.15%) Data verified, all ok!
Source: C:\WINDOWS\desktop.ini Compress: 2 to 3 bytes (150.00%) Decompress: 3 to 2 bytes (66.67%) Data verified, all ok!
Source: C:\WINDOWS\Runservice.exe Compress: 2560 to 768 bytes (30.00%) Decompress: 768 to 2560 bytes (333.33%) Data verified, all ok!
Source: C:\WINDOWS\clock.avi Compress: 82944 to 42413 bytes (51.13%) Decompress: 42413 to 82944 bytes (195.56%) Data verified, all ok!

Ping me if you're interested in the source.

This topic is closed to new replies.