Sign in to follow this  

Very large file compression?

This topic is 2336 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I had an idea just now about compressing large files and I wanted to see whether it would be possible or not. Files are represented as a string of bytes, which means that a file should be able to be converted into a very large number right? So if you converted your file into a very large number, then divided that number by another number to produce a number that could fit into a small preset file size (like say 1kb). Couldn't you just store the number you divided by, and the result of the division in the file. Then undo when you load the file again? Wouldn't this make it possible to compress any file into 1kb?

Share this post


Link to post
Share on other sites
No.

Try it as an example. Suppose your input file converts to the number 2895389, by whatever magic. Now divide by some other number, like 58883. This comes out to 49.171900208888813409642851077561...

Clearly, it takes more data to [i]precisely[/i] represent the result of the division than it does to represent the original file - and that's just for a really small input!

Share this post


Link to post
Share on other sites
[quote name='3DModelerMan' timestamp='1311130995' post='4837767']
I had an idea just now about compressing large files and I wanted to see whether it would be possible or not. Files are represented as a string of bytes, which means that a file should be able to be converted into a very large number right? So if you converted your file into a very large number, then divided that number by another number to produce a number that could fit into a small preset file size (like say 1kb). Couldn't you just store the number you divided by, and the result of the division in the file. Then undo when you load the file again? Wouldn't this make it possible to compress any file into 1kb?
[/quote]

lets take a simple example for this:

Lets say your file is the number 10000
if you divide it by 10 you can store

10
1000

this takes more space than 10000

if the number is 11916 and you divide by 9 you have to store
9
1324

which is the same size,

but, since your numbers will be arbitrarily large you also need to store the size of the first number which means that even in the cases where you can find an integer divisor for your files number the end result will always be larger than the original. (in the cases where there is no integer divisor it simply won't work at all).

The only way to achieve compression is to either remove information (this form of compression is used for audio, video, etc where some quality loss is acceptable or unnoticable) or to locate repeat patterns or frequency of symbols/patterns and either describe the patterns or find a way to represent commonly occuring symbols using less data (This requires that you use more data to describe the symbols that occur less frequently) (or combinations of both, if you take an image and adjust the colors so that you get larger blocks or same color pixels you can describe those blocks using less data for example)

For text files a good method of compression is to build a binary tree with all the characters used in the leaves, commonly occuring characters should be in leaves high up in the tree while rare ones get placed at the bottom, then you just have to store the tree itself and the "path" one takes in the tree to reach the characters (0 for left, 1 for right). for other types of data other methods can be far better though, Some types of data benefit greatly from run length encoding where you basically store the number of times a symbol repeats itself before the symbol itself (so aaaaabbaaaaaa would be stored as 5a2b6a).

completely random data is naturally uncompressable.

Share this post


Link to post
Share on other sites
It still puzzles me that programmers don't understand the [url="http://en.wikipedia.org/wiki/Lossless_data_compression#Lossless_compression_methods"]pigeon-hole principle[/url]. No magic in this world will ever violate that.

Share this post


Link to post
Share on other sites

This topic is 2336 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this