Very large file compression?

Started by
4 comments, last by szecs 12 years, 9 months ago
I had an idea just now about compressing large files and I wanted to see whether it would be possible or not. Files are represented as a string of bytes, which means that a file should be able to be converted into a very large number right? So if you converted your file into a very large number, then divided that number by another number to produce a number that could fit into a small preset file size (like say 1kb). Couldn't you just store the number you divided by, and the result of the division in the file. Then undo when you load the file again? Wouldn't this make it possible to compress any file into 1kb?
Advertisement
What would you do if that file's number turned out to be prime?
No.

Try it as an example. Suppose your input file converts to the number 2895389, by whatever magic. Now divide by some other number, like 58883. This comes out to 49.171900208888813409642851077561...

Clearly, it takes more data to precisely represent the result of the division than it does to represent the original file - and that's just for a really small input!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

that is simply impossible.the core of comp is finding out replica and then kill them

I had an idea just now about compressing large files and I wanted to see whether it would be possible or not. Files are represented as a string of bytes, which means that a file should be able to be converted into a very large number right? So if you converted your file into a very large number, then divided that number by another number to produce a number that could fit into a small preset file size (like say 1kb). Couldn't you just store the number you divided by, and the result of the division in the file. Then undo when you load the file again? Wouldn't this make it possible to compress any file into 1kb?


lets take a simple example for this:

Lets say your file is the number 10000
if you divide it by 10 you can store

10
1000

this takes more space than 10000

if the number is 11916 and you divide by 9 you have to store
9
1324

which is the same size,

but, since your numbers will be arbitrarily large you also need to store the size of the first number which means that even in the cases where you can find an integer divisor for your files number the end result will always be larger than the original. (in the cases where there is no integer divisor it simply won't work at all).

The only way to achieve compression is to either remove information (this form of compression is used for audio, video, etc where some quality loss is acceptable or unnoticable) or to locate repeat patterns or frequency of symbols/patterns and either describe the patterns or find a way to represent commonly occuring symbols using less data (This requires that you use more data to describe the symbols that occur less frequently) (or combinations of both, if you take an image and adjust the colors so that you get larger blocks or same color pixels you can describe those blocks using less data for example)

For text files a good method of compression is to build a binary tree with all the characters used in the leaves, commonly occuring characters should be in leaves high up in the tree while rare ones get placed at the bottom, then you just have to store the tree itself and the "path" one takes in the tree to reach the characters (0 for left, 1 for right). for other types of data other methods can be far better though, Some types of data benefit greatly from run length encoding where you basically store the number of times a symbol repeats itself before the symbol itself (so aaaaabbaaaaaa would be stored as 5a2b6a).

completely random data is naturally uncompressable.
[size="1"]I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!
It still puzzles me that programmers don't understand the pigeon-hole principle. No magic in this world will ever violate that.

This topic is closed to new replies.

Advertisement