Can't reverse new compression algorithm

Started by
21 comments, last by sooner123 12 years, 11 months ago

[quote name='sooner123' timestamp='1303918985' post='4803576']
I actually know very little about the mechanics of the hashing algorithm itself. And I couldn't find much online. I'll keep looking.


If you can't find anything, then you are obviously trolling, because you aren't trying very hard. Wikipedia alone has tons of information, including pseudocode implementations of MD5, SHA-1, and SHA-2 (all of which I have used to implement these in C++), along with links to the FIPS and RFC documents which explain the concepts and inner workings of them all.

You have asked several times "Is this infeasible?", and every time the answer has been "Yes, it is infeasible", but it is obviously falling on deaf ears.
[/quote]

Sorry to waste your time then. It was an idea. Nothing more.

(have a lot to learn) != (trolling) imo.

It's hard to know the right places to look and what to look for when you are just getting into this stuff.

Even so, http://en.wikipedia.org/wiki/MD5#Pseudocode is pretty damn cryptic to me.

I won't lie though. All the links and references so far in this thread have taught me a lot.

I'm still not convinced things can't be done the way I'm suggesting. But I can see that MD5 is not the way to do it.

I would really like to see whether a hash algorithm could be designed that focused less on non-reversibility and more on uniformity, to avoid the (small change) -> (same hash) problem.

That wouldn't change anything. All a hash does is mix the bits of the source data to come up with a "unique" number that identifies that data, but in doing so it throws away information. In throwing away information, you ensure that your hash isn't really unique; it is just hard to make a file that has the same hash. Hard, but not impossible. If you have a perfect 4-byte hash and hash 5-byte strings, then by the pigeonhole principle each hash value corresponds to about 256 possible inputs, one for each value of the 5th byte. Without knowing the 5th byte's value, you can't recover the first 4 bytes from the hash. Now compound that problem as you add more bytes.
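The pigeonhole argument above can be seen directly with a toy hash. This is a sketch (not from the thread): a deliberately lossy 8-bit "hash" built by truncating SHA-256, applied to every 2-byte input, showing that each digest ends up with many distinct preimages.

```python
# A toy 8-bit "hash" to show why hashing discards information:
# any function from a larger set to a smaller set must map many
# inputs to the same output (pigeonhole principle).
import hashlib

def tiny_hash(data: bytes) -> int:
    # Truncate SHA-256 to 1 byte -- a deliberately lossy 8-bit hash.
    return hashlib.sha256(data).digest()[0]

# Hash all 65,536 possible 2-byte inputs into only 256 buckets.
buckets = {}
for a in range(256):
    for b in range(256):
        buckets.setdefault(tiny_hash(bytes([a, b])), []).append((a, b))

# Each 8-bit digest has ~256 distinct 2-byte preimages on average,
# so the digest alone cannot tell you which input produced it.
avg = sum(len(v) for v in buckets.values()) / len(buckets)
print(len(buckets), round(avg))  # 256 buckets, ~256 preimages each
```

The same arithmetic scales up: a 128-bit hash over 256-bit inputs leaves about 2^128 preimages per digest.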

To give an example from school, we were shown how you can take a C++ "hello world" program and make it say "goodbye world" but keep the same CRC32 hash value. Simple: replace the string, then toss a /* garbage comment */ at the end of the file and fill the comment with bits until it comes up with the same hash. Without a visual inspection by a human, you'd never know the file had been tampered with. It still compiles and runs, and the matching CRC32 hides the fact that you tampered with it. You can do the same thing with most files. You can edit a JPG image and then fill the comment fields with garbage until the hash matches. With a bitmap you could make some big edits, then twiddle the least significant bits of every pixel, since the human eye isn't going to notice the differences. Now you have an image and a shopped image that both have the same CRC32. A good shop is hard to tell from the original, and they have the same hash, so which image is the one you were looking for?
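The "pad with garbage until the checksum matches" trick can be sketched in a few lines. A caveat on scale: brute-forcing the full 32-bit CRC can take up to 2^32 tries, so this demo (my own illustration, not code from the thread) matches only the low 16 bits of the CRC to keep the search fast; the hypothetical `printf` strings just stand in for the two program files.

```python
# Scaled-down version of the CRC-padding trick: append "comment" bytes
# to the tampered data until its truncated checksum matches the original's.
import itertools
import zlib

original = b'printf("hello world");'
tampered = b'printf("goodbye world");/*'

# Match only the low 16 bits so the brute-force search stays fast
# (~65,536 expected tries instead of ~4 billion for full CRC32).
target = zlib.crc32(original) & 0xFFFF

for n in itertools.count():
    candidate = tampered + n.to_bytes(4, "big") + b"*/"
    if zlib.crc32(candidate) & 0xFFFF == target:
        break

print(zlib.crc32(candidate) & 0xFFFF == target)  # True
```

This works against a non-cryptographic checksum like CRC32 precisely because the search space is tiny; against SHA-256 the same brute force would need on the order of 2^256 tries.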
Wow. I really feel like a complete idiot.

Everything you guys have been saying was going right over my head.

Not sure why but something just clicked.

If I had 256-bit data and hashed it to a 128-bit hash, I'd have to check on the order of 2^128 guesses to have even odds of finding an input that collides with a given hash (assuming the hash function has a uniform range).

And even then.. there would be on average about 2^128 inputs colliding with each hash value ( 2^(256 - 128) ).
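One nuance worth separating out (my addition, not from the thread): inverting a *given* d-bit digest takes about 2^d guesses, but finding *any* two inputs that collide takes only about 2^(d/2), the birthday bound. A quick sketch with MD5 truncated to 16 bits, so a collision should show up after roughly sqrt(2^16) = 256 tries:

```python
# Birthday-bound demo: feed distinct inputs into a 16-bit truncated MD5
# until two of them produce the same digest.
import hashlib

seen = {}
tries = 0
while True:
    tries += 1
    digest = hashlib.md5(str(tries).encode()).digest()[:2]  # 16-bit hash
    if digest in seen:
        break  # two different inputs, same truncated digest
    seen[digest] = tries

print(tries)  # typically a few hundred, near sqrt(2^16) = 256
```

So for a 128-bit hash, even the "easy" collision search costs about 2^64 hash evaluations, which is still far beyond casual brute force.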

edit: i think the "how long will this take to run, even when implemented efficiently" part of my brain simply turned off.
edit: and also.. no matter how uniform the hash function is.. obviously there should still (statistically speaking) be tons of instances of (small change) -> (same hash), rendering this utterly infeasible.

This topic is closed to new replies.
