Sign in to follow this  
RuneLancer

Generating an ID

Recommended Posts

I'm trying to acheive something at work but can't quite figure out how to do it. Frankly, I'm not even sure where to start looking. A few pointers or some advice would be quite welcomed. I'm essentially trying to generate a sort of identifier I can tag onto a file that will vary very little as the file changes. Kinda like MD5, but not something that would change completely if you were to change an 'A' into a 'B' or insert a few bytes, or whatnot. For instance, with a theorical algorithm, I could take a file containing "The quick brown fox jumped over the lazy dog." and it would output, say, A0 F2 55 1C 00 08 E5 32. Then I could change the file to contain "The quick brown fox jumped over the lazy cat." and I'd end up with A0 F2 55 1C 00 08 E6 4E. For the sake of context, this would more or less be to create some sort of "difference map" where one file could quickly be looked up by this ID in a database and similare files could be acquired by looking for those with a similare ID.

Share this post


Link to post
Share on other sites
So you want a hash that changes proportionally to the amount of change in the file? You could just sum the data and do a modulus or something like that.

Are you comparing text or binary data?

Share this post


Link to post
Share on other sites
I haven't read about this kind of stuff, but an easy way to do something like that, would be to take the sum of all characters. If the sum gets larger than your last ID you just start at 0 again. So if we use ASCII to translate the characters to numbers and we take your example:
The quick brown fox jumped over the lazy cat. = 84 + 104 + 101 + 32 + 113 + 117 + 105 + 99 + 107 + 32 + 98 + 114 + 111 + 119 + 110 + 32 + 102 + 111 + 120 + 32 + 106 + 117 + 109 + 112 + 101 + 100 + 32 + 111 + 118 + 101 + 114 + 32 + 116 + 104 + 101 + 32 + 108 + 97 + 122 + 121 + 32 + 99 + 97 + 116 + 46 = 4187.

Now if we change car to dog.
The quick brown fox jumped over the lazy dog. = 84 + 104 + 101 + 32 + 113 + 117 + 105 + 99 + 107 + 32 + 98 + 114 + 111 + 119 + 110 + 32 + 102 + 111 + 120 + 32 + 106 + 117 + 109 + 112 + 101 + 100 + 32 + 111 + 118 + 101 + 114 + 32 + 116 + 104 + 101 + 32 + 108 + 97 + 122 + 121 + 32 + 100 + 111 + 103 + 46 = 4189
There is only 2 in difference because dog and cat is so similar. If we instead try with a rhino.
The quick brown fox jumped over the lazy rhino. = 84 + 104 + 101 + 32 + 113 + 117 + 105 + 99 + 107 + 32 + 98 + 114 + 111 + 119 + 110 + 32 + 102 + 111 + 120 + 32 + 106 + 117 + 109 + 112 + 101 + 100 + 32 + 111 + 118 + 101 + 114 + 32 + 116 + 104 + 101 + 32 + 108 + 97 + 122 + 121 + 32 + 114 + 104 + 105 + 110 + 111 + 46 = 4419
Now there is over 200 in difference (each lowercase letter gives around 100 and rhino is two more letters).

This might not be such a good idea for small files since you will get a very low idea. Maybe if you give characters values like 40036, 40100 etc. then it would be better to use with small files.

EDIT: Damn Scuppy beat me because I just had to write a little program to convert a string to a lot of numbers.

Share this post


Link to post
Share on other sites
Interesting.

Consider word count, and word comparison where you keep the number of words that are different in the same sequence. The issue is what if the files are opposites?

a b c
c b a

Kuphryn

Share this post


Link to post
Share on other sites
I tried summing up the bytes, but the results weren't very satisfactory (particularly for exceedingly large files, such as in the order of a few hundred megs; we don't expect to work with these too often though ;) ) The main problem was with strings like...

"This is a test."
"Tbs hsi ruteh ."

The two strings are very different but sum up the same. Still, it's a good start.

The link nmi posted seems VERY insightful, however. My day's pretty much over but I'll give it a shot at home and see what comes out of it. Thanks :)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this