hashing/checksum proof

Started by
11 comments, last by aboeing 20 years, 5 months ago
PS Wouldn't it be >= LARGE?
Yep. =P

The adding method isn't very good either. Consider: "ABS" "BAS"
That's a very good point.
The method you propose would probably work very well for generating a single-character hash value, but what I actually had in mind was something like this:
uint32 x;
char *string;
uint8 *z = (uint8 *)&x;
int i;
for (i = 0; i < strlen(string); i++)
    z[i & 3] += string[i];
// but this is probably better in all cases:
z[i & 3] += 5*z[i & 3] + string[i];

I want to keep the hashing function fast.. thanks for the help guys! I'll try out a couple :)
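A minimal compilable sketch of the difference, assuming the "adding method" above means a plain sum of the character codes (the function names and test strings are just for illustration):

#include <stdint.h>
#include <stdio.h>

/* Plain "adding method": order-insensitive, so anagrams collide. */
static uint32_t hash_sum(const char *s)
{
    uint32_t h = 0;
    for (; *s; s++)
        h += (uint8_t)*s;
    return h;
}

/* Multiply-then-add step (same idea as the 5*z + c line above, but on a
   single accumulator): each step folds in the running value, so the
   order of the characters now matters. */
static uint32_t hash_mul(const char *s)
{
    uint32_t h = 0;
    for (; *s; s++)
        h += 5 * h + (uint8_t)*s;   /* equivalent to h = 6*h + c */
    return h;
}

int main(void)
{
    printf("sum: %u %u\n", (unsigned)hash_sum("ABS"), (unsigned)hash_sum("BAS")); /* equal  */
    printf("mul: %u %u\n", (unsigned)hash_mul("ABS"), (unsigned)hash_mul("BAS")); /* differ */
    return 0;
}

The plain sum maps "ABS" and "BAS" to the same value; the multiply-then-add step makes each character's contribution depend on where it appears in the string.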
another thing to consider when judging a hash is the hash range, compared to the data set variation ... for instance, if hashing down to 1-byte codes, the absolute BEST you can do is cause hashing of different sets of data to separate into the 256 buckets fairly evenly ... so the key for a good algorithm like this is an algorithm that looks good on YOUR DATA (because there is no algorithm that looks good on ALL DATA) ...
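A quick way to check that on your own data is to histogram the buckets; here is a rough C sketch, where hash8 and the word list are just placeholders for whatever hash and key set you actually use:

#include <stdint.h>
#include <stdio.h>

/* Placeholder 1-byte hash; swap in the function you want to judge. */
static uint8_t hash8(const char *s)
{
    uint32_t h = 0;
    for (; *s; s++)
        h = 6 * h + (uint8_t)*s;
    return (uint8_t)(h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24));
}

int main(void)
{
    /* Stand-in for YOUR DATA -- in practice, feed in your real key set. */
    const char *sample[] = { "alpha", "beta", "gamma", "delta", "epsilon",
                             "zeta", "eta", "theta", "iota", "kappa" };
    int counts[256] = { 0 };
    int n = (int)(sizeof sample / sizeof sample[0]);

    for (int i = 0; i < n; i++)
        counts[hash8(sample[i])]++;

    /* Print the non-empty buckets; a roughly even spread is the goal. */
    for (int b = 0; b < 256; b++)
        if (counts[b])
            printf("bucket %3d: %d\n", b, counts[b]);
    return 0;
}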

it is impossible to create a hash for which some types of data will not hash improperly (meaning bunch up onto a smaller subset of hash results) ...

all hashes can only (easily) be judged by the following - how they hash RANDOM DATA, or how they hash data with a certain shape (aka WORDS, or SORTED DATA, or whatever you are dealing with) ...

the key is NOT always how well the algorithm handles SMALL CHANGES, because sometimes that is not what your data set consists of ... that IS a way to objectively judge between fairly equal hashing algorithms for RANDOM DATA that are already proven to distribute a large set fairly evenly (once the overall distribution is determined, the small change rule is useful because of tradition; traditionally people were using checksums to find (usually small) errors in saving or transmission ... in these cases small error detection is critical ... but for a given domain, this may not apply)
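A quick sanity check for the small-change behaviour is to flip input bits one at a time and count how many output bits change (rough sketch; the hash and test string here are again just placeholders):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Placeholder 32-bit hash to test. */
static uint32_t hash32(const char *s, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = 6 * h + (uint8_t)s[i];
    return h;
}

/* Number of differing bits between two hashes (ideal avalanche: ~16 of 32). */
static int bit_diff(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int n = 0;
    while (x) { n += (int)(x & 1); x >>= 1; }
    return n;
}

int main(void)
{
    char buf[] = "checksum";
    size_t len = strlen(buf);
    uint32_t base = hash32(buf, len);

    /* Flip each input bit in turn and see how much the hash changes. */
    for (size_t byte = 0; byte < len; byte++) {
        for (int bit = 0; bit < 8; bit++) {
            buf[byte] = (char)((uint8_t)buf[byte] ^ (1u << bit));
            printf("flip byte %zu bit %d -> %2d output bits changed\n",
                   byte, bit, bit_diff(base, hash32(buf, len)));
            buf[byte] = (char)((uint8_t)buf[byte] ^ (1u << bit)); /* restore */
        }
    }
    return 0;
}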

for instance ... many domain-specific compression formats such as MP3 and JPG do NOT change based on 1-bit source changes, because their goal is to make the big picture small ... which is the generic goal of all hashing ... but they want similar to stay similar, the opposite of hashing used to distribute a set into buckets ...

the real thing I wanted to say is ... an algorithm which turns ABC and CAB into the same thing may be fine for random data, but would NOT be acceptable to most uses dealing with text exclusively ... when dealing with text, it is almost always beneficial to have some form of shift in the algorithm, in order to move the subset domain (ASCII does not use all 256 source characters) into a complete destination range ...
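One well-known example of that kind of shift-and-add mix for text is the classic djb2 string hash (sketch):

#include <stdint.h>
#include <stdio.h>

/* djb2: h = h*33 + c, done with a shift. The multiply/shift spreads the
   limited ASCII byte values across the full 32-bit range instead of
   letting them cluster. */
static uint32_t djb2(const char *s)
{
    uint32_t h = 5381;
    for (; *s; s++)
        h = ((h << 5) + h) + (uint8_t)*s;
    return h;
}

int main(void)
{
    printf("%08x\n", (unsigned)djb2("ABC"));
    printf("%08x\n", (unsigned)djb2("CAB"));   /* anagram of "ABC", different hash */
    return 0;
}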

Having a hash change significantly for a small change practically guarantees all the various other things, like even distribution. (That's not to say it's impossible otherwise, just unlikely.)

I don't believe MP3s and JPGs use hashing algorithms. I think the distinction is that hashes never involve data recoverable purely from the hash.

This topic is closed to new replies.
