perfect string hashing function?

Started by
6 comments, last by transformation 19 years, 10 months ago
are there any? I currently use this most of the time: int h = 0; while( *str ) { h = 3 * h + *str; str++ } // h is the hashed up string then. I havent experience any collisions so far, but i bet i will someday and i use this quite extensively. So what''s the best string hashing function?
Advertisement
That will work for long strings but may fail for short strings. Try using a standard crc-32 algorithm. There are lots of places to find one (but I don't know any off the top of my head). A very popular one is the pkzip crc. Search for "pkzip crc". Never mind, here it is:
/// This function creates a nearly unique 4-byte value from a string./// The chances of two different strings returning the same value is/// 1 / 2^32. The primary purpose of this function is to create a hash/// table key from the string. Hash tables require a unique key, but this/// is probably good enough.////// @param	sString	The string////// @return		A 4-byte value generated from the stringUINT32 StringToKey( char const * sString ){	UINT32 key	= 0xffffffff;	while ( *sString != 0 )	{		key = _HashStep( key, *sString );		++sString;	}	return key;}// Generates a hash value for the given key.UINT32 _Hash( UINT32 key ){	// 32-bit CRC calculation based on the PKZip CRC-32 algorithm. 	UINT32	crc;	crc = 0xffffffff;	crc = _HashStep( crc, key       );	crc = _HashStep( crc, key >>  8 );	crc = _HashStep( crc, key >> 16 );	crc = _HashStep( crc, key >> 24 );	return ( ~crc );}// Generates an intermediate hash value for the given byte.static UINT32 _HashStep( UINT32 crc, UINT8 byte ){	// 32-bit CRC calculation based on the PKZip CRC-32 algorithm. 	static UINT32	crctable[ 256 ] =	{		0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,		0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,		0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,		0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,		0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,		0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,		0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,		0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,		0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,		0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,		0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,		0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,		0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,		0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,		0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,		0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,		0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,		0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,		0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,		0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,		0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,		0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,		0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,		0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,		0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,		0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,		0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,		0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,		0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,		0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,		0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,		0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d	};	return ( ( crc >> 8 ) ^ crctable[ ( crc ^ byte ) & 0xff ] );} 


John Bolton
Page 44 Studios
Current project: NHL Faceoff 2005 PS2

[edited by - JohnBolton on May 31, 2004 10:56:22 PM]
John BoltonLocomotive Games (THQ)Current Project: Destroy All Humans (Wii). IN STORES NOW!
If you''re gonna use CRC32, I might also suggest you read up on it as a little fun exercise. While it may take a while to get used to all the low-level bit operations and how the algorithm works, it''s a lot of fun, and you learn something new
thanks for that. I read the article, understood a bit here and there.

JohnBolton:

With the code you gave, what''s the _Hash function for. It seems I wont need it to hash up the provided string with the StringToKey function. So I''m still a bit confused. One more question just to clear this _Hash function up should do it...

char* str1 = "some string to hash";char* str2 = "this is not the same";DWORD k1 = StringToKey( str1 );DWORD h1 = _Hash( k1 );DWORD k2 = StringToKey( str2 );DWORD h2 = _Hash( k2 );// now to compare the two strings would I do this?if( h1 == h2 ); //then strings are same// or would I do this?if( k1 == k2 ); //then strings are equal


thanks again.
"Perfect hashing" has a specific meaning and is probably not what you''re looking for. If you know in advance the complete set of strings you are using you can generate a perfect hashing function that is guaranteed to produce a different value for every one of those strings. You don''t normally know the complete set of strings you''ll be dealing with in advance though so something like CRC32 is more generally useful. A good general purpose string hashing function makes collisions very unlikely but can never guarantee there will be no collisions.

Game Programming Blog: www.mattnewport.com/blog

Perfect Hash Function Generator.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
quote:Original post by transformation
With the code you gave, what''s the _Hash function for. It seems I wont need it to hash up the provided string with the StringToKey function. So I''m still a bit confused.


They basically do the same thing. One generates a hash value for a string, the other generates a hash value for a 4-byte integer.



John Bolton
Page 44 Studios
Current project: NHL Faceoff 2005 PS2
John BoltonLocomotive Games (THQ)Current Project: Destroy All Humans (Wii). IN STORES NOW!
The reason why the OP''s hash function is bad, especially for short strings, is that source bits don''t propagate to the higher bits of your hash result (since the multiplier is too low), so your hash key space isn''t used well.
You don''t have to change much though - 2 good functions are:
SDBM: multiply by 0x1003f;
FNV: multiply by 0x1000193 and xor-in the next character

See comparison of string hash functions.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

This topic is closed to new replies.

Advertisement