Archived

This topic is now archived and is closed to further replies.

codemonkey

Hash tables...ah yes :)

Recommended Posts

codemonkey    122
Ok, I''ve worked with hash tables a little, but I have never implemented one by myself. I got the theory all down *except* one point....prime numbers! I understand why the elements have to be a prime number. Here is the issue... I want to make a scalable hash table so that if you make the hash with 7 values, and then you decide that the table is too small for you (it fills over halfway full) and want to resize it...what size do you make it? You need a prime number, but you don''t have one... The best and what I would consider the "cheesies" solution was to build a list of prime numbers that are all roughly double in size. Then if I need to expand the table, pick the next highest one. Is there a better way? Or should I just make the table one big size, and allow for more "collisions" (link lists of entries in each hash). Or am I being a bone head and not seeing the obvious? Thanks, Codemonkey

Share this post


Link to post
Share on other sites
Dogfood    122

The reason you use prime numbers is for certain collision resolving algorithms(mostly). Anyway, they don''t _have_ to be that size, it just helps certain algorithms. If you use stacks to resolve collisions, then the benefits are mostly gone.


If you resize the table later, you still have to rehash everything. I would suggest that if you know the order of magnitude of the "average" case, optimize your initial table size for that range.


A good book on algorithms(available from any university bookstore of a school offering first year computer science courses) is a must have for any programmer(reference value alone), and it explains why primes are good for certain hashing algorithms. Anyway, you''re hash function ought to not make a lot of collisions .

Share this post


Link to post
Share on other sites
codemonkey    122
I was just thinking as a general utility a simple hash table would be nice to have for my needs. I guess I could just pass the size that I believe the table will be needed to be during the initialization.

I believe I might have a dusty old Data Structures book floating around that should have them Have to see if I can find it.

Thanks,

Derrick

Share this post


Link to post
Share on other sites
Dogfood    122

You could always use a cryptographic hash(MD5?) function, they''re designed to prevent collisions . Actually, probably just take every 2 bytes of your string and keep a running XOR, pad the last block with the character from the beginning of the string. Mod your hash table size.



Oh yeah, I recalled why primes were good, FYI, if you jump ahead until you find an empty block, the algorithm people suggest using a quadratic function(look 1 ahead, 2 ahead, 9 ahead, 16 ahead). Anyway, you won''t run into factors of anything producing collisions, something like that, flame me, whatever, just trying to be informative .

Share this post


Link to post
Share on other sites