Jump to content
  • Advertisement
Sign in to follow this  
_Sigma

Hashtable size - prime or power of 2?

This topic is 3985 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

My prof is teaching that a hashtable should be a size m, where m is prime. A quick look on wikipedia says this:
Quote:
[...]Some older hashes are even worse, requiring table sizes to be a prime number rather than a power of two[...]such a requirement is a sign of a fundamentally weak function
Now, this makes me worry! In their explanation of why, the author of the wiki entry says that
Quote:
[...][the]conversion from the hash value (typically 32 bits) to a bucket index for a particular-size hash table can be done simply by masking, preserving only the lower k bits for a table of size 2k (an operation equivalent to computing the hash value modulo the table size).
This
Quote:
[...]property enables the technique of incremental doubling of the size of the hash table - each bucket in the old table maps to only two in the new table.
Does a table that is prime in size not allow for this? As well, when we double the table size, how do we sort out which of the 2 spots a number can map to?

Share this post


Link to post
Share on other sites
Advertisement
The actual implementation of the hashing function isn't so important as learning the concept of using a one way hash to aid in finding an object in a collection.

Quote:

how do we sort out which of the 2 spots a number can map to?


Hash it again with the new function.

Share this post


Link to post
Share on other sites
As to the question of prime or power-of-two table size:
the tradeoffs are speed (& is faster than %) vs. increased usage of the hashed bits (& only considers the lowest bits, % folds in the upper ones as well).

*requiring* prime size is a sign of weakness, yes. However, that does not mean table sizes should always be prime; provided the lower bits of the hash values are adequately mixed, they can be used as the actual table index.

Share this post


Link to post
Share on other sites
To resize a table to prime sizes, you'd need a list of primes to use. That's just very slightly more complicated than doubling the size. I could imagine doing some tricky memory management to improve locality when doubling the table, taking advantage of each bucket mapping to only two buckets. With a prime size that sort of mapping would be undesirable because either elements from two buckets at one size will be pigeonholed into one bucket at the next size, or else some buckets will remain unused (for example: you could treat the table as if its size were the biggest power of two that fits in the prime sized table). Such a hash function would not distribute values very well.

Share this post


Link to post
Share on other sites
Old world:

A good hash function was slow, therefore the cost of doing a modulus division by a prime was an acceptable practice that minimized the bunching inherent in poor (but fast) hash functions.

New world:

A good hash function is fast, and doesnt have bunching issues regardless of its bit width. It is inconcievable to use the slowest operation imaginable (modulus) to guard against a non-existant problem.

Your professor is living in the past. Just make sure that you use a hash function that distributes the inputs with a normal distribution.

One sign of a good hash function is that a single bit change in the input effects about half of the output bits on average.

Share this post


Link to post
Share on other sites
Quote:
Original post by Rockoon1
Old world:

A good hash function was slow, therefore the cost of doing a modulus division by a prime was an acceptable practice that minimized the bunching inherent in poor (but fast) hash functions.

New world:

A good hash function is fast, and doesnt have bunching issues regardless of its bit width. It is inconcievable to use the slowest operation imaginable (modulus) to guard against a non-existant problem.

Your professor is living in the past. Just make sure that you use a hash function that distributes the inputs with a normal distribution.

One sign of a good hash function is that a single bit change in the input effects about half of the output bits on average.
This advice matches what I was going to say, almost perfectly. btw the technical term to look up here is 'funnels'. Here's an excellent article about it: http://burtleburtle.net/bob/hash/examhash.html

I think another reason they still teach usage of prime hash table sizes is that for beginners (whose hash function is, let face it, going to be crap) actually get a slightly decent distribution. It's still a poor-man's fix though.

I highly recommend CRC for the hash function. VERY simple, VERY fast, and VERY good!

Share this post


Link to post
Share on other sites
Quote:
Original post by iMalc
I highly recommend CRC for the hash function. VERY simple, VERY fast, and VERY good!


I am a big fan of the Zobrist technique, myself. Not as fast (potential for cache misses), but has many advantages. These include:

It is easy to satisfy all the properties of a good hash function.

It is easy to construct an arbitrary number of Zobrist hash functions.

Zobrist hash functions can be of an arbitrary bit width.

Incremental changes to the key require only incremental changes to the hash (rather than a complete recalculation)

Share this post


Link to post
Share on other sites
It seems like it should be trivial to construct a better hash than "Zobrist" by using the same idea of a table of values for each position-state pair, but constructing the values through some tailored algorithm instead of a PRNG.

Unfortunately, I don't have the background to be able to suggest what such an algorithm might look like except that it'd probably be related to error correction codes.

Share this post


Link to post
Share on other sites
Quote:
Original post by Extrarius
It seems like it should be trivial to construct a better hash than "Zobrist" by using the same idea of a table of values for each position-state pair, but constructing the values through some tailored algorithm instead of a PRNG.

Unfortunately, I don't have the background to be able to suggest what such an algorithm might look like except that it'd probably be related to error correction codes.


Unfortunately, once you follow that path you lose some of the advantages of Zobrist.

For instance, it is no longer trivial to construct an arbitrary number of hash functions. Some hash tables require many hash functions (such as a Bloom Hash.)

Further, in some applications of hashing it is important to keep your hashing function unknown to the public or you can become the target of a worst-case behavior attack. Many game developers don't think about problems related to this.

For instance, a problematic user might create many thousands of usernames that all map to the same table entry. This sort of thing can grind your MMORPG server into the dust if you do not guard against it. With Zobrist, even if they know that you are using Zobrist they still don't know your hash function and therefor cannot perform this kind of attack. (I recall a popular chess server, FICS, which was subject to worst-case attacks because the source code had been public)

Share this post


Link to post
Share on other sites
Quote:
Original post by Rockoon1
[...]Unfortunately, once you follow that path you lose some of the advantages of Zobrist.[...]
True, but with a tiny bit of extra processing power you can alleviate those problems as well. For example, you could construct a "perfect table" but instead of indexing it using table[state] use table[state ^ key] as a very simple "encryption" method, and then you can have unlimited functions and unknown worst case behavior while still theoretically having superior typical case behavior.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!