Jump to content
  • Advertisement
Sign in to follow this  
jwezorek

perfect hashing of a small set of strings

This topic is 2900 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Is there a minimal perfect hashing algorithm that works well with a small set of strings?

I want a data structure that I give its constructor a set of n strings where n is about a dozen or so, have it compile them into a perfect hash function, and then just use the strings sort of in place of an enum i.e. go from string to int and int to string in guaranteed low constant time.

All the minimal perfect hashing algorithms I find by googling seem to be geared towards large sets of keys. I guess, for small sets people just figure a regular hash table is good enough?

[Edited by - jwezorek on July 11, 2010 7:32:41 PM]

Share this post


Link to post
Share on other sites
Advertisement
I've used a trie to map strings to an enum before, and it works really well if your strings are short. What kind of strings are they? (a few examples would do).

Share this post


Link to post
Share on other sites
gperf sums up one or two distinct, differing characters and looks them up in a little table. This works suprisingly well. It's stunningly primitive, but works.

Share this post


Link to post
Share on other sites
Quote:
Original post by alvaro
I've used a trie to map strings to an enum before, and it works really well if your strings are short. What kind of strings are they? (a few examples would do).


The strings would be string id's for various assets and properties in a game I'm writing. Basically, I'm just using numeric IDs right now with #define's for human-readability, but I thought it would be nice to be able to compile to the game's asset/property structure from XML and be able to access the various properties from string keys without taking a performance hit over just using integer id's.

It isn't really a big deal, and I'm probably prematurely optimizing. I was just curious if there's a boost library or something for doing this. I mean, it just seems like if you know all the strings and there aren't that many of them, an algorithm should be able to figure out which string it has been given without even necessarily looking at all of the characters in lots of cases.

But anyway a trie would probably work.

Share this post


Link to post
Share on other sites
Quote:
Original post by samoth
gperf sums up one or two distinct, differing characters and looks them up in a little table. This works suprisingly well. It's stunningly primitive, but works.


Yeah, that looks like what I'm talking about. I don't think gperf creates a "minimal" hash, but actually, now that I think about it, I guess that doesn't really matter to me. I'll have to look at the source code...

Share this post


Link to post
Share on other sites
My usual solution to this problem is to create a map that automatically assigns IDs to strings as they are loaded. Each time a new string is detected, an ID counter is incremented, and the string is "attached" to that ID. Then, internally, all the containers use the integral IDs instead of strings to do lookups. If you have a piece of code that wants to look up a data element by its string ID, you simply consult the ID map first, then proceed as normal.

This is pretty easy to do in any language, doesn't require worrying about hashing, and is quite performant.

Share this post


Link to post
Share on other sites
Quote:
Original post by ApochPiQ
My usual solution to this problem is to create a map that automatically assigns IDs to strings as they are loaded. Each time a new string is detected, an ID counter is incremented, and the string is "attached" to that ID. Then, internally, all the containers use the integral IDs instead of strings to do lookups. If you have a piece of code that wants to look up a data element by its string ID, you simply consult the ID map first, then proceed as normal.

This is pretty easy to do in any language, doesn't require worrying about hashing, and is quite performant.


What do you mean it "doesn't require worrying about hashing"? How do you figure out the id that corresponds to a given string? That's the problem we are talking about...

Given that the OP says this is not currently a performance bottleneck (it was when I implemented treas for this), I would recommend using an existing hash table implementation, like unordered_map in C++. That strikes a pretty good balance between performance and code simplicity.

Share this post


Link to post
Share on other sites
Quote:
Original post by alvaro
How do you figure out the id that corresponds to a given string?


You just look it up in the map... I'm not sure where the confusion is coming from here?

Share this post


Link to post
Share on other sites
Quote:
Original post by ApochPiQ
Quote:
Original post by alvaro
How do you figure out the id that corresponds to a given string?


You just look it up in the map... I'm not sure where the confusion is coming from here?


How is the map implemented? In most languages it will be a hash, which means that some strings will require looking in several places in a table, to resolve collisions. std::map is even worse, because it will require log(N) string comparisons if there are N strings in the map.

A perfect hash or a trie are natural ways to try to make the lookups faster.

Share this post


Link to post
Share on other sites
Quote:
Original post by ApochPiQ
Quote:
Original post by alvaro
How do you figure out the id that corresponds to a given string?


You just look it up in the map... I'm not sure where the confusion is coming from here?


The idea was to do better than a standard map for string ID -> integer ID by leveraging the fact that the set of string keys is relatively small and are all known ahead of time. I'm currently using num IDs everywhere but was thinking it would be nice to use string IDs everywhere if there was no performance hit.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!