Sign in to follow this  
Kest

Simple string indexing

Recommended Posts

I have a datafile system that works by keeping a table of filenames and offsets. The number of objects in this system keeps growing, so I'm looking to speed up filename searching. Up until now, I've simply been looping through a single list and comparing the strings until the target object filename is found. I just want a simple solution that will cut the search time down. The first thing that comes to me is to split the table up by using some arbitrary number that's calculated with the string itself. A lot of my datafile objects have a matching prefix, so the beginning is weak. Many also share matching extensions, so the end is pretty weak as well. I was considering doing something such as this..
int num_tables = 512;
......
int value = 0;
for(int v=0; v<string_length; v += 5)
   value += string_ascii[v];

int target_table = value % num_tables;
Would this work pretty well? Would there be any (possibly obvious) significant additions that could help it along?

Share this post


Link to post
Share on other sites
I believe what you are looking for is a hash table. Has tables can take a string, compute its hash value and place it in a special data structure so that a search will be nearly instanteneous. In C++ there is std::map, in C# there is Hashtable and StringDictionary, in other languages there might be something related to a dictionary. There is no need to reinvent the wheel, just use the standard containers in your language.

Share this post


Link to post
Share on other sites
I strongly recommend using std::map. It's faster and more feature-filled than anything any of us are likely to cook up. But assuming you're doing it out of interest rather than to really make a better map, here's some hashing algorithms. A simple google for "hash string" should give you some good results.

Share this post


Link to post
Share on other sites
Quote:

I believe what you are looking for is a hash table. In C++ there is std::map

Pedantically, std::map isn't a hash table, it's a tree (usually). The external interface is essentially the same, though; std::map is still perfectly acceptable in the OP's situation, however, and I would strongly recommend its use in production code over something hand-rolled.

If you specifically need a hash table, many vendors provide hash_map as an extension (usually in the stdext namespace). std::map would still have my vote as the first choice implementation though.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this