Simple string indexing

Started by
2 comments, last by jpetrie 17 years, 6 months ago
I have a datafile system that works by keeping a table of filenames and offsets. The number of objects in this system keeps growing, so I'm looking to speed up filename searching. Up until now, I've simply been looping through a single list and comparing the strings until the target object filename is found. I just want a simple solution that will cut the search time down. The first thing that comes to me is to split the table up by using some arbitrary number that's calculated with the string itself. A lot of my datafile objects have a matching prefix, so the beginning is weak. Many also share matching extensions, so the end is pretty weak as well. I was considering doing something such as this..
int num_tables = 512;
......
int value = 0;
for(int v=0; v<string_length; v += 5)
   value += string_ascii[v];

int target_table = value % num_tables;
Would this work pretty well? Would there be any (possibly obvious) significant additions that could help it along?
Advertisement
I believe what you are looking for is a hash table. Has tables can take a string, compute its hash value and place it in a special data structure so that a search will be nearly instanteneous. In C++ there is std::map, in C# there is Hashtable and StringDictionary, in other languages there might be something related to a dictionary. There is no need to reinvent the wheel, just use the standard containers in your language.
deathkrushPS3/Xbox360 Graphics Programmer, Mass Media.Completed Projects: Stuntman Ignition (PS3), Saints Row 2 (PS3), Darksiders(PS3, 360)
I strongly recommend using std::map. It's faster and more feature-filled than anything any of us are likely to cook up. But assuming you're doing it out of interest rather than to really make a better map, here's some hashing algorithms. A simple google for "hash string" should give you some good results.
Quote:
I believe what you are looking for is a hash table. In C++ there is std::map

Pedantically, std::map isn't a hash table, it's a tree (usually). The external interface is essentially the same, though; std::map is still perfectly acceptable in the OP's situation, however, and I would strongly recommend its use in production code over something hand-rolled.

If you specifically need a hash table, many vendors provide hash_map as an extension (usually in the stdext namespace). std::map would still have my vote as the first choice implementation though.

This topic is closed to new replies.

Advertisement