Fast string look up

Started by
8 comments, last by Hodgman 16 years, 2 months ago
Hi, I have a collection of around 3000-4000 items with string keys for my application. I am trying to come up with a method that given a string key, it will return me the item within no more than 5 millisec. What is the best way I can do it? I was thinking about STL map, but I was told that it doesnt do any hashing so it could be very slow. Any help will be much appreciated.
Advertisement
Try a hashtable or hashmap. It's not part of the current standard, but some STL vendors have included an implementation as an extension. I would guess boost might have something along those lines as well.

EDIT: 5 milliseconds is a *long* time for a simple lookup. Have you actually tried the STL map? You are assuming it's slow, but have you timed it?
Quote:Original post by guangwang1983
What is the best way I can do it?

A hash map.
Quote:Original post by guangwang1983 it will return me the item within no more than 5 millisec. What is the best way I can do it?


What kind of hardware are you running?

If it's anything close to modern PCs, then you should be able to do hundreds of lookups *per* millisecond using std::map or std::hash_map classes. YMMV depending on string size.

Of course, then there's different hash_map implementations, there was a very long thread around here regarding tries, and there's other lookups, including custom containers, or even direct mappings.

Quote:I was thinking about STL map, but I was told that it doesnt do any hashing so it could be very slow.


How about testing it yourself? Surely the time needed to post this question, wait for responses, and then evaluate different perspectives (several hours) will take more effort and time than simply writing a 5 line test to see the order of magnitude?
I havent tried timing it. I will try do it first before I do anything. What is the good hash algorithm you can recommend?
I'm with mossmoss... you could potentially just iterate through 3000-4000 strings and find the one you want in 5ms. That's a long time, and std::map should suffice.
Quote:Original post by guangwang1983
I havent tried timing it. I will try do it first before I do anything. What is the good hash algorithm you can recommend?


First google result for "string hash function" has several.
For strings, you could also try a trie...
Quote:Original post by Zahlman
For strings, you could also try a trie...


Sure, but let's try something first...

#include <map>#include <string>#include <iostream>#include <fstream>#include <boost/timer.hpp>typedef std::map< std::string, int > StringMap;int main(int argc, char* argv[]){	StringMap keys;	std::ifstream f("d:\\words.txt");	int c(0);	std::string line;	while (std::getline(f, line)) keys.insert(std::make_pair(line, c++));	boost::timer t;	c = 0;	int n = 1000000;	for (int i = 0; i < n; i++) {		if (keys.find("hello") != keys.end()) c++;	}	double t1 = t.elapsed();	std::cout << "Keys found: " << c << std::endl;	std::cout << "Elapsed time: " << (t1 * 1000) << " ms" << std::endl;	std::cout << "Lookups per ms: " << (n / t1 / 1000) << std::endl;			return 0;}


Using words.txt, which contains ~21000 entires.

The results:
Quote:
Keys found: 1000000
Elapsed time: 875 ms
Lookups per ms: 1142.86


Or, under 1 microsecond per lookup.... 6 minutes coding time, including google lookup.
Also, if you're doing lots of string comparisons in your program, it can be useful to create a 'fixed-string' class, or some other kind of 'global string table' so that you can replace all of your string comparisons with simple pointer comparisons.

e.g. Here's a quick idea:
class FixedString {public: //copy constructor, operator=, operator==, operator bool etc...private:  friend class StringTable  FixedString();  FixedString(const std::string& s) : ptr(&s) {}  const std::string* ptr;};class StringTable{public:  typedef std::set<std::string> Table;  static FixedString get( const std::string& s )  {    static Table table;    pair<Table::iterator, bool> r = table.insert( s );    return FixedString(*r.first);  }};

StringTable will only keep a single copy of any string registered with it. New FixedString's are immutable and can only be created by StringTable, and FixedString is just a wrapper for a raw pointer so it can compare itself to other FixedStrings without having to actually do a string compare.

This topic is closed to new replies.

Advertisement