Partial key match on std::map?

Started by
10 comments, last by NightCreature83 12 years, 9 months ago
Hi all,

For an autocomplete feature I want to be able to type the begining of a string and by pressing tab get a list of possible completions in a game console window.

This is meant for a CLI, but for simplicity I'll use plain words as an example:

Suppose I have the following keys in a std::map:

apple
apricot
avocado
banana

If I do a partial search for "a", I should get "apple","apricot" and "avocado" if I search for "ap", I should get "apple" and "apricot", and so on.
At this stage I only care about keys, not values, eventually the value will be an object or a callback.

Now, std::map keeps elements in order by the value of its keys, and it contains the member functions lower_bound, upper_bound and equal_range that perform a binary search by key, since all keys must be unique, they all return pretty much the same thing, they take no predicate (comparison function pointer), so the only way of using these to find partial keys is by using a custom Compare function when declaring the map. A custom function that returns equal (false for both a < b and b < a) for partial matches would completely screw the inner working of the map.

So, I can use the general lower_bound, upper_bound and equal_range algorithms that take 2 iterators and optionally a predicate, the problem there is that then the predicate needs to take 2 pairs (IE: less(pair<>,pair<>)) as arguments, since I only need and care about the key at this point, it seems pointless to create a pair and fill in an empty value just to get the partial keys.

I have tried using operator<, and various predicates taking different arguments, but I don't even get them to compile, is there a way around this? how do you even define an operator<(const pair<>&,const string&) ?

Thanks for reading and thanks in advance :)
Advertisement
Well, you could just use lower_bound and upper_bound on artificial keys. Ex: "ap" and "ap\x255" if doing completion for "ap".
That would definitely be simpler than the other things I've tried thanks! I am thinking about how it would scale with unicode and wstring though.

By the way, is there an algorithm to find the first differing element between 2 strings/lists/vectors/containers? you know, for example given "apple" and "apricot" a function that would return 2, the index of the first different character.

I could just iterate each string, but if there is something already in STL I'll use it.

Thanks again!
It doesn't return an index, but std::mismatch() will return iterators where two ranges differ.
I am pretty sure Auto-Complete is done using regex. It's been a while since I have used C++, but a search for "using regex in C++" returns regex_search function and Boost.Xpressive.

You can probably hook them up to your std::map
Thanks SiCrane, that's probably even better.

I think given "abacus", "actual" and "advice" for example, lower_bound "a" (or any of the member binary searches for that matter) would return an iterator to "abacus" from there I probably just have to iterate forward until the partial key doesn't match the beginning of the actual key.

Thoughts?

alnite, thanks, I've though about how regex would help me here and toyed with the idea of using pcre which seems fairly popular (C++ doesn't really have regex out of the box yet, I think tr1 has them though), I think I will have a use for it when it comes to parse numeric tokens/values/parameters, but for mapping function calls to CLI instructions, I really see no advantage... yet. What I am constructing is some sort of trie/radix tree, taking advantage of the O (log n) performance std::map provides.

Feel free to provide input though :).

alnite, thanks, I've though about how regex would help me here and toyed with the idea of using pcre which seems fairly popular (C++ doesn't really have regex out of the box yet, I think tr1 has them though), I think I will have a use for it when it comes to parse numeric tokens/values/parameters, but for mapping function calls to CLI instructions, I really see no advantage... yet. What I am constructing is some sort of trie/radix tree, taking advantage of the O (log n) performance std::map provides.

Feel free to provide input though :).


Insert all your instructions into a long string delimited by a special character: "|apple|apricot|avocado|banana"
Then perform your regex search.

Insert all your instructions into a long string delimited by a special character: "|apple|apricot|avocado|banana"
Then perform your regex search.


Yes, that's what I thought, I wonder what the worst performance for a regex lookup is, for example L4D has 2736 commands [source], lets say each has 12 characters on average, not including arguments or sub commands (for example say "set texture" and "set model" would take a line each in the long string even though they share the same initial token/prefix), that's 32832 characters to look through... ~32k maybe is not so bad, but the other reason I am not yet sold is having to keep the command string and the map keys in sync.
The last time I did auto completion I just created a command tree. Every character is a node and while a node has only one child it is safe to auto complete. It also means if you have 20 words starting "over", you still only store the 4 letters once (though I'd expect all the additional pointers to more than compensate). You can also navigate that tree with every typed character and immediately see if the input doesn't exist or auto complete automatically, beep on invalid input or ignore the input or whatever else people turned out to find highly irritating.

Of course much later I found out that others had that simple idea long ago and called it a Trie.
f@dzhttp://festini.device-zero.de
Yeah, I got to this point after going through the trie idea and then PATRICIA, which this would be if I was splitting each word (ap, pple, ricot), then again if you take the words as the unit rather than characters, then it probably is.

I don't care that much about memory though (~32kb - 100kb is way less than a texture takes nowadays), and I doubt the extra work to split words and then putting them back together is that much worth it, I might try it later though.

This topic is closed to new replies.

Advertisement