Jump to content

  • Log In with Google      Sign In   
  • Create Account


Partial key match on std::map?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
11 replies to this topic

#1 Kwizatz   GDNet+   -  Reputation: 1186

Like
0Likes
Like

Posted 14 July 2011 - 02:59 PM

Hi all,

For an autocomplete feature I want to be able to type the begining of a string and by pressing tab get a list of possible completions in a game console window.

This is meant for a CLI, but for simplicity I'll use plain words as an example:

Suppose I have the following keys in a std::map:

apple
apricot
avocado
banana

If I do a partial search for "a", I should get "apple","apricot" and "avocado" if I search for "ap", I should get "apple" and "apricot", and so on.
At this stage I only care about keys, not values, eventually the value will be an object or a callback.

Now, std::map keeps elements in order by the value of its keys, and it contains the member functions lower_bound, upper_bound and equal_range that perform a binary search by key, since all keys must be unique, they all return pretty much the same thing, they take no predicate (comparison function pointer), so the only way of using these to find partial keys is by using a custom Compare function when declaring the map. A custom function that returns equal (false for both a < b and b < a) for partial matches would completely screw the inner working of the map.

So, I can use the general lower_bound, upper_bound and equal_range algorithms that take 2 iterators and optionally a predicate, the problem there is that then the predicate needs to take 2 pairs (IE: less(pair<>,pair<>)) as arguments, since I only need and care about the key at this point, it seems pointless to create a pair and fill in an empty value just to get the partial keys.

I have tried using operator<, and various predicates taking different arguments, but I don't even get them to compile, is there a way around this? how do you even define an operator<(const pair<>&,const string&) ?

Thanks for reading and thanks in advance :)

Sponsor:

#2 SiCrane   Moderators   -  Reputation: 9490

Like
1Likes
Like

Posted 14 July 2011 - 03:31 PM

Well, you could just use lower_bound and upper_bound on artificial keys. Ex: "ap" and "ap\x255" if doing completion for "ap".

#3 Kwizatz   GDNet+   -  Reputation: 1186

Like
0Likes
Like

Posted 14 July 2011 - 04:27 PM

That would definitely be simpler than the other things I've tried thanks! I am thinking about how it would scale with unicode and wstring though.

By the way, is there an algorithm to find the first differing element between 2 strings/lists/vectors/containers? you know, for example given "apple" and "apricot" a function that would return 2, the index of the first different character.

I could just iterate each string, but if there is something already in STL I'll use it.

Thanks again!

#4 SiCrane   Moderators   -  Reputation: 9490

Like
1Likes
Like

Posted 14 July 2011 - 04:46 PM

It doesn't return an index, but std::mismatch() will return iterators where two ranges differ.

#5 alnite   Crossbones+   -  Reputation: 2061

Like
1Likes
Like

Posted 14 July 2011 - 05:05 PM

I am pretty sure Auto-Complete is done using regex. It's been a while since I have used C++, but a search for "using regex in C++" returns regex_search function and Boost.Xpressive.

You can probably hook them up to your std::map

#6 Kwizatz   GDNet+   -  Reputation: 1186

Like
0Likes
Like

Posted 14 July 2011 - 05:38 PM

Thanks SiCrane, that's probably even better.

I think given "abacus", "actual" and "advice" for example, lower_bound "a" (or any of the member binary searches for that matter) would return an iterator to "abacus" from there I probably just have to iterate forward until the partial key doesn't match the beginning of the actual key.

Thoughts?

alnite, thanks, I've though about how regex would help me here and toyed with the idea of using pcre which seems fairly popular (C++ doesn't really have regex out of the box yet, I think tr1 has them though), I think I will have a use for it when it comes to parse numeric tokens/values/parameters, but for mapping function calls to CLI instructions, I really see no advantage... yet. What I am constructing is some sort of trie/radix tree, taking advantage of the O (log n) performance std::map provides.

Feel free to provide input though :).

#7 alnite   Crossbones+   -  Reputation: 2061

Like
0Likes
Like

Posted 14 July 2011 - 06:39 PM

alnite, thanks, I've though about how regex would help me here and toyed with the idea of using pcre which seems fairly popular (C++ doesn't really have regex out of the box yet, I think tr1 has them though), I think I will have a use for it when it comes to parse numeric tokens/values/parameters, but for mapping function calls to CLI instructions, I really see no advantage... yet. What I am constructing is some sort of trie/radix tree, taking advantage of the O (log n) performance std::map provides.

Feel free to provide input though :).


Insert all your instructions into a long string delimited by a special character: "|apple|apricot|avocado|banana"
Then perform your regex search.

#8 Kwizatz   GDNet+   -  Reputation: 1186

Like
0Likes
Like

Posted 14 July 2011 - 07:46 PM

Insert all your instructions into a long string delimited by a special character: "|apple|apricot|avocado|banana"
Then perform your regex search.


Yes, that's what I thought, I wonder what the worst performance for a regex lookup is, for example L4D has 2736 commands [source], lets say each has 12 characters on average, not including arguments or sub commands (for example say "set texture" and "set model" would take a line each in the long string even though they share the same initial token/prefix), that's 32832 characters to look through... ~32k maybe is not so bad, but the other reason I am not yet sold is having to keep the command string and the map keys in sync.

#9 Trienco   Crossbones+   -  Reputation: 2077

Like
1Likes
Like

Posted 14 July 2011 - 10:49 PM

The last time I did auto completion I just created a command tree. Every character is a node and while a node has only one child it is safe to auto complete. It also means if you have 20 words starting "over", you still only store the 4 letters once (though I'd expect all the additional pointers to more than compensate). You can also navigate that tree with every typed character and immediately see if the input doesn't exist or auto complete automatically, beep on invalid input or ignore the input or whatever else people turned out to find highly irritating.

Of course much later I found out that others had that simple idea long ago and called it a Trie.
f@dzhttp://festini.device-zero.de

#10 Kwizatz   GDNet+   -  Reputation: 1186

Like
0Likes
Like

Posted 14 July 2011 - 11:57 PM

Yeah, I got to this point after going through the trie idea and then PATRICIA, which this would be if I was splitting each word (ap, pple, ricot), then again if you take the words as the unit rather than characters, then it probably is.

I don't care that much about memory though (~32kb - 100kb is way less than a texture takes nowadays), and I doubt the extra work to split words and then putting them back together is that much worth it, I might try it later though.

#11 rasher_b2   Members   -  Reputation: 128

Like
0Likes
Like

Posted 18 July 2011 - 11:51 AM

What about taking the more naive approach in terms of algorithmic complexity? You could just loop though all the strings and find which ones start with the partial string. Then for autocomplete, you loop through all the partial matches and find how many more characters they have in common after the initial search string. This would mean a linear search of all keys, and then N loops though all the matched strings to find the extra common characters (this will probably only taking a few lines of code).

For something like a console with ~3000 commands averaging 12 characters performance should not be a problem, and the (relative) simplicity of implementation might be worth the (probably irrelevant) extra computation. I'd imagine you'd need a lot of keys before you'll run into any real performance issues. In terms of memory, you'd need to store your strings in some sort of collection and have some tempory storage when doing the matching/autocomplete.

#12 NightCreature83   Crossbones+   -  Reputation: 2688

Like
0Likes
Like

Posted 18 July 2011 - 03:47 PM

Yeah, I got to this point after going through the trie idea and then PATRICIA, which this would be if I was splitting each word (ap, pple, ricot), then again if you take the words as the unit rather than characters, then it probably is.

I don't care that much about memory though (~32kb - 100kb is way less than a texture takes nowadays), and I doubt the extra work to split words and then putting them back together is that much worth it, I might try it later though.

You could split on spaces with in commands and store hash tokens as the key elements of the Trie, this would only be beneficial however if you support many multi word commands.



Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, Mad Max




Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS