Matching words with wildcards
I'm not entirely sure if there is a fast, simple way to do this... I'm thinking regex is probably the simplest way of doing this.
What I need to do is be able to compare strings that may or may not have wildcard characters in them.
For instance I may have a list of >strings< like this:
11225
1123?
48622
55*
785??
99745
And I will be trying to match other strings to this list.
I will loop through the list looking for a match. There will only be one possible match in the list, and I will be comparing one at a time until I find a match.
* is a wildcard that any number of any characters can match
? is a wildcard that matches any single character in its place
anyone good with regular expressions?
Thanks
You could use a regular expression library like PCRE, however if you're only ever going to support '?' and '*', they're trivial enough to implement. If you encounter a '?', then skip the corresponding character in the match string since you don't care what it is. If you reach a '*', then you terminate immediately since it's an automatic match ('*' is greedy). There might be a few edge cases I'm not thinking of but that's the gist of it.
I can tell you this, the regular expression library certainly won't be faster since it's also juggling about a million other things, but the benefit is that you have the full power of regular expressions that you can leverage at any time.
Quick disclaimer, I assumed you were using C++, but most other languages already have standard libraries with regular expression support if you don't implement those operators yourself.
I can tell you this, the regular expression library certainly won't be faster since it's also juggling about a million other things, but the benefit is that you have the full power of regular expressions that you can leverage at any time.
Quick disclaimer, I assumed you were using C++, but most other languages already have standard libraries with regular expression support if you don't implement those operators yourself.
Quote:Original post by ZipsterThe problem is a patterns such as 'a*b*a', when paired with an input such as 'abbbbbbaba' - there are many correct parse trees, and many ways for a naively implemented operator to get stuck.
If you reach a '*', then you terminate immediately since it's an automatic match ('*' is greedy). There might be a few edge cases I'm not thinking of but that's the gist of it.
Greedy operators like '*' generally require some sort of recursive descent or stack-based parser to handle everything correctly.
That's correct, I guess I assumed that '*' would be used at the end of the pattern as per the example. If you want the full functionality, then a regex library would be the best solution since it's already designed to handle all those special cases.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement