Matching words with wildcards

Started by
2 comments, last by Zipster 14 years, 6 months ago
I'm not entirely sure if there is a fast, simple way to do this... I'm thinking regex is probably the simplest way of doing this. What I need to do is be able to compare strings that may or may not have wildcard characters in them. For instance I may have a list of >strings< like this: 11225 1123? 48622 55* 785?? 99745 And I will be trying to match other strings to this list. I will loop through the list looking for a match. There will only be one possible match in the list, and I will be comparing one at a time until I find a match. * is a wildcard that any number of any characters can match ? is a wildcard that matches any single character in its place anyone good with regular expressions? Thanks
Advertisement
You could use a regular expression library like PCRE, however if you're only ever going to support '?' and '*', they're trivial enough to implement. If you encounter a '?', then skip the corresponding character in the match string since you don't care what it is. If you reach a '*', then you terminate immediately since it's an automatic match ('*' is greedy). There might be a few edge cases I'm not thinking of but that's the gist of it.

I can tell you this, the regular expression library certainly won't be faster since it's also juggling about a million other things, but the benefit is that you have the full power of regular expressions that you can leverage at any time.

Quick disclaimer, I assumed you were using C++, but most other languages already have standard libraries with regular expression support if you don't implement those operators yourself.
Quote:Original post by Zipster
If you reach a '*', then you terminate immediately since it's an automatic match ('*' is greedy). There might be a few edge cases I'm not thinking of but that's the gist of it.
The problem is a patterns such as 'a*b*a', when paired with an input such as 'abbbbbbaba' - there are many correct parse trees, and many ways for a naively implemented operator to get stuck.

Greedy operators like '*' generally require some sort of recursive descent or stack-based parser to handle everything correctly.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

That's correct, I guess I assumed that '*' would be used at the end of the pattern as per the example. If you want the full functionality, then a regex library would be the best solution since it's already designed to handle all those special cases.

This topic is closed to new replies.

Advertisement