
DerekSaw

Is there a 'Like' string-compare function?


Well, in VB you can write IF (strName Like "John*") THEN ..., and in SQL you can write ... WHERE name LIKE 'John%' .... How about in C++? Do the libraries that come with MSVC have this kind of LIKE string comparison?

I'm wondering if I could do this (yes, I know I could write a function for it, but I hope there's some simple function that comes with the compiler):

if (strlike(name, "John*"))
{
// anyone whose name begins with 'John'
}


[EDIT] Wow! boost::regex sure is fantastic. But I don't need such powerful pattern matching. I guess I need to write a simple function that recognizes '*' and '?'. Thanks.

[edited by - DerekSaw on July 15, 2002 11:54:54 PM]

You mean you only want to check the first n characters for equality? Use strncmp, or strnicmp if you're the case-insensitive type.

EDIT: Oh, if you want the wildcard functionality, I don't know of any standard functions for that.
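For the original "John*" case, a minimal sketch of the strncmp idea could look like this. The function name starts_with() is hypothetical, and the comparison is case-sensitive; strnicmp/_strnicmp (MSVC) or strncasecmp (POSIX) are the usual non-standard case-insensitive counterparts.

```cpp
#include <cstring>

// Hypothetical helper: true if `name` begins with `prefix`,
// i.e. the equivalent of VB's  name Like "prefix*".
// Compares only the first strlen(prefix) characters (case-sensitive).
bool starts_with(const char* name, const char* prefix)
{
    return std::strncmp(name, prefix, std::strlen(prefix)) == 0;
}
```

For example, starts_with("Johnny", "John") is true, while starts_with("Jo", "John") is false because strncmp sees the end of "Jo" before the prefix runs out.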

[edited by - Zipster on July 15, 2002 12:27:44 AM]

The POSIX standard includes a regcomp() function, if you have access to POSIX libraries. Otherwise, you probably want to go with the Boost suggestion above (Boost includes a lot of nice things that you won't find in the ANSI/ISO standards).
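A minimal sketch of the POSIX route, using regcomp() to compile the pattern and regexec() to test it. This assumes a POSIX system (regex.h is not part of MSVC's runtime), and posix_match() is a hypothetical wrapper name. In regular-expression syntax, VB's "John*" becomes "^John".

```cpp
#include <regex.h>

// Hypothetical wrapper: true if `text` matches the POSIX extended
// regular expression `pattern`, ignoring case.
bool posix_match(const char* text, const char* pattern)
{
    regex_t re;
    // REG_EXTENDED: ERE syntax; REG_ICASE: ignore case;
    // REG_NOSUB: we only need yes/no, not the match positions.
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return false;                      // pattern failed to compile
    int rc = regexec(&re, text, 0, nullptr, 0);
    regfree(&re);                          // release what regcomp allocated
    return rc == 0;                        // 0 means the pattern matched
}
```

Drop REG_ICASE if you want an exact-case comparison. A pattern such as "^di.*able$" plays the role of "di*able".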


If you're using C++, you can use the std::string member functions to search the string for a pattern match, such as the first name.

find() searches the string and returns the position of the match if one is found, or std::string::npos if not.

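#include <iostream>
#include <string>

int main()
{
    std::string str = "JOHN SMITH";

    // find() returns the index where "JOHN" first occurs, searching from
    // position 0, or std::string::npos if it doesn't occur at all.
    std::string::size_type id1 = str.find("JOHN", 0);

    // Note: str.find_first_of("JOHN", 0) is NOT an alternative here; it
    // returns the position of the first character that is any one of
    // 'J', 'O', 'H' or 'N'.

    if (id1 != std::string::npos) {
        // Found the string (id1 == 0 means str starts with "JOHN")
        std::cout << "Found at position " << id1 << '\n';
    }
    else {
        // None found
    }

    return 0;
}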


I use string::find() a lot for pattern matching; it's nice and fast.

str.find("PATTERN TO MATCH", position_to_start_searching);

There are also a lot of other members you can use, such as string::substr() to copy part of the string into another string, and string::erase() to remove certain parts.

[edited by - tonic151 on July 16, 2002 10:14:47 AM]

Depends on exactly what pattern-matching you need. If you are just checking substrings, then you can use the substr() member function of std::string. You might also like to take a look at the other members of std::string. If you need fairly complex pattern-matching, then you probably need a regex library such as the Boost one already mentioned.
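For the plain substring cases ("prefix*" and "*suffix"), a minimal substr()-based sketch might look like this; has_prefix() and has_suffix() are hypothetical names, and the comparison is case-sensitive.

```cpp
#include <string>

// Hypothetical helper for "prefix*": compare the first prefix.size() chars.
bool has_prefix(const std::string& s, const std::string& prefix)
{
    return s.size() >= prefix.size() &&
           s.substr(0, prefix.size()) == prefix;
}

// Hypothetical helper for "*suffix": compare the last suffix.size() chars.
bool has_suffix(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}
```

The size checks keep substr() from being called with an out-of-range position. If the temporary copies bother you, s.compare(0, prefix.size(), prefix) == 0 does the same prefix test without building a new string.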

A simple pattern-matching function shouldn't be too hard to write from scratch, though, assuming you only need simple wildcards such as '*' and '?'. A combination of strstr() (old-fashioned C, baby!)... um, actually all you would need is a substring checker, and then you check offsets in the string: whether the match is at the beginning or the end, or, if you are using the '?' wildcard (which stands for an exact number of "wild" characters), whether the offsets line up. Who knows, you might have fun! *gasp* No, not fun writing a pattern-matching function!

^[a-zA-Z0-9]{0,8}\.?[a-zA-Z0-9]{0,3}$

gotta love regular expressions.

/me waits for someone to figure out what that one is and correct him because he didn't make it permissive enough

*wanders off*
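For anyone who wants to try that expression, here is a minimal sketch using C++11's std::regex (which grew out of Boost.Regex via TR1; a 2002-era MSVC would need boost::regex instead). The function name is_8_3_name() is hypothetical, and the reading of the pattern as a DOS 8.3-style filename check is an interpretation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does `name` match the pattern posted above?
// regex_match() must match the whole string, so ^ and $ are redundant
// here but harmless.
bool is_8_3_name(const std::string& name)
{
    static const std::regex pattern(R"(^[a-zA-Z0-9]{0,8}\.?[a-zA-Z0-9]{0,3}$)");
    return std::regex_match(name, pattern);
}
```

"README.TXT" and "AUTOEXEC.BAT" match, while "VERYLONGNAME.TXT" and "a-b.txt" do not. As written, it also accepts an empty string and eleven characters with no dot (e.g. "ABCDEFGHIJK"), which is the sort of thing the correction above is waiting for.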

Let's say I have the string 'displayable' (I'm the case-insensitive type... for now). It can be matched by "*play*", "*able", "display*", "di*able", or "di*lay*able";
a simple strncmp can't do all of that (I wish it could).

and no! I'm not using it to search/compare DOS filenames.

Has anyone written a similar function? I know it's neither hard nor easy, just tedious. Plus it needs some time for rigorous testing before it is used in a real project... I'm just testing my luck to see whether I can get one for free.


Thank you all.

I can give you some tips on writing one. Since I don't want to give everything away, I'll tell you how you can implement the '*' wildcard; '?' is up to you. Since we're assuming case-insensitivity, let's assume we're working with temporary lowercased copies of both the string to search and the pattern. That also sidesteps the fact that strstr() doesn't have a standard case-insensitive counterpart (or at least not one I know of).

Test String : displayable

Case 1:
-------
"display*"

String followed by wildcard. Do a strncmp() on the first n characters of the string (in this case, 7). If that signals equality, make sure the byte after the last matched character (in this case, 'y') isn't 0. If both checks pass, we have a match.
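A minimal sketch of Case 1, following this post's reading of '*' as "one or more characters". The function name match_prefix_star() is hypothetical, and both strings are assumed to be lowercased already.

```cpp
#include <cstring>

// Hypothetical Case 1 helper for "prefix*": the first n chars must equal
// the prefix, and at least one more character must follow it.
bool match_prefix_star(const char* str, const char* prefix)
{
    std::size_t n = std::strlen(prefix);
    // If strncmp() reports equality, str has at least n chars,
    // so reading str[n] is safe.
    return std::strncmp(str, prefix, n) == 0 && str[n] != '\0';
}
```

With this reading, "display*" matches "displayable" but not "display" itself.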

Case 2:
-------
"*able"

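Wildcard followed by the rest of the string. Do a strstr() with the substring ("able"). If that's non-null, make sure the byte after the last character ('e') is 0, and that the pointer to the first character ('a') is greater than the start of the string. If these pass, we have a match. Keep in mind that strstr() returns the first occurrence, so if "able" can appear more than once, either keep searching from one past the last hit or simply compare the last strlen("able") characters directly.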
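A minimal sketch of Case 2, again treating '*' as "one or more characters". Instead of calling strstr(), this version compares the last strlen(suffix) characters directly, which is an equivalent check for a trailing substring. The name match_star_suffix() is hypothetical.

```cpp
#include <cstring>

// Hypothetical Case 2 helper for "*suffix": str must end with suffix,
// with at least one character in front of it.
bool match_star_suffix(const char* str, const char* suffix)
{
    std::size_t len = std::strlen(str);
    std::size_t n = std::strlen(suffix);
    if (len <= n)              // no room for the suffix plus one wild char
        return false;
    // Compare the tail of str against the suffix.
    return std::strcmp(str + len - n, suffix) == 0;
}
```

Because it looks only at the tail, "ablexable" is matched by "*able" even though "able" also occurs at the start.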

Case 3:
-------
"*play*"

String sandwiched by two wildcards. This is a combination of the last two cases. Do a strstr(), and make sure that 1) the byte after 'y' isn't 0, and 2) the pointer to 'p' is greater than the beginning of the string. If both pass, it's a match; if not, search again starting one character past the last hit, since strstr() only finds the first occurrence.
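A minimal sketch of Case 3 with strstr(), retrying from one past each hit so that a later occurrence can still satisfy both checks. It uses this post's "one or more characters" reading of '*'; the name match_star_sub_star() is hypothetical, and an empty substring is simply rejected.

```cpp
#include <cstring>

// Hypothetical Case 3 helper for "*sub*": some occurrence of sub must have
// at least one character before it and at least one after it.
bool match_star_sub_star(const char* str, const char* sub)
{
    std::size_t n = std::strlen(sub);
    if (n == 0)                        // empty sub is not handled here
        return false;
    // strstr() finds only the first occurrence, so walk through them all.
    for (const char* p = std::strstr(str, sub); p != nullptr;
         p = std::strstr(p + 1, sub))
    {
        if (p > str && p[n] != '\0')   // char before and char after exist
            return true;
    }
    return false;
}
```

For example, "playxplayx" matches "*play*" through its second "play", even though the first one sits at the very start.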

Case 4:
-------
"di*able"

Wildcard sandwiched by two strings. This is a combination of two other cases. First make sure that "di*" works in a case 1 scenario, and that "*able" works in a case 2 scenario. Then make sure that they are separated by some other character(s). This can be accomplished by subtracting a pointer to 'i' in "di" from a pointer to 'a' in "able" and making sure the difference is > 1.
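A minimal sketch of Case 4. It uses string indices rather than raw pointer subtraction, which expresses the same "difference > 1" separation check as a single length test and stays safe when a part is empty. The name match_pre_star_suf() is hypothetical, and '*' is again "one or more characters".

```cpp
#include <cstring>

// Hypothetical Case 4 helper for "pre*suf": str must start with pre, end
// with suf, and have at least one character between the two.
bool match_pre_star_suf(const char* str, const char* pre, const char* suf)
{
    std::size_t len = std::strlen(str);
    std::size_t np = std::strlen(pre);
    std::size_t ns = std::strlen(suf);
    // Separation check: room for pre, at least one wild char, and suf.
    // This is the same as requiring (index of suf's first char) minus
    // (index of pre's last char) to be > 1.
    if (len < np + ns + 1)
        return false;
    return std::strncmp(str, pre, np) == 0 &&       // "pre*" part
           std::strcmp(str + len - ns, suf) == 0;   // "*suf" part
}
```

Under this reading, "di*able" matches "displayable" but not "diable", where nothing sits between "di" and "able".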

I hope this can get you started. The last wildcard example you suggested is simply a combination of these scenarios. '?' wildcards are more difficult because they stand for an exact number of characters, but they're not too tricky.


[edited by - Zipster on July 16, 2002 12:18:27 AM]

Writing a function that does what you want is surprisingly simple. Stop and think for a second and you'll see that it is really just an iterative comparison with 3 possible cases. For example, your function might look like this in pseudocode:

string input_str;  // pattern passed to the function, such as "*able*"
string search_str; // string to try and match against, such as "displayable"

foreach (character input_str[i] in input_str)
    if (input_str[i] == '*')        // any number of letters next
        // mark the desired next char and eat enough of search_str
        // to get there, or fail
    else if (input_str[i] == '?')   // single-character wildcard
        // eat one char from search_str
    else                            // an ordinary character
        // compare to the current char of search_str


That's your basic flow. It's fairly straightforward, except for dealing with the '*', but even that can be handled with a few extra variables, such as desired_next_char or something.
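A minimal C++ sketch of that loop, under a few assumptions: the name wildcard_like() is hypothetical, '*' matches zero or more characters and '?' exactly one, and comparison ignores case (the original poster's requirement). Instead of a single desired_next_char, it remembers where the last '*' was, so when a later comparison fails it can let that '*' swallow one more character and retry. Without that backtracking step, "*able" would wrongly fail against "ablxable".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical wildcard_like(): '*' = any run of chars (including none),
// '?' = exactly one char, anything else must match ignoring case.
bool wildcard_like(const std::string& pattern, const std::string& text)
{
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos;   // index of the last '*' seen
    std::size_t star_t = 0;                 // text index when it was seen

    auto same = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;                     // let '*' match nothing for now
            star_t = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || same(pattern[p], text[t]))) {
            ++p;                            // '?' or literal: consume one each
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1;                   // backtrack: the last '*' now
            t = ++star_t;                   // swallows one more character
        } else {
            return false;                   // mismatch and no '*' to retry
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;                                // trailing '*'s can match nothing
    return p == pattern.size();
}
```

All of the patterns from the 'displayable' example ("*play*", "*able", "display*", "di*able", "di*lay*able") match with this sketch, in any letter case. Note that it treats '*' as zero or more characters, unlike the one-or-more reading in the case-by-case tips.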

quote:
Original post by DanG
Writing a function that does what you want is surprisingly simple. Stop and think for a second and you'll see that it is really just an iterative comparison with 3 possible cases....

That's simple enough! Actually, I'd thought of some of these (not all), but I'm afraid they'll work 100 times and then fail on the 101st.

Thanks everyone.

Yeah, wildcard matching is easy to implement. Regex is a bit more difficult.

Basically the flow goes:
src is the expression (the pattern)
dest is the string you test against

for the entire string
  if the src char is a *, get the next char in src and advance through dest until it is found; if there is no next src char, you are done with a match
  if the src char is a ?, accept any single dest char and move on to the next src and dest chars (if the ? is the last src char, dest must have exactly one char left)
  otherwise, if the src char equals the dest char, continue; else no match

  increment the position of both strings
  if there are no more src chars but still more dest chars, then no match
  if there are no more dest chars but still more src chars, then no match (unless all the remaining src chars are *)
end for

OK, the wording is bad, but I don't want to give code, and real pseudocode would make things a wee bit too easy (you could use Boost or some other library, as suggested earlier, if you want code). Hopefully I didn't make a mistake, but if I did, you should be able to fix it after coding and debugging.

I hope my algorithm is a bit more robust, since it handles everything in one loop without the special cases, so you can mix and match as many wildcards as you please.

It's important to remember that a ? at the end means there must be exactly one char in its place in the dest string.

i.e.
strin? != strin
strin? == string
strin? == striny
strin? == strins

* is different: it doesn't need to match anything.
strin* == strin
strin* == string
strin* == strinysssaa
strin* == strinsdndadkhb

Hopefully that will give you some quick optimizations (like when you reach the end of the src and it's a *, you know the dest matches, since you haven't found a no-match condition yet).

The idea is not to find a match, but to look for no-match conditions.
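Here is one way the rules above could be written down, as a recursive sketch rather than the single loop described: glob_match() is a hypothetical name, src and dest follow the post's naming, '*' matches zero or more characters, '?' exactly one, and the comparison is case-sensitive. It includes the trailing-'*' shortcut and returns false as soon as a no-match condition appears. Its worst case is exponential on patterns with many '*'s, so the iterative, backtracking-on-the-last-'*' form is the better choice for long inputs.

```cpp
// Hypothetical glob_match(): src is the pattern, dest the string tested.
bool glob_match(const char* src, const char* dest)
{
    if (*src == '\0')
        return *dest == '\0';          // pattern used up: dest must be too
    if (*src == '*') {
        if (src[1] == '\0')
            return true;               // trailing '*': the rest of dest matches
        // Let '*' absorb 0, 1, 2, ... characters of dest.
        for (const char* d = dest; ; ++d) {
            if (glob_match(src + 1, d))
                return true;
            if (*d == '\0')
                return false;          // tried every split, none worked
        }
    }
    if (*dest == '\0')
        return false;                  // src chars left but dest used up
    if (*src == '?' || *src == *dest)
        return glob_match(src + 1, dest + 1);
    return false;                      // the no-match condition
}
```

It reproduces the examples above: "strin?" rejects "strin" but accepts "string", and "strin*" accepts both "strin" and "strinysssaa".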

Though I'm sure you've got something working already, heh.
