Regular expression problem
I've currently got a JavaScript regular expression that searches for a given Word in a string. It finds a match only if the Word is not a substring of another word (it matches the whole word) and is not inside double quotes. So, for example, a search for "dog" in:
Hello, "My dog is" the best doggy that ever was a dog
would only find a match against the very last word (dog). The regular expression I'm using to do this currently is:
return new RegExp("(^(?:[^\"]|(?:\"(?:(?:[^\"])*)\"))*)\\b(" + _Word + ")\\b", "i");
Now my problem is that the regular expression doesn't handle the apostrophe the way I would like. For example, if I searched for "dog" in "My dog's rule" it would find a match, since the apostrophe is a non-word character. I don't want it to match against this though. I want it to only match against the exact string, so a search for "dog's" would match against "My dog's rule", but a search for just "dog" shouldn't. I'm sure the only part of the regular expression that needs to be changed is:
\\b(" + _Word + ")\\b
but I'm not certain. If anyone has any ideas of how to get it to do what I want I would greatly appreciate it. Thanks in advance.
So what I need, instead of breaking on plain word boundaries, is to break on word boundaries while treating apostrophes as part of the word. Does anyone have any ideas how I can write that? I just can't seem to get it right.
Also, if you are wondering what any of the tags are, just do a google search for "javascript regular expression mozilla dev"
Something I read somewhere: "You have a problem. You decide to solve it using regular expressions. You now have two problems." I think this applies to your problem. Regexps are really powerful, and I don't doubt that they can do what you want. But something simpler might be better, especially with all the powerful string handling JavaScript has.
I'd do something more along the lines of:
1) Split the string along space boundaries
2) Split it again on double-quotes
3) Loop through the list, match each word against the word you're looking for, and make sure it's not in double-quotes when you do so.
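The steps above could look roughly like the sketch below. It reorders them slightly (it splits on double quotes first, then on whitespace) and assumes the quotes in the string are balanced. The function name `containsUnquotedWord` is made up for illustration.

```javascript
// Hypothetical helper: true if `word` appears as a whole token outside double quotes.
// Assumes double quotes in `text` are balanced.
function containsUnquotedWord(text, word) {
  const target = word.toLowerCase();
  // After splitting on '"', the pieces at even indexes lie outside quotes.
  const segments = text.split('"');
  for (let i = 0; i < segments.length; i += 2) {
    const tokens = segments[i].split(/\s+/);
    for (const token of tokens) {
      // Trim leading/trailing punctuation but keep apostrophes, so "dog's" stays intact.
      const cleaned = token.replace(/^[^\w']+|[^\w']+$/g, "").toLowerCase();
      if (cleaned === target) {
        return true;
      }
    }
  }
  return false;
}
```

Because the comparison is against the whole token, "dog" does not match "dog's" or "doggy", while "dog's" matches itself.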
The only real solution I see for it is to replace \b with your own character class which doesn't include the apostrophe. So something like [ \t\n\r] (you should probably look up how javascript defines \b, and replicate it except for the apostrophe). There's probably an easier way to do it, and probably a hackish way to do it with some extra lookahead/behinds.
Another vote to use a non-regex solution. I personally tend to shy away from them in any situation that requires a lookahead/behind, but that's just me.
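One possible way to write the custom boundary mentioned above is with lookarounds that count the apostrophe as a word character. This is only a sketch: it assumes a JavaScript engine that supports lookbehind (`(?<!...)`), and the names `escapeRegExp` and `buildWordRegExp` are made up for illustration.

```javascript
// Escape regex metacharacters so the search word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of the original expression: \b is swapped for
// lookarounds that treat the apostrophe as part of a word.
function buildWordRegExp(word) {
  const prefix = '(^(?:[^"]|"[^"]*")*)'; // skips text and complete "quoted" runs, as in the original
  const before = "(?<![\\w'])";          // no word character or apostrophe directly before
  const after = "(?![\\w'])";            // no word character or apostrophe directly after
  return new RegExp(prefix + before + "(" + escapeRegExp(word) + ")" + after, "i");
}
```

With this, a search for "dog" in "My dog's rule" finds nothing, while a search for "dog's" matches. One trade-off: a word wrapped in single quotes, such as 'dog', would no longer match either.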
Quote:Original post by Mushu
The only real solution I see for it is to replace \b with your own character class which doesn't include the apostrophe. So something like [ \t\n\r] (you should probably look up how javascript defines \b, and replicate it except for the apostrophe).
Yeah, I thought of doing this too, but I haven't been able to find how \b is defined anywhere on the net. I am really hoping someone can think of a way to solve this problem with regular expressions. Any suggestions would be appreciated. Thanks.
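For reference, without the "u" flag JavaScript treats \b as a position where exactly one of the two neighbouring characters is a word character, and word characters are the same set as \w: A-Z, a-z, 0-9 and underscore. A rough sketch of that rule in plain code (the helper names `isWordChar` and `isWordBoundary` are hypothetical):

```javascript
// Word characters as \w sees them without the "u" flag: letters, digits, underscore.
function isWordChar(ch) {
  return ch !== undefined && /^[A-Za-z0-9_]$/.test(ch);
}

// Hypothetical helper mirroring \b: true when exactly one side of index i is a word character.
// Positions before the start and after the end count as non-word.
function isWordBoundary(text, i) {
  return isWordChar(text[i - 1]) !== isWordChar(text[i]);
}
```

This is why "dog" followed by an apostrophe counts as a boundary. Adding the apostrophe to the character class in `isWordChar` would give the apostrophe-aware behaviour asked about.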
I took a whack at this; it's a little beyond me!
*but*, I do recommend you take a look at RegEx buddy*. It's helped me a lot, and might make it a little easier to work through your problem.
Hope that helps somewhat!
*I'm in no way associated with this product!
Quote:Original post by _Sigma
I took a whack at this; it's a little beyond me!
*but*, I do recommend you take a look at RegEx buddy*. It's helped me a lot, and might make it a little easier to work through your problem.
Yeah, I actually already tried using RegEx Buddy a little bit, but I couldn't figure out how to get what I wanted from it. If anyone else has any ideas or suggestions I would be very appreciative. Thanks.