
Regular expression problem

This topic is 3925 days old.


I've currently got a javascript regular expression which searches for a given Word in a string and finds a match as long as the Word is not a substring of another word (matches the whole word), and as long as the Word is not in double quotes. So for example a search for "dog" in:

Hello, "My dog is" the best doggy that ever was a dog

would only find a match against the very last word (dog). The regular expression I'm using to do this currently is:

return new RegExp("(^(?:[^\"]|(?:\"(?:(?:[^\"])*)\"))*)\\b(" + _Word + ")\\b", "i");

Now my problem is that the regular expression doesn't handle the apostrophe the way I would like. For example, if I searched for "dog" in "My dog's rule" it would find a match, since the apostrophe is a non-word character. I don't want it to match against this though. I want it to only match against the exact string, so a search for "dog's" would match against "My dog's rule", but a search for just "dog" shouldn't.

I'm sure the only part of the regular expression that needs to be changed is:

\\b(" + _Word + ")\\b

but I'm not certain. If anyone has any ideas of how to get it to do what I want I would greatly appreciate it. Thanks in advance.
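For anyone who wants to reproduce the behavior described above, here is a minimal sketch that wraps the posted expression in a function. The name buildWordRegex and its word parameter are placeholders for the surrounding code and _Word, which the post does not show.

```javascript
// Hypothetical wrapper around the expression from the post;
// "word" plays the role of _Word.
function buildWordRegex(word) {
  return new RegExp("(^(?:[^\"]|(?:\"(?:(?:[^\"])*)\"))*)\\b(" + word + ")\\b", "i");
}

const sample = 'Hello, "My dog is" the best doggy that ever was a dog';
const m = buildWordRegex("dog").exec(sample);

// Group 1 is everything before the word, so its length is the word's offset.
const wordStart = m ? m[1].length : -1;
console.log(wordStart === sample.lastIndexOf("dog")); // true: only the final "dog" matches

// The unwanted case: \b sees a boundary between "g" and the apostrophe.
const apostropheHit = buildWordRegex("dog").test("My dog's rule");
console.log(apostropheHit); // true
```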

So what I need, instead of just breaking on word boundaries, is to break on word boundaries except at apostrophes. Does anyone have any ideas on how I can write that? I just can't seem to get it right.

Also, if you are wondering what any of the regular expression tokens mean, just do a Google search for "javascript regular expression mozilla dev".

Something I read somewhere: "You have a problem. You decide to solve it using regular expressions. You now have two problems." I think that this applies to your problem... Regexps are really powerful, and I don't doubt that they can do what you want. But something simpler might be better, especially with all the powerful string-handling javascript has.

I'd do something more along the lines of:
1) Split the string along space boundaries
2) Split it again on double-quotes
3) Loop through the list, match each word against the word you're looking for, and make sure it's not in double-quotes when you do so.
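
The steps above could look roughly like the sketch below. It reorders them slightly by splitting on double quotes first, so that quoted text always lands in the odd-numbered segments, and it assumes the quotes are balanced and the search term is a single word. findWordOutsideQuotes is a made-up name, and trimming punctuation other than apostrophes from each token is my own assumption about what should count as part of a word.

```javascript
// Hypothetical helper; not from the thread.
function findWordOutsideQuotes(text, word) {
  const target = word.toLowerCase();
  // After splitting on double quotes, even indexes hold unquoted text
  // and odd indexes hold quoted text (assuming balanced quotes).
  const segments = text.split('"');
  const hits = [];
  for (let i = 0; i < segments.length; i += 2) {
    for (const token of segments[i].split(/\s+/)) {
      // Strip punctuation from both ends, but keep apostrophes so that
      // "dog's" stays distinct from "dog".
      const cleaned = token.replace(/^[^\w']+|[^\w']+$/g, "").toLowerCase();
      if (cleaned !== "" && cleaned === target) hits.push(token);
    }
  }
  return hits;
}

const text = 'Hello, "My dog is" the best doggy that ever was a dog';
console.log(findWordOutsideQuotes(text, "dog").length);            // 1
console.log(findWordOutsideQuotes("My dog's rule", "dog").length); // 0
console.log(findWordOutsideQuotes("My dog's rule", "dog's").length); // 1
```

This returns the matching tokens rather than their positions. If the offset is needed, the loop would have to track how many characters it has walked past.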

The only real solution I see for it is to replace \b with your own character class which doesn't include the apostrophe. So something like [ \t\n\r] (you should probably look up how javascript defines \b, and replicate it except for the apostrophe). There's probably an easier way to do it, and probably a hackish way to do it with some extra lookahead/behinds.

Another vote to use a non-regex solution. I personally tend to shy away from them in any situation that requires a lookahead/behind, but that's just me.
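
The custom-boundary idea above can be sketched with lookarounds rather than a character class. A class such as [ \t\n\r] has to consume a character, so it would fail when the word sits at the very start or end of the string, whereas a negative lookbehind and lookahead only inspect the neighbours. This assumes an engine with lookbehind support (ES2018 or later); older browsers will reject the pattern. buildWholeWordRegex and escapeRegExp are hypothetical names, and escaping the search term is my own addition on the assumption that it should be matched literally.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical replacement for the constructor call in the first post.
function buildWholeWordRegex(word) {
  // Quote-skipping prefix, unchanged from the original expression.
  const prefix = "(^(?:[^\"]|(?:\"(?:(?:[^\"])*)\"))*)";
  // Instead of \b, require that the neighbours are neither word
  // characters nor apostrophes.
  return new RegExp(
    prefix + "(?<![\\w'])(" + escapeRegExp(word) + ")(?![\\w'])",
    "i"
  );
}

console.log(buildWholeWordRegex("dog").test("My dog's rule"));   // false
console.log(buildWholeWordRegex("dog's").test("My dog's rule")); // true
```

One side effect to be aware of: a word wrapped in single quotes, such as 'dog', no longer matches, because the quote characters are now treated as part of the word.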

Quote:
Original post by Mushu
The only real solution I see for it is to replace \b with your own character class which doesn't include the apostrophe. So something like [ \t\n\r] (you should probably look up how javascript defines \b, and replicate it except for the apostrophe).


Yeah, I thought of doing this too, but I haven't been able to find how \b is defined anywhere on the net. I am really hoping someone can think of a way to solve this problem with regular expressions. Any suggestions would be appreciated. Thanks.
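
On the definition question: as far as I know, ECMAScript treats \b as a zero-width position where exactly one of the two neighbouring characters is a word character. In the default (non-Unicode) mode a word character is whatever \w matches, i.e. [A-Za-z0-9_], and the start and end of the string count as non-word. The sketch below checks that reading against the engine's own \b; the helper names are made up for illustration.

```javascript
// Hypothetical helpers; neither name comes from the thread.
function isWordChar(ch) {
  // ch is undefined past either end of the string, which counts as non-word.
  return ch !== undefined && /^[A-Za-z0-9_]$/.test(ch);
}

// Positions where exactly one neighbour is a word character.
function manualBoundaries(text) {
  const positions = [];
  for (let i = 0; i <= text.length; i++) {
    if (isWordChar(text[i - 1]) !== isWordChar(text[i])) positions.push(i);
  }
  return positions;
}

// Positions where the engine's \b matches, probed one index at a time
// with a sticky regex so that each test is anchored at lastIndex.
function regexBoundaries(text) {
  const probe = /\b/y;
  const positions = [];
  for (let i = 0; i <= text.length; i++) {
    probe.lastIndex = i;
    if (probe.test(text)) positions.push(i);
  }
  return positions;
}

const manual = manualBoundaries("My dog's rule");
const viaRegex = regexBoundaries("My dog's rule");
console.log(manual.join(","));   // 0,2,3,6,7,8,9,13
console.log(viaRegex.join(",")); // 0,2,3,6,7,8,9,13
```

Positions 6 and 7 sit on either side of the apostrophe, which is why \bdog\b is satisfied inside "dog's".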

I took a whack at this; it's a little beyond me!

*but*, I do recommend you take a look at RegEx buddy*. It's helped me a lot, and might make it a little easier to work through your problem.

Hope that helps somewhat!




*I'm in no way associated with this product!

Quote:
Original post by _Sigma
I took a whack at this; it's a little beyond me!

*but*, I do recommend you take a look at RegEx buddy*. It's helped me a lot, and might make it a little easier to work through your problem.


Yeah, I actually already tried using RegEx Buddy a little bit, but I couldn't figure out how to get what I wanted from it. If anyone else has any ideas or suggestions I would be very appreciative. Thanks.
