# "Simple" regular expression



I need a JavaScript regular expression that can search for a word within a string, matching only if it is a whole word and is not contained within double quotes. Finding the _Word itself is simple; I just use:

new RegExp("\\b" + _Word + "\\b", "i");

That creates a regular expression which matches the given _Word in a string, making sure it is a whole word (not part of another word). My problem is making the regular expression also refuse to match the word if it is contained in quotes, such as "_Word", or "some times _Word happens". I'm sure this is a simple problem for those who speak regexp often. Any help would be appreciated. Thanks in advance.

[Edited by - deadlydog on July 12, 2007 10:09:12 AM]

##### Share on other sites
you should be able to use negative lookahead to make sure it matches properly.

This seems to work fine in Ruby:
regexp = /\b(?!\")myfancyword(?!\")\b/
regexp.match 'I like myfancyword'  => Match
regexp.match 'I like "myfancyword"'  => No Match

Dunno if JavaScript supports lookahead though...

##### Share on other sites
Quote:
 Original post by rollo
 you should be able to use negative lookahead to make sure it matches properly. This seems to work fine in Ruby:
 regexp = /\b(?!\")myfancyword(?!\")\b/
 regexp.match 'I like myfancyword' => Match
 regexp.match 'I like "myfancyword"' => No Match
 dunno if javascript support the lookahead though...

I believe I tried that regular expression, and it worked when the _Word had a double quote directly in front of it or directly behind it, but not otherwise. For example, it would handle "_Word ..." and "..._Word", but not "..._Word...", where the word sits in the middle of a quoted phrase. I will try your method again tomorrow when I get into work just to be sure, but I'm pretty sure it has that problem. Are there any other suggestions? Thanks.
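To make the limitation concrete, here is a small JavaScript sketch of rollo's pattern. JavaScript does support negative lookahead, so the pattern carries over as-is; the sample sentences are made up for illustration and are not from the thread.

```javascript
// rollo's pattern written as a JavaScript regex literal.
const adjacentQuoteCheck = /\b(?!")myfancyword(?!")\b/;

// Hypothetical sample inputs.
const samples = [
  'I like myfancyword',            // no quotes at all
  'I like "myfancyword"',          // closing quote directly after the word
  'I like "my myfancyword a lot"'  // word in the middle of a quoted phrase
];

// The lookahead only inspects the character next to the word,
// so it cannot tell that the third sample is inside a quoted phrase.
const results = samples.map((text) => adjacentQuoteCheck.test(text));
console.log(results); // [ true, false, true ]
```

The third result is the problem case: the word is quoted, but no quote character touches it, so the pattern still matches.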

##### Share on other sites
I doubt your problem is really as simple as that. In most situations where you care about "whether something is inside a double-quoted string", it's because you're parsing something source-code-like - which means you also have to handle escaped quotes within the string.

What I would do is first write a regexp that detects double-quoted strings:

"(\\.|[^\\"])*"

That is, a quote, followed by zero or more things which are each either a backslash followed by any character (as that would always be part of the string) or a non-quote, non-backslash character, followed by a quote. (Actually, detecting escape sequences properly might be a *little* more complicated.)

Replace all instances of this pattern with nothing (in a new string, if you need to leave the original intact). Then search the *remaining* text for the word.

If you have to replace the word in the original string - good luck :)
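A minimal JavaScript sketch of the strip-then-search approach described above. The function names `escapeRegExp` and `containsWordOutsideQuotes` are hypothetical, and the sketch assumes backslash is the only escape character inside strings.

```javascript
// Matches a double-quoted string, allowing backslash escapes such as \" inside it.
const quotedString = /"(?:\\.|[^\\"])*"/g;

// Hypothetical helper: escape regex metacharacters so the word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns true if `word` appears as a whole word outside every quoted string.
function containsWordOutsideQuotes(text, word) {
  // Step 1: remove all quoted strings (the original text is left untouched).
  const withoutStrings = text.replace(quotedString, "");
  // Step 2: search what remains for the whole word, ignoring case.
  return new RegExp("\\b" + escapeRegExp(word) + "\\b", "i").test(withoutStrings);
}

console.log(containsWordOutsideQuotes('I like dogs', 'dogs'));   // true
console.log(containsWordOutsideQuotes('I like "dogs"', 'dogs')); // false
```

Replacing each quoted string with a space instead of the empty string would avoid accidentally joining the text on either side of a removed string into a new word.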

##### Share on other sites
Thanks for the replies. Zahlman, while your idea might work, I can't use it for what I am trying to do. I basically have a function which returns a regular expression which can find the given word in a string...I am not actually given the string to search through or anything like that. So my function to get the regular expressions looks like:

function GetRegularExpressionToFindWholeWord(_Word)
{
    // \b on each side makes sure _Word is matched as a whole word; "i" ignores case
    return new RegExp("\\b" + _Word + "\\b", "i");
}

Now, I know how to find the whole _Word in a string (by simply using the regular expression above), and I know how to find if the given _Word is between quotes, using

new RegExp("\".*" + _Word + ".*\"", "i");

I am just not sure how I can combine the two into one regular expression, since regular expressions don't seem to have an AND operator (even though they have an OR operator, which seems weird to have one and not the other). Any other suggestions on my problem would be greatly appreciated. Thanks.
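Regular expressions have no explicit AND operator, but a lookahead can play that role: the main pattern matches the word, and the lookahead adds a second condition at the same position. The sketch below is one possible combination, not something proposed in this thread. It assumes quotes are balanced and never escaped, and the names `escapeRegExp` and `wholeWordOutsideQuotes` are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters so the word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Condition 1: \bword\b matches the whole word.
// Condition 2: the lookahead requires an even number of double quotes
// between the word and the end of the string, which (for balanced quotes)
// means the word itself is not inside a quoted section.
function wholeWordOutsideQuotes(word) {
  return new RegExp(
    "\\b" + escapeRegExp(word) + "\\b(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)",
    "i"
  );
}

const re = wholeWordOutsideQuotes("dogs");
console.log(re.test('I like dogs'));   // true
console.log(re.test('I like "dogs"')); // false
console.log('I like dogs and "dogs"'.replace(re, "apples")); // I like apples and "dogs"
```

Because the lookahead consumes no characters, the match covers only the word itself, so the same expression can be passed straight to `replace`.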

##### Share on other sites
Your expression to find the word between quotes won't work if there are multiple quoted sections within the string being searched. For example:

"This is quoted" Match this _Word "But not this one! _Word"

Both _Words are between quotes, but you only want to match the one that is outside of matching quotes.

I think this can be done by matching 0 or more pairs of quotes, followed by 0 or more non-quote characters, followed by the word you are trying to match.
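The failure is easy to reproduce. The sketch below wraps the "between quotes" expression from earlier in the thread in a hypothetical function named `betweenQuotes` and runs it on the example string.

```javascript
// The "is the word between quotes?" expression from earlier in the thread.
function betweenQuotes(word) {
  return new RegExp("\".*" + word + ".*\"", "i");
}

const text = '"This is quoted" Match this _Word "But not this one! _Word"';
const match = betweenQuotes("_Word").exec(text);

// The greedy .* stretches from the first quote in the string to the last one,
// so the unquoted _Word in the middle is swallowed as well.
console.log(match[0] === text); // true

// A word that only appears outside quotes is still reported as "between quotes".
const falsePositive = betweenQuotes("_Word").test('"a" _Word "b"');
console.log(falsePositive); // true
```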

##### Share on other sites
Quote:
 Original post by Vorpy
 Your expression to find the word between quotes won't work if there are multiple quoted sections within the string being searched. For example:
 "This is quoted" Match this _Word "But not this one! _Word"
 Both _Words are between quotes, but you only want to match the one that is outside of matching quotes.
 I think this can be done by matching 0 or more pairs of quotes, followed by 0 or more non-quote characters, followed by the word you are trying to match.

Ahh, thank you, I did not notice that, but it is a feature I will want. I still am not sure how to get it to work with finding a whole word though. Any more suggestions anyone? Thanks

##### Share on other sites
Nobody here knows regular expressions enough to solve this problem? I figured this would be a simple problem, but I guess it's harder than I thought. I'm still open to any suggestions anyone might have. Thanks.

##### Share on other sites
This seems to work:

^([^"]|("(([^"\\]|\\.)+)"))*\b(_Word)\b

It matches _Word surrounded by non-letter characters preceded by an even, possibly zero, number of unescaped quotation marks. That will ensure that _Word isn't in a string.

##### Share on other sites
I think this regex works:

^([^"]|("[^"]*"))*\b(word)\b

It matches any number of non-quote characters or quoted strings and then the word you are looking for.

##### Share on other sites
Awesome. Thanks for the replies guys! I tried both solutions and they both seemed to work the same. This is what I ended up using:

return new RegExp("^([^\"]|(\"(([^\"])*)\"))*\\b(" + _Word + ")\\b");

My problem now is that one of the functions which uses this regular expression uses it to find and replace the given word. The problem is that this regular expression matches not only the _Word, but also everything before it. So for example, if I try to replace 'dogs' with 'apples' in the sentence "I like dogs and cats", instead of getting "I like apples and cats", I get "apples and cats". Does anybody know a way I can get around this so the regular expression matches only the _Word, and not everything before it?

##### Share on other sites
Quote:
 Original post by Vorpy
 Add another set of parentheses that contains the first pair as well as the * immediately after it. This will make $1 correspond to the part of the string that matched before the word. Each pair of parentheses corresponds to one of the $x values, with $0 being the whole string. So all you need is a pair of parentheses that captures everything before the word.

Haha, I actually thought of this and tried it right before you posted about it, and it works. So for anyone who cares, this is what my final regular expression looks like:

return new RegExp("(^(?:[^\"]|(?:\"(?:(?:[^\"])*)\"))*)\\b(" + _Word + ")\\b", "i");

So everything before the word is stored in RegExp.$1, and the word itself is stored in RegExp.$2.

Thanks for all the help guys!!
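For anyone landing here later, this is a rough sketch of how the final expression can be used for find and replace. The helpers `escapeRegExp` and `replaceWord` are hypothetical additions, and the sketch assumes quotes are never escaped.

```javascript
// Hypothetical helper: escape regex metacharacters so the word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same shape as the final expression in the thread:
// group 1 is everything before the word, group 2 is the word itself.
function getRegularExpressionToFindWholeWord(word) {
  return new RegExp(
    "(^(?:[^\"]|(?:\"(?:(?:[^\"])*)\"))*)\\b(" + escapeRegExp(word) + ")\\b",
    "i"
  );
}

// Put the captured prefix back in front of the replacement.
// A replacer function is used so that a "$" in the replacement stays literal;
// for plain text it behaves like the replacement string "$1" + replacement.
function replaceWord(text, word, replacement) {
  const re = getRegularExpressionToFindWholeWord(word);
  return text.replace(re, (whole, before) => before + replacement);
}

console.log(replaceWord('I like dogs and cats', 'dogs', 'apples'));   // I like apples and cats
console.log(replaceWord('I like "dogs" and cats', 'dogs', 'apples')); // I like "dogs" and cats
```

Because the prefix group is greedy and anchored at the start of the string, one call replaces only the last unquoted occurrence of the word; replacing every occurrence would need a loop or a different pattern.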