"Simple" regular expression
I need a JavaScript regular expression that searches for a word within a string and matches only if it is a whole word and is not contained within double quotes. Finding the _Word I am looking for is simple; I just use:
new RegExp("\\b" + _Word + "\\b", "i");
and that will create a regular expression which will match the given _Word in a string, making sure it is a whole word (not part of another word). My problem is trying to also make it so the regular expression doesn't match the word if it is contained in quotes, such as "_Word", or "some times _Word happens". I'm sure this is a simple problem for those who speak regexp often. Any help would be appreciated. Thanks in advance.
[Edited by - deadlydog on July 12, 2007 10:09:12 AM]
You should be able to use negative lookahead to make sure it matches properly.
This seems to work fine in Ruby:
regexp = /\b(?!\")myfancyword(?!\")\b/
regexp.match 'I like myfancyword' => Match
regexp.match 'I like "myfancyword"' => No Match
Dunno if JavaScript supports lookahead though...
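For what it's worth, JavaScript regular expressions do support lookahead, both (?=...) and (?!...). Below is a rough sketch of the same idea in JavaScript, reusing the sample word from the Ruby snippet; the variable names are made up for illustration. The lookahead only inspects the single character right after the word, so it cannot tell whether the word sits deeper inside a quoted phrase.

```javascript
// Same idea as the Ruby pattern: reject the word when a double quote follows it.
// (Illustrative names only; nothing here comes from the original post.)
const quoteAfter = /\bmyfancyword\b(?!")/i;

const plain = quoteAfter.test('I like myfancyword');           // word outside quotes
const quoted = quoteAfter.test('I like "myfancyword"');        // quote directly after the word
const buried = quoteAfter.test('"I like myfancyword a lot"');  // word in the middle of a quoted phrase

console.log(plain, quoted, buried); // true false true
```

The last case shows the limitation: the word is inside quotes, but no quote touches it, so the pattern still matches.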
Quote:Original post by rollo
you should be able to use negative lookahead to make sure it matches properly.
This seems to work fine in Ruby:
regexp = /\b(?!\")myfancyword(?!\")\b/
regexp.match 'I like myfancyword' => Match
regexp.match 'I like "myfancyword"' => No Match
dunno if javascript support the lookahead though...
I believe I tried that regular expression, and it did work if the _Word had a double quote directly in front of or directly behind it, but not otherwise. For example, it would correctly skip "_Word ..." and "..._Word", but not "..._Word...". I will try your method again tomorrow when I get into work just to be sure, but I'm pretty sure it has that problem. Are there any other suggestions? Thanks.
I doubt your problem is really as simple as that. In most situations where you care about "whether something is inside a double-quoted string", it's because you're parsing something source-code-like - which means you also have to handle escaped quotes within the string.
What I would do is first write a regexp that detects double-quoted strings:
"(\\.|[^\\"])*"
That is, a quote, followed by zero or more things which are either a backslash followed by any character (as that would always be part of the string) or a non-quote, non-backslash character, followed by a quote. (Actually, detecting escape sequences properly might be a *little* more complicated.)
Replace all instances of this pattern with nothing (in a new string, if you need to leave the original intact). Then search the *remaining* text for the word.
If you have to replace the word in the original string - good luck :)
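A minimal sketch of this strip-then-search approach in JavaScript might look like the following. The helper name containsWordOutsideQuotes is hypothetical, and the sketch assumes the word contains no regex metacharacters.

```javascript
// Hypothetical helper, not from the thread: strips double-quoted strings,
// then looks for the whole word in what is left.
function containsWordOutsideQuotes(text, word) {
  // A quote, then any run of escaped characters or non-quote, non-backslash
  // characters, then the closing quote. The g flag replaces every occurrence.
  const quotedString = /"(\\.|[^\\"])*"/g;
  const stripped = text.replace(quotedString, "");
  // Assumption: `word` contains no regex metacharacters.
  return new RegExp("\\b" + word + "\\b", "i").test(stripped);
}

const outside = containsWordOutsideQuotes('say hello "not here"', "hello");
const inside = containsWordOutsideQuotes('say "hello there"', "hello");
const escaped = containsWordOutsideQuotes('say "a \\" hello" done', "hello");

console.log(outside, inside, escaped); // true false false
```

Replacing each quoted string with a space rather than an empty string would avoid gluing together the words on either side of a removed string.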
Thanks for the replies. Zahlman, while your idea might work, I can't use it for what I am trying to do. I basically have a function which returns a regular expression which can find the given word in a string...I am not actually given the string to search through or anything like that. So my function to get the regular expressions looks like:
function GetRegularExpressionToFindWholeWord(_Word)
{
return new RegExp("\\b" + _Word + "\\b", "i");
}
Now, I know how to find the whole _Word in a string (by simply using the regular expression above), and I know how to find if the given _Word is between quotes, using
new RegExp("\".*" + _Word + ".*\"", "i");
I am just not sure how I can combine the two into one regular expression, since regular expressions don't seem to have an AND operator (even though they have an OR operator, which seems weird to have one and not the other). Any other suggestions on my problem would be greatly appreciated. Thanks.
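One way to get an AND-like effect in a single regular expression is a lookahead: the match has to satisfy the main pattern and, at the same position, whatever the lookahead demands. The sketch below is not something proposed in this thread; it is one possible approach that only accepts the word when an even number of double quotes remains between it and the end of the string. It assumes the quotes are balanced and that there are no escaped quotes. The function name is hypothetical.

```javascript
// Hypothetical variant of GetRegularExpressionToFindWholeWord, not from the thread.
function getRegExpForWordOutsideQuotes(word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The lookahead requires an even number of quotes between the word and the end.
  // Assumption: quotes are balanced and never escaped.
  const evenQuotesAhead = '(?=(?:[^"]*"[^"]*")*[^"]*$)';
  return new RegExp("\\b" + escaped + "\\b" + evenQuotesAhead, "i");
}

const re = getRegExpForWordOutsideQuotes("_Word");
const hit = re.exec('find _Word here, not "the _Word there"');
const miss = re.exec('not "the _Word there"');

console.log(hit.index, miss); // 5 null
```

Unlike a pattern anchored at the start of the string, this one matches only the word itself, so the match index points at the word.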
Your expression to find the word between quotes won't work if there are multiple quoted sections within the string being searched. For example:
"This is quoted" Match this _Word "But not this one! _Word"
Both _Words are between quotes, but you only want to match the one that is outside of matching quotes.
I think this can be done by matching 0 or more pairs of quotes, followed by 0 or more non-quote characters, followed by the word you are trying to match.
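To make the failure concrete, here is a small sketch using the "between quotes" pattern from earlier in the thread on a line where _Word is outside both quoted sections. The sample line is a shortened variant of the one above, and the variable names are only for illustration.

```javascript
// The "is it between quotes" pattern from earlier in the thread.
const betweenQuotes = /".*_Word.*"/i;

// _Word is outside both quoted sections here, yet the pattern still matches,
// because .* is free to run across the quotes that close and open the sections.
const line = '"This is quoted" Match this _Word "But not this one!"';
const span = betweenQuotes.exec(line);

console.log(span !== null);    // true
console.log(span[0] === line); // true: the match runs from the first quote to the last
```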
Quote:Original post by Vorpy
Your expression to find the word between quotes won't work if there are multiple quoted sections within the string being searched. For example:
"This is quoted" Match this _Word "But not this one! _Word"
Both _Words are between quotes, but you only want to match the one that is outside of matching quotes.
I think this can be done by matching 0 or more pairs of quotes, followed by 0 or more non-quote characters, followed by the word you are trying to match.
Ahh, thank you, I did not notice that, but it is a feature I will want. I still am not sure how to get it to work with finding a whole word though. Any more suggestions anyone? Thanks
Nobody here knows regular expressions enough to solve this problem? I figured this would be a simple problem, but I guess it's harder than I thought. I'm still open to any suggestions anyone might have. Thanks.
This seems to work:
^([^"]|("(([^"\\]|\\.)+)"))*\b(_Word)\b
It matches _Word as a whole word, preceded by an even (possibly zero) number of unescaped quotation marks, which ensures that _Word isn't inside a string.
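As a rough check, this kind of pattern can be tried in JavaScript on text containing an escaped quote. In the sketch below the character class is written as [^"\\] so that the backslash is excluded along with the quote; the sample strings and variable names are invented for illustration.

```javascript
// Escape-aware pattern, with the character class written as [^"\\].
const outsideString = /^([^"]|("(([^"\\]|\\.)+)"))*\b(_Word)\b/;

// Source-like text where the first _Word sits in a string containing an escaped quote.
const both = 'x = "a \\" _Word" + _Word;';
const onlyInString = 'x = "a \\" _Word";';

const found = outsideString.exec(both);              // matches; group 5 holds the word
const missed = outsideString.exec(onlyInString);     // null: the only _Word is inside the string
const emptyLiteral = outsideString.exec('"" _Word'); // null: + needs at least one character

console.log(found[5], missed, emptyLiteral); // _Word null null
```

The last case is worth noting: an empty string literal earlier in the text stops the pattern from matching at all, because + requires at least one character between the quotes. Using * there instead would allow empty strings.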
I think this regex works:
^([^"]|("[^"]*"))*\b(word)\b
It matches any number of non-quote characters or quoted strings and then the word you are looking for.
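Wrapped in a function that takes the word, this could be sketched in JavaScript roughly as follows. The function name is hypothetical, the groups in the prefix are made non-capturing so that group 1 is the word, and the sketch assumes the word contains no regex metacharacters and that quotes are never escaped.

```javascript
// Hypothetical wrapper, not from the thread: builds the pattern for any word.
function getRegExpOutsideQuotes(word) {
  // Assumption: `word` contains no regex metacharacters.
  return new RegExp('^(?:[^"]|"[^"]*")*\\b(' + word + ')\\b', "i");
}

const text = '"This is quoted" Match this _Word "But not this one! _Word"';
const m = getRegExpOutsideQuotes("_Word").exec(text);

// The match always starts at index 0, so locate the word from the end of the match.
const wordStart = m[0].length - m[1].length;
const quotedOnly = getRegExpOutsideQuotes("_Word").exec('"only _Word here"');

console.log(text.slice(wordStart, wordStart + m[1].length)); // _Word
console.log(quotedOnly);                                     // null
```

Because the pattern is anchored with ^, the match includes everything before the word, which is why the position has to be computed from the match length rather than read from the match index.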