Kranar

RegEx question


How does one use a regular expression to check that a string doesn't contain XYZ in that order? The string can contain Xs, Ys, Zs, XYs, YZs, and so on, but not XYZ. I can't seem to find the right regex pattern to use. Thanks.

In Perl style:

if ($string !~ /XYZ/) { }

I believe there's a more general style using the ^, but I don't often use regexes outside of Perl.
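
For readers outside Perl, the same "search for XYZ and negate the result" idea can be sketched in Python. This is only an illustration: the helper name lacks_xyz is made up here, and the thread itself does not use Python.

```python
import re

def lacks_xyz(text: str) -> bool:
    """Return True when the literal sequence XYZ does not occur in text."""
    # Hypothetical helper: search for XYZ anywhere, then negate the result,
    # mirroring Perl's `$string !~ /XYZ/`.
    return re.search(r"XYZ", text) is None

print(lacks_xyz("XXYYZZ"))   # True: X, Y and Z appear, but never as XYZ
print(lacks_xyz("abXYZcd"))  # False: XYZ occurs in the middle
```

Negating an ordinary match is usually the simplest option when the host language allows it; a pattern that itself expresses "not XYZ" is only needed when the regex has to do the whole job.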

There are a few ways to do it with (?!), a commonly supported regex extension. I can think of ((?!XYZ).)* for matching a sequence of text that doesn't contain XYZ, or ^((?!XYZ).)*$ for checking an entire string.

(Actually, the first one isn't perfect; given "ABCXYZ" it stops after "ABC" and refuses to match the "ABCXY" part; but I'm sure it'll get you started.)
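
A small Python sketch of the two lookahead patterns above may make the difference concrete. The function names are invented for illustration, and it assumes single-line input (by default `.` does not match a newline in Python's re module).

```python
import re

# The two patterns from the post above, unchanged.
NO_XYZ_RUN = re.compile(r"((?!XYZ).)*")
NO_XYZ_WHOLE = re.compile(r"^((?!XYZ).)*$")

def whole_string_lacks_xyz(text: str) -> bool:
    """True when no position in the string starts the sequence XYZ."""
    return NO_XYZ_WHOLE.match(text) is not None

def longest_prefix_run(text: str) -> str:
    """Return the run matched from the start of text by the unanchored pattern."""
    return NO_XYZ_RUN.match(text).group(0)

print(whole_string_lacks_xyz("XYXZYZ"))  # True
print(whole_string_lacks_xyz("ABCXYZ"))  # False
print(longest_prefix_run("ABCXYZ"))      # ABC (not ABCXY)
```

The last line shows the imperfection mentioned above: the lookahead rejects the position of the X outright, so the run ends at "ABC" even though "ABCXY" on its own contains no XYZ.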

I suppose I should be more specific...

Is there a way to do something like [^xyz] but that also takes order into account? I want to extract everything in a string in between two quotation marks, but within the quotation marks, the string can contain a quote if it's preceded by a backslash.

Assuming that doing [^(xyz)] meant everything but xyz in that order, the regex would be:

"([^(\\")]*)"

I think that "([^"\\]|\\\\|\\")*" might be more useful for that case. I assume that backslashes can be escaped, too; watch out for strings like "foo \\".
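
As a rough check of that pattern, here is a Python sketch that applies it to a line containing both an escaped quote and the tricky "foo \\" case. The name find_quoted and the sample text are made up for illustration.

```python
import re

# The pattern from the post above, written as a Python raw string.
QUOTED = re.compile(r'"([^"\\]|\\\\|\\")*"')

def find_quoted(text: str) -> list[str]:
    """Return every double-quoted string literal found in text, quotes included."""
    return [m.group(0) for m in QUOTED.finditer(text)]

sample = r'say "a \"b\" c" then "foo \\" end'
for literal in find_quoted(sample):
    print(literal)
```

Note that, as written, the pattern only accepts \\ and \" as escapes; a backslash followed by any other character (for example \n) makes that string fail to match, which may or may not be what is wanted.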

Quote:
Original post by Beer Hunter
I think that "([^"\\]|\\\\|\\")*" might be more useful for that case. I assume that backslashes can be escaped, too; watch out for strings like "foo \\".


Heh. I was going to make it more complicated by considering it as "runs of non-(quote/backslash) characters separated by (quote or backslash)", basically the same thing with a * after the [^"\\]. Silly, and slower unless the regex compiler is way better than I expect.

FWIW though, it might be nicer to collect the entire string as a group, and not capture the other items, thus: "((?:[^"\\]|\\\\|\\")*)". Then, instead of translating and pasting the individual things together, you could run a second pass of regexes on the captured result in order to translate the \\'s to \'s and \"'s to "'s. :)

(P.S. That's actually harder than it sounds to get right, because of e.g. the possibility of a large number of backslashes followed by a " - the correct translation depends on whether it's an odd or even backslash count... so actually, maybe you should just go with the other way :) )
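
One way around the odd/even backslash problem, sketched here in Python as an assumption rather than as what either poster had in mind, is to translate both escapes in a single substitution instead of two separate passes. Because the substitution scans left to right and consumes each backslash together with the character it escapes, a long run of backslashes gets paired up correctly. The names unquote, QUOTED_BODY and ESCAPE are invented for this example.

```python
import re
from typing import Optional

# Capture the whole string body as one group (pattern from the post above).
QUOTED_BODY = re.compile(r'"((?:[^"\\]|\\\\|\\")*)"')
# One escape sequence: a backslash followed by a quote or another backslash.
ESCAPE = re.compile(r'\\(["\\])')

def unquote(text: str) -> Optional[str]:
    """Return the unescaped body of the first quoted string, or None if absent."""
    m = QUOTED_BODY.search(text)
    if m is None:
        return None
    # Single pass: each match drops the leading backslash and keeps the
    # escaped character, so \\ becomes \ and \" becomes ".
    return ESCAPE.sub(r"\1", m.group(1))

print(unquote(r'x = "a \\\" b"'))  # body a \\\" b becomes a \" b
```

Here the body contains three backslashes before the quote: the first two are read as one escaped backslash and the third escapes the quote, which is the pairing the odd/even rule asks for.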
