Question about Regular Expressions (might be Perl-specific)

General and Gameplay Programming

Started by CDProp March 10, 2009 02:25 AM

3 comments, last by Excors 15 years, 1 month ago

CDProp

1,451

Author

March 10, 2009 02:25 AM

Perl is my first encounter with regex and so I'm not sure if this is specific to Perl or not. Check out this bit of code:


while (<>){
      while ( /(.*?<!--)(.[^-]*)(.*$)/){
          print $2."\n";
          $_=$3;
      }
}

I can't seem to figure out what .*? does in the first group. It seems that the first two characters, .*, would mean "any character, 0 or more times". However, what does appending a ? to it do? Restrict it to 0 or 1 times? If so, why use the *? (I'm guessing I'm way off, which is why I need help).

Barius

325

March 10, 2009 02:41 AM

From the Perl docs ("perldoc perlre"):

Quote:
By default, a quantified subpattern is "greedy", that is, it will match
as many times as possible (given a particular starting location) while
still allowing the rest of the pattern to match. If you want it to
match the minimum number of times possible, follow the quantifier with
a "?". Note that the meanings don’t change, just the "greediness":

In your regex, this means it tries to match the smallest number of characters before "< !--" is encountered, i.e. it finds the earliest occurrence of "< !--" (I had to put a space in to make it show up in my post).
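A minimal sketch of the difference, assuming a made-up sample line with two comment openers (the variable names and the sample text are illustrative only, not from the original code):

```perl
use strict;
use warnings;

# Hypothetical sample line containing two comment openers.
my $line = 'a<!--one--> b<!--two-->';

# Greedy: .* takes as much as it can, so the group runs up to the LAST "<!--".
my ($greedy) = $line =~ /(.*<!--)/;

# Lazy: .*? takes as little as it can, so the group stops at the FIRST "<!--".
my ($lazy) = $line =~ /(.*?<!--)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Both patterns match the same line; only how much the first group swallows changes.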

Tunah

126

March 10, 2009 02:41 AM

That's a quantifier that:

Quote: Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.

Taken from this page, have a look.

Edit: Barius, part of what you posted made my post get mixed with yours until you changed it. That was pretty strange.

CDProp

1,451

Author

March 10, 2009 02:43 AM

Ok thanks I will look into that when I have a chance (btw, looks like the HTML parsing robbed a bit of your post, hehehe, it's cool though I got the gist).

Edit: nm, you fixed it. It looks like it commented out some HTML, therefore concatenating two separate replies. Haha, that is wild.

Excors

715

March 10, 2009 03:48 AM

That seems a slightly unusual way to loop over all the matches in a string. It would be more common to write

while (<>){
      while (/<!--(.[^-]*)/g){
          # only one capture group here, so the matched text is in $1
          print $1."\n";
      }
}

(using the /g flag to make it match the next occurrence each time through the loop).

If this is meant to actually be extracting HTML comments, then you'd want something more like

/<!--(.*?)-->/

because it's perfectly valid for comments to contain individual dashes. (The .*? in that regexp is the same as before - it means it will match as few characters as possible before finding the -->, instead of as many as possible, which will matter if there are two or more comments on the line.)
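As a rough illustration, assuming a hypothetical input line with two comments where the first one contains a lone dash, the three patterns can be compared side by side:

```perl
use strict;
use warnings;

# Hypothetical input line; the first comment contains a single dash.
my $line = '<p><!-- first - note --></p><!-- second -->';

# [^-]* stops at the first dash, so the first comment is cut short.
my ($short) = $line =~ /<!--(.[^-]*)/;

# Lazy .*? ends each match at the nearest "-->", giving one entry per comment.
my @lazy = $line =~ /<!--(.*?)-->/g;

# Greedy .* runs to the LAST "-->" on the line, merging both comments.
my @greedy = $line =~ /<!--(.*)-->/g;

print "short:  [$short]\n";
print "lazy:   [$_]\n" for @lazy;
print "greedy: [$_]\n" for @greedy;
```

In list context with /g, the match returns every captured group, which is a compact alternative to the while loop when you just want all the comments from one line. Note that neither pattern handles comments spanning several lines when reading line by line.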

This topic is closed to new replies.