• 15
• 15
• 11
• 9
• 10

Optimize my regular expression

This topic is 4346 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Consider the following regex: (final)?\s*(abstract)?\s*(public|private|protected)?\s*(static)?\s*function\s*([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*$$([^$$]*)\)(\s*;)? It is designed to find PHP function declarations. The problem is that it takes minutes to search 4KB of PHP source code. The problem is in the beginning. I'm looking for a sequence of possible modifiers (final, abstract, etc.) that may or may not appear before the function declaration. If I remove them from the regex entirely, it runs instantaneously. So, I've just removed them and manually searched for the modifiers after the simplified regex matched. It's a mess, though... Does anyone know how I could make this work reasonably quickly while still having the regular expression find the modifiers for me? Thanks!

Share on other sites
Do the modifiers even have to be in that order? Or could some function be "abstract private static final function"? If so, all the more reasons not to use regexp for looking up the modifiers.

Share on other sites
The modifiers do in fact need to be in that order, according to the PHP documentation.

Share on other sites
Oh. Well, I think this *might* help:

(final\s+)?(abstract\s+)?(public|private|protected\s+)?(static\s+)?function\s*

Now you'd need to strip the spaces from after the modifiers though. But the matcher wouldn't try match for substrings that begin with \s (and then always fail when the next word isn't one of "final", "abstract" etc.), but for things that start with words "final", "abstract",..., or "function". Which should be much rarer.

Share on other sites
Hmm, I tested that with Python and it made the regexp 4 times faster. But it's still a far cry from leaving out the quantifiers which makes it ~700 times faster than the original regexp..

Share on other sites
With some more testing I saw that the fix can actually make the regexp 100+ times faster if the program has enough whitespace in it. So it's worth trying for you.

Share on other sites
If you don't need to check what matched within the groupings (i.e. don't care whether or not the function actually is final or abstract etc.), you can use non-capturing groupings:

(?:final\s+)?(?:abstract\s+)?(?:(?:public|private|protected)\s+)?(?:static\s+)?function\s*

Share on other sites
Does the function need to match only functions exactly? Or is it okay if it gives incorrect results for input that is not valid PHP code? If that is the case then you can simplify the regex to something like:
((final|abstract|public|private|protected|static)\s+)*