benryves

[web] [PHP] Regular Expressions are horrible: URL matching

When displaying messages on a forum, I run them through this:

echo preg_replace('#(http|ftp|https)://([A-Za-z0-9\./_~]*)#i', "<a href=\"$1://$2\">$2</a>", $body);

This works great for most URLs. However, my forum software allows [img][/img] tags and also replaces smilies with the relevant HTML, which means that I get rubbish like this:

<img src="<a href="http://path.to.images.jpg">path.to.images.jpg</a>" />

How can I stop this from butchering URLs that are already inside HTML tags?

Match a space before and after the URL, so that a quoted URL such as "http://www.example.com" won't match. The drawback is that a URL written without spaces around it will not be turned into a link, but how often does that happen?
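
A minimal sketch of that idea, assuming "a space" means any whitespace or the start/end of the message. The function name linkify_spaced is hypothetical, and the character class is borrowed from the original post:

```php
<?php
// Hypothetical helper: only linkify URLs that are delimited by whitespace
// (or by the start/end of the string) on both sides.
function linkify_spaced(string $body): string
{
    // (?<!\S) = not preceded by a non-space character
    // (?!\S)  = not followed by a non-space character
    $pattern = '#(?<!\S)(https?|ftp)://([A-Za-z0-9./_~]+)(?!\S)#i';
    return preg_replace($pattern, '<a href="$1://$2">$2</a>', $body);
}

echo linkify_spaced('See http://www.example.com for details'), "\n";
echo linkify_spaced('<img src="http://path.to/images.jpg" />'), "\n";
```

The first call produces a link; the second leaves the img tag alone because the URL is preceded by a quote. A URL followed directly by a comma or closing bracket would also be skipped, which is the trade-off mentioned above.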

Or just correct it afterwards with a second replacement:
$body = preg_replace("#<img src=\"<a href=\"(.*?)\">(.*?)</a>\" />#", "<img src=\"$1\" />", $body);

This is a Ruby-style regex, but you should be able to convert it to PHP fairly easily:

gsub( /(?!<.*)(http|ftp|https):\/\/([\w.\\\/_~]*)(?!.*>)/im, '<a href="\1://\2">\2</a>')

Basically, (?!...) means "do not match if this follows" (a negative look-ahead), so the pattern is meant to skip a URL that sits between < and >.

I'm pretty sure that PHP's preg_replace supports (?!...).

Quote:
Original post by Colin Jeanne
Quote:
Original post by kryat
i'm pretty sure that php preg_replace supports (?!...) commands.

PHP's form of (?!...) is (?:...)


Not quite. (?:...) matches the expression but doesn't add it to the captured groups (e.g. \1, \2, \3...), whereas (?!...) is a negative look-ahead; (?=...) is the positive look-ahead. Things get a little tricky with look-aheads...

For example, with the string "bob goes home":

/(bob) (\w*) (home)/ => "bob goes home", \1 = "bob", \2 = "goes", \3 = "home"
/(?:bob) (\w*) (?:home)/ => "bob goes home", \1 = "goes"
/bob (\w*) (?!home)/ => no match.
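
The three cases can be checked in PHP with preg_match. This is only a small sketch; the variable names ($m1, $m2, $m3, $found) are made up for the illustration:

```php
<?php
$subject = 'bob goes home';

// Capturing groups: every (...) is stored in the match array.
preg_match('/(bob) (\w*) (home)/', $subject, $m1);

// Non-capturing groups: (?:...) still has to match, but is not stored.
preg_match('/(?:bob) (\w*) (?:home)/', $subject, $m2);

// Negative look-ahead: (?!home) rejects the match because "home" follows.
$found = preg_match('/bob (\w*) (?!home)/', $subject, $m3);

print_r($m1);      // full match, then "bob", "goes", "home"
print_r($m2);      // full match, then "goes"
var_dump($found);  // int(0)
```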

[edit] Still trying to come up with a single regex that works properly. The one from before will fail if there is a greater-than sign anywhere after a link (it thinks it is the closing of a tag).

[Edited by - kryat on September 10, 2005 1:39:52 AM]
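
One possible way to narrow that failure, offered only as a sketch and not as a finished answer: let the look-ahead stop at the next "<" or ">", so that only a ">" that closes the current tag blocks the match. The function name linkify_outside_tags and the simplified character class are assumptions made for the example:

```php
<?php
// Hypothetical helper: skip any URL that is followed by ">" with no "<" or ">"
// in between, which is what a URL inside a tag's attributes looks like.
function linkify_outside_tags(string $body): string
{
    $pattern = '#(https?|ftp)://([\w./~]+)(?![^<>]*>)#i';
    return preg_replace($pattern, '<a href="$1://$2">$2</a>', $body);
}

echo linkify_outside_tags(
    'Go to http://example.com/a <img src="http://path.to/images.jpg" />'
), "\n";
```

Here the first URL becomes a link and the one inside the img tag is left alone. A bare ">" typed in the message text right after a URL, with no "<" before it, would still suppress that link, so this reduces the problem rather than removing it.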

I'm sure it's possible to come up with a single expression that works; I just can't figure one out. However, here is some code that does work:

// $body holds the post content.
$exp = '/(http|ftp|https):\/\/([\w.\/~?=%+]*[^. ])/i'; // URL regex
$rep = '<a href="$1://$2">$2</a>';                      // URL replacement
$htmlexp = '/<[^>]+>/';                                 // generic HTML tag regex

preg_match_all($htmlexp, $body, $html_arr);             // collect all HTML tags
// Split the post on the tags and run the URL replacement on the text pieces only.
$text_arr = preg_replace($exp, $rep, preg_split($htmlexp, $body));
// Rebuild the post by interleaving the text pieces with the original tags.
$body = $text_arr[0];
foreach ($html_arr[0] as $key => $h) {
    $body .= $h . $text_arr[$key + 1];
}
// $body is ready.
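
The same split-and-rebuild idea can also be sketched as a single function using preg_split with the PREG_SPLIT_DELIM_CAPTURE flag, which keeps the tags in the result so no separate preg_match_all pass is needed. The function name linkify_text_only is hypothetical; the URL pattern is the one from the code above:

```php
<?php
// Hypothetical helper wrapping the split-and-rebuild approach.
function linkify_text_only(string $body): string
{
    $url  = '/(https?|ftp):\/\/([\w.\/~?=%+]*[^. ])/i';
    $link = '<a href="$1://$2">$2</a>';

    // Because the tag pattern is wrapped in a capturing group, the tags are
    // returned too: even indexes hold plain text, odd indexes hold HTML tags.
    $parts = preg_split('/(<[^>]+>)/', $body, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Only the text between tags is run through the URL replacement.
            $parts[$i] = preg_replace($url, $link, $part);
        }
    }
    return implode('', $parts);
}

echo linkify_text_only(
    '<img src="http://path.to/images.jpg" /> see http://example.com/page now'
), "\n";
```

The img tag passes through untouched and only the URL in the plain text is wrapped in a link.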

Guest Anonymous Poster
I think what you mean to say is www.morphinenation.com
