$pattern = "/&lt;img[ ]+src[^\"]*\"([^\"\r\n]*)\"[^\"]*&gt;/i";
$replacement = "<img src=\"$1\">";
$text = preg_replace($pattern, $replacement, $text);
$pattern = "/&lt;a[ ]+href[^\"]*\"([^\"\r\n]*)\"[^\"]*&gt;([^\"\r\n]*)&lt;\/a&gt;/i";
$replacement = "<a href=\"$1\">$2</a>";
$text = preg_replace($pattern, $replacement, $text);
[web] Regular expression help!
I'm writing a simple web board, and I'd like people to be able to post pictures and links. Currently, it escapes all the HTML, then hunts for the &lt; and &gt; entities and converts the img and a href tags back to proper HTML. I'm not very good at regular expressions, and mine currently look like the code above, which breaks in a number of cases. Any ideas?
Because strip_tags() actually removes them, as opposed to encoding them.
Anyway, aside from the fact that you seem to be missing the = signs after src and href in the original text, I don't see much wrong. Can you give some examples of what breaks?
Quote: Original post by Sander
Anyway, aside from the fact that you seem to be missing the = signs after src and href in the original text, I don't see much wrong. Can you give some examples of what breaks?

Adding "=" anywhere in the regex breaks it completely, and I have no idea why. Things that break them are (for example):
<a href="http://site"><img src="pic.jpg"><br />Caption!</a>
Also, alt, title, width/height and other attributes break images (they are not converted back to HTML).
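For anyone following along, here is a minimal sketch of why both cases fail. It assumes the text was escaped with htmlspecialchars() and ENT_NOQUOTES, so only the angle brackets are encoded and the quotes survive; the post doesn't say how the escaping is done, so that flag is a guess, and the alt text is made up.

```php
<?php
// Assumption: only < and > were encoded (ENT_NOQUOTES), so quotes survive.
$raw  = '<a href="http://site"><img src="pic.jpg"><br />Caption!</a>';
$text = htmlspecialchars($raw, ENT_NOQUOTES);

// The link pattern from the first post, written against the encoded brackets.
$linkPattern = "/&lt;a[ ]+href[^\"]*\"([^\"\r\n]*)\"[^\"]*&gt;([^\"\r\n]*)&lt;\/a&gt;/i";

// The link-text group ([^\"\r\n]*) cannot cross a double quote, and the
// nested img tag contains two of them, so the whole link fails to match.
$linkMatches = preg_match($linkPattern, $text);

// Extra attributes on an image fail the same way: after the src value the
// pattern only allows non-quote characters before the closing &gt;.
$imgPattern = "/&lt;img[ ]+src[^\"]*\"([^\"\r\n]*)\"[^\"]*&gt;/i";
$imgText    = htmlspecialchars('<img src="pic.jpg" alt="A picture">', ENT_NOQUOTES);
$imgMatches = preg_match($imgPattern, $imgText);
```

In both cases the culprit is a [^\"]* group that has to stretch across text containing quotes.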
Weird.. anyway, I built this from the ground up. It should work (unless I made a typo):
// images (assumes the quotes were left intact when the text was escaped)
$pattern = '#&lt;img\s+src="([^"]+)"&gt;#';
$replacement = '<img src="\\1" />';
$text = preg_replace($pattern, $replacement, $text);

// links
$pattern = '#&lt;a\s+href="([^"]+)"&gt;#';
$replacement = '<a href="\\1">';
$text = preg_replace($pattern, $replacement, $text);
$text = str_replace('&lt;/a&gt;', '</a>', $text);
= is on the list of characters that preg_quote() escapes, so try writing it as \= in the pattern. See the preg_quote() documentation. That function escapes all special characters, and I quote: "The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | :"
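A quick way to check this is the small sketch below. As far as I know, = only carries special meaning inside group syntax such as (?=...), so the bare and the escaped form should both match. The sample string and variable names are made up for the test.

```php
<?php
$sample = '<img src="pic.jpg">';

// A bare "=" used as a plain literal in the pattern.
$matched = preg_match('/src="([^"]*)"/', $sample, $m);

// preg_quote() escapes "=" anyway; the escaped form matches the same text.
$quoted = preg_quote('src=');
$matchedQuoted = preg_match('/' . $quoted . '"([^"]*)"/', $sample, $m2);
```

If both calls match, the breakage from adding "=" is probably coming from somewhere else, for example from how the text was escaped before the pattern runs.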
Design critique: I don't think you should escape the HTML first, since it prevents you from manually escaping those tags you don't want processed. You should look for < and > first, not &lt; and &gt;, and only convert < and > to &lt; and &gt; for those tags you do not recognize.
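One possible shape for that approach is sketched below. The function name render_post and the whitelist rules (http/https links only, image attributes other than src dropped) are my own assumptions for illustration, not something from this thread. It is not a complete sanitizer: it does not validate image URLs or check that </a> tags are balanced.

```php
<?php
// Hypothetical helper: the name and the whitelist rules are illustrative only.
function render_post(string $raw): string
{
    // Split into tag-like chunks and the text between them, keeping the tags.
    $parts = preg_split('/(<[^<>]*>)/', $raw, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    foreach ($parts as $part) {
        if (preg_match('#^<a\s+href="(https?://[^"<>]*)"\s*>$#i', $part, $m)) {
            // Recognized link: rebuild it from the captured URL only.
            $out .= '<a href="' . htmlspecialchars($m[1]) . '">';
        } elseif (preg_match('#^<img\s+src="([^"<>]*)"[^<>]*>$#i', $part, $m)) {
            // Recognized image: extra attributes (alt, width, ...) are dropped.
            $out .= '<img src="' . htmlspecialchars($m[1]) . '">';
        } elseif (preg_match('#^</a\s*>$#i', $part)) {
            $out .= '</a>';
        } else {
            // Everything else, including unknown tags, gets escaped.
            $out .= htmlspecialchars($part);
        }
    }
    return $out;
}
```

Because each recognized tag is rebuilt from its captured URL instead of being passed through, stray attributes never reach the output, and a nested image inside a link is handled one tag at a time.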