[.net] regex removing HTML tags with Match
I would like to create a regex that matches everything in a string except the characters inside "<" ">" tags.
I know this would be quite simple with regex.Replace("<[^>]*>"),
but I need it to be done with regex.Match...
Is there any way to do this?
Something like Match(/*negate*/"<[^>]*>")
EDIT: I would also like to point out that there isn't always whitespace between the tags and the text.
I can't be sure without knowing what you're trying to accomplish in the end, but I would recommend avoiding regex (which is really overkill here) and using text parsing functions instead. You should be able to write an enumerator function (using yield return, etc.) that does the job in a handful of lines. You'll also have something much more readable if you can avoid regexes.
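For illustration, here is a minimal sketch of what such an enumerator might look like. The names TagStripper and TextOutsideTags are made up for this example, and it assumes that every "<" opens a tag and the next ">" closes it. Unlike the regex, it silently drops everything after an unterminated "<".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not from the original thread.
static class TagStripper
{
    // Yields each run of text that lies outside <...> tags.
    public static IEnumerable<string> TextOutsideTags(string input)
    {
        var current = new StringBuilder();
        bool insideTag = false;

        foreach (char c in input)
        {
            if (c == '<')
            {
                // A tag starts: hand back whatever text was collected so far.
                if (current.Length > 0)
                {
                    yield return current.ToString();
                    current.Clear();
                }
                insideTag = true;
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;
            }
            else if (!insideTag)
            {
                current.Append(c);
            }
        }

        // Trailing text after the last tag, if any.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

If you just want the stripped string, string.Concat(TagStripper.TextOutsideTags(html)) joins the pieces back together, and it works whether or not there is whitespace between the tags and the text.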
Quote: Original post by Dragon_Strike
I know this would be quite simple with regex.Replace("<[^>]*>"),
but I need it to be done with regex.Match...
Is there any way to do this?
No. You can't remove text with match. All you can do is determine if your input text matches the specified pattern, and optionally capture subgroups. You'll need additional processing to yield the equivalent of replace - match all the desired groups and then concatenate them.
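A rough sketch of that match-and-concatenate idea, using Regex.Matches rather than a single Regex.Match call. The class name MatchStripper and the pattern are my own choices for illustration, not something from the original post. The pattern matches either a whole tag or a run of text, and only the text alternative is captured in group 1. A lone "<" that never closes is kept as text, which is what Replace("<[^>]*>") would do too.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original thread.
static class MatchStripper
{
    // Alternative 1 consumes a tag without capturing it.
    // Alternative 2 captures a run of non-'<' characters, or a stray '<'.
    private static readonly Regex TagOrText = new Regex(@"<[^>]*>|([^<]+|<)");

    public static string StripTags(string input)
    {
        var result = new StringBuilder();
        foreach (Match m in TagOrText.Matches(input))
        {
            // Group 1 only succeeds when the match was text, not a tag.
            if (m.Groups[1].Success)
                result.Append(m.Groups[1].Value);
        }
        return result.ToString();
    }
}
```

If you need the individual text fragments instead of one string, collect m.Groups[1].Value into a list rather than appending to the StringBuilder.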
If I understand correctly, you want to remove HTML tags from text.
I usually do it with
stringWithHtml = Regex.Replace(stringWithHtml, @"<(.|\n)*?>", string.Empty);
Matching is for finding parts of text. Replacing is for manipulation of matching fragments.