[.net] regex removing HTML tags with Match

3 comments, last by Koori 15 years, 4 months ago
I would like to create a regex that matches everything in a string except the characters inside "<" and ">" tags. I know this would be quite simple with regex.Replace("<[^>]*>"), but I need it to be done with regex.Match. Is there any way to do this? Something like Match(/*negate*/"<[^>]*>")? EDIT: I would also like to point out that there isn't always whitespace between the tags and the text.
I can't be sure without knowing what you're trying to accomplish in the end - but I would recommend avoiding regex (which is really overkill here) and using text parsing functions instead. You should be able to write an enumerator function (using yield return, etc.) to do the job in less than 5 lines. You'll also have something much more readable if you can avoid regexes.
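For illustration, here is a minimal sketch of the kind of enumerator suggested above. The names TagStripper and TextSegments are made up for this example, and it assumes that every "<" opens a tag and the next ">" closes it, so it is not a real HTML parser (comments, scripts and attributes containing ">" will confuse it).

```csharp
using System;
using System.Collections.Generic;

static class TagStripper
{
    // Hypothetical helper: yields each run of text found outside <...> pairs.
    public static IEnumerable<string> TextSegments(string input)
    {
        int start = 0;       // index where the current text run begins
        bool inTag = false;  // true while we are between '<' and '>'

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (!inTag && c == '<')
            {
                // A tag opens: emit the text collected so far, if any.
                if (i > start)
                    yield return input.Substring(start, i - start);
                inTag = true;
            }
            else if (inTag && c == '>')
            {
                // The tag closes: the next text run starts after it.
                inTag = false;
                start = i + 1;
            }
        }

        // Emit trailing text, unless the input ended inside an unclosed tag.
        if (!inTag && start < input.Length)
            yield return input.Substring(start);
    }
}
```

Calling string.Concat(TagStripper.TextSegments(text)) joins the segments back into one string, and it does not rely on whitespace between tags and text.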
Quote: Original post by Dragon_Strike
I know this would be quite simple with regex.Replace("<[^>]*>"), but I need it to be with regex.Match. Is there any way to do this?

No. You can't remove text with match. All you can do is determine if your input text matches the specified pattern, and optionally capture subgroups. You'll need additional processing to yield the equivalent of replace - match all the desired groups and then concatenate them.
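A rough sketch of that "match the desired groups, then concatenate" approach follows. The class name MatchStripper, the method StripTags and the group name "text" are invented for this example. The pattern is an assumption too: each match is either a whole tag or a run of non-"<" characters captured in the named group, and only the captured runs are kept.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class MatchStripper
{
    // Each match is either a tag (discarded) or a text run captured in "text".
    private static readonly Regex TagOrText =
        new Regex(@"<[^>]*>|(?<text>[^<]+)");

    // Hypothetical helper: rebuilds the input from the text runs only.
    public static string StripTags(string input)
    {
        var result = new StringBuilder();
        foreach (Match m in TagOrText.Matches(input))
        {
            Group text = m.Groups["text"];
            if (text.Success)          // false when the match was a tag
                result.Append(text.Value);
        }
        return result.ToString();
    }
}
```

For an input such as "Hello <b>world</b>!" this collects "Hello ", "world" and "!" and joins them. The result should be the same as the Replace version for simple markup, just with more code.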
Like Fingon said, can you elaborate a bit on what you want? I'm a bit unclear.
If I understand correctly, you want to remove HTML tags from text.
I usually do it with:
stringWithHtml = Regex.Replace(stringWithHtml, @"<(.|\n)*?>", string.Empty);


Matching is for finding parts of the text. Replacing is for manipulating the matched fragments.
