[.net] Erasing Whitespace in C#

4 comments, last by VizOne 18 years, 7 months ago
Hi, I read in another thread like this one that the easiest way to remove whitespace in C# would probably be

Regex.Replace(input, @"\s", "");
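For comparison, the same "remove every whitespace character" one-liner can be sketched with the C++ standard library, where std::regex_replace plays the role of .NET's Regex.Replace. This is a C++ analogue, not C#. It assumes the default ECMAScript grammar, whose \s also matches tab, CR and LF. The function name RemoveAllWhitespace is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: delete every whitespace character,
// roughly mirroring Regex.Replace(input, @"\s", "") in C#.
std::string RemoveAllWhitespace(const std::string& input)
{
    static const std::regex ws(R"(\s)");   // one whitespace char: space, \t, \r, \n, \f, \v
    return std::regex_replace(input, ws, "");
}
```

Note that this also glues words together: " A String " becomes "AString".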
But I'm really confused by how regular expressions work. Is there a way to modify it so that it only replaces whitespace when there is more than one character in a row, leaving a single space between words and trimming the ends? For example, I'd like to turn a string like " \r\n A \r\n String \r\n\r\nLike\r\n\r\nThis" into "A String Like This". I wrote something that does that in C++ (not very optimized), but I have no idea how to do it in C#, because the String methods aren't as nice as the std::string ones in C++. Anyway, here is the C++ code showing how I did it. Let me know if you have any pointers on how to do it in C#. Thanks

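#include <string>

// Wrapped in a function (name is illustrative) so the snippet compiles on its own.
// Trims both ends, then collapses each run of whitespace to a single space.
std::string CleanWhitespace(std::string Target)
{
   const std::string WHITESPACE = " \r\n\t";

   //trim the front and back first
   std::string::size_type wsf = Target.find_first_not_of(WHITESPACE);
   if (wsf == std::string::npos)
      return "";                      // nothing but whitespace
   std::string::size_type wsb = Target.find_last_not_of(WHITESPACE);
   Target = Target.substr(wsf, wsb - wsf + 1);

   //replace each gap between words with a single space
   std::string::size_type index = 0;
   while ((index = Target.find_first_of(WHITESPACE, index)) != std::string::npos)
   {
      // Target is trimmed, so a non-whitespace char always follows the gap
      std::string::size_type b = Target.find_first_not_of(WHITESPACE, index);
      Target.replace(index, b - index, " ");
      index++;                        // skip past the space just written
   }
   return Target;                     //Target now is a good string
}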
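The find/erase approach above can also be sketched more compactly with std::regex: collapse every whitespace run to one space, then strip the single space that may be left at either end. This is an illustrative C++ alternative, not the original poster's code; the name CollapseWhitespaceRegex is hypothetical, and \s is assumed to follow the default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: same goal as the loop above, using std::regex_replace.
std::string CollapseWhitespaceRegex(const std::string& input)
{
    static const std::regex runs(R"(\s+)");           // one or more whitespace chars
    // Collapse every whitespace run (including CR/LF) to one space...
    std::string collapsed = std::regex_replace(input, runs, " ");
    // ...then trim the single leading/trailing space that may remain.
    const std::string::size_type first = collapsed.find_first_not_of(' ');
    if (first == std::string::npos)
        return "";                                     // input was all whitespace
    const std::string::size_type last = collapsed.find_last_not_of(' ');
    return collapsed.substr(first, last - first + 1);
}
```

The trade-off is clarity over speed: two passes and a regex engine instead of a single hand-written scan.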

Try this:

Regex.Replace(input, @"\s{2,}", "");

{2,} means "two times or more". Other quantifiers are:

* zero or more times
+ one or more times
? zero or one time
{N} exactly N times
{N,M} from N to M times
{N,} N times or more
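The quantifiers in the list can be tried out with a tiny whole-string matcher. This sketch uses C++ std::regex_match as a stand-in for .NET, assuming these simple patterns behave the same in both engines; the helper name FullMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the WHOLE of `text` matches `pattern`.
// Examples:
//   FullMatch("",     "a*")     -> true   (zero or more)
//   FullMatch("",     "a+")     -> false  (one or more)
//   FullMatch("a",    "a?")     -> true   (zero or one)
//   FullMatch("aaa",  "a{3}")   -> true   (exactly 3)
//   FullMatch("aaaa", "a{2,3}") -> false  (2 to 3 only)
//   FullMatch("aa",   "a{2,}")  -> true   (2 or more)
bool FullMatch(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}
```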

These quantifiers match the longest possible string; they are said to be "greedy". To get the shortest possible match ("non-greedy" or "lazy"), append a ? to the quantifier, e.g.
{4,}?
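The greedy/lazy difference shows up clearly when searching for the first match. This is a C++ sketch using std::regex_search, assumed here to treat lazy quantifiers the same way .NET does; the helper name FirstMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the first match of `pattern` in `text`, or "".
// Examples:
//   FirstMatch("aaaaaa", "a{4,}")     -> "aaaaaa"     (greedy: as many as possible)
//   FirstMatch("aaaaaa", "a{4,}?")    -> "aaaa"       (lazy: as few as allowed)
//   FirstMatch("<b>x</b>", "<.+>")    -> "<b>x</b>"
//   FirstMatch("<b>x</b>", "<.+?>")   -> "<b>"
std::string FirstMatch(const std::string& text, const std::string& pattern)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str();
    return "";
}
```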


Regards,
Andre
Andre Loker | Personal blog on .NET
I think you'd want:

Regex.Replace(input, @"\s+", " ");
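Applied to the example string from the question, this one-liner collapses every whitespace run to a single space but still leaves one space where the leading run was, so a trim is needed afterwards (in C#, String.Trim() would do that). The sketch below reproduces the replacement in C++ with std::regex_replace as an analogue of Regex.Replace; the name ReplaceRunsWithSpace is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: C++ analogue of Regex.Replace(input, @"\s+", " ").
// Does NOT trim, so a leading or trailing whitespace run becomes one space.
std::string ReplaceRunsWithSpace(const std::string& input)
{
    static const std::regex runs(R"(\s+)");
    return std::regex_replace(input, runs, " ");
}
```

For the question's input this yields " A String Like This", with one leading space still to trim.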
Quote: Original post by Dean Harding
I think you'd want:

Regex.Replace(input, @"\s+", " ");

That would replace single whitespace characters as well. The only difference from \s here is that fewer replacements take place: \s replaces each whitespace character individually, while \s+ replaces each run of consecutive whitespace in one go. In this scenario (replacing with an empty string) the result is the same.
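The claim about replacement counts can be checked by counting matches. This C++ sketch uses std::sregex_iterator as a stand-in for .NET's match enumeration; the helper name CountMatches is hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: count how many separate matches `pattern` finds in `text`.
// For "a  b\t\tc": \s finds 4 matches (one per char), \s+ finds 2 (one per run),
// yet replacing either with "" produces the same string, "abc".
std::ptrdiff_t CountMatches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);   // must outlive the iterators below
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```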


Regards,
Andre
Quote: Original post by VizOne
In this scenario - replacing with an empty string - the result is the same.


Notice, though, that I replaced with a single space; I'm pretty sure that's what he wants. With yours you'd turn "abc  def" into "abcdef" (since it replaces every whitespace run of two or more characters with an empty string), and it would also leave "abc\tdef" alone.
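The difference between the two suggestions can be seen side by side. The C++ sketch below uses std::regex_replace as an analogue of Regex.Replace; both function names are hypothetical labels for the two patterns discussed in this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for Regex.Replace(input, @"\s{2,}", ""):
// deletes runs of two or more whitespace chars, leaves single ones untouched.
std::string DeleteLongRuns(const std::string& s)
{
    static const std::regex re(R"(\s{2,})");
    return std::regex_replace(s, re, "");
}

// Hypothetical helper for Regex.Replace(input, @"\s+", " "):
// replaces every whitespace run, of any length, with exactly one space.
std::string CollapseRuns(const std::string& s)
{
    static const std::regex re(R"(\s+)");
    return std::regex_replace(s, re, " ");
}
```

With "abc  def" the first gives "abcdef" and the second "abc def"; with "abc\tdef" the first returns the input unchanged while the second gives "abc def".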
Oh, you're right :) I did not realize that.

Regards,
Andre

This topic is closed to new replies.
