[.net] changing a character in a C# string

18 comments, last by Washu 12 years, 9 months ago
You see, when you say the post is too deep, and your very next sentence starts with immutability and says nothing about char arrays or StringBuilder (or for that matter, the entire post says nothing about them), the connotation is that you think that understanding that strings are immutable is too deep. Secondly, immutability is not handled in the background. Immutability is a core principle of the interface of the string class. If anything, it's the other way around: strings are actually mutable in the background, but you never need to worry about it because the CLR itself is the only thing allowed to muck with them. If you want to say that char arrays and StringBuilder aren't necessary for performance, then say that. Also, if those were the only options I mentioned, then maybe you could say that I was "advocating" them, but I also mentioned another option and noted that those weren't the only options.
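For readers following along, the two options named here (a char array and StringBuilder) might look like the sketch below. The class and method names are hypothetical and exist only for illustration, and the third method is an assumption about what "another option" could be, not necessarily the one this poster had in mind. Each method returns a new string and leaves the input untouched.

```csharp
using System;
using System.Text;

static class ReplaceCharSketch
{
    // Copy the characters into an array, change one, and build a new string.
    public static string ViaCharArray(string text, int index, char replacement)
    {
        char[] chars = text.ToCharArray();
        chars[index] = replacement;
        return new string(chars);
    }

    // StringBuilder has a settable indexer, so the edit happens in its own buffer.
    public static string ViaStringBuilder(string text, int index, char replacement)
    {
        var builder = new StringBuilder(text);
        builder[index] = replacement;
        return builder.ToString();
    }

    // Assumed third option: concatenate the pieces on either side of the index.
    public static string ViaSubstring(string text, int index, char replacement)
    {
        return text.Substring(0, index) + replacement + text.Substring(index + 1);
    }
}
```

For a single replacement, any of the three is probably fine; the differences only start to matter when many edits are made to the same text.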


No. Strings are immutable. The CLR handles the string creation process, tracking memory allocation, references, and collection once all references fall out of scope. "This is a string" is a string that will be entered into the CLR's string tables. Any "change" to that creates a new string. Let's say you want to "add" a period. "This is a string." is now a fresh new string in the CLR's string tables alongside the string "This is a string". As soon as all references to the first string are out of scope, the CLR wipes out the string slot and waits for something else to store there.

In the following scenario there are 3 strings created and stored in the CLR:


string s = "This is a string.";
s = s.Remove(s.Length - 1);   // new string without the trailing period
s = s.Insert(s.Length, " ");  // new string with a trailing space



However, even though three strings are created, the CLR sees that after the first string is modified to remove the last character there are no more references to the string "This is a string." and wipes it out. The same is true for the .Insert line. The CLR creates the new string "This is a string " and wipes out the string "This is a string" as there are no more references to it. The CLR never manipulates the strings themselves. In this way all strings are immutable and not even the CLR itself modifies strings. You are wrong. The CLR handles immutability in that it tracks the references to string locations, and as soon as there are no more references to a string it gets wiped out. Because of this you never have to worry about the strings themselves and can focus entirely on doing what you need to do.
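A minimal sketch of the part of this claim that can be checked from ordinary code: Remove and Insert hand back new objects and leave the original text alone. The class and method names are hypothetical. It says nothing about when the unreferenced strings are actually collected, which this snippet cannot observe.

```csharp
using System;

static class ImmutabilitySketch
{
    // Returns true when every step produced a distinct object
    // and the earlier strings kept their original contents.
    public static bool OriginalSurvives()
    {
        string original = "This is a string.";
        string trimmed = original.Remove(original.Length - 1);
        string padded = trimmed.Insert(trimmed.Length, " ");

        return original == "This is a string."
            && trimmed == "This is a string"
            && padded == "This is a string "
            && !object.ReferenceEquals(original, trimmed)
            && !object.ReferenceEquals(trimmed, padded);
    }
}
```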
Always strive to be better than yourself.
The CLR can modify strings if it likes. The prime example is the StringBuilder class implementation from .NET 2.0 to, I believe, 3.5. Internally, that implementation used a string object that got modified. (This got changed - .NET 4.0's StringBuilder uses a char array internally.) If you don't believe me, grab a copy of Reflector and point it at the .NET 2.0 StringBuilder implementation.
OP:
The only way I know to modify an existing string is the following, but please note that it's a horrible idea that will most likely give you tons of headaches and all sorts of unexpected problems.


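string str = "Hello world";

// Requires compiling with unsafe code enabled.
unsafe
{
    // Pin the string and take a pointer to its first character.
    fixed (char* tmp = str)
    {
        // Overwrite the last character in place.
        *(tmp + str.Length - 1) = ' ';
    }
}

Console.WriteLine(str);
Console.WriteLine("Hello world");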


The second WriteLine is there to show you why it's a horrible idea.
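The reason the second WriteLine is affected can be sketched in safe code: equal string literals in one assembly refer to the same string instance, so writing through a pointer to one of them would be expected to alter every use of that literal. The class and method names below are hypothetical, and nothing here modifies a string.

```csharp
using System;

static class LiteralSharingSketch
{
    // Two equal literals in the same assembly are one and the same object.
    public static bool LiteralsShareOneObject()
    {
        string first = "Hello world";
        string second = "Hello world";
        return object.ReferenceEquals(first, second);
    }

    // A string built at run time from the same characters is a separate object.
    public static bool CopyIsSeparate()
    {
        string literal = "Hello world";
        string copy = new string(literal.ToCharArray());
        return literal == copy && !object.ReferenceEquals(literal, copy);
    }
}
```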

[quote]
The CLR can modify strings if it likes. The prime example is the StringBuilder class implementation from .NET 2.0 to, I believe, 3.5. Internally, that implementation used a string object that got modified. (This got changed - .NET 4.0's StringBuilder uses a char array internally.) If you don't believe me, grab a copy of Reflector and point it at the .NET 2.0 StringBuilder implementation.
[/quote]


Yes, specifically the .NET 2.0 StringBuilder used the String class's internal function "AppendInPlace".

Frankly, landlocked, I'm not a fan of your attitude. You appear to be rather confrontational. If you have an argument to be made about a particular topic then you should elucidate your thoughts and reasons, generally backed up with references. Furthermore, this is the .NET forum, not the For Beginners forum. Therefore going in depth into the functionality and behavior of various components of .NET such as the CLR or CLI is hardly "too deep".

That code may not always work, and in fact isn't guaranteed to remain working on any current Windows machine either. It all depends on where the data for the string object is allocated and the permissions on those pages of memory. Attempting changes like that will eventually result in an access violation, and a forceful termination of your process.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.




Are you sure?

MSDN says (link)

"You can initialize a pointer with the address of an array or a string:"
and
"fixed (char* p = str) ... // equivalent to p = &str[0]"

So I think it's pretty safe to get the pointer, but I would advise against modifying the data it points to. I seem to recall seeing Microsoft do things like that to compute the string hash while stepping through the standard library.
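A read-only use of the pointer, in the spirit of the hash computation recalled here, might look like the sketch below. The class and method names are hypothetical, the checksum is a toy and not the framework's hash, and the code assumes the project is compiled with unsafe code enabled. It only reads through the pointer and never writes.

```csharp
using System;

static class PointerReadSketch
{
    // Adds up the UTF-16 code units of the string through a pinned pointer.
    public static unsafe int SumCodeUnits(string text)
    {
        int sum = 0;
        fixed (char* p = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                sum += p[i];   // read only; the string is never modified
            }
        }
        return sum;
    }
}
```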

[quote]
The CLR can modify strings if it likes. The prime example is the StringBuilder class implementation from .NET 2.0 to, I believe, 3.5. Internally, that implementation used a string object that got modified. (This got changed - .NET 4.0's StringBuilder uses a char array internally.) If you don't believe me, grab a copy of Reflector and point it at the .NET 2.0 StringBuilder implementation.
[/quote]

This makes me dubious, but I'll check it out. It is my understanding that StringBuilder took individual strings and just stored them in an array, and then ToString simply did one big concatenation instead of doing it incrementally as you added elements (which is why that class exists in the first place). From day one, though, everything I've read about .NET has touted the immutability of strings, so I can't just take your word for this.
Always strive to be better than yourself.

[quote]
Frankly, landlocked, I'm not a fan of your attitude. You appear to be rather confrontational. If you have an argument to be made about a particular topic then you should elucidate your thoughts and reasons, generally backed up with references. Furthermore, this is the .NET forum, not the For Beginners forum. Therefore going in depth into the functionality and behavior of various components of .NET such as the CLR or CLI is hardly "too deep".
[/quote]

I don't really care what you think of my attitude. About getting "too deep": you should be pushing simple, succinct solutions that meet the need rather than trying to be "spiffy" or clever. I write my posts here knowing that someone, anyone, might come along and take what I say and run with it. So, when someone proposes solutions I know to be overly engineered for the problem that was described, I will directly counter it and say so.
Always strive to be better than yourself.

[quote name='Washu' timestamp='1310500516' post='4834472']
That code may not always work, and in fact isn't guaranteed to remain working on any current Windows machine either. It all depends on where the data for the string object is allocated and the permissions on those pages of memory. Attempting changes like that will eventually result in an access violation, and a forceful termination of your process.


Are you sure?

MSDN says (link)

"You can initialize a pointer with the address of an array or a string:"
and
"fixed (char* p = str) ... // equivalent to p = &str[0]"

So I think it's pretty safe to get the pointer, but I would advise against modifying the data it points to. I seem to recall seeing Microsoft do things like that to compute the string hash while stepping through the standard library.
[/quote]
Yes, getting a pointer is OK. The access violation can occur if the string is stored in read-only pages and you attempt to modify it. In many cases string constants (like the one you used) get baked into the executable in regions that are loaded into memory with different page permissions than standard dynamically allocated memory (such as what the CLR returns to you). Because of this, you do have to be aware that making changes to those can result in problems. On average they won't, but it is a possibility that should be kept in mind (and hence avoided).

There usually isn't a reason to bother modifying a string like that anyway. Typically the only cases where you would need to do that are ones where you're extremely performance constrained. In those cases, dropping down to an unmanaged module and letting it do the bulk of the work is probably a better idea (because you can use machine-specific instruction sets to boost performance).

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.


[quote]
I don't really care what you think of my attitude. About getting "too deep": you should be pushing simple, succinct solutions that meet the need rather than trying to be "spiffy" or clever. I write my posts here knowing that someone, anyone, might come along and take what I say and run with it. So, when someone proposes solutions I know to be overly engineered for the problem that was described, I will directly counter it and say so.
[/quote]


You should care, because your continued participation on this website depends heavily on your attitude. Keep up the attitude and you might find yourself no longer welcome to continue posting on this site. A well-reasoned reply with backing evidence is far more likely to be appreciated than a post that ends with:

STFU.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

This topic is closed to new replies.
