[java] Java Strings

12 comments, last by Antheus 14 years, 4 months ago
Quote:Original post by Krohm
Quote:Original post by DevFred
Immutable objects can be shared by many clients without having to worry about side effects. C# does the exact same thing.
I admit I'm not yet sold on this rationale. I can see the benefits for sure, but shouldn't this be the goal of const or final?


No. The idea is that in multi-threaded applications, side effects are a very bad thing. You don't want a function to modify data that two threads are using at once, because this can leave the application in a corrupted or inconsistent state.

So because strings are immutable, they are inherently thread-safe, and you can pass them around freely without having to carefully manage which thread owns which string.
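A minimal sketch of the thread-safety point, using a hypothetical SharedStringDemo class: several pool threads read one shared String with no locking, and each builds its own result. Since no thread can change the shared instance, nothing needs to be synchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedStringDemo {
    // One String instance shared by every task. No lock guards it, because
    // no thread is able to change its contents.
    static final String GREETING = "Hello";

    // Runs `workers` tasks on a small pool; each derives a new String from the
    // shared one. Concatenation allocates a fresh object, so GREETING is untouched.
    static List<String> greetAll(int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                final int id = i;
                futures.add(pool.submit(() -> GREETING + ", worker " + id));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(greetAll(3));
        System.out.println(GREETING); // Hello
    }
}
```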

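As for const and final: const is a reserved keyword that Java does not actually use, and final only stops a variable from being reassigned. It does not stop the object the variable refers to from being modified, so neither one gives you immutability.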
Having String be mutable would actually be a huge performance and security problem. Consider a Person class with a String property called 'name' and an int property called 'age'. Now your application wants to show the user a string containing the person's name and age, for example "Bill, age 24".

So the application gets the Person's name and appends the age to it. All fine and dandy. But the next time it does that, the string is "Bill, age 24, age 24". What is actually happening is that the Person's name is modified by accident every time. You could prevent accidents like that by returning a new String object every time the application requests the Person's name, but then the getter would become costly, because every call has to create a new object.
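To make the Person example concrete, here is a minimal sketch in which a StringBuilder stands in for a hypothetical mutable String. The names AliasingDemo, MutablePerson, labelMutable and label are made up for illustration; the point is only that handing out the internal mutable object lets a caller's append() leak back into the Person.

```java
class AliasingDemo {
    // Hypothetical Person whose name lives in a mutable StringBuilder,
    // standing in for what a mutable String would behave like.
    static class MutablePerson {
        private final StringBuilder name = new StringBuilder("Bill");
        private final int age = 24;

        StringBuilder getName() { return name; } // hands out the internal object
        int getAge() { return age; }
    }

    // The same Person, but with an immutable java.lang.String name.
    static class Person {
        private final String name = "Bill";
        private final int age = 24;

        String getName() { return name; } // safe: nobody can modify the String
        int getAge() { return age; }
    }

    static String labelMutable(MutablePerson p) {
        // Bug: append() modifies the Person's own name buffer.
        return p.getName().append(", age ").append(p.getAge()).toString();
    }

    static String label(Person p) {
        // Concatenation builds a new String; p's name is unchanged.
        return p.getName() + ", age " + p.getAge();
    }

    public static void main(String[] args) {
        MutablePerson m = new MutablePerson();
        System.out.println(labelMutable(m)); // Bill, age 24
        System.out.println(labelMutable(m)); // Bill, age 24, age 24

        Person p = new Person();
        System.out.println(label(p)); // Bill, age 24
        System.out.println(label(p)); // Bill, age 24
    }
}
```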

As you mentioned, you can use StringBuffer or StringBuilder for string manipulation, so it is not memory- or time-consuming. It is also easy to create those from Strings. I don't see a problem here.
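A minimal sketch of the StringBuilder route, with hypothetical helper names join and shout. StringBuilder is the unsynchronized counterpart of StringBuffer and is the usual choice when only one thread touches the buffer; for plain joining, String.join (Java 8+) does the same job.

```java
class BuilderDemo {
    // Joins words with a separator using one StringBuilder instead of
    // creating a new intermediate String on every loop iteration.
    static String join(String[] words, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(words[i]);
        }
        return sb.toString(); // immutable snapshot of the buffer
    }

    // Converting an existing String into a builder and back is one call each way.
    static String shout(String s) {
        StringBuilder sb = new StringBuilder(s); // copies s; s itself is untouched
        sb.append('!');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ", ")); // a, b, c
        String original = "hello";
        System.out.println(shout(original)); // hello!
        System.out.println(original);        // hello
    }
}
```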

I found an article on this that seems to explain it a lot better than I do :) http://macchiato.com/columns/Durable2.html
Java seems rather flawed in this respect (among others, but they are for another topic).

Since everything is essentially an implicit reference (except for primitive data types, which seem to be another arbitrary flaw), I'd expect modifying a returned String to modify the original as well. I'd also expect final to make a variable unmodifiable, as it does with primitive types; however, final only makes the reference itself unmodifiable, so you can't reassign it to a new object. Therefore, Java has no const-correctness and is forced to simulate it by making arbitrarily chosen object types immutable, with seemingly mutating methods returning a copy instead.
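A short sketch of the final-versus-immutable distinction (FinalDemo is a hypothetical name): final on a StringBuilder variable stops reassignment but not append(), while a String method such as toUpperCase() leaves the original untouched and returns a new String.

```java
import java.util.Locale;

class FinalDemo {
    // `final` fixes the reference, not the object it points to.
    static String finalBuilder() {
        final StringBuilder sb = new StringBuilder("abc");
        sb.append("def");             // allowed: the StringBuilder itself is mutable
        // sb = new StringBuilder();  // would not compile: sb is final
        return sb.toString();
    }

    // A String method that sounds like a mutator returns a new String instead.
    static String[] upperCopy() {
        final String s = "abc";
        String upper = s.toUpperCase(Locale.ROOT); // s still refers to "abc"
        return new String[] {s, upper};
    }

    public static void main(String[] args) {
        System.out.println(finalBuilder());     // abcdef
        String[] r = upperCopy();
        System.out.println(r[0] + " " + r[1]);  // abc ABC
    }
}
```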
This approach to string handling is one of the few things Java got just right, at least compared to the mess that is its numeric types.

Threading aside, string manipulation is an annoying topic, and languages do well to treat strings this way.

An important thing to keep in mind here: Java strings are 'words'. They are not a char *, a collection of chars, or an array of values; they are conceptual strings, a piece of text, and should be treated as such.

As for performance, experience shows that in typical string processing, the straightforward approach using immutable strings and garbage collection is an order of magnitude faster than typical C++ approaches. Real-world practice shows that string manipulation code discards its data almost immediately after use in just about all cases.

Obviously, there is a whole class of in-place algorithms that can offer vastly superior performance, but they also introduce systematic flaws that hamper productivity and raise development cost far beyond what is reasonable. A typical example is the required buffer size: there are effectively no use cases left where a predetermined size will fit. Case in point: buffer overflows, still unsolved, still here.
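A small illustration of the buffer-size point, using hypothetical helpers fitsFixedBuffer and growableLength: a fixed char[] has to be sized up front, while a StringBuilder grows on demand. Unlike C, Java bounds-checks array stores, so overrunning the array throws ArrayIndexOutOfBoundsException rather than silently corrupting memory; catching it here is only for demonstration, not a recommended pattern.

```java
class BufferDemo {
    // Copies text into a fixed-size buffer, the way C code often uses char[N].
    // Java bounds-checks every array store, so an undersized buffer throws
    // instead of silently overwriting adjacent memory.
    static boolean fitsFixedBuffer(String text, int capacity) {
        char[] buf = new char[capacity];
        try {
            for (int i = 0; i < text.length(); i++) {
                buf[i] = text.charAt(i);
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    // A StringBuilder grows as needed, so no size has to be predetermined;
    // the initial capacity is only a hint.
    static int growableLength(String text, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append(text);
        return sb.length();
    }

    public static void main(String[] args) {
        String input = "longer than eight chars";
        System.out.println(fitsFixedBuffer(input, 8));                    // false
        System.out.println(growableLength(input, 8) == input.length());  // true
    }
}
```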

In addition, immutability makes all string operations streaming in nature: an operation either reuses the existing instance or processes it left to right into a new one. This makes copies cheap (though still more expensive than working in place).
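The "reuse the existing instance or produce a new one" behavior is visible in the documented contract of String.replace(char, char): if the old character does not occur, the method returns a reference to the same String. The sketch below (class name hypothetical) checks that with a deliberate == comparison. Whether a given JDK builds the new string strictly left to right is an implementation detail the Javadoc does not cover.

```java
class StreamingDemo {
    // Per the String.replace(char, char) Javadoc, if oldChar does not occur,
    // a reference to this same String is returned; otherwise a different
    // String holding the replaced text comes back.
    static boolean replaceReusesInstance(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar) == s; // reference comparison on purpose
    }

    public static void main(String[] args) {
        String s = "streaming";
        System.out.println(replaceReusesInstance(s, 'x', 'y')); // true: no 'x' to replace
        System.out.println(replaceReusesInstance(s, 'm', 'M')); // false: new String built
        System.out.println(s.replace('m', 'M'));                 // streaMing
        System.out.println(s);                                   // streaming
    }
}
```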

C and C++ are blown clear out of the water by managed languages when it comes to string handling. A sufficiently large application will, without any effort spent on optimization, outperform an equivalent C or C++ application. This was shown long ago for, IIRC, TeX. Outperforming it takes specialized mallocs, local allocation optimizations, and a lot of effort.

Another string-handling concept that crops up from time to time is the rope. In practice, however, ropes fall into the same category as tries or binary insertion sort: elegant and flexible, but with a constant-factor overhead that makes them slower in just about any real-world scenario.
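For readers who have not met ropes, here is a deliberately minimal, unbalanced rope in Java. It is a sketch for illustration, not a library class, and it omits rebalancing, substring and flattening. Concatenation is O(1) because no characters are copied, but every charAt() walks down the tree, which is the kind of constant-factor overhead described above compared with a flat String.

```java
class Rope {
    // Minimal, unbalanced rope: a leaf holds text, an inner node joins two ropes.
    private final String leaf;      // non-null only for leaf nodes
    private final Rope left, right; // non-null only for inner nodes
    private final int length;

    Rope(String text) {
        this.leaf = text;
        this.left = null;
        this.right = null;
        this.length = text.length();
    }

    private Rope(Rope left, Rope right) {
        this.leaf = null;
        this.left = left;
        this.right = right;
        this.length = left.length + right.length;
    }

    // O(1) concatenation: no characters are copied.
    Rope concat(Rope other) {
        return new Rope(this, other);
    }

    int length() {
        return length;
    }

    // Walks down the tree; each step is a branch plus a pointer chase,
    // which is the constant-factor cost compared with String.charAt.
    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        Rope node = this;
        while (node.leaf == null) {
            if (index < node.left.length) {
                node = node.left;
            } else {
                index -= node.left.length;
                node = node.right;
            }
        }
        return node.leaf.charAt(index);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        appendTo(sb);
        return sb.toString();
    }

    // Recursive flattening; fine for a sketch, but very deep ropes would need rebalancing.
    private void appendTo(StringBuilder sb) {
        if (leaf != null) {
            sb.append(leaf);
        } else {
            left.appendTo(sb);
            right.appendTo(sb);
        }
    }

    public static void main(String[] args) {
        Rope r = new Rope("Hello, ").concat(new Rope("rope")).concat(new Rope("!"));
        System.out.println(r);           // Hello, rope!
        System.out.println(r.charAt(7)); // r
        System.out.println(r.length());  // 12
    }
}
```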

Immutable types are very much in line with functional programming concepts, which is why learning a functional language is always beneficial.

For trivial cases these differences are irrelevant, with the exception of a few very specialized domains.


Edit: HashMap has special optimizations that benefit from immutability. When using strings (or other immutable objects) as keys, it pays to reuse the *same* instance for a key. This results in lightning-fast lookups, which may be counter-intuitive compared to a map in C++. Other structures can benefit from this as well; there are also many optimizations hidden in the standard library, and it could have something to do with intern() as well (it's been a while).
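A plausible explanation of the HashMap effect, hedged because it relies on OpenJDK implementation details rather than API guarantees: String caches its hash code after the first hashCode() call, and HashMap compares a candidate key with == before falling back to equals(), so a lookup with the very instance used as the key skips the character-by-character comparison. String.intern() is documented to return one canonical instance per distinct value (s.intern() == t.intern() exactly when s.equals(t)), which is where intern() ties in. The class name InternDemo is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class InternDemo {
    // Builds an equal-but-distinct copy of s at runtime (new char array, new object).
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        String key = "player"; // string literals are always interned
        counts.put(key, 1);

        String copy = freshCopy(key);
        System.out.println(copy == key);               // false: different objects
        System.out.println(counts.get(copy));          // 1: lookup still works via equals()
        System.out.println(copy.intern() == key);      // true: intern() returns the canonical instance
        System.out.println(counts.get(copy.intern())); // 1: same instance as the stored key
    }
}
```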

For extensive lookups, I even went as far as using a helper set to hold unique instances loaded at runtime, since it offered such a huge benefit.
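A minimal sketch of such a helper, assuming a HashMap from each string to its canonical instance, since java.util.Set offers no way to fetch the element it already holds. StringPool and canonical() are hypothetical names; unlike intern(), this pool can simply be discarded when the lookups are done, and this version is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps one canonical instance per distinct string value,
// similar in spirit to String.intern() but scoped to this object.
class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the previously stored instance equal to s, or stores and returns s.
    String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPool names = new StringPool();
        // Simulate equal values loaded at runtime as separate objects.
        String a = names.canonical(new String("orc"));
        String b = names.canonical(new String("orc"));
        System.out.println(a == b);       // true: both refer to the first instance
        System.out.println(names.size()); // 1
    }
}
```

Keys obtained through canonical() can then be used for the same-instance HashMap lookups described in the edit above.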
