Big Array to String Speed

9 comments, last by asdatapel 11 years, 4 months ago
Sup guys. I've got this really big array of about 25 thousand strings (it's okay, they're pretty short) and I need to get them all into one big string.
Right now I'm using string += array[count] + '\n'. It works, but it takes a pretty long time: about two minutes of the program looking like it has crashed. So if you guys can help, thanks.
Did you try a StringBuilder? Make sure to try setting the capacity first.
Just wondering, will this make a difference?
Well, you need to allocate a destination string big enough to hold the concatenated strings, otherwise you're going to allocate memory and copy data more often than you have to. With the approach you posted, the program needs to copy the first string about 25,000 times (the second string one time fewer than that, the third two fewer, and so on), so the total amount of copying grows quadratically with the number of strings. It is not a particularly efficient solution.
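To make the "allocate a big enough destination first" idea concrete, here is a minimal sketch. It assumes the language is C# (the thread has not confirmed that at this point), and `JoinSketch` and `JoinWithBuilder` are hypothetical names made up for illustration. It sums the lengths first, so the builder's buffer is sized once and every string is copied into it only once.

```csharp
using System.Text;

public static class JoinSketch
{
    // Hypothetical helper: appends every line followed by '\n',
    // matching the original "string += array[count] + '\n'" loop.
    public static string JoinWithBuilder(string[] lines)
    {
        // First pass: work out the final length so the buffer is sized up front.
        int capacity = 0;
        foreach (string line in lines)
        {
            capacity += line.Length + 1; // +1 for the '\n' after each line
        }

        // Second pass: copy each string into the pre-sized buffer exactly once.
        var builder = new StringBuilder(capacity);
        foreach (string line in lines)
        {
            builder.Append(line).Append('\n');
        }
        return builder.ToString();
    }
}
```

Calling `JoinSketch.JoinWithBuilder(array)` would replace the whole `+=` loop. Setting the capacity is an optimization rather than a requirement: without it the builder still works, it just has to grow its buffer as it goes.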
What language is this written in? If it's C++ then you could use a std::stringstream to build the large string instead.

I had a similar thing once with CComBSTRs, which went from 2 minutes to about 6 seconds just by building up the substrings for each section separately and only then concatenating each finished section onto the main string. It was a small change. Still not the most efficient, but it more than met the performance requirements, and we spent more time on it in another iteration.
Where do these strings come from? How are they loaded in your application?
If it's C# or any other .NET language, you absolutely must use StringBuilder.
Consider: all strings in .NET languages are immutable; they cannot be changed once created. So:

A = A + B then
A = A + C then
A = A + D
Each of these steps essentially means creating a new string, filling it with the old contents of A plus the appended string, and calling the result A. Repeat 10,000 times and you have the memory heap fragmentation from hell.

StringBuilder copies the characters of each appended string into an internal buffer that grows as needed, and only builds the final string once, when you call ToString(), meaning the intermediate instances of A shown above are not created. Hugely faster and more memory efficient.
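The A = A + B steps above can be checked directly. This is a small sketch only; `ImmutabilityDemo` and its method names are hypothetical and exist just for this illustration. The first method shows that concatenation leaves the old string untouched and hands back a different object. The second builds the same kind of result with a single StringBuilder.

```csharp
using System.Text;

public static class ImmutabilityDemo
{
    // Returns true if "a = a + ..." produced a new object and left the old one unchanged.
    public static bool ConcatCreatesNewString()
    {
        string a = "A";
        string before = a;   // second reference to the original string object
        a = a + "B";         // allocates a new string; the original is not modified
        return !ReferenceEquals(a, before) && before == "A" && a == "AB";
    }

    // Appends into one builder and creates the final string once, at ToString().
    public static string BuildOnce()
    {
        var builder = new StringBuilder();
        builder.Append("A").Append("B").Append("C").Append("D");
        return builder.ToString();
    }
}
```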
How about the following:


[source lang="csharp"] //this just generates a "big string array" to play with
List<string> bigList = new List<string>();
for (int i = 0; i < 25000; ++i) bigList.Add("someMediumLengthString");
string[] bigStringArray = bigList.ToArray();

//measure time
DateTime start = DateTime.Now;

//now we do the part that matters
string ourHugeCombinedString = string.Join<string>('\n'.ToString(), bigStringArray);

//print out the time it took to make the giant string
TimeSpan timeToMakeCombinedString = DateTime.Now.Subtract(start);
System.Diagnostics.Debug.WriteLine(timeToMakeCombinedString.TotalMilliseconds.ToString());

//prints out "2,0001" on my system, so 2 ms![/source]
Depends on the framework implementation of String.Join. I expect it's StringBuilder internally.

... and using my trusty copy of .Net Reflector I see that it does indeed use StringBuilder internally.
Simplified syntax for string.Join (C# knows how to do generic type inference):

[source lang="csharp"]string combinedString = string.Join("\n", array);[/source]

It should be noted that the latest version of .Net includes both a string.Join which takes an array AND one which works on IEnumerable<T>.
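A short sketch of the two overloads mentioned above, with hypothetical wrapper names (`JoinOverloads`, `FromArray`, `FromSequence`) used only for illustration. The array version takes a string[]; the generic version accepts any IEnumerable<T> and calls ToString() on each element.

```csharp
using System.Collections.Generic;

public static class JoinOverloads
{
    // Uses string.Join(string, params string[]).
    public static string FromArray(string[] items)
    {
        return string.Join("\n", items);
    }

    // Uses string.Join<T>(string, IEnumerable<T>); T is inferred as int here.
    public static string FromSequence(IEnumerable<int> numbers)
    {
        return string.Join("\n", numbers);
    }
}
```

One difference from the original `+=` loop is worth keeping in mind: string.Join only puts the separator between elements, so the result has no trailing '\n'. If the trailing newline matters, it can be appended once at the end.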
