Sign in to follow this  

String performance in C++?

This topic is 3476 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi everyone I was considering performance of std::string and CString class in MFC and so far I came up with an idea that string performance could be improved. ( I'm not 100% sure so pls correct me if I'm wrong :) ) - When string operation is performed, there exists many temporary objects and each allocates (and soon deallocate) a buffer from heap. (Am I right?) - If that's the case, wouldn't it possible to enhance string performance by implementing a specific heap for temporary storage? I've got the idea of temporary-specific heap from here ((Chapter 2 "And in theory ...")::("2) Garbage collection ...")::(3rd paragraph, "with GC, ...")). Though I don't totally agree with that article, I think it might be profitable for temporary storage. So, would it enhance performance of string operation? and if it does, will it be worth? Or is there already solution for temporary-specific heap? Thanks in advance

Share this post


Link to post
Share on other sites
Yes, this might enhance the performance of string manipulation a small bit, if it involved a large number of medium-sized strings being manipulated. However, detecting temporary strings is the actual difficulty1mdash;in most cases, building strings using a string-builder pattern (such as an ostringstream) takes as much effort and has a better result because it reduces the number of temporaries.

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop

- When string operation is performed, there exists many temporary objects and each allocates (and soon deallocate) a buffer from heap. (Am I right?)


This will depend on operations performed.

Quote:
- If that's the case, wouldn't it possible to enhance string performance by implementing a specific heap for temporary storage?


Not for general case. If you find a specific scenario however, using custom allocators, provided by all STL containers may help. Improvements noted are up to a factor of 50.

Quote:
will it be worth?


Depends on how you use these strings.

Share this post


Link to post
Share on other sites
Writing custom allocators for strings is something you only really need to do as a last resort optimisation.

Share this post


Link to post
Share on other sites
In my experience, std:string performs pretty well. The only optimizations I could found in my string class were to use memcpy instead of strcpy and implement CoW (although if you are careful when writing code, you don't really need it).

Share this post


Link to post
Share on other sites
Thanks for replies.

In-place allocators are incredibly fast, yeah. But it has very restricted usage since it can't free blocks individually. The allocator probably doesn't know if requested memory is for temporary or relatively permanent. The goal in GC heap is its "relocatability". i.e. memory blocks can be moved from place to place behind the scene. Because of that the heap can remain compacted and thus allowing fast allocation like in-place allocators. Moving heap frame behind the scene is very dangerous in C++ and will require a great care. I couldn't find way to implement relocatable allocator in STL (without ENORMOUS overhead).
Probably the GC heap will also have restricted usage. I don't think it's possible to allocate objects in GC heap without compiler's support.

Quote:
Original post by Antheus
Improvements noted are up to a factor of 50.

The fact that in-place allocator can be 50x faster encouraged me a lot. I'm gonna start CRAZY battle again, with GC heap (more precisely "relocatable heap")

Regards

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop
- When string operation is performed, there exists many temporary objects and each allocates (and soon deallocate) a buffer from heap. (Am I right?)

Give an example of your supposed usage. It's probably abusive.

std::string a, b, c, d, e; // assume these are all initialized
std::string r(a + b + c + d + e); // this will construct a bunch of temporaries and copy (and re-copy) 7 or 8 times


The code above is a typical case of abusing std::string. "But then," you ask, "how is one supposed to concatenate many strings?!"

C++ has a mostly overlooked tool for doing just that called std::stringstream.

std::string a, b, c, d, e; // assume these are all initialized
std::ostringstream ss;
ss << a << b << c << d << e; // best case this will copy all the strings into a suitably-sized buffer. worst case it'll need to resize a couple times, but should always invoke fewer copies than the code above
std::string r(ss.str());


Granted a specialized heap for these allocation operations would be beneficial, but is not needed if your initial buffer is large enough that you never have to resize and thus allocate again. If you simply copy the strings from the source buffers to the destination then you've already achieved maximum performance.

Share this post


Link to post
Share on other sites
Quote:
Original post by Ra
Give an example of your supposed usage. It's probably abusive.

I'm highly negative of using std::string. On my computer just concatenating 2 std::string's each 10 characters-long takes 0.9 microseconds! (don't think I'm using 10-year-old or such antique machine) If I instead use strcpy/strcat it takes only 50 nanoseconds. That's why I never use std::string. Now the reason why I'm doing this is to allow such abuse of strings. Actually the guys behind me are using std::string now also. They're making me crazy. Analyzing their code, find such abuses, optimizing ... It's time to end this crazy course.

[Edited by - 1hod0afop on July 11, 2008 2:10:47 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop
Quote:
Original post by Ra
Give an example of your supposed usage. It's probably abusive.

I'm highly negative of using std::string. On my computer just concatenating 2 std::string's each 10 characters-long takes 0.9 miliseconds! (don't think I'm using 10-year-old or such antique machine) If I instead use strcpy/strcat it takes only 50 nanoseconds. That's why I never use std::string. Now the reason why I'm doing this is to allow such abuse of strings. Actually the guys behind me are using std::string now also. They're making me crazy. Analyzing their code, find such abuses, optimizing ... It's time to end this crazy course.


Are you sure that you are not profiling in debug mode? That seems like a huge time difference.

Share this post


Link to post
Share on other sites
Quote:
Original post by EasilyConfused
Are you sure that you are not profiling in debug mode? That seems like a huge time difference.


OK here's a simple program

#include <windows.h>
#include <stdio.h>
#include <string>


void main()
{
unsigned int uStart = GetTickCount();

std::string str1 = "0123456789";
std::string str2 = "abcdefghij";

for( int i = 0; i < 1000000; i ++ )
std::string concat = str1 + str2;

unsigned int uElapsed = GetTickCount() - uStart;

printf( "%d ns\n", uElapsed );
}






In Visual Studio 2005, it outputs
Debug build : 3015 ns
Release build : 860 ns

However, by linking CRT library static it gets a little faster
Release build /MT : 657 ns

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop
Quote:
Original post by EasilyConfused
Are you sure that you are not profiling in debug mode? That seems like a huge time difference.


OK here's a simple program
*** Source Snippet Removed ***

In Visual Studio 2005, it outputs
Debug build : 3015 ns
Release build : 860 ns

However, by linking CRT library static it gets a little faster
Release build /MT : 657 ns


GetTickCount is in milliseconds, not nanoseconds btw.

Share this post


Link to post
Share on other sites
Quote:
Original post by thre3dee
GetTickCount is in milliseconds, not nanoseconds btw.


I know. That's why the program loops 1,000,000 times.

Share this post


Link to post
Share on other sites
Quote:
Original post by EasilyConfused
Quote:
Original post by 1hod0afop
Quote:
Original post by Ra
Give an example of your supposed usage. It's probably abusive.

I'm highly negative of using std::string. On my computer just concatenating 2 std::string's each 10 characters-long takes 0.9 miliseconds! (don't think I'm using 10-year-old or such antique machine) If I instead use strcpy/strcat it takes only 50 nanoseconds. That's why I never use std::string. Now the reason why I'm doing this is to allow such abuse of strings. Actually the guys behind me are using std::string now also. They're making me crazy. Analyzing their code, find such abuses, optimizing ... It's time to end this crazy course.


Are you sure that you are not profiling in debug mode? That seems like a huge time difference.


Sorry for my mistake, microseconds. That's still too slow for me. ^^

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop
OK here's a simple program
*** Source Snippet Removed ***
That's not valid C++. main must be declared as returning int, NOT void.

I personally see 860ns for 1 million string concatenations as being blazingly fast. That's over 1.1 billion per second!
If you ever find such std::string usage to be a bottleneck in a real-life program then I'd be happy to hear from you. (Assuming GDev.net is still around then)

Share this post


Link to post
Share on other sites
Quote:
Original post by iMalc
Quote:
Original post by 1hod0afop
OK here's a simple program
*** Source Snippet Removed ***
That's not valid C++. main must be declared as returning int, NOT void.

I personally see 860ns for 1 million string concatenations as being blazingly fast. That's over 1.1 billion per second!
If you ever find such std::string usage to be a bottleneck in a real-life program then I'd be happy to hear from you. (Assuming GDev.net is still around then)


No, the output 860ns is for 1 string concatenation, meaning 1.1M per second.

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop

Sorry for my mistake, microseconds. That's still too slow for me. ^^


What you're doing is:
- allocate new string
- two strings are concatenated into temporary copy, causing heap allocation
- temporary string is larger than internal buffer of result, another heap allocation
- temporary is copied into result
- temporary is deleted, along with its heap allocation
- result goes out of scope, its heap allocation is deleted

Of course it's slower than strcat. You're doing 10 times more work, not to mention doing 2 heap allocations more.

But before you are able to analyze the problem, there really is no point in blaming the tools.

BTW: how many concatenations do you perform per second, how many per application life-time?

Share this post


Link to post
Share on other sites
Quote:
Original post by 1hod0afop
OK here's a simple program

Ah, good old hand-tuned benchmarks to give the desired performance characteristics.

I'll bet you anything the C orange you were comparing this C++ apple to was faster because it was not at all the same. My guess is that this would be a better C++ comparison. Post your C code and I could tell for certain.

#include <iostream>
#include <string>

int main()
{
unsigned int uStart = GetTickCount();

std::string str1 = "0123456789";
std::string str2 = "abcdefghij";
std::string concat;

for( int i = 0; i < 1000000; i ++ )
concat = str1 + str2;

unsigned int uElapsed = GetTickCount() - uStart;

std::cerr << uElapsed << " ns\n";
}

I don't have access to Visual Studio, so I'd ask you to run that through your tests and report on it.

I would argue that ignorance of how to use a tool is not a valid reason to criticize a tool as inadequate for the job.

Share this post


Link to post
Share on other sites
Most of all I would check the compiler output, because I'm naive enough to expect any semi-decent compiler to detect the pointlesness of the entire loop and just optimize it away.

Share this post


Link to post
Share on other sites
Quote:
Original post by Bregma
Post your C code and I could tell for certain.


Here

#include <windows.h>
#include <stdio.h>
#include <string.h>


void concat( char* pDest, const char* pStr1, const char* pStr2 )
{
strcpy( pDest, pStr1 );
strcat( pDest, pStr2 );
}

void main()
{
unsigned int uStart = GetTickCount();

const char* pstr1 = "0123456789";
const char* pstr2 = "abcdefghij";

char buf[21];

for( int i = 0; i < 1000000; i ++ )
{
_asm
{
mov ebx, pstr2
mov eax, pstr1
lea edx, buf
push ebx
push eax
push edx

call concat

add esp, 0ch
}
}

unsigned int uElapsed = GetTickCount() - uStart;

printf( "%d ns\n", uElapsed );
}






My compiler often optimizes away strcpy/strcat function calls so I had to use tricks to fool the optimization. Here's the assembly code generated by VC2005.


... omitted ...

; const char* pstr1 = "0123456789";
0040106E mov dword ptr [ebp-28h],offset string "0123456789" (4020F4h)
; const char* pstr2 = "abcdefghij";
00401075 mov dword ptr [ebp-20h],offset string "abcdefghij" (402100h)
; for( int i = 0; i < 1000000; i ++ )
{
0040107C mov esi,0F4240h
; _asm
; {
; mov ebx, pstr2
00401081 mov ebx,dword ptr [ebp-20h] <------+
; mov eax, pstr1 |
00401084 mov eax,dword ptr [ebp-28h] |
; lea edx, buf |
00401087 lea edx,[ebp-1Ch] |
; push ebx |
0040108A push ebx |
; push eax |
0040108B push eax |
; push edx |
0040108C push edx |
; call concat |
0040108D call concat (401000h) |
; add esp, 0ch |
00401092 add esp,0Ch |
00401095 sub esi,1 |
00401098 jne main+31h (401081h) ------------+
; }
; }
... omitted...

;void concat( char* pDest, const char* pStr1, const char* pStr2 )
;{
; strcpy( pDest, pStr1 );
00401000 mov eax,dword ptr [esp+8] ; pStr1
00401004 push esi
00401005 push edi
00401006 mov edi,dword ptr [esp+0Ch] ; pStr2
0040100A mov edx,edi
0040100C sub edx,eax
0040100E mov edi,edi
00401010 mov cl,byte ptr [eax] ; pDest <--+
00401012 mov byte ptr [edx+eax],cl |
00401015 add eax,1 |
00401018 test cl,cl |
0040101A jne concat+10h (401010h) ---------+
; strcat( pDest, pStr2 );
0040101C mov eax,dword ptr [esp+14h]
00401020 mov edx,eax
00401022 mov cl,byte ptr [eax]
00401024 add eax,1
00401027 test cl,cl
00401029 jne concat+22h (401022h)
0040102B sub eax,edx
0040102D add edi,0FFFFFFFFh
00401030 mov cl,byte ptr [edi+1] <---------+
00401033 add edi,1 |
00401036 test cl,cl |
00401038 jne concat+30h (401030h) ---------+
0040103A mov ecx,eax
0040103C shr ecx,2
0040103F mov esi,edx
00401041 rep movs dword ptr es:[edi],dword ptr [esi]
00401043 mov ecx,eax
00401045 and ecx,3
00401048 rep movs byte ptr es:[edi],byte ptr [esi]
0040104A pop edi
0040104B pop esi
;}
0040104C ret







On my machine the program gives me the result "94 ns". But wait! If I make my own string class, I can optimize further. (if heap allocation cost is low enough) How? I can store length of string in the string class, so I can eliminate 2 loops (401010 - 40101A, 401030 - 401038), which allows me double the speed.

Quote:
Original post by Bregma
I would argue that ignorance of how to use a tool is not a valid reason to criticize a tool as inadequate for the job.


Quote:
Original post by Antheus
But before you are able to analyze the problem, there really is no point in blaming the tools.

Certainly. But I'm talking about tradeoffs between performace and simplicity. One can get similar performance with std::string by writing tricky code. What's the point of doing that? If you have to seriously consider using "+" overloaded operator, why don't you just sweep away it? And you won't be able to get same performance as strXXX functions since std::string always involves heap operation.


If you can obtain high performance in convenient way, (for example, let's say SOMETHING::string concat = str1 + str2; takes 100 ns) will you still adhere to std::string? (Don't try to bother me with porting problems. Let's imagine SOMETHING::string is fully compatible with std::string)

Regards

Share this post


Link to post
Share on other sites
1) I think you mean miliseconds (GetTickCount isn't nanoseconds)
2) you can always call std::string::reserve if you know you are doing a lot of
concatanations (equivelent to your wonderful buf[21] trick)
3) If string are you bottleneck, and you don't NEED strings, you may as well replace them with something like CRC hashes of the strings in question.
4) If you do NEED string, you may want to rethink your algorithm anyway. Lots of temporary copies and concatenations
can be a sign of you doing it wrong to start with. Changing how you make you pass over the dataset can mean that you
don't even need to be making copies of anything.
5) get a custom allocator for the string, especially if you know you are going to be allocating many small strings.

You have to remember that std::blah is a GENERAL PURPOSE tool. You can always speed it up by giving it knowledge about the
current situation you are using it in. And you can probably write a faster SINGLE PURPOSE tool, but you have to weigh in the cost
of time spent on your tool vs. just using a proven library.

Share this post


Link to post
Share on other sites
Thanks for your reply.

Quote:
Original post by KulSeran
1) I think you mean miliseconds (GetTickCount isn't nanoseconds)

My intention was : GetTickCount is miliseconds and since the program loops 1,000,000 times, the time measured by GetTickCount() becomes nanoseconds for one iteration. I'm too lazy to write
printf( "%f ns\n", uElapsed / 1000.0 / 1000000 /*# of iterations*/ * 1E+9 /**/ );


Quote:

2) you can always call std::string::reserve if you know you are doing a lot of
concatanations (equivelent to your wonderful buf[21] trick)

But in my opinion std::string::reserve() allocates the buffer from the heap and that's still not the same.

BTW, your word "wonderful trick" made me laugh a lot.

Quote:

3) If string are you bottleneck, and you don't NEED strings, you may as well replace them with something like CRC hashes of the strings in question.
4) If you do NEED string, you may want to rethink your algorithm anyway. Lots of temporary copies and concatenations
can be a sign of you doing it wrong to start with. Changing how you make you pass over the dataset can mean that you
don't even need to be making copies of anything.

I saw some sources written in Java and was shocked at that it's pretty fast despite of a lot of string operations. And yes, one can always optimize algorithms to minimize string operations. I'm kind of lazy in that sense. I always think about 'how to make good things without headache'.

Quote:

5) get a custom allocator for the string, especially if you know you are going to be allocating many small strings.

Could you recommend me some? I'd like to look at them so that I can optimize my heap allocator further.

Quote:
You have to remember that std::blah is a GENERAL PURPOSE tool. You can always speed it up by giving it knowledge about the
current situation you are using it in. And you can probably write a faster SINGLE PURPOSE tool, but you have to weigh in the cost
of time spent on your tool vs. just using a proven library.

Thanks. Fortunately this case seems to me profitable. I think the heap I'm making now can be used for other purposes also.


And rating you up for giving many ideas

Share this post


Link to post
Share on other sites
btw, temporary objects do not allocate memory from the heap. They allocate from the stack.
std::string might internally allocate a buffer but the temporary objects(when concatenating) don't.

quick link

Share this post


Link to post
Share on other sites
Quote:
Original post by delta user
btw, temporary objects do not allocate memory from the heap. They allocate from the stack.
std::string might internally allocate a buffer but the temporary objects(when concatenating) don't.

quick link


Thanks. I found a new thing. std::string has its internal buffer. Probably will be specific to implementation. In VC2005, std::string has 16-bytes buffer which is allocated on the stack (if the string itself is allocated on the stack, of course) so strings smaller than 16 bytes will not require heap. But it still uses heap if it's larger than 16bytes, even temporary objects.

BTW, I had been pretty sure that it always uses heap. So rating to you for correcting me.

Share this post


Link to post
Share on other sites
What's this obsession with performance? Are you having an actual performance problem? Or does std::string "just make you sick"?

Rolling your own version of string will likely treat performance (if at all) for other quality factors like correctness (is your own implementation exception safe?).

Share this post


Link to post
Share on other sites
Sign in to follow this