Quote:Original post by Bregma
Post your C code and I could tell for certain.
Here it is:
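```cpp
#include <windows.h>
#include <stdio.h>
#include <string.h>

void concat( char* pDest, const char* pStr1, const char* pStr2 )
{
    strcpy( pDest, pStr1 );
    strcat( pDest, pStr2 );
}

int main()
{
    unsigned int uStart = GetTickCount();   // milliseconds since system start
    const char* pstr1 = "0123456789";
    const char* pstr2 = "abcdefghij";
    char buf[21];

    for( int i = 0; i < 1000000; i ++ )
    {
        // Call concat from inline assembly (32-bit MSVC only) so the
        // optimizer cannot inline or remove the strcpy/strcat calls.
        _asm
        {
            mov ebx, pstr2
            mov eax, pstr1
            lea edx, buf
            push ebx
            push eax
            push edx
            call concat
            add esp, 0ch
        }
    }

    // GetTickCount() counts milliseconds, so the total for 1,000,000 calls
    // is numerically equal to the average cost of one call in nanoseconds.
    unsigned int uElapsed = GetTickCount() - uStart;
    printf( "%u ns\n", uElapsed );
    return 0;
}
```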
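The _asm block in the program above ties the benchmark to 32-bit MSVC. A rough portable sketch of the same measurement, assuming a C++11 compiler: it swaps GetTickCount() for std::chrono::steady_clock and calls concat through a volatile function pointer, which forces a real indirect call the optimizer cannot inline or drop. The names g_concat and time_concat are hypothetical, and whether this is enough on a given compiler is an assumption worth checking against its generated code.

```cpp
#include <cassert>
#include <chrono>
#include <cstring>

// Same concat as in the original program.
static void concat(char* pDest, const char* pStr1, const char* pStr2)
{
    std::strcpy(pDest, pStr1);
    std::strcat(pDest, pStr2);
}

// Hypothetical stand-in for the _asm trick: the compiler must reload this
// volatile pointer on every iteration, so it cannot inline concat or drop the call.
static void (*volatile g_concat)(char*, const char*, const char*) = concat;

// Hypothetical helper: returns the average cost of one call in nanoseconds.
// buf must hold at least 21 chars.
double time_concat(int iterations, char* buf)
{
    assert(iterations > 0);
    const char* pstr1 = "0123456789";
    const char* pstr2 = "abcdefghij";

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        g_concat(buf, pstr1, pstr2);
    auto elapsed = std::chrono::steady_clock::now() - start;

    return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}
```

Calling time_concat(1000000, buf) gives the same per-call figure the original program prints, without relying on the integer-millisecond resolution of GetTickCount().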
My compiler often optimizes away the strcpy/strcat calls, so I had to use a trick (calling concat from an inline _asm block) to keep the optimizer from removing them. Here's the assembly code generated by VC2005:
```asm
... omitted ...
; const char* pstr1 = "0123456789";
0040106E  mov   dword ptr [ebp-28h],offset string "0123456789" (4020F4h)
; const char* pstr2 = "abcdefghij";
00401075  mov   dword ptr [ebp-20h],offset string "abcdefghij" (402100h)
; for( int i = 0; i < 1000000; i ++ ) {
0040107C  mov   esi,0F4240h
; _asm
; {
;   mov ebx, pstr2
00401081  mov   ebx,dword ptr [ebp-20h]     <--+
;   mov eax, pstr1                             |
00401084  mov   eax,dword ptr [ebp-28h]        |
;   lea edx, buf                               |
00401087  lea   edx,[ebp-1Ch]                  |
;   push ebx                                   |
0040108A  push  ebx                            |
;   push eax                                   |
0040108B  push  eax                            |
;   push edx                                   |
0040108C  push  edx                            |
;   call concat                                |
0040108D  call  concat (401000h)               |
;   add esp, 0ch                               |
00401092  add   esp,0Ch                        |
00401095  sub   esi,1                          |
00401098  jne   main+31h (401081h)          ---+
; }
; }
... omitted ...
; void concat( char* pDest, const char* pStr1, const char* pStr2 )
; {
;   strcpy( pDest, pStr1 );
00401000  mov   eax,dword ptr [esp+8]   ; pStr1
00401004  push  esi
00401005  push  edi
00401006  mov   edi,dword ptr [esp+0Ch] ; pDest
0040100A  mov   edx,edi
0040100C  sub   edx,eax
0040100E  mov   edi,edi
00401010  mov   cl,byte ptr [eax]       ; pStr1 -> pDest  <--+
00401012  mov   byte ptr [edx+eax],cl                        |
00401015  add   eax,1                                        |
00401018  test  cl,cl                                        |
0040101A  jne   concat+10h (401010h)                      ---+
;   strcat( pDest, pStr2 );
0040101C  mov   eax,dword ptr [esp+14h] ; pStr2
00401020  mov   edx,eax
00401022  mov   cl,byte ptr [eax]                         <--+
00401024  add   eax,1                                        |
00401027  test  cl,cl                                        |
00401029  jne   concat+22h (401022h)                      ---+
0040102B  sub   eax,edx
0040102D  add   edi,0FFFFFFFFh
00401030  mov   cl,byte ptr [edi+1]                       <--+
00401033  add   edi,1                                        |
00401036  test  cl,cl                                        |
00401038  jne   concat+30h (401030h)                      ---+
0040103A  mov   ecx,eax
0040103C  shr   ecx,2
0040103F  mov   esi,edx
00401041  rep movs dword ptr es:[edi],dword ptr [esi]
00401043  mov   ecx,eax
00401045  and   ecx,3
00401048  rep movs byte ptr es:[edi],byte ptr [esi]
0040104A  pop   edi
0040104B  pop   esi
; }
0040104C  ret
```
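To make the listing easier to follow, here is a sketch in C++ of the work the inlined strcpy/strcat performs, one step per loop. concat_expanded is a hypothetical name; the code only mirrors the structure of the machine code, and a compiler is free to emit something different for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical step-by-step rendering of the inlined strcpy + strcat.
// Comments give the address ranges from the listing that do the same job.
void concat_expanded(char* pDest, const char* pStr1, const char* pStr2)
{
    assert(pDest && pStr1 && pStr2);

    // 401010 - 40101A: strcpy, one byte per iteration, stopping after the '\0'.
    char* d = pDest;
    const char* s = pStr1;
    while ((*d++ = *s++) != '\0') {
    }

    // 401022 - 401029: measure pStr2; the count includes its terminator.
    std::size_t n = std::strlen(pStr2) + 1;

    // 401030 - 401038: rescan pDest for its '\0', even though the copy loop
    // above just ended there (d - 1 already points at it).
    char* end = pDest;
    while (*end != '\0') {
        ++end;
    }

    // 40103A - 401048: block copy, n / 4 dwords (rep movs dword) followed by
    // n % 4 bytes (rep movs byte).
    std::memcpy(end, pStr2, n);
}
```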
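On my machine the program prints "94 ns". Since GetTickCount() counts milliseconds, that is 94 ms for a million calls, or about 94 ns per concatenation. But wait! If I write my own string class, I can optimize further (provided heap allocation is cheap enough). How? The class can store the length of each string, so concatenation no longer has to hunt for terminators: the scanning loops at 401022 - 401029 (measuring pStr2) and 401030 - 401038 (finding the end of pDest) go away entirely, and the byte-by-byte copy at 401010 - 40101A can become a block copy like the rep movs at 401041. That should roughly double the speed.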
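A minimal sketch of that idea, assuming a hypothetical LenString class (not a real library type) that keeps its length next to a heap buffer. The only strlen happens once, at construction; operator+ then does one allocation plus two memcpy calls with no terminator scans. It does not address whether the allocation itself is cheap enough, which is the caveat in the paragraph above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical length-carrying string, only to illustrate the idea.
class LenString {
public:
    explicit LenString(const char* s)
    {
        assert(s != nullptr);
        len_ = std::strlen(s);                  // the only terminator scan, done once
        data_.reset(new char[len_ + 1]);
        std::memcpy(data_.get(), s, len_ + 1);  // copy including the '\0'
    }

    std::size_t size() const { return len_; }
    const char* c_str() const { return data_.get(); }

    // Both lengths are already known, so there is no strlen and no search for
    // the end of the destination: one allocation and two block copies.
    friend LenString operator+(const LenString& a, const LenString& b)
    {
        LenString r;
        r.len_ = a.len_ + b.len_;
        r.data_.reset(new char[r.len_ + 1]);
        std::memcpy(r.data_.get(), a.data_.get(), a.len_);
        std::memcpy(r.data_.get() + a.len_, b.data_.get(), b.len_ + 1);  // includes '\0'
        return r;
    }

private:
    LenString() = default;  // used only by operator+

    std::unique_ptr<char[]> data_;
    std::size_t len_ = 0;
};
```

The class is move-only (the unique_ptr member deletes the copy constructor), which keeps the sketch short; a usable string class would also need copying, assignment and so on.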
Quote:Original post by Bregma
I would argue that ignorance of how to use a tool is not a valid reason to criticize a tool as inadequate for the job.
Quote:Original post by Antheus
But before you are able to analyze the problem, there really is no point in blaming the tools.
Certainly. But I'm talking about the tradeoff between performance and simplicity. You can get similar performance out of std::string by writing tricky code, but what's the point? If you have to think twice before using the overloaded + operator, why not just get rid of it altogether? And you still won't match the strXXX functions, because std::string generally has to allocate from the heap (except for very short strings on implementations with a small-string buffer).
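One example of the kind of "tricky code" meant here, as a sketch: reusing a caller-owned std::string so its capacity survives between calls. concat_plus and concat_into are hypothetical helper names. Once the buffer has grown to the needed size, concat_into typically performs no heap allocation, whereas concat_plus builds a fresh string every time.

```cpp
#include <cassert>
#include <string>

// Convenient version: a + b constructs a new std::string on every call,
// which means a heap allocation once the result outgrows any small-string buffer.
std::string concat_plus(const std::string& a, const std::string& b)
{
    return a + b;
}

// "Tricky" version: the caller keeps `out` alive across calls. clear() leaves
// the capacity in place on mainstream implementations, so once the buffer has
// grown, later calls usually do not touch the heap at all.
void concat_into(std::string& out, const std::string& a, const std::string& b)
{
    assert(&out != &a && &out != &b);  // clear() would wipe an aliased operand
    out.clear();
    out.reserve(a.size() + b.size());
    out.append(a);
    out.append(b);
}
```

The price is exactly the one complained about above: every call site has to manage a scratch string instead of writing str1 + str2.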
If you could get high performance in a convenient way (say, SOMETHING::string concat = str1 + str2; takes 100 ns), would you still stick with std::string? (Don't bother me with porting problems; assume SOMETHING::string is fully compatible with std::string.)
Regards