As for the OP's benchmarks, I'm seeing the following behaviors:
VS 8
--------------------------------------------------
*Multithreaded DLL
Starting test. Number of loops= 5000000
1 - (string str) -- passed 837.778 ms
2 - (string& str) -- passed 333.692 ms
3 - (string* str) -- passed 326.002 ms
4 - (char* str) -- passed 1576.6 ms
5 - strcpy (char* str) -- passed 7.51576 ms

*Multithreaded
Starting test. Number of loops= 5000000
1 - (string str) -- passed 366.745 ms
2 - (string& str) -- passed 182.604 ms
3 - (string* str) -- passed 187.33 ms
4 - (char* str) -- passed 1620.86 ms
5 - strcpy (char* str) -- passed 5.05036 ms

VS 7
--------------------------------------------------
*Singlethreaded
Starting test. Number of loops= 5000000
1 - (string str) -- passed 329.72 ms
2 - (string& str) -- passed 149.223 ms
3 - (string* str) -- passed 157.478 ms
4 - (char* str) -- passed 1000.62 ms
5 - strcpy (char* str) -- passed 5.03248 ms

*Multithreaded
Starting test. Number of loops= 5000000
1 - (string str) -- passed 349.87 ms
2 - (string& str) -- passed 198.168 ms
3 - (string* str) -- passed 159.792 ms
4 - (char* str) -- passed 734.929 ms
5 - strcpy (char* str) -- passed 5.14032 ms

*Multithreaded DLL
Starting test. Number of loops= 5000000
1 - (string str) -- passed 572.189 ms
2 - (string& str) -- passed 283.535 ms
3 - (string* str) -- passed 274.563 ms
4 - (char* str) -- passed 739.255 ms
5 - strcpy (char* str) -- passed 5.00734 ms
VS7 flags: /Ox /Og /Ob2 /Oi /Ot /G7 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /GF /FD /EHsc /arch:SSE2 /Fo"Release/" /Fd"Release/vc70.pdb" /W3 /nologo /c /Wp64 /Zi /TP /D_SECURE_SCL=0
VS8 flags: /Ox /Ob2 /Oi /Ot /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /GF /FD /EHsc /GS- /arch:SSE2 /fp:fast /GR- /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt /D_SECURE_SCL=0
I've been looking at the assembly for the two, and it seems to be pretty much identical in all cases. Take a look at Test 5:
    mov ecx, DWORD PTR ??_C@_07HADGPIEN@changed?$AA@+4
    mov edx, DWORD PTR ??_C@_07HADGPIEN@changed?$AA@
    mov eax, 5000000 ; 004c4b40H
$LL3@main:
; 159 :
; 160 : for(int i = 0; i < BIG_TESTS; i++)
    sub eax, 1
; 161 : {
; 162 : String5(temp2);
    mov DWORD PTR _temp2$[esp+256], edx
    mov DWORD PTR _temp2$[esp+260], ecx
    jne SHORT $LL3@main
; 163 : }
; 164 :
The only difference in the VS7 version is an npad 8 just before the loop starts. (What the hell is npad, by the way?) Notice that the loop simply stores the same eight bytes of "changed" into temp2 over and over. This is highly suspicious to me, since I would have expected the optimizer to drop that loop completely.
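To make it concrete what that listing is doing, here is a rough sketch of what test 5 presumably looks like in source form, next to what the compiler appears to have turned it into. String5, temp2 and BIG_TESTS are the names from the listing; the body of String5, the buffer size, and the helper names String5Inlined and RunTest5 are my guesses, not the OP's actual code.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Loop count as reported in the output ("Number of loops= 5000000").
const int BIG_TESTS = 5000000;

// The literal is 7 characters plus the terminator, so it fits in two
// 32-bit registers (edx and ecx in the listing).
static_assert(sizeof("changed") == 8, "literal is exactly 8 bytes");

// Assumed body of test 5; the real String5 may differ.
void String5(char* str) {
    std::strcpy(str, "changed");
}

// What the generated loop effectively does: an 8-byte store of the
// literal into the buffer, with no call and no per-character copy.
void String5Inlined(char* str) {
    static const char literal[8] = {'c', 'h', 'a', 'n', 'g', 'e', 'd', '\0'};
    std::memcpy(str, literal, sizeof literal);
}

// Hypothetical harness: runs the loop and returns the final buffer contents.
std::string RunTest5(void (*fn)(char*)) {
    char temp2[256] = "original";  // buffer size is a guess
    for (int i = 0; i < BIG_TESTS; i++) {
        fn(temp2);
    }
    assert(std::strlen(temp2) == 7);
    return std::string(temp2);
}
```

Both versions leave temp2 holding "changed" after the loop, which is why a single store outside the loop would have been enough.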
Differences in the first four tests are almost certainly due to differences in the library implementations. I'm a little confused about test 5, though.