native char* slower on VS2005?

Started by
14 comments, last by DMINATOR 18 years, 4 months ago
OK, I am not making any assumptions here; this is just one test, which I made to see what benefits VS2005 gives. The test is a simple check: it runs each function in a loop and measures the time taken. Here is the modified source:

#include <cstdio>   //sprintf
#include <cstring>  //strcpy
#include <iostream>
#include <windows.h>
//#include "tests.h"


//5 million times !
#define BIG_TESTS 5000000



#include <string>

using namespace std;

//a new copy is created - so original string is not changed
void String1(string str)
{
	str = "changed";
}

//a reference is used
void String2(string& str)
{
	str = "changed";

}


void String3(string* str)
{
	*str = "changed";
}


//char* changed through the pointer, using a simple sprintf
void String4(char* str)
{

	sprintf(str,"changed");

}

//direct strcpy
void String5(char* str)
{
	strcpy(str, "changed");
}





LARGE_INTEGER before;
LARGE_INTEGER difference;
LARGE_INTEGER curtime;
LARGE_INTEGER freq;
unsigned long timepassed;
unsigned int fps;




//Calculate the time passed since the previous call, in milliseconds.
//Call once before the code being measured and once after it.
void TimePassed()
{
	QueryPerformanceFrequency(&freq);

	//converts counter ticks to milliseconds
	double TimeScale = (1.0/freq.QuadPart)*1000.0;

	QueryPerformanceCounter(&curtime);//end measure

	timepassed = (unsigned long)((curtime.QuadPart-before.QuadPart)*TimeScale);

	QueryPerformanceCounter(&before);//begin measure
}



int main()
{
	cout << " Starting test. Number of loops= "<< BIG_TESTS  << endl << endl;

    //get cur time
	QueryPerformanceCounter(&before);
	QueryPerformanceCounter(&curtime);

	string temp = "testing";

	char temp2[50];
	sprintf(temp2,"testing");


	// Regular function with copy
	cout << "1 - (string str)"<< endl;
	TimePassed();

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String1(temp);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;



	//function with a reference to string
	cout << "2 - (string& str)"<< endl;
	TimePassed();

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String2(temp);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;



    // function with a pointer to string
	cout << "3 - (string* str)"<< endl;
	TimePassed(); 

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String3(&temp);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;



	//function with a pointer to char
	cout << "4 - (char* str)"<< endl;
	TimePassed(); 

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String4(temp2);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;


	//function with a pointer to char using strcpy
	cout << "5 - strcpy (char* str)"<< endl;
	TimePassed(); 

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String5(temp2);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;

	int a;
	cin >> a;
	return 0;
}

Well, running it on VS6 gave (times in ms):
Quote:
//Default optimisations:
1 - 3428
2 - 573
3 - 573
4 - 1997
//And using inline:
1 - 3429
2 - 568
3 - 524
4 - 1888
The results seem pretty good and logical to me. I also ran some tests on the latest free BuilderX and got the following results:
Quote:
//Standard optimisations
1 - 2486
2 - 452
3 - 459
4 - 1741
5 - 93
Here are the results from VS2005 Express, Multi-threaded DLL:
Quote:
//Standard optimisations, inline, or no optimisations doesn't make much difference
1 - 1627
2 - 882
3 - 882
4 - 2859
5 - 93
This looks a little strange: the only speed gain is when passing a copy to the function; everywhere else there is a speed decrease of about 20-45%. Then I selected plain Multi-threaded and got even more interesting results:
Quote:
1 - 790
2 - 374
3 - 369
4 - 2778
5 - 13 (!)
Now this is an impressive improvement. So why is the difference between "Multi-threaded" and "Multi-threaded DLL" that big? What do you think? [Edited by - DMINATOR on December 4, 2005 9:36:00 AM]
That test is hardly fair; sprintf is hardly an efficient (or common) function for copying strings. At least use a "%s" format if you're going to do it. Try strcpy(str, "changed") or memcpy(str, "changed", sizeof("changed")) instead.
Also, make sure you're in release mode with full optimizations and intrinsic functions enabled. It depends a bit on what you're actually measuring, but it may be a good idea to split off the functions into a separate module to prevent the compiler from optimizing away the entire loop.

Also, you should really test some string operations that are a bit more complicated than just copying strings around. std::string keeps track of its length directly, which can be a huge advantage for some operations.
Besides a malformed test, the second and third most likely contributors are that VS2005 doesn't include the single-threaded runtime and that it adds extra checks by default to help prevent and/or detect bugs such as buffer overflows.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
OK, thank you. I modified the code a bit, and strcpy or memcpy does make a big difference.

But the most impressive effect I noticed was when changing the setting to just Multi-threaded. Does anyone have any ideas about it? On VS6 it didn't make any difference at all.
Quote:Original post by DMINATOR
OK, thank you. I modified the code a bit, and strcpy or memcpy does make a big difference.

But the most impressive effect I noticed was when changing the setting to just Multi-threaded. Does anyone have any ideas about it? On VS6 it didn't make any difference at all.


When you use a DLL, the functions have an extra level of indirection due to the dynamic linking. Use your debugger to step through the code at the disassembly level and you'll see what I mean.
Well, about VC2005 performance: I have been doing some tests too, mainly with math functions (vector and matrix operations), and VC2005 produces SLOWER code than VC2003, not to mention SLOWER than Intel C++ 9.0. That's why I am still using VC2003 [smile]

There are some posts about VC2005 being slower than VC2003 on the MSDN forums:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PageIndex=2&SiteID=1&PostID=128085&PageID=1

Oscar
Quote:Original post by ogracian
Well, about VC2005 performance: I have been doing some tests too, mainly with math functions (vector and matrix operations), and VC2005 produces SLOWER code than VC2003, not to mention SLOWER than Intel C++ 9.0. That's why I am still using VC2003 [smile]

There are some posts about VC2005 being slower than VC2003 on the MSDN forums:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PageIndex=2&SiteID=1&PostID=128085&PageID=1

Oscar
As is pointed out in that thread, you should examine and isolate the areas where the code is slower, and talk to someone at MS about it. That way, they can make the necessary changes for SP1 to handle these corner cases more effectively.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
As for the OP's benchmarks, I'm seeing the following behaviors:
VS 8
--------------------------------------------------
* Multithreaded DLL
 Starting test. Number of loops= 5000000
1 - (string str) -- passed 837.778 ms
2 - (string& str) -- passed 333.692 ms
3 - (string* str) -- passed 326.002 ms
4 - (char* str) -- passed 1576.6 ms
5 - strcpy (char* str) -- passed 7.51576 ms

* Multithreaded
 Starting test. Number of loops= 5000000
1 - (string str) -- passed 366.745 ms
2 - (string& str) -- passed 182.604 ms
3 - (string* str) -- passed 187.33 ms
4 - (char* str) -- passed 1620.86 ms
5 - strcpy (char* str) -- passed 5.05036 ms

VS 7
--------------------------------------------------
* Singlethreaded
 Starting test. Number of loops= 5000000
1 - (string str) -- passed 329.72 ms
2 - (string& str) -- passed 149.223 ms
3 - (string* str) -- passed 157.478 ms
4 - (char* str) -- passed 1000.62 ms
5 - strcpy (char* str) -- passed 5.03248 ms

* Multithreaded
 Starting test. Number of loops= 5000000
1 - (string str) -- passed 349.87 ms
2 - (string& str) -- passed 198.168 ms
3 - (string* str) -- passed 159.792 ms
4 - (char* str) -- passed 734.929 ms
5 - strcpy (char* str) -- passed 5.14032 ms

* Multithreaded DLL
 Starting test. Number of loops= 5000000
1 - (string str) -- passed 572.189 ms
2 - (string& str) -- passed 283.535 ms
3 - (string* str) -- passed 274.563 ms
4 - (char* str) -- passed 739.255 ms
5 - strcpy (char* str) -- passed 5.00734 ms

VS7 flags: /Ox /Og /Ob2 /Oi /Ot /G7 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /GF /FD /EHsc /arch:SSE2 /Fo"Release/" /Fd"Release/vc70.pdb" /W3 /nologo /c /Wp64 /Zi /TP /D_SECURE_SCL=0

VS8 flags: /Ox /Ob2 /Oi /Ot /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /GF /FD /EHsc /GS- /arch:SSE2 /fp:fast /GR- /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt /D_SECURE_SCL=0

I've been looking at the assembly for the two, and it seems to be pretty much identical in all cases. Take a look at Test 5:
	mov	ecx, DWORD PTR ??_C@_07HADGPIEN@changed?$AA@+4
	mov	edx, DWORD PTR ??_C@_07HADGPIEN@changed?$AA@
	mov	eax, 5000000				; 004c4b40H
$LL3@main:
; 159  : 
; 160  : 	for(int i = 0; i < BIG_TESTS; i++)
	sub	eax, 1
; 161  : 	{
; 162  : 		String5(temp2);
	mov	DWORD PTR _temp2$[esp+256], edx
	mov	DWORD PTR _temp2$[esp+260], ecx
	jne	SHORT $LL3@main
; 163  : 	}
; 164  : 

The only difference in the VS7 version is a call to npad 8 just before the loop starts. (What the hell is npad, by the way?) Notice that it simply assigns the string over and over. This is highly suspicious to me, since the optimizer should have dropped that loop completely.

Differences in the first four tests are almost certainly due to library implementation differences. I'm a little confused about test 5, though.
Promit: One thing I notice in your command lines is that VS8 appears to be using unicode while VS7 is not. That small difference could cause significantly different string function implementation since unicode characters have different byte lengths (unless it's using UTF-32, which seems unlikely) which makes copying a string more complex than just searching for a 0 byte and copying the bytes up to that point.

The way the loop was only partially optimized is very strange, and you should probably send it off to MS so they can analyze it and maybe find the problem. Any explanation I can think of for a problem (such as the optimizer becoming confused about aliasing since pointers are everywhere despite the local reference graph being rather simple) should cause much less optimization than actually occurred.
Right, well, I actually disabled Unicode, and you can see that the copied string is 8 bytes long (7 chars and the null). It's done using two DWORD moves. I posted on the MSDN forums, and I know softies from the VS team roam there, so I'm hoping that somebody will have some insight tomorrow. Considering that the optimizer managed to inline the string assignment and replace the string copy with intrinsics, I'm amazed that it didn't manage to pull off the single most obvious optimization. Of course, I don't know that much about optimization theory, so maybe it's more difficult than I realized.

This topic is closed to new replies.
