DMINATOR

native char* slower on VS2005?

This topic is 4704 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


OK, I am not making any assumptions here; this is just one test. I wanted to see what benefits VS2005 gives. The test is a simple check: run each function in a loop and measure the time taken. Here is the modified source:
#include <iostream>
#include <cstdio>   //sprintf
#include <cstring>  //strcpy
#include <windows.h>
//#include "tests.h"


//5 million times !
#define BIG_TESTS 5000000



#include <string>

using namespace std;

//a new copy is created - so original string is not changed
void String1(string str)
{
	str = "changed";
}

//a reference is used
void String2(string& str)
{
	str = "changed";

}


void String3(string* str)
{
	*str = "changed";
}


//char* changed through the pointer using sprintf
void String4(char* str)
{

	sprintf(str,"changed");

}

//direct strcpy
void String5(char* str)
{
	strcpy(str, "changed");
}





LARGE_INTEGER before;
LARGE_INTEGER difference;
LARGE_INTEGER curtime;
LARGE_INTEGER freq;
unsigned long timepassed;
unsigned int fps;




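//Stores the milliseconds elapsed since the previous call in timepassed,
//then restarts the measurement. Call it once before and once after the code under test.
void TimePassed()
{
	QueryPerformanceFrequency(&freq);

	double TimeScale = (1.0/freq.QuadPart)*1000.0; //counter ticks -> milliseconds

	QueryPerformanceCounter(&curtime); //end measure

	timepassed = (unsigned long)((curtime.QuadPart-before.QuadPart)*TimeScale);

	QueryPerformanceCounter(&before); //begin measure
}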



int main()
{
	cout << " Starting test. Number of loops= "<< BIG_TESTS  << endl << endl;

    //get cur time
	QueryPerformanceCounter(&before);
	QueryPerformanceCounter(&curtime);

	string temp = "testing";

	char temp2[50];
	sprintf(temp2,"testing");


	// Regular function with copy
	cout << "1 - (string str)"<< endl;
	TimePassed();

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String1(temp);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;



	//function with a reference to string
	cout << "2 - (string& str)"<< endl;
	TimePassed();

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String2(temp);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;



    // function with a pointer to string
	cout << "3 - (string* str)"<< endl;
	TimePassed(); 

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String3(&temp);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;



	//function with a pointer to char
	cout << "4 - (char* str)"<< endl;
	TimePassed(); 

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String4(temp2);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;


	//function with a pointer to char using strcpy
	cout << "5 - strcpy (char* str)"<< endl;
	TimePassed(); 

	for(int i = 0; i < BIG_TESTS; i++)
	{
		String5(temp2);
	}

	TimePassed();
	cout << " -- passed "<< timepassed << " ms" << endl;

	int a;
	cin >> a;
	return 0;
}

Well, running it on VS6 gave (times in ms):
Quote:
//Default optimisations:
1 - 3428
2 - 573
3 - 573
4 - 1997
//And using inline:
1 - 3429
2 - 568
3 - 524
4 - 1888
The results seem pretty good and logical to me. I also ran some tests on the latest free BuilderX and got the following results:
Quote:
//Standard optimisations
1 - 2486
2 - 452
3 - 459
4 - 1741
5 - 93
Here are the results from VS2005 Express with Multi-threaded DLL:
Quote:
//Standard optimisations, inline, or no optimisations doesn't make much difference
1 - 1627
2 - 882
3 - 882
4 - 2859
5 - 93
It looks a little strange: the only speed gain was when passing a copy to the function; everywhere else there is a speed decrease of about 20-45%. Then I selected just Multithreaded and got even more interesting results:
Quote:
1 - 790
2 - 374
3 - 369
4 - 2778
5 - 13 (!)
Now this is an impressive improvement. So why is the difference that big between "multithreaded" and "multithreaded DLL"? What do you think? [Edited by - DMINATOR on December 4, 2005 9:36:00 AM]
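
For anyone who wants to repeat the measurement outside Windows, here is a minimal sketch of the same "time a loop" idea. It assumes a C++11 compiler, and std::chrono stands in for QueryPerformanceCounter; TimeLoopMs and AssignByReference are made-up names for illustration, not part of the original test.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical helper: runs fn() `iterations` times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeLoopMs(int iterations, Fn fn)
{
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Same body as String2 in the test above (illustrative name).
void AssignByReference(std::string& str)
{
    str = "changed";
}
```

It could be called as `TimeLoopMs(5000000, [&] { AssignByReference(temp); })`. Returning the elapsed time avoids the shared globals, but an optimizer may still simplify the loop body, so the numbers need the same care as the originals.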

That test is hardly fair: sprintf is hardly an efficient (or common) function for copying strings. At least use a "%s" format if you're going to do it. Try strcpy(str, "changed") or memcpy(str, "changed", sizeof("changed")) instead.
Also, make sure you're in release mode with full optimizations and intrinsic functions enabled. It depends a bit on what you're actually measuring, but it may be a good idea to split the functions off into a separate module to prevent the compiler from optimizing away the entire loop.
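
A rough sketch of both suggestions, with made-up names (CopyWithMemcpy, RunCopyLoop): the copy uses memcpy with sizeof, and the loop writes to a volatile variable so it has a visible side effect on every iteration. This is an assumption about one way to keep the loop alive; it does not guarantee the compiler repeats the copy itself each time, and splitting the function into a separate module is still the more robust approach.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies the literal including its terminating null (8 bytes for "changed").
void CopyWithMemcpy(char* dst, std::size_t dstSize)
{
    assert(dstSize >= sizeof("changed"));
    std::memcpy(dst, "changed", sizeof("changed"));
}

// Hypothetical benchmark body: after each copy, one byte is written to a
// volatile variable, so the loop cannot be removed as a whole.
unsigned RunCopyLoop(char* buffer, std::size_t bufferSize, int iterations)
{
    volatile unsigned char sink = 0;
    unsigned checksum = 0;
    for (int i = 0; i < iterations; ++i)
    {
        CopyWithMemcpy(buffer, bufferSize);
        sink = static_cast<unsigned char>(buffer[0]); // volatile write, once per iteration
        checksum += sink;                             // volatile read, once per iteration
    }
    return checksum;
}
```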

Also, you should really test some string operations more complicated than just copying strings around. std::string keeps track of its length directly, which can be a huge advantage for some operations.
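
To illustrate the length point, a small sketch with illustrative function names: strlen has to walk the buffer until it finds the terminating null, while std::string::size() returns a stored value (constant time is guaranteed from C++11 on, and typical of earlier implementations as far as I know).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Scans the buffer for the terminating null: cost grows with the length.
std::size_t LengthOfCString(const char* s)
{
    assert(s != nullptr);
    return std::strlen(s);
}

// Returns the length the string object already stores.
std::size_t LengthOfStdString(const std::string& s)
{
    return s.size();
}
```

Operations such as appending or comparing lengths can benefit from this, because the char* versions have to rescan the buffer first.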

Besides a malformed test, the second and third most likely contributors are that VS2005 doesn't include the single-threaded runtime and that it adds extra checks by default to help prevent and/or detect bugs such as buffer overflows.

OK, thank you. I modified the code a bit, and strcpy or memcpy does make a big difference.

But the most impressive effect I noticed was when changing the setting to just Multithreaded. Does anyone have any ideas about it? VS6 didn't show any difference at all.

Quote:
Original post by DMINATOR
OK, thank you. I modified the code a bit, and strcpy or memcpy does make a big difference.

But the most impressive effect I noticed was when changing the setting to just Multithreaded. Does anyone have any ideas about it? VS6 didn't show any difference at all.


When you use a DLL, the functions have an extra level of indirection due to the dynamic linking. Use your debugger to step through the code at the disassembly level and you'll see what I mean.
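
As a very rough model of that extra indirection (an illustration only, not how the CRT is actually implemented): a call into a DLL goes through an address stored in a table slot, which is similar to calling through a function pointer the compiler cannot see through. CopyDirect, CopyViaSlot and g_importSlot are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Stands in for a statically linked function: the call target is known
// at link time and the call can be inlined.
void CopyDirect(char* dst)
{
    std::strcpy(dst, "changed");
}

// Hypothetical stand-in for an import table slot. The pointer is volatile
// so the compiler must load the address on every call.
typedef void (*CopyFn)(char*);
volatile CopyFn g_importSlot = &CopyDirect;

// Stands in for a call into a DLL: load the address, then call through it.
void CopyViaSlot(char* dst)
{
    CopyFn fn = g_importSlot; // extra memory load before the call
    assert(fn != nullptr);
    fn(dst);
}
```

Both functions produce the same result; the second one simply cannot be inlined and pays for one more load per call, which adds up over 5 million iterations.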

Well, about VC2005 performance: I have been doing some tests too, mainly with math functions (vector and matrix ops), and VC2005 produces SLOWER code than VC2003, not to mention SLOWER than Intel C++ 9.0. That's why I am still using VC 2003 [smile]

There are some posts about VC2005 being slower than VC2003 on the MSDN forums:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PageIndex=2&SiteID=1&PostID=128085&PageID=1

Oscar

Quote:
Original post by ogracian
Well, about VC2005 performance: I have been doing some tests too, mainly with math functions (vector and matrix ops), and VC2005 produces SLOWER code than VC2003, not to mention SLOWER than Intel C++ 9.0. That's why I am still using VC 2003 [smile]

There are some posts about VC2005 being slower than VC2003 on the MSDN forums:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PageIndex=2&SiteID=1&PostID=128085&PageID=1

Oscar

As is pointed out in that thread, you should examine and isolate the areas where the code is slower, and talk to someone at MS about it. That way, they can make the necessary changes for SP1 to handle these corner cases more effectively.

As for the OP's benchmarks, I'm seeing the following behaviors:

VS 8--------------------------------------------------

* Multithreaded DLL
Starting test. Number of loops= 5000000

1 - (string str)
-- passed 837.778 ms
2 - (string& str)
-- passed 333.692 ms
3 - (string* str)
-- passed 326.002 ms
4 - (char* str)
-- passed 1576.6 ms
5 - strcpy (char* str)
-- passed 7.51576 ms

*Multithreaded
Starting test. Number of loops= 5000000

1 - (string str)
-- passed 366.745 ms
2 - (string& str)
-- passed 182.604 ms
3 - (string* str)
-- passed 187.33 ms
4 - (char* str)
-- passed 1620.86 ms
5 - strcpy (char* str)
-- passed 5.05036 ms

VS 7--------------------------------------------------

*Singlethreaded
Starting test. Number of loops= 5000000

1 - (string str)
-- passed 329.72 ms
2 - (string& str)
-- passed 149.223 ms
3 - (string* str)
-- passed 157.478 ms
4 - (char* str)
-- passed 1000.62 ms
5 - strcpy (char* str)
-- passed 5.03248 ms

*Multithreaded
Starting test. Number of loops= 5000000

1 - (string str)
-- passed 349.87 ms
2 - (string& str)
-- passed 198.168 ms
3 - (string* str)
-- passed 159.792 ms
4 - (char* str)
-- passed 734.929 ms
5 - strcpy (char* str)
-- passed 5.14032 ms

*Multithreaded DLL
Starting test. Number of loops= 5000000

1 - (string str)
-- passed 572.189 ms
2 - (string& str)
-- passed 283.535 ms
3 - (string* str)
-- passed 274.563 ms
4 - (char* str)
-- passed 739.255 ms
5 - strcpy (char* str)
-- passed 5.00734 ms



VS7 flags: /Ox /Og /Ob2 /Oi /Ot /G7 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /GF /FD /EHsc /arch:SSE2 /Fo"Release/" /Fd"Release/vc70.pdb" /W3 /nologo /c /Wp64 /Zi /TP /D_SECURE_SCL=0

VS8 flags: /Ox /Ob2 /Oi /Ot /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /GF /FD /EHsc /GS- /arch:SSE2 /fp:fast /GR- /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt /D_SECURE_SCL=0

I've been looking at the assembly for the two, and it seems to be pretty much identical in all cases. Take a look at Test 5:

mov ecx, DWORD PTR ??_C@_07HADGPIEN@changed?$AA@+4
mov edx, DWORD PTR ??_C@_07HADGPIEN@changed?$AA@
mov eax, 5000000 ; 004c4b40H
$LL3@main:

; 159 :
; 160 : for(int i = 0; i < BIG_TESTS; i++)

sub eax, 1

; 161 : {
; 162 : String5(temp2);

mov DWORD PTR _temp2$[esp+256], edx
mov DWORD PTR _temp2$[esp+260], ecx
jne SHORT $LL3@main

; 163 : }
; 164 :



The only difference in the VS7 version is a call to npad 8 just before the loop starts. (What the hell is npad, by the way?) Notice that it simply assigns the string over and over. This is highly suspicious to me, since the optimizer should have dropped that loop completely.

Differences in the first four tests are almost certainly due to library implementation differences. I'm a little confused about 5 though.

Promit: One thing I notice in your command lines is that VS8 appears to be using unicode while VS7 is not. That small difference could cause significantly different string function implementation since unicode characters have different byte lengths (unless it's using UTF-32, which seems unlikely) which makes copying a string more complex than just searching for a 0 byte and copying the bytes up to that point.

The way the loop was only partially optimized is very strange, and you should probably send it off to MS so they can analyze it and maybe find the problem. Any explanation I can think of for a problem (such as the optimizer becoming confused about aliasing since pointers are everywhere despite the local reference graph being rather simple) should cause much less optimization than actually occurred.

Right, well, I actually disabled Unicode, and you can see that the copied string is 8 bytes long (7 chars and the null). It's done using two DWORD moves. I posted on the MSDN forums, and I know softies from the VS team roam there, so I'm hoping that somebody will have some insight tomorrow. Considering that the optimizer managed to inline the string assignment and replace the string copy with intrinsics, I'm amazed that it didn't manage to pull off the single most obvious optimization. Of course, I don't know that much about optimization theory, so maybe it's more difficult than I realized.
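
For reference, a small sketch of what those two DWORD moves amount to in C++ terms. CopyAsTwoDwords is an illustrative name and the code assumes a C++11 compiler; it is not what the compiler literally generates, just the same 8 bytes moved as two 32-bit values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// "changed" is 7 characters plus the terminating null.
static_assert(sizeof("changed") == 8, "expected 8 bytes including the null");

// Copies the literal as two 32-bit values, mirroring the pair of
// DWORD moves in the listing above.
void CopyAsTwoDwords(char* dst)
{
    assert(dst != nullptr);
    std::uint32_t low = 0;
    std::uint32_t high = 0;
    std::memcpy(&low, "changed", 4);      // bytes 0..3: "chan"
    std::memcpy(&high, "changed" + 4, 4); // bytes 4..7: "ged" plus the null
    std::memcpy(dst, &low, 4);
    std::memcpy(dst + 4, &high, 4);
}
```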

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!