Performance problem: But why???

Started by
8 comments, last by LaBasX2 22 years, 5 months ago
Hi! I have tried to code a faster square root function using 3DNow!, but somehow it is much slower than the standard sqrtf function from <math.h>. At first I thought it was a problem with my inline assembler code, but it isn't. If I remove all the assembler code from the function and call that empty function in a large loop, the empty function is still a bit slower than sqrtf. I really don't understand it. How can an empty function be slower than a complete square root function? Could someone please help me? Here is the code...
      
#include <stdio.h>
#include <windows.h>
#include <math.h>

inline float MySqrt(const float x)
{
    float f;

    /*_asm
    {
        FEMMS
        MOVD        MM0, [x]
        PFRSQRT     MM1, MM0
        PFMUL       MM0, MM1
        MOVD        [f], MM0
        FEMMS
    }*/

    return f;
}


void main()
{
    DWORD       StartTicks, EndTicks;
    __int64     i;
    float       f = 16.0f, g;

    StartTicks = GetTickCount();

    for (i = 0; i < 10000000; i++)
    {
        g = MySqrt(f);
        //g = sqrtf(f);
    }

    EndTicks = GetTickCount();

    printf("%f %i\n", g, EndTicks - StartTicks);
}
    
I've noticed that f = sqrt(f) doesn't work. Are they using some trick here to make the function faster? Even if I make MySqrt a void function with no parameters and no float variable inside, it isn't faster.

Thanks in advance

LaBasX2

Edited by - LaBasX2 on November 11, 2001 1:21:36 PM
Edited by - LaBasX2 on November 11, 2001 2:44:20 PM
I don't know how many cycles a 3DNow! sqrt function takes, but it sounds like it should be faster than the standard one? Anyway, the FEMMS is probably what's causing the delay... I know that at least the MMX EMMS instruction is SLOW. Probably about 50 cycles, or as slow as a standard sqrt call.
Thanks for your reply.

I have commented out the complete asm part and the function is still slower than sqrtf, so that can't be the problem. By the way, as far as I know, the FEMMS instruction takes only 2 cycles; that's at least what they say on amdzone.com

Thanks
LaBasX2
If you are using Visual C++, use the /FA switch to generate assembly output and see what it is doing to your code.

Select Project and click Settings.

Click the C/C++ tab.

Choose Listing Files for the category.

Select Assembly-Only Listing as the Listing File Type
The compiler is too smart!

The sqrtf(f) call is being removed by the compiler, because you're not doing anything with the result. Move the variables f and g to global scope to fix the problem.

inline float MySqrt(const float x)
{
    /*_asm
    {
        MOVD      MM0, x
        PFRSQRT   MM1, MM0
        PUNPCKLDQ MM0, MM0
        PFMUL     MM0, MM1
        MOVD      x, MM0
    }*/
    return x;
}

float f = 16.0f, g;

void main()
{
    DWORD StartTicks, EndTicks;
    __int64 i;

    StartTicks = GetTickCount();

    for (i = 0; i < 10000000; i++)
    {
        //g = MySqrt(f);
        g = sqrtf(f);
    }

    EndTicks = GetTickCount();

    printf("%i\n", EndTicks - StartTicks);
}


Edited by - burp on November 11, 2001 1:57:36 PM
Thanks for your help, but it still doesn't work. My empty function is still slower than the sqrt function. I will fix the code in my first post to use the variable. I've also looked at the assembler listing, but I can't make much of it.

Could someone please paste the code into a new project, compile it in release mode, and tell me whether they get the same execution time for the empty MySqrt function and the sqrtf function?

That's really a bit strange and I am curious to know what the problem could be...

Thanks a lot

LaBasX2
Never time like this:
for (i = 0; i < 10000000; i++)
{
    g = sqrtf(f);
}

Use this instead:
for (i = 0; i < 10000000; i++)
{
    g += sqrtf(f);
}
OK, I understand the problem now. The compiler is really clever and I am stupid. Zedzeek is right: I can't do it like that, because sqrtf(f) is a constant value and the compiler is clever enough to recognize that and calculate it only once. So my complete loop was just one square root. If I modify the code and use g = sqrtf((float)i), it runs a bit slower than my empty function.
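The two fixes discussed above (vary the argument each iteration and accumulate the results) can be sketched roughly as follows. This is only an illustration, not the code anyone in the thread ran: the helper names SumOfSqrts and TimeSumOfSqrtsMs are made up, and std::chrono::steady_clock is swapped in for GetTickCount so the sketch does not depend on windows.h.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper (not from the thread): sums sqrt(i) for i in [0, count).
// The argument changes every iteration, so the call cannot be computed once
// and reused, and the running sum keeps every result in use.
static float SumOfSqrts(int count)
{
    float sum = 0.0f;
    for (int i = 0; i < count; ++i)
    {
        sum += std::sqrt(static_cast<float>(i));
    }
    return sum;
}

// Hypothetical helper: times SumOfSqrts and returns the elapsed milliseconds.
// The sum is written to *sumOut so the caller can print or otherwise use it.
static long long TimeSumOfSqrtsMs(int count, float* sumOut)
{
    const auto start = std::chrono::steady_clock::now();
    *sumOut = SumOfSqrts(count);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Calling TimeSumOfSqrtsMs(10000000, &sum) and printing both the sum and the returned time mirrors the original test loop. Printing the sum matters: if the result is never observed, an optimizer is still free to drop the work.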

The 3DNow! square root is still a bit slower than the sqrtf function, but that's another problem now...

Thank you very much for your help guys

LaBasX2
I had a small bug in my code; the MySqrt function is actually a bit faster than sqrtf (about 10%), but considering that it has less precision than sqrtf, it is hardly worth implementing. And I can't figure out another way to optimize MySqrt.
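To get a feel for the precision trade-off, one can measure the relative error of an approximate square root against the library one. The sketch below is a portable stand-in and NOT the 3DNow! code from this thread: it assumes the well-known bit-level reciprocal square root guess plus one Newton-Raphson step in place of PFRSQRT, and all function names are made up. It only handles positive, normal float inputs.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in for a low-precision reciprocal square root estimate.
// Assumption: this bit trick replaces PFRSQRT purely for illustration.
static float ApproxRsqrt(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);    // copy the float's bit pattern
    bits = 0x5f3759dfu - (bits >> 1);       // rough initial guess for 1/sqrt(x)
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);   // one Newton-Raphson refinement
}

// sqrt(x) = x * (1/sqrt(x)), the same identity the PFRSQRT + PFMUL pair uses.
static float ApproxSqrt(float x)
{
    return x * ApproxRsqrt(x);
}

// Relative error of the approximation against the library square root.
static float RelativeError(float x)
{
    const float exact = std::sqrt(x);
    return std::fabs(ApproxSqrt(x) - exact) / exact;
}
```

With a single refinement step the error of this stand-in stays well below one percent, which may be fine for things like normalizing vectors for lighting but not where full float precision is expected. The actual error of the 3DNow! path would have to be measured on the hardware itself.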

LaBasX2
If you can't optimize the sqrt(), try to call it as rarely as possible! For example, in a vector class you can store the length and the length squared when you calculate them, because in some cases (sometimes in physics) only the square of the length is required.
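A minimal sketch of the "only the square is required" case, assuming a hypothetical Vec3 type (the names are illustrative and not from the thread). It shows the comparison side of the idea rather than the caching of stored lengths, and it assumes the range passed in is non-negative.

```cpp
#include <cmath>

// Hypothetical minimal vector type for illustration only.
struct Vec3
{
    float x, y, z;

    // Squared length: three multiplies and two adds, no square root.
    float LengthSquared() const { return x * x + y * y + z * z; }

    // Actual length: call this only when the real length is needed.
    float Length() const { return std::sqrt(LengthSquared()); }
};

// Range check done entirely on squared quantities, so sqrt is never called.
// Assumes range >= 0.
static bool IsWithinRange(const Vec3& v, float range)
{
    return v.LengthSquared() <= range * range;
}
```

Because squaring preserves ordering for non-negative values, comparing squared lengths gives the same answer as comparing lengths, which is why distance and range tests are a common place to skip the square root.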

Gonosz

This topic is closed to new replies.
