LaBasX2

Performance problem: But why???

LaBasX2    122
Hi! I have tried to code a faster square root function using 3DNow!, but somehow it is much slower than the standard sqrtf function from <math.h>. At first I thought it was a problem with my inline assembler code, but it isn't. If I remove all the assembler code from the function and call that empty function in a large loop, the empty function is still a bit slower than sqrtf. I really don't understand it. How can an empty function be slower than a complete square root function? Could someone please help me? Here is the code:
      
#include <stdio.h>
#include <windows.h>
#include <math.h>

inline float MySqrt(const float x)
{
    float f;

    /*_asm
    {
        FEMMS
        MOVD        MM0, [x]
        PFRSQRT     MM1, MM0
        PFMUL       MM0, MM1
        MOVD        [f], MM0
        FEMMS
    }*/

    return f;
}


void main()
{
    DWORD       StartTicks, EndTicks;
    __int64     i;
    float       f = 16.0f, g;

    StartTicks = GetTickCount();

    for (i = 0; i < 10000000; i++)
    {
        g = MySqrt(f);
        //g = sqrtf(f);
    }

    EndTicks = GetTickCount();

    printf("%f %i\n", g, EndTicks - StartTicks);
}
    
I've noticed that f = sqrt(f) doesn't work. Are they using some trick here to make the function faster? Even if I make MySqrt a void function with no parameters and no float variable inside, it isn't faster.

Thanks in advance
LaBasX2

Edited by - LaBasX2 on November 11, 2001 1:21:36 PM
Edited by - LaBasX2 on November 11, 2001 2:44:20 PM

Staffan    122
I don't know how many cycles a 3DNow! sqrt function takes, but it sounds like it should be faster than the standard one? Anyway, the FEMMS is probably what's causing the delay... I know at least that the MMX EMMS instruction is SLOW: probably about 50 cycles, or as slow as a standard sqrt call.

LaBasX2    122
Thanks for your reply.

I have commented out the complete asm part and the function is still slower than sqrtf, so that can't be the problem. By the way, as far as I know, the FEMMS instruction takes just 2 cycles; that's at least what they say on amdzone.com.

Thanks
LaBasX2

invective    118
If you are using Visual C++, use the /FA switch to generate assembly output and see what the compiler is doing to your code:

Select Project and click Settings.

Click the C/C++ tab.

Choose Listing Files for the category.

Select Assembly-Only Listing as the Listing File Type.

burp    122
The compiler is too smart!

The sqrtf(f) call is being removed by the compiler because you're not doing anything with its result. Move the variables f and g to the global namespace to fix the problem.

  
inline float MySqrt(const float x)
{
    /*_asm
    {
        MOVD        MM0, x
        PFRSQRT     MM1, MM0
        PUNPCKLDQ   MM0, MM0
        PFMUL       MM0, MM1
        MOVD        x, MM0
    }*/

    return x;
}

float f = 16.0f, g;

void main()
{
    DWORD StartTicks, EndTicks;
    __int64 i;

    StartTicks = GetTickCount();
    for (i = 0; i < 10000000; i++)
    {
        //g = MySqrt(f);

        g = sqrtf(f);
    }
    EndTicks = GetTickCount();
    printf("%i\n", EndTicks - StartTicks);
}


Edited by - burp on November 11, 2001 1:57:36 PM

LaBasX2    122
Thanks for your help, but it still doesn't work. My empty function is still slower than the sqrtf function. I will fix the code in my first post to use the variable. I've also looked at the assembler listing, but I can't make much of it.

Could someone please paste the code into a new project, compile it in release mode, and tell me whether they get the same execution time for the empty MySqrt function and the sqrtf function?

That's really a bit strange and I am curious to know what the problem could be...

Thanks a lot

LaBasX2

LaBasX2    122
OK, I understand the problem now. The compiler is really clever and I am stupid. Zedzeek is right: I can't do it like that, because sqrtf(f) is a constant value and the compiler is clever enough to recognize that and calculate it only once. So my complete loop was just one square root. If I modify the code to use g = sqrtf((float)i), it runs a bit slower than my empty function.

The 3DNow! square root is still a bit slower than the sqrtf function, but that's another problem now...

Thank you very much for your help guys

LaBasX2

LaBasX2    122
I had a small bug in my code; the MySqrt function is actually a bit faster than sqrtf (about 10%), but considering that it has less precision than sqrtf, it is hardly worth implementing. And I can't figure out another way to optimize MySqrt.

LaBasX2

Gonosz    122
If you can't optimize the sqrt(), try to call it as rarely as possible! For example, a vector class can store the length and the squared length once they have been calculated, because in some cases (in physics, for instance) only the square of the length is required.

Gonosz
