# Performance problem: But why???

Hi! I have tried to code a faster square root function using 3DNow!, but somehow it is much slower than the standard sqrtf function from math.h. At first I thought it was a problem with my inline assembler code, but it isn't. If I remove all the assembler code from the function and call that empty function in a large loop, the empty function is still a bit slower than the sqrtf function. I really don't understand it. How can an empty function be slower than a complete square root function? Could someone please help me? Here is the code...

    #include <windows.h>
    #include <stdio.h>
    #include <math.h>

    inline float MySqrt(const float x)
    {
        float f;

        /*_asm
        {
            FEMMS
            MOVD MM0, [x]
            PFRSQRT MM1, MM0
            PFMUL MM0, MM1
            MOVD [f], MM0
            FEMMS
        }*/

        return f;
    }

    void main()
    {
        DWORD StartTicks, EndTicks;
        __int64 i;
        float f = 16.0f, g;

        StartTicks = GetTickCount();

        for (i = 0; i < 10000000; i++)
        {
            g = MySqrt(f);
            //g = sqrtf(f);
        }

        EndTicks = GetTickCount();

        printf("%f %i\n", g, EndTicks - StartTicks);
    }

I've noticed that f = sqrt(f) doesn't work. Are they using some trick here to make the function faster? Even if I make MySqrt a void function with no parameters and no float variable inside, it isn't any faster.

Thanks in advance
LaBasX2

Edited by - LaBasX2 on November 11, 2001 1:21:36 PM

Edited by - LaBasX2 on November 11, 2001 2:44:20 PM

##### Share on other sites
I don't know how many cycles a 3DNow! sqrt function takes, but it sounds like it is faster than the standard one? Anyway, the FEMMS is probably what's causing the delay... I know that at least the MMX EMMS instruction is SLOW: probably about 50 cycles, or as slow as a standard sqrt call.

##### Share on other sites

I have commented out the complete asm part and the function is still slower than sqrtf, so that can't be the problem. By the way, as far as I know, the FEMMS instruction takes just 2 cycles; that's at least what they say on amdzone.com.

Thanks
LaBasX2

##### Share on other sites
If you are using Visual C++, use the /FA switch to generate assembly output and see what the compiler is doing to your code.

Select Project and click Settings.

Click the C/C++ tab.

Choose Listing Files for the category.

Select Assembly-Only Listing as the Listing File Type

##### Share on other sites
The compiler is too smart!

The sqrtf(f) call is being removed by the compiler because you're not doing anything with its result. Move the variables f and g to global scope to fix the problem.

    inline float MySqrt(const float x)
    {
        /*_asm
        {
            MOVD MM0, x
            PFRSQRT MM1, MM0
            PUNPCKLDQ MM0, MM0
            PFMUL MM0, MM1
            MOVD x, MM0
        }*/

        return x;
    }

    float f = 16.0f, g;

    void main()
    {
        DWORD StartTicks, EndTicks;
        __int64 i;

        StartTicks = GetTickCount();

        for (i = 0; i < 10000000; i++)
        {
            //g = MySqrt(f);
            g = sqrtf(f);
        }

        EndTicks = GetTickCount();

        printf("%i\n", EndTicks - StartTicks);
    }

Edited by - burp on November 11, 2001 1:57:36 PM
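
A common alternative to moving the variables to global scope is to route the loop's input and output through volatile objects, so the compiler has to reload the input and store the result on every pass. The following is a minimal sketch of that idea, not code from the thread: bench_input, bench_sink, my_sqrt_stub and run_stub_loop are hypothetical names, and the stub merely stands in for whichever function is being timed (MySqrt or sqrtf).

```c
/* Hypothetical names, not from the thread. */
volatile float bench_input = 16.0f; /* must be re-read on every iteration */
volatile float bench_sink;          /* every store to it must be emitted  */

/* Stand-in for the function being timed (MySqrt or sqrtf). */
float my_sqrt_stub(float x)
{
    return x;
}

/* Calls the stub n times and returns the last value written to the sink. */
float run_stub_loop(long n)
{
    long i;

    for (i = 0; i < n; i++)
    {
        /* volatile read + volatile write: the loop body cannot be
           dropped as dead code or collapsed into a single evaluation */
        bench_sink = my_sqrt_stub(bench_input);
    }

    return bench_sink;
}
```

Calling run_stub_loop(10000000) between two GetTickCount() reads would then time ten million real iterations. The volatile accesses add a load and a store per pass, so the measured time includes that overhead.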

##### Share on other sites
Thanks for your help, but it still doesn't work. My empty function is still slower than the sqrtf function. I will fix the code in my first post to use the variable. I've also looked at the assembler listing, but I can't make much of it.

Could someone please paste the code into a new project, compile it in release mode, and tell me whether they get the same execution time for the empty MySqrt function and the sqrtf function?

That's really a bit strange and I am curious to know what could be the problem...

Thanks a lot

LaBasX2

##### Share on other sites
Never time like this:

    for (i = 0; i < 10000000; i++)
    {
        g = sqrtf(f);
    }

Accumulate the result instead:

    for (i = 0; i < 10000000; i++)
    {
        g += sqrtf(f);
    }

##### Share on other sites
Ok, I understand the problem now. The compiler is really clever and I am stupid. Zedzeek is right: I can't do it like that, because sqrtf(f) is a constant value and the compiler is clever enough to recognize that and calculate it only once. So my complete loop was just one square root. If I modify the code to use g = sqrtf((float)i), it runs a bit slower than my empty function.

The 3DNow! square root is still a bit slower than the sqrtf function, but that's another problem now....

Thank you very much for your help guys

LaBasX2
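
The conclusion above, that a benchmark loop needs an input that changes on each pass and a result that is actually used, can be sketched as a small harness. This is an illustration under assumptions, not the poster's final code: unary_fn, empty_baseline and time_unary are hypothetical names, and the standard clock() function replaces GetTickCount() so the sketch stays portable.

```c
#include <time.h>

/* Hypothetical type: any float -> float function to be timed. */
typedef float (*unary_fn)(float);

/* Stand-in for the "empty" MySqrt from the thread. */
float empty_baseline(float x)
{
    return x;
}

/* Sums fn(0) + fn(1) + ... + fn(n-1), stores the sum in *checksum,
   and returns the processor time spent in the loop, in seconds. */
double time_unary(unary_fn fn, long n, float *checksum)
{
    float sum = 0.0f;
    long i;
    clock_t start, end;

    start = clock();
    for (i = 0; i < n; i++)
    {
        sum += fn((float)i); /* varying input, result feeds the sum */
    }
    end = clock();

    *checksum = sum; /* handing the sum to the caller keeps the work observable */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Passing sqrtf from math.h and then empty_baseline, and printing both the time and the checksum, gives two comparable measurements. Calling through a function pointer may prevent inlining, so both numbers include call overhead.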

##### Share on other sites
I had a small bug in my code; the MySqrt function is actually a bit faster than sqrtf (about 10%), but considering that it has less precision than sqrtf, it is hardly worth implementing. And I can't figure out another way to optimize MySqrt.

LaBasX2
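
The speed-for-precision trade mentioned above follows from the identity sqrt(x) = x * (1/sqrt(x)), which is what the PFRSQRT and PFMUL pair in the first post computes from a low-precision reciprocal square root estimate. A portable sketch of the idea, with optional Newton-Raphson steps that buy back precision at the cost of extra multiplies, might look like the following. It is not the 3DNow! algorithm: rsqrt_estimate uses the well-known integer bit trick as a stand-in for the hardware estimate, all function names are hypothetical, and 32-bit IEEE 754 floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Rough 1/sqrt(x) for x > 0 via the classic bit trick.
   Stand-in for a hardware estimate such as PFRSQRT (assumption). */
float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1); /* halve and negate the exponent       */
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for y ~ 1/sqrt(x): y' = y * (1.5 - 0.5*x*y*y). */
float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* sqrt(x) = x * (1/sqrt(x)), for x > 0, with a chosen number of refinements. */
float approx_sqrt(float x, int newton_steps)
{
    float y = rsqrt_estimate(x);
    int i;

    for (i = 0; i < newton_steps; i++)
    {
        y = rsqrt_refine(x, y);
    }

    return x * y;
}
```

With newton_steps set to 0 the result is only good to within a few percent. Each Newton-Raphson step roughly squares the relative error, at the cost of a few extra multiplies per call.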