# Is pow(val, 40.0) more expensive than pow(val, 4.0)?


You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

##### Share on other sites
3 minutes ago, alvaro said:

You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

Edited by CarlML

##### Share on other sites
2 minutes ago, CarlML said:

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

Edited by alvaro

##### Share on other sites

The reason you get different results is that your vall accumulator overflows (becomes inf), which is a special value that the CPU handles in a slower fallback mode. Instead of initializing vals to random values, initialize them to 1.f or 1.00001f (in case you worry that the input affects the pow function). After that change, and accumulating into vall4 and vall400, I get the same timings.

##### Share on other sites
3 minutes ago, alvaro said:

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

Well, my simple test case was enough to determine that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like. I posted the code on the first page.

##### Share on other sites
1 minute ago, CarlML said:

Well, my simple test case was enough to determine that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like. I posted the code on the first page.

Yes, but you've only proven that a higher exponent overflows sooner, and that the overflow causes a slowdown.

    #include <cmath>    // pow
    #include <cstdio>   // printf
    #include <cstdlib>  // rand, RAND_MAX
    #include <ctime>    // clock, CLOCKS_PER_SEC
    #define NCOUNT 10000000
    float vals[ NCOUNT ];
    int PowTest( ) {
        using namespace std;
        // Inputs stay just above 1.0, so neither exponent overflows.
        for ( int i = 0; i < NCOUNT; i++ ) {
            vals[ i ] = 1.f + rand( ) * 0.00001f / RAND_MAX;
        }
        // Time the exponent 4 loop.
        float vall0 = 0.0f;
        clock_t begin0 = clock( );
        for ( int i = 0; i < NCOUNT; i++ ) {
            vall0 += pow( vals[ i ], 4.0f );
        }
        clock_t end0 = clock( );
        // Time the exponent 400 loop.
        clock_t begin1 = clock( );
        float vall1 = 0.0f;
        for ( int i = 0; i < NCOUNT; i++ ) {
            vall1 += pow( vals[ i ], 400.0f );
        }
        clock_t end1 = clock( );
        double elapsed_secs0 = double( end0 - begin0 ) / CLOCKS_PER_SEC;
        double elapsed_secs1 = double( end1 - begin1 ) / CLOCKS_PER_SEC;
        // Printing the sums keeps both loops from being optimized away.
        printf( "%f %f %f %f\n", elapsed_secs0, elapsed_secs1, vall0, vall1 );
        return 0;
    }


Sorry, this is just my lazy modification to get it to compile.

##### Share on other sites
26 minutes ago, ProfL said:

The reason you get different results is that your vall accumulator overflows (becomes inf), which is a special value that the CPU handles in a slower fallback mode. Instead of initializing vals to random values, initialize them to 1.f or 1.00001f (in case you worry that the input affects the pow function). After that change, and accumulating into vall4 and vall400, I get the same timings.

The numbers have to be random so that the compiler won't optimize things out.

If I use random values between 0.0f and 1.0f, the times are similar to those in my test and the result doesn't go infinite. So overflow can't be the only reason.

Edit: but they probably went infinitesimal.

Edited by CarlML

##### Share on other sites

In my case I get the same timings from both code paths after my change.

No, you don't need to add random numbers: the array already obfuscates the access for the compiler, at least with the default settings. My modification still uses some rand() to add noise, and the results are still the same for 4 and 400.
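Whether a compiler removes a timing loop depends on the compiler and its settings, so it can help to keep the accumulated result observable. The following is a minimal sketch, not code from the thread: `PowTiming` and `time_pow` are hypothetical names, and using `std::chrono::steady_clock` with a `volatile` sink is an assumption about how one might structure such a test.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical result type: elapsed time plus the accumulated sum.
struct PowTiming {
    double seconds;  // wall-clock time spent in the loop
    float  sum;      // accumulated pow results
};

// Hypothetical helper: times a loop of single-precision pow calls.
PowTiming time_pow(const std::vector<float>& values, float exponent) {
    using clock = std::chrono::steady_clock;
    float sum = 0.0f;
    const auto begin = clock::now();
    for (float v : values) {
        sum += std::pow(v, exponent);
    }
    const auto end = clock::now();
    // Writing the sum to a volatile variable forces the compiler to
    // keep the computed value instead of discarding the loop.
    volatile float sink = sum;
    (void)sink;
    return { std::chrono::duration<double>(end - begin).count(), sum };
}
```

Calling `time_pow(values, 4.0f)` and `time_pow(values, 400.0f)` on the same input gives two timings to compare. The numbers only mean something for the compiler, flags, and hardware they were measured on.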

##### Share on other sites

If I use random values between 1.0 and 1.2, there is no overflow either way and the timing results are nearly the same, so the culprit was most likely infinite or infinitesimal numbers.
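One way to check the "infinite or infinitesimal" explanation is to classify what pow returns in single precision. The following is a minimal sketch, not code from the thread: `pow_class` and `class_name` are hypothetical helper names, and the sample bases mentioned afterwards (10.0f and 0.1f) are illustrative assumptions.

```cpp
#include <cmath>

// Hypothetical helper: floating-point class of base^exponent when
// evaluated in single precision (one of the FP_* macros from <cmath>).
int pow_class(float base, float exponent) {
    const float result = std::pow(base, exponent);
    return std::fpclassify(result);
}

// Hypothetical helper: readable name for a class returned by pow_class.
const char* class_name(int fp_class) {
    switch (fp_class) {
        case FP_INFINITE:  return "infinite";
        case FP_NAN:       return "nan";
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";
        default:           return "normal";
    }
}
```

With these assumed inputs, `pow_class(10.0f, 40.0f)` gives `FP_INFINITE` and `pow_class(0.1f, 40.0f)` gives `FP_SUBNORMAL`, while an exponent of 4.0f stays `FP_NORMAL` for both bases. Whether such values actually slow pow down depends on the math library and the hardware.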

Thanks for the input.

##### Share on other sites

Thanks for your feedback. I think it's good that you tried it yourself rather than blindly believing the guys on the internet.

But like someone said, your GPU results might differ. I suggest you try that as well; the simplest way would be https://www.shadertoy.com/

Just click on "New" and modify the color output to run a few more pow calls, then increase your loop count until the fps drops (try full screen for the biggest slowdown).

You might also be surprised by how many times you can run pow without having to worry about that instruction.
