CarlML

Is pow(val, 40.0) more expensive than pow(val, 4.0) ?


You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

 

3 minutes ago, alvaro said:

You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

 

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

Edited by CarlML

2 minutes ago, CarlML said:

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

Edited by alvaro


The reason you get different results is that your vall overflows (becomes inf), which is a special value that the CPU handles in a slower fallback path. Instead of initializing vals to random values, initialize them to 1.f or 1.00001f (in case you worry that affects the pow function). After that change, and accumulating into vall4 and vall400, I get the same timings for both.
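A minimal sketch of the overflow described above, assuming the same float accumulator as the benchmark. The helper name FirstOverflowIndex is hypothetical: it sums pow(vals[i], exponent) and reports the index at which the running sum first becomes inf, or -1 if it never does. It only detects the overflow; how much slower a given CPU or math library gets once inf appears is a separate question.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper mirroring the benchmark loop: accumulates
// pow(vals[i], exponent) into a float and returns the index at which the
// running sum first becomes infinite, or -1 if it stays finite.
int FirstOverflowIndex(const float* vals, int n, float exponent) {
    assert(vals != nullptr && n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += std::pow(vals[i], exponent);
        if (std::isinf(sum)) {
            return i;  // either this pow() result or the addition overflowed
        }
    }
    return -1;
}
```

For example, inputs of 1.25f overflow on the very first term with exponent 400.0f (1.25^400 is about 5.8e38, above FLT_MAX, which is about 3.4e38), while the same inputs with exponent 4.0f never overflow.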

3 minutes ago, alvaro said:

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

Well, my simple test case was enough to show that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like; I posted the code on the first page.

1 minute ago, CarlML said:

Well, my simple test case was enough to show that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like; I posted the code on the first page.

Yes, but you have only proven that a higher exponent overflows sooner, and that the overflow causes the slowdown.

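#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#define NCOUNT 10000000
float vals[ NCOUNT ];
int PowTest( ) {
    using namespace std;
    // Inputs in [1.0, 1.00001]: even x^400 stays close to 1 (about 1.004),
    // so neither overflow (inf) nor underflow can distort the timings.
    for ( int i = 0; i < NCOUNT; i++ ) {
        vals[ i ] = 1.f + rand( ) * 0.00001f / RAND_MAX;
    }
    // Time pow() with exponent 4.
    float vall0 = 0.0f;
    clock_t begin0 = clock( );
    for ( int i = 0; i < NCOUNT; i++ ) {
        vall0 += pow( vals[ i ], 4.0f );
    }
    clock_t end0 = clock( );
    // Time pow() with exponent 400 on the same inputs.
    clock_t begin1 = clock( );
    float vall1 = 0.0f;
    for ( int i = 0; i < NCOUNT; i++ ) {
        vall1 += pow( vals[ i ], 400.0f );
    }
    clock_t end1 = clock( );
    double elapsed_secs0 = double( end0 - begin0 ) / CLOCKS_PER_SEC;
    double elapsed_secs1 = double( end1 - begin1 ) / CLOCKS_PER_SEC;
    // Printing the sums keeps the compiler from discarding the loops.
    printf( "%f %f %f %f\n", elapsed_secs0, elapsed_secs1, vall0, vall1 );
    return 0;
}
int main( ) {
    return PowTest( );
}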

Sorry, this is my quick modification just to get it to compile.

26 minutes ago, ProfL said:

The reason you get different results is that your vall overflows (becomes inf), which is a special value that the CPU handles in a slower fallback path. Instead of initializing vals to random values, initialize them to 1.f or 1.00001f (in case you worry that affects the pow function). After that change, and accumulating into vall4 and vall400, I get the same timings for both.

The numbers have to be random so that the compiler won't optimize things out.

If I use random values between 0.0f and 1.0f, the timings are similar to those in my test, and the result doesn't become infinite. So overflow can't be the only reason.

Edit: but they probably became infinitesimal instead (underflow).

Edited by CarlML
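The "infinitesimal" case can be checked the same way. This sketch (the names PowClassCounts and ClassifyPowResults are hypothetical) classifies pow(x, exponent) results with std::fpclassify for evenly spaced inputs, so you can see how many land in the subnormal range, which many CPUs handle more slowly than normal floats. Evenly spaced inputs are used instead of rand() so the counts are reproducible.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tally of the IEEE-754 category of each pow() result.
struct PowClassCounts {
    int zero = 0;
    int subnormal = 0;
    int normal = 0;
    int infinite = 0;
};

// Classifies pow(x, exponent) for n inputs evenly spaced over [lo, hi].
PowClassCounts ClassifyPowResults(float lo, float hi, float exponent, int n) {
    assert(n >= 2 && lo <= hi);
    PowClassCounts counts;
    for (int i = 0; i < n; ++i) {
        float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(n - 1);
        switch (std::fpclassify(std::pow(x, exponent))) {
            case FP_ZERO:      ++counts.zero;      break;
            case FP_SUBNORMAL: ++counts.subnormal; break;
            case FP_NORMAL:    ++counts.normal;    break;
            case FP_INFINITE:  ++counts.infinite;  break;
            default:           break;  // FP_NAN, e.g. negative x with a non-integer exponent
        }
    }
    return counts;
}
```

With inputs in [0, 1] and exponent 400.0f, inputs below roughly 0.80 give results that are subnormal or round to zero; with exponent 4.0f, none of the results are subnormal.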


In my case I get the same timings from both code paths after my change.

 

No, you don't need random numbers: the array already hides the values from the compiler, at least with default settings. My modification still uses rand() to add some noise, and the results are still the same for 4 and 400.
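A sketch of a timing helper along the lines described here, using std::chrono::steady_clock instead of clock(). The names PowTiming and TimedPowSum are hypothetical. The helper returns the sum alongside the elapsed time; as long as the caller uses that sum (for example by printing it), the loop has an observable result and cannot be discarded as dead code, and because the inputs arrive through a pointer at run time the compiler generally cannot precompute the pow() calls.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical result type: elapsed time plus the computed sum.
struct PowTiming {
    double seconds;
    float sum;
};

// Times one pass of sum += pow(vals[i], exponent) with a monotonic clock.
// Returning the sum lets the caller keep the loop "alive" by using it.
PowTiming TimedPowSum(const float* vals, int n, float exponent) {
    assert(vals != nullptr && n >= 0);
    const auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += std::pow(vals[i], exponent);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {elapsed.count(), sum};
}
```

Calling it once with 4.0f and once with 400.0f on the same array, and printing both sums, mirrors the comparison in this thread; the resulting numbers still only describe the compiler and hardware you measure on.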


If I use random values between 1.0 and 1.2, there is no overflow in either case and the timings are nearly the same, so the culprit was most likely infinite or infinitesimal numbers.

Thanks for the input.
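Why the 1.0 to 1.2 range is safe can be worked out directly. For a positive exponent e, x^e overflows a float when x > FLT_MAX^(1/e), and drops below the normal range when x < FLT_MIN^(1/e). The hypothetical helpers below compute both bounds in double precision; for e = 400 they come out at roughly 1.248 and 0.804, so every input in [1.0, 1.2] gives a normal, finite result.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Approximate input above which pow(x, exponent) exceeds FLT_MAX (becomes inf).
double PowOverflowThreshold(double exponent) {
    assert(exponent > 0.0);
    return std::exp(std::log(static_cast<double>(FLT_MAX)) / exponent);
}

// Approximate input below which pow(x, exponent) falls under FLT_MIN,
// i.e. the result is subnormal or rounds to zero.
double PowUnderflowThreshold(double exponent) {
    assert(exponent > 0.0);
    return std::exp(std::log(static_cast<double>(FLT_MIN)) / exponent);
}
```

These bounds apply to a single pow() result; a float accumulator summing many large terms can still overflow earlier. For e = 4 the upper bound is around 4.3e9, which is consistent with only the exponent-400 loop producing inf or extremely small values.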


Thanks for your feedback. I think it's good that you tried it yourself rather than blindly believing the guys on the internet :)

But as someone said, your GPU results might differ. I suggest you try that too; the simplest way would be https://www.shadertoy.com/

Just click "New", modify the color output to run a few more pow() calls, and increase your loop count until the FPS drops (try full screen for the biggest slowdown).

You might also be surprised how many times you can call pow() before you need to worry about that instruction.

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.
