CarlML

Is pow(val, 40.0) more expensive than pow(val, 4.0) ?

I'm curious about how the pow function works. Does the cost of the function go up linearly with the number of powers? Does it calculate val*val 40 times in pow(val, 40.0)?


That depends a lot on the hardware and the compiler. In general, both should cost the same if there is hardware support. If the compiler realizes that what you want is x*=x; x*=x; it will most likely be quicker.

The way pow usually works is:

exp( exponent * log( base ) );
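
A minimal sketch of the two paths described above, assuming a positive base. The names `pow_via_exp_log` and `pow4_by_squaring` are made up for illustration and are not what any particular library or compiler emits. A real pow() also handles negative bases, zeros, infinities and accuracy concerns, which this ignores.

```cpp
#include <cmath>

// Generic path: one log, one multiply, one exp, whatever the exponent is.
// Only meaningful for base > 0.
inline double pow_via_exp_log(double base, double exponent) {
    return std::exp(exponent * std::log(base));
}

// Special case a compiler may choose when it can see that the exponent is 4:
// two multiplications instead of a log and an exp.
inline double pow4_by_squaring(double x) {
    x *= x;  // x^2
    x *= x;  // x^4
    return x;
}
```

With the generic path, pow_via_exp_log(val, 40.0) does the same amount of work as pow_via_exp_log(val, 4.0); only the value being multiplied changes.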


Thanks for the reply. Sounds good that it would generally be the same.

 

So I guess calculating a very tight specular highlight using pow() is not more expensive than calculating a wide highlight then.


When it comes to performance, you can only tell by measuring. Even if at some point you think you understand how something like this works, a few years later compilers and hardware might have changed and your knowledge might become obsolete. This has happened to me several times.

But my educated guess is the same as ProfL's: It will probably be implemented using exp(exponent * log(base)) and it won't matter what the exponent is.


If you use pow to calculate a highlight in a shader, you will most likely pass the exponent in as a constant. The compiler can't see that value, so it will use the generic exp + log. In that case it will run at the same speed, no matter what value that light power constant is.


I just did some time measurements and it seems the exponent does matter. Between 4.0 and 40.0 there was no noticeable difference, but as the exponent got higher there was, so it seems there is a point where the algorithm changes based on the exponent.

Some results in milliseconds for doing pow() 100 000 times in C++ (including accessing an array to get a random value):

4.0: 2.977
40.0: 2.966
400.0: 4.192
4 000: 4.803
40 000: 3.872
400 000: 3.742

Edited by CarlML


I wouldn't be surprised if there are fast-paths in place for small and/or common exponents. That could be as part of the library implementation or applied by the compiler or the hardware.

On 9/3/2019 at 11:50 AM, CarlML said:

I just did some time measurements and it seems the exponent does matter. Between 4.0 and 40.0 there was no noticeable difference, but as the exponent got higher there was, so it seems there is a point where the algorithm changes based on the exponent.

Some results in milliseconds for doing pow() 100 000 times in C++ (including accessing an array to get a random value):

4.0: 2.977
40.0: 2.966
400.0: 4.192
4 000: 4.803
40 000: 3.872
400 000: 3.742

First of all, are you really calculating specular highlights on the CPU?

Either way, this type of synthetic test is probably not very relevant. For instance, depending on the range of numbers you are plugging in, you might be getting degradation for high exponents because handling infinities, or denormalized (very small) numbers might be slower than operating on regular numbers. More generally, in your real program the CPU might be able to parallelize the pow() with some other operations, while in your test it might not (or the other way around). The cache usage might be very different. Etc.
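
One way to check whether special values are in play is to count how many results fall outside the normal float range. This is only a sketch, assuming IEEE 754 single precision; `count_non_normal` is a hypothetical helper name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Counts results of pow(x, exponent) that are not normal floats.
// Infinities, NaNs, zeros and denormals all land in this bucket.
inline std::size_t count_non_normal(const std::vector<float>& inputs, float exponent) {
    std::size_t count = 0;
    for (float x : inputs) {
        const float r = std::pow(x, exponent);
        if (!std::isnormal(r)) {
            ++count;
        }
    }
    return count;
}
```

The largest finite float is about 3.4e38, so pow(x, 400.0f) overflows to infinity for any x above roughly 1.25. With inputs spread between 0 and 1000, nearly every call with a large exponent would produce an infinity rather than a regular number.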

The way to test performance is to introduce timings in your program and run it in realistic conditions.
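
As a rough sketch of what "introducing timings" could look like on the CPU side, here is a small scope-based timer built on std::chrono. `ScopedTimer` is a hypothetical name, not something from the code discussed in this thread.

```cpp
#include <chrono>

// Adds the wall-clock time spent inside a scope to a caller-owned total.
class ScopedTimer {
public:
    explicit ScopedTimer(double& total_ms)
        : total_ms_(total_ms), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto stop = std::chrono::steady_clock::now();
        total_ms_ += std::chrono::duration<double, std::milli>(stop - start_).count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total_ms_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing `ScopedTimer t(total_ms);` at the top of the block of interest accumulates its cost over a whole run, so the numbers reflect the real data and the surrounding work.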

Out of curiosity, try with an exponent `4' (instead of `4.0'). In some cases this might be much faster.
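
One reason an integer exponent can be cheaper is that an integer power needs only a few multiplications. Here is a sketch of exponentiation by squaring; `ipow` is a hypothetical helper, and whether a given compiler or library takes a path like this for pow(x, 4) depends on the toolchain and flags.

```cpp
// Exponentiation by squaring: the loop runs once per bit of n,
// so the cost grows with log(n), not with n.
inline double ipow(double x, unsigned n) {
    double result = 1.0;
    while (n != 0) {
        if (n & 1u) {
            result *= x;  // fold in the power for the current bit
        }
        x *= x;           // square for the next bit
        n >>= 1u;
    }
    return result;
}
```

For n = 40 the loop runs 6 times, so even an integer fast path would not multiply 40 times.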

Edited by alvaro


99% of the time, an artificial benchmark like this benchmarks the capability of its creator, not what it is supposed to test ;)

Could you share your test code, your compiler name, and your compile settings?

Edited by ProfL

2 hours ago, alvaro said:

First of all, are you really calculating specular highlights on the CPU?

Either way, this type of synthetic test is probably not very relevant. For instance, depending on the range of numbers you are plugging in, you might be getting degradation for high exponents because handling infinities, or denormalized (very small) numbers might be slower than operating on regular numbers. More generally, in your real program the CPU might be able to parallelize the pow() with some other operations, while in your test it might not (or the other way around). The cache usage might be very different. Etc.

The way to test performance is to introduce timings in your program and run it in realistic conditions.

Out of curiosity, try with an exponent `4' (instead of `4.0'). In some cases this might be much faster.

No, the specular calculation happens in a shader. Doing it on the CPU would be crazy.😋

As you point out, I suspect the timing difference has something to do with numbers going infinite or infinitesimal. In a regular use case, where I would use values between 20.0 and 100.0 for calculating specular, I suspect there generally would not be a big difference.

To say that the exponent does not matter is wrong, though, because those numbers in my test don't lie.

1 hour ago, ProfL said:

99% of the time, an artificial benchmark like this benchmarks the capability of its creator, not what it is supposed to test ;)

Could you share your test code, your compiler name, and your compile settings?

Don't be butthurt that the numbers didn't go your way.😉

In any case I appreciate the input.

 

This was my test code, using Visual Studio 2017:

 

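// random, timer, Print and Vec3 are helpers whose definitions are not shown.
float vals[100000];
for (int i = 0; i < 100000; i++)
{
    vals[i] = random.getf(0.0f, 1000.0f); // random input between 0 and 1000
}
float vall = 0.0f; // running sum of all results, printed at the end

// First pass: exponent 4.0
timer.Start();
for (int i = 0; i < 100000; i++)
{
    vall += pow(vals[i], 4.0f);
}
float tim1 = timer.End();

// Second pass: exponent 400.0
timer.Start();
for (int i = 0; i < 100000; i++)
{
    vall += pow(vals[i], 400.0f);
}
float tim2 = timer.End();
Print(Vec3(tim1, tim2, vall));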
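
For anyone who wants to try this without the helper classes, here is a rough standalone equivalent using only the standard library. The names `make_inputs` and `time_pow_ms` are hypothetical. std::mt19937 and std::chrono::steady_clock stand in for the `random` and `timer` objects above, whose implementations were not posted, so the timings are not directly comparable.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Fills a vector with uniformly distributed floats between lo and hi.
inline std::vector<float> make_inputs(std::size_t n, float lo, float hi) {
    std::mt19937 rng(12345);  // fixed seed so runs are repeatable
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float> values(n);
    for (float& x : values) {
        x = dist(rng);
    }
    return values;
}

// Times one pass of pow(x, exponent) over the inputs, in milliseconds.
// The sum is written to 'sink' so the compiler cannot drop the loop.
inline double time_pow_ms(const std::vector<float>& inputs, float exponent, float& sink) {
    const auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (float x : inputs) {
        sum += std::pow(x, exponent);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The input range is a parameter on purpose: comparing a run over 0 to 1000 with a run over something like 0.9 to 1.1 should show whether overflow to infinity accounts for the slowdown at large exponents.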

 

Edited by CarlML
