pow alternative

Math and Physics Programming

Started by fluke July 23, 2008 07:26 PM

10 comments, last by Maze Master 15 years, 9 months ago

136

Author

July 23, 2008 07:26 PM

first off, this is actually two questions in one (all c/c++ related): 1) suppose i'm calling this a few million times: pow(x,.125); given what you know about numbers (probably a whole lot more than me), how can this be sped up? or can it? 2) also, i need an general purpose alternative to float pow(float,float);. i've searched the web and found some very elaborate versions that supposedly offer better performance at a reduced precision (which is EXACTLY what i need), however none of them seem to produce faster results in my benchmark tests. if you've got a nice alternative pow function in your library, i'd love to see it! inline assembly versions warmly welcomed :)

dmatter

4,872

July 23, 2008 07:44 PM

Quote:Original post by fluke
1) suppose i'm calling this a few million times:

pow(x,.125);

given what you know about numbers (probably a whole lot more than me), how can this be sped up? or can it?

Can you change your algorithm to call it fewer times?
What properties does x have?

Just a thought really, have you profiled against:

sqrt(sqrt(sqrt(x)));

It might be possible, though I wouldn't know how, for someone else here optimise for the octic root of a number.

David Gill :: GitHub :: twitter .

silvermace

634

July 23, 2008 07:53 PM

because that function call would easily approach accuracy problems with IEEE754 would an approximation be sufficient? the most obvious one I can think of is Newton Raphson; see the Quake3 source code for Carmackien/Abrashien implementation (which you can probably bet is fast..)

see: http://www.beyond3d.com/content/articles/8/

[EDIT: might also give you some ideas about how you could implement a faster pow(float,float) ?]

"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website

fluke

136

Author

July 23, 2008 08:48 PM

Quote:Original post by dmatter
Quote:Original post by fluke
1) suppose i'm calling this a few million times:

pow(x,.125);

given what you know about numbers (probably a whole lot more than me), how can this be sped up? or can it?

Can you change your algorithm to call it fewer times?
What properties does x have?

Just a thought really, have you profiled against:

sqrt(sqrt(sqrt(x)));

It might be possible, though I wouldn't know how, for someone else here optimise for the octic root of a number.

dmatter, unfortunately the algorithm cannot be called fewer times.

i tried sqrt(sqrt(sqrt(x))); and it actually offered a slight performance boost.

where a typical result might yeild a turn-around time of 38891ms, your method resulted in a 32907ms turn-around. i know it's all out of context here, but while that's not the optimal performance increase i was looking for - it is certainly much better than what i had!

and silvermace, yes, approximations are quite fine. i'll take a look at the link you provided. thanks!

silvermace

634

July 23, 2008 09:00 PM

this is actually a pretty interesting optimisation problem.. I wonder if you could use SSE to calculate the square roots 4 elements at time? I think there is an unbreakable circular dependency there though, because x is a function of its previous value...

"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website

alvaro

21,604

July 23, 2008 09:06 PM

A first attempt seems to be about twice as fast as std::pow(x,.125):

#include <cmath>struct EighthRoot {  double input;  double f_over_f_prime(double x) {    double x2=x*x;    double x4=x2*x2;    double x6=x4*x2;    double x7=x6*x;    double x8=x4*x4;    return (x8-input)/(8.0*x7);  }public:  EighthRoot(double input) : input(input) {  }  operator double () {    union {      long long as_long_long;      double as_double;    } hackish_union;    hackish_union.as_double = input;    hackish_union.as_long_long -= 0x3ff0000000000000ll;    hackish_union.as_long_long >>= 3;    hackish_union.as_long_long += 0x3ff0000000000000ll;    double x = hackish_union.as_double;    for(unsigned i=0;i<3;++i) {      x -= f_over_f_prime(x);    }    return x;  }};int main() {  std::cout << EighthRoot(1000.0) << '\n';}

dmatter

4,872

July 23, 2008 09:10 PM

Quote:Original post by fluke
i tried sqrt(sqrt(sqrt(x))); and it actually offered a slight performance boost.

In that case then, besides getting an actual octic rooter function, you can be looking into finding optimised/approximated sqrt functions - of which I'm sure there are plenty.

Of course, if x has any usable properties then there could be far simpler optimisations. For example, if x only ever takes on a handful of different values, then you can perform the calculations in advance and just conditionally select the correct one. I suspect that's hoping for too much though.

David Gill :: GitHub :: twitter .

alvaro

21,604

July 23, 2008 09:23 PM

This is a bit better.

#include <iostream>#include <cmath>class EighthRoot {  double input;  double f_over_f_prime(double x) {    double x2=x*x;    double x4=x2*x2;    double x8=x4*x4;    return .125*x*(1.0-input/x8);  }public:  EighthRoot(double input) : input(input) {  }  operator double () {    // Find first guess by basically shifting the exponent part of the    // double by 3 (I don't really care much about the rest of the double)    union {      long long as_long_long;      double as_double;    } hackish_union;    hackish_union.as_double = input;    hackish_union.as_long_long >>= 3;    hackish_union.as_long_long += 0x37F2000000000000ll;    // I am sure one could find a slightly better constant        double x = hackish_union.as_double;    x -= f_over_f_prime(x);    x -= f_over_f_prime(x);    // Repeat more times for extra accuracy    return x;  }};int main() {  std::cout << EighthRoot(1000.0) << '\n';}

alvaro

21,604

July 23, 2008 09:55 PM

Ah, nevermind. What I posted is only marginally faster than sqrt(sqrt(sqrt(x))) and significantly less precise. A better magic constant is 0x37f1504f333fa37ell, but it's not much better.

silvermace

634

July 23, 2008 10:40 PM

fiddled with compiler settings? VC++ 7.1/9/10 are pretty capable...

http://msdn.microsoft.com/en-us/library/aa289157(VS.71).aspx

"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website

pow alternative

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

pow alternative

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines