pow alternative

Started by
10 comments, last by Maze Master 15 years, 9 months ago
first off, this is actually two questions in one (all c/c++ related): 1) suppose i'm calling this a few million times: pow(x,.125); given what you know about numbers (probably a whole lot more than me), how can this be sped up? or can it? 2) also, i need an general purpose alternative to float pow(float,float);. i've searched the web and found some very elaborate versions that supposedly offer better performance at a reduced precision (which is EXACTLY what i need), however none of them seem to produce faster results in my benchmark tests. if you've got a nice alternative pow function in your library, i'd love to see it! inline assembly versions warmly welcomed :)
Advertisement
Quote:Original post by fluke
1) suppose i'm calling this a few million times:

pow(x,.125);

given what you know about numbers (probably a whole lot more than me), how can this be sped up? or can it?

Can you change your algorithm to call it fewer times?
What properties does x have?

Just a thought really, have you profiled against:

sqrt(sqrt(sqrt(x)));

It might be possible, though I wouldn't know how, for someone else here optimise for the octic root of a number.
because that function call would easily approach accuracy problems with IEEE754 would an approximation be sufficient? the most obvious one I can think of is Newton Raphson; see the Quake3 source code for Carmackien/Abrashien implementation (which you can probably bet is fast..)

see: http://www.beyond3d.com/content/articles/8/

[EDIT: might also give you some ideas about how you could implement a faster pow(float,float) ?]
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website
Quote:Original post by dmatter
Quote:Original post by fluke
1) suppose i'm calling this a few million times:

pow(x,.125);

given what you know about numbers (probably a whole lot more than me), how can this be sped up? or can it?

Can you change your algorithm to call it fewer times?
What properties does x have?

Just a thought really, have you profiled against:

sqrt(sqrt(sqrt(x)));

It might be possible, though I wouldn't know how, for someone else here optimise for the octic root of a number.


dmatter, unfortunately the algorithm cannot be called fewer times.

i tried sqrt(sqrt(sqrt(x))); and it actually offered a slight performance boost.

where a typical result might yeild a turn-around time of 38891ms, your method resulted in a 32907ms turn-around. i know it's all out of context here, but while that's not the optimal performance increase i was looking for - it is certainly much better than what i had!


and silvermace, yes, approximations are quite fine. i'll take a look at the link you provided. thanks!
this is actually a pretty interesting optimisation problem.. I wonder if you could use SSE to calculate the square roots 4 elements at time? I think there is an unbreakable circular dependency there though, because x is a function of its previous value...
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website
A first attempt seems to be about twice as fast as std::pow(x,.125):
#include <cmath>struct EighthRoot {  double input;  double f_over_f_prime(double x) {    double x2=x*x;    double x4=x2*x2;    double x6=x4*x2;    double x7=x6*x;    double x8=x4*x4;    return (x8-input)/(8.0*x7);  }public:  EighthRoot(double input) : input(input) {  }  operator double () {    union {      long long as_long_long;      double as_double;    } hackish_union;    hackish_union.as_double = input;    hackish_union.as_long_long -= 0x3ff0000000000000ll;    hackish_union.as_long_long >>= 3;    hackish_union.as_long_long += 0x3ff0000000000000ll;    double x = hackish_union.as_double;    for(unsigned i=0;i<3;++i) {      x -= f_over_f_prime(x);    }    return x;  }};int main() {  std::cout << EighthRoot(1000.0) << '\n';}
Quote:Original post by fluke
i tried sqrt(sqrt(sqrt(x))); and it actually offered a slight performance boost.
In that case then, besides getting an actual octic rooter function, you can be looking into finding optimised/approximated sqrt functions - of which I'm sure there are plenty.

Of course, if x has any usable properties then there could be far simpler optimisations. For example, if x only ever takes on a handful of different values, then you can perform the calculations in advance and just conditionally select the correct one. I suspect that's hoping for too much though.
This is a bit better.
#include <iostream>#include <cmath>class EighthRoot {  double input;  double f_over_f_prime(double x) {    double x2=x*x;    double x4=x2*x2;    double x8=x4*x4;    return .125*x*(1.0-input/x8);  }public:  EighthRoot(double input) : input(input) {  }  operator double () {    // Find first guess by basically shifting the exponent part of the    // double by 3 (I don't really care much about the rest of the double)    union {      long long as_long_long;      double as_double;    } hackish_union;    hackish_union.as_double = input;    hackish_union.as_long_long >>= 3;    hackish_union.as_long_long += 0x37F2000000000000ll;    // I am sure one could find a slightly better constant        double x = hackish_union.as_double;    x -= f_over_f_prime(x);    x -= f_over_f_prime(x);    // Repeat more times for extra accuracy    return x;  }};int main() {  std::cout << EighthRoot(1000.0) << '\n';}
Ah, nevermind. What I posted is only marginally faster than sqrt(sqrt(sqrt(x))) and significantly less precise. A better magic constant is 0x37f1504f333fa37ell, but it's not much better.
fiddled with compiler settings? VC++ 7.1/9/10 are pretty capable...

http://msdn.microsoft.com/en-us/library/aa289157(VS.71).aspx
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website

This topic is closed to new replies.

Advertisement