Sure. I have plans for tonight but I can do it soon.
You can make this code more precise by using a higher-degree polynomial in x2, and by picking better coefficients (I pretty much picked these by hand).
Any takers?
One thing that will make it faster and more accurate is to replace:
static const float pi_halves = float(std::atan(1.0)*2.0);
with:
const float pi_halves = 1.5707963267948966192313216916398f; // float(std::atan(1.0)*2.0)
Even using a high-precision calculator does not give exactly PI/2 when I go through atan(1)*2. Given the precision of floats, it will likely end up being the same constant either way, but writing the literal ensures it is correct to more digits and guarantees we have exactly the best constant. It also avoids the possibility that a given compiler does not implement atan() as an intrinsic that is evaluated at compile time.
The second and more important point is to remove static. With static, the compiler adds a lot of code to initialize pi_halves in a thread-safe way. That will always add a branch (critical to avoid in performance-sensitive code such as this) and may add instruction-cache misses due to the extra code that exists for locking, assigning the value, setting a flag, and unlocking.
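To make the difference concrete, here is a minimal sketch of the two variants side by side. The function names pi_halves_static, pi_halves_literal, and check_pi_halves are hypothetical and exist only for this illustration; the surrounding approximation code from the thread is not reproduced. Whether a compiler really emits a guarded initialization for the first variant depends on the compiler and its optimization settings, so treat this as something to check in your own disassembly rather than a guarantee.

```cpp
#include <cassert>
#include <cmath>

// Variant A: function-local static with a run-time initializer.
// std::atan is not guaranteed to be usable in a constant expression, so the
// compiler may have to emit a guarded, thread-safe one-time initialization,
// which means a check on every call to this function.
inline float pi_halves_static() {
    static const float pi_halves = float(std::atan(1.0) * 2.0);
    return pi_halves;
}

// Variant B: plain const initialized from a literal, as suggested above.
// There is nothing to initialize at run time, so no guard is needed.
inline float pi_halves_literal() {
    const float pi_halves = 1.5707963267948966192313216916398f;
    return pi_halves;
}

// Hypothetical sanity check: at float precision both variants are expected
// to round to the same value.
inline void check_pi_halves() {
    assert(pi_halves_static() == pi_halves_literal());
}
```

Comparing the generated assembly of the two functions (for example with -O2) is the easiest way to see whether the guard and branch are actually present on your compiler.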
I see a few other things that could impact performance as well.
And I will tackle accuracy soon.
L. Spiro