Looking for fast c++ math library

Started by
9 comments, last by Paradigm Shifter 10 years, 8 months ago

Hello,

I'm making real-geometry grass class for my engine and at the certain point I need to calculate spectators distance to each grass array. The problem is that when I implement standart c++ functions like sqrt and pow for the distance calculation the engine can render 10 times less grass than without it.

Is there any lightning fast c++ math library that can do basic square root and power functions ?

Thanks in advance :D

Advertisement

An external library is not going to make your performance problems magically go away. Computing square roots is expensive, no matter how you do it, so you should probably try to avoid it as much as possible.

Code like this:


  if (distance(camera, blade) >= 10.0f)

can be replaced with


  if (distance_squared(camera, blade) >= 100.0f)

Computing the distance squared doesn't require calling sqrt.

If you are using sqrt and pow in situations where you don't know how to eliminate the calls, post some details of what you are doing and we can probably suggest some alternatives.

For fast square root and FloatToInt you can use this:

http://www.nvidia.com/object/fast_math_routines.html

Use SSE intrinsics?

__m128 x = _mm_set_ps(3*3, 2*2, 4*4, 8*8);

__m128 y = _mm_sqrt_ps(x); // sqrt(x)

__m128 z = _mm_mul_ps(y, y); // sqrt(x) * sqrt(x)

Or use a library like Eigen (http://eigen.tuxfamily.org).

Edit: I don't know how much faster using SSE is (if at all). Anyway, Alvaro has the best solution. An actual distance value is very rarely needed.

An external library is not going to make your performance problems magically go away. Computing square roots is expensive, no matter how you do it, so you should probably try to avoid it as much as possible.

Code like this:


  if (distance(camera, blade) >= 10.0f)

can be replaced with


  if (distance_squared(camera, blade) >= 100.0f)

Computing the distance squared doesn't require calling sqrt.

If you are using sqrt and pow in situations where you don't know how to eliminate the calls, post some details of what you are doing and we can probably suggest some alternatives.

Well I was thinking this way, but thought it wouldn't do any good. But now I'll try to do this thanks unless Kambiz pointed library will perform quite well :D


I don't know how much faster using SSE is (if at all)
If the computation/ALU time is the bottleneck, then SSE can give a 4x improvement :)

If memory-bandwidth is the bottleneck, then using SSE will make no difference at all (or can even hurt performance) :(

Algorithmic improvements like Álvaro posted are a great place to start.

@Tank202, if you want specific optimization advice, you could post your loop that deals with grass rendering ;)

As Alvaro indicates, the fastest square root is the one you don't perform.

I my code, for example, I have two separate functions:

point.DistanceFrom(otherPoint) which returns the actual distance from the other point, requiring a sqrt.

point.WithinDistanceOf(otherPoint, distance) which simply returns true if it's within the distance, and doesn't return a sqrt, and is just as accurate.

It's not super optimized, and I'm no mathmagician, but here's my code (for 2D games) as an example of the concept:


//Returns the absolute distance between this point and 'other'.
unsigned cPoint::DistanceFrom(const cPoint &other)
{
	//Pygorean's theorum: A^2 + B^2 = C^2
	int horizontal = (this->x - other.x); //(this->x - other.x) could be negative, but multiplying it by itself will always be positive anyway.
	horizontal *= horizontal;
	
	int vertical = (this->y - other.y);
	vertical *= vertical;
	
	return std::sqrt(horizontal + vertical);
}

//Returns true if we are within 'distance' of 'other'. This is faster than 'DistanceFrom', because it saves a sqrt().
bool cPoint::WithinDistanceOf(const cPoint &other, unsigned distance) const
{
	//Pygorean's theorum: A^2 + B^2 = C^2
	int horizontal = (this->x - other.x); //(this->x - other.x) could be negative, but multiplying it by itself will always be positive anyway.
	horizontal *= horizontal;
	
	int vertical = (this->y - other.y);
	vertical *= vertical;
	
	//Square the distance.
	distance *= distance;
	
	//Use the squared distance for comparison, instead of getting the square root.
	return (horizontal + vertical) <= distance;
}

But if you're still going slow, the fastest code is the code you don't call. Do you really need to calculate this every frame? Can you re-calculate them half on one frame, and the other half on the next frame? For the farther ones at least?

Also, are you recalculating the grass behind the player that's not on the screen?

http://macton.smugmug.com/gallery/8936708_T6zQX#!i=593426709&k=BrHWXdJ

is a good read if you want to write really efficient code, even if the author is a bit smug ;)

Nice and clever opinions :D But the best solution is neither nice nor clever... Its simple - I've removed square root, where I couldn't extract rooting I implemented nVidia fastmath.h library(Kambiz post), and the best performance was achieved only by removing standart c++ cmath header and therefore removing all pow functions. Now the power calculations are hard-coded.

Maybe It doesn' say much but I'm quite happy with the result as the engine now can initialise up to 10mil grass blades, render (I think) up to 1mil in real-time. And the best part of it that it will be dramatically optimised in the meantime :P So i think the numbers will multiply and my game graphics will be rather detail and nice.

Thank you all for your time and effort :D

I want to point out that std::pow is not necessarily slow, although it might be slow in a particular compiler. g++ will generate multiplications if you are using std::pow with a constant positive integer exponent, and it's quite clever about it.

For instance, this code:
  std::pow(x, 17);
turns into this assembly:
        movsd   8(%rsp), %xmm1
        movapd  %xmm1, %xmm0
        mulsd   %xmm1, %xmm0
        mulsd   %xmm0, %xmm0
        mulsd   %xmm0, %xmm0
        mulsd   %xmm0, %xmm0
        mulsd   %xmm1, %xmm0
So it computes x^17 using five multiplications, thusly:
x^2 = x * x
x^4 = x^2 * x^2
x^8 = x^4 * x^4
x^16 = x^8 * x^8
x^17 = x^16 * x

This topic is closed to new replies.

Advertisement