• Create Account

# Looking for fast c++ math library

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

### #1Tank202  Members   -  Reputation: 106

Like
1Likes
Like

Posted 16 August 2013 - 07:50 AM

Hello,

I'm making real-geometry grass class for my engine and at the certain point I need to calculate spectators distance to each grass array. The problem is that when I implement standart c++ functions like sqrt and pow for the distance calculation the engine can render 10 times less grass than without it.

Is there any lightning fast c++ math library that can do basic square root and power functions ?

### #2Álvaro  Crossbones+   -  Reputation: 8161

Like
4Likes
Like

Posted 16 August 2013 - 08:00 AM

An external library is not going to make your performance problems magically go away. Computing square roots is expensive, no matter how you do it, so you should probably try to avoid it as much as possible.

Code like this:

  if (distance(camera, blade) >= 10.0f)

can be replaced with

  if (distance_squared(camera, blade) >= 100.0f)

Computing the distance squared doesn't require calling sqrt.

If you are using sqrt and pow in situations where you don't know how to eliminate the calls, post some details of what you are doing and we can probably suggest some alternatives.

Edited by Álvaro, 16 August 2013 - 08:00 AM.

### #3Kambiz  Members   -  Reputation: 758

Like
1Likes
Like

Posted 16 August 2013 - 08:19 AM

For fast square root and FloatToInt you can use this:

http://www.nvidia.com/object/fast_math_routines.html

### #4_swx_  Members   -  Reputation: 682

Like
0Likes
Like

Posted 16 August 2013 - 09:57 AM

Use SSE intrinsics?

__m128 x = _mm_set_ps(3*3, 2*2, 4*4, 8*8);

__m128 y = _mm_sqrt_ps(x); // sqrt(x)

__m128 z = _mm_mul_ps(y, y); // sqrt(x) * sqrt(x)

Or use a library like Eigen (http://eigen.tuxfamily.org).

Edit: I don't know how much faster using SSE is (if at all). Anyway, Alvaro has the best solution. An actual distance value is very rarely needed.

Edited by _swx_, 16 August 2013 - 10:01 AM.

### #5Tank202  Members   -  Reputation: 106

Like
0Likes
Like

Posted 16 August 2013 - 10:12 AM

An external library is not going to make your performance problems magically go away. Computing square roots is expensive, no matter how you do it, so you should probably try to avoid it as much as possible.

Code like this:

  if (distance(camera, blade) >= 10.0f)

can be replaced with

  if (distance_squared(camera, blade) >= 100.0f)

Computing the distance squared doesn't require calling sqrt.

If you are using sqrt and pow in situations where you don't know how to eliminate the calls, post some details of what you are doing and we can probably suggest some alternatives.

Well I was thinking this way, but thought it wouldn't do any good. But now I'll try to do this thanks unless Kambiz pointed library will perform quite well

### #6Hodgman  Moderators   -  Reputation: 19621

Like
1Likes
Like

Posted 16 August 2013 - 10:17 AM

I don't know how much faster using SSE is (if at all)
If the computation/ALU time is the bottleneck, then SSE can give a 4x improvement

If memory-bandwidth is the bottleneck, then using SSE will make no difference at all (or can even hurt performance)

Algorithmic improvements like Álvaro posted are a great place to start.

@Tank202, if you want specific optimization advice, you could post your loop that deals with grass rendering ;)

### #7Servant of the Lord  Crossbones+   -  Reputation: 12835

Like
1Likes
Like

Posted 16 August 2013 - 10:23 AM

As Alvaro indicates, the fastest square root is the one you don't perform.

I my code, for example, I have two separate functions:

point.DistanceFrom(otherPoint) which returns the actual distance from the other point, requiring a sqrt.

point.WithinDistanceOf(otherPoint, distance) which simply returns true if it's within the distance, and doesn't return a sqrt, and is just as accurate.

It's not super optimized, and I'm no mathmagician, but here's my code (for 2D games) as an example of the concept:

//Returns the absolute distance between this point and 'other'.
unsigned cPoint::DistanceFrom(const cPoint &other)
{
//Pygorean's theorum: A^2 + B^2 = C^2
int horizontal = (this->x - other.x); //(this->x - other.x) could be negative, but multiplying it by itself will always be positive anyway.
horizontal *= horizontal;

int vertical = (this->y - other.y);
vertical *= vertical;

return std::sqrt(horizontal + vertical);
}

//Returns true if we are within 'distance' of 'other'. This is faster than 'DistanceFrom', because it saves a sqrt().
bool cPoint::WithinDistanceOf(const cPoint &other, unsigned distance) const
{
//Pygorean's theorum: A^2 + B^2 = C^2
int horizontal = (this->x - other.x); //(this->x - other.x) could be negative, but multiplying it by itself will always be positive anyway.
horizontal *= horizontal;

int vertical = (this->y - other.y);
vertical *= vertical;

//Square the distance.
distance *= distance;

//Use the squared distance for comparison, instead of getting the square root.
return (horizontal + vertical) <= distance;
}


But if you're still going slow, the fastest code is the code you don't call. Do you really need to calculate this every frame? Can you re-calculate them half on one frame, and the other half on the next frame? For the farther ones at least?

Also, are you recalculating the grass behind the player that's not on the screen?

Edited by Servant of the Lord, 16 August 2013 - 03:31 PM.

It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.

All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.                                                                                                                                                       [Need free cloud storage? I personally like DropBox]

### #8_swx_  Members   -  Reputation: 682

Like
0Likes
Like

Posted 16 August 2013 - 10:45 AM

http://macton.smugmug.com/gallery/8936708_T6zQX#!i=593426709&k=BrHWXdJ

is a good read if you want to write really efficient code, even if the author is a bit smug ;)

### #9Tank202  Members   -  Reputation: 106

Like
0Likes
Like

Posted 16 August 2013 - 03:10 PM

Nice and clever opinions But the best solution is neither nice nor clever... Its simple - I've removed square root, where I couldn't extract rooting I implemented nVidia fastmath.h library(Kambiz post), and the best performance was achieved only by removing standart c++ cmath header and therefore removing all pow functions. Now the power calculations are hard-coded.

Maybe It doesn' say much but I'm quite happy with the result as the engine now can initialise up to 10mil grass blades, render (I think) up to 1mil in real-time. And the best part of it that it will be dramatically optimised in the meantime So i think the numbers will multiply and my game graphics will be rather detail and nice.

Thank you all for your time and effort

### #10Álvaro  Crossbones+   -  Reputation: 8161

Like
2Likes
Like

Posted 16 August 2013 - 05:05 PM

I want to point out that std::pow is not necessarily slow, although it might be slow in a particular compiler. g++ will generate multiplications if you are using std::pow with a constant positive integer exponent, and it's quite clever about it.

For instance, this code:
  std::pow(x, 17);
turns into this assembly:
        movsd   8(%rsp), %xmm1
movapd  %xmm1, %xmm0
mulsd   %xmm1, %xmm0
mulsd   %xmm0, %xmm0
mulsd   %xmm0, %xmm0
mulsd   %xmm0, %xmm0
mulsd   %xmm1, %xmm0

So it computes x^17 using five multiplications, thusly:
x^2 = x * x
x^4 = x^2 * x^2
x^8 = x^4 * x^4
x^16 = x^8 * x^8
x^17 = x^16 * x

Edited by Álvaro, 16 August 2013 - 05:08 PM.

### #11Paradigm Shifter  Crossbones+   -  Reputation: 4067

Like
1Likes
Like

Posted 17 August 2013 - 01:54 AM

If you aren't raising to a constant power, but instead doing kx a lot (with k constant), it is faster to calculate c = loge(k) and doing exp(x * c) instead.

I've used that for particle systems before, where damping is applied with a variable frame rate, the damping factor is constant and it is raised to the power of delta time.

pow is implemented using ax = exp(x log a) anyway if x is non-integral so it is quicker to cut out the middleman and use base e from the off.

Edited by Paradigm Shifter, 17 August 2013 - 01:54 AM.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

PARTNERS