# Fastest way to solve a quadratic equation?

This topic is 3815 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi, I was just wondering - what is the fastest algorithm to solve a quadratic equation in c++ or a similar language (assuming plain IEEE floating point)? For that matter, is there a generally fastest algorithm at all, or is it just a matter of case-specific optimizations? For example, I found that at least in some cases, by using b' = b/2a, and c' = c/a I can save a few floating point operations. Then the equation reduces to: x1,2 = -b' +/- sqrt(b'2 - c') and if one can calculate b' and c' directly, this saves some computation. (And of course, checking if b'2 - c <= 0 helps avoid unnecessary sqrt)

##### Share on other sites
I would imagine the compiler does a pretty good job of optimizing stuff like this. I can't imagine too many cases where the quadratic equation would be a bottleneck in a program - and if it were improving performance would likely have to be on a case by case basis.

##### Share on other sites
If you really need it you could use a short taylor series expansion of the sqrt to speed things up (at the cost of precision, though).

##### Share on other sites
The algorithm itself is trivial. There's not much you can optimize about it.

But, if you need to perform this on more data, then you could obviously vectorize everything, and use the SIMD or GPU or something similar.

But pray tell, why is solving 10 million quadratic equations per second too slow?

##### Share on other sites
Quote:
 Original post by cshoweI can't imagine too many cases where the quadratic equation would be a bottleneck in a program - and if it were improving performance would likely have to be on a case by case basis.
Quote:
 Original post by Antheus...

I'm currently working on a ray tracer. There many of the ray-primitive intersection tests require to solve a quadratic equation, and since those tests are the heart of the ray tracer, I would imagine any improvement in speed here would be noticeable.

Granted though, I know that this may not be the biggest bottleneck, and I will probably get bugger gains by reducing the number of tests using bounding volume hierarchies and the like. At the moment it is more of an intellectual pursuit than a real optimization effort. :-)

Antheus, using SIMD here does sound like a good idea, though maybe tricky.. I'll have to remember that for when I get to the optimization stage.

At the moment I have a 5 object scene rendering to a 800x600 window, with just primary and shadow rays, completely unoptimized (too early yet). That's about 5 million intersection tests, and it takes about 2-3 seconds to render a frame on my 1.7Ghz pc. Though not all of these use a quadratic equation, and I'm sure there are bottlenecks in other places, on a wild guess I'd say the quadratic equations do contribute..

Quote:
 Original post by joanusdmentiaIf you really need it you could use a short taylor series expansion of the sqrt to speed things up (at the cost of precision, though).

I suspect that on modern processors an sqrt would be faster than a taylor series, even a short one. But I admit this is an interesting idea that I haven't thought of. :-)

##### Share on other sites
Quote:
 I would imagine any improvement in speed here would be noticeable.

Not really. As you increase number of objects, the number of ray bounces increases as well. Without some basic spatial portioning for collision tests this will kill you algorithmically.

You'd also be better off asking this in the graphics forum, mentioning it's for ray-tracing from the start.

##### Share on other sites
Quote:
Original post by Antheus
Quote:
 I would imagine any improvement in speed here would be noticeable.

Not really. As you increase number of objects, the number of ray bounces increases as well. Without some basic spatial portioning for collision tests this will kill you algorithmically.

True, but once the number of tests is reduced to a minimum, the next major impact should be the speed of each such test, since in an optimized ray tracer a large portion of the time will be spent doing those tests. And this is where the quadratic equation comes in.
Quote:
 You'd also be better off asking this in the graphics forum, mentioning it's for ray-tracing from the start.

For ray-tracing specific solutions, yes. But since I'm still far from a true optimization phase, I was more interested in the generic question. Hence I posted here without mentioning the actual use in my case. I only mentioned ray tracing as an example of when the quadratic equation may play a large part in a program.

##### Share on other sites
Quote:
 Original post by technobot2Hi, I was just wondering - what is the fastest algorithm to solve a quadratic equation in c++ or a similar language (assuming plain IEEE floating point)? For that matter, is there a generally fastest algorithm at all, or is it just a matter of case-specific optimizations?For example, I found that at least in some cases, by using b' = b/2a, and c' = c/a I can save a few floating point operations. Then the equation reduces to:x1,2 = -b' +/- sqrt(b'2 - c')and if one can calculate b' and c' directly, this saves some computation. (And of course, checking if b'2 - c <= 0 helps avoid unnecessary sqrt)
I'm not sure if the rearranged equation is equivalent to
x = (-b +/- sqrt(b2 - 4ac)) / 2a
but I could be wrong.
Have you considered the geometric method of ray/sphere intersection testing?

Sorry it has been a number of years since I wrote my RTRT.

##### Share on other sites
Quote:
 Original post by technobot2True, but once the number of tests is reduced to a minimum, the next major impact should be the speed of each such test, since in an optimized ray tracer a large portion of the time will be spent doing those tests. And this is where the quadratic equation comes in.

Then do the algorithmic optimization first. If you end up doing 1000 solutions per frame you shouldn't waste time "optimizing" what the compiler already does for you. Premature optimization is the root of all evil, only when you have done algorithmic optimizations and confirmed that the solution of quadratic equations is a bottleneck should you try to optimize it. And after doing any kind of what you believe to be optimizations make sure to test whether it speeds up your code, doesn't affect it or actually makes it slower.

You talk about saving instructions, but according to some quick counting you do the same number of square root operations, the same number of subtractions, one more division and two less multiplications. According to my copy Intel's optimization manual a single precision division (float) is three times as slow as a multiplication when it comes to latency and 11.5-15 times as slow for throughput on P4s. For better precision it just becomes worse and the throughput of a division can be up to 22 times as slow as a multiplication (using extended precision). So saving two multiplications at the expense of one division seems to be a bad tradeoff, but with modern CPUs this kind of cycle counting can't be used as a performance measure.

If you really need to optimize a piece of code like this you need to take pipelining, prefetching and caching into account, but a compiler should be able to do this much better than you.

Also are you sure you even need the general form? Often problems can be reduced to a simpler form. For instance you might know that your discriminant is always nonnegative and can therefore omit the nasty conditional.

Quote:
 I'm not sure if the rearranged equation is equivalent tox = (-b +/- sqrt(b2 - 4ac)) / (2a)but I could be wrong.

It follows from elementary algebra.
-b' +/- sqrt(b'2 - c') =
-b/2a +/- sqrt((b/2a)2 - c/a) =
-b/2a +/- sqrt(b2/4a2 - 4ac/4a2) =
-b/2a +/- sqrt( (b2 - 4ac)/4a2) =
-b/2a +/- sqrt(b2 - 4ac)/sqrt(4a2) =
-b/2a +/- sqrt(b2 - 4ac)/2a =
(-b +/- sqrt(b2 - 4ac))/2a

How close an approximation it's in actual floating point arithmetic I don't know, but if floating point numbers perfectly represented the real numbers then it would be valid.

##### Share on other sites
Quote:
 Hi, I was just wondering - what is the fastest algorithm to solve a quadratic equation in c++ or a similar language (assuming plain IEEE floating point)? For that matter, is there a generally fastest algorithm at all, or is it just a matter of case-specific optimizations?

I'm not sure how much of an issue it is but the traditional form can have a number of issues with floating point precision, in particular if b^2 is much larger then 4ac or if b^2 - 4ac is very close to 0, if it is a problem then the following form may be of use (or you may be able to manipulate this form to reduce the number of operations). Just a note that these issues are unlikely to appear on x86 architectures if the compiler performs the whole operation on the FP stack due to the extended 80 bit precision that the processor uses for intermediate results.
    -b +/- sqrt(b^2 - 4ac)   -b -/+ sqrt(b^2 - 4ac)x = ---------------------- * ----------------------            2a               -b -/+ sqrt(b^2 - 4ac)          b^2 - (b^2 - 4ac)  = -----------------------------    2a * (-b -/+ sqrt(b^2 - 4ac))              2c  = ----------------------    -b -/+ sqrt(b^2 - 4ac)
Quote:
 I'm not sure if the rearranged equation is equivalent tox = (-b +/- sqrt(b^2 - 4ac)) / 2abut I could be wrong.Where did you learn that rearrangment from - Links please?
I'ts easy to verify,
x = -b' +/- sqrt(b'^2 - c') where b' = b / 2a and c' = c / a  = -(b / 2a) +/- sqrt((b/2a)^2 - (c / a))  = -(b / 2a) +/- sqrt(b^2/4a^2 - c/a)  = -(b / 2a) +/- sqrt(1 / (4 * a^2) * (4 * a^2) * (b^2/4a^2 - c/a))  = -(b / 2a) +/- sqrt(1 / (4 * a^2) * (b^2 - 4ac))  = -(b / 2a) +/- sqrt(1/(4 * a^2)) * sqrt(b^2 - 4ac)  = -(b / 2a) +/- 1/2a * sqrt(b^2 - 4ac)  = (-b +/- sqrt(b^2 - 4ac)) / 2a

Edit: To slow :)
Edit: If you do decide to go with SSE I'd apply partial fractions to the above result and get it into the form a' + b'/sqrt(c') so that you can take advantage of the limited precision reciprocal square root instruction and compare to see if that gives improved performance.

[Edited by - Julian90 on July 9, 2007 6:49:28 AM]

##### Share on other sites
Quote:
 Original post by iMalcHave you considered the geometric method of ray/sphere intersection testing?

Yes. At least as far as counting instruction goes, the naive implementation of the algebraic method takes more. But if you cache your last c coefficient, assuming that several consecutive tests use the same ray origin (which can be arranged), and allow for normalized ray vectors only, thus ensuring also that a is always 1, then the algebraic method can be made to use less instructions. However, I'm not sure how that reflects on actual performance. And in any case, the quadratic equation is used not only for ray-sphere.

Quote:
 Original post by CTarThen do the algorithmic optimization first. If you end up doing 1000 solutions per frame you shouldn't waste time "optimizing" what the compiler already does for you. Premature optimization is the root of all evil, only when you have done algorithmic optimizations and confirmed that the solution of quadratic equations is a bottleneck should you try to optimize it. And after doing any kind of what you believe to be optimizations make sure to test whether it speeds up your code, doesn't affect it or actually makes it slower.

That's a very valid point, and I'm quite aware of that. But as I said earlier, atm it is more of an intellectual pursuit than a true optimization effort.
Quote:
 You talk about saving instructions, but according to some quick counting you do the same number of square root operations, the same number of subtractions, one more division and two less multiplications.

Not sure how you counted, but by my count the rearranged method takes exactly the same amount (and kind) of operations if you start from a,b,c (and assuming you do a'=1/a, b'=b*0.5*a', c'=c*a' rather than b'=b*0.5/a, c'=c/a or some other way). If you calculate 0.5b or even -0.5b directly, which is possible in some cases, you save a multiplication (and a negation in the latter case) - granted, that's not much at all. But if you can also reach c'=c/a and b'=b/2a or even b"=-a/2a directly, then you may save a few more operations, since x=-b'+sqrt(b'*b'-c') or x=b"+sqrt(b"*b"-c') takes less than x=(-b+sqrt(b*b-4*a*c)/(2*a) . Though I suppose that even then the savings are not very large, and the point you made about floating point precision is also important (that speckling I'm currently getting might be because of that, actually.. [rolleyes] I'll need to check that..).

Quote:
 Also are you sure you even need the general form? Often problems can be reduced to a simpler form. For instance you might know that your discriminant is always nonnegative and can therefore omit the nasty conditional.

I don't really need anything, given that I'm currently treating this as a theoretical question rather than a practical one. [smile] But I agree that case specific optimizations are possible, and perhaps easier, and the point about the discriminant is valid.

Quote:
 Original post by Julian90x = ... = 2c / (-b -/+ sqrt(b^2 - 4ac))If you do decide to go with SSE I'd apply partial fractions to the above result and get it into the form a' + b'/sqrt(c') so that you can take advantage of the limited precision reciprocal square root instruction and compare to see if that gives improved performance.

Interesting.

##### Share on other sites
Julian, could you show me how to get from 2c / (-b -/+ sqrt(b^2 - 4ac)) to a' + b'/sqrt(c') ? I tried to derive this myself, but I'm afraid sometimes math isn't my strongest side.. [rolleyes] Thanks.

##### Share on other sites
The trivial one's would be a' = 0 or b' = 0 ;), just kidding, I played around with it for a little bit and it doesn't seem to work out how I'd expected, it appears that you will always end up with a square root in either a' or b'. I threw a CAS at it just to see what came out and ended up with the following so....
a' = a'      2*c*sqrt(b^2-4*a*c) + a'*sqrt(b^2-4*a*c)*b - a'*b^2 + 4*a'*a*cb' = --------------------------------------------------------------                        -b + sqrt(b^2-4*a*c)

##### Share on other sites
Just an interesting note for the OP...

Awhile back I read an article on a real-time ray-tracing effort using the Cell processor of the PS3. Using 4 machines in a network, they were rendering a very complex car model with complex lighting and materials at a pretty good framerate. The article detailed their approach to optimizing the ray-tracer for SIMD operation in a very novel, and appearantly efficient, manner.

I don't have the link, but you may want to try your luck with google if it sounds interesting to you.

##### Share on other sites

This topic is 3815 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.