Fastest way to solve a quadratic equation?

Started by
12 comments, last by Ravyne 16 years, 9 months ago
Hi, I was just wondering - what is the fastest algorithm to solve a quadratic equation in c++ or a similar language (assuming plain IEEE floating point)? For that matter, is there a generally fastest algorithm at all, or is it just a matter of case-specific optimizations? For example, I found that at least in some cases, by using b' = b/2a, and c' = c/a I can save a few floating point operations. Then the equation reduces to: x1,2 = -b' +/- sqrt(b'2 - c') and if one can calculate b' and c' directly, this saves some computation. (And of course, checking if b'2 - c <= 0 helps avoid unnecessary sqrt)
Advertisement
I would imagine the compiler does a pretty good job of optimizing stuff like this. I can't imagine too many cases where the quadratic equation would be a bottleneck in a program - and if it were improving performance would likely have to be on a case by case basis.
If you really need it you could use a short taylor series expansion of the sqrt to speed things up (at the cost of precision, though).
"Voilà! In view, a humble vaudevillian veteran, cast vicariously as both victim and villain by the vicissitudes of Fate. This visage, no mere veneer of vanity, is a vestige of the vox populi, now vacant, vanished. However, this valorous visitation of a bygone vexation stands vivified, and has vowed to vanquish these venal and virulent vermin vanguarding vice and vouchsafing the violently vicious and voracious violation of volition. The only verdict is vengeance; a vendetta held as a votive, not in vain, for the value and veracity of such shall one day vindicate the vigilant and the virtuous. Verily, this vichyssoise of verbiage veers most verbose, so let me simply add that it's my very good honor to meet you and you may call me V.".....V
The algorithm itself is trivial. There's not much you can optimize about it.

But, if you need to perform this on more data, then you could obviously vectorize everything, and use the SIMD or GPU or something similar.

But pray tell, why is solving 10 million quadratic equations per second too slow?
Quote:Original post by cshowe
I can't imagine too many cases where the quadratic equation would be a bottleneck in a program - and if it were improving performance would likely have to be on a case by case basis.
Quote:Original post by Antheus
...

I'm currently working on a ray tracer. There many of the ray-primitive intersection tests require to solve a quadratic equation, and since those tests are the heart of the ray tracer, I would imagine any improvement in speed here would be noticeable.

Granted though, I know that this may not be the biggest bottleneck, and I will probably get bugger gains by reducing the number of tests using bounding volume hierarchies and the like. At the moment it is more of an intellectual pursuit than a real optimization effort. :-)


Antheus, using SIMD here does sound like a good idea, though maybe tricky.. I'll have to remember that for when I get to the optimization stage.

At the moment I have a 5 object scene rendering to a 800x600 window, with just primary and shadow rays, completely unoptimized (too early yet). That's about 5 million intersection tests, and it takes about 2-3 seconds to render a frame on my 1.7Ghz pc. Though not all of these use a quadratic equation, and I'm sure there are bottlenecks in other places, on a wild guess I'd say the quadratic equations do contribute..

Quote:Original post by joanusdmentia
If you really need it you could use a short taylor series expansion of the sqrt to speed things up (at the cost of precision, though).

I suspect that on modern processors an sqrt would be faster than a taylor series, even a short one. But I admit this is an interesting idea that I haven't thought of. :-)
Quote:I would imagine any improvement in speed here would be noticeable.


Not really. As you increase number of objects, the number of ray bounces increases as well. Without some basic spatial portioning for collision tests this will kill you algorithmically.

You'd also be better off asking this in the graphics forum, mentioning it's for ray-tracing from the start.
Quote:Original post by Antheus
Quote:I would imagine any improvement in speed here would be noticeable.

Not really. As you increase number of objects, the number of ray bounces increases as well. Without some basic spatial portioning for collision tests this will kill you algorithmically.

True, but once the number of tests is reduced to a minimum, the next major impact should be the speed of each such test, since in an optimized ray tracer a large portion of the time will be spent doing those tests. And this is where the quadratic equation comes in.
Quote:You'd also be better off asking this in the graphics forum, mentioning it's for ray-tracing from the start.

For ray-tracing specific solutions, yes. But since I'm still far from a true optimization phase, I was more interested in the generic question. Hence I posted here without mentioning the actual use in my case. I only mentioned ray tracing as an example of when the quadratic equation may play a large part in a program.
Quote:Original post by technobot2
Hi, I was just wondering - what is the fastest algorithm to solve a quadratic equation in c++ or a similar language (assuming plain IEEE floating point)? For that matter, is there a generally fastest algorithm at all, or is it just a matter of case-specific optimizations?

For example, I found that at least in some cases, by using b' = b/2a, and c' = c/a I can save a few floating point operations. Then the equation reduces to:
x1,2 = -b' +/- sqrt(b'2 - c')
and if one can calculate b' and c' directly, this saves some computation. (And of course, checking if b'2 - c <= 0 helps avoid unnecessary sqrt)
I'm not sure if the rearranged equation is equivalent to
x = (-b +/- sqrt(b2 - 4ac)) / 2a
but I could be wrong.
Where did you learn that rearrangment from - Links please?
Have you considered the geometric method of ray/sphere intersection testing?

Sorry it has been a number of years since I wrote my RTRT.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by technobot2
True, but once the number of tests is reduced to a minimum, the next major impact should be the speed of each such test, since in an optimized ray tracer a large portion of the time will be spent doing those tests. And this is where the quadratic equation comes in.

Then do the algorithmic optimization first. If you end up doing 1000 solutions per frame you shouldn't waste time "optimizing" what the compiler already does for you. Premature optimization is the root of all evil, only when you have done algorithmic optimizations and confirmed that the solution of quadratic equations is a bottleneck should you try to optimize it. And after doing any kind of what you believe to be optimizations make sure to test whether it speeds up your code, doesn't affect it or actually makes it slower.

You talk about saving instructions, but according to some quick counting you do the same number of square root operations, the same number of subtractions, one more division and two less multiplications. According to my copy Intel's optimization manual a single precision division (float) is three times as slow as a multiplication when it comes to latency and 11.5-15 times as slow for throughput on P4s. For better precision it just becomes worse and the throughput of a division can be up to 22 times as slow as a multiplication (using extended precision). So saving two multiplications at the expense of one division seems to be a bad tradeoff, but with modern CPUs this kind of cycle counting can't be used as a performance measure.

If you really need to optimize a piece of code like this you need to take pipelining, prefetching and caching into account, but a compiler should be able to do this much better than you.

Also are you sure you even need the general form? Often problems can be reduced to a simpler form. For instance you might know that your discriminant is always nonnegative and can therefore omit the nasty conditional.

Quote:I'm not sure if the rearranged equation is equivalent to
x = (-b +/- sqrt(b2 - 4ac)) / (2a)
but I could be wrong.

It follows from elementary algebra.
-b' +/- sqrt(b'2 - c') =
-b/2a +/- sqrt((b/2a)2 - c/a) =
-b/2a +/- sqrt(b2/4a2 - 4ac/4a2) =
-b/2a +/- sqrt( (b2 - 4ac)/4a2) =
-b/2a +/- sqrt(b2 - 4ac)/sqrt(4a2) =
-b/2a +/- sqrt(b2 - 4ac)/2a =
(-b +/- sqrt(b2 - 4ac))/2a

How close an approximation it's in actual floating point arithmetic I don't know, but if floating point numbers perfectly represented the real numbers then it would be valid.
Quote:Hi, I was just wondering - what is the fastest algorithm to solve a quadratic equation in c++ or a similar language (assuming plain IEEE floating point)? For that matter, is there a generally fastest algorithm at all, or is it just a matter of case-specific optimizations?


I'm not sure how much of an issue it is but the traditional form can have a number of issues with floating point precision, in particular if b^2 is much larger then 4ac or if b^2 - 4ac is very close to 0, if it is a problem then the following form may be of use (or you may be able to manipulate this form to reduce the number of operations). Just a note that these issues are unlikely to appear on x86 architectures if the compiler performs the whole operation on the FP stack due to the extended 80 bit precision that the processor uses for intermediate results.
    -b +/- sqrt(b^2 - 4ac)   -b -/+ sqrt(b^2 - 4ac)x = ---------------------- * ----------------------            2a               -b -/+ sqrt(b^2 - 4ac)          b^2 - (b^2 - 4ac)  = -----------------------------    2a * (-b -/+ sqrt(b^2 - 4ac))              2c  = ----------------------    -b -/+ sqrt(b^2 - 4ac)
Quote:
I'm not sure if the rearranged equation is equivalent to
x = (-b +/- sqrt(b^2 - 4ac)) / 2a
but I could be wrong.
Where did you learn that rearrangment from - Links please?
I'ts easy to verify,
x = -b' +/- sqrt(b'^2 - c') where b' = b / 2a and c' = c / a  = -(b / 2a) +/- sqrt((b/2a)^2 - (c / a))  = -(b / 2a) +/- sqrt(b^2/4a^2 - c/a)  = -(b / 2a) +/- sqrt(1 / (4 * a^2) * (4 * a^2) * (b^2/4a^2 - c/a))  = -(b / 2a) +/- sqrt(1 / (4 * a^2) * (b^2 - 4ac))  = -(b / 2a) +/- sqrt(1/(4 * a^2)) * sqrt(b^2 - 4ac)  = -(b / 2a) +/- 1/2a * sqrt(b^2 - 4ac)  = (-b +/- sqrt(b^2 - 4ac)) / 2a


Edit: To slow :)
Edit: If you do decide to go with SSE I'd apply partial fractions to the above result and get it into the form a' + b'/sqrt(c') so that you can take advantage of the limited precision reciprocal square root instruction and compare to see if that gives improved performance.

[Edited by - Julian90 on July 9, 2007 6:49:28 AM]

This topic is closed to new replies.

Advertisement