Sqrt() still slow?

Started by
7 comments, last by Zahlman 15 years, 9 months ago
Hi, I've recently been attending job interviews for positions in the games industry. In a recent interview I mentioned to the interviewer that I was never explicitly taught that the square root function was slow; it was just something I picked up. He then stopped me and told me that nowadays this is not the case, and hasn't been for a long time. He said that with modern optimisations and so forth the operation is scarcely more expensive than a division.

However, only a few months ago I attended an interview where I was asked to make some code more efficient, and the answer they were looking for was to remove the square root. So who is right? Google tells me that people still seem to worry about sqrt being slow, as the first few hits are all 'faster methods for finding sqrt()'. Additionally, if it's an iterative technique you're only going to be able to make it so much faster (and certainly its speed couldn't be converging on that of a division, if it requires one or more divisions per iteration).

I'd really appreciate it if someone in the know could help me out here. I'm leaning towards thinking this guy is wrong, but I don't see how someone in his position could be so wrong about something that is fairly elementary.
I guess it depends on the processor and the instructions you use. There is an approximating instruction for SSE (rsqrtss, for the reciprocal square root). It involves a table lookup and interpolation, so it is pretty fast.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);
Quote:Original post by Winegums
He then stopped me and told me that nowadays this is not the case, and hasn't been for a long time. He said that nowadays with optimisations and so forth that the operation is scarcely more expensive than a division.

I haven't seen any evidence to suggest that this is true. Precise sqrt by design is slow to calculate...

You can optimise it, but it might not be precise any more.

I believe Andre LaMothe published a sqrt function which was only a few lines of assembly and had a maximum error of only 5% for numbers in the range 0-1. There are also look-up tables, if you've got a small enough input range and/or enough memory...
The interviewer is right. sqrt() is "slow," not slow. Fundamentally, the operation is more complex than an addition or division.

Theoretically, the majority of modern desktop machines and consoles have fast instruction level support for the operation to some degree, which in combination with the fact that said processors are in general much faster than they were years ago, ameliorates the performance impact.

Practically, there are embedded or isolated platforms for which the above is not true.

Ideologically, very few professional software developers are in a position to actually give a damn until they have profiled and discovered that they have a serious bottleneck in their sqrt() function. Optimizing sqrt() is a micro-optimization -- it's not an algorithmic or fundamental design-level change, so you can just as easily do it later as you can now.

What the interviewer (hopefully) meant is mostly that last point: that sqrt() is no longer slow enough in general to agonize over up-front unless one knows one is on a platform where it matters. These types of platforms are less common.
I tend to avoid sqrt if I can (if I can do my calculations in squared distances or whatever the problem calls for). I don't lose any sleep if I have to put one in though. That's what profiling my code is for.
I'm aware that in the grand scheme of things it's probably going to be bad software design choices that choke your application, not too many expensive mathematical operations, but the fact that he likened the sqrt() function to a division set off alarms in my head.

Additionally, I was careful to ensure that he didn't just mean that the cost of sqrt() is now less expensive because we have faster processors (i.e. it may have taken 1ms to execute on a PC 10 years ago, and now it takes 200us, or whatever).

As far as I'm aware, companies are still working on writing more quickly converging sqrt functions (I think someone from Epic wrote an article on how he was working on theirs).

Quote:Original post by Winegums
However only a few months ago I attended an interview where I was asked to make some code more efficient, and the answer they were looking for was to remove the square root.


That indeed qualifies as an improvement. Concern for sqrt() performance is valid only if you need to calculate it many, many times in a very short interval (like a million times a second), and then only if accuracy is crucial (as there are much faster, less accurate alternatives). The octagon test, for instance, is the absolute fastest, and least accurate, approximation. It was used for lighting calculations in Diablo II (which is why light sources cast octagonal volumes). It involves one condition (a comparison), one division, one addition, and one assignment. Of course, the margin of error runs as high as 14 percent.

Faster variants are researched for the sake of faster hardware, especially video hardware. Trigonometric functions (which are also computed by converging approximations) fall into the same category and tend to be considerably slower than sqrt().


Quote:Original post by Winegums
Hi, I've recently been attending job interview for positions in the games industry. In a recent interview I mentioned to the interviewer that I was never explicitly taught that the square root function was slow, it was just something I picked up. He then stopped me and told me that nowadays this is not the case, and hasn't been for a long time. He said that nowadays with optimisations and so forth that the operation is scarcely more expensive than a division.

However only a few months ago I attended an interview where I was asked to make some code more efficient, and the answer they were looking for was to remove the square root.



They both could be right. If you are trying to find which item is closer, sqrt is unnecessary overhead even if it is fast. And with how slow RAM has become compared to processor speed, small functions like sqrt are essentially free once they are loaded into cache.
What you need to understand:

1) Computers are really, really fast. Elementary mathematics take incredibly little time to do, even if they're as complicated as sqrt().

2) Taking a square root is "slow" in the sense that it's a lot of work. Basically, every instruction on modern hardware is faster than on older hardware, but doing a square root still takes quite a lot more time than doing an add. (Although, the question "how long does it take to do X" no longer has a straight answer on modern CPUs, because of things like superscalar architecture, pipelines etc.)

3) "The fastest code is that which doesn't get executed". When you find that you do, in fact, need to make something faster, and when you have decided which part seems to be slowing things down (the bottleneck), one of the first steps is to try to find parts of the calculation that aren't really needed. If we need to find out how far away something is, that square root is essential, and using an approximation might cause things to fail. Getting the right answer takes precedence over getting a fast answer. If we only need to check which of two things is further away, then we don't need the square roots, and they represent a large chunk of the effort in the calculation. In that case, getting rid of it is a no-brainer.

4) Some optimizations are so simple that they get done more or less automatically. Many experienced developers wouldn't even get the idea to make the square root calls, when doing the length-comparison, in the first place. :) But there are lots of others where there's a common pattern for applying them, but it's actually a lot of work - and sometimes, turns out not to be an optimization at all. You see this with a lot of old hands who shun exceptions, preferring to use return codes and "out parameters" everywhere.

This topic is closed to new replies.
