• ### Announcements

#### Archived

This topic is now archived and is closed to further replies.

# A mad fast SquareRoot method

## Recommended Posts

ageny6    122
I know that for a fact the Squareroot method is really slow! But, unfortunatly, I got to this point where I need it I there a substitute method that is much faster than the SquareRoot method given in the Math.h library (or the sqr function if you are using VB)? My signature used to suck. But it''s much better now.

##### Share on other sites
Lorenzo    122
if you have a new compiler, than float sqrt(float) in your math.h is fast enough. Period. there are a lot of impl using integers in the forums, but integer arith ceased to be faster in 1995. FPU is a lot faster now.

##### Share on other sites
aboeing    180
and if you dont have a new compiler download support libraries from amd and intel which use the faster sqrt functions that their processors provide. [or u could use asm emit statements...]

im not to sure if interger/bit hacks are slower these days, but i wouldn''t be suprised.

##### Share on other sites
ageny6    122
quote:
Original post by Lorenzo
if you have a new compiler, than float sqrt(float) in your math.h is fast enough.

How fast enough?

---------------------------------------------
And is there an algorithm at the higher level language that performs the squareroot quickly?

My signature used to suck. But it''s much better now.

##### Share on other sites
antareus    576
Well, what makes you think its slow?

##### Share on other sites
Guest Anonymous Poster
If you are using sqrt for distances try an approximation. http://www.flipcode.com/articles/article_fastdistance.shtml

sometimes you can try getting rid of the square root.

( a * a > (b * b))

vs

(sqrt(a * a) > b )

##### Share on other sites
This is called micro-optimization, and it's dumb. Instead, if sqrt proves to be too slow, do something about it. Until it does, don't worry about it. Most often, the "doing something" will be optimizing your algorithm at a higher level, which does not include finding a faster sqrt. As someone said, any recent ('99+) compiler will have a plenty fast sqrt.

"How fast enough?" Fast enough that you don't need to worry about it. Just use it.

Later,
ZE.

[edited by - zealouselixir on October 13, 2003 4:30:51 PM]

##### Share on other sites
Promit    13246
I wrote a new sqrt based on MacLaurin expansions (along with sin and cos). Never needed them, finally. No noticeable gain in anything.

Debating using a lookup table, but the real thing is this--you have way more important things to speed up than your sqrt routine. Don''t even worry about it until you have a playable game that is cool enough to show off to everybody.

##### Share on other sites
kordova    140
Gives the inverse of the square root (floats only):

float InvSqrt (float x){    float xhalf = 0.5f*x;    int i = *(int*)&x;    i = 0x5f3759df - (i >> 1);    x = *(float*)&i;    x = x*(1.5f - xhalf*x*x);    return x;}

That was on these forums a while ago, credit to Carmack but really discovered by someone else. I need sleep now.

##### Share on other sites
eldee    122
quote:
Original post by kordova
That was on these forums a while ago, credit to Carmack but really discovered by someone else. I need sleep now.

the Newton-Raphson method.. although, the theory behind it is a hundred years old if im not mistaken.

-eldee
;another space monkey;
[ Forced Evolution Studios ]

Do NOT let Dr. Mario touch your genitals. He is not a real doctor!

##### Share on other sites
Ronin Magus    142
Oddly enough, I was just showing that function to a buddy of mine the other day.

daveandrews.org, soon to be home to a GAPI Sprite Engine.

##### Share on other sites
Guest Anonymous Poster
quote:
Original post by eldee
quote:
Original post by kordova
That was on these forums a while ago, credit to Carmack but really discovered by someone else. I need sleep now.

the Newton-Raphson method.. although, the theory behind it is a hundred years old if im not mistaken.

-eldee
;another space monkey;
<small>[ <a href="http://www.forced-evolution.com/" target="_blank">Forced Evolution Studios </a> ] </small></p>
<p><a href="http://www.forced-evolution.com" target="_blank"><img src="http://www.forced-evolution.com/images/skullsig.gif" border="0" alt="::evolve::"></a>
<SPAN CLASS=smallfont>
Do NOT let Dr. Mario touch your genitals. He is not a real doctor!
</span>

The Newton-Rhapson method is old, but the trick is performing the bit shift on the integer interpretation of the bit configuration for a float.

##### Share on other sites
Yup, Carmack is credited (probably rightly) with devising the bitshift method and determining the magic number that makes it converge the fastest.

Heh, and yeah, Newton-Raphson is a good deal more than a century old.

##### Share on other sites
> if you have a new compiler, than float sqrt(float) in your math.h is fast enough. Period.
I''ll give credit to the compiler (ICC 7.1) for turning the sqrt into fsqrt (it''s an intrinsic, obviously ), but it still takes 27(!) clocks. That''s quite slow.

quote:
there are a lot of impl using integers in the forums, but integer arith ceased to be faster in 1995. FPU is a lot faster now.

I agree, especially given SSE/3DNow! sqrt instructions. fsqrt is easy to beat, though.
Oh BTW - you can boost fsqrt speed to 19 clocks by setting FPU precision to single.

quote:
and if you dont have a new compiler download support libraries from amd and intel which use the faster sqrt functions that their processors provide. [or u could use asm emit statements...]

Using SSE sqrtss is good if you can assume CPU support, but you still pay for moving into the SSE regs. That doesn''t work so well on older Athlons - all FPU regs are invalidated by 3DNow! instructions.

> Debating using a lookup table
Heh, always laugh on hearing sqrt LUT since Nvidia''s fastmath implementation, which fills L2 with crap.

quote:
Yup, Carmack is credited (probably rightly) with devising the bitshift method and determining the magic number that makes it converge the fastest.

hmm, a derivation on magic-software.com shows that the approximation of the mantissa is underoptimal.

##### Share on other sites
ageny6    122
quote:
Original post by eldee
the Newton-Raphson method.. although, the theory behind it is a hundred years old if im not mistaken.

But ins''t the squareroot method provided in the math.h supposed to be faster than the Newton-Raphson method?

My signature used to suck. But it''s much better now.

##### Share on other sites
Ronin Magus    142
quote:
Original post by Jan Wassenberg

I''ll give credit to the compiler (ICC 7.1) for turning the sqrt into fsqrt (it''s an intrinsic, obviously ), but it still takes 27(!) clocks. That''s quite slow.

I have a question.. is this not 27 clocks on the FPU, rather than the CPU? If it is on the FPU, I see it as negligible..

daveandrews.org, soon to be home to a GAPI Sprite Engine.

##### Share on other sites
> But ins''t the squareroot method provided in the math.h supposed to be faster than the Newton-Raphson method?
Carmack''s rsqrt is 17 clocks by my quick count (+4 for the multiply to make it sqrt), but uses some integer and mem access slots.
3DNow!''s (and SSE) hardware N-R implementations wipe the floor with both: 3 clocks for an initial rsqrt approximation good to 14 bits (15 for 24 bits).

> I have a question.. is this not 27 clocks on the FPU, rather than the CPU? If it is on the FPU, I see it as negligible..
The FPU has been integrated into the CPU since the 486. What''s the difference? Latency is latency.

##### Share on other sites
ageny6    122
quote:
Original post by kordova
Gives the inverse of the square root (floats only):

float InvSqrt (float x){    float xhalf = 0.5f*x;    int i = *(int*)&x;    i = 0x5f3759df - (i >> 1);    x = *(float*)&i;    x = x*(1.5f - xhalf*x*x);    return x;}

if it gives you the inverse of the squareroot, can''t you just do something like:
float x;x = 1/InvSqrt(23.4);

My signature used to suck. But it''s much better now.

##### Share on other sites
Enigma    1410
quote:
Original post by ageny6
if it gives you the inverse of the squareroot, can''t you just do something like:
   float x;x = 1/InvSqrt(23.4);

You could, but it would be a really bad idea!

Carmack''s inverse square root is much faster to compute that 1 / sqrt(num). This is because 1 / sqrt(num) requires a call to sqrt (slow) and a floating point division (slow). Using 1 / InvSqrt(num) requires a call to InvSqrt and a floating point division (slow) . This takes longer than just a pure call to sqrt(num).

Enigma

##### Share on other sites
davepermen    1047
sqrt(x) == x*rsq(x) with rsq == the inverse / reverse square root approximation of carmack..

the fastest way for a sqrt approx.

If that''s not the help you''re after then you''re going to have to explain the problem better than what you have. - joanusdmentia

davepermen.net

##### Share on other sites
ageny6    122
quote:
Original post by davepermen
sqrt(x) == x*rsq(x) with rsq == the inverse / reverse square root approximation of carmack..

the fastest way for a sqrt approx.

Not sure I understand. rsq == the inverse/ reverse square root? What is the reverse square root?

My signature used to suck. But it''s much better now.

##### Share on other sites
davepermen    1047
the same as inverse square root.. its the way it got named in pixelshaders,vertexshaders,fragmentprograms,pixelprograms, as it is a such often used instruction there..

the inverse square root would be the square.. :D or so..

its just a name.

If that''s not the help you''re after then you''re going to have to explain the problem better than what you have. - joanusdmentia

davepermen.net

##### Share on other sites
The trick is using babylonian bisection or a reduced form of tangent approximation(differential) and then trying to find the square root of numbers between 1 and less than 2. Using some sophisticated method (maybe a look up table) create a very good initial approximation. Then 2-3 iterations, all expanded and optimised should fulfill the precision of a 7 digit-passed the desimal (base 10) float, which is what''s used in openGL.

##### Share on other sites
ageny6    122
quote:
The trick is using babylonian bisection or a reduced form of tangent approximation(differential) and then trying to find the square root of numbers between 1 and less than 2. Using some sophisticated method (maybe a look up table) create a very good initial approximation. Then 2-3 iterations, all expanded and optimised should fulfill the precision of a 7 digit-passed the desimal (base 10) float, which is what's used in openGL.

Ins't that slower than Carmack's inverse square root with a multiplication of the number being squarerooted?

My signature used to suck. But it's much better now.

btw: what is this on probation thing?

[edited by - ageny6 on October 15, 2003 12:38:17 PM]

##### Share on other sites
ageny6    122
Wait a minute:

quote:
Original post by kordova
Gives the inverse of the square root (floats only):

float InvSqrt (float x){    float xhalf = 0.5f*x;    int i = *(int*)&x;    i = 0x5f3759df - (i >> 1);    x = *(float*)&i;    x = x*(1.5f - xhalf*x*x);    return x;}

what is
 i = 0x5f3759df - (i >> 1);
? What is the purpose of the hexadecimal?

My signature used to suck. But it''s much better now.