Members - Reputation: 1969
Posted 26 October 2004 - 08:39 PM
Crossbones+ - Reputation: 2002
Posted 26 October 2004 - 08:52 PM
I don't think you should worry about it too much :)
Members - Reputation: 366
Posted 26 October 2004 - 10:32 PM
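float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;          // reinterpret the float's bits as an integer
    i = 0x5f3759df - (i >> 1);  // initial guess from the magic constant
    x = *(float*)&i;            // reinterpret the bits as a float again
    x = x*(1.5f - xhalf*x*x);   // one Newton-Raphson iteration
    return x;
}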
And I got it from
Yes, it breaks when the float representation changes. Use the #defines!
You usually never need it. But if you need a lot of sqrts, use the above code.
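For anyone worried about the pointer casts in the snippet above, here is a minimal sketch of the same trick that copies the bits with memcpy instead. It assumes a 32-bit IEEE 754 float, and the name inv_sqrt_approx is just a placeholder for illustration, not something from the original source.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 value. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Approximates 1/sqrt(x) for positive, finite x. */
float inv_sqrt_approx(float x)
{
    const float xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);           /* read the float's bit pattern */
    i = 0x5f3759dfu - (i >> 1);         /* initial guess from the magic constant */
    memcpy(&x, &i, sizeof x);           /* turn the bits back into a float */
    return x * (1.5f - xhalf * x * x);  /* one Newton-Raphson iteration */
}
```

Compilers generally turn the two memcpy calls into plain register moves, so this should not cost anything compared with the cast version.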
Banned - Reputation: 100
Posted 27 October 2004 - 04:03 AM
I suggest to all people that have a problem with using sqrt that they put a loop in their code somewhere that keeps calling sqrt. You can typically put it in thousands of times before you even see a FPS drop.
Posted 27 October 2004 - 07:35 AM
I compared three versions: 1 / sqrtf(), the code from the above post, and the code from Doom 3. Doom 3 uses the same principle, but calculates the seed on the fly using some constants and a lookup table.
Amazingly enough (and unless I messed up somewhere) DoomInvSqrt() returned exactly the same results as 1 / sqrtf(). So no accuracy problems there. And Q3InvSqrt() was plenty close.
I just did a brute-force test - 10,000,000 calls to each function. The Q3 and Doom versions were about 1.8 times as fast as 1 / sqrt().
I would use the Doom version, but I imagine the code is copyrighted, and I don't understand it well enough to recreate it for myself. But I suppose the other code is fair game.
Members - Reputation: 1412
Posted 27 October 2004 - 08:45 AM
The only way you'll get speedup is using the inverse square root formula to actually calculate the inverse square root, and iirc the win was barely one.
Well, you could also get a win using one of the functions that returns much less precision than sqrtf, but in that case you're not really comparing like things.
Moderators - Reputation: 1361
Posted 28 October 2004 - 04:02 AM
In addition to the other thread created within the past few days on the same subject, I found many, many, MANY discussions on fast sqrt dating back to 2001. I don't see much new in this thread. If anyone has a compelling argument why the thread should remain open, please send me a private message and make your case strongly. The topic of fast sqrt has been covered to death and I will double check any argument to reopen the thread against the archives to see if the argument holds water.
Moderators - Reputation: 1361
Posted 28 October 2004 - 05:59 AM
I reopened the thread just to post this example from superpig. It seems like a good contribution that may be useful. The thread will be closed again immediately.
From superpig via private message
SSE has both SQRT and RSQRT instructions, but you don't really get much benefit unless you're doing four of them at once. Say you want to get the lengths of four vectors, stored as a structure of arrays:
__declspec(align(16)) struct blockOfFourVectors
{ float x[4], y[4], z[4], lengths[4]; } data; // layout inferred from the offsets used below
movaps xmm0, [data + 0x00] ; load x components
movaps xmm1, [data + 0x10] ; load y components
movaps xmm2, [data + 0x20] ; load z components
; square each component
mulps xmm0, xmm0
mulps xmm1, xmm1
mulps xmm2, xmm2
; sum them into xmm0
addps xmm0, xmm1
addps xmm0, xmm2
; sqrt to get length
sqrtps xmm0, xmm0
; save out
movaps [data + 0x30], xmm0
That would calc all four lengths into data.lengths. For just a single vector it's not really worthwhile (and unless you store that vector in a SoA, would require a load of shuffling).
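For those who prefer intrinsics over inline assembly, the same idea might look roughly like this in C. The struct layout and the names BlockOfFourVectors and four_lengths are assumptions made for the example, based on the offsets in the listing above. Unaligned loads are used so the struct does not have to be 16-byte aligned, and there is a scalar fallback for builds without SSE.

```c
#include <assert.h>
#include <math.h>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* Structure of arrays: four vectors stored component by component (assumed layout). */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
    float lengths[4];
} BlockOfFourVectors;

/* Computes the lengths of all four vectors into b->lengths. */
void four_lengths(BlockOfFourVectors *b)
{
    assert(b);
#ifdef HAVE_SSE
    __m128 x = _mm_loadu_ps(b->x);                /* load x components */
    __m128 y = _mm_loadu_ps(b->y);                /* load y components */
    __m128 z = _mm_loadu_ps(b->z);                /* load z components */
    __m128 sum = _mm_add_ps(_mm_mul_ps(x, x),     /* x*x + y*y + z*z */
                 _mm_add_ps(_mm_mul_ps(y, y), _mm_mul_ps(z, z)));
    _mm_storeu_ps(b->lengths, _mm_sqrt_ps(sum));  /* four square roots at once */
#else
    for (int k = 0; k < 4; ++k)                   /* scalar fallback */
        b->lengths[k] = sqrtf(b->x[k] * b->x[k] + b->y[k] * b->y[k] + b->z[k] * b->z[k]);
#endif
}
```

If the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps correspond more closely to the movaps instructions in the original listing.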