 Fast sqrt
I've been looking through the various threads on this forum about fast sqrt and inverse sqrt approximations, and have also been looking at the relevant code in the Doom 3 sdk, and have a couple of questions.

1. I understand enough about the methods used to know that they rely on the specifics of the 32-bit IEEE floating-point representation. So I assume that means the code breaks if the representation changes? What happens on 64-bit machines like the Mac G5?

2. And is it worth it? Or would you generally be better off (and close to as fast) using the standard functions, which I assume are guaranteed to be portable and consistent from platform to platform?

(Please excuse me if the question seems naive, but it involves some areas I don't know much about, i.e. floating point architecture, etc.)

User Rating: 2006

There should be no need for it nowadays. And it's a fast inverse sqrt. Use the normal stuff, and see whether it bottlenecks your application. Nothing prevents you from having a float finvsqrt(float x) { ... } function in your math lib; the implementation is then left to you and the platform it's running on (so SSE/sqrt+div/Carmack for PC, and whatever for Macs).

I don't think you should worry about it too much :)
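
A minimal sketch of that wrapper idea, assuming an x86 compiler that ships `<xmmintrin.h>`. The name `finvsqrt` comes from the post above; the `HAVE_SSE_RSQRT` macro and the choice of one Newton refinement step are illustrative assumptions, not anything the thread prescribes.

```c
#include <math.h>

/* HAVE_SSE_RSQRT is a hypothetical macro made up for this sketch. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_RSQRT 1
#else
#define HAVE_SSE_RSQRT 0
#endif

/* Approximate 1/sqrt(x); callers never see which implementation is used. */
float finvsqrt(float x)
{
#if HAVE_SSE_RSQRT
    /* Hardware reciprocal-sqrt estimate, refined by one Newton step. */
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
#else
    /* Portable fallback: plain standard-library math. */
    return 1.0f / sqrtf(x);
#endif
}
```

Because the choice is hidden behind one function, swapping in a different implementation later only touches this file.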

User Rating: 1874

Look
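#include <stdint.h>
#include <string.h>

float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);      // read the float's bits without pointer-cast aliasing problems
    i = 0x5f3759df - (i >> 1);     // magic-constant first guess at 1/sqrt(x)
    memcpy(&x, &i, sizeof x);      // reinterpret those bits as a float again
    x = x*(1.5f - xhalf*x*x);      // one Newton step to refine the guess
    return x;
}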



And I got it from here.

Yes, it breaks when the representation changes. Use the #defines!

You usually never need it. But if you need a lot of sqrts, use the above code.
From,
Nice coder
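
One hedged way to make the "breaks when the representation changes" concern explicit is to have the build fail where `float` does not look like a 32-bit IEEE 754 single. The sketch below assumes a C11 compiler; `float_bits_look_ieee` is a hypothetical helper name. On the common 64-bit platforms `float` is still a 32-bit type, so a wider CPU word by itself should not change these checks.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Refuse to compile where float is not laid out like IEEE 754 single precision. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float must have IEEE 754 binary32 parameters");

/* Runtime spot check: returns 1 if known values have their IEEE 754 bit patterns. */
static int float_bits_look_ieee(void)
{
    const float samples[] = { 1.0f, -2.0f, 0.5f };
    const uint32_t expected[] = { 0x3f800000u, 0xc0000000u, 0x3f000000u };
    for (int k = 0; k < 3; ++k) {
        uint32_t bits;
        memcpy(&bits, &samples[k], sizeof bits);
        if (bits != expected[k])
            return 0;
    }
    return 1;
}
```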

User Rating: 1180

It typically takes LOTS of sqrt calls to mean anything in terms of slowdown.

I suggest to all people that have a problem with using sqrt that they put a loop in their code somewhere that keeps calling sqrt. You can typically put it in thousands of times before you even see a FPS drop.

User Rating: 854

I messed around with the functions a little and did some completely unscientific tests (if I remember correctly, in earlier threads on the subject people did some very rigorous comparisons).

I compared three versions: 1 / sqrtf(), the code from the above post, and the code from Doom 3. Doom 3 uses the same principle, but calculates the seed on the fly using some constants and a lookup table.

Amazingly enough (and unless I messed up somewhere) DoomInvSqrt() returned exactly the same results as 1 / sqrtf(). So no accuracy problems there. And Q3InvSqrt() was plenty close.

I just did a brute-force test - 10,000,000 calls to each function. The Q3 and Doom versions were about 1.8 times as fast as 1 / sqrt().

I would use the Doom version, but I imagine the code is copyrighted, and I don't understand it well enough to recreate it for myself. But I suppose the other code is fair game.

User Rating: 1015

Uh...AP = jyk.

User Rating: 2006

Can you copyright just a few lines of code? It's probably just an implementation of something discovered 200 years ago, if not longer ago.

User Rating: 975

Is it Newton's Method you want to copyright?

User Rating: 1021

Yes, I believe both versions (Q3 and Doom) use Newton's method. However, the Doom version uses some intricate and specific bit manipulation using lookup tables to find the seed. That's what I'm suspecting may be copyrighted.
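
For reference, the Newton step in question can be sketched on its own. Applying Newton's method to f(y) = 1/(y*y) - x gives the update y' = y * (1.5 - 0.5 * x * y * y), which is the last line of the Q3-style function. The function name `newton_invsqrt` below is hypothetical, and the seed is simply passed in by the caller rather than derived from any bit trick.

```c
/* Refine a guess y for 1/sqrt(x) with `steps` Newton iterations.
   Converges when the starting guess satisfies 0 < x*y*y < 3. */
float newton_invsqrt(float x, float y, int steps)
{
    for (int k = 0; k < steps; ++k)
        y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

The error is roughly squared on each step, which is why a decent seed plus a single iteration is often considered good enough for graphics work.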

User Rating: 2006

I did a benchmark on various float-type sqrt routines, and the fastest one was the 'sqrtf' in MSVC's standard library (equal to inline assembly). The post is here somewhere on gamedev, but I have no idea where.

The only way you'll get speedup is using the inverse square root formula to actually calculate the inverse square root, and iirc the win was barely one.

Well, you could also get a win using one of the functions that returns much less precision than sqrtf, but in that case you're not really comparing like things.

User Rating: 1870

There's a really cool paper about this fast sqrt trick, where everything is derived. It's from the thread posted here by Nice Coder.

invsqrt

User Rating: 1060

Isn't there an SSE sqrt instruction? Anyone know how that compares?

User Rating: 1015

I'm going to close the thread, since the topic has been discussed in immense detail throughout the archives.

In addition to the other thread created within the past few days on the same subject, I found many, many, MANY discussions on fast sqrt dating back to 2001. I don't see much new in this thread. If anyone has a compelling argument why the thread should remain open, please send me a private message and make your case strongly. The topic of fast sqrt has been covered to death and I will double check any argument to reopen the thread against the archives to see if the argument holds water.

Graham Rhodes
Moderator, Math & Physics forum @ gamedev.net

User Rating: 1796

OK,

I reopened the thread just to post this example from superpig. It seems like a good contribution that may be useful. The thread is being closed again immediately.

Quote:
From superpig via private message
SSE has both SQRT and RSQRT instructions, but you don't really get much benefit unless you're doing four of them at once. Say you want to get the lengths of four vectors, stored as a structure of arrays:

__declspec(align(16)) struct blockOfFourVectors
{
float xValues[4];
float yValues[4];
float zValues[4];
float lengths[4];
};

blockOfFourVectors data;

__asm
{
 movaps xmm0, [data + 0x00] ; load x components
 movaps xmm1, [data + 0x10] ; load y components
 movaps xmm2, [data + 0x20] ; load z components
 
 ; square each component
 mulps xmm0, xmm0
 mulps xmm1, xmm1
 mulps xmm2, xmm2

 ; sum them into xmm0
 addps xmm0, xmm1
 addps xmm0, xmm2
 
 ; sqrt to get length
 sqrtps xmm0, xmm0

 ; save out
 movaps [data + 0x30], xmm0
}


That would calc all four lengths into data.lengths. For just a single vector it's not really worthwhile (and unless you store that vector in a SoA, would require a load of shuffling).
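
The same four-at-once computation can also be sketched with compiler intrinsics instead of inline assembly. This assumes an x86 target with SSE and a C11 compiler for `_Alignas`; the type and function names (`BlockOfFourVectors`, `block_lengths`) are illustrative and not from the original message.

```c
#include <xmmintrin.h>

/* Four vectors in structure-of-arrays form. Aligning the first member to
   16 bytes aligns the struct, and each member is exactly 16 bytes long,
   so every array can be used with aligned loads and stores. */
typedef struct {
    _Alignas(16) float x[4];
    float y[4];
    float z[4];
    float lengths[4];
} BlockOfFourVectors;

/* Computes lengths[i] = sqrt(x[i]^2 + y[i]^2 + z[i]^2) for i = 0..3. */
static void block_lengths(BlockOfFourVectors *b)
{
    __m128 x = _mm_load_ps(b->x);   /* load x components */
    __m128 y = _mm_load_ps(b->y);   /* load y components */
    __m128 z = _mm_load_ps(b->z);   /* load z components */

    /* square each component and sum */
    __m128 sum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
                            _mm_mul_ps(z, z));

    /* sqrt to get the lengths, then store them */
    _mm_store_ps(b->lengths, _mm_sqrt_ps(sum));
}
```

Replacing `_mm_sqrt_ps` with `_mm_rsqrt_ps` would give approximate inverse lengths instead, at reduced precision.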




Graham Rhodes
Moderator, Math & Physics forum @ gamedev.net

User Rating: 1796
