#### Archived

This topic is now archived and is closed to further replies.

# SQRT(); Another Option? give me your ideas....

## Recommended Posts

so the thread starter says it all. let me elaborate a moment. you need to find the distance between 2 points. lets say an asteroid and a ship. to get this distance(probably to check for some sort of collision) you need to use the sqrt like so.. sqrt(x*x, y*y) - 2d check sqrt(x*x, y*y, z*z) - 3d check heres the problem and i have seen this happen and have witnessed it first hand... the sqrt functoin takes up anywhere from 60 - 80 CPU cycles. this is incredibly SLOW! probably about 55 times too slow ifyou ask me. so i ask you, whats the best option? yes there are a few and i have my own idea, such as doing fast addition...but i was wondering if anyone else had a similar or better approach. one CPU cycle really isnt that big of a deal. but at 60 FPS or more you are eating away at your CPU. thanks all in advance for your ideas! --guru

##### Share on other sites
Much of the time you don''t actually need to do the square root in distance calculations. For example, collision detection is usually handled by comparing the computed distance against a known threshold. In those cases you can instead compare the computed square distance against the threshold squared.

##### Share on other sites
isqrt.h:
#ifndef __ISQRT_H#define __ISQRT_H#ifdef __cplusplusextern "C" {#endifunsigned short isqrt(unsigned long a);unsigned short iisqrt(unsigned long a);unsigned short ihypot (unsigned long dx, unsigned long dy);#ifdef __cplusplus}#endif#endif

isqrt.c:
#include "isqrt.h"unsigned short isqrt(unsigned long a){unsigned long rem = 0;unsigned long root = 0;int i;for (i = 0; i < 16; i++) {   root <<= 1;   rem = ((rem << 2) + (a >> 30));   a <<= 2;   root++;   if (root <= rem) {      rem -= root;      root++;      }   else      root--;  }return (unsigned short) (root >> 1);}unsigned short ihypot (unsigned long dx, unsigned long dy){return isqrt (dx * dx + dy * dy);}unsigned short iisqrt(unsigned long a){unsigned long rem = 0;unsigned long root = 0;unsigned long divisor = 0;int i;for (i = 0; i < 16; i++) {   root <<= 1;   rem = ((rem << 2) + (a >> 30));   a <<= 2;   divisor = (root << 1) + 1;   if (divisor <= rem) {      rem -= divisor;      root++;      }   }return (unsigned short)(root);}

##### Share on other sites
What about the sin() and cos() functions?
Any idea to speed these up?

##### Share on other sites
You will not really need sqrt unless you want to normalize vectors.
For this, there''s a pretty good approximation of 1/sqrt(x) using the following function :

float inv_sqrt(float x) {
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i >> 1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}

SaM3d!, a cross-platform API for 3d based on SDL and OpenGL.
The trouble is that things never get better, they just stay the same, only more so. -- (Terry Pratchett, Eric)

##### Share on other sites

if not, take either the red pill, or google...

##### Share on other sites
sin() and cos() basically use Newton's Method approximation (look it up).

So if you don't want to do the look up table

You can do this yourself and do the newton method to a lesser degree of precision, thus less calculations. (of course to get max performance you would do this with assembly)

EDIT: misspelled newton

[edited by - snisarenko on May 25, 2004 4:15:20 PM]

##### Share on other sites
i know assembly would be faster.. just b/c its right there next to the CPU and you dont have to worry about buses, but in a couple of years we are looking at 5 - 7 Ghz machines.. the CPU''s keep getting faster and faster..i mean when will it be worth anyones time to even use assembly?

##### Share on other sites
Currently, the new Prescott (or was it Northwood?) chipset supports up to 6 GHz processors.

Also, most game consoles have processors < 1 GHz. That''s where you would use assembly. Processor power should also not be measured in clock speeds alone.

##### Share on other sites
erm... you still have to worry bout the bus when writing asmembly. Infact your c compiler complies to bytecode, where your asmembly compiler also compiles to bytecode. it''s just humans can usually code better then a compiler can if they really work at it.

The xbox has a 1.25 ghz processor, yet the gamecube sometimes looks better. you see, the speed of the processor does have a inpact, but so does everything else. I think it''s fine to use asm on a 200 ghz machine if the task calls for it.

##### Share on other sites
quote:
I think it''s fine to use asm on a 200 ghz machine if the task calls for it.

don''t think you''ll ever have to worry about that. any processor above 28 or 29GHz would violate the constraints of special realitivity, hence be an impossibility

##### Share on other sites
quote:
Original post by OpenGL_Guru
i know assembly would be faster.. just b/c its right there next to the CPU and you dont have to worry about buses, but in a couple of years we are looking at 5 - 7 Ghz machines.. the CPU''s keep getting faster and faster..i mean when will it be worth anyones time to even use assembly?

Well that isn''t exactly true. Your c++ code is converted to assembly via your compiler. You will find that for fairly complex things your c code will be fasters as the compilers know all the optimizations that need to be made... eg when to roll out loops.

##### Share on other sites
i think more processor power should be used for better ai, more realistic physics, and better grafic, and not just to clean up the asses of some fools, who are to stupid to implement a lookup, or write 20 lines of assambly language...

edit: damn english^^

[edited by - fooman on May 26, 2004 3:36:32 AM]

##### Share on other sites
For modern GPUs with their complex architecture, deep pipelines, branch prediction, conditional execution, parallelism etc. it is very hard to hand-code better assembly code than a good optimizing compiler will produce, whos optimizer has been built by people doing nothing but examine the target CPUs workings and squeeze every ns from the resulting code, and who have a lot of theory and tools at hand for this.

A simple example: On a modern CPU, it can be advantageous to chose a slower instruction for a certain operation to allow for parallelizing some code and avoid pipeline stalls. Few people have such intimate knowledge of the CPU''s they''re coding for workings.

I had been hand-coding critical parts of my applications in assembly language myself and had given up years ago when what the compiler produced was about 10% faster than my code ...

##### Share on other sites
There is a lot of information about sine approximation at the google discussion boards: http://groups.google.nl/groups?hl=nl&lr=&ie=UTF-8&q=cosine+approximation, there are probably a lot of other scientific pages on this subject as cos and sin are really slow and are used a lot.

##### Share on other sites
Just as a side note, XBox does not have a 1.25 GHz processor. It''s actually about 700-800 MHz.

##### Share on other sites
I''m kind of a newbie at programming (KIND OF???), but wouldn''t a binomial expansion work?

Search for square root in this document:

http://www.krysstal.com/binomial.html

If the number you must square root isn''t too big, it''s probably a good appox with maybe 5 or so terms.

##### Share on other sites
For sine and cosine, use lookup tables (I used to use those way back when I did programming in BASIC and needed a speed boost). Binomial expansions aren't worth it because well the cure would be slower than the disease: have you looked at binomial expansions lately? those x^n terms can get pretty costly (x^n is O(x^(n+1)). Why do you need to speed up the square root function, exactly? I can't imagine that is the limiting factor in speed in your program. You may want to consider lookup tables too there, but whatever haha it all sounds a bit weird to me.

[edited by - uber_n00b on May 27, 2004 12:15:07 AM]

##### Share on other sites
well even a simple game as asteroids or a much more complicated game that has to find the distance between 2 objects or 2 items or whatever every frame and update can be costly. taking up 70 - 80 CPU cycles instead of 5 - 10 every frame adds up

##### Share on other sites
a lot time ago, i solved the sin/cos problem with something like this:

first, two vectors:

float senos[360]; // cause i speak spanish
float cosenos[360];

cause i didnt need an excesive precision, i just
calculate the sin and cos, for integers
then when i need a sin or a cos, instead of calculating in real time, i just extractit from the vector

position_x = senos[angulo];

and so...

Aquellos que crean que algo es imposible no deben ponerse en el camino de los que estan dispuestos a intentarlo.
MEGANUKE

##### Share on other sites
First off, I would like to ask isn''t senos plural for breast? I guess it doubles as sine. Also, putting sine and cosine into arrays like that is pretty much making sine/cosine tables

##### Share on other sites
Lookup tables is a bad idea. It will be slower. SQRT is faster than memory access anyway (especially if that memory location is not cached).

##### Share on other sites
yes, senos is plural for breast, but also for sin

it was a long time ago when i wrote that program
so you should be right, it must be double

i dont know today, but back then sqrt function was
waaaaaaaaay slower than memory access

it was just an idea, you could do better things using pointers
or real time calculations using <<, >>, + and -
or etc...

greets

##### Share on other sites
Some time ago the CPUs were slow, so they didn''t have to wait too much for the memory. Nowdays, the CPUs are much faster than the memory, so they have to wait a lot for the memory to give them the data. A SQRTF instruction is executed in the FPU internal registers, so it''s fast.

##### Share on other sites
I tried an sqrt approximation yesterday, using lookup tables and interpolating between the closest elements it's a lot slower, when i just did this:

double sqrta(double d)
{
if(d>=max_sqrt){
return sqrt(d);
};
return squareroots[unsigned int((APPROX_SQRT_ELEMENTS/max_sqrt)*d)];
}

It was about as fast as the normal sqrt, however the function returns a pretty bad approximation of sqrt(d), this is quite useless.

[edited by - Tree Penguin on May 30, 2004 6:31:46 AM]

• ### Forum Statistics

• Total Topics
628375
• Total Posts
2982310

• 10
• 9
• 14
• 24
• 11