Archived

This topic is now archived and is closed to further replies.

OpenGL_Guru

SQRT(); Another Option? give me your ideas....

Recommended Posts

so the thread starter says it all. let me elaborate a moment. you need to find the distance between 2 points. lets say an asteroid and a ship. to get this distance(probably to check for some sort of collision) you need to use the sqrt like so.. sqrt(x*x, y*y) - 2d check sqrt(x*x, y*y, z*z) - 3d check heres the problem and i have seen this happen and have witnessed it first hand... the sqrt functoin takes up anywhere from 60 - 80 CPU cycles. this is incredibly SLOW! probably about 55 times too slow ifyou ask me. so i ask you, whats the best option? yes there are a few and i have my own idea, such as doing fast addition...but i was wondering if anyone else had a similar or better approach. one CPU cycle really isnt that big of a deal. but at 60 FPS or more you are eating away at your CPU. thanks all in advance for your ideas! --guru

Share this post


Link to post
Share on other sites
Much of the time you don''t actually need to do the square root in distance calculations. For example, collision detection is usually handled by comparing the computed distance against a known threshold. In those cases you can instead compare the computed square distance against the threshold squared.

Share this post


Link to post
Share on other sites
isqrt.h:
#ifndef __ISQRT_H
#define __ISQRT_H

#ifdef __cplusplus
extern "C" {
#endif

unsigned short isqrt(unsigned long a);
unsigned short iisqrt(unsigned long a);
unsigned short ihypot (unsigned long dx, unsigned long dy);

#ifdef __cplusplus
}
#endif
#endif


isqrt.c:
#include "isqrt.h"

unsigned short isqrt(unsigned long a)
{
unsigned long rem = 0;
unsigned long root = 0;
int i;

for (i = 0; i < 16; i++) {
root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
root++;
if (root <= rem) {
rem -= root;
root++;
}
else
root--;
}
return (unsigned short) (root >> 1);
}

unsigned short ihypot (unsigned long dx, unsigned long dy)
{
return isqrt (dx * dx + dy * dy);
}

unsigned short iisqrt(unsigned long a)
{
unsigned long rem = 0;
unsigned long root = 0;
unsigned long divisor = 0;
int i;

for (i = 0; i < 16; i++) {
root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
divisor = (root << 1) + 1;
if (divisor <= rem) {
rem -= divisor;
root++;
}
}
return (unsigned short)(root);
}

Share this post


Link to post
Share on other sites
You will not really need sqrt unless you want to normalize vectors.
For this, there''s a pretty good approximation of 1/sqrt(x) using the following function :

float inv_sqrt(float x) {
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i >> 1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}



SaM3d!, a cross-platform API for 3d based on SDL and OpenGL.
The trouble is that things never get better, they just stay the same, only more so. -- (Terry Pratchett, Eric)

Share this post


Link to post
Share on other sites
sin() and cos() basically use Newton's Method approximation (look it up).

So if you don't want to do the look up table

You can do this yourself and do the newton method to a lesser degree of precision, thus less calculations. (of course to get max performance you would do this with assembly)

EDIT: misspelled newton

[edited by - snisarenko on May 25, 2004 4:15:20 PM]

Share this post


Link to post
Share on other sites
i know assembly would be faster.. just b/c its right there next to the CPU and you dont have to worry about buses, but in a couple of years we are looking at 5 - 7 Ghz machines.. the CPU''s keep getting faster and faster..i mean when will it be worth anyones time to even use assembly?

Share this post


Link to post
Share on other sites
Currently, the new Prescott (or was it Northwood?) chipset supports up to 6 GHz processors.

Also, most game consoles have processors < 1 GHz. That''s where you would use assembly. Processor power should also not be measured in clock speeds alone.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
erm... you still have to worry bout the bus when writing asmembly. Infact your c compiler complies to bytecode, where your asmembly compiler also compiles to bytecode. it''s just humans can usually code better then a compiler can if they really work at it.

The xbox has a 1.25 ghz processor, yet the gamecube sometimes looks better. you see, the speed of the processor does have a inpact, but so does everything else. I think it''s fine to use asm on a 200 ghz machine if the task calls for it.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:
I think it''s fine to use asm on a 200 ghz machine if the task calls for it.

don''t think you''ll ever have to worry about that. any processor above 28 or 29GHz would violate the constraints of special realitivity, hence be an impossibility

Share this post


Link to post
Share on other sites
quote:
Original post by OpenGL_Guru
i know assembly would be faster.. just b/c its right there next to the CPU and you dont have to worry about buses, but in a couple of years we are looking at 5 - 7 Ghz machines.. the CPU''s keep getting faster and faster..i mean when will it be worth anyones time to even use assembly?


Well that isn''t exactly true. Your c++ code is converted to assembly via your compiler. You will find that for fairly complex things your c code will be fasters as the compilers know all the optimizations that need to be made... eg when to roll out loops.


Share this post


Link to post
Share on other sites
i think more processor power should be used for better ai, more realistic physics, and better grafic, and not just to clean up the asses of some fools, who are to stupid to implement a lookup, or write 20 lines of assambly language...

edit: damn english^^

[edited by - fooman on May 26, 2004 3:36:32 AM]

Share this post


Link to post
Share on other sites
For modern GPUs with their complex architecture, deep pipelines, branch prediction, conditional execution, parallelism etc. it is very hard to hand-code better assembly code than a good optimizing compiler will produce, whos optimizer has been built by people doing nothing but examine the target CPUs workings and squeeze every ns from the resulting code, and who have a lot of theory and tools at hand for this.

A simple example: On a modern CPU, it can be advantageous to chose a slower instruction for a certain operation to allow for parallelizing some code and avoid pipeline stalls. Few people have such intimate knowledge of the CPU''s they''re coding for workings.

I had been hand-coding critical parts of my applications in assembly language myself and had given up years ago when what the compiler produced was about 10% faster than my code ...

Share this post


Link to post
Share on other sites
I''m kind of a newbie at programming (KIND OF???), but wouldn''t a binomial expansion work?

Search for square root in this document:

http://www.krysstal.com/binomial.html

If the number you must square root isn''t too big, it''s probably a good appox with maybe 5 or so terms.

Share this post


Link to post
Share on other sites
For sine and cosine, use lookup tables (I used to use those way back when I did programming in BASIC and needed a speed boost). Binomial expansions aren't worth it because well the cure would be slower than the disease: have you looked at binomial expansions lately? those x^n terms can get pretty costly (x^n is O(x^(n+1)). Why do you need to speed up the square root function, exactly? I can't imagine that is the limiting factor in speed in your program. You may want to consider lookup tables too there, but whatever haha it all sounds a bit weird to me.

[edited by - uber_n00b on May 27, 2004 12:15:07 AM]

Share this post


Link to post
Share on other sites
well even a simple game as asteroids or a much more complicated game that has to find the distance between 2 objects or 2 items or whatever every frame and update can be costly. taking up 70 - 80 CPU cycles instead of 5 - 10 every frame adds up

Share this post


Link to post
Share on other sites
a lot time ago, i solved the sin/cos problem with something like this:

first, two vectors:

float senos[360]; // cause i speak spanish
float cosenos[360];

cause i didnt need an excesive precision, i just
calculate the sin and cos, for integers
then when i need a sin or a cos, instead of calculating in real time, i just extractit from the vector

position_x = senos[angulo];

and so...

Aquellos que crean que algo es imposible no deben ponerse en el camino de los que estan dispuestos a intentarlo.
MEGANUKE

Share this post


Link to post
Share on other sites
First off, I would like to ask isn''t senos plural for breast? I guess it doubles as sine. Also, putting sine and cosine into arrays like that is pretty much making sine/cosine tables

Share this post


Link to post
Share on other sites
yes, senos is plural for breast, but also for sin

it was a long time ago when i wrote that program
so you should be right, it must be double

i dont know today, but back then sqrt function was
waaaaaaaaay slower than memory access

it was just an idea, you could do better things using pointers
or real time calculations using <<, >>, + and -
or etc...

greets

Share this post


Link to post
Share on other sites
Some time ago the CPUs were slow, so they didn''t have to wait too much for the memory. Nowdays, the CPUs are much faster than the memory, so they have to wait a lot for the memory to give them the data. A SQRTF instruction is executed in the FPU internal registers, so it''s fast.

Share this post


Link to post
Share on other sites
I tried an sqrt approximation yesterday, using lookup tables and interpolating between the closest elements it's a lot slower, when i just did this:

double sqrta(double d)
{
if(d>=max_sqrt){
return sqrt(d);
};
return squareroots[unsigned int((APPROX_SQRT_ELEMENTS/max_sqrt)*d)];
}

It was about as fast as the normal sqrt, however the function returns a pretty bad approximation of sqrt(d), this is quite useless.


[edited by - Tree Penguin on May 30, 2004 6:31:46 AM]

Share this post


Link to post
Share on other sites