Are square roots still really that evil?

Started by
12 comments, last by alvaro 9 years, 11 months ago

@Ohforf sake: thanks, I made a 2nd CoordToCoord distance function, this time for the squared distance:


float CoordToCoordDistSqr(const D3DXVECTOR3 &pv1, const D3DXVECTOR3 &pv2)
{
	// Named local: taking the address of a temporary is non-standard C++.
	const D3DXVECTOR3 diff = pv1 - pv2;
	return D3DXVec3LengthSq(&diff);
}

To be sure I use any possible optimizations, I also changed the non-squared version:


float CoordToCoordDist(const D3DXVECTOR3 &pv1, const D3DXVECTOR3 &pv2)
{
	// Named local: taking the address of a temporary is non-standard C++.
	const D3DXVECTOR3 diff = pv1 - pv2;
	return D3DXVec3Length(&diff);
}

Now that's done, I'll go through my code where I call the CoordToCoord distance function and see what I compare the result to. For example, for the radius of a sphere I can compare against "radius*radius" like you said. The same probably goes for checking the distance between mesh/renderable centers and point lights against the point light radius (the radius would then be radius*radius).
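A minimal sketch of that comparison, assuming a hypothetical Vec3 struct as a stand-in for D3DXVECTOR3 so it compiles without DirectX (DistSqr and InRange are made-up names, not part of D3DX):

```cpp
// Hypothetical stand-in for D3DXVECTOR3.
struct Vec3 { float x, y, z; };

// Squared distance between two points: no sqrt involved.
inline float DistSqr(const Vec3 &a, const Vec3 &b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// True when 'point' lies within 'radius' of 'center'.
// Distance and radius are both non-negative, so comparing their
// squares gives the same answer as comparing the values themselves.
inline bool InRange(const Vec3 &center, const Vec3 &point, float radius)
{
    return DistSqr(center, point) <= radius * radius;
}
```

The same check should work for a point light: pass the light position as center and the light radius as radius.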

Thanks for the help.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

1. My CoordToCoord distance function:

float CoordToCoordDist(const D3DXVECTOR3 pv1, const D3DXVECTOR3 pv2)
{
return sqrt(pow(pv1.x - pv2.x, 2) + pow(pv1.y - pv2.y, 2) + pow(pv1.z - pv2.z, 2));
}

OMG! You have three pow calls and you worry about a sqrt?! A pow is significantly more expensive.
Just do
float tmp = pv1.x - pv2.x;
tmp = tmp * 2; //or tmp = tmp + tmp;
Whether "tmp * 2" is better than "tmp + tmp" depends on the architecture you're running. On one hand, you've got addition vs multiplication, and often addition has lower latency than multiplication. On the other hand, the multiplication is against a constant value, and some archs may have special optimizations for that (i.e. custom opcodes, better pipelining). However both of them will be a zillion times better than a pow( tmp, 2 ).

Second, to answer the OP: like others have said, work smarter (i.e. don't use sqrt if it's unnecessary). But if you're curious: yes, sqrt has gotten faster, and more importantly CPUs have gotten better at hiding its latency (through pipelining and out-of-order execution: running later instructions that don't depend on the sqrt's result while the sqrt hasn't finished yet). Tricks like the famous "fast approximation" attributed to Carmack (which approximates the inverse square root) actually hurt performance on today's hardware, because they tend to hinder pipelining or involve RAM round trips; ALUs have gotten faster, but memory latency hasn't changed much in the last 10 years.


OMG! You have three pow calls and you worry about a sqrt?! A pow is significantly more expensive.
Just do
float tmp = pv1.x - pv2.x;
tmp = tmp * 2; //or tmp = tmp + tmp;
Whether "tmp * 2" is better than "tmp + tmp" depends on the architecture you're running.

I'm afraid this is not correct; shouldn't it be tmp * tmp?

(tmp * 2 would only give the square if tmp were always 2 :))
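To make the correction concrete, here is a rough sketch of the original pow-based function with each pow(d, 2) replaced by d * d. Vec3 is a hypothetical stand-in for D3DXVECTOR3, and CoordToCoordDistNoPow is a made-up name for illustration:

```cpp
#include <cmath>

// Hypothetical stand-in for D3DXVECTOR3.
struct Vec3 { float x, y, z; };

// Same result as sqrt(pow(dx, 2) + pow(dy, 2) + pow(dz, 2)),
// but each square is a single multiply: d * d, not d * 2.
inline float CoordToCoordDistNoPow(const Vec3 &pv1, const Vec3 &pv2)
{
    const float dx = pv1.x - pv2.x;
    const float dy = pv1.y - pv2.y;
    const float dz = pv1.z - pv2.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

Dropping the final std::sqrt gives the squared version.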

I now have 2 coord to coord distance functions, one squared and one non-squared.

The next step is going through my codebase to see where I can use the squared one and multiply the other variable by itself. The alternative is storing the original value squared, but that doesn't sound as good, because I would then always have to keep track of it and rename all member vars.
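If storing the squared value ever becomes worthwhile, one possible way to avoid tracking it by hand is to keep both values behind a setter. This is only a sketch; the PointLight class and its member names are hypothetical and not taken from the engine discussed here:

```cpp
// Hypothetical light class that caches the squared radius.
class PointLight
{
public:
    explicit PointLight(float radius) { SetRadius(radius); }

    // The only place the radius changes, so the cached square
    // cannot get out of sync with it.
    void SetRadius(float radius)
    {
        mRadius = radius;
        mRadiusSq = radius * radius;
    }

    float Radius() const { return mRadius; }
    float RadiusSq() const { return mRadiusSq; }

private:
    float mRadius = 0.0f;
    float mRadiusSq = 0.0f;
};
```

Callers that compare against a squared distance would use RadiusSq(), and existing code that reads Radius() would not need to change.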

OMG! You have three pow calls and you worry about a sqrt?! A pow is significantly more expensive.


I don't encourage people to rely on compiler optimizations too much, but gcc optimizes calls to pow where the exponent is a positive integer, turning the computation into a sequence of multiplies. I don't know if Visual C++ does the same.

This topic is closed to new replies.