cozzie

Are square roots still really that evil?

13 posts in this topic

I was just wondering: a lot of topics, replies and articles state that you should avoid square roots in your equations 'at all costs'. Of course, 'at all costs' in my opinion depends on what you have to do to avoid the sqrt.

With today's hardware, is it still reasonable to believe that 3 to 6 multiplications and value assignments are cheaper than 1 sqrt? (Of course profiling would tell, but I'm just curious about experience and opinions.)

The examples I'm talking about are mainly distance comparisons, e.g. on the CPU (point-to-point distance, point-in-sphere check etc.) but also on the GPU side (e.g. for light attenuation).

It's not that evil, but the fastest code is the code that you never have to run. So if you don't need to do a sqrt, then don't do it.


Normalization involves a sqrt, so since normalization is used so much, you can safely assume that GPUs are optimized for it.

 

Of course, using the built-in normalize instruction rather than writing your own normalization would be advised to take advantage of where it may be implemented directly in the hardware.
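To make the cost concrete, here is a minimal CPU-side sketch of what a normalization boils down to. It assumes a hypothetical `Vec3` struct and hypothetical `LengthSq` / `Normalize` helpers rather than any particular math library or GPU intrinsic. The squared length needs no sqrt, but the normalization itself does, because the result must have length 1.

```cpp
#include <cmath>

// Hypothetical minimal vector type, used only for illustration.
struct Vec3 { float x, y, z; };

// Squared length: three multiplies and two adds, no sqrt.
inline float LengthSq(const Vec3& v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Normalization needs the real length, so one sqrt is unavoidable here.
// A zero-length input is returned unchanged to avoid dividing by zero.
inline Vec3 Normalize(const Vec3& v)
{
    const float lenSq = LengthSq(v);
    if (lenSq == 0.0f)
        return v;
    const float invLen = 1.0f / std::sqrt(lenSq);
    return Vec3{ v.x * invLen, v.y * invLen, v.z * invLen };
}
```

In a shader you would still call the built-in normalize, as suggested above; this only shows where the square root sits.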

 

As the others said, for distance comparisons etc., where you can get away without doing a sqrt, do so.


GPUs have a sqrt instruction which is a single instruction (edit: it's not just 1 cycle I think), so taking the x1*x1 + y1*y1 ...> x2*x2 + y2*y2 ... comparison can actually end up being slower than just doing sqrt(vecn(...),vecn(...0). 

 

CPUs have a similar instruction as well, but it's (afaik) SIMD only. However, it wouldn't surprise me if compilers implemented std::sqrt simply by calling that simd sqrt.

 

edit: yes, seems I was wrong. Thanks for correcting me.

Edited by agleed

Thanks, this gives a good view of what (not to) do.

I'll keep in mind that every sqrt (or anything else :)) that isn't really necessary, shouldn't be done at all.

 

Two examples:

 

1. My CoordToCoord distance function:

float CoordToCoordDist(const D3DXVECTOR3 pv1, const D3DXVECTOR3 pv2)
{
	return sqrt(pow(pv1.x - pv2.x, 2) + pow(pv1.y - pv2.y, 2) + pow(pv1.z - pv2.z, 2));
}

How would you do that without the sqrt?

 

2. Point in sphere.

I currently store the radius of my bounding spheres as the 'normal' radius. I take the CoordToCoord distance from the world-space center of the sphere to the point I'm checking, and compare this distance to the radius. That would basically be solved if the above CoordToCoord distance function returned the squared distance. In that case I could take the squared radius of the sphere up front and store that (and keep the squared radius up to date when updating).

 

Note: in my shaders I don't use any sqrt at the moment; I'll look into how I currently do my attenuation.

Of course there are some normalizations in my VS/PS, which I think are needed (and cannot be done without a square root).


GPUs have a sqrt instruction which is a single instruction (edit: it's not just 1 cycle I think), so taking the x1*x1 + y1*y1 ...> x2*x2 + y2*y2 ... comparison can actually end up being slower than just doing sqrt(vecn(...),vecn(...0). 
 
CPUs have a similar instruction as well, but it's (afaik) SIMD only. However, it wouldn't surprise me if compilers implemented std::sqrt simply by calling that simd sqrt.

 

GPUs have an approximate rsqrt and an approximate rcp function similar to the SIMD counterparts in my last post. They do not have a vector->length function which performs the squaring and adding of the components in addition to the sqrt in one instruction. So everything that was said for the CPU pretty much also holds for the GPU.
 
 

Thanks, this gives a good view of what (not to) do.
I'll keep in mind that every sqrt (or anything else :)) that isn't really necessary, shouldn't be done at all.
 
Two examples:
 
1. My CoordToCoord distance function:

float CoordToCoordDist(const D3DXVECTOR3 pv1, const D3DXVECTOR3 pv2)
{
	return sqrt(pow(pv1.x - pv2.x, 2) + pow(pv1.y - pv2.y, 2) + pow(pv1.z - pv2.z, 2));
}

How would you do that without the sqrt?
 
2. Point in sphere.
I currently store the radius of my bounding spheres as the 'normal' radius. I take the CoordToCoord distance from the world-space center of the sphere to the point I'm checking, and compare this distance to the radius. That would basically be solved if the above CoordToCoord distance function returned the squared distance. In that case I could take the squared radius of the sphere up front and store that (and keep the squared radius up to date when updating).
 
Note: in my shaders I don't use any sqrt at the moment; I'll look into how I currently do my attenuation.
Of course there are some normalizations in my VS/PS, which I think are needed (and cannot be done without a square root).


Are you sure that pow(a, 2) is reduced to a*a and not exp(log(a)*2), which is significantly more expensive?

Also sqrt and pow are the double precision functions. The float functions are sqrtf and powf and if you want the compiler to decide based on the parameters then it is std::sqrt and std::pow.
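The overload point can be checked at compile time. The sketch below assumes a C++11 or later compiler and only inspects the result types of `std::sqrt` and `std::pow`; the helper `DistanceFromSquared` is a hypothetical name added for illustration.

```cpp
#include <cmath>
#include <type_traits>

// With a float argument, std::sqrt resolves to the float overload...
static_assert(std::is_same<decltype(std::sqrt(1.0f)), float>::value,
              "std::sqrt(float) should return float");
// ...and with a double argument, to the double overload.
static_assert(std::is_same<decltype(std::sqrt(1.0)), double>::value,
              "std::sqrt(double) should return double");
// std::pow with two float arguments also stays in float.
static_assert(std::is_same<decltype(std::pow(1.0f, 2.0f)), float>::value,
              "std::pow(float, float) should return float");

// Hypothetical helper: turns a squared distance into a distance,
// staying in single precision throughout.
inline float DistanceFromSquared(float distSq)
{
    return std::sqrt(distSq);
}
```

If the asserts compile, the float arguments are not being silently widened to double by these calls.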

Even if you don't store the squared radius, computing it is faster than computing the square root. You should have a sqrLength or a dot function, so you get the squared distance as sqrDistance = (vec1-vec2).SqrLength(); then you check (sqrDistance < radius*radius). No need for sqrt.
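A minimal sketch of that check, assuming hypothetical `Vec3` and `Sphere` types instead of the D3DX ones so that it compiles on its own. The function names are made up for illustration.

```cpp
// Hypothetical minimal types, used only for illustration.
struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Squared distance between two points: dot(d, d) with d = a - b.
inline float DistanceSq(const Vec3& a, const Vec3& b)
{
    const Vec3 d{ a.x - b.x, a.y - b.y, a.z - b.z };
    return Dot(d, d);
}

// Compares squared distance against squared radius, so no sqrt is needed.
// Points exactly on the surface count as outside because of the strict '<'.
inline bool PointInSphere(const Vec3& point, const Sphere& sphere)
{
    return DistanceSq(point, sphere.center) < sphere.radius * sphere.radius;
}
```

This works because both sides of the comparison are non-negative, so squaring them does not change which one is larger.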

Edit: And you should pass vectors by reference, not by value!
Edited by Ohforf sake

GPUs have a sqrt instruction which is a single instruction (edit: it's not just 1 cycle I think), so taking the x1*x1 + y1*y1 ...> x2*x2 + y2*y2 ... comparison can actually end up being slower than just doing sqrt(vecn(...),vecn(...0). 

 

...good job they've also got a dot product instruction then...!

 

(dot (v1, v1) > dot (v2, v2))

Edited by mhagain

@Ohforf sake: thanks, I made a 2nd CoordToCoord distance function, this one for the squared distance:

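float CoordToCoordDistSqr(const D3DXVECTOR3 &pv1, const D3DXVECTOR3 &pv2)
{
	// Named local instead of taking the address of a temporary (non-standard C++).
	const D3DXVECTOR3 diff = pv1 - pv2;
	return D3DXVec3LengthSq(&diff);
}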

To be sure I use any possible optimizations, I also changed the non-squared version:

float CoordToCoordDist(const D3DXVECTOR3 &pv1, const D3DXVECTOR3 &pv2)
{
	// Named local instead of taking the address of a temporary (non-standard C++).
	const D3DXVECTOR3 diff = pv1 - pv2;
	return D3DXVec3Length(&diff);
}

Now that's done, I'll go through my code where I call the CoordToCoord distance function and see what I compare the result to. For example, for the radius of a sphere I can do "radius*radius" like you said. The same probably goes for checking the distance between mesh/renderable centers and point lights versus the point light radius (the radius would be radius*radius then).
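The point light case could look roughly like the sketch below. `Vec3`, `PointLight` and `LightsInRange` are hypothetical names, not part of D3DX, and the renderable's own bounding radius is ignored here to keep the example short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal types, used only for illustration.
struct Vec3 { float x, y, z; };
struct PointLight { Vec3 position; float radius; };

inline float DistanceSq(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the indices of all lights whose radius reaches the given center.
// The radius is squared on the fly, so no squared member has to be stored.
inline std::vector<std::size_t> LightsInRange(const Vec3& center,
                                              const std::vector<PointLight>& lights)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < lights.size(); ++i)
    {
        const float r = lights[i].radius;
        if (DistanceSq(center, lights[i].position) <= r * r)
            result.push_back(i);
    }
    return result;
}
```

Squaring the radius per test costs one extra multiply, which avoids having to rename or duplicate the radius members.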

 

Thanks for the help.


1. My CoordToCoord distance function:

float CoordToCoordDist(const D3DXVECTOR3 pv1, const D3DXVECTOR3 pv2)
{
return sqrt(pow(pv1.x - pv2.x, 2) + pow(pv1.y - pv2.y, 2) + pow(pv1.z - pv2.z, 2));
}

OMG! You have three pow calls and you worry about a sqrt?! A pow is significantly more expensive.
Just do
float tmp = pv1.x - pv2.x;
tmp = tmp * 2; //or tmp = tmp + tmp;
Whether "tmp * 2" is better than "tmp + tmp" depends on the architecture you're running. On one hand, you've got addition vs multiplication, and often addition has lower latency than multiplication. On the other hand, the multiplication is against a constant value, and some archs may have special optimizations for that (i.e. custom opcodes, better pipelining). However both of them will be a zillion times better than a pow( tmp, 2 ).

Second, to answer the OP; like others have said, work smarter (i.e. don't use sqrt if it's unnecessary); but if you're curious, yes sqrt has gotten faster; but more importantly CPUs have gotten better at hiding the latency (this is called pipelining: executing instructions that come after and don't depend on the sqrt's result, while this sqrt hasn't finished yet). Tricks like the famous Carmack's sqrt "fast approximation" actually hurt performance in today's hardware (because they tend to hinder pipelining, or involve RAM roundtrips, and ALU has gotten faster, but memory latency hasn't changed much in the last 10 years).
Edited by Matias Goldberg


OMG! You have three pow calls and you worry about a sqrt?! A pow is significantly more expensive.
Just do
float tmp = pv1.x - pv2.x;
tmp = tmp * 2; //or tmp = tmp + tmp;
Whether "tmp * 2" is better than "tmp + tmp" depends on the architecture you're running.

 

I'm afraid this is not correct, shouldn't it be tmp * tmp?

(tmp * 2 would only give the same result if tmp were always 2 :))
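For reference, the corrected version with plain multiplications could be sketched like this. A hypothetical `Vec3` struct stands in for D3DXVECTOR3 so the snippet compiles without the D3DX headers.

```cpp
#include <cmath>

// Hypothetical stand-in for D3DXVECTOR3, used only for illustration.
struct Vec3 { float x, y, z; };

// Squared distance: each difference is multiplied by itself (tmp * tmp),
// not doubled (tmp * 2), and no pow or sqrt is involved.
inline float CoordToCoordDistSqr(const Vec3& pv1, const Vec3& pv2)
{
    const float dx = pv1.x - pv2.x;
    const float dy = pv1.y - pv2.y;
    const float dz = pv1.z - pv2.z;
    return dx * dx + dy * dy + dz * dz;
}

// The non-squared version pays for exactly one sqrt on top of that.
inline float CoordToCoordDist(const Vec3& pv1, const Vec3& pv2)
{
    return std::sqrt(CoordToCoordDistSqr(pv1, pv2));
}
```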

 

I now have 2 coord to coord distance functions, one squared and one non-squared.

Next step is going through my codebase to see where I can use the squared one and multiply the other variable by itself (that, or storing the original value squared; the 2nd option doesn't sound that good because I would then always have to keep track of this and rename all member vars).


OMG! You have three pow calls and you worry about a sqrt?! A pow is significantly more expensive.


I don't encourage relying on compiler optimizations too much, but gcc optimizes calls to pow where the exponent is a positive integer, turning the computation into a sequence of multiplies. I don't know if Visual C++ does the same.
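If you would rather not depend on the optimizer at all, a small helper makes the multiplications explicit. The sketch below uses exponentiation by squaring; `Square` and `PowInt` are hypothetical names, and it only illustrates the idea of replacing pow with multiplies, not what any particular compiler actually emits.

```cpp
// Squares a value with a single multiply.
inline float Square(float x)
{
    return x * x;
}

// Raises base to a non-negative integer power using only multiplications
// (exponentiation by squaring), so no call to pow is involved.
inline float PowInt(float base, unsigned exponent)
{
    float result = 1.0f;
    while (exponent != 0)
    {
        if (exponent & 1u)      // lowest bit set: fold the current factor in
            result *= base;
        base *= base;           // next factor is the square of the previous one
        exponent >>= 1;
    }
    return result;
}
```

For the distance function in this thread, `Square(pv1.x - pv2.x)` is all that is needed; `PowInt` only matters for larger exponents.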