Fast Square Root For Distance Calculations?

Started by
12 comments, last by Kylotan 7 years, 8 months ago

I am facing a problem that calculating length of many vectors leads to huge performance drops and I am pretty much sure that the main reason is the square root algorithm that the programming language uses. What I thought - is there any faster, yet accurate approximation that I could use for the vector length calculation? Either faster square root equation or whole different vector length calculation.

Thanks!

Advertisement

Do you need the length? Often you can make do without, when comparing distances for example you can instead use the squared distance.


// These are the same
if(vec.Length() < 5)
   ...
if(vec.SquaredLength() < 5*5)
   ...

A little bit of a proof:

sqrt(x^2+y^2+z^2) < 5

square both sides and you get

x^2+y^2+z^2 < 5^2

There may be times when you know what the distance will be just by how you have initialised and modified your vectors, in that case you can just divide by the known distance rather than having to work it out first.

I don't know of a faster square root algorithm but I believe there are some, I've certainly seen mention a few times but haven't investigated myself to really know what they are. The best choice is to just avoid using it where possibly but that's not always a choice.

Interested in Fractals? Check out my App, Fractal Scout, free on the Google Play store.

Depends on what you need, exact length (slowest) or just some length (use squared length) or just a comparison of what's closer.
There's this classic: https://en.wikipedia.org/wiki/Fast_inverse_square_root
Also Manhattan and Chebyshev's distances might be of interest.

I am pretty much sure that the main reason is the square root algorithm that the programming language uses.

Which language? Most square roots will go direct to the processor these days, rather than be implemented via a software algorithm. This means it's unlikely you're going to get a faster implementation that is accurate.

If you have space, you might consider calculating and caching vector length whenever the vector value changes, making accessing it a O(1) operation - useful if you change the vector infrequently but measure them often.
This kind of stuff was essential in the 90's: https://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/

But as above: I'd be extremely sceptical of a situation where a square root was a big performance issue on a modern CPU / in the distance test case rearrange the math to deal with squared values :)
Moving the floating point data required to compute the square root from RAM to the CPU is probably slower than actually computing it!

Hello!

Thanks for answers. Well, I don't need to compare 2 distances so the squared distance won't work. It turns out that square root wasn't my problem. I have the vector length calculated in a for loop together with few more(but actually lighter) equations. I was surprised how 1500 calculations in loop like this and a target game frequency of 500 Hz(100 Hz game frequency and 5 sub-steps) ended up in 20 FPS. That's incredibly bad result(especially knowing that I have decent laptop CPU(2.3 GHz, 4 cores). I am using Python and after a few analysis I found the problem - it turns out that Python is incredibly bad at for loops, like 50 times slower that C++ or Java. Hm... I didn't expect this. I am wondering - can I somehow optimize Python for loops and/or class access?

This isn't that important anyways as I'm using Python only for test stage(as it's incredibly easy to code in Python), I'll switch to C++ after.

You started with an instance of XY problem and now you have changed your question into something that I will call the XZ problem. How about you just tell us what it is you are trying to accomplish, what you have tried so far and how it has failed?

Well, it's working. I got a Python script that does mathematical calculations with use of vectors(but no matrices or any other similars). It does it all in loops. I write it in OOP. Somehow it seems that I got performance issues, but after checking my code several times I find it being clean and well-optimized. I read at some resources that loops in Python are incredibly slow. I just wanna get confirmation for this and, if possible, some ways to speed it up.

One way to speed these things up in Python is to use numpy:

import numpy as np

vectors = np.random.rand(1500, 3)
lengths = np.linalg.norm(vectors, axis=1)

Voila, no (Python) loops.

Python is not the optimal language for CPU-intensive operations because it intrinsically has layers of indirection between the language and the processor - the bytecode interpreter, the C objects representing your variables, the system calls implementing those objects, etc.

Therefore optimising Python is often a case of reducing the number of Python operations and replacing them with C operations. This can be done a few ways:

1) Replacing algorithms implemented in Python with standard library or built-in functions. For example, sometimes an explicit loop can be replaced with a call to the map() function - this moves the looping part into C code, and therefore is sometimes faster.
2) 3rd party C libraries like numpy (see above). For heavy mathematical lifting this is the number 1 choice.
3) Replacing the standard Python interpreter with one that compiles down to machine code. PyPy is the main choice here. Note that some extensions won't work with it.

4) Creating extensions in C that your Python program can use. You can make a module in C that Python can import and it'll usually run a lot faster than the equivalent Python code. Cython can help you a lot here.

5) Ditching Python entirely in favour of a faster language. Python is great but sometimes it's the wrong tool for the job. This might be a good time to consider whether you're using the right language. Consider also the target platforms and deployment method - Python is not great for mobile, won't run on the web, and deployment is a mess.

This topic is closed to new replies.

Advertisement