Just delete the post since the community seems to think it's useless.
It's kind of important that you realize why they might think that. From a purely mathematical standpoint you should be starting in 2D, not 3D. Ideally any normalization algorithm that works in 2D will extend to 3D, or so I believe.
Also here's a paste of your code running on 100 random vectors. Your algorithm doesn't work at all.
If we exclude the first and last instructions (which move data from and into memory), the actual calculation is just 10 operations. Accuracy isn't an issue either.
Not tested, and I'm not 100% sure the immediate value for dpps is right.
Should be right. I didn't know there was a dot product instruction. Sadly, my CPU doesn't have SSE4.
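For anyone else unsure about the dpps immediate: the high four bits pick which lane products get summed, and the low four bits pick which output lanes receive the sum (the others are zeroed). Below is a plain C emulation as a rough sketch. `dpps_emulated` is a made-up name for illustration, not a real intrinsic, and real hardware may add the products in a different order, so rounding can differ slightly.

```c
/* Hypothetical scalar emulation of the dpps immediate (illustration only).
 * Bits 4..7 of imm8: which products a[i]*b[i] take part in the sum.
 * Bits 0..3 of imm8: which output lanes receive the sum (others get 0). */
static void dpps_emulated(const float a[4], const float b[4],
                          unsigned imm8, float out[4]) {
    float sum = 0.0f;
    int i;

    for (i = 0; i < 4; ++i) {
        if (imm8 & (0x10u << i)) {
            sum += a[i] * b[i];
        }
    }
    for (i = 0; i < 4; ++i) {
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
    }
}
```

So 0xFF sums all four products and broadcasts the result everywhere, 0x7F ignores the w component (handy when w holds junk), and 0xCC would only use z and w and only write the upper two lanes.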
This has to be a troll at this point, right? Please let this be a troll...
No, it's not a troll. I was just wondering why you'd use the FPU when there are SSE versions. It's probably even easier: like Tachikoma suggested, it can fit in one line.
#include <smmintrin.h> // SSE4.1, needed for _mm_dp_ps

__m128 normalize(const __m128 vector) {
    return _mm_div_ps(vector, _mm_sqrt_ps(_mm_dp_ps(vector, vector, 0xFF))); // on second thought it might be FF, not CC
}
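If your CPU doesn't have SSE4.1, the same thing can be done with plain SSE by building the dot product out of a multiply and two shuffle/add steps. A minimal sketch, assuming x86 with SSE available; `dot4_sse` and `normalize_sse` are names I made up for illustration:

```c
#include <xmmintrin.h> /* SSE: _mm_mul_ps, _mm_add_ps, _mm_shuffle_ps, ... */

/* Hypothetical helper: 4-lane dot product, result broadcast to all lanes. */
static __m128 dot4_sse(__m128 a, __m128 b) {
    /* p = (a0*b0, a1*b1, a2*b2, a3*b3) */
    __m128 p = _mm_mul_ps(a, b);
    /* Swap neighbours and add: s = (p0+p1, p0+p1, p2+p3, p2+p3) */
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Swap halves and add: every lane now holds p0+p1+p2+p3 */
    return _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
}

/* Hypothetical SSE-only counterpart of the one-liner above.
 * Like the 0xFF version, it includes w in the length, so keep w at 0
 * for 3D vectors. A zero vector gives 0/0 = NaN in every lane. */
static __m128 normalize_sse(__m128 v) {
    return _mm_div_ps(v, _mm_sqrt_ps(dot4_sse(v, v)));
}
```

This costs a few more instructions than dpps, but it uses a full-precision square root and divide, so accuracy should match the one-liner.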
As others have already mentioned in this thread, that algorithm is no longer practical on a modern architecture. But it remains a mathematical curiosity nonetheless.
Maybe a better read: http://assemblyrequi...ng-square-root/ and http://assemblyrequired.crashworks.org/2009/10/20/square-roots-in-vivo-normalizing-vectors/