
## Normalization Approximation in 30 operations

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

27 replies to this topic

### #21 Sirisian

Posted 21 January 2012 - 01:34 AM

> Just delete the post since the community seems to think it's useless.

It's kind of important that you realize why they might think that. From a purely mathematical standpoint, you should be starting in 2D, not 3D. Ideally, any normalization algorithm that works in 2D will extend to 3D, or so I believe.
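One way to make the 2D-to-3D point concrete (an illustration, not part of the original post): normalizing divides by the Euclidean length, and the 3D length is itself a 2D length whose first component is a 2D length, so a correct 2D method composes into a 3D one.

$$
\hat{\mathbf{v}} = \frac{\mathbf{v}}{\lVert \mathbf{v} \rVert}, \qquad
\lVert (x, y) \rVert = \sqrt{x^2 + y^2}, \qquad
\lVert (x, y, z) \rVert = \sqrt{x^2 + y^2 + z^2} = \big\lVert \big( \lVert (x, y) \rVert,\ z \big) \big\rVert
$$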

Also, here's a paste of your code running on 100 random vectors. Your algorithm doesn't work at all.

### #22 Ripiz

Posted 21 January 2012 - 03:53 AM

What's wrong with SSE?
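```cpp
#include <xmmintrin.h> // SSE intrinsics

__declspec(noinline) void normalize(__m128 &col0) { // not inlined, for testing purposes
    // Square each lane: (x*x, y*y, z*z, w*w).
    __m128 dot = _mm_mul_ps(col0, col0);
    // Add neighbouring pairs: (x*x+y*y, x*x+y*y, z*z+w*w, z*z+w*w).
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
    // Add the two pair sums so every lane holds the squared length, then divide by the length.
    col0 = _mm_div_ps(col0, _mm_sqrt_ps(_mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)))));
}
```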


The original topic is about normalization in 30 operations; according to the disassembly, this version is just 12 instructions:
```
__declspec(noinline) void normalize(__m128 &col0) {
__m128 dot = _mm_mul_ps(col0, col0);
012D1340  movaps      xmm1,xmmword ptr [eax]
012D1343  movaps      xmm2,xmm1
012D1346  mulps       xmm2,xmm1
dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
012D1349  movaps      xmm0,xmm2
012D134C  shufps      xmm0,xmm2,0B1h
012D1350  addps       xmm0,xmm2
col0 = _mm_div_ps(col0, _mm_sqrt_ps(_mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)))));
012D1353  movaps      xmm2,xmm0
012D1356  shufps      xmm2,xmm0,0Fh
012D135A  addps       xmm2,xmm0
012D135D  sqrtps      xmm0,xmm2
012D1360  divps       xmm1,xmm0
012D1363  movaps      xmmword ptr [eax],xmm1
}
```


If we exclude the first and last instructions (which move data from and into memory), the actual calculation is just 10 operations. Accuracy isn't an issue either, since sqrtps and divps are full-precision instructions, not approximations.

### #23 Tachikoma

Posted 21 January 2012 - 07:55 AM

Quick hack-up in SSE4.1:

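```asm
movaps  xmm0, xmmword ptr [Vector]   ; load the vector
movaps  xmm1, xmm0                   ; keep a copy for the final multiply
dpps    xmm0, xmm0, 0CCh             ; dot product (imm8 picks input and output lanes)
rsqrtps xmm0, xmm0                   ; approximate 1/sqrt (rel. error up to 1.5*2^-12)
mulps   xmm1, xmm0                   ; scale the vector
movaps  xmmword ptr [Vector], xmm1   ; store the result
```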


Not tested, and I'm not 100% sure the immediate value for dpps is right.
Latest project: Sideways Racing on the iPad

### #24 osmanb

Posted 21 January 2012 - 07:55 AM

This has to be a troll at this point, right? Please let this be a troll...

### #25 Ripiz

Posted 21 January 2012 - 11:42 AM

> Not tested, and I'm not 100% sure the immediate value for dpps is right.

Should be right. I didn't know there was a dot-product instruction. Sadly, my CPU doesn't have SSE4.

> This has to be a troll at this point, right? Please let this be a troll...

No, it's not a troll. I'm just wondering why use the FPU when there are SSE versions. It's probably even easier: as Tachikoma suggested, it fits in one line:
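```cpp
#include <smmintrin.h> // SSE4.1 intrinsics (_mm_dp_ps)

__m128 normalize(const __m128 vector) {
    // 0xFF: multiply and sum all four lanes, write the sum to all four lanes
    return _mm_div_ps(vector, _mm_sqrt_ps(_mm_dp_ps(vector, vector, 0xFF))); // on second thought it might be FF, not CC
}
```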


### #26 clb

Posted 21 January 2012 - 11:50 AM

Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.
Me+PC=clb.demon.fi | C++ Math and Geometry library: MathGeoLib, test it live! | C++ Game Networking: kNet | 2D Bin Packing: RectangleBinPack | Use gcc/clang/emcc from VS: vs-tool | Resume+Portfolio | gfxapi, test it live!

### #27 Tachikoma

Posted 21 January 2012 - 09:07 PM

> Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.

As others have already mentioned in this thread, that algorithm is no longer practical on modern architectures, but it remains a mathematical curiosity nonetheless.

### #28 Chris_F

Posted 21 January 2012 - 09:28 PM

> Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.

Maybe a better read: http://assemblyrequi...ng-square-root/ and http://assemblyrequired.crashworks.org/2009/10/20/square-roots-in-vivo-normalizing-vectors/
