Normalization Approximation in 30 operations

Butabee · 2012-01-22T03:28:59

I think I finally figured out a version that works... it is this:

squaredvector = position.x * position.x + position.y * position.y + position.z * position.z
onediv = 1 / squaredvector
position.x *= signx
position.y *= signy
position.z *= signz
xper = position.x * onediv
yper = position.y * onediv
zper = position.z * onediv
posyz = position.y + position.z
posxz = position.x + position.z
posxy = position.x + position.y
xmul = 1.0 - (posyz * onediv)
ymul = 1.0 - (posxz * onediv)
zmul = 1.0 - (posxy * onediv)
normal.x = xper * (xmul * posyz) * signx
normal.y = yper * (ymul * posxz) * signy
normal.z = zper * (zmul * posxy) * signz

Think this is the final version.

Started by Butabee January 17, 2012 02:50 PM

26 comments, last by Chris_F 12 years, 2 months ago

Sirisian

2,263

January 21, 2012 07:34 AM

Just delete the post since the community seems to think it's useless.

It's kind of important that you realize why they might think that. From a purely mathematical standpoint, you should be starting in 2D rather than 3D. Ideally, any normalization algorithm that works in 2D will extend to 3D, or so I believe.

Also, here's a paste of your code running on 100 random vectors. Your algorithm doesn't work at all.
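For readers who want to check this claim themselves, here is a minimal C sketch of the pseudocode from the first post. It is not the original paste. The names `Vec3`, `sign_of`, and `posted_approximation` are made up for illustration, and it assumes `signx`, `signy`, and `signz` are -1 or +1 depending on the sign of each component (the post never defines them).

```c
/* Hypothetical transcription of the posted pseudocode; not the original paste. */
typedef struct { float x, y, z; } Vec3;

/* Assumption: the post's signx/signy/signz are -1 or +1 per component. */
static float sign_of(float v) { return v < 0.0f ? -1.0f : 1.0f; }

static Vec3 posted_approximation(Vec3 position)
{
    float signx = sign_of(position.x);
    float signy = sign_of(position.y);
    float signz = sign_of(position.z);

    float squaredvector = position.x * position.x
                        + position.y * position.y
                        + position.z * position.z;
    float onediv = 1.0f / squaredvector;

    /* Multiplying by the sign makes every component non-negative. */
    position.x *= signx;
    position.y *= signy;
    position.z *= signz;

    float xper = position.x * onediv;
    float yper = position.y * onediv;
    float zper = position.z * onediv;

    float posyz = position.y + position.z;
    float posxz = position.x + position.z;
    float posxy = position.x + position.y;

    float xmul = 1.0f - (posyz * onediv);
    float ymul = 1.0f - (posxz * onediv);
    float zmul = 1.0f - (posxy * onediv);

    Vec3 normal;
    normal.x = xper * (xmul * posyz) * signx;
    normal.y = yper * (ymul * posxz) * signy;
    normal.z = zper * (zmul * posxy) * signz;
    return normal;
}
```

Under those assumptions, feeding in the unit vector (1, 0, 0) gives (0, 0, 0) because `posyz` is zero, and (3, 4, 0) comes out at roughly (0.40, 0.42, 0) instead of (0.6, 0.8, 0).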

Ripiz

539

January 21, 2012 09:53 AM

What's wrong with SSE?



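#include <xmmintrin.h> // SSE intrinsics

__declspec(noinline) void normalize(__m128 &col0) { // not inlined, for testing purposes
    // (x*x, y*y, z*z, w*w)
    __m128 dot = _mm_mul_ps(col0, col0);
    // swap neighbouring lanes and add: (xx+yy, xx+yy, zz+ww, zz+ww)
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
    // swap halves and add so every lane holds the full sum, then divide by its square root
    col0 = _mm_div_ps(col0, _mm_sqrt_ps(_mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)))));
}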

The original topic is about normalization in 30 operations; according to the disassembly, this version is just 12 instructions:



__declspec(noinline) void normalize(__m128 &col0) {
__m128 dot = _mm_mul_ps(col0, col0);
012D1340  movaps  xmm1,xmmword ptr [eax]
012D1343  movaps  xmm2,xmm1
012D1346  mulps   xmm2,xmm1
dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
012D1349  movaps  xmm0,xmm2
012D134C  shufps  xmm0,xmm2,0B1h
012D1350  addps   xmm0,xmm2
col0 = _mm_div_ps(col0, _mm_sqrt_ps(_mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)))));
012D1353  movaps  xmm2,xmm0
012D1356  shufps  xmm2,xmm0,0Fh
012D135A  addps   xmm2,xmm0
012D135D  sqrtps  xmm0,xmm2
012D1360  divps   xmm1,xmm0
012D1363  movaps  xmmword ptr [eax],xmm1
}

If we exclude the first and last instructions (which load from and store to memory), the actual calculation is just 10 instructions. Accuracy isn't an issue either, since it uses a full-precision sqrtps and divps rather than an approximation.
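To see why the two shuffle/add pairs leave the full squared length in every lane, here is a plain-C model of the same data movement. `shuffle_model` and `squared_length_lanes` are hypothetical helper names, not intrinsics; the lane selection follows the documented behaviour of `_mm_shuffle_ps` (the two low result lanes come from the first operand, the two high lanes from the second).

```c
/* Scalar model of _mm_shuffle_ps(a, b, imm8): each 2-bit field of imm8 picks a source lane. */
static void shuffle_model(const float a[4], const float b[4], unsigned imm8, float out[4])
{
    /* Read everything first so out may alias a or b. */
    float r0 = a[imm8 & 3];
    float r1 = a[(imm8 >> 2) & 3];
    float r2 = b[(imm8 >> 4) & 3];
    float r3 = b[(imm8 >> 6) & 3];
    out[0] = r0; out[1] = r1; out[2] = r2; out[3] = r3;
}

/* Mirrors the mulps / shufps / addps / shufps / addps sequence from the listing. */
static void squared_length_lanes(const float v[4], float out[4])
{
    float dot[4], tmp[4];
    int i;

    for (i = 0; i < 4; ++i)
        dot[i] = v[i] * v[i];              /* (xx, yy, zz, ww) */

    shuffle_model(dot, dot, 0xB1, tmp);    /* _MM_SHUFFLE(2, 3, 0, 1): swap within pairs */
    for (i = 0; i < 4; ++i)
        dot[i] += tmp[i];                  /* (xx+yy, xx+yy, zz+ww, zz+ww) */

    shuffle_model(dot, dot, 0x0F, tmp);    /* _MM_SHUFFLE(0, 0, 3, 3): swap the two halves */
    for (i = 0; i < 4; ++i)
        out[i] = dot[i] + tmp[i];          /* xx+yy+zz+ww in every lane */
}
```

For (1, 2, 3, 4) the intermediate is (5, 5, 25, 25) and the final result is 30 in all four lanes. Note that this is a 4-component length, so for a 3D vector the w lane has to be 0 or it contributes to the result.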

Tachikoma

575

January 21, 2012 01:55 PM

Quick hack-up in SSE4:



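movaps xmm0,xmmword ptr [Vector]   ; load the vector
movaps xmm1,xmm0                   ; keep a copy of the original
dpps xmm0,xmm0,CCh                 ; dot product of the vector with itself
rsqrtps xmm0,xmm0                  ; approximate reciprocal square root
mulps xmm1,xmm0                    ; scale the copy by 1/length
movaps xmmword ptr [Vector],xmm1   ; store the result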

Not tested, and I'm not 100% sure the immediate value for dpps is right.

Latest project: Sideways Racing on the iPad

osmanb

2,082

January 21, 2012 01:55 PM

This has to be a troll at this point, right? Please let this be a troll...

Ripiz

539

January 21, 2012 05:42 PM

Tachikoma wrote: "Not tested, and I'm not 100% sure the immediate value for dpps is right."

Should be right. I didn't know there was a dot product instruction. Sadly, my CPU doesn't have SSE4.

osmanb wrote: "This has to be a troll at this point, right? Please let this be a troll..."

No, it's not a troll. I'm just wondering why you'd use the FPU when there are SSE versions. It's probably even easier; like Tachikoma suggested, it can fit in one line:



#include <smmintrin.h> // SSE4.1, for _mm_dp_ps

__m128 normalize(const __m128 vector) {
    // on second thought it might be FF, not CC
    return _mm_div_ps(vector, _mm_sqrt_ps(_mm_dp_ps(vector, vector, 0xFF)));
}
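On the question of the immediate: per Intel's description of the instruction, the upper four bits of the `dpps` immediate choose which lanes are multiplied and summed, and the lower four bits choose which result lanes receive the sum (the others are zeroed). The scalar sketch below models that behaviour; `dpps_model` is a made-up name, not a real intrinsic.

```c
/* Scalar model of the DPPS / _mm_dp_ps immediate (hypothetical helper, for illustration). */
static void dpps_model(const float a[4], const float b[4], unsigned imm8, float out[4])
{
    float sum = 0.0f;
    int i;

    /* Bits 4..7: which lane products take part in the sum. */
    for (i = 0; i < 4; ++i)
        if (imm8 & (0x10u << i))
            sum += a[i] * b[i];

    /* Bits 0..3: which result lanes get the sum; the rest are set to zero. */
    for (i = 0; i < 4; ++i)
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}
```

With this model, 0xCC only sums the z and w products and leaves the x and y lanes at zero, so a following `rsqrtps` would produce infinities there. 0xFF sums and broadcasts all four lanes, and 0x7F would sum only x, y, z while still broadcasting, which helps when w holds something other than 0.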

clb

2,152

January 21, 2012 05:50 PM

Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.
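For anyone who has not seen it, the trick that paper analyses looks roughly like this in C. This is a sketch of the well-known bit hack with one Newton-Raphson step, using the widely circulated magic constant. `fast_rsqrt` and `normalize3_fast` are illustrative names, `memcpy` is used for the bit reinterpretation to avoid undefined behaviour from pointer casts, and the input is assumed to be a positive, finite, normal float.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x (sketch, not production code). */
static float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);        /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);      /* initial guess from the exponent/mantissa bits */
    memcpy(&y, &bits, sizeof y);

    y = y * (1.5f - 0.5f * x * y * y);     /* one Newton-Raphson refinement step */
    return y;
}

/* Scale a 3-component vector to approximately unit length in place. */
static void normalize3_fast(float v[3])
{
    float inv_len = fast_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```

One Newton step leaves a relative error of a bit under 0.2%, so the result is only approximately unit length; a zero vector is not handled.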

Tachikoma

575

January 22, 2012 03:07 AM

clb wrote: "Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root."

As others have already mentioned in this thread, that algorithm is no longer practical on a modern architecture. But it remains a mathematical curiosity nonetheless.

Chris_F

3,030

January 22, 2012 03:28 AM

clb wrote: "Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root."

Maybe a better read: http://assemblyrequi...ng-square-root/ and http://assemblyrequired.crashworks.org/2009/10/20/square-roots-in-vivo-normalizing-vectors/

This topic is closed to new replies.
