Normalization Approximation in 30 operations


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

27 replies to this topic

#21 Sirisian   Crossbones+   -  Reputation: 1792

Posted 21 January 2012 - 01:34 AM

Just delete the post since the community seems to think it's useless.

It's kind of important that you realize why they might think that. From a purely mathematical standpoint, you should be starting in 2D, not 3D. Ideally, any normalization algorithm that works in 2D will extend to 3D, or so I believe.

Also here's a paste of your code running on 100 random vectors. Your algorithm doesn't work at all.
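
The paste itself is not reproduced here. As a rough illustration of that kind of check (not Sirisian's actual test), here is a minimal C++ sketch that runs a normalization routine on 100 random vectors and reports how far the result's length strays from 1. The names `normalize_ref` and `max_length_error`, the value range, and the seed are all assumptions made for this example. The same template covers 2D and 3D.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Reference normalization for an N-component vector (N = 2, 3, ...).
template <std::size_t N>
std::array<float, N> normalize_ref(const std::array<float, N>& v) {
    float len_sq = 0.0f;
    for (float c : v) len_sq += c * c;
    assert(len_sq > 0.0f && "cannot normalize a zero-length vector");
    const float inv_len = 1.0f / std::sqrt(len_sq);
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = v[i] * inv_len;
    return out;
}

// Largest deviation of |normalize(v)| from 1 over `count` random vectors.
// `normalize` is any callable taking and returning std::array<float, N>.
template <std::size_t N, typename Normalize>
float max_length_error(Normalize normalize, int count = 100, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
    float worst = 0.0f;
    for (int n = 0; n < count; ++n) {
        std::array<float, N> v{};
        for (float& c : v) c = dist(rng);
        const std::array<float, N> u = normalize(v);
        float len_sq = 0.0f;
        for (float c : u) len_sq += c * c;
        worst = std::max(worst, std::fabs(std::sqrt(len_sq) - 1.0f));
    }
    return worst;
}
```

Calling `max_length_error<3>(candidate)` with the routine under test in place of `normalize_ref<3>` gives a single number to compare against whatever tolerance the application can accept.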

#22 Ripiz   Members   -  Reputation: 529

Posted 21 January 2012 - 03:53 AM

What's wrong with SSE?
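#include <xmmintrin.h> // SSE intrinsics (__m128, _mm_*_ps)

__declspec(noinline) void normalize(__m128 &col0) { // not inlined, for testing purposes (MSVC attribute)
    __m128 dot = _mm_mul_ps(col0, col0); // squares of all four lanes
    // pairwise sums: lanes become (x*x+y*y, x*x+y*y, z*z+w*w, z*z+w*w)
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
    // add the swapped halves so every lane holds the full sum, then divide by its square root
    col0 = _mm_div_ps(col0, _mm_sqrt_ps(_mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)))));
}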

The original topic is about normalization in 30 operations; according to the disassembly, this version is just 12 instructions:
__declspec(noinline) void normalize(__m128 &col0) {
__m128 dot = _mm_mul_ps(col0, col0);
012D1340  movaps	  xmm1,xmmword ptr [eax] 
012D1343  movaps	  xmm2,xmm1 
012D1346  mulps	   xmm2,xmm1 
dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
012D1349  movaps	  xmm0,xmm2 
012D134C  shufps	  xmm0,xmm2,0B1h 
012D1350  addps	   xmm0,xmm2 
col0 = _mm_div_ps(col0, _mm_sqrt_ps(_mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)))));
012D1353  movaps	  xmm2,xmm0 
012D1356  shufps	  xmm2,xmm0,0Fh 
012D135A  addps	   xmm2,xmm0 
012D135D  sqrtps	  xmm0,xmm2 
012D1360  divps	   xmm1,xmm0 
012D1363  movaps	  xmmword ptr [eax],xmm1 
}

If we exclude the first and last instructions (which move data from and into memory), the actual calculation is just 10 instructions. Accuracy isn't an issue either, since sqrtps and divps are full-precision operations.
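
One way to check the accuracy claim is to compare the SSE result with a plain scalar computation. The sketch below restates the posted routine as a value-returning function without the MSVC-specific `__declspec`. The names `normalize_sse` and `max_abs_diff_vs_scalar` are made up for this example. Note that the routine sums all four lanes, so a 3D vector has to carry w = 0.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h> // SSE

// Portable restatement of the posted routine: same shuffles, returns the result.
inline __m128 normalize_sse(__m128 v) {
    __m128 dot = _mm_mul_ps(v, v);
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)));
    return _mm_div_ps(v, _mm_sqrt_ps(dot));
}

// Hypothetical helper: largest per-component difference between the SSE
// result and a scalar reference for the vector (x, y, z, w).
inline float max_abs_diff_vs_scalar(float x, float y, float z, float w) {
    const float len = std::sqrt(x * x + y * y + z * z + w * w);
    assert(len > 0.0f);
    const float ref[4] = {x / len, y / len, z / len, w / len};
    alignas(16) float got[4];
    // _mm_set_ps takes its arguments from the highest lane down to lane 0.
    _mm_store_ps(got, normalize_sse(_mm_set_ps(w, z, y, x)));
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        worst = std::fmax(worst, std::fabs(got[i] - ref[i]));
    }
    return worst;
}
```

Any difference that does show up should be on the order of float rounding, because the only thing that differs from the scalar path is the order in which the squares are summed.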

#23 Tachikoma   Members   -  Reputation: 552

Posted 21 January 2012 - 07:55 AM

Quick hack-up in SSE4:

movaps xmm0,xmmword ptr [Vector] 
movaps xmm1,xmm0 
dpps xmm0,xmm0,CCh
rsqrtps xmm0,xmm0
mulps xmm1,xmm0
movaps xmmword ptr [Vector],xmm1

Not tested, and I'm not 100% sure the immediate value for dpps is right.
Latest project: Sideways Racing on the iPad
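
A note on the rsqrtps used above: it returns only an estimate of the reciprocal square root (Intel documents a maximum relative error of 1.5 * 2^-12), so a sequence built on it trades accuracy for speed. A common remedy is one Newton-Raphson step on the estimate. The sketch below is one possible way to do that, not Tachikoma's code: it uses plain SSE shuffles for the dot product so it builds without SSE4.1, and the names `rsqrt_refined`, `normalize_rsqrt` and `length_after_normalize` are invented for the example.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h> // SSE

// Reciprocal square root estimate refined by one Newton-Raphson step:
//   y' = y * (1.5 - 0.5 * x * y * y)
inline __m128 rsqrt_refined(__m128 x) {
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    const __m128 y = _mm_rsqrt_ps(x); // roughly 12 bits of precision
    const __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

// Normalize all four lanes; a 3D vector is expected to carry w = 0.
inline __m128 normalize_rsqrt(__m128 v) {
    __m128 dot = _mm_mul_ps(v, v);
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(2, 3, 0, 1)));
    dot = _mm_add_ps(dot, _mm_shuffle_ps(dot, dot, _MM_SHUFFLE(0, 0, 3, 3)));
    return _mm_mul_ps(v, rsqrt_refined(dot));
}

// Hypothetical helper: length of the normalized (x, y, z), for checking.
inline float length_after_normalize(float x, float y, float z) {
    assert(x != 0.0f || y != 0.0f || z != 0.0f);
    alignas(16) float out[4];
    _mm_store_ps(out, normalize_rsqrt(_mm_set_ps(0.0f, z, y, x)));
    return std::sqrt(out[0] * out[0] + out[1] * out[1] + out[2] * out[2]);
}
```

Whether the extra multiplies still beat sqrtps plus divps depends on the CPU, so this is worth measuring rather than assuming.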

#24 osmanb   Crossbones+   -  Reputation: 1628

Posted 21 January 2012 - 07:55 AM

This has to be a troll at this point, right? Please let this be a troll...

#25 Ripiz   Members   -  Reputation: 529

Posted 21 January 2012 - 11:42 AM

Tachikoma wrote: Not tested, and I'm not 100% sure the immediate value for dpps is right.


Should be right. I didn't know there's a dot product instruction. Sadly my CPU doesn't have SSE4.




osmanb wrote: This has to be a troll at this point, right? Please let this be a troll...


No, it's not a troll. I'm just wondering why use the FPU when there are SSE versions. It's probably even easier: as Tachikoma suggested, it can fit in one line:
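#include <smmintrin.h> // SSE4.1: _mm_dp_ps

__m128 normalize(const __m128 vector) {
    // mask 0xFF: multiply all four lanes and write the sum to all four (on second thought it might be FF, not CC)
    return _mm_div_ps(vector, _mm_sqrt_ps(_mm_dp_ps(vector, vector, 0xFF)));
}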
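
On the CC versus FF question: the dpps immediate is two 4-bit masks. Bits 4 to 7 select which lanes are multiplied and summed, and bits 0 to 3 select which output lanes receive the sum (the rest are set to zero). The scalar model below is only an illustration of that rule, not the hardware instruction. `Vec4`, `dp_ps_model` and `demo_masks` are names invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Scalar model of the dpps / _mm_dp_ps immediate byte:
//   bits 4..7 choose which lanes (0..3) are multiplied and summed,
//   bits 0..3 choose which output lanes receive the sum (others become 0).
inline Vec4 dp_ps_model(const Vec4& a, const Vec4& b, std::uint8_t imm8) {
    float sum = 0.0f;
    for (int lane = 0; lane < 4; ++lane) {
        if (imm8 & (0x10u << lane)) sum += a[lane] * b[lane];
    }
    Vec4 out{};
    for (int lane = 0; lane < 4; ++lane) {
        if (imm8 & (0x01u << lane)) out[lane] = sum;
    }
    return out;
}

inline void demo_masks() {
    const Vec4 v{1.0f, 2.0f, 3.0f, 4.0f};
    // 0xCC: only z and w are multiplied, and only lanes 2 and 3 are written.
    assert((dp_ps_model(v, v, 0xCC) == Vec4{0.0f, 0.0f, 25.0f, 25.0f}));
    // 0xFF: all four lanes are multiplied, and the sum goes to every lane.
    assert((dp_ps_model(v, v, 0xFF) == Vec4{30.0f, 30.0f, 30.0f, 30.0f}));
    // 0x7F: x, y and z only (w ignored), and the sum goes to every lane.
    assert((dp_ps_model(v, v, 0x7F) == Vec4{14.0f, 14.0f, 14.0f, 14.0f}));
}
```

Under this model, CCh would leave the x and y lanes of the dot product at zero, so those components would not be scaled correctly. FFh matches a four-component normalize, and 7Fh is the usual choice when the w lane should be ignored.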


#26 clb   Members   -  Reputation: 1787

Posted 21 January 2012 - 11:50 AM

Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.
Me+PC=clb.demon.fi | C++ Math and Geometry library: MathGeoLib, test it live! | C++ Game Networking: kNet | 2D Bin Packing: RectangleBinPack | Use gcc/clang/emcc from VS: vs-tool | Resume+Portfolio | gfxapi, test it live!

#27 Tachikoma   Members   -  Reputation: 552

Posted 21 January 2012 - 09:07 PM

clb wrote: Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.

As others have already mentioned in this thread, that algorithm is no longer practical on modern architectures. But it remains a mathematical curiosity nonetheless.
Latest project: Sideways Racing on the iPad
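
For readers who have not seen the trick under discussion, here is a minimal sketch of the bit-level inverse square root with one Newton-Raphson step. The constant 0x5f3759df is the widely circulated one and is an assumption of this example (it is not quoted anywhere in this thread). `fast_rsqrt` is a made-up name, and `std::memcpy` is used for the float/integer reinterpretation to keep the code well-defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximate 1/sqrt(x) for positive, normal floats.
inline float fast_rsqrt(float x) {
    assert(x > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bit pattern
    bits = 0x5f3759dfu - (bits >> 1);      // initial guess from integer arithmetic
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement
}
```

With a single refinement step the result is typically within a fraction of a percent of the true value, which is coarser than a refined rsqrtps estimate and far coarser than sqrtps followed by divps.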

#28 Chris_F   Members   -  Reputation: 2460

Posted 21 January 2012 - 09:28 PM

clb wrote: Also, for people who want to try to hack normalizations to be faster, this is a very interesting read: Chris Lomont - Fast Inverse Square Root.


Maybe a better read: http://assemblyrequi...ng-square-root/ and http://assemblyrequired.crashworks.org/2009/10/20/square-roots-in-vivo-normalizing-vectors/



