SSE vector normalization

13 comments, last by Dave Eberly 11 years, 8 months ago
There is actually a very valid reason to simply crash on a 0 vector in your normalization function, and I promise it has nothing to do with "ignorance or laziness". The entire point of using SSE is that it is high-performance code. Normalization of a 0 vector, as previously stated, is technically an invalid operation.

Adding vector validation (i.e. checking for a 0 vector) to your normalization code introduces unnecessary run-time overhead (potential branch mispredictions and load-hit-store stalls) in a performance-sensitive area of your code. As the operation in question is technically invalid, those concerned with performance will opt to have the function crash rather than introduce the overhead. If this code crashes, the real bug lies elsewhere: why is your vector 0, and why are you trying to normalize it if it is? These are the bugs you should be concerned with.

If there is a case in your code where you *may* be normalizing a 0 vector (direction derived via velocity when player is standing still perhaps?), then you should validate the vector *before* the attempt to normalize. The reason for this is that these cases are likely few and far between, and introducing the overhead that I explained above to *every* instance of a call to normalize is unfairly penalizing everyone who calls the function, whether they have a chance to pass a 0 vector or not.
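To make the call-site check concrete, here is a minimal scalar sketch. The `Vec3` type, the `NormalizeUnchecked` and `FacingFromVelocity` names, and the `kMinSqrSpeed` threshold are all assumptions made up for illustration; nothing here comes from the code posted in this thread.

```cpp
#include <cmath>

// Hypothetical 3-component vector, for illustration only.
struct Vec3 { float x, y, z; };

// Fast path: no validation. The caller guarantees a non-zero length.
inline Vec3 NormalizeUnchecked(Vec3 v)
{
    const float invLen = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x * invLen, v.y * invLen, v.z * invLen };
}

// A call site that may legitimately see a zero vector: a facing direction
// derived from velocity. It keeps the previous facing when the player is
// standing still, so only this caller pays for the check.
inline Vec3 FacingFromVelocity(Vec3 velocity, Vec3 previousFacing)
{
    const float sqrLen = velocity.x * velocity.x
                       + velocity.y * velocity.y
                       + velocity.z * velocity.z;
    const float kMinSqrSpeed = 1e-10f;  // assumed threshold, tune per game
    if (sqrLen < kMinSqrSpeed)
        return previousFacing;
    return NormalizeUnchecked(velocity);
}
```

Callers that can never produce a zero vector keep calling `NormalizeUnchecked` directly and pay nothing for the branch.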

*snip*

Sounds to me like a perfect time to use assert, like was suggested...
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]

Sounds to me like a perfect time to use assert, like was suggested...


Agreed, I had meant to reiterate that a debug-only assertion was a valid solution! :)

Ah, I see, I thought you were saying an assert was a bad idea. Looks like we're on the same page :)
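For reference, a debug-only assertion along the lines agreed on above might look like the following scalar sketch. The `Vec3` type and this `Normalize` function are hypothetical stand-ins, not the SSE class from this thread; the only library facility relied on is the standard `assert` macro, which expands to nothing when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 3-component vector, for illustration only.
struct Vec3 { float x, y, z; };

inline Vec3 Normalize(Vec3 v)
{
    const float sqrLen = v.x * v.x + v.y * v.y + v.z * v.z;

    // Debug builds stop here when handed a zero vector. With NDEBUG defined
    // (typical for release builds) the check compiles away entirely, so the
    // hot path carries no extra branch.
    assert(sqrLen > 0.0f && "Normalize called with a zero vector");

    const float invLen = 1.0f / std::sqrt(sqrLen);
    return { v.x * invLen, v.y * invLen, v.z * invLen };
}
```

The bug still surfaces at the offending call during development, which is where it belongs, while release builds keep the unchecked fast path.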


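inline const CVector3SSE& CVector3SSE::Normalize()
{
    // Squared lengths at or below this threshold are treated as zero.
    static const __m128 almostZero = _mm_set1_ps(1e-5f);
    // Dot product of x, y, z broadcast to all four lanes (SSE4.1, <smmintrin.h>).
    __m128 dp = _mm_dp_ps(m_fValsSSE, m_fValsSSE, 0x7F);
    // All-ones mask where the squared length exceeds the threshold, else all zeros.
    const __m128 cmp = _mm_cmpgt_ps(dp, almostZero);
    // Approximate reciprocal square root (about 12 bits of precision).
    dp = _mm_rsqrt_ps(dp);
    // Masking the scale factor yields a zero vector for near-zero input.
    m_fValsSSE = _mm_mul_ps(m_fValsSSE, _mm_and_ps(dp, cmp));
    return *this;
}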



Although yours is the standard way folks do the normalization, the dot product overflows for large components. If you need something that is robust for all finite floating-point inputs, you can divide by the maximum absolute component first:

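#include <smmintrin.h>  // SSE4.1, needed for _mm_dp_ps in Normalize

inline __m128 MaximumAbsoluteComponent (__m128 const v)
{
    // -0.0f has only the sign bit (0x80000000) set in every lane.
    __m128 SIGN = _mm_set1_ps(-0.0f);
    // Clear the sign bits to get the absolute values.
    __m128 vAbs = _mm_andnot_ps(SIGN, v);
    // Broadcast each component to all four lanes.
    __m128 max0 = _mm_shuffle_ps(vAbs, vAbs, _MM_SHUFFLE(0,0,0,0));
    __m128 max1 = _mm_shuffle_ps(vAbs, vAbs, _MM_SHUFFLE(1,1,1,1));
    __m128 max2 = _mm_shuffle_ps(vAbs, vAbs, _MM_SHUFFLE(2,2,2,2));
    __m128 max3 = _mm_shuffle_ps(vAbs, vAbs, _MM_SHUFFLE(3,3,3,3));
    // Reduce to the overall maximum, replicated in every lane.
    max0 = _mm_max_ps(max0, max1);
    max2 = _mm_max_ps(max2, max3);
    max0 = _mm_max_ps(max0, max2);
    return max0;
}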

inline __m128 Normalize (__m128 const v)
{
    // Compute the maximum absolute value component.
    __m128 maxComponent = MaximumAbsoluteComponent(v);

    // Divide by the maximum absolute component. This is potentially a divide by zero.
    __m128 normalized = _mm_div_ps(v, maxComponent);

    // Set to zero when the original length is zero.
    __m128 zero = _mm_setzero_ps();
    __m128 mask = _mm_cmpneq_ps(zero, maxComponent);
    normalized = _mm_and_ps(mask, normalized);

    // (sqrLength, sqrLength, sqrLength, sqrLength)
    __m128 sqrLength = _mm_dp_ps(normalized, normalized, 0x7F);

    // (length, length, length, length)
    __m128 length = _mm_sqrt_ps(sqrLength);

    // Divide by the length to normalize. This is potentially a divide by zero.
    normalized = _mm_div_ps(normalized, length);

    // Set to zero when the original length is zero or infinity. In the latter case, this is considered to be an unexpected condition.
    normalized = _mm_and_ps(mask, normalized);
    return normalized;
}
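To see why the prescaling matters, here is a scalar sketch of both approaches. It is not the SSE code above, just a plain C++ illustration of the same idea; `Vec3`, `NormalizeNaive`, and `NormalizeScaled` are hypothetical names, and the behaviour described assumes IEEE 754 single precision without fast-math style optimizations.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 3-component vector, for illustration only.
struct Vec3 { float x, y, z; };

// Straightforward version. The squared length overflows to +infinity once a
// component exceeds roughly 1.8e19 (the square root of FLT_MAX), so the
// reciprocal length becomes 0 and the result collapses to the zero vector.
inline Vec3 NormalizeNaive(Vec3 v)
{
    const float sqrLen = v.x * v.x + v.y * v.y + v.z * v.z;
    const float invLen = 1.0f / std::sqrt(sqrLen);
    return { v.x * invLen, v.y * invLen, v.z * invLen };
}

// Prescaled version: divide by the largest absolute component first, so every
// component lies in [-1, 1] and the squared length lies in [1, 3].
inline Vec3 NormalizeScaled(Vec3 v)
{
    const float maxAbs = std::max({ std::fabs(v.x), std::fabs(v.y), std::fabs(v.z) });
    if (maxAbs == 0.0f)
        return { 0.0f, 0.0f, 0.0f };  // zero in, zero out
    v.x /= maxAbs;
    v.y /= maxAbs;
    v.z /= maxAbs;
    const float length = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / length, v.y / length, v.z / length };
}
```

For an input such as (3e20, 0, 4e20), the naive version returns the zero vector because 9e40 is not representable as a float, while the prescaled version returns approximately (0.6, 0, 0.8). The price is an extra division and a max reduction per call.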
