I have a very math-laden function that works fine in its non-SSE version but causes some very odd behavior when I try to switch to using SSE intrinsics.
Here is the non-SSE function that works. Its purpose is to simulate a gravitational pull on a large array of objects.
#include <math.h> // sqrtf

void CalculateMath()
{
    float RadX, RadY, RadiusSquared, Mag,
          xPos, yPos, xVel, yVel;
    const float AccelFactor = -dT*GravityDirection*GravityMultiplier*100000.0f;
    for (int i=0; i<Population; i++)
    {
        xPos = xpos[i]; // Copy array values into normal variables; mainly for readability.
        yPos = ypos[i];
        xVel = xvel[i];
        yVel = yvel[i];
        RadX = xPos-CenterX; // X & Y components of the distance to the center.
        RadY = yPos-CenterY;
        RadiusSquared = RadY * RadY + RadX * RadX;
        Mag = AccelFactor/(RadiusSquared * sqrtf(RadiusSquared)); // Fancy math
        xVel += RadX * Mag;
        yVel += RadY * Mag;
        xPos += dT*xVel;
        yPos += dT*yVel;
        xpos[i] = xPos; // Put values back into the arrays
        ypos[i] = yPos;
        xvel[i] = xVel;
        yvel[i] = yVel;
    }
}
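For reference, the loop body above amounts to the following per-object update, writing $k$ for AccelFactor, $(C_x, C_y)$ for (CenterX, CenterY), $\mathbf{p}$ for the position and $\mathbf{v}$ for the velocity:

$$
\begin{aligned}
\mathbf{r} &= (x - C_x,\; y - C_y), \qquad r^2 = r_x^2 + r_y^2,\\
m &= \frac{k}{r^2\sqrt{r^2}} = \frac{k}{|\mathbf{r}|^3},\\
\mathbf{v} &\leftarrow \mathbf{v} + m\,\mathbf{r},\\
\mathbf{p} &\leftarrow \mathbf{p} + \Delta t\,\mathbf{v}.
\end{aligned}
$$

Since $|m\,\mathbf{r}| = |k|/r^2$, the velocity change falls off with the inverse square of the distance to the center, and the position is advanced with the already-updated velocity (a semi-implicit Euler step). Note that $r^2$ is built from the offset $\mathbf{r}$, not from the raw position.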
Here is the "equivalent" SSE code I've come up with so far. It doesn't work, and I can't find the problem with it.
#include <xmmintrin.h> // SSE intrinsics

void CalculateMathSSE()
{
    __m128 xPos, yPos, xVel, yVel,
           RadX, RadY, RadiusSquared, Mag,
           AccelFactor = _mm_set_ps1( -dT*GravityDirection*GravityMultiplier*100000.0f ),
           sseCenterX = _mm_load_ps1( &CenterX ),
           sseCenterY = _mm_load_ps1( &CenterY ),
           sse_dT = _mm_load_ps1( &dT );
    for (int i=0; i<Population; i+=4)
    {
        xPos = _mm_load_ps( &xpos[i] ); // Load 4 floats at a time from the array into an __m128 value
        yPos = _mm_load_ps( &ypos[i] );
        xVel = _mm_load_ps( &xvel[i] );
        yVel = _mm_load_ps( &yvel[i] );
        RadX = _mm_sub_ps( xPos, sseCenterX ); // Distance
        RadY = _mm_sub_ps( yPos, sseCenterY );
        RadiusSquared = _mm_add_ps( _mm_mul_ps( xPos, xPos ), _mm_mul_ps( yPos, yPos ));
        Mag = _mm_div_ps( AccelFactor, _mm_mul_ps( RadiusSquared, _mm_sqrt_ps( RadiusSquared ))); // Fancy math
        xVel = _mm_add_ps( xVel, _mm_mul_ps( RadX, Mag ));
        yVel = _mm_add_ps( yVel, _mm_mul_ps( RadY, Mag ));
        xPos = _mm_add_ps( xPos, _mm_mul_ps( sse_dT, xVel ));
        yPos = _mm_add_ps( yPos, _mm_mul_ps( sse_dT, yVel ));
        _mm_store_ps( &xpos[i], xPos ); // Put values back into the arrays
        _mm_store_ps( &ypos[i], yPos );
        _mm_store_ps( &xvel[i], xVel );
        _mm_store_ps( &yvel[i], yVel );
    }
}
What am I doing wrong? I removed some code that simulates friction because it worked in both versions, so I think the problem lies within the "fancy math" part.
All the ugly stuff like alignment is taken care of, and it runs without crashing, but it just doesn't behave like the original.
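Comparing the two versions line by line, the RadiusSquared line differs: the scalar code squares RadX/RadY (the offset from the center), while the SSE version squares the raw xPos/yPos. Unless the center happens to sit at the origin, the two compute different radii. Below is a minimal sketch of the SSE loop with RadX/RadY used there instead. To make it compile on its own, the globals are passed as parameters (the parameter names and the helper CalculateMathScalar are hypothetical), the arrays are assumed to be 16-byte aligned, and a scalar tail handles a Population that is not a multiple of 4 (the original `i+=4` loop would otherwise read past the end unless the arrays are padded).

```cpp
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_sqrt_ps, _mm_div_ps, ...
#include <cmath>        // std::sqrt, std::fabs

// Hypothetical scalar reference: same math as CalculateMath(), with the
// globals turned into parameters. accelFactor plays the role of AccelFactor.
void CalculateMathScalar(float* xpos, float* ypos, float* xvel, float* yvel,
                         int population, float centerX, float centerY,
                         float dT, float accelFactor)
{
    for (int i = 0; i < population; ++i)
    {
        float RadX = xpos[i] - centerX;
        float RadY = ypos[i] - centerY;
        float RadiusSquared = RadY * RadY + RadX * RadX;
        float Mag = accelFactor / (RadiusSquared * std::sqrt(RadiusSquared));
        xvel[i] += RadX * Mag;
        yvel[i] += RadY * Mag;
        xpos[i] += dT * xvel[i];
        ypos[i] += dT * yvel[i];
    }
}

// Sketch of the SSE version. Assumes all four arrays are 16-byte aligned.
void CalculateMathSSE(float* xpos, float* ypos, float* xvel, float* yvel,
                      int population, float centerX, float centerY,
                      float dT, float accelFactor)
{
    // _mm_set1_ps broadcasts one float into all four lanes.
    const __m128 sseAccel   = _mm_set1_ps(accelFactor);
    const __m128 sseCenterX = _mm_set1_ps(centerX);
    const __m128 sseCenterY = _mm_set1_ps(centerY);
    const __m128 sse_dT     = _mm_set1_ps(dT);

    int i = 0;
    for (; i + 4 <= population; i += 4)  // only full groups of four
    {
        __m128 xPos = _mm_load_ps(&xpos[i]);
        __m128 yPos = _mm_load_ps(&ypos[i]);
        __m128 xVel = _mm_load_ps(&xvel[i]);
        __m128 yVel = _mm_load_ps(&yvel[i]);

        __m128 RadX = _mm_sub_ps(xPos, sseCenterX);
        __m128 RadY = _mm_sub_ps(yPos, sseCenterY);
        // Square the offsets from the center, as the scalar version does.
        __m128 RadiusSquared = _mm_add_ps(_mm_mul_ps(RadX, RadX),
                                          _mm_mul_ps(RadY, RadY));
        __m128 Mag = _mm_div_ps(sseAccel,
                                _mm_mul_ps(RadiusSquared, _mm_sqrt_ps(RadiusSquared)));

        xVel = _mm_add_ps(xVel, _mm_mul_ps(RadX, Mag));
        yVel = _mm_add_ps(yVel, _mm_mul_ps(RadY, Mag));
        xPos = _mm_add_ps(xPos, _mm_mul_ps(sse_dT, xVel));
        yPos = _mm_add_ps(yPos, _mm_mul_ps(sse_dT, yVel));

        _mm_store_ps(&xpos[i], xPos);
        _mm_store_ps(&ypos[i], yPos);
        _mm_store_ps(&xvel[i], xVel);
        _mm_store_ps(&yvel[i], yVel);
    }

    // Leftover 0-3 objects go through the scalar path.
    CalculateMathScalar(xpos + i, ypos + i, xvel + i, yvel + i, population - i,
                        centerX, centerY, dT, accelFactor);
}
```

A practical way to catch this kind of slip is to run both functions on copies of the same data for one step and compare the arrays element by element; with the offset fix, the two should agree to within float rounding.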
[Edited by - KaerfSusej on September 19, 2008 12:16:03 PM]