[C++] SSE Intrinsics Help

General and Gameplay Programming

Started by KaerfSusej September 17, 2008 07:07 PM

7 comments, last by Ryan_001 15 years, 7 months ago

KaerfSusej

122

Author

September 17, 2008 07:07 PM

I have a very math-laden function that works fine in its non-SSE version but causes some very odd behavior when I try to switch to using SSE intrinsics. Here is the non-SSE function that works. Its purpose is to simulate a gravitational pull on a large array of objects.

void CalculateMath ()
{
float RadX, RadY, RadiusSquared, Mag, 
      xPos, yPos, xVel, yVel;
const float AccelFactor = -dT*GravityDirection*GravityMultiplier*100000.0f;

	for (int i=0; i<Population; i++)
	{
		xPos = xpos[i]; //Copy array values into normal variables; mainly for readability.
		yPos = ypos[i];
		xVel = xvel[i];
		yVel = yvel[i];


		RadX = xPos-CenterX;// X & Y components of the distance to the center.
		RadY = yPos-CenterY;

		RadiusSquared = RadY * RadY + RadX * RadX;
		Mag = AccelFactor/(RadiusSquared * sqrtf(RadiusSquared)); //Fancy math

		xVel += RadX * Mag;
		yVel += RadY * Mag;

		xPos += dT*xVel;
		yPos += dT*yVel;

		xpos[i] = xPos; //Put values back into array
		ypos[i] = yPos;
		xvel[i] = xVel;
		yvel[i] = yVel;
	}
}

Here is the "equivalent" SSE code that I've come up with so far, which doesn't work. I can't find the problem with it.

void CalculateMathSSE()
{
	__m128 xPos, yPos, xVel, yVel,
		RadX, RadY, RadiusSquared, Mag,
		AccelFactor = _mm_set_ps1( -dT*GravityDirection*GravityMultiplier*100000.0f ),
		sseCenterX = _mm_load_ps1( &CenterX ),
		sseCenterY = _mm_load_ps1( &CenterY ),
		sse_dT = _mm_load_ps1( &dT );

	for (int i=0; i<Population; i+=4)
	{
		xPos = _mm_load_ps( &xpos[i] ); //Load 4 floats at a time from the array into a __m128 value
		yPos = _mm_load_ps( &ypos[i] );
		xVel = _mm_load_ps( &xvel[i] );
		yVel = _mm_load_ps( &yvel[i] );

		RadX = _mm_sub_ps( xPos, sseCenterX ); //Distance
		RadY = _mm_sub_ps( yPos, sseCenterY );

		RadiusSquared = _mm_add_ps( _mm_mul_ps( xPos, xPos ), _mm_mul_ps( yPos, yPos ));
		Mag = _mm_div_ps( AccelFactor, _mm_mul_ps( RadiusSquared, _mm_sqrt_ps( RadiusSquared ))); //Fancy math

		xVel = _mm_add_ps( xVel, _mm_mul_ps( RadX, Mag ));
		yVel = _mm_add_ps( yVel, _mm_mul_ps( RadY, Mag ));

		xPos = _mm_add_ps( xPos, _mm_mul_ps( sse_dT, xVel ));
		yPos = _mm_add_ps( yPos, _mm_mul_ps( sse_dT, yVel ));

		_mm_store_ps( &xpos[i], xPos ); //Put values back into arrays
		_mm_store_ps( &ypos[i], yPos );
		_mm_store_ps( &xvel[i], xVel );
		_mm_store_ps( &yvel[i], yVel );

	}
}

What am I doing wrong? I removed some code that would simulate friction because it worked in both versions, so I think the problem lies within the "fancy math" part. All the ugly stuff like alignment is taken care of, and it runs without crashing, but it just doesn't behave like the original. [Edited by - KaerfSusej on September 19, 2008 12:16:03 PM]

KaerfSusej

122

Author

September 17, 2008 08:07 PM

And this might be better off in the math and physics forum. I don't know.

RobTheBloke

2,553

September 18, 2008 04:03 AM

What's actually wrong with it? The maths? Does it crash?

moosedude

145

September 18, 2008 04:55 AM

How's the memory alignment for the arrays xpos, ypos, xvel & yvel?

Have you got your compiler configured for 16-byte alignment?

http://www.fotofill.co.uk
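
For readers wondering what the alignment question means in practice: _mm_load_ps and _mm_store_ps require addresses that are multiples of 16 bytes. The sketch below shows one way to get and verify that. The names kMaxPopulation, xposAligned, IsAligned16 and LoadChecked are hypothetical and not from the thread, and alignas is a C++11 feature that postdates this 2008 discussion (compiler-specific attributes or _mm_malloc were the usual options at the time).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps

// Hypothetical capacity and array; stand-ins for the thread's xpos array.
constexpr std::size_t kMaxPopulation = 1024;

// alignas(16) asks the compiler to place the array on a 16-byte boundary.
alignas(16) float xposAligned[kMaxPopulation];

// Returns true if p is a multiple of 16, i.e. safe for _mm_load_ps / _mm_store_ps.
inline bool IsAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Aligned load that checks its precondition in debug builds.
inline __m128 LoadChecked(const float* p)
{
    assert(IsAligned16(p));
    return _mm_load_ps(p);
}
```

With a 16-byte-aligned base and a loop that steps by 4 floats, every &xposAligned[i] visited stays aligned, which matches the i+=4 loop in the first post.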

KaerfSusej

122

Author

September 19, 2008 12:14 PM

All the ugly stuff like alignment is taken care of, and it runs without crashing, but it just doesn't behave like the original.

BlindSide

136

September 19, 2008 02:12 PM

How different is the output? "_mm_sqrt_ps" is quite inaccurate afaik.

RobTheBloke

2,553

September 20, 2008 12:08 AM

Quote:Original post by BlindSide
How different is the output? "_mm_sqrt_ps" is quite inaccurate afaik.

Using 32-bit instead of 80-bit is inaccurate, full stop...
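
The accuracy question can be checked directly. As far as I know, the SQRTPS instruction behind _mm_sqrt_ps is correctly rounded for single precision; the fast approximate intrinsics are _mm_rsqrt_ps and _mm_rcp_ps. The hypothetical helper below (not from the thread) compares each lane against std::sqrt on floats. On a typical x86-64 build without fast-math flags I would expect it to report a match, though scalar code compiled for the x87 FPU may keep intermediates at higher precision elsewhere in a longer calculation.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE: _mm_sqrt_ps, _mm_load_ps, _mm_store_ps

// Returns true if every lane of _mm_sqrt_ps equals std::sqrt of the same input.
inline bool SqrtLanesMatchScalar(float a, float b, float c, float d)
{
    alignas(16) float in[4] = { a, b, c, d };
    alignas(16) float out[4];

    _mm_store_ps(out, _mm_sqrt_ps(_mm_load_ps(in)));

    for (int k = 0; k < 4; ++k)
    {
        // Negative inputs give NaN, and NaN != NaN would look like a mismatch.
        assert(in[k] >= 0.0f);
        const float reference = std::sqrt(in[k]);
        if (out[k] != reference)
            return false;
    }
    return true;
}
```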

CastorX

132

September 20, 2008 04:25 AM

This is just an idea, but I think you should rewrite it into simple assembly SSE code. Without this Intel __mm_addps.... stuff.
Also, this only works correctly if you have n*4 elements in your population.
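
The n*4 restriction can be lifted with a scalar loop for the leftover elements. Below is a minimal sketch of that pattern applied to just the position update (pos += dT * vel). The function name Integrate and its parameters are assumptions for illustration, and unaligned loads are used so the sketch makes no claim about how the arrays were allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_set1_ps

// Hypothetical helper: pos[i] += dt * vel[i] for any count, not only multiples of 4.
inline void Integrate(float* pos, const float* vel, float dt, std::size_t count)
{
    assert(pos != nullptr && vel != nullptr);

    const __m128 sse_dt = _mm_set1_ps(dt);
    // Round count down to a multiple of 4 for the vector part.
    const std::size_t vecCount = count & ~static_cast<std::size_t>(3);

    std::size_t i = 0;
    for (; i < vecCount; i += 4)
    {
        const __m128 p = _mm_loadu_ps(pos + i);
        const __m128 v = _mm_loadu_ps(vel + i);
        _mm_storeu_ps(pos + i, _mm_add_ps(p, _mm_mul_ps(sse_dt, v)));
    }

    // Scalar loop for the remaining 0 to 3 elements.
    for (; i < count; ++i)
        pos[i] += dt * vel[i];
}
```

Another common option is to pad the arrays up to the next multiple of 4 with harmless dummy objects, which keeps the loop purely vectorized.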

Ryan_001

3,477

September 20, 2008 04:53 AM

What compiler are you using? I've heard of a few cases of Visual C++ producing incorrect intrinsic code. It might be worth taking a look at the disassembly.
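
Comparing the two listings in the first post line by line, one difference stands out: the scalar version computes RadiusSquared from RadX and RadY (the offsets from the center), while the SSE version computes it from xPos and yPos (the raw positions). That gives the squared distance to the origin rather than to (CenterX, CenterY), which looks like a plausible cause of behavior that differs without crashing. The sketch below changes only that line. To keep it self-contained, the globals from the post are passed as parameters, accelFactor stands for the precomputed -dT*GravityDirection*GravityMultiplier*100000.0f, the function name is made up, and unaligned loads replace _mm_load_ps; all of those are assumptions, not the original poster's code.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Sketch only: globals from the post are parameters here.
inline void CalculateMathSSEFixed(float* xpos, float* ypos, float* xvel, float* yvel,
                                  int population, float centerX, float centerY,
                                  float dT, float accelFactor)
{
    assert(population % 4 == 0);  // same n*4 assumption as the original loop

    const __m128 sseAccel   = _mm_set1_ps(accelFactor);
    const __m128 sseCenterX = _mm_set1_ps(centerX);
    const __m128 sseCenterY = _mm_set1_ps(centerY);
    const __m128 sse_dT     = _mm_set1_ps(dT);

    for (int i = 0; i < population; i += 4)
    {
        __m128 xPos = _mm_loadu_ps(&xpos[i]);
        __m128 yPos = _mm_loadu_ps(&ypos[i]);
        __m128 xVel = _mm_loadu_ps(&xvel[i]);
        __m128 yVel = _mm_loadu_ps(&yvel[i]);

        const __m128 RadX = _mm_sub_ps(xPos, sseCenterX);
        const __m128 RadY = _mm_sub_ps(yPos, sseCenterY);

        // Changed line: use the offsets from the center, as the scalar version does.
        const __m128 RadiusSquared =
            _mm_add_ps(_mm_mul_ps(RadY, RadY), _mm_mul_ps(RadX, RadX));
        const __m128 Mag =
            _mm_div_ps(sseAccel, _mm_mul_ps(RadiusSquared, _mm_sqrt_ps(RadiusSquared)));

        xVel = _mm_add_ps(xVel, _mm_mul_ps(RadX, Mag));
        yVel = _mm_add_ps(yVel, _mm_mul_ps(RadY, Mag));

        xPos = _mm_add_ps(xPos, _mm_mul_ps(sse_dT, xVel));
        yPos = _mm_add_ps(yPos, _mm_mul_ps(sse_dT, yVel));

        _mm_storeu_ps(&xpos[i], xPos);
        _mm_storeu_ps(&ypos[i], yPos);
        _mm_storeu_ps(&xvel[i], xVel);
        _mm_storeu_ps(&yvel[i], yVel);
    }
}
```

Note that when CenterX and CenterY are both zero the two formulas agree, so the discrepancy would only show up with an off-origin center.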
