
Floating point behavior of xmm registers

General and Gameplay Programming

Started by Laval B September 18, 2011 04:46 PM

5 comments, last by Laval B

Laval B


Author

September 18, 2011 04:46 PM

Hi everyone.

I'm working on a 4x4 matrix product using the SSE instruction set. For development, I'm using inline assembly in C++ with Visual Studio 2010; I'm not using intrinsics. I just stumbled across something I can't explain. For the sake of discussion, let's assume a function defined like the following:

inline void Multiply(const float *a, const float *b, float *c)
{
	__asm
	{
		mov eax, a						// eax = address of a
//        mov ecx, b
		mov edx, c						// edx = address of c (destination)

		movss xmm0, dword ptr [eax]		// xmm0 = { a[0], 0, 0, 0 }
		movaps xmmword ptr [edx], xmm0	// store all four lanes to c[0..3]; c must be 16-byte aligned
	}
}

Basically, I load a 32-bit floating-point number into the first 32-bit element of the xmm0 register, then store the contents of the register back into a float array (with room for 16 floats). Yes, I have aligned my data on 16 bytes (no exception is raised). The problem is that the value in c is not the value I loaded from a: a[0] contained 18.283001, but c[0] comes back as 1.2000000. If I add

shufps	xmm0, xmm0,	0h

before I store to c, the value in the first component of xmm0 is copied to the other components, but it is still the same wrong value, 1.2000000. I don't understand why putting a value in an xmm register changes that value.

Does anyone have an idea?
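The shufps in the question only moves lanes around; it never alters a value. As a hedged illustration, here is the same instruction expressed through the SSE intrinsic `_mm_shuffle_ps` (an intrinsics translation, not the poster's inline asm; `Broadcast0` and `Reverse` are hypothetical names). Each 2-bit field of the immediate selects a source lane, so `0h` broadcasts lane 0 and `_MM_SHUFFLE(0, 1, 2, 3)` (`1Bh`) reverses the vector.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_shuffle_ps, _MM_SHUFFLE

// shufps imm: result lanes 0,1 come from the first operand and lanes 2,3 from
// the second; each 2-bit field of imm picks the source lane. With both operands
// equal (shufps xmm0, xmm0, imm) it acts as a permutation or broadcast.

// shufps xmm0, xmm0, 0h: every field is 0, so every lane copies lane 0.
inline void Broadcast0(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);                    // unaligned load, no alignment needed
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, 0x00));
}

// shufps xmm0, xmm0, 1Bh (_MM_SHUFFLE(0, 1, 2, 3)): lane order is reversed.
inline void Reverse(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

For input `{1, 2, 3, 4}`, `Broadcast0` gives `{1, 1, 1, 1}` and `Reverse` gives `{4, 3, 2, 1}`; in both cases the values themselves are copied bit-for-bit.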

We think in generalities, but we live in details.
- Alfred North Whitehead

Erik Rufelt


September 18, 2011 05:10 PM

You move 'c' into edx, not 'b', so you store in the wrong place. Use ecx or move 'b' into edx instead of 'c'. As of now you overwrite memory you shouldn't.

Laval B


Author

September 18, 2011 05:33 PM

OK, I edited my original post: c is also a pointer, as it is in my real code. So I load a into xmm0, then store xmm0 back into c. The result is the same.


Erik Rufelt


September 18, 2011 06:39 PM

I ran your code, it works perfectly fine and outputs 18.283001 if that is the input.
Always post a complete working sample that compiles and runs with the stated behavior when asking these kinds of questions. Your error is in another part of your program.
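Following that advice, a complete, self-contained repro of the first snippet might look like the sketch below. It is a translation under stated assumptions, not the original code: it uses SSE intrinsics instead of MSVC inline assembly (which Visual C++ supports only for 32-bit x86 targets), C++11 `alignas(16)` instead of `__declspec(align(16))`, and the names `CopyFirstElement` and `RunRepro` are hypothetical. `_mm_load_ss` corresponds to the movss load (lane 0 gets the value, lanes 1 to 3 are zeroed) and `_mm_store_ps` to the aligned movaps store.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical intrinsic equivalent of the inline-asm test:
//   movss  xmm0, dword ptr [a]
//   movaps xmmword ptr [c], xmm0
inline void CopyFirstElement(const float *a, float *c)
{
    // movaps faults on a misaligned address, so check the precondition explicitly.
    assert(reinterpret_cast<std::uintptr_t>(c) % 16 == 0);
    __m128 v = _mm_load_ss(a);   // lane 0 = a[0], lanes 1-3 = 0.0f
    _mm_store_ps(c, v);          // aligned 16-byte store: writes c[0..3]
}

// Self-contained repro: fixed input, aligned buffers, and a checkable result.
inline bool RunRepro()
{
    alignas(16) float a[16] = {18.283001f};  // remaining elements are zero-initialized
    alignas(16) float c[16] = {};
    CopyFirstElement(a, c);
    return c[0] == a[0] && c[1] == 0.0f && c[2] == 0.0f && c[3] == 0.0f;
}
```

If `RunRepro()` returns true in isolation but the full program still shows a different value, the problem lies elsewhere in the program, which is what the reply above concludes.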

Laval B


Author

September 18, 2011 07:56 PM


Yes, you are completely right: I had another test that I forgot about, and it was messing up my variables. I'm really sorry about that, and thanks a lot for your time and answer.

I have cleaned up a few things, but I've run into another problem. Here is the complete code of the function:

/////////////////////////////////////////////////////////////////////////////
//	Multiply two 4x4 row-major matrices using SSE instructions.
__forceinline void SSEMultAlligned(const f32 *a, const f32 *b, f32 *c)
{
	__asm
	{
		//	Get pointers to matrices
		//	into registers.
		mov eax, a
		mov ecx, b
		mov edx, c

		movss	xmm0,	dword ptr [eax]			//	Move a[0] into xmm0 first element.
		movaps	xmm1,	xmmword ptr [ecx]		//	Move row 0 of b into xmm1.
		shufps	xmm0, xmm0,	0h					//	Broadcast a[0] in all xmm0.
//		mulps   xmm0, xmm1						//	Multiply a[0] with row 0 of b.
/*
		//	Row 0: c_row0 = a[0]*b_row0 + a[1]*b_row1 + a[2]*b_row2 + a[3]*b_row3.
		//	Elements of a are 4 bytes apart (a[1] at +04h); rows of b are 16 bytes apart (+10h).
		movss	xmm0,	dword ptr [eax]			//	Move a[0] into xmm0 first element.
		movaps	xmm1,	xmmword ptr [ecx]		//	Move row 0 of b into xmm1.
		shufps	xmm0, xmm0,	0h					//	Broadcast a[0] in all xmm0.
		movss	xmm2,	dword ptr [eax+04h]		//	Move a[1] into xmm2 first element.
		mulps   xmm0, xmm1						//	Multiply a[0] with row 0 of b.
		shufps  xmm2, xmm2, 0h					//	Broadcast a[1] in all xmm2.
		movaps  xmm3,	xmmword ptr [ecx+10h]	//	Move row 1 of b into xmm3.
		movss   xmm4,	dword ptr [eax+08h]		//	Move a[2] into xmm4.
		mulps	xmm2, xmm3						//	Multiply a[1] with row 1 of b.
		shufps  xmm4, xmm4, 0h					//	Broadcast a[2] into xmm4.
		addps   xmm0, xmm2						//	Accumulate result into xmm0.
		movaps  xmm2,	xmmword ptr [ecx+20h]	//	Move row 2 of b into xmm2.
		mulps   xmm4, xmm2						//	Multiply a[2] with row 2 of b.
		movss	xmm1,	dword ptr [eax+0Ch]		//	Load a[3] into xmm1 first element.
		addps   xmm0, xmm4						//	Accumulate result into xmm0.
		shufps  xmm1, xmm1, 0h					//	Broadcast a[3] into xmm1.
		movaps  xmm3,	xmmword ptr [ecx+30h]	//	Move row 3 of b into xmm3.
		mulps   xmm1, xmm3						//	Multiply a[3] with row 3 of b.
		addps   xmm0, xmm1						//	Accumulate result into xmm0.
*/
		movaps  xmmword ptr [edx], xmm0			//	Store first row of result into c.
	}
}

It is called like this:

	__declspec(align(16)) float aa[16] = {1.20f, 0.50f, 1.30f, 1.82f,
	                                      6.28f, 3.40f, 2.27f, 1.55f,
	                                      1.40f, 0.25f, 9.82f, 1.75f,
	                                      2.20f, 1.80f, 1.10f, 3.17f};

	__declspec(align(16)) float bb[16] = {0.10f, -1.1f, 1.25f, 0.82f,
	                                      2.01f, 6.10f, 4.02f, -1.87f,
	                                      1.12f, 2.25f, 1.10f, 7.30f,
	                                      2.40f, 1.75f, 6.10f, 4.20f};

	__declspec(align(16)) float cc[16];

	SSEMultAlligned(aa, bb, cc);
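The commented-out block in the function is meant to compute each row of c as a sum of the rows of b, each scaled by one broadcast element of a. A sketch of the full 4x4 row-major product using that same broadcast-multiply-accumulate scheme, written with SSE intrinsics rather than inline asm and checked against a plain scalar triple loop, might look like this. `MultScalar` and `MultSSE` are hypothetical names, and all three arrays are assumed to be 16-byte aligned, as with `__declspec(align(16))` above.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Scalar reference: c[i][j] = sum over k of a[i][k] * b[k][j], all row-major 4x4.
inline void MultScalar(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[i * 4 + k] * b[k * 4 + j];
            c[i * 4 + j] = sum;
        }
}

// SSE sketch of the scheme in the asm comments:
// row i of c = a[i][0]*b_row0 + a[i][1]*b_row1 + a[i][2]*b_row2 + a[i][3]*b_row3.
// Assumes a, b and c are 16-byte aligned (aligned loads/stores, like movaps).
inline void MultSSE(const float *a, const float *b, float *c)
{
    const __m128 b0 = _mm_load_ps(b);       // row 0 of b
    const __m128 b1 = _mm_load_ps(b + 4);   // row 1 of b (byte offset 10h)
    const __m128 b2 = _mm_load_ps(b + 8);   // row 2 of b (byte offset 20h)
    const __m128 b3 = _mm_load_ps(b + 12);  // row 3 of b (byte offset 30h)
    for (int i = 0; i < 4; ++i) {
        const float *row = a + i * 4;       // a[i][k] is 4 bytes after a[i][k-1]
        // _mm_set1_ps broadcasts one float to all lanes (like movss + shufps 0h).
        __m128 r = _mm_mul_ps(_mm_set1_ps(row[0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[3]), b3));
        _mm_store_ps(c + i * 4, r);         // aligned store of row i (movaps)
    }
}
```

With the aa and bb values above, element c[0] should come out close to 1.2*0.1 + 0.5*2.01 + 1.3*1.12 + 1.82*2.4 = 6.949. Keeping the four rows of b in registers and broadcasting elements of a avoids any horizontal adds, which is why this layout suits row-major data.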

If you run the code with the function as it is and put aa and cc in the debugger watch window, the first four elements of cc will contain 1.200000, 1.200000, 1.200000, 1.200000, as they should. If you change xmm0 to xmm1 in the last line of the function and run it again, they will contain 0.10000000, -1.1000000, 1.2500000 and 0.81999999, also as they should (the first row of bb).

However, if you uncomment the line mulps xmm0, xmm1 and store xmm0 into cc again, only the first component is multiplied correctly. The others are not. I get:

[0] 0.12000000 (which is ok, 0.12 = 1.2 * 0.1)
[1] -1.3200001 (wrong, should be 0.5 * -1.1 = -0.55)
[2] 1.5000000 (wrong again, should be 1.3 * 1.25 = 1.625)
[3] 0.98400003 (wrong too, should be 1.82 * 0.82 = 1.4924).

It looks as though only the first float component is multiplied correctly; I don't know where the others come from, but they are quite far from what they should be.


Erik Rufelt


September 18, 2011 08:06 PM

xmm0 is filled with 1.2 1.2 1.2 1.2, and then it is multiplied by xmm1, so no: every value in xmm1 gets multiplied by 1.2 (for example, 1.2 * -1.1 = -1.32), which is exactly the output you get.
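To make the arithmetic concrete, here is a hedged intrinsics sketch (hypothetical function names, not the poster's code) contrasting what `mulps xmm0, xmm1` computes after the broadcast with the element-wise row product the question expected. Only the broadcast version is a step of the row-major matrix product.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// What the uncommented "mulps xmm0, xmm1" computes:
// a[0] broadcast to all lanes, times row 0 of b.
inline void BroadcastTimesRow(const float *a, const float *b, float out[4])
{
    __m128 s = _mm_set1_ps(a[0]);   // { a[0], a[0], a[0], a[0] } (movss + shufps 0h)
    __m128 r = _mm_loadu_ps(b);     // { b[0], b[1], b[2], b[3] }
    _mm_storeu_ps(out, _mm_mul_ps(s, r));
}

// What the expected values in the question correspond to: an element-wise
// product of row 0 of a with row 0 of b (not a step of the matrix product).
inline void RowTimesRow(const float *a, const float *b, float out[4])
{
    _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

With row 0 of aa and bb, the first function yields about 0.12, -1.32, 1.5, 0.984 (the debugger output in the question), and the second about 0.12, -0.55, 1.625, 1.4924 (the values the question expected).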

Laval B


Author

September 18, 2011 08:22 PM


That's right, I was looking at the wrong variable in my watch window. Everything is just fine.

Thanks a lot.



This topic is closed to new replies.
