
Inline asm conditional branching


Started by Schinizer August 09, 2011 12:31 PM

4 comments, last by Schinizer 12 years, 8 months ago

Schinizer

122

Author

August 09, 2011 12:31 PM

Hello everyone,

I'm trying to learn some conditional branching so that I can use it with SIMD (let's assume Windows and GCC in this case). Googling returns only vague information on branching.

Has anyone here done conditional branching before? Would you care to share some resources or perhaps show me how it works?

Thank you very much.

japro

887

August 09, 2011 01:13 PM


Super short version: You "cmp" on a pair of whatever you want to test and then use the appropriate jump command.



cmpl %eax, %ebx   # AT&T syntax: sets the flags from ebx - eax
je Label          # jump if equal
jl Label          # jump if less (signed compare, ebx < eax)
jle Label         # jump if less or equal (signed compare)

...
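As a minimal sketch of how such a compare-and-jump could look in GCC extended inline asm (this is not from the original posts; it assumes GCC or Clang targeting x86 or x86-64, and `max_signed` is a hypothetical helper name):

```c
/* Hypothetical helper: returns the larger of two signed ints using cmp + jcc. */
static int max_signed(int a, int b)
{
    int result = a;
    __asm__ (
        "cmpl %[b], %[res]\n\t"   /* AT&T order: sets the flags from res - b  */
        "jge 1f\n\t"              /* signed: skip the move when res >= b      */
        "movl %[b], %[res]\n"     /* otherwise b is larger, so take it        */
        "1:"                      /* numeric local label, safe when inlined   */
        : [res] "+r"(result)      /* read and written by the asm              */
        : [b] "r"(b)              /* read-only input                          */
        : "cc");                  /* cmp changes the condition codes          */
    return result;
}
```

Calling `max_signed(3, 7)` should take the fall-through path and move `b` into the result, while `max_signed(7, 3)` should take the jump. The numeric label `1:` with the forward reference `1f` avoids duplicate-label errors if the compiler emits the asm more than once.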

If the main goal is SIMD, though, I highly recommend using intrinsics (google "xmmintrin") instead of inline assembler. That way you don't have to deal with clobber lists and the like, you can use high-level constructs such as ifs, and performance is pretty much equal to what inline asm would get you.
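To illustrate the point about mixing intrinsics with ordinary ifs, here is a small sketch (not from the original posts). It assumes an x86 target with SSE and the `<xmmintrin.h>` header; `any_outside` and `both_in_unit_range` are hypothetical names chosen for the example.

```c
#include <xmmintrin.h>

/* Returns 1 if any of the four lanes of v lies outside [lo, hi]. */
static int any_outside(__m128 v, float lo, float hi)
{
    __m128 below = _mm_cmplt_ps(v, _mm_set1_ps(lo));   /* lanes with v < lo */
    __m128 above = _mm_cmpgt_ps(v, _mm_set1_ps(hi));   /* lanes with v > hi */
    /* movemask packs the sign bit of each lane into bits 0..3 of an int. */
    return _mm_movemask_ps(_mm_or_ps(below, above)) != 0;
}

/* Returns 1 if both s and t are inside [0, 1], otherwise 0. */
static int both_in_unit_range(float s, float t)
{
    /* _mm_set_ps takes lanes from highest to lowest; the two unused
       lanes hold 0.5f so they can never trigger the range test. */
    __m128 v = _mm_set_ps(0.5f, 0.5f, t, s);

    if (any_outside(v, 0.0f, 1.0f))   /* a plain C branch on a SIMD result */
        return 0;
    return 1;
}
```

The compiler handles register allocation and scheduling here, so there is no constraint or clobber list to maintain, and the branch itself stays in C.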


Adam_42

3,664

August 09, 2011 01:51 PM

When using SIMD you're usually best off writing branch-free code with instructions like CMPLTPS. That lets you do four compares with one instruction. Even with the extra work of computing both answers and masking off the one you don't want, it'll still be quicker than four compares and jumps most of the time (branches tend to be expensive on modern processors).
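A minimal sketch of that compare-and-mask pattern, written with SSE intrinsics rather than raw asm (not from the original posts; it assumes an x86 target with SSE, and `select_lt` and `select_lt4` are hypothetical names):

```c
#include <xmmintrin.h>

/* Per lane: result = (a < b) ? x : y, computed without any jumps. */
static __m128 select_lt(__m128 a, __m128 b, __m128 x, __m128 y)
{
    __m128 mask = _mm_cmplt_ps(a, b);            /* CMPLTPS: all-ones where a < b */
    return _mm_or_ps(_mm_and_ps(mask, x),        /* keep x where the mask is set   */
                     _mm_andnot_ps(mask, y));    /* keep y where the mask is clear */
}

/* Convenience wrapper working on plain arrays of four floats each. */
static void select_lt4(const float *a, const float *b,
                       const float *x, const float *y, float *out)
{
    __m128 r = select_lt(_mm_loadu_ps(a), _mm_loadu_ps(b),
                         _mm_loadu_ps(x), _mm_loadu_ps(y));
    _mm_storeu_ps(out, r);                       /* unaligned store of all 4 lanes */
}
```

Both candidate answers (`x` and `y`) are computed for every lane and the mask picks one of them, which is the "compute both and mask off the one you don't want" trade-off described above.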

Schinizer

122

Author

August 09, 2011 03:09 PM

Let's take this function as an example:



int LineIntersect(
float x1, float y1,
float x2, float y2,
float x3, float y3,
float x4, float y4,
float *x, float *y)
{
	float mua,mub;
	float denom,numera,numerb;

	denom  = (y4-y3) * (x2-x1) - (x4-x3) * (y2-y1);
	numera = (x4-x3) * (y1-y3) - (y4-y3) * (x1-x3);
	numerb = (x2-x1) * (y1-y3) - (y2-y1) * (x1-x3);

	/* Are the lines coincident? */
	if (ABS(numera) < EPS && ABS(numerb) < EPS && ABS(denom) < EPS)
	{
		*x = (x1 + x2) / 2;
		*y = (y1 + y2) / 2;
		return(TRUE);
	}

	/* Are the lines parallel? */
	if (ABS(denom) < EPS)
	{
		*x = 0;
		*y = 0;
		return(FALSE);
	}

	/* Is the intersection along the segments? */
	mua = numera / denom;
	mub = numerb / denom;

	if (mua < 0 || mua > 1 || mub < 0 || mub > 1)
	{
		*x = 0;
		*y = 0;
		return(FALSE);
	}

	*x = x1 + mua * (x2 - x1);
	*y = y1 + mua * (y2 - y1);
	return(TRUE);
}

After reading Adam_42's post, am I right to say that the code below would perform about the same as a version coded in pure asm, with perhaps only a few microseconds' difference? (I know the copy operations make it slow, but I'm too lazy to change the floats here to a proper vector2. I'm just using this as an example.)



int LineIntersect(
float x1, float y1,
float x2, float y2,
float x3, float y3,
float x4, float y4,
float *x, float *y)
{
	float mua,mub;
	float denom,numera,numerb;

	float pt1[2] = {x1, y1};
	float pt2[2] = {x2, y2};
	float pt3[2] = {x3, y3};
	float pt4[2] = {x4, y4};

	//denom  = (y4-y3) * (x2-x1) - (x4-x3) * (y2-y1);
	//numera = (x4-x3) * (y1-y3) - (y4-y3) * (x1-x3);
	//numerb = (x2-x1) * (y1-y3) - (y2-y1) * (x1-x3);

	//Let's take this part as the calculation for denom, numera and numerb
	__asm
	(
		"..."
		"..."
		"..."
		: "=r"(denom), "=r"(numera), "=r"(numerb)
		: "o"(pt1), "o"(pt2), "o"(pt3), "o"(pt4)
	);

	/* Are the lines coincident? */
	if (ABS(numera) < EPS && ABS(numerb) < EPS && ABS(denom) < EPS)
	{
		//Let's take this part as the calculation for the intersection point
		/**x = (x1 + x2) / 2;
		*y = (y1 + y2) / 2;*/
		__asm
		(
			"..."
			"..."
			"..."
			: "=r"(*x), "=r"(*y)	//write through the pointers, not over them
		);
		return(TRUE);
	}

	/* Are the lines parallel? */
	if (ABS(denom) < EPS) {
		*x = 0;
		*y = 0;
		return(FALSE);
	}

	/* Is the intersection along the segments? */
	/*mua = numera / denom;
	mub = numerb / denom;*/

	__asm
	(
		"..."
		"..."
		"..."
		: "=r"(mua), "=r"(mub)
	);

	if (mua < 0 || mua > 1 || mub < 0 || mub > 1) {
		*x = 0;
		*y = 0;
		return(FALSE);
	}

	//Let's take this part as the calculation for the intersection point
	/**x = x1 + mua * (x2 - x1);
	*y = y1 + mua * (y2 - y1);*/
	__asm
	(
		"..."
		"..."
		"..."
		: "=r"(*x), "=r"(*y)	//write through the pointers, not over them
	);

	return(TRUE);
}

Adam_42

3,664

August 09, 2011 04:18 PM

The only way to be certain of the performance of code is to measure it. Those measurements are most accurate when the code is being used in a real program and not a simple test harness, because of caches and other things that can have performance changing side effects.

By the way you may find http://software.intel.com/en-us/blogs/2009/08/12/parallelization-and-optimization-of-the-line-segment-intersection-problem/ interesting / useful.
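As a rough sketch of the measuring step (not from the original posts), a loop timed with the standard `clock()` function gives a first estimate. `work` is a hypothetical stand-in for the function under test, and, as noted above, a small harness like this will not reproduce the cache behaviour of a real program.

```c
#include <time.h>

/* Hypothetical stand-in for the function being measured. */
static float work(float v)
{
    return v * 0.5f + 1.0f;
}

/* Returns the average CPU time in seconds per call of fn.
   Assumes iterations > 0. */
static double seconds_per_call(float (*fn)(float), long iterations)
{
    volatile float sink = 0.0f;   /* volatile keeps the loop from being optimised away */
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        sink = fn(sink);          /* feed each result back in as the next input */
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)iterations;
}
```

Running `seconds_per_call(work, 10000000L)` once for each variant, several times over, and comparing the averages gives a first indication. The resolution of `clock()` is coarse, so the iteration count needs to be large enough for the total time to be well above one tick.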

Schinizer

122

Author

August 09, 2011 05:07 PM


I see, I'll certainly go and measure them. All of your information helped me.
