Inline asm conditional branching

4 comments, last by Schinizer 12 years, 8 months ago
Hello everyone,

I'm trying to learn conditional branching so that I can use it with SIMD (let's assume Windows and gcc in this case). Googling returns only vague information on branching.

Has anyone here done conditional branching before? Would you care to share some resources, or perhaps show me how it works?

Thank you very much.

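Super short version: you "cmp" the pair of values you want to test and then use the appropriate jump instruction. In the AT&T syntax that gcc uses, the compare below tests %ebx against %eax, so "less" means %ebx < %eax.

cmpl %eax, %ebx
je Label   # jump if equal
jl Label   # jump if less (signed; jb is the unsigned "below")
jle Label  # jump if less or equal (signed)
...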
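
As a minimal sketch of the cmp-then-jump pattern inside gcc inline asm, the function below returns the larger of two signed ints. The name max_branch is made up for illustration, and the code assumes gcc (or clang) on an x86 or x86-64 target.

```c
/* Sketch only: signed maximum of two ints using cmp + a conditional jump.
   Assumes GCC-style extended asm, AT&T syntax, x86 or x86-64. */
int max_branch(int a, int b)
{
    int result = a;                  /* start with a */
    __asm__ (
        "cmpl %[b], %[res]\n\t"      /* sets flags from res - b */
        "jge 1f\n\t"                 /* res >= b (signed): keep res */
        "movl %[b], %[res]\n"        /* otherwise res = b */
        "1:\n\t"                     /* local label, referenced as 1f */
        : [res] "+r" (result)        /* read and written */
        : [b] "r" (b)                /* read only */
        : "cc");                     /* cmp changes the flags */
    return result;
}
```

Numeric local labels such as 1: keep the label unique even if the compiler inlines the function more than once.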

But if you are doing this mainly to use SIMD, I highly recommend using intrinsics (google "xmmintrin") instead of inline assembler. That way you don't have to deal with clobber lists and the like, you can use high-level constructs such as if statements, and performance is pretty much equal to what inline asm would get you.
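
To illustrate mixing intrinsics with an ordinary C if, here is a small sketch that checks whether four values all lie in [0, 1]. The function name all_in_unit_range4 is hypothetical; the intrinsics come from xmmintrin.h, and a 32-bit gcc build would need -msse.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Sketch only: returns 1 if all four floats at t lie in [0, 1], else 0. */
int all_in_unit_range4(const float *t)
{
    __m128 v     = _mm_loadu_ps(t);                      /* unaligned load of 4 floats */
    __m128 below = _mm_cmplt_ps(v, _mm_setzero_ps());    /* lanes where t < 0 */
    __m128 above = _mm_cmpgt_ps(v, _mm_set1_ps(1.0f));   /* lanes where t > 1 */
    int outside  = _mm_movemask_ps(_mm_or_ps(below, above)); /* one bit per lane */

    if (outside != 0)   /* plain C branch; the compiler picks the jump */
        return 0;
    return 1;
}
```

The compiler handles register allocation and the conditional jump here, so no clobber list is needed.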

When using SIMD you're usually best off writing branch-free code, using instructions like CMPLTPS. That lets you do four compares with one instruction, and even with the extra work of computing both answers and masking off the one you don't want, it will still be quicker than four compares and jumps most of the time (mispredicted branches tend to be expensive on modern processors).
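
The compare-and-mask idea can be sketched with intrinsics as follows: _mm_cmplt_ps maps to CMPLTPS and yields an all-ones or all-zeros mask per lane, which then selects between two precomputed answers. The function name select_lt4 and its parameter layout are assumptions made for this example.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Sketch only: out[i] = (a[i] < b[i]) ? x[i] : y[i] for four floats,
   with no branches. */
void select_lt4(const float *a, const float *b,
                const float *x, const float *y, float *out)
{
    __m128 mask   = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* all-ones where a < b */
    __m128 pick_x = _mm_and_ps(mask, _mm_loadu_ps(x));     /* x where the mask is set */
    __m128 pick_y = _mm_andnot_ps(mask, _mm_loadu_ps(y));  /* y where it is clear */
    _mm_storeu_ps(out, _mm_or_ps(pick_x, pick_y));         /* merge the two halves */
}
```

Both x and y are always computed and loaded; the mask just discards the unwanted lane values.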

Let's take this function, for example:


int LineIntersect(
    float x1, float y1,
    float x2, float y2,
    float x3, float y3,
    float x4, float y4,
    float *x, float *y)
{
    float mua, mub;
    float denom, numera, numerb;

    denom  = (y4-y3) * (x2-x1) - (x4-x3) * (y2-y1);
    numera = (x4-x3) * (y1-y3) - (y4-y3) * (x1-x3);
    numerb = (x2-x1) * (y1-y3) - (y2-y1) * (x1-x3);

    /* Are the lines coincident? */
    if (ABS(numera) < EPS && ABS(numerb) < EPS && ABS(denom) < EPS)
    {
        *x = (x1 + x2) / 2;
        *y = (y1 + y2) / 2;
        return(TRUE);
    }

    /* Are the lines parallel? */
    if (ABS(denom) < EPS)
    {
        *x = 0;
        *y = 0;
        return(FALSE);
    }

    /* Is the intersection along the segments? */
    mua = numera / denom;
    mub = numerb / denom;

    if (mua < 0 || mua > 1 || mub < 0 || mub > 1)
    {
        *x = 0;
        *y = 0;
        return(FALSE);
    }

    *x = x1 + mua * (x2 - x1);
    *y = y1 + mua * (y2 - y1);
    return(TRUE);
}


After reading adam_42's post, am I right to say that the code below would perform about the same as a version written in pure asm, perhaps with only a few microseconds' difference? (I know the copy operations make it slow, but I was too lazy to change the floats here to a proper vector2. I'm just using this as an example.)


int LineIntersect(
    float x1, float y1,
    float x2, float y2,
    float x3, float y3,
    float x4, float y4,
    float *x, float *y)
{
    float mua, mub;
    float denom, numera, numerb;

    float pt1[2] = {x1, y1};
    float pt2[2] = {x2, y2};
    float pt3[2] = {x3, y3};
    float pt4[2] = {x4, y4};

    //denom = (y4-y3) * (x2-x1) - (x4-x3) * (y2-y1);
    //numera = (x4-x3) * (y1-y3) - (y4-y3) * (x1-x3);
    //numerb = (x2-x1) * (y1-y3) - (y2-y1) * (x1-x3);

    // Let's take this part as the calculation of denom, numera and numerb
    __asm
    (
        "..."
        "..."
        "..."
        : "=r"(denom), "=r"(numera), "=r"(numerb)
        : "o"(pt1), "o"(pt2), "o"(pt3), "o"(pt4)
    );

    /* Are the lines coincident? */
    if (ABS(numera) < EPS && ABS(numerb) < EPS && ABS(denom) < EPS)
    {
        // Let's take this part as the calculation of the intersection point
        /**x = (x1 + x2) / 2;
        *y = (y1 + y2) / 2;*/
        __asm
        (
            "..."
            "..."
            "..."
            : "=r"(x), "=r"(y)
        );
        return(TRUE);
    }

    /* Are the lines parallel? */
    if (ABS(denom) < EPS) {
        *x = 0;
        *y = 0;
        return(FALSE);
    }

    /* Is the intersection along the segments? */
    /*mua = numera / denom;
    mub = numerb / denom;*/

    __asm
    (
        "..."
        "..."
        "..."
        : "=r"(mua), "=r"(mub)
    );

    if (mua < 0 || mua > 1 || mub < 0 || mub > 1) {
        *x = 0;
        *y = 0;
        return(FALSE);
    }

    // Let's take this part as the calculation of the intersection point
    /**x = x1 + mua * (x2 - x1);
    *y = y1 + mua * (y2 - y1);*/
    __asm
    (
        "..."
        "..."
        "..."
        : "=r"(x), "=r"(y)
    );

    return(TRUE);
}
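
For comparison, the first asm placeholder (denom, numera and numerb) could be written with intrinsics instead, roughly as sketched below. The helper name intersect_terms and the out[3] layout are assumptions for this example, not part of the code above. One thing to watch in the asm version: an output operand such as "=r"(x) would overwrite the pointer x itself rather than store through it.

```c
#include <xmmintrin.h>  /* SSE intrinsics; a 32-bit gcc build needs -msse */

/* Sketch only: out[0] = denom, out[1] = numera, out[2] = numerb,
   computed as three 2D cross products in one multiply/subtract pass. */
void intersect_terms(float x1, float y1, float x2, float y2,
                     float x3, float y3, float x4, float y4,
                     float out[3])
{
    float lanes[4];
    /* _mm_setr_ps fills lanes in argument order; the fourth lane is padding. */
    __m128 a = _mm_setr_ps(y4 - y3, x4 - x3, x2 - x1, 0.0f);
    __m128 b = _mm_setr_ps(x2 - x1, y1 - y3, y1 - y3, 0.0f);
    __m128 c = _mm_setr_ps(x4 - x3, y4 - y3, y2 - y1, 0.0f);
    __m128 d = _mm_setr_ps(y2 - y1, x1 - x3, x1 - x3, 0.0f);

    /* Lane-wise a*b - c*d gives denom, numera, numerb in lanes 0..2. */
    __m128 r = _mm_sub_ps(_mm_mul_ps(a, b), _mm_mul_ps(c, d));

    _mm_storeu_ps(lanes, r);
    out[0] = lanes[0];
    out[1] = lanes[1];
    out[2] = lanes[2];
}
```

The subtractions that build the lanes are still scalar, so for a single pair of segments this is unlikely to gain much; the layout pays off more when four segment pairs are processed at once.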

The only way to be certain of the performance of code is to measure it. Those measurements are most accurate when the code is being used in a real program and not a simple test harness, because of caches and other things that can have performance-changing side effects.

By the way you may find http://software.intel.com/en-us/blogs/2009/08/12/parallelization-and-optimization-of-the-line-segment-intersection-problem/ interesting / useful.
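
As a rough starting point for the measuring advice above, a simple timing loop might look like the sketch below. The names seconds_per_call and half are hypothetical, and a loop like this is exactly the kind of simple harness that can mislead, so treat its numbers as a first approximation only.

```c
#include <time.h>

/* Stand-in for the code being measured (hypothetical example function). */
static float half(float v)
{
    return v * 0.5f;
}

/* Sketch only: average CPU seconds per call of fn over `iterations` calls. */
double seconds_per_call(float (*fn)(float), int iterations)
{
    volatile float sink = 0.0f;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        sink = sink + fn((float)i);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

Calling seconds_per_call(half, 1000000) and comparing against the same loop over the alternative implementation gives a relative figure; clock() has coarse resolution, so use a large iteration count.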

I see, I'll certainly go and measure them. All of your information helped me.
