Schinizer

Inline asm conditional branching


Schinizer    122
Hello everyone,

I'm trying to learn conditional branching in inline assembly so that I can use it with SIMD (let's assume Windows and GCC in this case). Googling only returns vague information on branching.

Has anyone here done conditional branching before? Would you care to share some resources or show me how it works?

Thank you very much.

japro    887
Super short version: you "cmp" the pair of values you want to test and then use the appropriate conditional jump instruction.
[code]
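cmpl %eax, %ebx   # AT&T operand order: sets flags from %ebx - %eax
je Label          # jump if %ebx == %eax
jl Label          # jump if %ebx <  %eax (signed; use jb for unsigned)
jle Label         # jump if %ebx <= %eax (signed; use jbe for unsigned)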
...
[/code]
But if you're doing this mainly to use SIMD, I highly recommend using intrinsics (google "xmmintrin") instead of inline assembler. That way you don't have to deal with all that "clobber list" stuff, you can use "high level" constructs such as ifs, and performance is pretty much equal to what inline asm would get you.
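
For reference, the cmp-and-jump pattern above can be wrapped in GCC extended inline asm roughly like this. It is only a minimal sketch for x86 with GCC: the helper name `max_signed` is made up for illustration, `1:` is a local label (`1f` means "the next label 1 going forward"), and the "cc" clobber tells the compiler that the flags are modified.

```c
/* Hypothetical helper: returns the larger of two signed ints
   using cmp + a conditional jump in GCC extended inline asm (x86, AT&T syntax). */
static int max_signed(int a, int b)
{
    int result = a;
    __asm__ (
        "cmpl %[b], %[r]\n\t"   /* sets flags from result - b              */
        "jge 1f\n\t"            /* result >= b (signed): skip the move     */
        "movl %[b], %[r]\n"     /* otherwise result = b                    */
        "1:\n"                  /* local label, jump target                */
        : [r] "+r"(result)      /* read and written                        */
        : [b] "r"(b)            /* read only                               */
        : "cc");                /* condition codes are clobbered           */
    return result;
}
```

Calling `max_signed(3, 7)` should give 7. With intrinsics or plain C the compiler handles the registers and flags for you, which is the bookkeeping mentioned above.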

Adam_42    3629
When using SIMD you're usually best off using branch-free code, using [url="http://msdn.microsoft.com/en-us/library/w8kez9sf%28v=vs.71%29.aspx"]instructions like CMPLTPS[/url]. That lets you do four compares with one instruction, and even with the extra work of computing both answers and masking off the one you don't want, it'll still be quicker than four compares and jumps most of the time (branches tend to be expensive on modern processors, especially when they are mispredicted).
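
A minimal sketch of that compare-and-mask idea with SSE intrinsics, assuming an x86 target and `<xmmintrin.h>`. The helper name `select_lt4` and its parameter layout are made up for illustration; `_mm_cmplt_ps` is the intrinsic form of CMPLTPS.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = (a[i] < b[i]) ? x[i] : y[i] for four floats,
   computed without any branch. */
static void select_lt4(const float *a, const float *b,
                       const float *x, const float *y, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);

    /* Each lane becomes all ones where a < b, all zeros otherwise. */
    __m128 mask = _mm_cmplt_ps(va, vb);

    /* Keep x where the mask is set, y where it is clear, then merge. */
    __m128 picked = _mm_or_ps(_mm_and_ps(mask, vx),
                              _mm_andnot_ps(mask, vy));

    _mm_storeu_ps(out, picked);
}
```

Both candidate answers (`x` and `y`) are already computed here, and the mask simply throws away the one that is not wanted in each lane.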

Schinizer    122
Let's take this function for example:

[code]
int LineIntersect(
float x1, float y1,
float x2, float y2,
float x3, float y3,
float x4, float y4,
float *x, float *y)
{
float mua,mub;
float denom,numera,numerb;

denom = (y4-y3) * (x2-x1) - (x4-x3) * (y2-y1);
numera = (x4-x3) * (y1-y3) - (y4-y3) * (x1-x3);
numerb = (x2-x1) * (y1-y3) - (y2-y1) * (x1-x3);

/* Are the lines coincident? */
if (ABS(numera) < EPS && ABS(numerb) < EPS && ABS(denom) < EPS)
{
*x = (x1 + x2) / 2;
*y = (y1 + y2) / 2;
return(TRUE);
}

/* Are the lines parallel? */
if (ABS(denom) < EPS)
{
*x = 0;
*y = 0;
return(FALSE);
}

/* Is the intersection along the segments? */
mua = numera / denom;
mub = numerb / denom;

if (mua < 0 || mua > 1 || mub < 0 || mub > 1)
{
*x = 0;
*y = 0;
return(FALSE);
}

*x = x1 + mua * (x2 - x1);
*y = y1 + mua * (y2 - y1);
return(TRUE);
}
[/code]

After reading Adam_42's post, am I right to say that the code below would perform about the same as a version written in pure asm, which would perhaps only be faster by a few microseconds? (I know the copy operations make it slow, but I'm too lazy to change the floats here to a proper vector2. I'm just using this as an example.)

[code]
int LineIntersect(
float x1, float y1,
float x2, float y2,
float x3, float y3,
float x4, float y4,
float *x, float *y)
{
float mua,mub;
float denom,numera,numerb;

float pt1[2] = {x1, y1};
float pt2[2] = {x2, y2};
float pt3[2] = {x3, y3};
float pt4[2] = {x4, y4};

//denom = (y4-y3) * (x2-x1) - (x4-x3) * (y2-y1);
//numera = (x4-x3) * (y1-y3) - (y4-y3) * (x1-x3);
//numerb = (x2-x1) * (y1-y3) - (y2-y1) * (x1-x3);

//Let's take this part as the calculation of denom, numera and numerb
__asm
(
"..."
"..."
"..."
: "=r"(denom), "=r"(numera), "=r"(numerb)
: "o"(pt1), "o"(pt2), "o"(pt3), "o"(pt4)
);

/* Are the lines coincident? */
if (ABS(numera) < EPS && ABS(numerb) < EPS && ABS(denom) < EPS)
{
//Let's take this part as the calculation of the intersection point
/**x = (x1 + x2) / 2;
*y = (y1 + y2) / 2;*/
__asm
(
"..."
"..."
"..."
: "=m"(*x), "=m"(*y) /* write through the pointers, not into the pointer variables */
);
return(TRUE);
}

/* Are the lines parallel? */
if (ABS(denom) < EPS) {
*x = 0;
*y = 0;
return(FALSE);
}

/* Is the intersection along the segments? */
/*mua = numera / denom;
mub = numerb / denom;*/

__asm
(
"..."
"..."
"..."
: "=r"(mua), "=r"(mub)
);

if (mua < 0 || mua > 1 || mub < 0 || mub > 1) {
*x = 0;
*y = 0;
return(FALSE);
}

//Let's take this part as the calculation of the intersection point
/**x = x1 + mua * (x2 - x1);
*y = y1 + mua * (y2 - y1);*/
__asm
(
"..."
"..."
"..."
: "=m"(*x), "=m"(*y) /* write through the pointers, not into the pointer variables */
);

return(TRUE);
}
[/code]
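
As a side note, the final range test on mua and mub is the kind of check that can be folded into a single SIMD compare plus one ordinary branch. The following is only a sketch under assumptions: it targets x86 with `<xmmintrin.h>`, and the helper name `both_in_unit_interval` is hypothetical, not part of the function above.

```c
#include <xmmintrin.h>

/* Hypothetical helper: returns 1 if both mua and mub lie in [0, 1], else 0.
   Equivalent to !(mua < 0 || mua > 1 || mub < 0 || mub > 1). */
static int both_in_unit_interval(float mua, float mub)
{
    /* Lanes (low to high): mua, mub, 0, 0. The two unused lanes hold 0,
       which is inside [0, 1], so they never trigger a rejection. */
    __m128 v = _mm_set_ps(0.0f, 0.0f, mub, mua);

    __m128 below = _mm_cmplt_ps(v, _mm_setzero_ps());   /* v < 0 per lane */
    __m128 above = _mm_cmpgt_ps(v, _mm_set1_ps(1.0f));  /* v > 1 per lane */

    /* Collect the sign bit of each lane into a 4-bit integer mask. */
    int outside = _mm_movemask_ps(_mm_or_ps(below, above));

    return outside == 0;
}
```

The result can then drive a normal `if`, so four scalar comparisons become two vector compares and a single test. Whether that is actually faster for two values is something only measurement can tell.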

Adam_42    3629
The only way to be certain of the performance of code is to measure it. Those measurements are most accurate when the code runs in a real program rather than a simple test harness, because caches and other factors can have performance-changing side effects.
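
If a quick first measurement is still wanted, a bare-bones harness could look roughly like the sketch below. It uses only the standard `clock()` from `<time.h>`; the names `time_calls` and `work_example` are hypothetical, and as noted above the numbers from such a harness are less trustworthy than timings taken inside the real program.

```c
#include <time.h>

/* Hypothetical harness: calls fn(i) `iterations` times and returns the
   elapsed CPU time in seconds. The accumulated result is stored through a
   volatile pointer so the compiler cannot discard the calls. */
static double time_calls(int (*fn)(int), int iterations, volatile unsigned *sink)
{
    unsigned acc = 0;
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        acc += (unsigned)fn(i);
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload for illustration only: returns 1 for odd i, 0 for even i. */
static int work_example(int i)
{
    return i & 1;
}
```

Running each candidate (plain C, intrinsics, inline asm) through the same harness with a large iteration count and comparing the returned times gives a rough ranking.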

By the way you may find http://software.intel.com/en-us/blogs/2009/08/12/parallelization-and-optimization-of-the-line-segment-intersection-problem/ interesting / useful.

Schinizer    122
I see, I'll certainly measure them. All of your information helped me :cool:
