Sign in to follow this  
synth_cat

why is this function so slow?

Recommended Posts

Hello all - I wrote a function for determining the intersection between two lines. In a certain section of my game, this function is called no more than maybe one-hundred times. This causes a dramatic fps drop from about 140 to 80. I can't seem to figure exactly why this is happening
bool SynthGame::LineLineIntersect(const D3DXVECTOR2& p1, const D3DXVECTOR2& p2,
	               const D3DXVECTOR2& p3, const D3DXVECTOR2& p4,
				   D3DXVECTOR2* result)
{
	D3DXVECTOR2 mov_vec=p2-p1;
    if(!(D3DXVec2Length(&mov_vec)>0)) return false;
	mov_vec=p4-p3;
    if(!(D3DXVec2Length(&mov_vec)>0)) return false;

	float x1=p1.x;
	float y1=p1.y;
	float x2=p2.x;
	float y2=p2.y;
	float x3=p3.x;
	float y3=p3.y;
	float x4=p4.x;
	float y4=p4.y;

	float denom=0;
	float ua=0;
	float ub=0;

	bool intersection=false;

	//define denominator 
	denom=(y4-y3)*(x2-x1)-(x4-x3)*(y2-y1);

	//check if lines are not parallel
	if(denom)
	{
		intersection=true;
		ua=( (x4-x3)*(y1-y3)-(y4-y3)*(x1-x3) )/denom;
		ub=( (x2-x1)*(y1-y3)-(y2-y1)*(x1-x3) )/denom;

		(*result).x=x1+ua*(x2-x1);
		(*result).y=y1+ua*(y2-y1);
	
	}

	return intersection;
}



Note: The fps goes back to normal if I comment out the line denom=(y4-y3)*(x2-x1)-(x4-x3)*(y2-y1); Can anyone explain why this function is so slow? Thanks for any help! -synth_cat

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
If you comment out that line, the denominator becomes zero and your program should explode, not run faster.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
So that is your reason. You skip all the bulk of the mathematics in the function, and when that occurs it runs faster. You just need to find a more efficient method.

Share this post


Link to post
Share on other sites
That code doesn't look like it should bog down the CPU at all. You should be able to do that thousands of times a frame easily.

My guess is that what you do as a result of when this function returns true is what is causing it to slow down. Does your execution follow a different branch where this function is called if it returns true rather than false?

Share this post


Link to post
Share on other sites
Quote:

This causes a dramatic fps drop from about 140 to 80


Try actually timing the function; don't use FPS as a benchmark. Tell us exactly (or as close as you can get) how many ms the function takes, and we will see if it's actually slow or if it's something else.

Like the previous poster said, I bet you'll find the function itself is fast. Just because you skip one little part of the code doesn't mean that's what's causing the slowdown, it just means it results in a state where the slow code is being executed.

When you force denom to always be 0, you always return false, which means lines never intersect. I'm sure this means you are skipping some other intensive parts of your code that deal with collision. Show us the part of your code that deals with the intersection, until then we can't really tell you what's so slow [smile]

Share this post


Link to post
Share on other sites
You can easily double the speed (at least) of this function in 2 ways.

First off, you're checking that the distance between 2 points is greater than 0. You should really just be comparing the X and Y co-ordinates of the vectors, and save yourself a square root (or 2 in your case)

The other thing I notice is you're creating 8 local variables that serve virtually no purpose, aside from making things more readible. Although it could be considered "splitting hairs", each variable you create on the local stack is a few cycles to create, and initialize. Since you're at the optimization stage, you should really consider changing these to pre-calculations, for example:

const float x4x3 = (p4.x-p3.x);
const float y1y3 = (p1.y-p3.y);
const float y4y3 = (p4.y-p3.y);
const float x1x3 = (p1.x-p3.x);
const float x2x1 = (p2.x-p1.x);
const float y2y1 = (p2.y-p1.y);

and then sub these into your equations:

denom = y4y3 * x2x1 - x4x3 * y2y1;
...
ua=( x4x3 * y1y3 - y4y3 * x1x3 )/denom;
ub=( x2x1 * y1y3 - y2y1 * x1x3 )/denom;
result->x = p1.x + ua * x2x1;
result->y = p1.y + ua * y2y1;


Also adding the const to the variables allows the compiler to make assumptions, and potential optimizations. It may be irrelevant in this case, but it's good practice, and safer.

Anyway, that covers some of the basic code optimizations. You could probably simplify the math calculations themselves, and save yourself some multiplications, or divisions. But if you make the changes I mentioned above, you'll double the speed of this routine.

Hope that helps!

Andrew

Share this post


Link to post
Share on other sites
Quote:
Original post by arudson
You can easily double the speed (at least) of this function in 2 ways.

I don't believe you.
Quote:
Original post by arudson
First off, you're checking that the distance between 2 points is greater than 0. You should really just be comparing the X and Y co-ordinates of the vectors, and save yourself a square root (or 2 in your case)

Reasonable enough.
Quote:
Original post by arudson
...each variable you create on the local stack is a few cycles to create...

No they don't.
Quote:

...and initialize. Since you're at the optimization stage, you should really consider changing these to pre-calculations, for example:

const float x4x3 = (p4.x-p3.x);
const float y1y3 = (p1.y-p3.y);
const float y4y3 = (p4.y-p3.y);
const float x1x3 = (p1.x-p3.x);
const float x2x1 = (p2.x-p1.x);
const float y2y1 = (p2.y-p1.y);

and then sub these into your equations:

denom = y4y3 * x2x1 - x4x3 * y2y1;
...
ua=( x4x3 * y1y3 - y4y3 * x1x3 )/denom;
ub=( x2x1 * y1y3 - y2y1 * x1x3 )/denom;
result->x = p1.x + ua * x2x1;
result->y = p1.y + ua * y2y1;

The compiler has to perform those calculations in advance anyhow. Specifying them explicitly provides nominal additional information, at best.
Quote:
Original post by arudson
Also adding the const to the variables allows the compiler to make assumptions, and potential optimizations. It may be irrelevant in this case, but it's good practice, and safer.

No it isn't. The compiler learns nothing from a const modifier in this case, as it is incapable of making any assumptions it couldn't figure out on its own anyhow. That just leaves personal safety, and if your routines are so convoluted that you aren't sure you're not changing a local variable, then perhaps you have other problems.
Quote:
Original post by arudson
Anyway, that covers some of the basic code optimizations. You could probably simplify the math calculations themselves, and save yourself some multiplications, or divisions. But if you make the changes I mentioned above, you'll double the speed of this routine.

Again, I don't believe you.

CM

Share this post


Link to post
Share on other sites
Quote:
Original post by Conner McCloud
No it isn't. The compiler learns nothing from a const modifier in this case, as it is incapable of making any assumptions it couldn't figure out on its own anyhow. That just leaves personal safety, and if your routines are so convoluted that you aren't sure you're not changing a local variable, then perhaps you have other problems.
CM


There are compilers that totally suck at generating decent math code. By writing out your math code one operation per line with const modifiers and by separating load instructions from store instructions can easily double the speed of the code snippet. True, sometimes this kind of code starts looking convoluted and assembly-like but it's worth it, especially during an optimization stage. Arudson, I believe you.

Share this post


Link to post
Share on other sites
Quote:
Original post by deathkrush
There are compilers that totally suck at generating decent math code. By writing out your math code one operation per line with const modifiers and by separating load instructions from store instructions can easily double the speed of the code snippet. True, sometimes this kind of code starts looking convoluted and assembly-like but it's worth it, especially during an optimization stage. Arudson, I believe you.

Any modern compiler that bothers with an optimization pass or two will be completely unaffected by his suggestions. Until the OP suggests he's programming for a cell phone, recommending that he rewrite perfectly functional code is foolish. Working code is better than code that is theoretically faster on a system he probably isn't even developing for.

CM

Share this post


Link to post
Share on other sites
I'm not going to claim how much faster this would make your function, but those sqrts within the D3DXVec2Length are definately going to be the slowest parts. Second to that are the two divisions, when you can do it with only one.

What I would do:
1. Eliminate those early tests. Early outs are only good if they are fast. Yours are very slow. It can probably calculate the denominator and test it against zero, fast enough to not need to try and abort early, even if you do it without sqrts. This is also good if most likely lines are not parallel, which should be the case.
2. Tidy it up and shorten it a lot removing unnecessary variables, and make some of the logic clearer. This may or may not help the compiler, but it certainly helps the reader.
3. No need to initialise variables if you always change the value before it is used anyway. You may as well not declare it until you're ready to initialise it.
4. Get rid of the unnecessary boolean.
5. I see no reason not to make the out parameter a reference. If you don't do that however, then I feel that you're obliged to assert that it isn't NULL, because the function assumes this it isn't NULL.
6. This looks like a member function. If so, you should make it static.
7. You are using ua twice, and never use ub. Perhaps that is a typo and one of the lines below should use ub? If not, what is ub for?


static bool SynthGame::LineLineIntersect(const D3DXVECTOR2& p1, const D3DXVECTOR2& p2,
const D3DXVECTOR2& p3, const D3DXVECTOR2& p4,
D3DXVECTOR2& result)
{
D3DXVECTOR2 vec1 = p2-p1;
D3DXVECTOR2 vec2 = p4-p3;

//define denominator
float denom = vec2.y*vec1.x - vec2.x*vec1.y;

//check if lines are not parallel
if (denom)
{
denom = 1.0/denom;

D3DXVECTOR2 vec3=p1-p3;
float ua=(vec2.x*vec3.y - vec2.y*vec3.x)*denom;
float ub=(vec1.x*vec3.y - vec1.y*vec3.x)*denom;

result.x = p1.x + ua*vec1.x;
result.y = p1.y + ua*vec1.y;
return true;
}
return false;
}


[Edited by - iMalc on October 4, 2006 2:59:36 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Conner McCloud
Any modern compiler that bothers with an optimization pass or two will be completely unaffected by his suggestions. Until the OP suggests he's programming for a cell phone, recommending that he rewrite perfectly functional code is foolish. Working code is better than code that is theoretically faster on a system he probably isn't even developing for.
CM


That's true, but not everyone is blessed with a smart compiler. I'm working on a game for a certain PowerPC architecture and the compiler (GCC) is completely braindead when it comes to generating efficient floating point code. Code rearrangements like the one Arudson posted help out a great deal. That being said, I see no reason for something like that on Windows/MSVC++.

Share this post


Link to post
Share on other sites
While I agree with Conner McCloud's analysis of the performance benefits, I would still make most of those changes for *cleanliness* - and a few more.

iMalc's version is pretty good, but assumes D3DXVECTOR2 has the appropriate math operators overloaded (which I'm not at all sure of) and I'd structure things slightly differently:


bool SynthGame::LineLineIntersect(
const D3DXVECTOR2& p1, const D3DXVECTOR2& p2,
const D3DXVECTOR2& p3, const D3DXVECTOR2& p4,
D3DXVECTOR2& result) { // Just return by reference, yeah?

float x4x3 = p4.x - p3.x;
float x2x1 = p2.x - p1.x;

float y4y3 = p4.y - p3.y;
float y2y1 = p2.y - p1.y;

// We don't actually really need a check for p1 == p2 or p3 == p4 at all;
// in these cases, the denominator calculation will yield 0 automatically.
// So, optimization-wise, we shouldn't be adding a check for it unless we
// expect it to be a common case!

// Don't initialize variables to a stupid value and then immediately
// assign afterwards. Initialize with the initial value. Duh.
float denom = y4y3 * x2x1 - x4x3 * y2y1;

// We shouldn't compare to 0 directly.
const float EPSILON = 1.0e-6; // or whatever

// If this fails, either one or both lines is a point (or too short to
// work with), or the lines are parallel.
if (denom < EPSILON && denom > -EPSILON) { return false; }

// The optimization iMalc mentioned.
float denom_inv = 1.0f / denom;

float x1x3 = p1.x - p3.x;
float y1y3 = p1.y - p3.y;

// Optional: inline the ua and ub calculations, if only because
// the names 'ua' and 'ub' are not very meaningful ;)
result.x = x1 + (x4x3 * y1y3 - y4y3 * x1x3) * denom_inv * x2x1;
// I'm assuming the 'ua' that was here before should be 'ub'
result.y = y1 + (x2x1 * y1y3 - y2y1 * x1x3) * denom_inv * y2y1;

return true;
}

Share this post


Link to post
Share on other sites
Seeing as you're only calling it ~100 times per frame, Agony was probably correct.

Quote:
My guess is that what you do as a result of when this function returns true is what is causing it to slow down. Does your execution follow a different branch where this function is called if it returns true rather than false?


Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by Conner McCloud
Quote:
Original post by arudson
You can easily double the speed (at least) of this function in 2 ways.

I don't believe you.
Quote:
Original post by arudson
First off, you're checking that the distance between 2 points is greater than 0. You should really just be comparing the X and Y co-ordinates of the vectors, and save yourself a square root (or 2 in your case)

Reasonable enough.
Already getting rid of the distance calculations more than doubles the speed for me, instrisics and all optimizations enabled on VS2003, running on Athlon XP. So why won't you believe arudson in that doubling the speed is possible, hmm? Surprisingly enough, also his other suggestion led to a (minor) speedup.

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
Quote:
Original post by Conner McCloud
Quote:
Original post by arudson
You can easily double the speed (at least) of this function in 2 ways.

I don't believe you.
Quote:
Original post by arudson
First off, you're checking that the distance between 2 points is greater than 0. You should really just be comparing the X and Y co-ordinates of the vectors, and save yourself a square root (or 2 in your case)

Reasonable enough.
Already getting rid of the distance calculations more than doubles the speed for me, instrisics and all optimizations enabled on VS2003, running on Athlon XP. So why won't you believe arudson in that doubling the speed is possible, hmm? Surprisingly enough, also his other suggestion led to a (minor) speedup.


Because he knows something about how compilers work.

Anyway, you might double the speed of the function, but it still can't really be your bottleneck. If it were the only thing responsible for a slowdown from 140 FPS to 80 FPS, that would imply that executing it "maybe 100 times" required 1/80 - 1/140 seconds. That works out to at least 60 microseconds per call. A floating point division takes no more than (being very generous, I suspect) 99 cycles at 3 billion cycles per second == 33 *nano*seconds.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by deathkrush
That's true, but not everyone is blessed with a smart compiler. I'm working on a game for a certain PowerPC architecture and the compiler (GCC) is completely braindead when it comes to generating efficient floating point code.


Fortunately, gcc -being an opensource project- enables people (like you) who find its output "braindead" to contribute in a meaningful fashion by providing the corresponding patches.

If you check out the gcc mailing lists, you'll find that we are quite aware of the limitations of the current floating point code, however unlike you we may happen to realize that addressing this issue requires some more "brain" than simply telling that something is "braindead"...

So, you are seriously encouraged to constructively address this issue in a fashion that's worthy of being included in future versions of gcc.

PTC

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
Quote:
Original post by deathkrush
That's true, but not everyone is blessed with a smart compiler. I'm working on a game for a certain PowerPC architecture and the compiler (GCC) is completely braindead when it comes to generating efficient floating point code.


Fortunately, gcc -being an opensource project- enables people (like you) who find its output "braindead" to contribute in a meaningful fashion by providing the corresponding patches.

If you check out the gcc mailing lists, you'll find that we are quite aware of the limitations of the current floating point code, however unlike you we may happen to realize that addressing this issue requires some more "brain" than simply telling that something is "braindead"...


Come on... he's quite entitled to vent his frustration when working on an optimisation problem without feeling obliged to learn how to write compilers. I'm sure everybody appreciates that fixing it is not necessarily trivial, but that is not the original poster's problem.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by arudson
...each variable you create on the local stack is a few cycles to create...

No they don't.

Yea, they do, especially for floats. You've gotta move it in and out of registers for assigning values, push it down on the stack, and derefernce it.

Share this post


Link to post
Share on other sites
Quote:

Original post by Conner McCloud:
I don't believe you.


Try it. And excuse me for saying so, but nobody asked what you believe. :)

As for the rest of your response, I'm just going to assume your intentions were less to help synth_cat, and more to start some kind of religious war. Well done, enjoy.

synth_cat: glad to hear my suggestions helped you. Cheers.

Share this post


Link to post
Share on other sites
Quote:
Original post by arudson
Try it. And excuse me for saying so, but nobody asked what you believe. :)

As for the rest of your response, I'm just going to assume your intentions were less to help synth_cat, and more to start some kind of religious war. Well done, enjoy.

synth_cat: glad to hear my suggestions helped you. Cheers.
His intentions were to stop anybody taking onboard your erroneous suggestions. Only the replacement of the call to the length function with x and y comparisons has any value (and is a significant speedup). On any non-braindead compiler running with optimisations enabled the suggestions you made regarding precalculations and const will have no impact on the runtime performance of the code. None. Nada. Zilch. Don't believe me?
Original Code (~64M calls):
Borland 5.82: 7.796 seconds
GCC 3.3.1: 3.921 seconds
Visual C++ 8.0: 4.968 seconds

"Improved" Code (~64M calls):
Borland 5.82: 7.781 seconds
GCC 3.3.1: 3.953 seconds
Visual C++ 8.0: 4.937 seconds
Σnigma

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Yah, he was probably seeing improvements on a debug build with optimizations disabled...

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
While I agree with Conner McCloud's analysis of the performance benefits, I would still make most of those changes for *cleanliness* - and a few more.

iMalc's version is pretty good, but assumes D3DXVECTOR2 has the appropriate math operators overloaded (which I'm not at all sure of) and I'd structure things slightly differently:


bool SynthGame::LineLineIntersect(
const D3DXVECTOR2& p1, const D3DXVECTOR2& p2,
const D3DXVECTOR2& p3, const D3DXVECTOR2& p4,
D3DXVECTOR2& result) { // Just return by reference, yeah?

float x4x3 = p4.x - p3.x;
float x2x1 = p2.x - p1.x;

float y4y3 = p4.y - p3.y;
float y2y1 = p2.y - p1.y;

// We don't actually really need a check for p1 == p2 or p3 == p4 at all;
// in these cases, the denominator calculation will yield 0 automatically.
// So, optimization-wise, we shouldn't be adding a check for it unless we
// expect it to be a common case!

// Don't initialize variables to a stupid value and then immediately
// assign afterwards. Initialize with the initial value. Duh.
float denom = y4y3 * x2x1 - x4x3 * y2y1;

// We shouldn't compare to 0 directly.
const float EPSILON = 1.0e-6; // or whatever

// If this fails, either one or both lines is a point (or too short to
// work with), or the lines are parallel.
if (denom < EPSILON && denom > -EPSILON) { return false; }

// The optimization iMalc mentioned.
float denom_inv = 1.0f / denom;

float x1x3 = p1.x - p3.x;
float y1y3 = p1.y - p3.y;

// Optional: inline the ua and ub calculations, if only because
// the names 'ua' and 'ub' are not very meaningful ;)
result.x = x1 + (x4x3 * y1y3 - y4y3 * x1x3) * denom_inv * x2x1;
// I'm assuming the 'ua' that was here before should be 'ub'
result.y = y1 + (x2x1 * y1y3 - y2y1 * x1x3) * denom_inv * y2y1;

return true;
}
Yeah I must admit I know nothing of D3DXVECTOR2 but, the first line of the OP's code assumes that a subtraction operator is defined. I don't know if it uses doubles or floats either.

I nearly did the epsilon thing too. Not sure why I didn't now.

I would have inlined the ua and ub calculations, until I saw that ub wasn't being used in the original code, which threw me a bit.

The 'static' put at the front of my version was a mistake btw. What I meant is that you'd make it static in the header file.


The point somebody made about this not being the bottleneck but that fact that if it returns false all the time you do less work elsewhere, is probably very true.

Share this post


Link to post
Share on other sites
I just cleaned it up a bit...


bool LineLineIntersect(const D3DXVECTOR2& p1,const D3DXVECTOR2& p2,const D3DXVECTOR2& p3,const D3DXVECTOR2& p4,D3DXVECTOR2* result)
{
D3DXVECTOR2 mov_vec = {p2.x - p1.x, p2.x - p1.x};

if((D3DXVec2Length(&mov_vec)<=0)) return false;

mov_vec.x = p4.x - p3.x;
mov_vec.y = p4.y - p3.y;

if((D3DXVec2Length(&mov_vec)<=0)) return false;

float denom, ua, ub;

//define denominator
denom = (p4.y-p3.y)*(p2.x-p1.x)-(p4.x-p3.x)*(p2.y-p1.y);

//check if lines are not parallel
if(denom)
{
ua = ((p4.x-p3.x)*(p1.y-p3.y)-(p4.y-p3.y)*(p1.x-p3.x))/denom;
ub = ((p2.x-p1.x)*(p1.y-p3.y)-(p2.y-p1.y)*(p1.x-p3.x))/denom;

result->x = p1.x + ua * (p2.x-p1.x);
result->y = p1.y + ua * (p2.y-p1.y);
return true;
}

return false;
}



That should work better, maybe a bit faster. I'm used to having a crappy compiler, so forgive the optimizations that I put in that it might already do...

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by dbzprogrammer
I just cleaned it up a bit...
Better versions have already been given in this thread.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this