# [Solved] 2D linear regression problem


Edit: The problem has been solved. If anyone has the same problem, see post #4.

I've been looking for a simple linear regression method. I've implemented one, but it doesn't work the way I would like. Can someone either recommend a free, easy-to-use C or C++ library that provides least squares linear regression and explain how to use it, or point out the mistake in my code?
At the moment my code doesn't use least squares, just a simple linear regression, but I don't think that should be a problem, should it?
This is the algorithm I've implemented: http://easycalculati...-regression.php

My problem with the current linear regression implementation:
The purple line fitted to the white points should be vertical, not horizontal:

However, if I swap x and y data, it works:

In the game where I use this function, though, I can't swap x and y, because the function must work in every direction. The same problem appears in-game:
The purple line is fitted by my function to the white points inside the green box, but a correctly fitted line would look like the orange one, which I have drawn by hand:

Here is my code:
(It's the algorithm from the link above, I've only added two checks not to divide by zero.)
[source lang="cpp"]//return: equation of line: y = *retA + (*retB)*x
//p is a vector containing the points, CVector2 is my class containing two double values, x and y
void gLinReg( std::vector< CVector2 >* p, double* retA, double* retB) {

    unsigned int i;
    std::vector<double> xy, x2;
    double sx=0.0, sy=0.0, sxy=0.0, sx2=0.0;

    //the number of values
    unsigned int n = p->size();
    if( n == 0 ) { return; }

    //find X*Y and X^2 for all values
    for(i=0; i<n; i++) {
        xy.push_back( p->at(i).x*p->at(i).y );
        x2.push_back( p->at(i).x*p->at(i).x );
    }

    //find sumx, sumy, sumxy, sumx2
    for(i=0; i<n; i++) {
        sx += p->at(i).x;
        sy += p->at(i).y;
        sxy += xy.at(i);
        sx2 += x2.at(i);
    }

    //get slope
    //B = (N*SXY - SX*SY) / (N*SX2 - SX^2)
    double denom = (n*sx2 - sx*sx); //the denominator
    //prevent dividing by 0 in the next step
    if( denom == 0.0 ) {
        *retB = signF((n*sxy - sx*sy))*100.0; //*100 instead of / (N*SX2 - SX^2) to get a big slope
        //signF is just a sign function returning 1 or -1
    }
    else {
        *retB = (n*sxy - sx*sy) / denom;
    }

    //get the interception point
    //Intercept(a) = (SY - b*SX) / N
    *retA = (sy - (*retB)*sx) / n;


}[/source]

And Here is how I use it:
[source lang="cpp"]//...
std::vector< CVector2 > p;
p.push_back(CVector2(60,10));
p.push_back(CVector2(60,20));
p.push_back(CVector2(60,80));
p.push_back(CVector2(60,90));
p.push_back(CVector2(60,100));
p.push_back(CVector2(55,30));
p.push_back(CVector2(45,40));
p.push_back(CVector2(40,50));
p.push_back(CVector2(40,60));
p.push_back(CVector2(50,70));

int c;
double rA, rB;
gLinReg( &p, &rA, &rB);
drawLine( game.screen, rA, rB, 50);
for(c=0; c < p.size(); c++)

//...

//And here is how I draw the line for testing:

//px is the center of the line
void drawLine( SDL_Surface* dest, double A, double B, int px) {

    CVector2 lineA, lineB;
    lineA.set( px-50, A+B*(px-50));
    lineB.set( px+50, A+B*(px+50));
    lineRGBA( dest, lineA.x, lineA.y,
              lineB.x, lineB.y, 0xFF, 0, 0xFF, 0xFF );
}[/source]

Vaclav

##### Share on other sites
I recommend LAPACK, or its C interface, LAPACKE. The function you are looking for is "LAPACK_dgels", which solves linear least squares problems: http://www.netlib.org/lapack/lapacke.html

##### Share on other sites
The most common form of linear regression minimizes the sum of the squares of the distances between the line and the data points measured vertically. What you want is called total least squares, which minimizes the sum of the squares of the distances using both coordinates.
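
To make the distinction concrete, here is a minimal sketch (not code from this thread) of a fit that minimizes perpendicular distances, assuming both coordinates are weighted equally. `Vec2`, `OrthoLine` and `fitOrthogonal` are hypothetical names. The line is returned as a point plus a direction angle rather than as y = a + b*x, so a vertical line needs no special case.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D point type such as CVector2.
struct Vec2 { double x; double y; };

// Line through `centroid` with direction `angle` (radians from the +x axis).
struct OrthoLine {
    bool ok;       // false if fewer than two points were supplied
    Vec2 centroid; // the fitted line passes through the mean point
    double angle;  // in [-pi/2, pi/2]; +-pi/2 means a vertical line
};

OrthoLine fitOrthogonal(const std::vector<Vec2>& pts) {
    OrthoLine out{false, Vec2{0.0, 0.0}, 0.0};
    const std::size_t n = pts.size();
    if (n < 2) return out;

    // Mean of the points.
    double mx = 0.0, my = 0.0;
    for (const Vec2& p : pts) { mx += p.x; my += p.y; }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);

    // Centered second moments (unnormalized; a common factor
    // would not change the angle computed below).
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Vec2& p : pts) {
        const double dx = p.x - mx;
        const double dy = p.y - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }

    // Direction of largest spread, from tan(2*angle) = 2*sxy / (sxx - syy).
    // If the points have no preferred direction (sxx == syy, sxy == 0),
    // the result is arbitrary (typically 0).
    out.angle = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    out.centroid = Vec2{mx, my};
    out.ok = true;
    return out;
}
```

To draw the result, one could step from `centroid` along `(cos(angle), sin(angle))` in both directions, which works the same way for horizontal and vertical lines.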

##### Share on other sites
@alvaro Thank you, you're right. What I need is total least squares linear regression.

@Spline Thanks, but that library and the others I found don't mention total least squares. Also, learning how to install, include and use one of them seems like it would take me more time than writing the function myself. I also forgot to mention that my project is cross-platform (Windows and Linux), so I would need a cross-platform library.

In case anyone needs it later as well: after watching these videos I understand least squares regression much better:
(parts 1-4)

I've also found that this kind of total least squares line fit is called Deming regression: http://en.wikipedia....ming_regression

On that wiki page, in the Solution section, the formulas are very similar to my previous ones, but extended.
So in the end I'll get the equation of the line: y = β0 + β1x
I only have to calculate β1 and β0 from the given formulas, substituting the means and the sums. The only problem might be the delta in the formula.
About delta, the Specification section of the wiki says: "In practice the variance of the x and y parameters is often unknown which complicates the estimate of delta but where the measurement method for x and y is the same they are likely to be equal so that delta = 1 for this case." In my case, though, the x and y values aren't measured ones.
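
For reference, the standard Deming solution can be written out as follows. This is a restatement from memory rather than a quotation of the wiki page, so the normalization may differ there (it cancels in the slope either way); $\delta$ is the assumed ratio of the error variance in y to the error variance in x.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
$$

$$
s_{xx} = \frac{1}{n}\sum_{i=1}^{n} (x_i-\bar{x})^2, \qquad
s_{xy} = \frac{1}{n}\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}), \qquad
s_{yy} = \frac{1}{n}\sum_{i=1}^{n} (y_i-\bar{y})^2
$$

$$
\beta_1 = \frac{s_{yy} - \delta\, s_{xx} + \sqrt{(s_{yy} - \delta\, s_{xx})^2 + 4\,\delta\, s_{xy}^2}}{2\, s_{xy}}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$

With $\delta = 1$ this reduces to orthogonal regression, which minimizes perpendicular distances. Note that $\beta_1$ is undefined when $s_{xy} = 0$, which includes the exactly vertical case.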

EDIT: I've implemented Deming regression and it works exactly how I wanted it to (see the first post). Delta = 1 proved to be good.
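
The final implementation was not posted in the thread. As a rough sketch of what a Deming fit with delta = 1 might look like, assuming the standard closed-form slope and using hypothetical names (`Point2`, `DemingFit`, `demingFit`):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CVector2 class used in the first post.
struct Point2 { double x; double y; };

// Fitted line y = intercept + slope * x; only meaningful if ok is true.
struct DemingFit {
    bool ok;
    double intercept;
    double slope;
};

// delta is the assumed ratio of error variances (y over x); 1.0 gives
// orthogonal regression.
DemingFit demingFit(const std::vector<Point2>& pts, double delta = 1.0) {
    DemingFit fit{false, 0.0, 0.0};
    const std::size_t n = pts.size();
    if (n < 2) return fit;

    double meanX = 0.0, meanY = 0.0;
    for (const Point2& p : pts) { meanX += p.x; meanY += p.y; }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    // Centered sums; the usual 1/n factor cancels in the slope formula.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Point2& p : pts) {
        const double dx = p.x - meanX;
        const double dy = p.y - meanY;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }

    const double d = syy - delta * sxx;
    if (sxy == 0.0) {
        // No tilt: horizontal if the spread in x dominates. Otherwise the
        // line is vertical or undefined, which y = a + b*x cannot express.
        if (d < 0.0) {
            fit.slope = 0.0;
            fit.intercept = meanY;
            fit.ok = true;
        }
        return fit;
    }

    fit.slope = (d + std::sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy);
    fit.intercept = meanY - fit.slope * meanX;
    fit.ok = true;
    return fit;
}
```

On the ten sample points from the first post, this sketch gives a slope of about 38, a nearly vertical line, whereas the ordinary least squares slope for the same points is about 0.3. An exactly vertical point set is reported through `ok == false` rather than with an artificially large slope.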
