Standard Error of Coefficient in Ordinary Least Squares

In a least squares regression with multiple independent variables, how does one go about solving for the standard error of the different coefficients calculated? All the examples online seem to be specific to a single independent variable, and don't seem to generalise to the case of multiple independent variables. Thanks.
A nice way to solve a least-squares problem is to use QR factorisation:

Suppose A ∈ ℜ^(m×n), x ∈ ℜ^n, b ∈ ℜ^m. For an overdetermined system, we have m > n. Define |v|_2 to be the Euclidean length of v. Then the least-squares problem is that of finding x to minimise |Ax - b|_2. We can factorise

A = QR, where
Q ∈ ℜ^(m×m) is orthogonal (Q^T Q = I),
R ∈ ℜ^(m×n) is upper-triangular.

Then, as orthogonal transformations preserve the 2-norm, the following quantities are minimised simultaneously:

|Ax - b|_2
|QRx - b|_2
|Rx - Q^T b|_2

So write R^T = [R_1^T | R_2^T] for R_1 ∈ ℜ^(n×n), R_2 ∈ ℜ^((m-n)×n) (i.e. split off the upper square part of R; the transposes are only there so the vertical split fits on one line). Since R is upper-triangular, we have R_2 = 0.
Similarly split (Q^T b)^T = [c^T | d^T], where c ∈ ℜ^n, d ∈ ℜ^(m-n). Then the system splits into two pieces, each of which must be minimised:

|R_1 x - c|_2
|R_2 x - d|_2

Now our work pays off. The first expression can be driven to zero (provided A has full column rank, so that R_1 is nonsingular), as the quantity inside the norm is a square-matrix system. Moreover, it can be solved directly by back-substitution, since R_1 is upper-triangular. This solves the linear least-squares problem.

Also, the second expression simplifies (as R_2 = 0) to

|d|_2

Clearly, we can't do anything to minimise this further. Hence this value is exactly the error in the fit we just calculated.

I've made it all sound rather complicated, but most of my text is explanatory garnish. In summary:

1. QR-factorise A.
2. Truncate R to its upper square submatrix.
3. Evaluate c = Q^T b.
4. Solve the square triangular system formed by the new R and the upper part of c (by back-substitution).
5. The norm of the remaining part of c is the residual error.

The cost of the whole thing is dominated by the QR factorisation, which is O(mn^2) for m ≥ n.
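
If it helps to see those five steps concretely, here's a minimal numpy sketch (the function name and the use of numpy.linalg.qr in "complete" mode are my own illustrative choices, not part of the recipe above):

import numpy as np

def qr_least_squares(A, b):
    """Solve min_x |Ax - b|_2 via QR; return x and the residual norm."""
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode="complete")  # Q is m x m, R is m x n
    c = Q.T @ b                              # rotate b into Q's basis
    x = np.linalg.solve(R[:n, :n], c[:n])    # solve the square upper-triangular system R_1 x = c
    residual_norm = np.linalg.norm(c[n:])    # |d|_2: the part of the error we can't remove
    return x, residual_norm

(scipy.linalg.solve_triangular would exploit the triangular structure of R_1, but np.linalg.solve is fine for a sketch.)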


If you've already got a least-squares system up-and-running, and don't want to rewrite the whole thing, then you can just compute the error manually, once you've established x:

error = |Ax - b|_2
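
In numpy that's a one-liner once you have x; the lstsq call below is only there to make the snippet self-contained, and the arrays are made up for illustration:

import numpy as np

# Hypothetical small system, just to have something to solve.
A = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)  # any least-squares solve will do
error = np.linalg.norm(A @ x - b)          # |Ax - b|_2, the residual norm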

Hope this helps
Admiral
Ring3 Circus - Diary of a programmer, journal of a hacker.
Well, I appreciate the huge write up, but I don't think we are on the same page... Currently, I am calculating it as,
x* = (A.transpose * A).inverse * (A.transpose * b)

x* holds the coefficients. Now, calculating the overall error of the fit (for R^2 and the like) is as simple as summing the squares of the residuals (measured b minus calculated b*).
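
(In numpy, that pseudocode would look roughly like the sketch below; the arrays are made-up placeholders.)

import numpy as np

# Hypothetical observation matrix (intercept column plus two regressors) and measurements.
A = np.array([[1.0, 2.0, 0.5],
              [1.0, 3.0, 1.5],
              [1.0, 5.0, 2.0],
              [1.0, 7.0, 3.5]])
b = np.array([1.1, 2.0, 3.2, 4.1])

x_star = np.linalg.inv(A.T @ A) @ (A.T @ b)  # x* = (A^T A)^-1 (A^T b)
rss = np.sum((b - A @ x_star) ** 2)          # sum of squared residuals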

The issue I am facing is the standard error of each coefficient, not the entire equation. I am at a total loss on how to do it, but somehow excel can calculate it for an ordinary least squares for the coefficient on each independent variable.

Or am I just misinterpreting your response?

Thanks for the help lately!
Just sum the squares of the residuals maybe? Squaring each component of each residual individually instead of taking the dot product of the residual with itself.

Edit: what the heck was I thinking

[Edited by - Vorpy on June 10, 2007 12:34:28 PM]
Vorpy, that is exactly what I said was not what I was looking for -- I don't want the error of the equation -- just the error of each coefficient individually.

Thanks.
From my book statistics course,

Var(β*) = σ^2 (A^T A)^(-1)

Where β* is the vector of estimated coefficients, σ is the theoretical standard error of the gaussian noise in the equation, and A is the matrix of observations.

You may want to use the empirical standard error σ* instead of the theoretical one to make the above evaluation feasible (for example σ*^2 = |Aβ* - b|_2^2 / (m - n), the residual sum of squares divided by the degrees of freedom).
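
To make that concrete, here's a minimal numpy sketch (the function name, and the choice of the unbiased estimate σ*^2 = residual sum of squares / (m - n), are my additions for illustration):

import numpy as np

def coefficient_standard_errors(A, b):
    """OLS coefficients and their standard errors, from Var(β*) = σ^2 (A^T A)^-1."""
    m, n = A.shape
    AtA_inv = np.linalg.inv(A.T @ A)
    beta = AtA_inv @ (A.T @ b)               # the usual normal-equations solve
    residuals = b - A @ beta
    s2 = residuals @ residuals / (m - n)     # empirical estimate of sigma^2
    cov_beta = s2 * AtA_inv                  # n x n variance-covariance matrix of the coefficients
    std_errors = np.sqrt(np.diag(cov_beta))  # standard error of each coefficient
    return beta, std_errors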
Ah. Wonderful! Thank you sir! I am a bit confused by the statement 'standard error of the gaussian noise in the equation'. Isn't sigma squared just the variance? Can you explain what you mean?

Either way -- it seems like the exact equation I want! What did you find it under?
Rethinking that equation, it doesn't really make much sense...

Assume I have an equation of the sort: y = B1 + B2x + B3z
With 4 cases of observations. This makes the observation matrix a 4x3, which means the transpose is a 3x4. 3x4 * 4x3 = 3x3.

So the variance of 3x1 = sigma^2 * 3x3?

Even if sigma^2 were a vector, it simply wouldn't work. How can I get a 3x1 back? I can't!

I am very, very confused.
Ah, a thought (perhaps)! Maybe that 3x3 is actually what I want after all. The variance of each coefficient should be the (i,i) part of the matrix, with covariances between variables filling the rest of the spots...
Quote:Original post by visage
Ah, a thought (perhaps)! Maybe that 3x3 is actually what I want after all. The variance of each coefficient should be the (i,i) part of the matrix, with covariances between variables filling the rest of the spots...


Yes. The variance of a vector is a variance-covariance matrix.
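
To tie it together, here's the coefficient_standard_errors sketch from a few posts up applied to a made-up 4x3 example (the numbers are purely illustrative): the per-coefficient standard errors are just the square roots of the (i, i) entries of that matrix.

import numpy as np

# Made-up data: 4 observations, intercept plus two regressors (a 4x3 observation matrix).
A = np.array([[1.0, 2.0, 0.5],
              [1.0, 3.0, 1.5],
              [1.0, 5.0, 2.0],
              [1.0, 7.0, 3.5]])
b = np.array([1.1, 2.0, 3.2, 4.1])

beta, std_errors = coefficient_standard_errors(A, b)
print(beta)        # the three estimated coefficients
print(std_errors)  # one standard error per coefficient (sqrt of the diagonal of the covariance matrix)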

