# Standard Error of Coefficient in Ordinary Least Squares

This topic is 3840 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

In a least squares regression with multiple independent variables, how does one go about solving for the standard error of each of the calculated coefficients? All the examples online seem to be specific to the case of 1 independent variable only, and don't seem to extend to the case of multiple independent variables. Thanks.

##### Share on other sites
A nice way to solve a least-squares problem is to use QR factorisation:

Suppose $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$. For an overdetermined system, we have $m > n$. Define $\|v\|_2$ to be the Euclidean length of $v$. Then the least-squares problem is that of finding $x$ to minimise $\|Ax - b\|_2$. We can factorise

$A = QR$, where
$Q \in \mathbb{R}^{m \times m}$ is orthogonal ($Q^T Q = I$),
$R \in \mathbb{R}^{m \times n}$ is upper-triangular.

Then, as orthogonal transformations preserve the 2-norm, the following quantities are minimised simultaneously:

$\|Ax - b\|_2$
$\|QRx - b\|_2$
$\|Rx - Q^T b\|_2$

So write $R = \begin{pmatrix} R_1 \\ R_2 \end{pmatrix}$ for $R_1 \in \mathbb{R}^{n \times n}$, $R_2 \in \mathbb{R}^{(m-n) \times n}$ (i.e. split off the upper square part of $R$). Since $R$ is upper-triangular, we now have $R_2 = 0$.
Similarly split $Q^T b = \begin{pmatrix} c \\ d \end{pmatrix}$, where $c \in \mathbb{R}^n$, $d \in \mathbb{R}^{m-n}$. Then the system can be split into two pieces, each of which must be minimised:

$\|R_1 x - c\|_2$
$\|R_2 x - d\|_2$

Now our work pays off. The first expression may be minimised to zero (provided $A$ has full column rank, so that $R_1$ is nonsingular) as the quantity inside the norm is a square-matrix system. Moreover, it can be solved directly by back-substitution, since $R_1$ is upper-triangular. This solves the linear least squares problem.

Also, the second expression simplifies (as $R_2 = 0$) to

$\|d\|_2$

Clearly, we can't do anything to minimise this further. Hence this value is exactly the error in the fit we just calculated.

I've made it all sound rather complicated, but most of my text is explanatory garnish. In summary:

1. QR factorise $A$.
2. Truncate $R$ to its upper square submatrix.
3. Evaluate $Q^T b$.
4. Solve the direct system with the new $R$ and the upper part of $Q^T b$.
5. The size of the remaining (lower) part of $Q^T b$ is the error.

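The dominant cost is the QR factorisation, which takes $O(mn^2)$ operations with Householder reflections (provided $Q$ is applied to $b$ implicitly rather than formed in full), so the work grows linearly with the number of rows $m$.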
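The five steps can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function name `qr_least_squares` and the four data points are made up for the example, and `np.linalg.solve` stands in for a dedicated back-substitution routine.

```python
import numpy as np

def qr_least_squares(A, b):
    """Minimise ||Ax - b||_2 via a full QR factorisation.

    Returns the solution x and the norm of the leftover residual.
    (Hypothetical helper, not from the original post.)
    """
    m, n = A.shape
    # Step 1: full QR, so Q is m x m orthogonal and R is m x n upper-triangular.
    Q, R = np.linalg.qr(A, mode="complete")
    # Step 2: keep only the upper square part of R.
    R1 = R[:n, :]
    # Step 3: rotate the right-hand side and split it into upper/lower parts.
    qtb = Q.T @ b
    c, d = qtb[:n], qtb[n:]
    # Step 4: solve R1 x = c. R1 is upper-triangular; a general solver is
    # used here for brevity instead of explicit back-substitution.
    x = np.linalg.solve(R1, c)
    # Step 5: the size of the lower part is the error of the fit.
    return x, np.linalg.norm(d)

# Made-up data: fit y = x0 + x1 * t to four observations.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.0, 3.0, 5.0, 8.0])

x, err = qr_least_squares(A, b)
print(x, err)
```

The returned error should agree with `np.linalg.norm(A @ x - b)`, which is a cheap way to check the split.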

If you've already got a least-squares system up-and-running, and don't want to rewrite the whole thing, then you can just compute the error manually, once you've established x:

error = $\|Ax - b\|_2$

Hope this helps

##### Share on other sites
Well, I appreciate the huge write up, but I don't think we are on the same page... Currently, I am calculating it as,
x* = (A.transpose * A).inverse * (A.transpose * b)

x* holds the coefficients. Now, calculating the R^2 error is as simple as just summing the square of the residuals (measured b minus calculated b*).
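For reference, that calculation might look like the following in Python with NumPy. It is a sketch with made-up data; `np.linalg.solve` is used on the normal equations instead of forming the inverse explicitly, which gives the same `x*` when `A.transpose * A` is invertible. The summed squared residuals are what is usually called the residual sum of squares.

```python
import numpy as np

# Made-up observations: a constant column plus one regressor.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.0, 3.0, 5.0, 8.0])

# x* = (A^T A)^{-1} (A^T b), computed by solving the normal equations.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Residuals: measured b minus calculated b*.
residuals = b - A @ x_star

# Sum of the squared residuals (residual sum of squares).
rss = residuals @ residuals
print(x_star, rss)
```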

The issue I am facing is the standard error of each coefficient, not of the entire equation. I am at a total loss on how to do it, but somehow Excel can calculate it in an ordinary least squares fit for the coefficient on each independent variable.

Or am I just misinterpreting your response?

Thanks for the help lately!

##### Share on other sites
Just sum the squares of the residuals maybe? Squaring each component of each residual individually instead of taking the dot product of the residual with itself.

Edit: what the heck was I thinking

[Edited by - Vorpy on June 10, 2007 12:34:28 PM]

##### Share on other sites
Vorpy, that is exactly what I said was not what I was looking for -- I don't want the error of the equation -- just the error of each coefficient individually.

Thanks.

##### Share on other sites
From my statistics course book,

$\operatorname{Var}(\beta^*) = \sigma^2 (A^T A)^{-1}$

Where β* is the vector of estimated coefficients, σ is the theoretical standard error of the gaussian noise in the equation, and A is the matrix of observations.

You may want to use the empirical standard error σ* instead of the theoretical one to make the above evaluation feasible.
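A rough sketch of that evaluation in Python with NumPy follows. The assumptions are mine, not the book's: the empirical variance is taken as the residual sum of squares divided by (number of observations minus number of coefficients), the usual unbiased estimate, and the function name and data are invented for illustration.

```python
import numpy as np

def coefficient_covariance(A, b):
    """Estimate Var(beta*) = sigma*^2 (A^T A)^{-1} from the data.

    Hypothetical helper; sigma*^2 is assumed to be RSS / (n_obs - n_par).
    """
    n_obs, n_par = A.shape
    ata = A.T @ A
    beta = np.linalg.solve(ata, A.T @ b)
    residuals = b - A @ beta
    # Empirical noise variance, replacing the unknown theoretical sigma^2.
    sigma2_hat = residuals @ residuals / (n_obs - n_par)
    return beta, sigma2_hat * np.linalg.inv(ata)

# Made-up observations: a constant column plus one regressor.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.0, 3.0, 5.0, 8.0])

beta, cov = coefficient_covariance(A, b)
print(cov)
```

The result is a square matrix with one row and one column per coefficient.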

##### Share on other sites
Ah. Wonderful! Thank you sir! I am a bit confused by the statement 'standard error of the gaussian noise in the equation'. Isn't sigma squared just the variance? Can you explain what you mean?

Either way -- it seems like the exact equation I want! What did you find it under?

##### Share on other sites
Rethinking that equation, it doesn't really make much sense...

Assume I have an equation of the sort: y = B1 + B2x + B3z
With 4 cases of observations. This makes the observation matrix a 4x3, which means the transpose is a 3x4. 3x4 * 4x3 = 3x3.

So the variance of 3x1 = sigma^2 * 3x3?

Even if sigma^2 were a vector, it simply wouldn't work. How can I get a 3x1 back? I can't!

I am very, very confused.

##### Share on other sites
Ah, a thought (perhaps)! Maybe that 3x3 is actually what I want after all. The variance of each coefficient should be the (i,i) part of the matrix, with covariances between variables filling the rest of the spots...

##### Share on other sites
Quote:
 Original post by visage
 Ah, a thought (perhaps)! Maybe that 3x3 is actually what I want after all. The variance of each coefficient should be the (i,i) part of the matrix, with covariances between variables filling the rest of the spots...

Yes. The variance of a vector is a variance-covariance matrix.
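Written out for the three-coefficient example, the layout would be as follows (the notation $\operatorname{SE}$ for the standard error is added here for illustration; the standard error of each coefficient is the square root of the matching diagonal entry):

```latex
\operatorname{Var}(\beta^*) =
\begin{pmatrix}
\operatorname{Var}(\beta_1^*) & \operatorname{Cov}(\beta_1^*, \beta_2^*) & \operatorname{Cov}(\beta_1^*, \beta_3^*) \\
\operatorname{Cov}(\beta_2^*, \beta_1^*) & \operatorname{Var}(\beta_2^*) & \operatorname{Cov}(\beta_2^*, \beta_3^*) \\
\operatorname{Cov}(\beta_3^*, \beta_1^*) & \operatorname{Cov}(\beta_3^*, \beta_2^*) & \operatorname{Var}(\beta_3^*)
\end{pmatrix},
\qquad
\operatorname{SE}(\beta_k^*) = \sqrt{\operatorname{Var}(\beta^*)_{kk}}
```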

##### Share on other sites
Well, now that I can get the variance ... how can I translate this into the t-stat or the standard error? Nobody seems to actually know how to calculate these guys from an ordinary least squares regression, and yet they are standard in the Excel output. It is getting rather frustrating.

Thanks for the help.

##### Share on other sites
Student's t test comes from the normality of the estimated coefficients. The standard formula is for testing whether $\beta_k = a$, where $a$ is a fixed real number and $\beta_k$ is the real parameter (the value of which is unknown, since only the estimated parameter $\beta_k^*$ is known). Because of normality:

$\beta_k^* \sim N(\beta_k, \sigma^2)$

Which, under the hypothesis of equality to $a$ (which we are testing), means that the following transform follows the standard normal distribution:

$(\beta_k^* - a) / \sigma \sim N(0, 1)$

As before, $\sigma$ is unknown (because we don't have access to the underlying distribution of the error, only the observed one) and we must examine the same equation with $\sigma^*$ instead (which, if you remember, is the empirical standard deviation, which you computed from your observation set). I don't have the justification right here, but it follows a Student's t distribution with $n - p$ degrees of freedom ($n$ the number of observations, $p$ the number of parameters):

$T^* = (\beta_k^* - a) / \sigma^* \sim T_{n-p}$

Therefore, you have estimated an empirical T (your t-test value), and you must now compare it against the distribution function of Student's t distribution to determine how likely it is to observe your particular empirical measure.

EDIT: and $\sigma$ is the standard deviation of the parameter $\beta_k^*$, not of the entire model, as implied by the first equation above.
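Putting the pieces together, a sketch in Python with NumPy could look like this. It rests on assumptions not stated in the thread: the empirical standard deviation of each coefficient is taken as the square root of the matching diagonal entry of the estimated variance-covariance matrix, the noise variance is estimated as RSS / (n - p), and the function name and data are invented. The test value `a` is a parameter the caller chooses.

```python
import numpy as np

def t_statistics(A, b, a=0.0):
    """Return (beta, std_err, t, dof) for the hypothesis beta_k = a.

    Hypothetical helper: std_err[k] is the empirical standard deviation
    of beta_k*, i.e. sqrt of the k-th diagonal entry of the covariance.
    """
    n_obs, n_par = A.shape
    dof = n_obs - n_par                      # degrees of freedom, n - p
    ata = A.T @ A
    beta = np.linalg.solve(ata, A.T @ b)
    residuals = b - A @ beta
    sigma2_hat = residuals @ residuals / dof  # empirical noise variance
    cov = sigma2_hat * np.linalg.inv(ata)     # variance-covariance matrix
    std_err = np.sqrt(np.diag(cov))           # one standard error per coefficient
    t = (beta - a) / std_err                  # one t value per coefficient
    return beta, std_err, t, dof

# Made-up observations: a constant column plus one regressor.
x_obs = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(x_obs), x_obs])
b = np.array([1.0, 3.0, 5.0, 8.0])

beta, std_err, t, dof = t_statistics(A, b)
print(beta, std_err, t, dof)
```

Each entry of `t` would then be compared against a Student's t distribution with `dof` degrees of freedom, for instance from a table or a statistics library.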

##### Share on other sites
ToohrVyk, you are being a real life saver here. Thanks for the hand holding.

Truth be told, all along, I have been trying to solve for the t-value, which is normally solved as Coefficient / Standard Error.

So my question here is, without knowing T* or a, is there a way to find one of them?

My search continues on! Thanks for the lead!

##### Share on other sites
Quote:
 Original post by visage
 So my question here is, without knowing T* or a, is there a way to find one of them?

Well, finding the t-value without knowing $a$ is silly. The reason is that the t-value allows you to determine how likely it is for the real-world (unknown) coefficient $\beta_k$ to be equal to a given $a$. That is, you (the statistician) choose any $a$ you wish, and you get the t-value for the hypothesis $\beta_k = a$ (and since Student's t distribution is known, you know how likely it is for you to observe the empirical t-value if the hypothesis were true, so that you may reject it). Long story short, you don't have to find or compute $a$, you have to choose it. The t-value is computed as mentioned above:

$T^*(a) = (\beta_k^* - a) / \sigma^*$

The typical t-value, the one displayed in most statistical analysis programs, is the one for $a = 0$, because it's very frequent for statisticians to choose $a = 0$ (since a zero coefficient means there is no impact of the descriptive variable on the explained variable, which is something interesting to know). However, there are as many t-values as there are possible values for $a$ (and you can choose any real number you wish).

In the case of $a = 0$, the t-value becomes equal to:

$T^*(0) = \beta_k^* / \sigma^*$

That is, coefficient over standard error (yes, the standard error is the empirical standard deviation $\sigma^*$ of the estimated coefficient $\beta_k^*$, as opposed to the theoretical one, noted $\sigma$).
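As a quick arithmetic illustration with made-up numbers (an estimated coefficient of 2.3 with a standard error of 0.5, neither taken from the thread), the t-value for a chosen $a$ is the distance from the estimate to $a$ measured in standard errors:

```latex
\beta_k^* = 2.3, \quad \sigma^* = 0.5
\quad\Longrightarrow\quad
T^*(0) = \frac{2.3 - 0}{0.5} = 4.6,
\qquad
T^*(2) = \frac{2.3 - 2}{0.5} = 0.6
```

The same estimate is far from 0 but close to 2 in those units, which is why the choice of $a$ changes the t-value.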

##### Share on other sites
Herm -- yeah, it doesn't make much sense, huh? Unfortunately, a=0 isn't really what I am looking for. I basically did a regression in Excel and am trying to figure out how it solved for the standard error and the t-statistic for each coefficient -- but all the pieces just seem to be missing. Whenever I find any sites about it, they simply say 't-stat = coefficient / standard error' -- and don't explain how to find the t-stat or the standard error.

Unfortunately, I don't know a in this situation either.
