# solving an equation in least square,

This topic is 2622 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Suppose A is a matrix, b and x are column vectors . Then what it means to say solving Ax =b in a least square sense.
Here in my case, no of equations is not equal to no of unknowns.

When no of unknowns and no of equations, we solve using the LU Decompostion and Conjugate Gradient method.
How can solve using linear least square fashion?

##### Share on other sites
Solving the least squares problem means finding the x such that ||A * x - b|| is as small as possible. Usually you'll have more equations than unknowns. The way I solve it is by multiplying both sides of the equation by transpose(A) on the left:

transpose(A) * A * x = transpose(A) * b

Now you have as many equations as unknowns, and you can proceed to use LU decomposition or any other method.

##### Share on other sites
That only works, though, if A[sup]T[/sup]A is full rank (often it isn't). And even if it is full rank, if the matrix A is large, it's not very numerically stable.

I think the best trade off in practice between stability and computational cost is the "complete orthogonal transformation". You can use two QR decompositions with column pivoting to create it. This is something LAPACK can do: http://www.netlib.or...lug/node43.html

##### Share on other sites

That only works, though, if A[sup]T[/sup]A is full rank (often it isn't).

If A[sup]T[/sup]A doesn't have full rank, it means the solution is not unique, so any other technique will have the same problems. If I am wrong about this, I would love to get a detailed explanation.

And even if it is full rank, if the matrix A is large, it's not very numerically stable.[/quote]

I have heard that criticism before, but this has never become an issue in practice in my experience. Perhaps this bites more in other applications.

##### Share on other sites

If A[sup]T[/sup]A doesn't have full rank, it means the solution is not unique, so any other technique will have the same problems. If I am wrong about this, I would love to get a detailed explanation.

Although the optimum isn't unique, you can restrict yourself to a unique minimum by choosing the minimizer that itself has smallest norm. This means choosing a minimizer orthogonal to A's nullspace -- i.e., in the row space of A.

In addition to this condition, we also have the requirement "A[sup]T[/sup] (Ax - b) = 0 ," which you already mentioned, that the residual be orthogonal to the range of A.

If we construct linearly-independent bases for each of these spaces -- e.g., by Gram-Schmidt / QR -- we can find a unique x satisfying both of these conditions.

Let the columns of Q[sub]1[/sub] be such a basis for the range of A, and those of Q[sub]2[/sub] a basis for the row space of A. With these in hand, our requirements can be expressed,
1.) x = Q[sub]2[/sub] z, for some z
2.) Q[sub]1[/sub][sup]T[/sup](A x - b) = 0 .

We can solve for z and reconstruct x from it, as

x = Q[sub]2[/sub] (Q[sub]1[/sub][sup]T[/sup]A Q[sub]2[/sub])[sup]-1[/sup] Q[sub]1[/sub][sup]T[/sup]b .

The matrix we need to invert, M = Q[sub]1[/sub][sup]T[/sup]A Q[sub]2[/sub], has dimension,

[# columns of Q1] x [# columns of Q2]

or

dim(col(A)) x dim(row(A)) .

By the fundamental theorem of linear algebra, these two dimensions are equal, so the matrix is square. And it's easy enough to see that M has neither a left- nor a right- (nontrivial) nullspace.

I am not super-familiar with the numerical method Numsgil describes, but I assume it works along these lines. The technique I have seen is the following: Let A = U S V[sup]T[/sup] be the singular value decomposition of A, and S' be the matrix you get by replacing each nonzero singular value by its reciprocal. Then construct the pseudoinverse as A' = V S' U[sup]T[/sup]. The solution is x = A' b. I don't know how the efficiency of this compares to the complete orthogonal factorization. It's possible that computing the full SVD is doing more work than you need to, but it still seems like a common approach.

@rajesh: Also, using the conjugate gradient method on "A[sup]T[/sup] A x = A[sup]T[/sup]b" returns the same thing as the above if you start with an initial guess of zero (or anything else in row(A)). That's because your gradients are all going to be orthogonal to null(A), so if you start orthogonal to null(A), you'll stay orthogonal to it. And you can implement this without ever explicitly constructing A[sup]T[/sup] A.

This page (thanks, Numsgil) tells you everything LAPACK has for you (including the complete orthogonal factorization based method, and others). And my favorite linear algebra library, Eigen, describes how to efficiently solve these problems (using the SVD method) here.

##### Share on other sites

Although the optimum isn't unique, you can restrict yourself to a unique minimum by choosing the minimizer that itself has smallest norm. This means choosing a minimizer orthogonal to A's nullspace -- i.e., in the row space of A.

I am not super-familiar with the numerical method Numsgil describes, but I assume it works along these lines. The technique I have seen is the following: Let A = U S V[sup]T[/sup] be the singular value decomposition of A, and S' be the matrix you get by replacing each nonzero singular value by its reciprocal. Then construct the pseudoinverse as A' = V S' U[sup]T[/sup]. The solution is x = A' b. Although this works, I assume that computing the full SVD is doing more work than you need to.

That way of calculating the pseudo-inverse appears correct. The solution you get is the one with the minimum norm you mentioned in the first quote. There are, however, other solutions that satisfies other definitions of "minimum", one common one being the minimum non-zero solution.

As an interesting side-note to this, which is something many Matlab users appear to be unaware of, is the difference between the various ways of solving equation systems in Matlab.
1. x = inv(A)*b
2. x = pinv(A)*b
3. x = A\b
All three are equivalent for square and full-rank matrices A, and the only case where 1 is possible. For overdetermined systems, solutions 2 and 3 are equivalent and is the least-squares solution. But for underdetermined systems, 2 gives the minimum norm solution, while 3 gives the minimum non-zero solution. The minimum non-zero solution is the solution with maximum number of zero-elements in the solution vector x.

But that was just a side-note, that there are many solutions and different methods give different solutions.

##### Share on other sites

That way of calculating the pseudo-inverse appears correct. The solution you get is the one with the minimum norm you mentioned in the first quote. There are, however, other solutions that satisfies other definitions of "minimum", one common one being the minimum non-zero solution.

As an interesting side-note to this, which is something many Matlab users appear to be unaware of, is the difference between the various ways of solving equation systems in Matlab.
1. x = inv(A)*b
2. x = pinv(A)*b
3. x = A\b
All three are equivalent for square and full-rank matrices A, and the only case where 1 is possible. For overdetermined systems, solutions 2 and 3 are equivalent and is the least-squares solution. But for underdetermined systems, 2 gives the minimum norm solution, while 3 gives the minimum non-zero solution. The minimum non-zero solution is the solution with maximum number of zero-elements in the solution vector x.

Woah, thanks. I'd only heard of minimum-norm solutions, and didn't realize #2 and #3 were different in the underdetermined case. (It's cool you can minimize the 0-norm at all; normally the best I usually expect is to minimize the 1-norm, and hope that gives sparsity...)

I love this place...

##### Share on other sites
Thanks to all.

Will it give a good solution?

##### Share on other sites

And even if it is full rank, if the matrix A is large, it's not very numerically stable.

I have heard that criticism before, but this has never become an issue in practice in my experience. Perhaps this bites more in other applications.
[/quote]

It really depends what you want the solution for, and on the nature of your matrix, and especially how big it is. There's some formalism you can use to see how bad it is, but bleh it's boring . But it is something to be aware of. On the plus side the math is easy to understand (you just need matrix multiplication + Cholesky decomposition), so it's not that hard to dump your own solution into your project without having to figure out how to get LAPACK working (I gave up on my own project, actually) or understanding high level linear algebra to roll your own.

Will it give a good solution?

Cholseky is great iff:

2. Your matrix is full rank
4. And it's positive definite.

Those things are true of the A^T*A matrix that alvaro mentioned, assuming it's full rank.

1. 1
2. 2
Rutin
19
3. 3
4. 4
khawk
14
5. 5
A4L
13

• 13
• 26
• 10
• 11
• 44
• ### Forum Statistics

• Total Topics
633743
• Total Posts
3013643
×