# Quadratic forms, with a twist (attention ury)

## Recommended Posts

This is probably something you, ury, are going to find trivial (I hope). Others are welcome to help too ;) This is somewhat related to my previous post about second order surfaces. Well, atleast the matrix R has the same properties as in the previous post, but instead now I have a fixed location on that surface. OK, so I have this: sum(e2n) = sum(x2n) - rTAT(ARAT)-1Ar, where r is Mx1, A is KxM and R is MxM and K<M. My objective is to minimize the square sum: sum(e2n). Now the problem is to find matrix A such that rTAT(ARAT)-1Ar is maximized. I know that upper limit for rTAT(ARAT)-1Ar is given by rTR-1r, and this corresponds to situation where K=M and A is invertible. Again I'm intrested in the whole solution space and all information you can give me about this particular problem. I would be very grateful for any pointers on this subject. edit: Just noticed that post sounds like I'm asking you to do my "job", but I have thought this problem quite a lot and I'm just looking for a way to progress... [Edited by - Winograd on October 22, 2005 5:30:10 AM]

##### Share on other sites
ury    476

At least for me, this problem is hardly trivial.

##### Share on other sites
Thanks for your effort in advance. If it helps here's some more background info:

Vector r is calculated from xn like this:
r=sum(xnyn), where xn is scalar from time-serie x and yn=[xn-1, ..., xn-N]. So r is a vector containing autocorrelation of the serie x with N first lags (excluding lag 0).

Matrix R is covariance matrix calculated as R=sum(ynynT).

Error en = xn-wTAyn, where w is Kx1 and w = (ARAT)-1Ar. This is the least squares solution for minimizing the error sum(e2n).

I implemented simple (naive) numerical solver which seems to work suprisingly well, but it's written in Octave and therefore it's insanely slow. So i'm really intrested on what you can tell me about A. I'm intrested in how it's related to r and R. I post my results if the numerical solutions give more light on this problem.

##### Share on other sites
Dave Eberly    1173
The standard least squares problem is of the form: Choose x to minimize Ax-y. The solution when A^T*A is invertible is x = (A^T*A)^{-1}*A*y. In this formulation, the matrix A is *known* and x is the variable.

In your problem, your "matrix" contains the unknowns, so this does not look like a standard least squares problem. You want to choose A to minimize the error function E(A) = sum_n [x_n - r^T*A^T*(A*R*A^T)^{-1}*A*y_n]^2, where the x_n, y_n, r, and R values are all known. That is, the only variable is A. The matrix B = A*R*A^T is KxK and presumed to be invertible. Each entry of B is a quadratic function of the entries of A. The inverse is B^{-1} = adjoint(B)/det(B). Each entry of adjoint(B) is a degree 2*(K-1) polynomial in the entries of A and det(B) is a degree 2*K polynomial in the entries of A. Thus, the elements of A^T*B*A are rational polynomials in the entries of A. The numerators are of degree 2*(K+1) and the common denominator, which is det(B), has degree 2*K.

Proceeding as you would with least squares, compute the gradient of E with respect to the entries of A and set all the equations to zero. Since A is KxM, you have K*M equations to solve, each equation a rational polynomial set to zero. Because you started with a common denominator, you can simplify these equations to regular polynomial equations. This is a *lot* of nonlinear equations to solve numerically.

##### Share on other sites
Quote:
 Original post by Dave EberlyThe standard least squares problem is of the form: Choose x to minimize Ax-y. The solution when A^T*A is invertible is x = (A^T*A)^{-1}*A*y. In this formulation, the matrix A is *known* and x is the variable.

In my case the "standard" least squares problem is to minimize the square sum
sum(e2n), where en = xn-wTyn. This is know as linear prediction. We find coefficient vector w such that we can optimally (least squares sense) predict the signal x from its previous samples. The solution is simply w=R-1r (look definition for R and r above). And I have a reason for addition of the matrix A to this "standard" problem.

Quote:
 Original post by Dave EberlyIn your problem, your "matrix" contains the unknowns, so this does not look like a standard least squares problem. You want to choose A to minimize the error function E(A) = sum_n [x_n - r^T*A^T*(A*R*A^T)^{-1}*A*y_n]^2, where the x_n, y_n, r, and R values are all known. That is, the only variable is A. The matrix B = A*R*A^T is KxK and presumed to be invertible. Each entry of B is a quadratic function of the entries of A. The inverse is B^{-1} = adjoint(B)/det(B). Each entry of adjoint(B) is a degree 2*(K-1) polynomial in the entries of A and det(B) is a degree 2*K polynomial in the entries of A. Thus, the elements of A^T*B*A are rational polynomials in the entries of A. The numerators are of degree 2*(K+1) and the common denominator, which is det(B), has degree 2*K.

Yes, I'm sure you're correct with all this (btw. why "matrix" and not matrix?). Don't get me wrong, I appreciate this information and it is valuable if (or when) I see that numerical solutions are my only option to get any information about this problem, _but_ I'm more intrested in how A is related to R and r.

If you looked at the thread I gave link at the first post, there ury gave an elegant solution for the problem, which shed light on the whole geometry of the error surface. I'm hoping to see something similar. :)

Quote:
 Original post by Dave EberlyProceeding as you would with least squares, compute the gradient of E with respect to the entries of A and set all the equations to zero. Since A is KxM, you have K*M equations to solve, each equation a rational polynomial set to zero. Because you started with a common denominator, you can simplify these equations to regular polynomial equations. This is a *lot* of nonlinear equations to solve numerically.

This is what i'm more or less doing now numerically. The most important reason why I want to stay away from this rather heavy solution is that I'm looking at M which might be anything from 10 to 1000 and K from 2 to 50. (In the long run I'm intrested on what happens when M approaches infinity)

##### Share on other sites
Dave Eberly    1173
Quote:
 Original post by WinogradYes, I'm sure you're correct with all this (btw. why "matrix" and not matrix?).

I just wanted to stress that your matrix was not the one normally seen in a least-squares problem.

Quote:
 Don't get me wrong, I appreciate this information and it is valuable if (or when) I see that numerical solutions are my only option to get any information about this problem, _but_ I'm more intrested in how A is related to R and r.

I do not believe you are going to find closed-form expressions for the relationship. This is similar to asking for an algebraic relationship for the roots of a degree 5 polynomial in terms of its coefficients. This is theoretically not possible.

Quote:
 If you looked at the thread I gave link at the first post, there ury gave an elegant solution for the problem, which shed light on the whole geometry of the error surface. I'm hoping to see something similar. :)

The previous thread is basic linear algebra. Given a quadratic form Q = x^T*B*x, where B is an n-by-n symmetric matrix, you can use an eigendecomposition on B to obtain an orthonormal set of n eigenvectors V_1 through V_n and a corresponding set of eigenvalues d_1 through d_n (not all necessarily distinct). The matrix B = d_1*V_1*V_1^T + ... + d_n*V_n*V_n^T. If y = (y_1,...,y_n) with y_i = Dot(V_i,x), then the quadratic form is Q = sum_{i=1}^n d_i y_i^2, the "standard form". What hypersurface this represents depends on the signs of the d_i, just as in the characterization of quadric surfaces. If all d_i > 0, then you have a hyperellipsoid, and the maximum Q occurs for x = V_s where d_s is the maximum eigenvalue. The minimum Q occurs at the eigenvector for the minimum eigenvalue.

Your current thread is not basic linear algebra, so I see no reason to expect a simple formula for optimizing the quadratic form Q = x^T*A^T*(A*B*A^T)^{-1}*A*x with respect to A. The previous problem searched for an x to optimize Q. Your current problem thinks of x as fixed and searches for an A.

You mentioned that A is k-by-m with k < m. You observed that if k = m and A is invertible, the quadratic form reduces to Q = x^T*B*x, in which case *all* invertible matrices A optimize the quadratic form as long as you also choose x to optimize x^T*B*x. Having infinitely many choices for A makes this problem not well defined, so you cannot expect a numerical optimizer to behave correctly.

Similar nonuniqueness rears its ugly head even when k < m and you look at a special subset of A matrices. Suppose that B = V*D*V^T, where V is an orthogonal m-by-m matrix and D is an invertible diagonal m-by-m matrix (an eigendecomposition of B). Restrict the subset of A matrices so that the singular value decompositions have U as the left matrix. That is, A = U*S*V^T, where V is the already mentioned orthogonal m-by-m matrix, U is an orthogonal k-by-k matrix, and S is k-by-m is partitioned into S = [E|0]. The matrix E is k-by-k and is diagonal with nonnegative entries. The 0 indicates a k-by-(m-k) matrix of zeros. Let's assume the diagonal entries of E are positive, so E is invertible.

The matrix D is partitioned to D = diagonal(D1,D2), where D1 is a k-by-k diagonal matrix and D2 is an (m-k)-by-(m-k) diagonal matrix. Some basic algebra will eventually lead you to A^T*(A*B*A^T)^{-1}*A = V*diagonal(D1^{-1},0)*V^T, where the 0 indicates an (m-k)-by-(m-k) matrix of zeros. The quadratic form is Q = x^T*V*diagonal(D1^{-1},0)*V^T*x. This form is unaffected by the choice of U or E in the decomposition for A. You can then optimize the quadratic form by choosing the appropriate vector for x. However, you still have infinitely many choices for invertible E and for orthogonal U, none of them affecting the optimum value for Q. Once again, you have nonuniqueness and you cannot expect a numerical optimizer to behave correctly.

##### Share on other sites
Dave: I must admit that I have some troubles following you (which just tells something about me ;)). Mainly I mean that it's getting late and I'm getting lost into the amount of symbols... I'll read it with thought in the morning.

I'm aware of the fact that there is multiple solutions (infinite) and that this is ill-posed problem. I'm not by any means an expert in numerical methods, but if the numerical optimizer converges to some solution satisfying the criteria, I consider that the optimizer has indeed behaved correctly, no?(note, I'm not claiming my optimizer does that :)) Or is there some other reasons why I can't even expect this behaviour? Perhaps problems with local maximums?

I have few constraints (which of I haven't mentioned) for this problem which will restrict the problem to one solution (I think). Most importantly I have a set of signals (xni) and thus I have set of autocorrelations (ri) and autocovariances (Ri) corresponding to the signals. Now the matrix A is common to all these. From this follows that instead of maximizing rTAT(ARAT)-1Ar, we would maximize sum(riTAT(ARiAT)-1Ari), for A. With sufficiently large set of signals I should have one unique solution, correct? But do still I also have problems with local maximums? Probably...

##### Share on other sites
Dave Eberly    1173
Until I actually see your additional constraints, I have no intuition whether these lead to a unique solution. My reaction to your modification for choosing A to optimize the sum is that this still will not help you. What matters in the end is that you have a well-posed problem, at which point trying some numerical method makes sense.

##### Share on other sites
Dave: I probably edited my post while you were writing.

If the sum does not give me unique solution then I have no idea what does. Additional constraints are related to what I'm aiming at (which I think is irrelevant at this point). But for a reason the matrix A should be something like this:

[a11,a12, a13, ... ,a1M]
[0, a22, a23, ... ,a2M]
[0, 0, a33, ... ,a3M]
.
.
.
[...]

So that rows are padded with zeros in increasing order. Other constraints comes from how the coefficients a are going to be expressed.