junliu

matrix derivatives



Suppose x is a column vector and x' denotes its transpose. For the quadratic scalar function

f = x' * A * x

I want to calculate df/dx. Assuming that A is symmetric, we get df/dx = 2Ax. However, if A is a function of x, i.e. A = g(x), how can I compute df/dx? Any answer or reference would be very much appreciated!

Jun
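(Not part of the original post, but the constant-A case is easy to sanity-check numerically. A minimal NumPy sketch, comparing central finite differences against the closed form df/dx = 2Ax for a symmetric A; all names here are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                  # symmetrize: df/dx = 2*A*x needs A = A'
x = rng.standard_normal(n)

f = lambda v: v @ A @ v            # f = x' * A * x

# Central finite-difference gradient vs. the closed form 2*A*x
h = 1e-6
e = np.eye(n)
grad_fd = np.array([(f(x + h*e[k]) - f(x - h*e[k])) / (2*h) for k in range(n)])
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-6))   # True
```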

Use tensor notation for the computation. Let x = [x_i] and A = [a_{ij}]. The quadratic function is f = a_{ij}*x_i*x_j, where you sum over i and sum over j.

When you compute a partial derivative of a tensor T_{i1 i2 ... iN} with respect to x_k, the notation used is T_{i1 i2 ... iN, k}. The indices to the left of the comma are the original tensor indices. The indices to the right of the comma denote derivatives with respect to the components of x.

In your example, the gradient of f is computed using the product rule:

f_{,k} = a_{ij}*x_i*x_{j,k} + a_{ij}*x_{i,k}*x_j + a_{ij,k}*x_i*x_j

The Kronecker delta is d_{ij}, which is 1 when i = j but 0 otherwise. It is the case that x_{i,k} = d_{ik} and x_{j,k} = d_{jk}, so

f_{,k} = a_{ij}*x_i*d_{jk} + a_{ij}*d_{ik}*x_j + a_{ij,k}*x_i*x_j

The Kronecker delta has the property that in a tensor product, you can replace a repeated index by the other Kronecker index. For example, b_{ij}*d_{jk} = b_{ik}. So

f_{,k} = a_{ik}*x_i + a_{kj}*x_j + a_{ij,k}*x_i*x_j

When A is a constant, symmetric matrix, then a_{ij,k} = 0 for all i, j, and k. Then

f_{,k} = a_{ik}*x_i + a_{kj}*x_j = a_{ki}*x_i + a_{kj}*x_j = 2*a_{ki}*x_i, using the symmetry a_{ik} = a_{ki},

which is what you posted.
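(Editorial aside, not from the original reply: the full product-rule gradient above can be verified numerically. The choice A(x) = x*x' below is purely illustrative; for it, a_{ij,k} = d_{ik}*x_j + x_i*d_{jk} and f = (x'x)^2, so the gradient should be 4*(x'x)*x.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x = rng.standard_normal(n)

# Illustrative choice: A(x) = x x', so a_{ij} = x_i*x_j,
# a_{ij,k} = d_{ik}*x_j + x_i*d_{jk}, and f = (x'x)^2.
A = np.outer(x, x)
dA = np.zeros((n, n, n))          # dA[i, j, k] stores a_{ij,k}
for k in range(n):
    dA[k, :, k] += x              # the d_{ik}*x_j term
    dA[:, k, k] += x              # the x_i*d_{jk} term

# f_{,k} = a_{ik}*x_i + a_{kj}*x_j + a_{ij,k}*x_i*x_j
grad = (np.einsum('ik,i->k', A, x)
        + np.einsum('kj,j->k', A, x)
        + np.einsum('ijk,i,j->k', dA, x, x))

print(np.allclose(grad, 4 * (x @ x) * x))   # True
```

Each einsum call is a direct transcription of one summed term in the tensor formula.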

Hi Dave,

Thank you very much for your reply. I'm not quite familiar with tensor notation, but I'll read up on it and try to understand.

OK, let me make the problem more specific. Suppose that A and B are matrices (A is symmetric), and

f = x' * B' * ( (I@x)' * A * (I@x) )^-1 * B * x

where @ denotes the Kronecker product, I is an identity matrix, and ^-1 denotes the matrix inverse. Is there any matrix representation of df/dx? Or do I have to use tensor notation?

Thanks in advance!!!

Isn't f a scalar function after that outer multiplication by x' and x? If so, how could it have a matrix representation, since it is no longer a matrix? Also, x is a vector, so what does it mean to differentiate something with respect to a vector, unless (as I think Dave Eberly assumed) you mean differentiating with respect to a component of x?

Maybe I'm just not understanding the question though...

Sorry, the final derivative would be a vector, and by matrix representation I mean general matrix/vector multiplication and the like, rather than tensor notation.

Jun

Quote:
Original post by My_Mind_Is_Going
A is a matrix, right? So Ax is another column vector, and x' A x is a scalar (1x1 matrix).



Yes, and that's why df/dx is a vector.


If f = x'Ax and x'Ax is a scalar, then how is df/dx a vector? What does it even mean to differentiate with respect to that vector, unless you mean df/dx_k, where x_k is the kth component of x? The partial derivative df/dx_k would be the kth component of grad f. This is where it becomes easier to adopt tensor notation, though.

Or you could call x r instead, so that r = (x, y, z).

Then you can differentiate f with respect to x, y, or z, but what you are currently describing sounds like you want to differentiate with respect to my r.

Quote:
Original post by junliu
OK, let me make the problem more specific. Suppose that A and B are matrices (A is symmetric), and

f = x' * B' * ( (I@x)' * A * (I@x) )^-1 * B * x

where @ denotes the Kronecker product, I is an identity matrix, and ^-1 denotes the matrix inverse. Is there any matrix representation of df/dx? Or do I have to use tensor notation?


I believe that tensor notation is necessary in the short term, because you will have doubly-indexed quantities that are differentiated with respect to components of x, leading to triply-indexed tensors. The final expression might be representable using vectors, matrices, and Kronecker products...

I assume you intend that A and B are matrices of constants (with respect to x). Generally, let x be an n-by-1 vector, let B be an m-by-n matrix, and let I be an r-by-r matrix. The Kronecker product I@x is (r*n)-by-r, so A must be an (r*n)-by-(r*n) matrix.

The matrix C = (I@x)' * A * (I@x) is r-by-r and has entries that are quadratic polynomials in the components of x. To see this, write A as an r-by-r block matrix A = [A_{ij}], where each block A_{ij} is n-by-n; then C is the block matrix C = [C_{ij}] with C_{ij} = x'*A_{ij}*x. The adjugate of C is denoted adj(C), the determinant of C is denoted det(C), and the inverse of C is C^{-1} = adj(C)/det(C). The adjugate entries and the determinant are high-degree polynomials, so directly computing dC^{-1}/dx is complicated.
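(Editorial aside, not from the original reply: the claimed block structure C_{ij} = x'*A_{ij}*x can be checked numerically. The dimensions n and r below are arbitrary illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 3, 2
x = rng.standard_normal((n, 1))
A = rng.standard_normal((r*n, r*n))
A = (A + A.T) / 2                 # symmetric, as in the problem statement

Ix = np.kron(np.eye(r), x)        # I@x is (r*n)-by-r
C = Ix.T @ A @ Ix                 # r-by-r

# Each entry C[i, j] should equal x' * A_{ij} * x, where A_{ij} is
# the (i, j) n-by-n block of A.
C_blocks = np.array([[(x.T @ A[i*n:(i+1)*n, j*n:(j+1)*n] @ x).item()
                      for j in range(r)] for i in range(r)])
print(np.allclose(C, C_blocks))   # True
```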

Instead, let C = [c_{ij}], where the c_{ij} are scalar entries that depend on x. Let C^{-1} = [u_{ij}] be the inverse matrix. Let d_{ij} be the Kronecker delta. Then d_{ik} = c_{ij}*u_{jk} and d_{mj} = u_{mi}*c_{ij}. Differentiate the first of these expressions with respect to x_s to obtain:

0 = c_{ij}*u_{jk,s} + c_{ij,s}*u_{jk}

Contract with u_{mi} to obtain

0 = u_{mi}*c_{ij}*u_{jk,s} + u_{mi}*c_{ij,s}*u_{jk} = d_{mj}*u_{jk,s} + u_{mi}*c_{ij,s}*u_{jk} = u_{mk,s} + u_{mi}*c_{ij,s}*u_{jk}

Therefore, the derivative of C^{-1} is

u_{mk,s} = -u_{mi}*c_{ij,s}*u_{jk}

in which case you need only differentiate the entries of C with respect to x.

And then you still have to use the product rule applied to f = (B*x)'*C^{-1}*(B*x).
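(Editorial aside, not from the original reply: the identity u_{mk,s} = -u_{mi}*c_{ij,s}*u_{jk} is the index form of the matrix rule d(C^{-1}) = -C^{-1} (dC) C^{-1}. A quick check against finite differences, using a one-parameter family C(t) chosen purely for illustration.)

```python
import numpy as np

rng = np.random.default_rng(3)
r = 3
C0 = rng.standard_normal((r, r))
C1 = rng.standard_normal((r, r))
# C(t) = 5I + C0 + t*C1: smooth, invertible near t = 0, with dC/dt = C1
C = lambda t: 5 * np.eye(r) + C0 + t * C1

t, h = 0.1, 1e-6
U = np.linalg.inv(C(t))
dU_formula = -U @ C1 @ U          # u_{mk,s} = -u_{mi}*c_{ij,s}*u_{jk}
dU_fd = (np.linalg.inv(C(t + h)) - np.linalg.inv(C(t - h))) / (2 * h)
print(np.allclose(dU_formula, dU_fd, atol=1e-5))   # True
```

The same rule, with c_{ij,s} taken from the quadratic entries of C(x), gives the derivative needed for the product rule on f = (B*x)'*C^{-1}*(B*x).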

Quote:
Original post by Dave Eberly
[full derivation quoted in the post above]
That solved my problem. Many thanks, Dave!!!

Jun
