Using the product rule for a partial derivative of a vector/matrix function...


I'm working on a physics animation, and I need to compute the Jacobian in order to solve a partial differential equation.

Suppose we have a function consisting of a series of matrices multiplied by a vector:
f(X) = A * B * b
--where X is a vector containing elements that appear within A, B, and/or b,
--A and B are matrices, and b is a vector

Each matrix and the vector are expressed in terms of more variables, ie...
X = (x1, x2, x3)

A =
[ x1 + y1   y4        y7      ]
[ y2        x2 + y5   y8      ]
[ y3        y6        x3 + y9 ]

B =
[ y1        x2 + y4   x3 + y7 ]
[ x1 + y2   y5        y8      ]
[ y3        y6        y9      ]

b = [y1 y2 y3]' (' means transposed)

Now we want to find the Jacobian of f - ie the partial derivative of f wrt X.

One way to do this is to multiply the two matrices and then multiply that by the vector, creating one 3x1 vector in which each element is an algebraic expression resulting from matrix multiplication. The partial derivative could then be computed per element to form a 3x3 Jacobian. This would be feasible in the above example, but the one I'm working on is a lot more complicated (and so I would also have to look for patterns in order to simplify it afterwards).

I wanted to try to use the chain rule and/or the product rule for partial derivatives if possible. However, with the product rule you end up with A' * B * b + A * B' * b + A * B * b', where each derivative is taken wrt the vector X. I understand that the derivative of a matrix wrt a vector is actually a 3rd order tensor, which is not easy to deal with. Even if that is not the right way to look at it, those terms still have to evaluate to matrices in order for the matrix addition to be valid. If I use the chain rule instead, I still end up with the derivative of a matrix wrt a vector.
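To make the issue concrete: for each fixed variable x_j, the derivatives dA/dx_j, dB/dx_j, and db/dx_j are an ordinary matrix, matrix, and vector, so

[column j of the Jacobian] = (dA/dx_j) * B * b + A * (dB/dx_j) * b + A * B * (db/dx_j)

is a plain 3x1 vector. The third-order object only appears when all of these columns are stacked over j at once.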

Is there an easier way to break down a matrix calculus problem like this? I've scoured the web and cannot seem to find a good direction.

Are you trying to program something that can evaluate the Jacobian, or are you doing this by hand?

I'm doing it by hand. It will then be programmed into C++.

To be clear - the Jacobian I find will be programmed into C++.

I'm not creating a system to find Jacobians arbitrarily.

You can make a type that knows the value of a function and its derivatives with respect to some variables, overload operators and functions so the correct rules are applied, and then just evaluating an expression using this type will give you the Jacobian. Would something like that work for you?

Here's some code that does it for a single variable:

#include <iostream>
#include <cmath>

// A jet carries a function value together with its derivative; the
// overloaded operators below propagate derivatives automatically.
struct Jet {
  double value;
  double derivative;
 
  Jet(double value, double derivative = 0.0)
    : value(value),
      derivative(derivative) {
  }
};

Jet operator+(Jet j1, Jet j2) {
  return Jet(j1.value + j2.value, j1.derivative + j2.derivative);
}

Jet operator-(Jet j1, Jet j2) {
  return Jet(j1.value - j2.value, j1.derivative - j2.derivative);
}

// Product rule: (fg)' = f'g + fg'
Jet operator*(Jet j1, Jet j2) {
  return Jet(j1.value * j2.value, j1.value * j2.derivative + j2.value * j1.derivative);
}

// Quotient rule: (f/g)' = (f'g - fg') / g^2
Jet operator/(Jet j1, Jet j2) {
  return Jet(j1.value / j2.value, (j1.derivative * j2.value - j1.value * j2.derivative) / (j2.value * j2.value));
}

// More operators here...


// Chain rule: f(u)' = f'(u) * u', here and below with elementary f
Jet exp(Jet j) {
  return Jet(std::exp(j.value), std::exp(j.value) * j.derivative);
}

Jet log(Jet j) {
  return Jet(std::log(j.value), j.derivative / j.value);
}

Jet pow(Jet j1, Jet j2) {
  return exp(j2 * log(j1));
}

Jet sin(Jet j) {
  return Jet(std::sin(j.value), std::cos(j.value) * j.derivative);
}

Jet cos(Jet j) {
  return Jet(std::cos(j.value), -std::sin(j.value) * j.derivative);
}

// More functions here...

template <typename R>
R my_function(R x) {
  return x * sin(x) - pow(x, cos(x));
}

int main() {
  Jet x(2.0, 1.0);  // evaluate at x = 2, seeding dx/dx = 1
 
  Jet fx = my_function(x);
 
  std::cout << fx.value << ' ' << fx.derivative << '\n';
}
 

This probably would work if I just needed the derivative of something to be applied during a simple physics implementation, but I need to arbitrarily find the Jacobian matrix so that it can be fed into a conjugate gradient implementation that solves a system of equations. What I'm doing is a bit theoretical, but it would still need to be implementable.

Writing a program like this to find it is an interesting idea - I wonder if I could construct some sort of list or other data structure that keeps track of my terms as they're found?

I've worked a lot with matrix calculus and tensors (phd student, use it in my research all the time). In this post I'm going to spend a lot of time explaining the basic ideas/techniques in the way I think about them, and then at the end apply it to your problem. Eventually we get a formula for the derivative object you want, K.

---- Background ----

If you want to generally understand matrix calculus, it's much easier if you get comfortable with tensors. Tensors are not so bad. The most important thing is understanding how to unfold them into matrices or vectors, do something to them, and then fold them back up into tensors. If you become comfortable with that, then everything becomes straightforward. The first few sections of this paper are really good, with pictures:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.2059&rep=rep1&type=pdf

If you have access to matlab and you want to get an intuitive feel for tensors, then I'd highly suggest making up tensors/higher dimensional arrays and playing around with the commands "reshape" and "permute" to get a feel for how it works.

---- Meaning and application of the third order derivative tensor ----

Ok now back to the original question. Let's think about an N-by-N matrix, A, parameterized by k variables, A(x_1,...,x_k). If you take the derivative, you get a k-by-N-by-N box of numbers A_prime, where

- the first slice A_prime(1,all,all) is the derivative of A with respect to x_1,

- the second slice A_prime(2,all,all) is the derivative of A with respect to x_2,

- and so forth.

The first order of business is to understand how this third-order tensor is applied to a vector to give the directional derivative in any given direction, v. The meaning of this directional derivative is just what you expect - it measures how much the matrix changes when you evaluate it at the point X and at a nearby point X + epsilon*v. Ie,

[derivative of A in the direction v] = lim_{s->0} (A(x_1 + s*v_1, x_2 + s*v_2, ..., x_k + s*v_k) - A(x_1, x_2, ..., x_k))/s

To apply the third-order derivative tensor to a vector, you unfold A_prime into a k-by-N^2 matrix so as to expose the first mode on the left, then left-multiply by the vector you are applying, and then fold the result back up into a tensor of one lower order (an N-by-N matrix), like so:

[derivative of A in the direction v] = reshape(v^T * reshape(A_prime, k, N*N), N, N)^T
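If it helps to see this without the MATLAB reshapes, here is a minimal C++ sketch of the same contraction. The storage convention and names are mine (A_prime[i][m][n] = dA(m,n)/dx_i); the unfold/multiply/fold recipe above is just this triple loop written as a single matrix product:

#include <cstddef>
#include <vector>

using Matrix  = std::vector<std::vector<double>>;
using Tensor3 = std::vector<Matrix>;  // Tensor3[i] holds dA/dx_i

// Contract the first mode of the k-by-N-by-N tensor A_prime with v,
// giving the N-by-N directional derivative of A in the direction v.
Matrix apply_direction(const Tensor3& A_prime, const std::vector<double>& v) {
  const std::size_t k = A_prime.size();
  const std::size_t N = A_prime[0].size();
  Matrix dA(N, std::vector<double>(N, 0.0));
  for (std::size_t i = 0; i < k; ++i)      // sum over the parameters x_i
    for (std::size_t m = 0; m < N; ++m)
      for (std::size_t n = 0; n < N; ++n)
        dA[m][n] += v[i] * A_prime[i][m][n];
  return dA;
}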

---- Including right matrix multiplication ----

The next order of business is to understand what to do when there is another matrix B multiplying on the right,

[derivative of A in the direction v]*B

Obviously if we had 'v', we could apply the reshaping procedure to find the derivative, and then multiply by B. However, we want to think of v as being a free variable, and find the object that describes the whole process at once. In other words, what is the 3rd order tensor, K, such that, when you do the standard reshaping and multiplying process, you get the desired result?

[derivative of A in the direction v]*B = reshape(v^T * reshape(K, k, N*N), N, N)^T //what is K though?

To get this K, we unfold A_prime to expose the rightmost mode, then right-multiply by B, then fold it back up:

K = reshape(reshape(A_prime, k*N, N)*B, k, N, N)
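In loop form, with the same storage convention as the sketch above, this unfold/right-multiply/fold is just a slice-by-slice matrix product, K[i] = A_prime[i] * B:

// Right-multiply every derivative slice by B (types as in the sketch above).
Tensor3 right_multiply(const Tensor3& A_prime, const Matrix& B) {
  const std::size_t k = A_prime.size();
  const std::size_t N = A_prime[0].size();
  Tensor3 K(k, Matrix(N, std::vector<double>(N, 0.0)));
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t m = 0; m < N; ++m)
      for (std::size_t n = 0; n < N; ++n)
        for (std::size_t p = 0; p < N; ++p)
          K[i][m][n] += A_prime[i][m][p] * B[p][n];
  return K;
}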

---- Including left matrix multiplication ----

Now we know how to deal with matrices multiplying on the right, but what about if there is a matrix multiplying on the left instead? Ie:

B*[derivative of A in the direction v]

To deal with this, you have to permute the dimensions of the tensor to gain access to the left matrix mode, then unfold it, then left-multiply by B. After that, you fold it back up and undo the permutation. The necessary permutation to move the middle mode to the left is [2,1,3], since that takes (k,N,N) -> (N,k,N). The inverse permutation is also [2,1,3], taking (N,k,N) -> (k,N,N). In detail, the overall tensor for the derivative and left-multiplying process is,

K = permute(reshape(B*reshape(permute(A_prime, [2,1,3]), N, k*N), N, k, N), [2,1,3])
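Again in loop form, the permute/multiply/unpermute sequence amounts to left-multiplying every slice, K[i] = B * A_prime[i]:

// Left-multiply every derivative slice by B (types as in the earlier sketches).
Tensor3 left_multiply(const Matrix& B, const Tensor3& A_prime) {
  const std::size_t k = A_prime.size();
  const std::size_t N = A_prime[0].size();
  Tensor3 K(k, Matrix(N, std::vector<double>(N, 0.0)));
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t m = 0; m < N; ++m)
      for (std::size_t n = 0; n < N; ++n)
        for (std::size_t p = 0; p < N; ++p)
          K[i][m][n] += B[m][p] * A_prime[i][p][n];
  return K;
}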

---- Including matrix sandwich ----

Ok, finally what about the combination of both - when the derivative is sandwiched in between matrices like so:

B*[derivative of A in the direction v]*C

Here you just do both of the things we did before - unfold to expose the right mode, right multiply by C, fold back up, permute and unfold to expose the middle mode on the left, left multiply by B, fold back up, and unpermute.

K = permute(reshape(B*reshape(permute(reshape(reshape(A_prime, k*N, N)*C, k, N, N), [2,1,3]), N, k*N), N, k, N), [2,1,3])
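In loop terms this is just the two previous sketches composed, which may be easier to sanity-check than the nested reshapes:

// Sandwich: K[i] = B * A_prime[i] * C, reusing the helpers sketched above.
Tensor3 sandwich(const Matrix& B, const Tensor3& A_prime, const Matrix& C) {
  return left_multiply(B, right_multiply(A_prime, C));
}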

---- Application of the techniques to your problem ----

This covers all the cases necessary to deal with this sort of matrix equation. In our case, we seek to find a 3rd order k-by-N-by-1 tensor, K, such that

reshape(v^T * reshape(K, k, N*1), 1, N)^T = ...

[derivative of A in direction v]*B*b + ...
A*[derivative of B in direction v]*b + ...
A*B*[derivative of b in direction v]

Using the above techniques, the tensor is K = K1 + K2 + K3, where

K1 = reshape(reshape(A_prime, k*N, N)*B*b, k, N, 1)

K2 = permute(reshape(A*reshape(permute(reshape(reshape(B_prime, k*N, N)*b, k, N, 1), [2,1,3]), N, k*1), N, k, 1), [2,1,3])

K3 = permute(reshape(A*B*reshape(permute(b_prime, [2,1,3]), N, k*1), N, k, 1), [2,1,3])

Since b is a vector not a matrix, some of the stuff above is extraneous. Basically we can remove singleton dimensions like (k,N,1) -> (k,N), and replace permute [2,1] with a matrix transpose. These simplifications lead to the following form.

K1 = reshape(reshape(A_prime, k*N, N)*B*b, k, N)

K2 = (A*reshape(reshape(B_prime, k*N, N)*b, k, N)^T)^T

K3 = (A*B*b_prime^T)^T

and application via,

[derivative of f in direction v] = (v^T * K)^T
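For implementation purposes, here is a sketch that assembles the simplified K = K1 + K2 + K3 with plain loops and returns J = K^T directly: since (v^T * K)^T = K^T * v, K^T is exactly the N-by-k Jacobian a solver consumes. Storage conventions and names are again mine, with b_prime[i][p] = d(b_p)/dx_i; the inner products are recomputed for clarity rather than speed:

// Assemble K = K1 + K2 + K3 (the simplified forms above) and return
// J = K^T, so that J[m][i] = d f_m / d x_i.
Matrix jacobian(const Matrix& A, const Tensor3& A_prime,
                const Matrix& B, const Tensor3& B_prime,
                const std::vector<double>& b, const Matrix& b_prime) {
  const std::size_t k = A_prime.size();
  const std::size_t N = A.size();
  Matrix J(N, std::vector<double>(k, 0.0));
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t m = 0; m < N; ++m) {
      double s = 0.0;
      for (std::size_t p = 0; p < N; ++p) {
        double Bb = 0.0, Bpb = 0.0, ABmp = 0.0;
        for (std::size_t q = 0; q < N; ++q) {
          Bb   += B[p][q] * b[q];           // (B*b)(p)
          Bpb  += B_prime[i][p][q] * b[q];  // ((dB/dx_i)*b)(p)
          ABmp += A[m][q] * B[q][p];        // (A*B)(m,p)
        }
        s += A_prime[i][m][p] * Bb    // K1: (dA/dx_i)*B*b
           + A[m][p] * Bpb            // K2: A*(dB/dx_i)*b
           + ABmp * b_prime[i][p];    // K3: A*B*(db/dx_i)
      }
      J[m][i] = s;  // J = K^T
    }
  return J;
}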

---- Conclusion ----

Hope this helps. It's a lot to absorb, but it eventually makes sense. It took me a long time to understand this stuff, but I think the learning process could be greatly expedited if you play around with reshaping and permuting multidimensional arrays. Other than the paper linked at the start, I haven't found many good resources to learn from.

The extension of the Jet technique I showed you earlier to computing a full Jacobian matrix is fairly straightforward. If you can't see it, I can try to write another piece of sample code.
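For the record, here is a minimal sketch of one way that extension could look - my own variant, not necessarily what alvaro had in mind. Each jet carries one partial derivative per input variable: seed variable i with the unit gradient e_i, and every evaluated component of f then carries its own row of the Jacobian. Only + and * are shown; the other operators and functions extend the same way:

#include <array>
#include <cstddef>
#include <iostream>

template <std::size_t K>
struct JetN {
  double value;
  std::array<double, K> grad;  // d(value)/d(x_1..x_K)
};

template <std::size_t K>
JetN<K> operator+(const JetN<K>& a, const JetN<K>& b) {
  JetN<K> r{a.value + b.value, {}};
  for (std::size_t i = 0; i < K; ++i) r.grad[i] = a.grad[i] + b.grad[i];
  return r;
}

template <std::size_t K>
JetN<K> operator*(const JetN<K>& a, const JetN<K>& b) {
  JetN<K> r{a.value * b.value, {}};
  for (std::size_t i = 0; i < K; ++i)  // product rule, per variable
    r.grad[i] = a.value * b.grad[i] + b.value * a.grad[i];
  return r;
}

int main() {
  // f(x1, x2) = x1*x2 + x2; seed each variable with a unit gradient.
  JetN<2> x1{3.0, {1.0, 0.0}};
  JetN<2> x2{4.0, {0.0, 1.0}};
  JetN<2> f = x1 * x2 + x2;
  std::cout << f.value << ' ' << f.grad[0] << ' ' << f.grad[1] << '\n';
  // prints 16 4 4: df/dx1 = x2 = 4, df/dx2 = x1 + 1 = 4
}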

