I've worked a lot with matrix calculus and tensors (phd student, use it in my research all the time). In this post I'm going spend a lot of time explaining the basic ideas/techniques in the way I think about them, and then at the end apply it to your problem. Eventually we get a formula for the derivative object you want, K.
---- Background ----
If you want to generally understand matrix calculus, it's much easier if you get comfortable with tensors. Tensors are not so bad. The most important thing is understanding how to unfold them into matrices or vectors, do something to them, and then fold them back up into tensors. If you become comfortable with that, then everything becomes straightforward. The first few sections of this paper are really good, with pictures:
If you have access to matlab and you want to get an intuitive feel for tensors, then I'd highly suggest making up tensors/higher dimensional arrays and playing around with the commands "reshape" and "permute" to get a feel for how it works.
---- Meaning and application of the third order derivative tensor ----
Ok now back to the original question. Let's think about a N-by-N matrix, A, parameterized by k variables, A(x_1,...,x_k). If you take the derivative, you get a k-by-N-by-N box of numbers A_prime, where
- the first slice A_prime(1,all,all) is the derivative of A with respect to x_1,
- the second slice A_prime(2,all,all) is the derivative of A with respect to x_2,
- and so forth.
The first order of business is to understand how this third order tensor is applied to a vector to give the directional derivative in any given direction, v. The meaning of this directional derivative we want just what you expect - if you evaluate the matrix at the point X and at a nearby point X + epsilon*v, then how much does the matrix change. Ie,
[derivative of A in the direction v] = lim_s->0 (A(x1 + s*v1, x_2 + s*v2, ..., xk + s*vk) - A(x1, x_2, ..., xk))/s
To apply the third derivative tensor to a vector, you unfold A_prime into a k-by-N^2 matrix so as to expose the first mode to the left, and then left-multiply by the vector you are applying, and then fold the result back up into a tensor of one lower dimension (N-by-N matrix), like so:
[derivative of A in the direction v] = reshape(v^T * reshape(A_prime, k, N*N), N, N)^T
---- Including right matrix multiplication ----
The next order of business is to understand what to do when there is another matrix B multiplying on the right,
[derivative of A in the direction v]*B
Obviously if we had 'v', we could apply the reshaping procedure to find the derivative, and then multiply by B. However, we want to think of v as being a free variable, and find the object that describes the whole process at once. In other words, what is the 3rd order tensor, K, such that, when you do the standard reshaping and multiplying process, you get the desired result.
[derivative of A in the direction v]*B = reshape(v^T * reshape(K, k, N*N), N, N)^T //what is K though?
To get this K, we unfold A_prime to expose the rightmost mode, then rightmultiply by B, then fold it back up:
K = reshape(reshape(A_prime, k*N, N)*B, k, N, N)
---- Including left matrix multiplication ----
Now we know how to deal with matrices multiplying on the right, but what about if there is a matrix multiplying on the left instead? Ie:
B*[derivative of A in the direction v]
To deal with this, you have to permute the dimensions of the tensor to gain access to the left matrix mode, then unfold it, then left multiply by B. After that, you fold it back up and undo the permutation. The necessary permutation to move the middle mode to the left is [2,1,3], since that takes (k,N,N) -> (N,k,N). The inverse permutations is also [2,1,3] to do (N,k,N) -> (k,N,N). In detail, the overall tensor for the derivative and left multiplying process is,
K = permute(reshape(A*reshape(permute(A_prime, [2,1,3]), N, k*N), N, k, N), [2,1,3])
---- Including matrix sandwich ----
Ok, finally what about the combination of both - when the derivative is sandwiched inbetween matrices like so:
B*[derivative of A in the direction v]*C
Here you just do both of the things we did before - unfold to expose the right mode, right multiply by C, fold back up, permute and unfold to expose the middle mode on the left, left multiply by B, fold back up, and unpermute.
K = permute(reshape(B*reshape(permute(reshape(reshape(A_prime, k*N, N)*C, k, N, N), [2,1,3]), N, k*N), N, k, N), [2,1,3])
---- Application of the techniques to your problem ----
This covers all the cases necessary to deal with this sort of matrix equation. In our case, we seek to find a 3rd order k-by-N-by-1 tensor, K, such that
reshape(v^T * reshape(K, k, N*N), 1, N)^T = ...
[derivative of A in direction v]*B*b + ...
A*[derivative of B in direction v]*b + ...
A*B*[derivative of b in direction v]
Using the above techniques, the tensor is K = K1 + K2 + K3, where
K1 = reshape(reshape(A_prime, k*N, N)*B*b, k, N, 1)
K2 = permute(reshape(A*reshape(permute(reshape(reshape(B_prime, k*N, N)*b, k, N, 1), [2,1,3]), N, k*1), N, k, 1), [2,1,3])
K3 = permute(reshape(A*B*reshape(permute(b_prime, [2,1,3]), N, k*1), N, k, 1), [2,1,3])
Since b is a vector not a matrix, some of the stuff above is extraneous. Basically we can remove singleton dimensions like (k,N,1) -> (k,N), and replace permute [2,1] with a matrix transpose. These simplifications lead to the following form.
K1 = reshape(reshape(A_prime, k*N, N)*B*b, k, N)
K2 = (A*reshape(reshape(B_prime, k*N, N)*b, k, N)^T))^T
K3 = A*B*b_prime^T
and application via,
[derivative of f in direction v] = (v^T * K)^T
---- Conclusion ----
Hope this helps. It's a lot to absorb, but it eventualy makes sense. It took me a long time to understand this stuff, but I think the learning process could be greatly expedited if you play around with reshaping and permuting multidimensional arrays. Other than the paper linked at the start I haven't found many good resources to learn from.
Edited by Nick Alger, 30 November 2013 - 06:14 AM.