# Introduction

The forums are replete with people trying to learn how to efficiently find inverses of matrices or solve the classic \(Ax=b\) problem. Here's a decent method that is fairly easy to learn and implement. Hopefully it might also serve as a stepping stone to learning some of the more advanced matrix factorization methods, like Cholesky, QR, or SVD.

## Overview

In 1948, Alan Turing came up with LU decomposition, a way to factor a matrix and solve \(Ax=b\) with numerical stability. Although there are many different schemes to factor matrices, LU decomposition is one of the more commonly-used algorithms. Interestingly enough, Gauss elimination can be implemented as LU decomposition. The computational effort expended is about the same as well.

So why would anyone want to use this method? First of all, the time-consuming elimination step can be formulated so that only the entries of \(A\) are required. As well, if there are several matrices of \(b\) that need to be computed for \(Ax=b\), then we only need to do the decomposition once. This last benefit helps us compute the inverse of \(A\) very quickly.

## How It Works

Let's start with an \(Ax=b\) problem, where you have an \(n\)x\(n\) matrix \(A\) and \(n\)x\(1\) column vector \(b\) and you're trying to solve for a \(n\)x\(1\) column vector \(x\). The idea is that we break up \(A\) into two matrices: \(L\), which is a lower triangular matrix, and \(U\), which is an upper triangular matrix. It would look something like this:

\[

\left [

\begin{matrix}

a_{11} & a_{12} & a_{13} \\

a_{21} & a_{22} & a_{23} \\

a_{31} & a_{32} & a_{33} \\

\end{matrix} \right ] = \left [

\begin{matrix}

l_{11} & 0 & 0 \\

l_{21} & l_{22} & 0 \\

l_{31} & l_{32} & l_{33} \\

\end{matrix} \right ] \left [

\begin{matrix}

u_{11} & u_{12} & u_{13} \\

0 & u_{22} & u_{23} \\

0 & 0 & u_{33} \\

\end{matrix} \right ]

\]

This seems like a lot more work, but allows us to use some cool math tricks. Our original equation, substituted with our decomposition is now \((LU)x=b\):

\[

\left (

\left [ \begin{matrix}

l_{11} & 0 & 0 \\

l_{21} & l_{22} & 0 \\

l_{31} & l_{32} & l_{33} \\

\end{matrix} \right ]

\left [ \begin{matrix}

u_{11} & u_{12} & u_{13} \\

0 & u_{22} & u_{23} \\

0 & 0 & u_{33} \\

\end{matrix} \right ]

\right )

\left [ \begin{matrix}

x_1 \\

x_2 \\

x_3 \\

\end{matrix} \right ] =

\left [ \begin{matrix}

b_1 \\

b_2 \\

b_3 \\

\end{matrix} \right ]

\]

If we shift the parentheses, we get the following:

\[

\left [ \begin{matrix}

l_{11} & 0 & 0 \\

l_{21} & l_{22} & 0 \\

l_{31} & l_{32} & l_{33} \\

\end{matrix} \right ]

\left (

\left [ \begin{matrix}

u_{11} & u_{12} & u_{13} \\

0 & u_{22} & u_{23} \\

0 & 0 & u_{33} \\

\end{matrix} \right ]

\left [ \begin{matrix}

x_1 \\

x_2 \\

x_3 \\

\end{matrix} \right ]

\right )

=

\left [ \begin{matrix}

b_1 \\

b_2 \\

b_3 \\

\end{matrix} \right ]

\]

Looking just inside the parentheses, we can see another \(Ax=b\) type problem. If we say that \(Ux=d\), where \(d\) is a different column vector from \(b\), we have 2 separate \(Ax=b\) type problems:

\[

\left [ \begin{matrix}

l_{11} & 0 & 0 \\

l_{21} & l_{22} & 0 \\

l_{31} & l_{32} & l_{33} \\

\end{matrix} \right ]

\left [ \begin{matrix}

d_1 \\

d_2 \\

d_3 \\

\end{matrix} \right ]

=

\left [ \begin{matrix}

b_1 \\

b_2 \\

b_3 \\

\end{matrix} \right ]

\\

\left [ \begin{matrix}

u_{11} & u_{12} & u_{13} \\

0 & u_{22} & u_{23} \\

0 & 0 & u_{33} \\

\end{matrix} \right ]

\left [ \begin{matrix}

x_1 \\

x_2 \\

x_3 \\

\end{matrix} \right ]

=

\left [ \begin{matrix}

d_1 \\

d_2 \\

d_3 \\

\end{matrix} \right ]

\]

It looks like we just created more work for ourselves, but we actually made it easier. If we look at the problems, the matrices on the left are in a form like row echelon form. Using forward and back substitutions, we can solve easily for all the elements of the \(x\) and \(d\) matrices. We first solve \(Ld=b\) for \(d\), then we substitute into \(Ux=d\) to solve for \(x\). The real trick to this method is the decomposition of \(A\).

## The Catch

There are a couple catches to this method:

- The matrix \(A\) must be square to use LU factorization. Other factorization schemes will be necessary if \(A\) is rectangular.
- We have to be sure that \(A\) is a nonsingular (i.e. invertible) matrix. If it can't be inverted, then the decomposition will produce an \(L\) or \(U\) that is singular and the method will fail because there is no unique solution.
- To avoid division by zero or by really small numbers, we have to implement a pivoting scheme just like with Gaussian elimination. This makes the problem take the form \(PA=LU\), where P is a permutation matrix that allows us to swap the rows of A. P is usually the identity matrix with rows swapped such that \(PA\) produces the \(A\) matrix with the same rows swapped as P. Then the \(Ax=b\) problem takes the form \(LUx=Pb\) since \(PA=LU\).
- Some of the entries in the \(L\) and \(U\) matrices must be known before the decomposition, or else the system has too many unknowns and not enough equations to solve for all the entries of both matrices. For what's formally known as Doolittle decomposition, the diagonal entries of the \(L\) matrix are all 1. If we use Crout decomposition, the diagonals of the \(U\) matrix are all 1.

# The Algorithm

For a square matrix \(A\) with entries \(a_{ij},\,i=1,\cdots,n,\,j=1,\cdots,n\), the Crout decomposition is as follows:

First:

\[

l_{i1} = a_{i1},\,\textrm{for}\,i=1,2,\cdots,n\\

u_{1j} = \frac{a_{1j}}{l_{11}},\,\textrm{for}\,j=2,3,\cdots,n

\]

For \(j=2,3,\cdots,n-1\):

\[

l_{ij} = a_{ij}-\sum_{k=1}^{j-1}l_{ik}u_{kj},\,\textrm{for}\,i=j,j+1,\cdots,n \\

u_{jk} = \frac{a_{jk}-\sum_{i=1}^{j-1}l_{ji}u_{ik}}{l_{jj}},\,\textrm{for}\,k=j+1,j+2,\cdots,n

\]

Finally:

\[

l_{nn}=a_{nn}-\sum_{k=1}^{n-1}l_{nk}u_{kn}

\]

That's it! If you notice, \(A\) is being traversed in a specific way to build \(L\) and \(U\). To start building \(L\), we start with \(a_{11}\) and then traverse down the rows in the same column until we hit \(a_{n1}\). To start building \(U\), we start at \(a_{12}\) and traverse along the same row until we hit \(a_{1n}\). We then build \(L\) further by starting at \(a_{22}\) and traverse along the column until we hit \(a_{n2}\), then we build \(U\) further by starting at \(a_{23}\) and traverse along the row until \(a_{2n}\), and so on.

Since the 1's on the diagonal of \(U\) are given as well as the 0's in both the \(L\) and \(U\) matrices, there's no need to store anything else. In fact, there's a storage-saving scheme where all the calculated entries of both matrices can be stored in a single matrix as large as \(A\).

From here, \(d\) can be solved very easily with forward-substitution:

\[

d_1 = \frac{b_1}{l_{11}} \\

d_i = \frac{b_i - \sum_{j=1}^{i-1}l_{ij}d_j}{l_{ii}},\,\textrm{for}\,i=2,3,\cdots,n

\]

As well, \(x\) can be solved very easily with back-substitution:

\[

x_n = d_n \\

x_i = d_i - \sum_{j=i+1}^n u_{ij}x_j,\,\textrm{for}\,i=n-1,n-2,\cdots,1

\]

## A Numerical Example

Let's try to solve the following problem using LU decomposition:

\[

\left [ \begin{matrix}

3 & -0.1 & -0.2 \\

0.1 & 7 & -0.3 \\

0.3 & -0.2 & 10 \\

\end{matrix} \right ]

\left [ \begin{matrix}

x_1 \\ x_2 \\ x_3 \\

\end{matrix} \right ]

=

\left [ \begin{matrix}

7.85 \\ -19.3 \\ 71.4 \\

\end{matrix} \right ]

\]

First, we copy the first column to the \(L\) matrix and the scaled first row except the first element to the \(U\) matrix:

\[

L =

\left [ \begin{matrix}

3 & 0 & 0 \\

0.1 & - & 0 \\

0.3 & - & - \\

\end{matrix} \right ],\,\,\,

U =

\left [ \begin{matrix}

1 & -0.0333 & -0.0667 \\

0 & 1 & - \\

0 & 0 & 1 \\

\end{matrix} \right ]

\]

Then we compute the following entries:

\[

l_{22} = a_{22} - l_{21}u_{12} = 7-(0.1)(-0.0333) = 7.00333 \\

l_{32} = a_{32} - l_{31}u_{12} = -0.2-(0.3)(-0.0333) = -0.19 \\

u_{23} = \frac{a_{23}-l_{21}u_{13}}{l_{22}} = \frac{-0.3-(0.1)(-0.0667)}{7} = -0.0419 \\

l_{33} = a_{33}-l_{31}u_{13}-l_{32}u_{23} = 10-(0.3)(-0.0667)-(-0.19)(-0.0419) = 10.012 \\

\]

Inserting them into our matrices:

\[

L =

\left [ \begin{matrix}

3 & 0 & 0 \\

0.1 & 7.00333 & 0 \\

0.3 & -0.19 & 10.012 \\

\end{matrix} \right ],\,\,\,

U =

\left [ \begin{matrix}

1 & -0.0333 & -0.0667 \\

0 & 1 & -0.0419 \\

0 & 0 & 1 \\

\end{matrix} \right ]

\]

This is the LU factorization of the matrix. You can check if \(LU=A\). Now, we use back- and forward-substitution to solve the problem:

\[

d_1 = \frac{b_1}{l_{11}} = 7.85/3 = 2.6167 \\

d_2 = \frac{b_2-l_{21}d_1}{l_{22}} = (-19.3-(0.1)(2.6167))/7.00333 = -2.7932 \\

d_3 = \frac{b_3-l_{31}d_1-l_{32}d_2}{l_{33}} = (71.4-(0.3)(2.6167)-(-0.19)(-2.7932))/10.012 = 7 \\

\]

\[

x_3 = d_3 = 7 \\

x_2 = d_2 - u_{23}x_3 = -2.7932-(-0.0419)(7) = -2.5 \\

x_1 = d_1 - u_{12}x_2 - u_{13}x_3 = 2.6167-(-0.0333)(-2.5)-(-0.0667)(7) = 3 \\

\]

So the solution to the problem is:

\[

\left [ \begin{matrix}

x_1 \\ x_2 \\ x_3 \\

\end{matrix} \right ]

=

\left [ \begin{matrix}

3 \\ -2.5 \\ 7 \\

\end{matrix} \right ]

\]

This solution can easily be verified by plugging it back into \(Ax=b\).

# MATLAB Code

Here's some quick MATLAB code for LU decomposition:

function [L,U] = lucrout(A) [~,n] = size(A); L = zeros(n,n); U = eye(n,n); L(1,1) = A(1,1); for j=2:n L(j,1) = A(j,1); U(1,j) = A(1,j) / L(1,1); end for j=2:n-1 for i=j:n L(i,j) = A(i,j); for k=1:j-1 L(i,j) = L(i,j) - L(i,k)*U(k,j); end end for k=j+1:n U(j,k) = A(j,k); for i=1:j-1 U(j,k) = U(j,k) - L(j,i)*U(i,k); end U(j,k) = U(j,k) / L(j,j); end end L(n,n) = A(n,n); for k=1:n-1 L(n,n) = L(n,n) - L(n,k)*U(k,n); end end

# Matrix Inverse with LU Decomposition

LU decomposition is nice for solving a series of \(Ax=b\) problems with the same \(A\) matrix and different \(b\) matrices. This is advantageous for computing the inverse of \(A\) because only one decomposition is required. The definition of the inverse of a matrix \(A^{-1}\) is a matrix such that \(AA^{-1}=I\), where \(I\) is the identity matrix. This looks like this for a general 3x3 matrix:

\[

\left [ \begin{matrix}

A_{11} & A_{12} & A_{13} \\

A_{21} & A_{22} & A_{23} \\

A_{31} & A_{32} & A_{33} \\

\end{matrix} \right ]

\left [ \begin{matrix}

a_{11}^\prime & a_{12}^\prime & a_{13}^\prime \\

a_{21}^\prime & a_{22}^\prime & a_{23}^\prime \\

a_{31}^\prime & a_{32}^\prime & a_{33}^\prime \\

\end{matrix} \right ]

=

\left [ \begin{matrix}

1 & 0 & 0 \\

0 & 1 & 0 \\

0 & 0 & 1 \\

\end{matrix} \right ]

\]

This can be set up like \(n\) \(Ax=b\) problems with different \(b\) matrices and the \(x\) matrix becomes the nth column in the inverse matrix. For example, the first problem is:

\[

\left [ \begin{matrix}

A_{11} & A_{12} & A_{13} \\

A_{21} & A_{22} & A_{23} \\

A_{31} & A_{32} & A_{33} \\

\end{matrix} \right ]

\left [ \begin{matrix}

a_{11}^\prime \\

a_{21}^\prime \\

a_{31}^\prime \\

\end{matrix} \right ]

=

\left [ \begin{matrix}

1 \\

0 \\

0 \\

\end{matrix} \right ]

\]

The second column of the inverse can be computed by changing \(b\) to \([0,1,0]^T\), the third column with \([0,0,1]^T\), and so on. This method is quick because only back- and forward-substitution is required to solve for the column vectors after the initial LU decomposition.

**WARNING:**As you can see, to get the inverse of this matrix, we end up having to solve

*n*Ax=b problems for each of the columns of the inverse. If you're trying to get the inverse of the matrix just to solve an Ax=b problem, you're introducing more numerical error into your solution and slowing down your computation time. In short, make sure you really need the matrix inverse and

**never use the matrix inverse to solve a system of equations!**

# Beyond LU Decomposition

There are a lot of other matrix factorization schemes besides LU, like Cholesky or QR factorization, but the general idea of decomposing a matrix into other matrices is roughly the same. The real key to computational savings comes from knowing beforehand what kind of matrix you're factoring and choosing the appropriate algorithm. For example, in structural finite element analysis, the matrix being decomposed is always symmetric positive definite. Cholesky decomposition is way more efficient and quicker than LU for those kinds of matrices, so it's preferred.

# Conclusion

LU decomposition is a great tool for anyone working with matrices. Hopefully you can make use of this simple, yet powerful method.

# Article Update Log

**28 Apr 2014**: Added warning to matrix inverse

**16 Apr 2014**: Initial release

## About the Author(s)

*I'm an engineer that designs computer-aided design tools. I'm new to game architectures and game programming, but I'm fairly well versed in mathematics and computational geometry.*

## License

*GDOL (Gamedev.net Open License)*

I think I even read this in the Matlab documentation, that you should never explicitly compute the inverse of a matrix, but rather stick with the factors of the factorization.

Matrix systems that arise from applications (e.g. in engineering) are typically sparse and large; think of matrices of size larger than 100000x100000 with only 10 entries per row differing from zero. Of course, such matrices can be stored efficiently by only storing non-zero entries. For 8 byte doubles this requires ~7.5MB of memory.

The problem is that sparseness does not propagate to the inverse -- the inverse of a sparse matrix is usually full. Suddenly our memory requirement for storage has gone through the roof; we now need a whopping 74GB to store all entries! Not to mention the increase of computational cost for matrix * vector in case of full matrices.

Solving an equation system with > 100000 variables is simply not feasible with today's machines. That's one of the main reasons it is highly discouraged to compute the inverse of a matrix to solve a system of equations.

Sometimes you need an inverse. In that case you can compute the inverse just fine using LU decomposition. For solving equations there is an abundant amount of algorithms that only require matrix * vector ( O(n) for sparse matrices ) and vector * vector ( O(n) ) multiplication.

The main statement (that should be stressed much more IMHO) is that you

should never compute the inverse of a matrix to solve a system of equations!What open-source libraries do you recommend for using Cholesky decomposition?

I'm looking for a library that has a BSD/MIT type license, so my app can use it commercially.

I looked at a library called CHOLMOD, but this is GPL (Supernodal module), so I can't use it for my purposes.

LAPACK is a great linear algebra library that's written in Fortran (so you know it's fast), but with a C wrapper for easier interaction. It has routines for symmetric positive definite matrices, including Cholesky decomposition. It's got a modified BSD license, so you can use it commercially. I've used it for some FEA projects before and it's served me well.

Thanks. Figuring out how to compile these libraries for Windows seem to be the most difficult part.

There is a simple, stand-alone implementation in Bullet, which is free for commercial use.

-Josh

Have you looked at the NIST implementations? These are government created public-domain (

I believe) implementations for matrices. The JAMA libraries have implementations for Cholesky, LU, SVD, Eigenvalues, and QR Factorizations.