This method only works for special 4x4 matrices of the following form:
a00 a01 a02 0
a10 a11 a12 0
a20 a21 a22 0
a30 a31 a32 1
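Matrices of this form represent 3D affine transformations of row vectors (D3D style): the upper-left 3x3 block is the linear part and the bottom row (a30, a31, a32) is the translation. They're no good if the transformation includes a non-affine projection, because the last column is then no longer (0, 0, 0, 1). Anyway...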
// Compute determinant of a
float det = a00*(a11*a22-a12*a21) - a01*(a10*a22-a12*a20) + a02*(a10*a21-a11*a20);
// Compute b=inverse(a); this requires det != 0, otherwise a is not invertible
float recip = 1.0f/det;
b00 = recip*(a11*a22-a12*a21);
b01 = recip*(a02*a21-a01*a22);
b02 = recip*(a01*a12-a02*a11);
b10 = recip*(a12*a20-a10*a22);
b11 = recip*(a00*a22-a02*a20);
b12 = recip*(a02*a10-a00*a12);
b20 = recip*(a10*a21-a11*a20);
b21 = recip*(a01*a20-a00*a21);
b22 = recip*(a00*a11-a01*a10);
b30 = -a30*b00-a31*b10-a32*b20;
b31 = -a30*b01-a31*b11-a32*b21;
b32 = -a30*b02-a31*b12-a32*b22;
// The last column of b is the same as the last column of a
b03 = b13 = b23 = 0.0f;
b33 = 1.0f;
So I count 1 divide, 45 multiplies and 23 adds/subs (that's counting the three leading negations in the b30..b32 lines as subs). I can't count very well though.
This is based on the SMLMatrix4f::Inverse routine in the (free) Intel Small Matrix Library.
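For anyone who wants something they can compile, the snippet above can be wrapped into a self-contained C function. This is only a sketch and not the Intel routine itself: the `Mat4` type, the name `affine_inverse`, the `m[row][col]` layout and the exact-zero determinant test are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical 4x4 matrix type, indexed m[row][col] (so a30 is m[3][0]). */
typedef struct { float m[4][4]; } Mat4;

/* Hypothetical helper (not the Intel SML API): inverts a row-vector affine
   matrix a, whose last column is (0, 0, 0, 1), into b. Returns false if the
   upper-left 3x3 block is singular. a and b must not be the same object. */
bool affine_inverse(const Mat4 *a, Mat4 *b)
{
    const float (*p)[4] = a->m;
    float (*q)[4] = b->m;

    /* Determinant of the upper-left 3x3 block. */
    float det = p[0][0]*(p[1][1]*p[2][2] - p[1][2]*p[2][1])
              - p[0][1]*(p[1][0]*p[2][2] - p[1][2]*p[2][0])
              + p[0][2]*(p[1][0]*p[2][1] - p[1][1]*p[2][0]);

    /* Assumption: an exact-zero test; real code may prefer a tolerance. */
    if (det == 0.0f)
        return false;
    float recip = 1.0f / det;

    /* Upper-left 3x3 block: adjugate scaled by 1/det. */
    q[0][0] = recip*(p[1][1]*p[2][2] - p[1][2]*p[2][1]);
    q[0][1] = recip*(p[0][2]*p[2][1] - p[0][1]*p[2][2]);
    q[0][2] = recip*(p[0][1]*p[1][2] - p[0][2]*p[1][1]);
    q[1][0] = recip*(p[1][2]*p[2][0] - p[1][0]*p[2][2]);
    q[1][1] = recip*(p[0][0]*p[2][2] - p[0][2]*p[2][0]);
    q[1][2] = recip*(p[0][2]*p[1][0] - p[0][0]*p[1][2]);
    q[2][0] = recip*(p[1][0]*p[2][1] - p[1][1]*p[2][0]);
    q[2][1] = recip*(p[0][1]*p[2][0] - p[0][0]*p[2][1]);
    q[2][2] = recip*(p[0][0]*p[1][1] - p[0][1]*p[1][0]);

    /* Translation row: minus the old translation times the inverted block. */
    for (int j = 0; j < 3; ++j)
        q[3][j] = -p[3][0]*q[0][j] - p[3][1]*q[1][j] - p[3][2]*q[2][j];

    /* The last column stays (0, 0, 0, 1). */
    q[0][3] = q[1][3] = q[2][3] = 0.0f;
    q[3][3] = 1.0f;
    return true;
}
```

A quick sanity check is to invert a uniform scale by 2 with translation (1, 2, 3): the result should be a scale by 0.5 with translation (-0.5, -1, -1.5).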
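I know there are much faster ways if the matrix is orthogonal, or even if just the upper-left 3x3 submatrix is orthogonal: in that case the inverse of the 3x3 block is simply its transpose, so the determinant and the divide aren't needed at all.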
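As a rough sketch of that faster path, assuming the upper-left 3x3 block really is orthogonal (rotation or reflection, no scale or shear), the inverse is a transpose plus a transformed translation. The `Mat4` type and the name `rigid_inverse` are hypothetical, and the function does not verify orthogonality, so feeding it a scaled matrix gives a wrong answer.

```c
/* Hypothetical 4x4 matrix type, indexed m[row][col] (so a30 is m[3][0]). */
typedef struct { float m[4][4]; } Mat4;

/* Sketch: inverse of a row-vector affine matrix whose upper-left 3x3 block
   is orthogonal. The caller must guarantee that; it is not checked here.
   a and b must not be the same object. */
void rigid_inverse(const Mat4 *a, Mat4 *b)
{
    for (int i = 0; i < 3; ++i) {
        /* The inverse of an orthogonal 3x3 block is its transpose. */
        for (int j = 0; j < 3; ++j)
            b->m[i][j] = a->m[j][i];
        b->m[i][3] = 0.0f;
    }

    /* Translation row: minus the old translation times the transposed block. */
    for (int j = 0; j < 3; ++j)
        b->m[3][j] = -(a->m[3][0]*a->m[j][0]
                     + a->m[3][1]*a->m[j][1]
                     + a->m[3][2]*a->m[j][2]);
    b->m[3][3] = 1.0f;
}
```

By a quick count that is 9 multiplies, 6 adds and 3 negations, with no divide.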