Thank you guys for the replies, but I still have to clear a few things up...
I hope you don't mind; the whole community could benefit from this kind of debate.
1. If you're processing a large mesh, transforming, distorting or whatever other usage you've encountered, then yes, matrix operations can dominate in that situation. hdxpete hasn't said that he's doing so, so it might be that he simply wants to use an optimised library throughout his project right from the start, rather than use one he writes himself. Why pick a slower option if there's an existing better one?
At first glance I also thought he was doing something serious with matrix calculus, but from the rest of his posts I really doubt that's the case.
Transforming large meshes? On a CPU? Could you give an example? I really think that should be done on a GPU.
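Just to make the point concrete, here is a minimal sketch (the Vec3/Mat4 types and the function name are hypothetical, not from any particular library) of the kind of hot loop where an optimised SIMD maths library would actually pay off, as opposed to the occasional single matrix multiply:

```cpp
#include <cstddef>

// Hypothetical minimal types, for illustration only.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[16]; };   // row-major 4x4

// Naive scalar transform of a whole vertex buffer (w assumed to be 1).
// This is the kind of bulk loop where a SIMD library or the GPU wins;
// a handful of such multiplies per frame is not.
void transformMesh(const Mat4& M, const Vec3* in, Vec3* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const Vec3& v = in[i];
        out[i].x = M.m[0]*v.x + M.m[1]*v.y + M.m[2]*v.z  + M.m[3];
        out[i].y = M.m[4]*v.x + M.m[5]*v.y + M.m[6]*v.z  + M.m[7];
        out[i].z = M.m[8]*v.x + M.m[9]*v.y + M.m[10]*v.z + M.m[11];
    }
}
```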
2. All applications you've developed required double precision? Then you're missing out on SIMD performance benefits, wasting memory needlessly and not running on a console/mobile platform. The only situation where I currently encounter doubles is space games, and even there I'm pushing to get as much as possible back into float to avoid conversions between the double-precision terrain generation and the floating-point vector representation used for rendering.
I have to admit that I've never developed for a console platform, but on a PC I have never noticed a performance loss when using doubles instead of single-precision floats. As I've already said, those operations are too infrequent to be noticeable. Texture streaming, for example, is far more noticeable than the calculation of a transformation matrix, and that's where I do use SIMD libraries, e.g. for texture compression. BTW, large terrain rendering is the current focus of my graphics programming.
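For anyone following along, the "doubles halve SIMD throughput" argument comes down to register width. A rough sketch (x86 with SSE2 assumed; tail elements and alignment handling omitted for brevity):

```cpp
#include <immintrin.h>  // SSE2 intrinsics, x86/x64 only

// One 128-bit register holds 4 floats but only 2 doubles, so the double
// version does half the work per instruction and moves twice the memory
// for the same number of elements.
void addFloats(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
}

void addDoubles(const double* a, const double* b, double* out, int n)
{
    for (int i = 0; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
}
```

Whether that difference is visible in a real frame is exactly the point of disagreement: it only shows up if such loops run over large data sets every frame.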
3. This is a valid question; however, it does come with the caveat that simply using a decent SIMD maths library might also force you to write code in a more data-centric way, so having it from the start might also be a good idea.
It is not a bad idea per se; I just pointed out that it could add complications without a real need for them.
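"Data-centric" here usually means something like the classic AoS-versus-SoA layout change, which is the kind of restructuring a SIMD library can push you towards (a sketch, not tied to any specific library):

```cpp
// Array-of-structures: convenient to pass around, but x/y/z are interleaved
// in memory, which is awkward for 4- or 8-wide SIMD loads.
struct VertexAoS { float x, y, z; };

// Structure-of-arrays: the "data-centric" layout SIMD code tends to prefer,
// because each component stream can be loaded a full register at a time.
struct VerticesSoA
{
    float* x;
    float* y;
    float* z;
};
```

Adopting that layout early is cheap; retrofitting it into an existing codebase is where the complications come from.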
2. If you actually need double precision for a matrix, then you're either doing science (and need the accuracy), or you're doing something very wrong indeed. It may be that you have a huge game world and are losing precision when the position is far from the origin (in which case, store an integer-based 3D grid reference along with the matrix); otherwise the problem is possibly something that could be fixed by reordering your equations for better accuracy, or a simple orthogonalise might be what's needed.
3. Double precision typically halves the performance of your code (it doubles the amount of time spent reading/writing data). If your ultimate criterion is speed, double precision is not a good idea...
Currently I'm working on a high-precision massive terrain algorithm. I've modeled the Earth based on the WGS84 ellipsoid, geoid undulation, and a high-precision (submeter precision for the whole Earth) DEM, with the ability to place the viewer 1 micron above the surface without visible artifacts. Of course, there is no need to place the viewer at such a height, but it is a demonstration of the power of the approach.
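To show why doubles are unavoidable on the CPU side here, this is the standard geodetic-to-ECEF conversion for the WGS84 ellipsoid (the constants are the standard WGS84 values; the function itself is just an illustrative sketch, not my actual code):

```cpp
#include <cmath>

// Standard WGS84 ellipsoid constants.
const double kWgs84A  = 6378137.0;                   // semi-major axis, metres
const double kWgs84F  = 1.0 / 298.257223563;         // flattening
const double kWgs84E2 = kWgs84F * (2.0 - kWgs84F);   // first eccentricity squared

// Geodetic (latitude/longitude in radians, height in metres above the
// ellipsoid) to Earth-centred, Earth-fixed Cartesian coordinates, in double.
void geodeticToEcef(double lat, double lon, double h,
                    double& x, double& y, double& z)
{
    const double sinLat = std::sin(lat);
    const double cosLat = std::cos(lat);
    const double n = kWgs84A / std::sqrt(1.0 - kWgs84E2 * sinLat * sinLat);

    x = (n + h) * cosLat * std::cos(lon);
    y = (n + h) * cosLat * std::sin(lon);
    z = (n * (1.0 - kWgs84E2) + h) * sinLat;
}
```

The resulting coordinates are on the order of millions of metres, which is exactly where single precision falls apart for submeter work.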
The math is done partially on the CPU (in double precision, only for the viewer position) and partially on the GPU (in single precision, for all vertices of the terrain). The frame rate is really high, so I'm probably doing things right.
I know that double precision on a GPU requires SM5 cards and is at least two times slower (in fact the factor is much higher and depends on the concrete architecture). That's why I'm using single precision on the GPU. But on the CPU I have never had performance problems with DP calculations. On the other hand, the things I'm working on would be impossible without DP.
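The numbers make this obvious: a float has a 24-bit mantissa, so at Earth-radius magnitudes (about 6.4e6 m from the planet centre) its resolution is roughly half a metre, nowhere near submeter, let alone micron, precision. The CPU/GPU split above is essentially the same trick as the "integer grid reference" suggestion: keep absolute positions in double on the CPU and only send small, eye-relative float offsets to the GPU. A minimal sketch of that idea (hypothetical type and function names):

```cpp
struct Double3 { double x, y, z; };
struct Float3  { float  x, y, z; };

// The subtraction happens in double precision on the CPU; only the small
// eye-relative offset is cast to float for the GPU, so single precision
// never has to represent planet-sized coordinates.
Float3 toEyeRelative(const Double3& worldPos, const Double3& eyePos)
{
    return Float3{ static_cast<float>(worldPos.x - eyePos.x),
                   static_cast<float>(worldPos.y - eyePos.y),
                   static_cast<float>(worldPos.z - eyePos.z) };
}
```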