I've been digging into it a bit more and going through some lectures from MIT and UCSD. The good news is that I now understand the rules of matrix multiplication and transforming between different spaces, but unfortunately I'm still having the same problems.
I tried your matrix but this resulted in the terms canceling each other out so that scale values had no effect. Perhaps if you could explain what that matrix should do in a bit more detail?
I tried changing the order so that rotation was applied first and then scaling. This did appear to properly preserve the scaling along the local X, Y, and Z axes, but at the expense of shearing the cube as well.
I guess I'm trying to understand why W = S*R*T is the commonly chosen convention, as opposed to W = R*S*T as you've recommended for this specific case.
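To make the difference between the two orders concrete, here is a minimal sketch in plain Python. It assumes a row-vector convention (v' = v * W), so the leftmost matrix is applied first; that convention, the 30 degree angle, and the helper names (matmul, scale, rot_z, dot) are my own assumptions for illustration, not taken from any engine. Under this convention the rows of the combined matrix are the images of the local X, Y, and Z axes, so a non-zero dot product between two rows means the axes are no longer perpendicular, i.e. the cube is sheared.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def scale(sx, sy, sz):
    """3x3 non-uniform scale matrix."""
    return [[sx, 0, 0], [0, sy, 0], [0, 0, sz]]

def rot_z(theta):
    """3x3 rotation about Z for row vectors (v' = v * R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

S = scale(4, 2, 1)               # x4 in X, x2 in Y, as in the test case
R = rot_z(math.radians(30))      # the angle is an arbitrary assumption

SR = matmul(S, R)  # scale in local space first, then rotate
RS = matmul(R, S)  # rotate first, then scale along the world axes

# Row i of each matrix is where local axis i ends up.
sr_dot = dot(SR[0], SR[1])  # stays (numerically) zero: no shear
rs_dot = dot(RS[0], RS[1])  # non-zero: local X and Y are no longer perpendicular

print("S*R axis dot product:", sr_dot)
print("R*S axis dot product:", rs_dot)
```

With column vectors (v' = W * v) the same two behaviours appear with the matrix order reversed, so it is worth checking which convention the engine uses before comparing results.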
For completeness, I checked other engines using my same setup (a 1x1x1 unit cube, scaled by 4 along the X axis and by 2 along the Y axis, translated to the right and down, and then rotated). The engine most similar to mine in terms of language and platform is Away3D (http://away3d.com/), which is supported by Adobe and is also open source, so I can dive in and see how they are doing things. They have the exact same problem as I do with this test case, which is comforting in that I'm not missing anything obvious, but disappointing in that the problem still isn't solved.
I fired up Unity3D after that and set up the same test case. Unity does seem to handle it correctly but they have methods specific to rotationAroundLocal and scaleLocal.
So the next step will be to do the multiplications by hand, compare them with the values my engine spits out, and use Unity as a reference to see if I can figure out where things are going wrong.
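A hand check like this could be sketched as follows. It builds W = S*R*T as 4x4 matrices under the same assumed row-vector convention and transforms one corner of the unit cube, then compares the result with the value worked out step by step. The translation (3, -2, 0), the 30 degree rotation about Z, and all function names are hypothetical values picked for illustration; the real numbers should come from the actual test scene.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def scale4(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def rot_z4(theta):
    """Rotation about Z for row vectors (v' = v * R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0, 0], [-s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def translate4(tx, ty, tz):
    """Translation lives in the bottom row for row vectors."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

theta = math.radians(30)       # assumed angle
S = scale4(4, 2, 1)            # x4 in X, x2 in Y
R = rot_z4(theta)
T = translate4(3, -2, 0)       # assumed "right and down" offset

W = matmul(matmul(S, R), T)    # scale, then rotate, then translate

corner = [[0.5, 0.5, 0.5, 1.0]]        # one corner of the 1x1x1 cube
engine_style = matmul(corner, W)[0]    # what the matrix pipeline produces

# The same corner worked out one step at a time, as you would on paper.
x, y, z = 0.5 * 4, 0.5 * 2, 0.5 * 1                    # scale
c, s = math.cos(theta), math.sin(theta)
x, y = x * c - y * s, x * s + y * c                    # rotate about Z
by_hand = [x + 3, y - 2, z, 1.0]                       # translate

print("matrix result:", engine_style)
print("by hand:      ", by_hand)
```

If the engine's world matrix for the same inputs differs from W, comparing the two element by element should show which factor (scale, rotation, or translation) is being applied in a different order or convention.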