We hand-optimized only for the almost-min-spec, because the vast majority of sales are on that hardware rather than, say, DX11 PCs (which can run it fine regardless of how perfect the HLSL is). For absolute-min-spec, we just disabled some features by default to get the frame rate up, and didn't worry too much, because that machine is only your mother's and she won't notice.
The interesting question is: did you leave this optimization in for all hardware, or did you use (pre-processor) branching to apply it only on certain classes of video chip? I can understand that kind of external pressure ("this title must run at 30 fps on my mother's PC").
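For illustration, per-class branching is usually done with compile-time defines fed to the shader compiler, so each hardware tier gets its own shader variant. A hypothetical sketch (the QUALITY_TIER macro, its values, and the function names are made up, not from the original post):

```hlsl
// Hypothetical quality-tier define, passed on the compiler command line
// (e.g. a /D QUALITY_TIER=2 style flag). Names are illustrative only.
float3 ShadeSpecular(float3 n, float3 h, float3 specColor)
{
#if QUALITY_TIER >= 2
    // Higher tiers: trust the driver's compiler with the intrinsic.
    return specColor * pow(saturate(dot(n, h)), 16.0);
#else
    // Min-spec tier: hand-unrolled repeated squaring.
    float d  = saturate(dot(n, h));
    float d2 = d * d;
    float d4 = d2 * d2;
    float d8 = d4 * d4;
    return specColor * d8 * d8; // d^16 in 5 multiplies
#endif
}
```

You'd compile one variant per tier offline and pick the right one at load time based on the detected GPU, rather than branching at runtime in the shader itself.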
Also, as you said, compilers are always getting better, so hopefully a modern PC's driver can pull apart our hand-vectorized shader code and put it back together as "modern", efficient code. On that topic: the Unity folks actually compile their GLSL code (performing optimisations) and then output the result as regular GLSL source text, so that on drivers with bad GLSL compilers the result is still optimized!
Sorry for the off-topic tangent.
@OP - are you using literal pow exponents often enough to warrant this effort? I can only remember using constants of maybe 2/3/4/5, and I've just written the unrolled versions in-place for those cases. If a compiler is smart enough to realise that pow(x,2) == x*x, then it should also be smart enough to realise that (x*x)*(x*x) == pow(x,4) and pick the best form anyway -- so if the hand optimisation is harmful on a new GPU with a smart compiler, the compiler should be able to undo your cleverness.
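For those small constant exponents, the in-place unrolled versions amount to repeated squaring to minimise multiplies. A sketch of what I mean (the helper names are made up for illustration):

```hlsl
// Hand-unrolled integer powers via repeated multiplication.
// Multiply counts shown; a good compiler emits the same from pow().
float pow2(float x) { return x * x; }                         // 1 mul
float pow3(float x) { return x * x * x; }                     // 2 muls
float pow4(float x) { float x2 = x * x; return x2 * x2; }     // 2 muls
float pow5(float x) { float x2 = x * x; return x2 * x2 * x; } // 3 muls
```

The point is exactly the symmetry above: pow4(x) is (x*x)*(x*x), so a compiler that can lower pow(x,4) to that sequence can also recognise the sequence and treat it as pow(x,4).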