True, but the OP only plans to call that function with constant literal arguments, hoping that the HLSL compiler will then evaluate the branches at compile time.
A branching version is quite likely to run even slower than just calling pow directly.
The OP mentioned that they did that, and it wasn't optimised. They're wondering if the D3D ASM -> GPU ASM compilation process done by your driver will actually perform this optimisation or not.
If you compile with fxc, you can output the intermediate DirectX assembly as a text file and check what pow(x,2) was transformed into.
I actually trust my HLSL compiler more than my C++ compiler for a lot of things, but they're still very dumb sometimes.
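For anyone who wants to run the check suggested above, here is a minimal sketch of a test shader. The helper name PowBranch and the file names are made up for illustration; this is not the OP's actual function, just a guess at its general shape. Compile it with the command in the comment and compare the instructions emitted for the three variants in the listing.

```hlsl
// Compile and dump the D3D assembly listing (file names are placeholders):
//   fxc /T ps_5_0 /E main /Fc shader.asm shader.hlsl

// Hypothetical branching helper, assumed to be called only with literal exponents.
float PowBranch(float x, int n)
{
    float result;
    if (n == 2)
        result = x * x;
    else if (n == 3)
        result = x * x * x;
    else
        result = pow(x, (float)n);
    return result;
}

float4 main(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float x = uv.x;

    float a = pow(x, 2.0f);    // intrinsic with a literal exponent
    float b = x * x;           // hand-written multiply
    float c = PowBranch(x, 2); // branches on a literal argument

    // Write all three results out so none of them is eliminated as dead code.
    return float4(a, b, c, 1.0f);
}
```

Note that this only shows what fxc does. Whatever the driver does when it translates the D3D assembly to GPU instructions is not visible in this listing.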
The general consensus on this topic is that the compiler will most likely do better than your handwritten optimization, maybe not today, but tomorrow.
I've gotten a full 2x boost by 'massaging' my HLSL code into something uglier that the compiler could digest more easily... Actually, we wouldn't have been able to ship our last game running at 30Hz on the min specs if we hadn't hand-optimized all the HLSL code to do things that I originally assumed the compiler would be smart enough to do.
That said, the rules for "fast HLSL" change from time to time -- if you're targeting a DX9-era card, you want to hand-vectorize your code to use all 4 components of a float4 wherever possible to reduce instruction counts (FXC does actually do a good job of auto-vectorizing, but not as good as a human), but if you're targeting a DX10-era card, you want to mask off as few elements as possible (e.g. if you only need xyz, make sure to use a float3, or a float4 with .xyz on the end) and not be afraid of scalar math.
So, you can help the compiler to produce much faster code, but you do also need to know which kind of GPU architecture you're targeting while micro-optimizing your HLSL code.
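To make the two styles concrete, here is a rough sketch. The function names and the attenuation formula are invented for illustration, and whether either form is actually faster depends on the target hardware, so treat it as something to profile rather than a rule.

```hlsl
// DX9-era style: pack four independent scalar computations into one float4
// so a single vector instruction handles all of them.
// (Hypothetical example: attenuation for four lights at once.)
float4 AttenuateFourLights(float4 distSq, float4 invRadiusSq)
{
    return saturate(1.0f - distSq * invRadiusSq);
}

// DX10-era style: touch only the components you need.
// The .xyz swizzle tells the compiler that w is never used.
float3 ScaleColor(float4 color, float scale)
{
    return color.xyz * scale;
}

// DX10-era style: plain scalar math is fine, no need to force it into a float4.
float AttenuateOneLight(float distSq, float invRadiusSq)
{
    return saturate(1.0f - distSq * invRadiusSq);
}
```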
Also, keep in mind that 1080p has ~2M pixels (1920 x 1080 = 2,073,600), making your pixel shaders the most heavily burdened tight loop in your entire code base, which means a small inefficiency can have a very large impact.