I was trying to get my parallax occlusion mapping shader to run a dynamically determined number of height-map samples to speed up the overall effect. Basically, I based the number of samples on dot(N,V), so that the more directly the view vector lines up with the normal, the fewer samples are needed to make it look good.
Anyway, FXC.exe (the .fx file compiler) kept giving me an error saying that my dynamic branch didn't seem to terminate within 1024 iterations. There was absolutely no way that I could have been using 1024+ iterations: the for loop's termination was based on the number of samples, which was a lerped value between 8 and 50.
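For illustration, here is a rough CPU-side sketch in Python of the sample-count idea described above. It is not the shader code itself. The function names, the clamping, and the rounding are assumptions made for the example; only the dot(N,V) basis and the 8 to 50 range come from the description.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))


def lerp(a, b, t):
    """Linear interpolation, same convention as HLSL lerp(a, b, t)."""
    return a + (b - a) * t


def sample_count(normal, view, min_samples=8, max_samples=50):
    """Pick fewer samples when the view is head-on, more at grazing angles.

    Assumption: N.V is clamped to [0, 1] and the result is rounded to an int.
    """
    n_dot_v = dot(normalize(normal), normalize(view))
    n_dot_v = max(0.0, min(1.0, n_dot_v))
    # n_dot_v == 1 (head-on) gives min_samples, n_dot_v == 0 gives max_samples.
    return int(round(lerp(max_samples, min_samples, n_dot_v)))
```

With these assumptions, a view straight down the normal gets 8 samples and a fully grazing view gets 50, so the loop bound can never come anywhere near 1024.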
To make it seem even stranger, the sample in the DXSDK used the exact same syntax but was able to compile just fine. Needless to say, I was stumped for quite some time.
Then, after reading a post from Simon O'Conner, I tried compiling the effect with the technology preview FXC10.exe. It gave me a subtle hint and said that it was forced to unroll my loop due to the use of a 'gradient' instruction. I had never even heard of a gradient instruction, and I wasn't using a derivative instruction either - so I had to do some research to figure out what the heck was going on.
It turns out that most of the texture sampling instructions require information from adjacent pixels to determine the mip-map level. Since I was using dynamic branching in the pixel shader, the hardware can't know whether the adjacent pixel is going to execute the same branch, which makes it illegal to use any instruction that needs information about the next pixel over.
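To make the dependency on neighbouring pixels concrete, here is a small Python sketch of how a mip level can be derived from the texture coordinates of a 2x2 block of pixels. The function names are hypothetical, and the formula is the common textbook approximation rather than what any particular GPU does.

```python
import math


def quad_gradients(uv_quad):
    """Approximate ddx/ddy from a 2x2 block of per-pixel UVs.

    uv_quad is [[top_left, top_right], [bottom_left, bottom_right]],
    each entry a (u, v) tuple. The differences only make sense if all
    pixels in the block evaluated the same UV expression.
    """
    (top_left, top_right), (bottom_left, _) = uv_quad
    ddx = (top_right[0] - top_left[0], top_right[1] - top_left[1])
    ddy = (bottom_left[0] - top_left[0], bottom_left[1] - top_left[1])
    return ddx, ddy


def mip_level(ddx, ddy, texture_size):
    """Estimate the mip level from UV gradients (assumes a square texture)."""
    # Convert UV deltas into texel deltas.
    dx = (ddx[0] * texture_size, ddx[1] * texture_size)
    dy = (ddy[0] * texture_size, ddy[1] * texture_size)
    rho_sq = max(dx[0] ** 2 + dx[1] ** 2, dy[0] ** 2 + dy[1] ** 2)
    if rho_sq <= 1.0:
        return 0.0  # one texel or less per pixel: use the base level
    return 0.5 * math.log2(rho_sq)
```

If one pixel in the block leaves a loop early while its neighbour keeps going, there is no matching UV value to subtract, which is the situation the compiler refuses to allow.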
Apparently this is why all the pixel shader architecture diagrams are shown in groups of four: so that gradient-style instructions can be taken care of together. So I had to use tex2Dgrad and calculate ddx and ddy of the texture coordinates outside of the dynamic loop to get it all to work together.
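The shape of that fix can be mocked up on the CPU. The Python sketch below is only an illustration of the structure: the gradients are computed once before the loop and passed to every sample call, so nothing inside the loop depends on a neighbouring pixel. The helper names are hypothetical, the linear search is a simplified stand-in for the real parallax occlusion march, and the mock sampler ignores the gradients where a real tex2Dgrad would use them to choose a mip level.

```python
def sample_height_grad(height_fn, uv, ddx, ddy):
    """Stand-in for tex2Dgrad on a height map.

    The gradients arrive as arguments, so no neighbour lookup is needed.
    This mock does no filtering and simply evaluates height_fn at uv.
    """
    return height_fn(uv[0], uv[1])


def march(height_fn, uv_start, uv_step, ddx, ddy, num_samples):
    """Step a ray down through the height field.

    Returns (uv, steps_taken) for the first sample at or above the ray,
    or the final uv and num_samples if nothing was hit.
    """
    step_size = 1.0 / num_samples
    ray_height = 1.0
    uv = uv_start
    for i in range(num_samples):  # loop bound varies per pixel
        height = sample_height_grad(height_fn, uv, ddx, ddy)
        if height >= ray_height:
            return uv, i  # early exit is fine: no gradients computed in here
        ray_height -= step_size
        uv = (uv[0] + uv_step[0], uv[1] + uv_step[1])
    return uv, num_samples
```

In the shader the equivalent would be taking ddx and ddy of the texture coordinates before the loop starts and handing them to tex2Dgrad on every iteration.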
So, the end result is that my shader is now compiling and running pretty well. My desktop machine has a GF6200 in it, and it can run at 12 fps @ 320x240. That's pretty slow, but it is a pretty wimpy graphics card as well. I think I still have some optimizing to do.
Anyway, HERE is a video of the latest version. Let me know what you think!