  1. For fast square root and FloatToInt you can use this: http://www.nvidia.com/object/fast_math_routines.html
    Even if you use 'a', the compiler might still optimize the loop away and replace it with a simple assignment. Some compilers out there recognize such simple patterns very well.
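
    To illustrate the point, here is a minimal sketch in C. The original loop is not shown in this thread, so the function names and the loop body are assumptions made up for the example. The first function has a loop whose result an optimizer can compute directly; the second marks 'a' as volatile, which forces every read and write to be performed and so keeps the loop in place.

```c
#include <stdint.h>

/* Hypothetical benchmark body: the loop only increments 'a', so an
   optimizing compiler is free to replace it with the assignment a = n. */
uint32_t count_plain(uint32_t n)
{
    uint32_t a = 0;
    for (uint32_t i = 0; i < n; ++i)
        a += 1;
    return a;
}

/* Same loop, but 'a' is volatile: each access counts as an observable
   side effect, so the compiler must keep all n iterations. */
uint32_t count_volatile(uint32_t n)
{
    volatile uint32_t a = 0;
    for (uint32_t i = 0; i < n; ++i)
        a = a + 1;
    return a;
}
```

    Both functions return the same value; only the generated code differs. Whether the first loop is actually removed depends on the compiler and the optimization level, so checking the assembly output is the only way to be sure. Note that volatile also adds memory traffic to each iteration, so it changes what the benchmark measures.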
