Nice and clever opinions, but the best solution is neither nice nor clever... It's simple: I removed the square roots, and where I couldn't get rid of a root I used the nVidia fastmath.h library (from Kambiz's post). The best performance, though, came only from removing the standard C++ cmath header, and with it all the pow functions. Now the power calculations are hard-coded.
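For anyone wondering what "removing the square root" and "hard-coding the powers" can look like, here is a minimal sketch. It is not the code from my engine. It assumes the pow calls had small integer exponents and that the square roots came from distance checks (e.g. grass blades near the camera). The names pow2, pow3, pow4, Vec3, distanceSquared and withinRadius are hypothetical, and nothing here needs <cmath>.

```cpp
// Hypothetical helpers: hard-coded powers that replace std::pow
// for small, known integer exponents (just multiplications).
inline float pow2(float x) { return x * x; }
inline float pow3(float x) { return x * x * x; }
inline float pow4(float x) { float x2 = x * x; return x2 * x2; }

// Hypothetical 3D point type used for the distance check.
struct Vec3 { float x, y, z; };

// Squared Euclidean distance: same ordering as the real distance,
// but without calling sqrt.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// True if point p lies within radius r of the camera. Because both
// the distance and r are non-negative, dist <= r is equivalent to
// dist^2 <= r^2, so the square root can be dropped entirely.
inline bool withinRadius(const Vec3& p, const Vec3& camera, float r) {
    return distanceSquared(p, camera) <= r * r;
}
```

The squared-distance trick only works for comparisons. Where the actual length is needed (e.g. normalising a vector) a root is still required, which is where a fast approximation like the one from fastmath.h comes in.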
Maybe it doesn't say much, but I'm quite happy with the result: the engine can now initialise up to 10 million grass blades and render (I think) up to 1 million in real time. And the best part is that it will be optimised dramatically over time, so I think those numbers will multiply and my game's graphics will be rather detailed and nice.
Thank you all for your time and effort.