    float constpow(float x, uint y)
    {
        if (y == 0) return 1;               // Cost 0
        if (y == 1) return x;               // Cost 0

        float x2 = x * x;                   // Cost 1
        if (y == 2) return x2;              // Cost 1
        if (y == 3) return x2 * x;          // Cost 2

        float x4 = x2 * x2;                 // Cost 2
        if (y == 4) return x4;              // Cost 2
        if (y == 5) return x4 * x;          // Cost 3
        if (y == 6) return x4 * x2;         // Cost 3
        if (y == 7) return x4 * x2 * x;     // Cost 4

        float x8 = x4 * x4;                 // Cost 3
        if (y == 8) return x8;              // Cost 3
        if (y == 9) return x8 * x;          // Cost 4
        if (y == 10) return x8 * x2;        // Cost 4
        if (y == 11) return x8 * x2 * x;    // Cost 5
        if (y == 12) return x8 * x4;        // Cost 4
        if (y == 13) return x8 * x4 * x;    // Cost 5
        if (y == 14) return x8 * x4 * x2;   // Cost 5

        float x16 = x8 * x8;                // Cost 4
        if (y == 16) return x16;            // Cost 4
        if (y == 17) return x16 * x;        // Cost 5
        if (y == 18) return x16 * x2;       // Cost 5
        if (y == 20) return x16 * x4;       // Cost 5
        if (y == 24) return x16 * x8;       // Cost 5
        if (y == 32) return x16 * x16;      // Cost 5

        return pow(x, y);                   // Fallback for all other exponents
    }

If drivers already perform this optimization themselves, it would probably be better to just leave pow(x, y) in place, because they know better when it pays off. I would only use this when y is a compile-time constant, so that the comparisons can be resolved statically; I don't want any dynamic branching here.