
CryZe

Posted 09 October 2012 - 05:36 AM

The power function is usually about 6 times slower than a simple MAD instruction (at least on current NVIDIA GPUs). The HLSL compiler itself doesn't optimize it; it simply converts the pow into a LOG, a MUL and an EXP instruction. However, most constant powers up to x^32 could be computed faster using only MUL instructions. So the question is: should I bother optimizing this myself, or do the drivers usually do something like this on their own? Here is the function I would use if I had to optimize it myself:

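// Computes x^y with a short chain of multiplications for common exponents.
// "Cost" = number of multiplications. Meant only for a literal y: HLSL functions
// are inlined, so with a constant y the compiler should fold the branches away
// and keep only the selected chain. Other exponents fall back to pow().
float constpow(float x, uint y)
{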
	if (y == 0)
		return 1; //Cost 0

	if (y == 1)
		return x; //Cost 0

	float x2 = x * x; //Cost 1

	if (y == 2)
		return x2; //Cost 1

	if (y == 3)
		return x2 * x; //Cost 2

	float x4 = x2 * x2; //Cost 2

	if (y == 4)
		return x4; //Cost 2

	if (y == 5)
		return x4 * x; //Cost 3

	if (y == 6)
		return x4 * x2; //Cost 3

	if (y == 7)
		return x4 * x2 * x; //Cost 4

	float x8 = x4 * x4; //Cost 3

	if (y == 8)
		return x8; //Cost 3

	if (y == 9)
		return x8 * x; //Cost 4

	if (y == 10)
		return x8 * x2; //Cost 4

	if (y == 11)
		return x8 * x2 * x; //Cost 5

	if (y == 12)
		return x8 * x4; //Cost 4

	if (y == 13)
		return x8 * x4 * x; //Cost 5

	if (y == 14)
		return x8 * x4 * x2; //Cost 5

	float x16 = x8 * x8; //Cost 4

	if (y == 16)
		return x16; //Cost 4

	if (y == 17)
		return x16 * x; //Cost 5

	if (y == 18)
		return x16 * x2; //Cost 5

	if (y == 20)
		return x16 * x4; //Cost 5

	if (y == 24)
		return x16 * x8; //Cost 5

	if (y == 32)
		return x16 * x16; //Cost 5

	return pow(x, y);
}
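The branch table in constpow is a hand-written addition chain. A generic way to produce such chains is binary exponentiation (square-and-multiply). Below is a minimal CPU-side sketch in C rather than HLSL, so it can be checked on the host; the name pow_by_squaring and its multiply counter are hypothetical additions, not part of the original function. The counter makes it easy to compare against the "Cost" comments above.

```c
/* Square-and-multiply: computes x^y for an unsigned exponent y.
   If muls is non-null, it receives the number of float multiplications used.
   The first factor is copied into result instead of multiplying by 1.0f,
   so the count matches a hand-written chain like constpow's. */
static float pow_by_squaring(float x, unsigned y, int *muls)
{
    float result = 1.0f;
    float base = x;        /* holds x^(2^k) after k squarings */
    int have_result = 0;   /* 0 until the first factor is placed in result */
    int count = 0;

    while (y != 0) {
        if (y & 1u) {
            if (have_result) {
                result *= base;          /* fold this power of two into result */
                count++;
            } else {
                result = base;           /* first factor: no multiply needed */
                have_result = 1;
            }
        }
        y >>= 1;
        if (y != 0) {
            base *= base;                /* square only if more bits remain */
            count++;
        }
    }
    if (muls) {
        *muls = count;
    }
    return result;
}
```

For every exponent in the table, the count matches the Cost comment (e.g. 5 for y = 24 and y = 32). The binary method is not always optimal, though: for y = 15 it needs 6 multiplications, whereas the chain x^2, x^3, x^6, x^12, x^15 needs only 5.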

If the drivers already do this themselves, it would probably be better to just leave the pow(x, y) in place, because they know better when the optimization pays off. I'd only use this function when y is a compile-time constant; I don't want any dynamic branching here.
