#ActualCryZe

Posted 10 October 2012 - 07:31 AM

Thanks for the answers. Good to know that FXC optimizes pow(x, 2). I thought I had checked that, but it looks like I hadn't.

But I'm still pretty sure that it won't optimize pow for other literal exponents (I couldn't test it in the meantime, though). pow(x, 2) isn't really the interesting case; I just put it in the title to make clear what this topic is about. It's pretty obvious that a single MUL is always at least as fast as a POW. It gets more interesting for other exponents, though, especially when implementing Schlick's approximation of Fresnel.

One could implement it this way (wow, gamedev.net can't handle multiple lines of code in the code tag right now; that explains why so many users are posting such misaligned code):
float rLDotH = 1 - LDotH;
float rLDotH2 = rLDotH * rLDotH;
float rLDotH5 = rLDotH2 * rLDotH2 * rLDotH;
float fresnel = reflectivity + (1 - reflectivity) * rLDotH5;
//Or even simpler:
//float fresnel = reflectivity + (1 - reflectivity) * constpow(1 - LDotH, 5);
//which has the same effect on the resulting assembly


Or this way:
float fresnel = reflectivity + (1 - reflectivity) * pow(1 - LDotH, 5);

Which implementation is preferable? Like I said, I haven't had time to run benchmarks yet, but I will. The thing is that my graphics card might have a particularly slow or particularly fast POW instruction compared to other graphics cards, so my results might not be representative. A single benchmark on one card won't tell me which implementation is preferable in the average case.
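
As a CPU-side sanity check that the two variants compute the same value, here is a minimal C++ sketch. It says nothing about GPU timing. The names fresnelMul and fresnelPow are made up for illustration, and the post never shows how constpow is defined, so the version below is only a guess at what such a helper might look like (exponentiation by squaring with a literal integer exponent).

```cpp
#include <cmath>

// Hypothetical constpow: the post does not show its definition.
// Exponentiation by squaring, so x^5 expands to multiplies only.
constexpr float constpow(float x, unsigned n) {
    return n == 0 ? 1.0f
         : (n % 2 == 0) ? constpow(x * x, n / 2)
                        : x * constpow(x * x, n / 2);
}

// Compile-time check that the helper really computes an integer power.
static_assert(constpow(2.0f, 5) == 32.0f, "constpow(2, 5) should be 32");

// Schlick's approximation with the fifth power written as multiplies:
// x^5 = (x^2)^2 * x, i.e. three multiplies.
float fresnelMul(float reflectivity, float lDotH) {
    float r  = 1.0f - lDotH;
    float r2 = r * r;
    float r5 = r2 * r2 * r;
    return reflectivity + (1.0f - reflectivity) * r5;
}

// The same formula using the general-purpose pow.
float fresnelPow(float reflectivity, float lDotH) {
    return reflectivity + (1.0f - reflectivity) * std::pow(1.0f - lDotH, 5.0f);
}
```

Both functions should agree to within floating-point rounding for LDotH in [0, 1], so the choice between them comes down to instruction cost on the target hardware. That is best judged by inspecting the compiled shader assembly and profiling on more than one card.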
