Thanks, I think I've got it.
So, to summarize:
- the GPU handles 2 multiplications + 1 add more efficiently when they are combined, so grouping the math into such 'sets' is better for performance
- I can accumulate the diffuse and specular float3 contributions into one diffuse total and one specular total (diffuseAcc / specularAcc), shared by both the directional and point lights
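A minimal sketch of what I mean by 'sets', assuming the compiler maps each multiply-then-add line to a single multiply-add (mad) instruction, which it typically can but is not guaranteed to do. CombineLighting and its parameter names are hypothetical and only illustrate the grouping:

```hlsl
// Hypothetical helper (not part of my actual shader): every line has the
// shape a * b + c, so each one is a candidate for a single mad instruction.
float3 CombineLighting(float3 albedo,
                       float3 matDiff, float3 diffuseAcc, float3 ambient,
                       float3 matSpec, float3 specularAcc, float3 emissive)
{
    float3 diffuseTerm  = matDiff * diffuseAcc  + ambient;   // multiply + add
    float3 specularTerm = matSpec * specularAcc + emissive;  // multiply + add
    return albedo * diffuseTerm + specularTerm;              // multiply + add
}
```

Whether this really saves instructions is best checked in the compiled assembly listing rather than assumed.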
I've made the changes and it's all working nicely.
Here's my resulting pixel shader; how does it look?
float4 PS_function(VS_OUTPUT input): COLOR0
{
    float4 textureColor = tex2D(textureSampler, input.TexCoord);
    float3 normal = normalize(input.Normal);
    float3 diffuseAcc = 0.0f;
    float3 specularAcc = 0.0f;

    /** DIRECTIONAL LIGHTS - PER PIXEL (DIFFUSE & SPECULAR) **/
    for(int i=0;i<MaxDirectionalLights;++i)
    {
        // clamp N.L only, so the light color/intensity itself is not clamped (same as the point lights)
        diffuseAcc += saturate(dot(normal, DirLightDir[i])) * DirLightColInt[i];
        if(any(MatSpec))
        {
            // a directional light has no position, so use its direction as-is
            // (assumed to be normalized on the CPU, which the diffuse term above already relies on)
            float3 h = normalize(DirLightDir[i] + input.ViewDir);
            specularAcc += pow(saturate(dot(h, normal)), MatSpecPower) * DirLightColInt[i];
        }
    }

    /** POINT LIGHTS - PER PIXEL (DIFFUSE & SPECULAR) **/
    for(int i=0;i<MaxPointLights;++i)
    {
        float3 lightDir = normalize(PointLightPos[i] - input.wPos);

        // PER PIXEL ATTENUATION
        float dist = length(PointLightPos[i] - input.wPos);
        float att = saturate(1 - ((dist - PointLightFPRange[i]) / (PointLightRange[i] - PointLightFPRange[i])));
        att *= att; // optional, not correct for full power range !?!?

        // DIFFUSE
        float diffIntPoint = saturate(dot(normal, lightDir) * att);
        diffuseAcc += diffIntPoint * PointLightColInt[i]; // float3

        // SPECULAR; USING BLINN HALF ANGLE
        if(any(MatSpec))
        {
            float3 h = normalize(lightDir + input.ViewDir);
            specularAcc += pow(saturate(dot(h, normal)), MatSpecPower) * att * PointLightColInt[i];
        }
    }

    /** FINAL PIXEL COLOR **/
    return float4(textureColor.rgb * (MatDiff * diffuseAcc + AmbientColInt) + (MatSpec * specularAcc + MatEmi), textureColor.a);
}
P.S.: I might be able to raise the max. number of point lights from 3 to 4 in the Shader Model 2.0 versions of my shaders, because they now need fewer instructions.
On the optimization side:
- I could use one float2 for the point light range and full power range, and save another 'instruction'/constant
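A rough sketch of how the float2 idea could look, assuming the two values are pre-combined on the CPU so the divide disappears from the pixel shader. The function name and the attParams layout are hypothetical and not in my current shader. It relies on the identity 1 - (dist - FPRange) / (Range - FPRange) = dist * (-1 / (Range - FPRange)) + Range / (Range - FPRange), and assumes Range is larger than FPRange:

```hlsl
// Hypothetical per-light constant, filled on the CPU:
//   attParams.x = -1.0 / (range - fullPowerRange)
//   attParams.y = range / (range - fullPowerRange)
// Assumes range > fullPowerRange, otherwise the division on the CPU breaks down.
float PointLightAttenuation(float dist, float2 attParams)
{
    // Same value as saturate(1 - (dist - FPRange) / (Range - FPRange)),
    // rearranged into one multiply + add followed by the clamp.
    return saturate(dist * attParams.x + attParams.y);
}
```

At dist equal to the full power range this returns 1, and at dist equal to the range it returns 0, which matches the current formula.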
Do you see any other small things to improve?
I think I've moved as much as possible from the PS to the VS, and I pre-calculate as much as possible/acceptable on the CPU side.