OpenGL ES 2.0: Point light optimisation

9 comments, last by marcClintDion 10 years, 11 months ago

Hey devs!

I've been working on an OpenGL ES 2.0 Android engine and I have begun implementing some simple (point) lighting. I had something fairly simple working, so I tried to get fancy and added colour-tinting lights. And it works great... with only one or two lights. Any more than that, the application drops about 15 frames per light added (my ideal is at least 4 or 5 lights). I know implementing lighting is expensive, I just didn't think it was that expensive. I'm fairly new to the world of OpenGL and GLSL, so there is a good chance I've written some crappy shader code. If anyone has any feedback or tips on how I can optimize this code, please let me know.

Vertex Shader


    uniform mat4 u_MVPMatrix;
    uniform mat4 u_MVMatrix;
    attribute vec4 a_Position;
    attribute vec3 a_Normal;
    attribute vec2 a_TexCoordinate;
    varying vec3 v_Position;
    varying vec3 v_Normal;
    varying vec2 v_TexCoordinate;

    void main() {
        // Position in eye space, used for lighting in the fragment shader
        v_Position = vec3(u_MVMatrix * a_Position);
        v_TexCoordinate = a_TexCoordinate;
        // w = 0.0 so the translation part of the matrix does not affect the normal
        v_Normal = vec3(u_MVMatrix * vec4(a_Normal, 0.0));
        gl_Position = u_MVPMatrix * a_Position;
    }

Fragment Shader


    precision mediump float;
    uniform vec4 u_LightPos["+numLights+"];
    uniform vec4 u_LightColours["+numLights+"];
    uniform float u_LightPower["+numLights+"];
    uniform sampler2D u_Texture;
    varying vec3 v_Position;
    varying vec3 v_Normal;
    varying vec2 v_TexCoordinate;

    void main()
    {
        gl_FragColor = texture2D(u_Texture, v_TexCoordinate);
        float diffuse = 0.0;
        vec4 colourSum = vec4(1.0);
        for (int i = 0; i < "+numLights+"; i++) {
            vec3 toPointLight = vec3(u_LightPos[i]);
            float distance = length(toPointLight - v_Position);
            vec3 lightVector = normalize(toPointLight - v_Position);
            float diffuseDiff = 0.0; // The diffuse difference contributed from current light
            diffuseDiff = max(dot(v_Normal, lightVector), 0.0);
            diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0 - u_LightPower[i]) * distance * distance))); // Determine attenuation
            diffuse += diffuseDiff;
            gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i])) * diffuseDiff))); // The expensive part
        }
        diffuse += 0.1; // Add ambient light
        gl_FragColor.rgb *= diffuse;
    }

Am I making any rookie mistakes? Or am I just being unrealistic about what I can do? Thanks in advance


"Any more than that, the application drops about 15 frames per light added"

This doesn't mean anything.
Say the app was running at 16fps and it dropped by 15 to 1fps, that means it increased from 62.5ms per frame to 1000ms per frame -- or an increase of 937.5ms.
Say the app was running at 100fps and it dropped by 15 to 85fps, that means it increased from 10ms per frame to 11.8ms per frame -- or an increase of 1.8ms.
Is a drop of 15fps equal to the workload increasing by 1ms or by 1000ms? It's both, so it's meaningless ;)
For a "drop in FPS" to be meaningful, you need to know what FPS it dropped from so you've got an absolute starting point. It's generally better to just always use "1000/FPS" (milliseconds per frame) rather than FPS so that you're always talking in absolute terms rather than relative terms.
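
To make that concrete, here is a small Python sketch (not part of the original reply; the function names are made up for illustration) that converts frame rates to frame times and compares the two 15 fps drops described above.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def cost_of_drop_ms(fps_before: float, fps_after: float) -> float:
    """Extra milliseconds per frame implied by a drop in frame rate."""
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)


# The same 15 fps drop, measured from two different starting points.
slow = cost_of_drop_ms(16.0, 1.0)     # 62.5 ms -> 1000 ms
fast = cost_of_drop_ms(100.0, 85.0)   # 10 ms -> roughly 11.8 ms
print(f"16 -> 1 fps costs {slow:.1f} ms; 100 -> 85 fps costs {fast:.1f} ms")
```

By the same conversion, 60, 45 and 30 fps correspond to roughly 16.7, 22.2 and 33.3 ms per frame.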

    float distance = length(toPointLight - v_Position);
    vec3 lightVector = normalize(toPointLight - v_Position);
Both length and normalize involve a square root. You're also performing the same subtraction twice. Maybe your device/driver will optimize this code for you, but maybe it won't :(
If you don't trust it, you can rewrite it optimally yourself as:

    vec3 lightVector = toPointLight - v_Position;
    float distance = length(lightVector);
    lightVector /= distance;

    diffuseDiff = max(dot(v_Normal, lightVector), 0.0);
GPUs can usually clamp the results of operations to the 0-1 range "for free", but clamping to the 0-infinity range has a cost. In this case, it may be faster to use the following (for unit-length vectors the dot product never exceeds 1, so the result is unchanged):

    diffuseDiff = clamp(dot(v_Normal, lightVector), 0.0, 1.0);

    diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0-u_LightPower[i])* distance * distance))); //Determine attenuation
Here you're performing math on a uniform variable. You can eliminate the "1-u..." operation by doing it once on the CPU, and storing "1-u_blah" in the uniform instead.

You may also be able to compute distance*distance (distance squared) for free, by changing the earlier distance calculation like this:

    vec3 lightVector = toPointLight - v_Position;
    float distanceSquared = dot(lightVector, lightVector);
    float distance = sqrt(distanceSquared);
    lightVector /= distance;

    gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part
What's the theory behind this line? Is it necessary?


    vec3 lightVector = toPointLight - v_Position;
    float distanceSquared = dot(lightVector, lightVector);
    float invDistance = inversesqrt(distanceSquared);
    lightVector *= invDistance;
    float distance = invDistance * distanceSquared;

This replaces sqrt with the cheaper inversesqrt and turns the division into a multiply, at the cost of one extra multiply.
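
As a sanity check on the algebra, here is a CPU-side Python sketch (an illustration only; the helper names are hypothetical and `1.0 / math.sqrt(x)` stands in for the GLSL built-in) that computes the light vector both ways and compares the results.

```python
import math


def light_vector_sqrt(light_pos, frag_pos):
    """Variant using sqrt: returns (unit vector to light, distance, distance squared)."""
    v = [lp - fp for lp, fp in zip(light_pos, frag_pos)]
    dist_sq = sum(c * c for c in v)
    dist = math.sqrt(dist_sq)
    return [c / dist for c in v], dist, dist_sq


def light_vector_inv_sqrt(light_pos, frag_pos):
    """Variant using an inverse square root, mirroring the GLSL snippet."""
    v = [lp - fp for lp, fp in zip(light_pos, frag_pos)]
    dist_sq = sum(c * c for c in v)
    inv_dist = 1.0 / math.sqrt(dist_sq)   # stands in for GLSL inversesqrt()
    # d = d^2 * (1/d), so no second square root or division is needed.
    return [c * inv_dist for c in v], inv_dist * dist_sq, dist_sq


a = light_vector_sqrt((1.0, 2.0, 3.0), (0.0, 0.0, 1.0))
b = light_vector_inv_sqrt((1.0, 2.0, 3.0), (0.0, 0.0, 1.0))
print(a)
print(b)
```

Both variants divide by zero if the fragment sits exactly on the light, so real shader code may want to guard against a zero distance.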

Tchom:

Don't use a vec4 for the light color; the alpha channel does not belong there. Just use a vec3 to save ALU operations. If you are only calculating the diffuse part, you can probably calculate all the lights in the vertex shader without a notable loss in quality.

You could try a much cheaper attenuation function and premultiply the intensity and light color on the CPU.

Then all the relevant math would be just:


    diffuse += lightColorAndIntensity * (nDotL / distanceSquared);
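
A rough CPU-side sketch of that idea in Python follows. The helper names and the two-light scene are hypothetical, and the engine in this thread would do the premultiply in its own host code before uploading the uniform. The second function is only a reference model of the per-light shader line above.

```python
def premultiply(colour, intensity):
    """Fold a light's intensity into its RGB colour once, on the CPU."""
    return tuple(c * intensity for c in colour)


def diffuse_sum(lights, n_dot_ls, dist_sqs):
    """CPU reference for: diffuse += premultiplied colour * (nDotL / distanceSquared)."""
    total = [0.0, 0.0, 0.0]
    for light, n_dot_l, dist_sq in zip(lights, n_dot_ls, dist_sqs):
        scale = max(n_dot_l, 0.0) / dist_sq   # one scalar per light
        for k in range(3):
            total[k] += light[k] * scale
    return total


# Hypothetical scene: a red light of intensity 4 and a white light of intensity 2.
lights = [premultiply((1.0, 0.0, 0.0), 4.0), premultiply((1.0, 1.0, 1.0), 2.0)]
result = diffuse_sum(lights, n_dot_ls=[0.5, 1.0], dist_sqs=[4.0, 2.0])
print(result)
```

Note that a plain 1 / distanceSquared falloff grows without bound as the distance approaches zero, so the result may need clamping for fragments very close to a light.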
 

You can get significant speedups using lowp a bit more (on PowerVR at least). I haven't spent the time to follow the logic of the existing code fully, but there are definitely quite a few operations that can be made to be low precision.

"This doesn't mean anything.
Say the app was running at 16fps and it dropped by 15 to 1fps, that means it increased from 62.5ms per frame to 1000ms per frame -- or an increase of 937.5ms.
Say the app was running at 100fps and it dropped by 15 to 85fps, that means it increased from 10ms per frame to 11.8ms per frame -- or an increase of 1.8ms.
Is a drop of 15fps equal to the workload increasing by 1ms or by 1000ms? It's both, so it's meaningless"

Oops, sorry about the confusion. I come from a Flash background, so I'm stuck in the habit of using FPS as a measurement of efficiency (I'll get better). My original and target speed was 16.6ms per frame, but it increased to 22ms after three lights, 33.3ms after four, etc.


    gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part

"What's the theory behind this line? Is it necessary?"

This line determines how much to tint the fragment's rgb by each light's rgb. If 'diffuseDiff' is 0, then gl_FragColor.rgb would just be multiplied by vec3(1.0). If 'diffuseDiff' is 1, gl_FragColor.rgb is multiplied by the full value of vec3(u_LightColours[i]).
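
For anyone wanting to check that behaviour, here is a small Python sketch (an illustration, not code from the thread; the function name is made up) that evaluates one colour channel of the expression. Under this reading, the multiplier at diffuseDiff = 1 works out to 1 / (2 - c) for a channel value c, which matches c itself only when c is 1. A fully black channel therefore appears to halve the fragment rather than zero it.

```python
def tint_factor(colour_channel: float, diffuse_diff: float) -> float:
    """One channel of: 1.0 / (1.0 + (1.0 - u_LightColours[i]) * diffuseDiff)."""
    return 1.0 / (1.0 + (1.0 - colour_channel) * diffuse_diff)


# Print the multiplier for unlit (0.0) and fully lit (1.0) fragments.
for c in (0.0, 0.5, 1.0):
    print(c, tint_factor(c, 0.0), tint_factor(c, 1.0))
```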

Thanks heaps guys. I'll try implementing this when I get home from work and report back on the difference. I really appreciate the help.

I implemented those suggestions and that helped heaps. I also had my own brainwave (slapped myself for not thinking of it before) and added a distance limit, so that attenuation and colour are only calculated for fragments within that distance of the light.


    if (distance < limit) {
        diffuseDiff = clamp(dot(v_Normal, lightVector), 0.0, 1.0);
        diffuseDiff = diffuseDiff * (1.0 / (1.0 + (u_LightPower[i] * distanceSquared)));
        gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - u_LightColours[i]) * diffuseDiff)));
    }

Hopefully that might help someone else

That limit will cause popping that does not look good. I would suggest removing it.

Your tinting code is also quite odd, and it's really hard to even understand what you want to do with it. It's also quite expensive, so I would suggest removing it and using a more physically plausible lighting model.

"That limit will cause popping that does not look good. I would suggest removing it."

Yeah, I worked that out with some experimentation :(

Is there a more typical way to colour light?

The straightforward way to have coloured lights would be:

    vec3 diffuse = vec3(0);
    ...
        diffuseDiff = ...N.L * attenuation...
        diffuse += diffuseDiff * u_LightColours[i].rgb;
        //OR if you really want to keep colour and intensity separate, instead of pre-multiplying them on the CPU:
        diffuse += (diffuseDiff * u_LightColours[i].a) * u_LightColours[i].rgb; //n.b. scalar mul first, then scalar/vector mul
    ...
    gl_FragColor.rgb = diffuseTexture * diffuse;

If you want to use an if statement to only compute lighting within a certain radius, you've also got to modify your attenuation function so that it does actually reach zero by that radius. There's an example at the bottom of this blog post: http://imdoingitwrong.wordpress.com/2011/01/31/light-attenuation/
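
One common way to get a falloff that reaches zero is to subtract the value the curve has at the cutoff radius and rescale. The Python sketch below illustrates that idea with the 1 / (1 + k * d^2) falloff used earlier in the thread. It is not necessarily the formula from the linked post, and the function names and parameter values are assumptions.

```python
def attenuation(dist_sq: float, k: float) -> float:
    """Falloff from the thread: 1 / (1 + k * d^2). Never reaches zero."""
    return 1.0 / (1.0 + k * dist_sq)


def attenuation_with_cutoff(dist_sq: float, k: float, radius: float) -> float:
    """Shift and rescale the falloff so it is 1 at d = 0 and 0 at d = radius."""
    at_radius = attenuation(radius * radius, k)   # constant per light
    shifted = (attenuation(dist_sq, k) - at_radius) / (1.0 - at_radius)
    return max(shifted, 0.0)


k, radius = 0.5, 4.0   # hypothetical light parameters
for d in (0.0, 2.0, 4.0, 5.0):
    print(d, attenuation(d * d, k), attenuation_with_cutoff(d * d, k, radius))
```

The value at the radius depends only on the light, so it could be computed once on the CPU and passed in as a uniform. The sketch assumes k and radius are both greater than zero; otherwise the rescale divides by zero.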

Keep in mind though that older GPUs do not deal well with branching. The GPU will typically be working on a large number of pixels at once (e.g. 64), and if any of them take a certain branch, then all of them pay the cost of executing that branch. The branch instruction itself may also be expensive, regardless of which path is taken (e.g. a dozen cycles -- so if the branch is not skipping more than a dozen basic math operations in the average case, it may not be worth it).
As with any kind of optimisation, you should be sure to profile before and after in order to be sure that it's actually helping.

Perfect! Thank you! Works a treat
