# OpenGL ES 2.0: Point light optimisation

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

### #1Tchom  Members

Posted 06 June 2013 - 08:08 AM

Hey devs!

I've been working on an OpenGL ES 2.0 Android engine and I have begun implementing some simple (point) lighting. I had something fairly simple working, so I tried to get fancy and added color-tinting light. And it works great... with only one or two lights. Any more than that, the application drops about 15 frames per light added (my ideal is at least 4 or 5 lights). I know implementing lighting is expensive, I just didn't think it was that expensive. I'm fairly new to the world of OpenGL and GLSL, so there is a good chance I've written some crappy shader code. If anyone has any feedback or tips on how I can optimize this code, please let me know.

    // Vertex shader
    uniform mat4 u_MVPMatrix;
    uniform mat4 u_MVMatrix;
    attribute vec4 a_Position;
    attribute vec3 a_Normal;
    attribute vec2 a_TexCoordinate;
    varying vec3 v_Position;
    varying vec3 v_Normal;
    varying vec2 v_TexCoordinate;

    void main() {
        v_Position = vec3(u_MVMatrix * a_Position);
        v_TexCoordinate = a_TexCoordinate;
        v_Normal = vec3(u_MVMatrix * vec4(a_Normal, 0.0));
        gl_Position = u_MVPMatrix * a_Position;
    }


    // Fragment shader
    precision mediump float;
    uniform vec4 u_LightPos["+numLights+"];
    uniform vec4 u_LightColours["+numLights+"];
    uniform float u_LightPower["+numLights+"];
    uniform sampler2D u_Texture;
    varying vec3 v_Position;
    varying vec3 v_Normal;
    varying vec2 v_TexCoordinate;

    void main()
    {
        gl_FragColor = (texture2D(u_Texture, v_TexCoordinate));
        float diffuse = 0.0;
        vec4 colourSum = vec4(1.0);
        for (int i = 0; i < "+numLights+"; i++) {
            vec3 toPointLight = vec3(u_LightPos[i]);
            float distance = length(toPointLight - v_Position);
            vec3 lightVector = normalize(toPointLight - v_Position);
            float diffuseDiff = 0.0; // The diffuse difference contributed from current light
            diffuseDiff = max(dot(v_Normal, lightVector), 0.0);
            diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0-u_LightPower[i])* distance * distance))); // Determine attenuation
            diffuse += diffuseDiff;
            gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); // The expensive part
        }
        diffuse += 0.1; // Add ambient light
        gl_FragColor.rgb *= diffuse;
    }



Am I making any rookie mistakes? Or am I just being unrealistic about what I can do? Thanks in advance

### #2Hodgman  Moderators

Posted 06 June 2013 - 08:49 AM

Any more than that, the application drops about 15 frames per light added

This doesn't mean anything.
Say the app was running at 16fps and it dropped by 15 to 1fps, that means it increased from 62.5ms per frame to 1000ms per frame -- or an increase of 937.5ms.
Say the app was running at 100fps and it dropped by 15 to 85fps, that means it increased from 10ms per frame to 11.8ms per frame -- or an increase of 1.8ms.
Is a drop of 15fps equal to the workload increasing by 1ms or by 1000ms? It's both, so it's meaningless
For a "drop in FPS" to be meaningful, you need to know what FPS it dropped from so you've got an absolute starting point. It's generally better to just always use "1000/FPS" (milliseconds per frame) rather than FPS so that you're always talking in absolute terms rather than relative terms.
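To make the arithmetic explicit, here is the conversion worked through for the two cases above, using frame time in milliseconds equal to 1000 divided by FPS:

$$
\begin{aligned}
t &= \frac{1000\ \text{ms}}{\text{FPS}} \\
16 \to 1\ \text{fps}: \quad \Delta t &= \frac{1000}{1} - \frac{1000}{16} = 1000 - 62.5 = 937.5\ \text{ms} \\
100 \to 85\ \text{fps}: \quad \Delta t &= \frac{1000}{85} - \frac{1000}{100} \approx 11.76 - 10 = 1.76\ \text{ms}
\end{aligned}
$$

The same 15 fps drop therefore covers anything from under 2 ms to nearly a second of extra work per frame, depending on the starting point.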

    float distance = length(toPointLight - v_Position);
    vec3 lightVector = normalize(toPointLight - v_Position);

Both length and normalize involve a square root. You're also performing the same subtraction twice. Maybe your device/driver will optimize this code for you, but maybe it won't.
If you don't trust it, you can rewrite it optimally yourself as:

    vec3 lightVector = toPointLight - v_Position;
    float distance = length(lightVector);
    lightVector /= distance;

diffuseDiff = max(dot(v_Normal, lightVector), 0.0);
GPUs can usually clamp the results of operations to the 0-1 range "for free", but clamping from 0-infinity has a cost. In this case, it may be faster to use:
diffuseDiff = clamp(dot(v_Normal, lightVector), 0.0, 1.0);

diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0-u_LightPower[i])* distance * distance))); //Determine attenuation
Here you're performing math on a uniform variable. You can eliminate the "1-u..." operation by doing it once on the CPU, and storing "1-u_blah" in the uniform instead.

You may also be able to compute distance*distance (distance squared) for free, by changing the earlier distance calculation like this:

    vec3 lightVector = toPointLight - v_Position;
    float distanceSquared = dot(lightVector, lightVector);
    float distance = sqrt(distanceSquared);
    lightVector /= distance;

gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part
What's the theory behind this line? Is it necessary?
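
Taken together, the suggestions in this post might look like the following fragment shader. This is a minimal sketch rather than anyone's final code: `NUM_LIGHTS` stands in for the `"+numLights+"` string splice, `u_LightFalloff` is a hypothetical uniform assumed to hold `1.0 - power` computed once on the CPU, and the tinting line is left out because its purpose is still being questioned at this point in the thread.

```glsl
precision mediump float;

#define NUM_LIGHTS 4 // assumption: replaces the "+numLights+" string splice

uniform vec4 u_LightPos[NUM_LIGHTS];
uniform float u_LightFalloff[NUM_LIGHTS]; // hypothetical: CPU uploads (1.0 - power)
uniform sampler2D u_Texture;
varying vec3 v_Position;
varying vec3 v_Normal;
varying vec2 v_TexCoordinate;

void main()
{
    float diffuse = 0.1; // ambient term, as in the original shader
    for (int i = 0; i < NUM_LIGHTS; i++) {
        // One subtraction, reused for both the distance and the direction.
        vec3 lightVector = u_LightPos[i].xyz - v_Position;
        // Squared distance comes from a dot product, so only one sqrt is needed.
        float distanceSquared = dot(lightVector, lightVector);
        lightVector /= sqrt(distanceSquared);
        // Clamp to [0, 1] instead of max(x, 0.0).
        float nDotL = clamp(dot(v_Normal, lightVector), 0.0, 1.0);
        // Attenuation uses the value precomputed on the CPU.
        diffuse += nDotL / (1.0 + u_LightFalloff[i] * distanceSquared);
    }
    gl_FragColor = texture2D(u_Texture, v_TexCoordinate);
    gl_FragColor.rgb *= diffuse;
}
```

Whether any of these changes is a measurable win depends on the device and driver, so it is worth timing each one.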

Edited by Hodgman, 06 June 2013 - 08:56 AM.

### #3kalle_h  Members

Posted 06 June 2013 - 10:43 AM

    vec3 lightVector = toPointLight - v_Position;
    float distanceSquared = dot(lightVector, lightVector);
    float invDistance = inversesqrt(distanceSquared);
    lightVector *= invDistance;
    float distance = invDistance * distanceSquared;

This replaces sqrt with the cheaper inversesqrt and replaces the division with a multiply, at the cost of one extra multiply.

Don't use vec4 for the light colour; the alpha channel does not belong there. Just use vec3 to save ALU operations. If you are only calculating the diffuse part, you can probably calculate all lights in the vertex shader without a notable loss in quality.

You could try a much cheaper attenuation function and premultiply the intensity and light colour on the CPU.

Then all the relevant math would be just:

    diffuse += lightColorAndIntensity * (nDotL / distanceSquared);
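
For context, that one-liner could sit in a fragment shader roughly like this. It is a sketch under assumptions: `NUM_LIGHTS` is a placeholder constant, `u_LightColorAndIntensity` is a hypothetical vec3 uniform holding colour times intensity premultiplied on the CPU, and the 0.1 ambient term is carried over from the original shader.

```glsl
precision mediump float;

#define NUM_LIGHTS 4 // assumption: placeholder light count

uniform vec3 u_LightPos[NUM_LIGHTS];
uniform vec3 u_LightColorAndIntensity[NUM_LIGHTS]; // hypothetical: rgb * intensity, premultiplied on the CPU
uniform sampler2D u_Texture;
varying vec3 v_Position;
varying vec3 v_Normal;
varying vec2 v_TexCoordinate;

void main()
{
    vec3 diffuse = vec3(0.1); // ambient
    for (int i = 0; i < NUM_LIGHTS; i++) {
        vec3 lightVector = u_LightPos[i] - v_Position;
        float distanceSquared = dot(lightVector, lightVector);
        // Scale the dot product by 1/distance rather than normalising the vector first.
        float nDotL = clamp(dot(v_Normal, lightVector) * inversesqrt(distanceSquared), 0.0, 1.0);
        // Plain inverse-square falloff; note it grows without bound very close to the light.
        diffuse += u_LightColorAndIntensity[i] * (nDotL / distanceSquared);
    }
    vec4 texel = texture2D(u_Texture, v_TexCoordinate);
    gl_FragColor = vec4(texel.rgb * diffuse, texel.a);
}
```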


### #4C0lumbo  Members

Posted 06 June 2013 - 11:18 AM

You can get significant speedups using lowp a bit more (on PowerVR at least). I haven't spent the time to follow the logic of the existing code fully, but there are definitely quite a few operations that can be made to be low precision.
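
As a rough illustration of where lowp might be applied, the sketch below keeps positions, distances and attenuation at mediump and drops only colour-like values to lowp. The uniform names other than those in the original post (`u_LightFalloff`, `NUM_LIGHTS`) are assumptions, and any gain is device dependent.

```glsl
precision mediump float; // default for positions, distances and attenuation

#define NUM_LIGHTS 4 // assumption: placeholder light count

uniform vec3 u_LightPos[NUM_LIGHTS];
uniform lowp vec3 u_LightColours[NUM_LIGHTS]; // colours in [0, 1] fit comfortably in lowp
uniform float u_LightFalloff[NUM_LIGHTS];     // hypothetical: falloff factor computed on the CPU
uniform sampler2D u_Texture;
varying vec3 v_Position;
varying vec3 v_Normal;
varying vec2 v_TexCoordinate; // texture coordinates stay at mediump

void main()
{
    lowp vec4 texel = texture2D(u_Texture, v_TexCoordinate);
    // Accumulate at mediump: lowp only guarantees the range (-2, 2),
    // which a sum over several lights could exceed.
    vec3 light = vec3(0.1);
    for (int i = 0; i < NUM_LIGHTS; i++) {
        vec3 lightVector = u_LightPos[i] - v_Position;
        float distanceSquared = dot(lightVector, lightVector);
        float nDotL = clamp(dot(v_Normal, lightVector) * inversesqrt(distanceSquared), 0.0, 1.0);
        light += u_LightColours[i] * (nDotL / (1.0 + u_LightFalloff[i] * distanceSquared));
    }
    // Clamp before dropping to lowp so the final colour maths stays in range.
    lowp vec3 lit = clamp(light, 0.0, 1.0);
    gl_FragColor = vec4(texel.rgb * lit, texel.a);
}
```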

### #5Tchom  Members

Posted 06 June 2013 - 06:20 PM

This doesn't mean anything.
Say the app was running at 16fps and it dropped by 15 to 1fps, that means it increased from 62.5ms per frame to 1000ms per frame -- or an increase of 937.5ms.
Say the app was running at 100fps and it dropped by 15 to 85fps, that means it increased from 10ms per frame to 11.8ms per frame -- or an increase of 1.8ms.
Is a drop of 15fps equal to the workload increasing by 1ms or by 1000ms? It's both, so it's meaningless

Oops, sorry about the confusion. I come from a Flash background, so I'm stuck in the habit of using FPS as a measurement of efficiency (I'll get better). My original and target speed was 16.6ms per frame, but it increased to 22.ms after three lights, 33.3ms after four, etc.

gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part

What's the theory behind this line? Is it necessary?

This line determines how much to tint the fragment's rgb by each light's rgb. If 'diffuseDiff' is 0, then gl_FragColor.rgb would just be multiplied by vec3(1.0). If 'diffuseDiff' is 1, gl_FragColor.rgb is multiplied by the full value of vec3(u_LightColours[i]).
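
Working through the expression as posted may help here. Writing $c$ for one channel of the light colour and $d$ for `diffuseDiff`, the per-channel multiplier is:

$$
f(c, d) = \frac{1}{1 + (1 - c)\,d}, \qquad f(c, 0) = 1, \qquad f(c, 1) = \frac{1}{2 - c}
$$

At $d = 1$ the multiplier is $1/(2 - c)$, which runs from $0.5$ (for $c = 0$) to $1$ (for $c = 1$) and equals $c$ only when $c = 1$. As written, then, the line gives a softer tint than a straight multiply by the light colour.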

Thanks heaps guys. I'll try implementing this when I get home from work and report back on the difference. I really appreciate the help.

### #6Tchom  Members

Posted 07 June 2013 - 07:05 AM

I implemented those suggestions and that helped heaps. I had my own brainwave (slapped myself for not thinking of it before), and added a limit to the distance from the light that attenuation and colour would be calculated at.

    if (distance < limit) {
        diffuseDiff = clamp(dot(v_Normal, lightVector), 0.0, 1.0);
        diffuseDiff = diffuseDiff * (1.0 / (1.0 + (u_LightPower[i]* distanceSquared)));
        gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - u_LightColours[i])*diffuseDiff)));
    }


Hopefully that might help someone else

### #7kalle_h  Members

Posted 07 June 2013 - 09:26 AM

That limit will cause popping that does not look good. I would suggest to remove it.

Your tinting code is also quite odd, and it's really hard to even understand what you want to do with it. It's also quite expensive, so I would suggest removing it and using a more physically plausible light model.

### #8Tchom  Members

Posted 08 June 2013 - 12:13 AM

That limit will cause popping that does not look good. I would suggest to remove it.

Yeah, I worked that out with some experimentation.

Is there a more typical way to colour light?

### #9Hodgman  Moderators

Posted 08 June 2013 - 02:04 AM

The straightforward way to have coloured lights would be:

    vec3 diffuse = vec3(0);
    ...
    diffuseDiff = ...N.L * attenuation...
    diffuse += diffuseDiff * u_LightColours[i].rgb;
    // OR if you really want to keep colour and intensity separate, instead of pre-multiplying them on the CPU:
    diffuse += (diffuseDiff * u_LightColours[i].a) * u_LightColours[i].rgb; // n.b. scalar mul first, then scalar/vector mul
    ...
    gl_FragColor.rgb = diffuseTexture * diffuse;

If you want to use an if statement to only compute lighting within a certain radius, you've also got to modify your attenuation function so that it does actually reach zero by that radius. There's an example at the bottom of this blog post: http://imdoingitwrong.wordpress.com/2011/01/31/light-attenuation/
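
One way to get an attenuation curve that reaches zero at a chosen radius is to rescale and bias the existing falloff. The sketch below is an illustration under assumptions and is not claimed to be the exact formula from the linked post: `u_LightRadiusSq` and `u_LightCutoff` are hypothetical uniforms, with `u_LightCutoff` assumed to hold `1.0 / (1.0 + falloff * radius * radius)` computed on the CPU (so it must be less than 1).

```glsl
precision mediump float;

#define NUM_LIGHTS 4 // assumption: placeholder light count

uniform vec3 u_LightPos[NUM_LIGHTS];
uniform vec3 u_LightColours[NUM_LIGHTS];
uniform float u_LightFalloff[NUM_LIGHTS];  // k in 1 / (1 + k * d^2)
uniform float u_LightRadiusSq[NUM_LIGHTS]; // hypothetical: cutoff radius squared
uniform float u_LightCutoff[NUM_LIGHTS];   // hypothetical: 1 / (1 + k * radius^2), from the CPU
uniform sampler2D u_Texture;
varying vec3 v_Position;
varying vec3 v_Normal;
varying vec2 v_TexCoordinate;

// Rescales 1 / (1 + k * d^2) so it is 1 at d = 0 and exactly 0 at the cutoff radius.
float windowedAttenuation(float distanceSquared, float falloff, float cutoff)
{
    float attenuation = 1.0 / (1.0 + falloff * distanceSquared);
    return max((attenuation - cutoff) / (1.0 - cutoff), 0.0);
}

void main()
{
    vec3 diffuse = vec3(0.1); // ambient
    for (int i = 0; i < NUM_LIGHTS; i++) {
        vec3 lightVector = u_LightPos[i] - v_Position;
        float distanceSquared = dot(lightVector, lightVector);
        // Optional early-out: the result is the same without it, because the
        // attenuation is already zero at and beyond the radius.
        if (distanceSquared < u_LightRadiusSq[i]) {
            float nDotL = clamp(dot(v_Normal, lightVector) * inversesqrt(distanceSquared), 0.0, 1.0);
            diffuse += u_LightColours[i] * (nDotL *
                windowedAttenuation(distanceSquared, u_LightFalloff[i], u_LightCutoff[i]));
        }
    }
    vec4 texel = texture2D(u_Texture, v_TexCoordinate);
    gl_FragColor = vec4(texel.rgb * diffuse, texel.a);
}
```

Because the contribution fades to zero at the radius, removing or keeping the `if` changes only performance, not the image, which avoids the popping seen with a hard limit.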

Keep in mind though that older GPUs do not deal well with branching. The GPU will typically work on a large number of pixels at once (e.g. 64) and if any of them take a certain branch, then all of them pay the cost of executing that branch. The branch instruction itself may also be expensive, regardless of which path is taken (e.g. a dozen cycles -- so if the branch is not skipping more than a dozen basic math operations in the average case, it may not be worth it).
As with any kind of optimisation, you should be sure to profile before and after in order to be sure that it's actually helping.

Edited by Hodgman, 08 June 2013 - 02:08 AM.

### #10Tchom  Members

Posted 08 June 2013 - 04:53 AM

Perfect! Thank you! Works a treat

### #11marcClintDion  Members

Posted 11 June 2013 - 06:56 AM

You could also move these to the vertex shader so they are only being calculated once for every vertex instead of once every fragment.  For low poly models which take up a lot of screen space this should be much more efficient.

    vec3 toPointLight = vec3(u_LightPos[i]);
    float distance = length(toPointLight - v_Position);
    vec3 lightVector = toPointLight - v_Position;
    float diffuseDiff = 0.0; // The diffuse difference contributed from current light
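
A minimal sketch of what per-vertex lighting could look like follows. It assumes a scalar diffuse term with no colour tinting, a placeholder `NUM_LIGHTS` constant, and a hypothetical `u_LightFalloff` uniform computed on the CPU; the new varying `v_Diffuse` is also an illustrative name. The vertex shader does the whole light loop:

```glsl
#define NUM_LIGHTS 4 // assumption: placeholder light count

uniform mat4 u_MVPMatrix;
uniform mat4 u_MVMatrix;
uniform vec4 u_LightPos[NUM_LIGHTS];
uniform float u_LightFalloff[NUM_LIGHTS]; // hypothetical: falloff factor computed on the CPU
attribute vec4 a_Position;
attribute vec3 a_Normal;
attribute vec2 a_TexCoordinate;
varying float v_Diffuse; // hypothetical: lighting result, interpolated across the triangle
varying vec2 v_TexCoordinate;

void main()
{
    vec3 position = vec3(u_MVMatrix * a_Position);
    vec3 normal = normalize(vec3(u_MVMatrix * vec4(a_Normal, 0.0)));
    float diffuse = 0.1; // ambient
    for (int i = 0; i < NUM_LIGHTS; i++) {
        vec3 lightVector = u_LightPos[i].xyz - position;
        float distanceSquared = dot(lightVector, lightVector);
        float nDotL = max(dot(normal, lightVector) * inversesqrt(distanceSquared), 0.0);
        diffuse += nDotL / (1.0 + u_LightFalloff[i] * distanceSquared);
    }
    v_Diffuse = diffuse;
    v_TexCoordinate = a_TexCoordinate;
    gl_Position = u_MVPMatrix * a_Position;
}
```

The fragment shader then only samples the texture and applies the interpolated value:

```glsl
precision mediump float;

uniform sampler2D u_Texture;
varying float v_Diffuse;
varying vec2 v_TexCoordinate;

void main()
{
    vec4 texel = texture2D(u_Texture, v_TexCoordinate);
    gl_FragColor = vec4(texel.rgb * v_Diffuse, texel.a);
}
```

The trade-off is that lighting is interpolated linearly between vertices, so a light close to a large, coarsely tessellated triangle will look less accurate than with per-fragment lighting.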

