# OpenGL ES 2.0: Point light optimisation

Hey devs!

I've been working on an OpenGL ES 2.0 Android engine and I have begun implementing some simple (point) lighting. I had something fairly simple working, so I tried to get fancy and added colour-tinting lights. And it works great... with only one or two lights. Any more than that, the application drops about 15 frames per light added (my ideal is at least 4 or 5 lights). I know implementing lighting is expensive, I just didn't think it was that expensive. I'm fairly new to the world of OpenGL and GLSL, so there is a good chance I've written some crappy shader code. If anyone has any feedback or tips on how I can optimize this code, please let me know.

    uniform mat4 u_MVPMatrix;
    uniform mat4 u_MVMatrix;
    attribute vec4 a_Position;
    attribute vec3 a_Normal;
    attribute vec2 a_TexCoordinate;
    varying vec3 v_Position;
    varying vec3 v_Normal;
    varying vec2 v_TexCoordinate;

    void main() {
        v_Position = vec3(u_MVMatrix * a_Position);
        v_TexCoordinate = a_TexCoordinate;
        v_Normal = vec3(u_MVMatrix * vec4(a_Normal, 0.0));
        gl_Position = u_MVPMatrix * a_Position;
    }


    precision mediump float;
    uniform vec4 u_LightPos["+numLights+"];
    uniform vec4 u_LightColours["+numLights+"];
    uniform float u_LightPower["+numLights+"];
    uniform sampler2D u_Texture;
    varying vec3 v_Position;
    varying vec3 v_Normal;
    varying vec2 v_TexCoordinate;

    void main()
    {
        gl_FragColor = (texture2D(u_Texture, v_TexCoordinate));
        float diffuse = 0.0;
        vec4 colourSum = vec4(1.0);
        for (int i = 0; i < "+numLights+"; i++) {
            vec3 toPointLight = vec3(u_LightPos[i]);
            float distance = length(toPointLight - v_Position);
            vec3 lightVector = normalize(toPointLight - v_Position);
            float diffuseDiff = 0.0; // The diffuse difference contributed from current light
            diffuseDiff = max(dot(v_Normal, lightVector), 0.0);
            diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0-u_LightPower[i])* distance * distance))); //Determine attenuation
            diffuse += diffuseDiff;
            gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part
        }
        diffuse += 0.1; //Add ambient light
        gl_FragColor.rgb *= diffuse;
    }



Am I making any rookie mistakes? Or am I just being unrealistic about what I can do? Thanks in advance

##### Share on other sites

> Any more than that, the application drops about 15 frames per light added

This doesn't mean anything.
Say the app was running at 16fps and it dropped by 15 to 1fps, that means it increased from 62.5ms per frame to 1000ms per frame -- or an increase of 937.5ms.
Say the app was running at 100fps and it dropped by 15 to 85fps, that means it increased from 10ms per frame to 11.8ms per frame -- or an increase of 1.8ms.
Is a drop of 15fps equal to the workload increasing by 1ms or by 1000ms? It's both, so it's meaningless.
For a "drop in FPS" to be meaningful, you need to know what FPS it dropped from so you've got an absolute starting point. It's generally better to just always use "1000/FPS" (milliseconds per frame) rather than FPS so that you're always talking in absolute terms rather than relative terms.

    float distance = length(toPointLight - v_Position);
    vec3 lightVector = normalize(toPointLight - v_Position);

Both length and normalize involve a square root. You're also performing the same subtraction twice. Maybe your device/driver will optimize this code for you, but maybe it won't.
If you don't trust it, you can rewrite it optimally yourself as:

    vec3 lightVector = toPointLight - v_Position;
    float distance = length(lightVector);
    lightVector /= distance;

    diffuseDiff = max(dot(v_Normal, lightVector), 0.0);

GPUs can usually clamp the results of operations to the 0-1 range "for free", but clamping from 0-infinity has a cost. In this case, it may be faster to use:

    diffuseDiff = clamp(dot(v_Normal, lightVector), 0.0, 1.0);

    diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0-u_LightPower[i])* distance * distance))); //Determine attenuation

Here you're performing math on a uniform variable. You can eliminate the "1-u..." operation by doing it once on the CPU, and storing "1-u_blah" in the uniform instead.

You may also be able to compute distance*distance (distance squared) for free, by changing the earlier distance calculation like this:

    vec3 lightVector = toPointLight - v_Position;
    float distanceSquared = dot(lightVector, lightVector);
    float distance = sqrt(distanceSquared);
    lightVector /= distance;

    gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part

What's the theory behind this line? Is it necessary?

    vec3 lightVector = toPointLight - v_Position;
    float distanceSquared = dot(lightVector, lightVector);
    float invDistance = inversesqrt(distanceSquared);
    lightVector *= invDistance;
    float distance = invDistance * distanceSquared;


This replaces the sqrt with the cheaper inversesqrt and the division with a multiply, at the cost of one extra multiply.

Don't use a vec4 for the light color; an alpha channel does not belong there. Just use a vec3 to save ALU operations. If you are only calculating the diffuse part, you can probably calculate all the lights in the vertex shader without a notable loss in quality.

You could also try a much cheaper attenuation function and premultiply the intensity and light color on the CPU.

Then all the relevant math would be just:

    diffuse += lightColorAndIntensity * (nDotL / distanceSquared);
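
Putting the suggestions so far together, a fragment shader might look roughly like the sketch below. It is only an illustration under some assumptions: `NUM_LIGHTS` stands in for the `"+numLights+"` string splice, `u_LightColourAndIntensity` is a made-up uniform holding colour times intensity computed on the CPU, the light positions are passed as `vec3`, and the `1.0 +` in the attenuation is an addition that keeps the result finite very close to a light.

```glsl
precision mediump float;

// Placeholder for the light count that the original code splices in from Java.
#define NUM_LIGHTS 4

// Assumed uniform: light colour already multiplied by intensity on the CPU.
uniform vec3 u_LightColourAndIntensity[NUM_LIGHTS];
// Light positions in the same (eye) space as v_Position.
uniform vec3 u_LightPos[NUM_LIGHTS];
uniform sampler2D u_Texture;

varying vec3 v_Position;
varying vec3 v_Normal;
varying vec2 v_TexCoordinate;

void main()
{
    // Interpolated normals are generally not unit length; drop this if the
    // error is acceptable and the extra inversesqrt is not.
    vec3 normal = normalize(v_Normal);

    // Start from the ambient term used in the original shader.
    vec3 diffuse = vec3(0.1);

    for (int i = 0; i < NUM_LIGHTS; i++) {
        vec3 lightVector = u_LightPos[i] - v_Position;
        float distanceSquared = dot(lightVector, lightVector);
        lightVector *= inversesqrt(distanceSquared);

        float nDotL = clamp(dot(normal, lightVector), 0.0, 1.0);

        // Scalar work first, then a single scalar * vec3 multiply.
        diffuse += u_LightColourAndIntensity[i] * (nDotL / (1.0 + distanceSquared));
    }

    vec4 texel = texture2D(u_Texture, v_TexCoordinate);
    gl_FragColor = vec4(texel.rgb * diffuse, texel.a);
}
```

Whether this is actually faster on a given device is something to measure in milliseconds per frame, not assume.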

##### Share on other sites

You can get significant speedups using lowp a bit more (on PowerVR at least). I haven't spent the time to follow the logic of the existing code fully, but there are definitely quite a few operations that can be made to be low precision.
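
As a rough illustration of where lower precision could go, here is a single-light sketch. The split is an assumption, not a measured result: values that stay within roughly [0, 1] (texels, colours, N.L, attenuation) are marked `lowp`, while positions and distances stay `mediump` because `lowp` is only guaranteed to cover about -2 to 2. `u_LightColour` is a hypothetical uniform.

```glsl
// Default precision for anything not qualified explicitly.
precision mediump float;

uniform lowp sampler2D u_Texture;    // 8-bit texels fit comfortably in lowp
uniform lowp vec3 u_LightColour;     // assumed single light, colour in [0, 1]
uniform mediump vec3 u_LightPos;     // positions need more range than lowp offers

varying mediump vec3 v_Position;
varying mediump vec3 v_Normal;
varying mediump vec2 v_TexCoordinate;

void main()
{
    // Geometry stays at mediump: distances can easily exceed the lowp range.
    mediump vec3 lightVector = u_LightPos - v_Position;
    mediump float distanceSquared = dot(lightVector, lightVector);
    lightVector *= inversesqrt(distanceSquared);

    // These results are all within [0, 1], so lowp is a candidate for them.
    lowp float nDotL = clamp(dot(normalize(v_Normal), lightVector), 0.0, 1.0);
    lowp float attenuation = 1.0 / (1.0 + distanceSquared);
    lowp vec3 diffuse = u_LightColour * (nDotL * attenuation) + vec3(0.1);

    lowp vec4 texel = texture2D(u_Texture, v_TexCoordinate);
    gl_FragColor = vec4(texel.rgb * diffuse, texel.a);
}
```

Precision qualifiers are hints that some GPUs ignore entirely, and `lowp` can introduce visible banding, so compare both the frame time and the image before keeping a change like this.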

##### Share on other sites

> This doesn't mean anything.
> Say the app was running at 16fps and it dropped by 15 to 1fps, that means it increased from 62.5ms per frame to 1000ms per frame -- or an increase of 937.5ms.
> Say the app was running at 100fps and it dropped by 15 to 85fps, that means it increased from 10ms per frame to 11.8ms per frame -- or an increase of 1.8ms.
> Is a drop of 15fps equal to the workload increasing by 1ms or by 1000ms? It's both, so it's meaningless.

Oops, sorry about the confusion. I come from a Flash background, so I'm stuck in the habit of using FPS as a measurement of efficiency (I'll get better). My original and target speed was 16.6ms per frame, but it increased to 22.ms after three lights, 33.3ms after four, etc.

>     gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part
>
> What's the theory behind this line? Is it necessary?

This line determines how much to tint the fragment's rgb by each light's rgb. If 'diffuseDiff' is 0, then gl_FragColor.rgb would just be multiplied by vec3(1.0). If 'diffuseDiff' is 1, gl_FragColor.rgb is multiplied by the full value of vec3(u_LightColours[i]).

Thanks heaps guys. I'll try implementing this when I get home from work and report back on the difference. I really appreciate the help.

##### Share on other sites

I implemented those suggestions and that helped heaps. I had my own brainwave (slapped myself for not thinking of it before), and added a limit to the distance from the light that attenuation and colour would be calculated at.

    if(distance < limit) {
        diffuseDiff = clamp(dot(v_Normal, lightVector), 0.0, 1.0);
        diffuseDiff = diffuseDiff * (1.0 / (1.0 + (u_LightPower[i]* distanceSquared)));
        gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - u_LightColours[i])*diffuseDiff)));
    }


Hopefully that might help someone else

##### Share on other sites

That limit will cause popping that does not look good. I would suggest removing it.

Your tinting code is also quite odd, and it's really hard to even understand what you want to do with it. It's also quite expensive, so I would suggest removing it and using a more physically plausible lighting model.

##### Share on other sites

> That limit will cause popping that does not look good. I would suggest removing it.

Yeah, I worked that out with some experimentation.

Is there a more typical way to colour light?

##### Share on other sites

The straightforward way to have coloured lights would be:

    vec3 diffuse = vec3(0);
    ...
    diffuseDiff = ...N.L * attenuation...
    diffuse += diffuseDiff * u_LightColours[i].rgb;
    //OR if you really want to keep colour and intensity separate, instead of pre-multiplying them on the CPU:
    diffuse += (diffuseDiff * u_LightColours[i].a) * u_LightColours[i].rgb;//n.b. scalar mul first, then scalar/vector mul
    ...
    gl_FragColor.rgb = diffuseTexture * diffuse;

If you want to use an if statement to only compute lighting within a certain radius, you've also got to modify your attenuation function so that it does actually reach zero by that radius. There's an example at the bottom of this blog post: http://imdoingitwrong.wordpress.com/2011/01/31/light-attenuation/

Keep in mind though that older GPUs do not deal well with branching. The GPU will typically be working on a large number of pixels at once (e.g. 64) and if any of them take a certain branch, then all of them pay the cost of executing that branch. The branch instruction itself may also be expensive, regardless of which path is taken (e.g. a dozen cycles -- so if the branch is not skipping more than a dozen basic math operations in the average case, it may not be worth it).
As with any kind of optimisation, you should be sure to profile before and after in order to be sure that it's actually helping.
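
One possible shape for an attenuation function that really does reach zero at a chosen radius is sketched below. This is not necessarily the formula from the linked blog post; it simply multiplies the 1 / (1 + k * d^2) falloff used earlier in the thread by a window term. The helper name `windowedAttenuation` and the `u_LightRadius` uniform are hypothetical.

```glsl
precision mediump float;

// Hypothetical helper: a 1 / (1 + k * d^2) falloff multiplied by a window
// that is 1 at the light and exactly 0 at `radius`, so a radius cut-off
// does not produce a visible edge.
float windowedAttenuation(float distanceSquared, float k, float radius)
{
    float falloff = 1.0 / (1.0 + k * distanceSquared);
    float x = distanceSquared / (radius * radius);   // 0 at the light, 1 at the radius
    float window = clamp(1.0 - x * x, 0.0, 1.0);     // 1 at the light, 0 at the radius
    return falloff * window * window;
}

// Inside the light loop (u_LightRadius is an assumed extra uniform array):
//     if (distanceSquared < u_LightRadius[i] * u_LightRadius[i]) {
//         float attenuation = windowedAttenuation(distanceSquared, u_LightPower[i], u_LightRadius[i]);
//         diffuse += (nDotL * attenuation) * u_LightColours[i].rgb;
//     }
```

Because the window already clamps to zero beyond the radius, the `if` is purely an optimisation here and can be removed if profiling shows the branch costs more than it saves.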

##### Share on other sites

Perfect! Thank you! Works a treat

##### Share on other sites

You could also move these to the vertex shader so they are only calculated once per vertex instead of once per fragment. For low-poly models which take up a lot of screen space this should be much more efficient.

    //---------------------------------------------------------------------------------------------
    vec3 toPointLight = vec3(u_LightPos[i]);
    float distance = length(toPointLight - v_Position);
    vec3 lightVector = toPointLight - v_Position;
    float diffuseDiff = 0.0; // The diffuse difference contributed from current light
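
Taken a step further, the whole diffuse sum could be computed per vertex and interpolated. The vertex shader below is a sketch under assumptions: `NUM_LIGHTS` is a placeholder for the spliced-in light count, `v_Diffuse` is a new varying that does not exist in the original shaders, the lights are passed as `vec3`, and `u_LightPower` is assumed to already hold the "1 - power" value precomputed on the CPU.

```glsl
// Placeholder for the light count that the original code splices in from Java.
#define NUM_LIGHTS 4

uniform mat4 u_MVPMatrix;
uniform mat4 u_MVMatrix;
uniform vec3 u_LightPos[NUM_LIGHTS];      // eye-space light positions
uniform vec3 u_LightColours[NUM_LIGHTS];
uniform float u_LightPower[NUM_LIGHTS];   // assumed to be precomputed as (1 - power)

attribute vec4 a_Position;
attribute vec3 a_Normal;
attribute vec2 a_TexCoordinate;

varying vec3 v_Diffuse;                   // hypothetical varying: summed light per vertex
varying vec2 v_TexCoordinate;

void main()
{
    vec3 position = vec3(u_MVMatrix * a_Position);
    vec3 normal = normalize(vec3(u_MVMatrix * vec4(a_Normal, 0.0)));

    // Start from the ambient term used in the original shader.
    vec3 diffuse = vec3(0.1);

    for (int i = 0; i < NUM_LIGHTS; i++) {
        vec3 lightVector = u_LightPos[i] - position;
        float distanceSquared = dot(lightVector, lightVector);
        lightVector *= inversesqrt(distanceSquared);

        float nDotL = clamp(dot(normal, lightVector), 0.0, 1.0);
        float attenuation = 1.0 / (1.0 + u_LightPower[i] * distanceSquared);
        diffuse += (nDotL * attenuation) * u_LightColours[i];
    }

    v_Diffuse = diffuse;
    v_TexCoordinate = a_TexCoordinate;
    gl_Position = u_MVPMatrix * a_Position;
}
```

The matching fragment shader then only needs to declare `v_Diffuse` and multiply the texel's rgb by it. The trade-off is that lighting is interpolated linearly across each triangle, so a light close to a large, coarsely tessellated surface will look noticeably different from the per-fragment version.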
