I am trying to implement tiled shading (http://www.cse.chalmers.se/~olaolss/papers/tiled_shading_preprint.pdf). My problem is that my fragment shader is extremely slow due to dynamic branching.
The fragment shader looks like this:
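#version 150
precision highp float;

uniform sampler2D mTexture0;
uniform isamplerBuffer lightListOffset; // per tile: x = offset into the light list, y = light count
uniform vec4 lightTileSize;             // xy = tile size in pixels, z = tiles per row

in vec2 outTexcoord;
out vec4 fragColor0;

void main()
{
    fragColor0 = texture(mTexture0, outTexcoord);
    vec3 light = vec3(0.1);

    // Linear index of the screen tile this fragment falls into.
    int tileindex = int(int(gl_FragCoord.y / lightTileSize.y) * lightTileSize.z + int(gl_FragCoord.x / lightTileSize.x));
    ivec2 listoffset = texelFetch(lightListOffset, tileindex).xy;

    // Placeholder for the per-light shading: one iteration per light in the tile.
    for (int i = 0; i < listoffset.y; i++)
    {
        light += 0.1;
    }
    fragColor0.rgb *= light;
}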
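As a point of comparison for the `tileindex` line above, here is a minimal sketch of an all-integer variant. It assumes the tile size is a whole number of pixels and that the default pixel-center convention for `gl_FragCoord` is in use. The uniforms `tileSizePx` and `tilesPerRow` are hypothetical names standing in for `lightTileSize.xy` and `lightTileSize.z`. Whether this makes any difference on the AMD card is untested.

```glsl
#version 150

uniform sampler2D mTexture0;
uniform isamplerBuffer lightListOffset;
uniform ivec2 tileSizePx;  // hypothetical: tile width and height in pixels
uniform int tilesPerRow;   // hypothetical: number of tiles in one screen row

in vec2 outTexcoord;
out vec4 fragColor0;

void main()
{
    // gl_FragCoord.xy lies at pixel centers (x + 0.5, y + 0.5) by default,
    // so truncating to ivec2 yields the integer pixel coordinate.
    ivec2 tile = ivec2(gl_FragCoord.xy) / tileSizePx;
    int tileindex = tile.y * tilesPerRow + tile.x;

    ivec2 listoffset = texelFetch(lightListOffset, tileindex).xy;

    vec3 light = vec3(0.1);
    for (int i = 0; i < listoffset.y; i++)
    {
        light += 0.1;
    }

    fragColor0 = texture(mTexture0, outTexcoord);
    fragColor0.rgb *= light;
}
```

The host code would have to upload the two integer uniforms instead of the `vec4`; everything else stays as in the shader above.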
The problem is the loop, which is extremely slow even if there is only one iteration or none at all.
So my question is: What is the optimal way to provide the information for the dynamic branching to be as fast as possible?
Should the tileindex be calculated in another way? Is it a bad idea to provide the lightListOffset as a buffer texture?
I tried a couple of variations, and the shader actually runs fine on my NVIDIA GeForce GT 650M, but not on the AMD Radeon HD 6770M we are also testing on. By slow I mean about 15 fps compared to 60 fps on the NVIDIA card, just from the existence of the loop with its dependency on the texelFetch.
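One experiment that might help isolate the cause is to give the loop a compile-time trip count and let the fetched value control only an early exit. The following is a sketch under that assumption, not a known fix: `MAX_LIGHTS_PER_TILE` is a hypothetical constant that is not part of the original shader, and any lights beyond that bound would be silently dropped.

```glsl
#version 150

uniform sampler2D mTexture0;
uniform isamplerBuffer lightListOffset;
uniform vec4 lightTileSize;

in vec2 outTexcoord;
out vec4 fragColor0;

// Hypothetical upper bound on the number of lights in one tile.
const int MAX_LIGHTS_PER_TILE = 64;

void main()
{
    int tileindex = int(int(gl_FragCoord.y / lightTileSize.y) * lightTileSize.z
                        + int(gl_FragCoord.x / lightTileSize.x));
    int lightCount = texelFetch(lightListOffset, tileindex).y;

    vec3 light = vec3(0.1);
    // The loop bound is a constant; the fetched count only decides when to break.
    for (int i = 0; i < MAX_LIGHTS_PER_TILE; i++)
    {
        if (i >= lightCount) break;
        light += 0.1;
    }

    fragColor0 = texture(mTexture0, outTexcoord);
    fragColor0.rgb *= light;
}
```

If the frame rate on the AMD card changes noticeably with this variant, that would point at the data-dependent loop bound rather than at the buffer texture fetch itself.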
To show what I am talking about (very basic diffuse point lights):
Thanks!






