Hello.
I just finished an implementation of tile based deferred shading using only OpenGL 3.2, but I encountered some problems along the way that I had to hack around.
First I generate the tile frustums and store them in a GL_RGBA32F 2D texture array with 6 layers (one layer per plane). The actual culling and lighting are done in the geometry shader and the fragment shader, respectively. Culling is done in the geometry shader like this:
uniform vec3 lightWorldPositions[MAX_LIGHTS];
uniform float lightIntensities[MAX_LIGHTS]; // intensity doubles as the light radius
uniform int numLights;

flat out float visible[MAX_LIGHTS];
...
for (int i = 0; i < MAX_LIGHTS && i < numLights; i++) {
    vec4 lightPos = vec4(lightWorldPositions[i], 1.0);
    float radius = lightIntensities[i];

    // Sphere-frustum test: the light is culled only if it lies
    // entirely behind at least one of the six tile planes.
    bool v = true;
    for (int p = 0; p < 6; p++) {
        if (dot(planes[p], lightPos) < -radius) {
            v = false;
        }
    }
    render = render || v;
    visible[i] = float(v);
}
if (!render) {
    return;
}
// Map the tile's pixel rectangle to NDC and emit it as a triangle-strip quad.
vec2 pos0 = tileCoord * tileSize / screenSize * 2 - 1;
vec2 pos1 = (tileCoord + 1) * tileSize / screenSize * 2 - 1;
gl_Position = vec4(pos0, 0, 1);
EmitVertex();
gl_Position = vec4(pos0.x, pos1.y, 0, 1);
EmitVertex();
gl_Position = vec4(pos1.x, pos0.y, 0, 1);
EmitVertex();
gl_Position = vec4(pos1, 0, 1);
EmitVertex();
EndPrimitive();
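For clarity, here is the same sphere-vs-tile-frustum test written out on the CPU (a hedged sketch with illustrative names; a plane is stored as (normal.xyz, d), so the dot product with a point at w = 1 gives its signed distance, just like dot(planes[p], lightPos) in the shader):

```cpp
#include <array>

struct Vec4 { float x, y, z, w; };

// Signed distance from a point (w == 1) to a plane stored as (normal.xyz, d).
static float planeDistance(const Vec4& plane, const Vec4& p) {
    return plane.x * p.x + plane.y * p.y + plane.z * p.z + plane.w * p.w;
}

// Mirrors the shader loop: the sphere survives unless it lies fully
// behind (farther than its radius from) any of the six tile planes.
bool sphereVisible(const std::array<Vec4, 6>& planes,
                   const Vec4& center, float radius) {
    for (const Vec4& pl : planes) {
        if (planeDistance(pl, center) < -radius) {
            return false;
        }
    }
    return true;
}
```

A sphere that merely intersects a plane still passes, which is the conservative behavior you want for culling.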
Here I encountered a number of problems. My first idea was to generate a list of visible light IDs and output it as an int[], but that turned out to be impossible in GLSL ("lvalue in assignment too complex", see http://www.gamedev.net/topic/518271-are-shaders-really-that-limited/). I ended up simply marking each light as visible or not visible, and since bool arrays aren't supported as shader outputs I had to make it a float array. Finally, if at least one light is visible in the tile (render == true), I emit a quad covering the tile. The fragment shader then loops over the lights and shades with all that are marked visible:
#pragma optionNV(unroll all)
for (int i = 0; i < MAX_LIGHTS && i < numLights; i++) {
    if (visible[i] == 0.0) {
        continue;
    }
    int index = i;
    vec3 dPos = lightEyePositions[index] - eyeSpace.xyz;
    vec3 L = normalize(dPos);
    float diffuse = max(dot(N, L), 0.0);
    if (diffuse > 0.0) {
        vec3 H = normalize(L + V);
        float specular = pow(max(0.0, dot(N, H)), glossiness);
        // Schlick's Fresnel approximation applied to the specular term
        specular = (specular + (1.0 - specular) * pow(1.0 - dot(V, H), 5.0)) * specularIntensity;
        float distSqrd = dot(dPos, dPos);
        float distance = sqrt(distSqrd);
        float falloff = max(1.0 - distance / lightIntensities[index], 0.0) / distance;
        light += lightColors[index] * (diffuseColor + specular) *
                 (lightIntensities[index] * diffuse * falloff);
    }
}
Here I encountered a second problem: visible[] cannot hold more than 32 elements (the driver rejects anything larger, presumably because of the limit on varying components between the geometry and fragment stages), so I'm limited to 32 lights per pass.
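The resulting per-pass batching can be sketched on the host side like this (illustrative C++; kMaxLightsPerPass and numPasses are my names, and each pass after the first would be composited with additive blending, e.g. glBlendFunc(GL_ONE, GL_ONE)):

```cpp
#include <cstddef>

// At most 32 visibility flags fit through the geometry->fragment
// interface, so the lights are uploaded and rendered 32 at a time.
constexpr std::size_t kMaxLightsPerPass = 32;

// Number of additive passes needed for a given light count (ceiling division).
std::size_t numPasses(std::size_t totalLights) {
    return (totalLights + kMaxLightsPerPass - 1) / kMaxLightsPerPass;
}
```

With 512 lights this means 16 full passes over every tile, which is likely a large part of the performance gap below.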
Performance with 512 lights:
Tile based shading with 16x16 tiles: 85 FPS.
Tile based shading with 32x32 tiles: 92 FPS.
Traditional deferred shading with depth bounds: 143 FPS.
Traditional deferred shading with stencil marking: 134 FPS.
I'm betting that the workarounds above are slowing things down considerably. The core issues seem to be that I can't build a proper light index list and that I can't process enough lights per pass.
TL;DR
Question: How do I build a light index list if stream processors do not support "dynamically addressed scattered writes"? Would switching to OpenCL even help if this is a hardware limitation?