Tile-based deferred shading light list?


Hello.

I just finished an implementation of tile-based deferred shading using only OpenGL 3.2, but I ran into some problems along the way that I had to hack around.

First I generate the tile frustums and store them in a GL_RGBA32F 2D texture array with 6 layers. The actual culling and lighting are done in the geometry shader and the fragment shader, respectively. Culling looks like this in the geometry shader (the plane lookup itself is sketched right after the shader):


uniform vec3 lightWorldPositions[MAX_LIGHTS];
uniform float lightIntensities[MAX_LIGHTS]; // the intensity doubles as the light's culling radius
uniform int numLights;

// One visibility flag per light; bool arrays aren't allowed as varyings,
// so floats are used instead.
flat out float visible[MAX_LIGHTS];

...


    // 'render' tracks whether any light survives culling for this tile;
    // 'planes' holds the tile's six frustum planes, read from the
    // GL_RGBA32F array texture (see the lookup sketch below).
    bool render = false;
    for(int i = 0; i < MAX_LIGHTS && i < numLights; i++){
        
        vec4 lightPos = vec4(lightWorldPositions[i], 1.0);
        float radius = lightIntensities[i];
        
        // Sphere-vs-frustum test: cull the light if its bounding sphere
        // lies entirely behind any of the six planes.
        bool v = true;
        for(int p = 0; p < 6; p++){
            if(dot(planes[p], lightPos) < -radius){
                v = false;
            }
        }
        render = render || v;
        visible[i] = float(v);
    }
    
    // No light touches this tile: emit nothing.
    if(!render){
        return;
    }
    
    // Tile corners, converted from pixel coordinates to NDC ([-1, 1]).
    vec2 pos0 = tileCoord * tileSize / screenSize * 2.0 - 1.0;
    vec2 pos1 = (tileCoord + 1.0) * tileSize / screenSize * 2.0 - 1.0;
    
    // Emit the tile as a 4-vertex triangle strip.
    gl_Position = vec4(pos0, 0, 1);
    EmitVertex();
    
    gl_Position = vec4(pos0.x, pos1.y, 0, 1);
    EmitVertex();
    
    gl_Position = vec4(pos1.x, pos0.y, 0, 1);
    EmitVertex();
    
    gl_Position = vec4(pos1, 0, 1);
    EmitVertex();
    
    EndPrimitive();
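
For completeness, the planes[] array is read back from the 6-layer plane texture. A minimal sketch of that lookup (tilePlanes is a placeholder name, not the actual uniform from my shader):

// Each layer of the GL_RGBA32F array texture stores one plane as (normal.xyz, d).
uniform sampler2DArray tilePlanes;

    vec4 planes[6];
    for(int p = 0; p < 6; p++){
        planes[p] = texelFetch(tilePlanes, ivec3(ivec2(tileCoord), p), 0);
    }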

Here I ran into a number of problems. My first idea was to build a compacted list of light IDs and output it as an int[] varying, but that turned out to be impossible in GLSL ("lvalue in assignment too complex", see http://www.gamedev.net/topic/518271-are-shaders-really-that-limited/). I ended up simply marking each light as visible or not visible, and since bool arrays aren't supported as varyings I had to make it a float array. Finally, if at least one light is visible in this tile (render == true), the shader emits a quad covering the tile. The fragment shader then simply loops over the lights and shades with every one that is marked visible:


#pragma optionNV(unroll all) // NVIDIA hint: fully unroll the light loop
    for(int i = 0; i < MAX_LIGHTS && i < numLights; i++){
        // Skip lights the geometry shader culled for this tile.
        if(visible[i] == 0.0){
            continue;
        }
        
        vec3 dPos = lightEyePositions[i] - eyeSpace.xyz;
        
        vec3 L = normalize(dPos);
        float diffuse = max(dot(N, L), 0.0);
        
        if(diffuse > 0.0){
            
            vec3 H = normalize(L + V);
            
            // Blinn-Phong specular term...
            float specular = pow(max(0.0, dot(N, H)), glossiness);
            
            // ...with Schlick's Fresnel approximation and intensity.
            specular = (specular + (1.0 - specular) * pow(1.0 - dot(V, H), 5.0)) * specularIntensity;
            
            // Linear falloff out to the light's radius; 'dist' avoids
            // shadowing the built-in distance().
            float distSqrd = dot(dPos, dPos);
            float dist = sqrt(distSqrd);
            float falloff = max(1.0 - dist / lightIntensities[i], 0.0) / dist;
            
            light += lightColors[i] * (diffuseColor + specular) *
                (lightIntensities[i] * diffuse * falloff);
        }
    }

Here I ran into a second problem: the visible[] varying array can't hold more than 32 elements on my hardware, so I'm limited to 32 lights per pass.

Performance with 512 lights:

Tile-based shading with 16x16 tiles: 85 FPS
Tile-based shading with 32x32 tiles: 92 FPS
Traditional deferred shading with depth bounds: 143 FPS
Traditional deferred shading with stencil marking: 134 FPS

I'm betting that the workarounds above are slowing things down considerably. The root of it seems to be that I can't build a proper light index list and that I can't process enough lights per pass.
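
Concretely, the geometry-shader output I wanted looks like the sketch below (hypothetical names); the marked line is the dynamically addressed write that the compiler rejects:

// The compacted light index list I couldn't build: writing through a
// runtime-computed index into a varying array fails with
// "lvalue in assignment too complex" on my hardware.
flat out int lightIndexList[MAX_LIGHTS];
flat out int lightCount;

...

    int count = 0;
    for(int i = 0; i < MAX_LIGHTS && i < numLights; i++){
        if(visible[i] == 1.0){
            lightIndexList[count] = i; // dynamically addressed scattered write
            count++;
        }
    }
    lightCount = count;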

TL;DR

Question: How do I build a light index list if stream processors do not support "dynamically addressed scattered writes"? Would switching to OpenCL even help if this is a hardware limitation?


On AMD use OpenCL, on NVIDIA use GLSL compute shaders. You'll probably see those FPS numbers go higher ;) If anything, you'll learn how to do compute.

There are a number of options there, though. You can go full deferred, or just do the culling in compute shaders and do Forward+, etc.
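
For reference, a minimal sketch of the compute path that reply suggests (GLSL 4.30; all names are hypothetical and the per-tile plane setup is stubbed out as a uniform): one 16x16 work group per tile, with a shared counter and atomicAdd supplying exactly the scattered writes the raster stages lack.

#version 430

// One 16x16 work group per screen tile.
layout(local_size_x = 16, local_size_y = 16) in;

uniform int numLights;

// Eye-space light bounds: xyz = position, w = radius.
layout(std430, binding = 0) buffer LightBuffer {
    vec4 lightBoundsEye[];
};

// Stub: a real implementation derives these per tile from gl_WorkGroupID
// (and the tile's min/max depth) instead of using a single uniform.
uniform vec4 tileFrustumPlanes[6];

shared uint tileLightCount;
shared uint tileLightIndices[512]; // sized for the 512-light test case

void main(){
    if(gl_LocalInvocationIndex == 0u){
        tileLightCount = 0u;
    }
    barrier();
    
    // The 256 threads stride over the lights; same sphere-vs-planes test
    // as the geometry shader above.
    for(uint i = gl_LocalInvocationIndex; i < uint(numLights); i += 256u){
        vec4 l = lightBoundsEye[i];
        bool inside = true;
        for(int p = 0; p < 6; p++){
            if(dot(tileFrustumPlanes[p], vec4(l.xyz, 1.0)) < -l.w){
                inside = false;
            }
        }
        if(inside){
            // The dynamically addressed scattered write the raster stages forbid:
            uint slot = atomicAdd(tileLightCount, 1u);
            tileLightIndices[slot] = i;
        }
    }
    barrier();
    
    // ... shade gl_GlobalInvocationID.xy using tileLightIndices[0 .. tileLightCount) ...
}

The shared index list plays the role of the visible[] varying above, without the 32-element cap.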

