[SOLVED] Deferred Shading light volumes & optimisations

Started by
9 comments, last by Zipster 17 years, 2 months ago
I'm currently implementing a deferred renderer and I've got a lot of it working already. However, I can't get my head around a specific part of the lighting optimisations suggested in various articles.

I fill my g-buffer, then I do an ambient pass as a fullscreen quad. I then switch to additive blending and render directional lights as fullscreen quads. All of this works perfectly. I've got deferred & forward shading working in tandem so I can switch between the two and compare them.

Anyway, my problem is related to point (omni, whatever) lighting. To render the point light, I'm creating a sphere with radius x, where x is the distance at which the light attenuates to 0, i.e. the sphere bounds the light's area of influence. I then render the sphere and, for each pixel on the screen it covers, I execute the point lighting shader. This works, but obviously I get double shading or zero shading on some pixels depending on whether the camera is inside/outside the sphere and whether backface culling is CW/CCW/None.

I've now got to implement robust culling behaviour so that:

A: If the pixels are 'inside' the light volume, the pixel shader runs, calculating the shading contribution for that light.
B: If the light volume is behind an object, the pixel shader should not be run for the screen pixels it is covering.
C: If an object is in front of the light volume, the pixel shader should not run for those pixels.

Now, B and C can be done using depth testing and backface/frontface culling depending on the case. However, I'd like to use the stencil buffer to solve this problem as suggested by NVIDIA's 6800 Leagues deferred shading presentation. You can download that here to see what I mean: Click to download, it's 8 megabytes. Page 12 onwards, specifically page 17.

What I do not understand is page 17. The depth test is less, so only light volume pixels closer to the viewer than the depth buffer contents will pass the test. If the pixel fails the test, then the stencil is set to the reference value which, in the next pass, indicates that a pixel needs to be lit. That's fine. I can see that this algorithm works if an object is inside (pixel shader runs) or behind (pixel shader skipped) the light volume.

What I don't understand is what happens if there is an object between the viewer and the light volume. If this happens then the light volume depth will be greater than the object's depth, so the depth test will fail and the stencil bit will be set, i.e. the pixel shader will run and the lighting will be discarded.

Now, I'm not a terribly experienced graphics programmer (at least when it comes to stencilling and shaders), so the chances are I'm interpreting this incorrectly or should be adding something to it to make it function as expected. Can anyone set me straight or fill in the blanks? Thanks.

[Edited by - Defrag on February 14, 2007 4:25:33 PM]
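To make the three cases in the question concrete, here is a minimal C++ sketch of a single pixel. It is not taken from the presentation: the function name is hypothetical, and it assumes the face being drawn is the far side of the sphere, which is the reading consistent with the behaviour described above.

```cpp
// Hypothetical single-pixel model (an assumption, not NVIDIA's code).
// Depths are in [0, 1]; smaller means closer to the viewer.
// Models one light-volume face drawn with ZFunc = Less and a z-fail
// stencil operation that writes the reference value.
bool markedByZFail(float volumeFaceDepth, float sceneDepth)
{
    // ZFunc = Less: the volume fragment passes only if it is closer
    // than what is already in the depth buffer.
    const bool depthTestPasses = volumeFaceDepth < sceneDepth;

    // On z-fail the stencil is set to the reference value, which the
    // next pass interprets as "light this pixel".
    return !depthTestPasses;
}
```

With the far face at depth 0.6, a surface at 0.5 (inside the volume) is marked and a surface at 0.9 (behind the volume) is not. A surface at 0.1 (in front of the volume) is marked as well, which is exactly the case the question is about.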
You're right, I'm also having difficulty seeing how that would work for objects in front of the light volume. However, my first instinct tells me that if you change the Z-fail op in the first pass from REPLACE to INVERT, you should have something that works. It will basically count the number of times a sphere surface is rendered over an object pixel, and if that number is odd the pixel is inside the volume and the stencil value is ~REF, otherwise it's REF. And of course for objects behind the volume the Z-test won't fail. I haven't implemented deferred shading myself, so that solution is just speculative at best :)
I don't know if Zipster's way works, so i'll post my suggestion.
Try the following :
1) Clear stencil to 1.
2) Render the light volume into the stencil buffer so that:
a) For front faces, when the depth test fails, increment the stencil; otherwise do nothing
b) For back faces, when the depth test fails, decrement the stencil; otherwise do nothing
3) Set stencil test to pass if the stencil value is equal to 0.
4) Draw your scene.

This is basically (iirc) the inverse of what you do with z-fail stencil shadow volumes.

Note that this works only for one light source at a time, or for any number of light sources as long as they don't overlap in screen space. I haven't worked with deferred shading, so i can't comment on that part, but i'm using this approach in a forward renderer to minimize the number of shaded pixels (i'm doing multiple passes, so i must do whatever i can to make it fast).
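The numbered steps above can be checked per pixel with a small C++ sketch. This is an illustrative model only: the struct and function names are hypothetical, the depth test is assumed to be "less", and the `frontVisible` flag is an added assumption that models the camera sitting inside the sphere, where the front face is never rasterised.

```cpp
#include <cstdint>

// Hypothetical per-pixel model of the steps above. Depths are in [0, 1],
// smaller = closer to the viewer.
struct VolumeAtPixel
{
    float frontDepth;    // depth of the sphere's front face at this pixel
    float backDepth;     // depth of the sphere's back face at this pixel
    bool  frontVisible;  // false when the camera is inside the sphere
};

std::uint8_t stencilAfterVolume(const VolumeAtPixel& v, float sceneDepth)
{
    std::uint8_t stencil = 1;  // 1) clear stencil to 1

    // 2a) front face: increment when the depth test (less) fails
    if (v.frontVisible && !(v.frontDepth < sceneDepth))
        ++stencil;

    // 2b) back face: decrement when the depth test (less) fails
    if (!(v.backDepth < sceneDepth))
        --stencil;

    return stencil;
}

bool pixelIsShaded(const VolumeAtPixel& v, float sceneDepth)
{
    // 3) the stencil test passes only where the value is 0
    return stencilAfterVolume(v, sceneDepth) == 0;
}
```

For a volume spanning depths 0.4 to 0.6, a surface at 0.5 ends with stencil 0 and is shaded. A surface at 0.1 (both faces fail, so the increment and decrement cancel) and a surface at 0.9 (neither face fails) both stay at 1 and are skipped.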

Hope that helps.

HellRaiZer
Thanks for taking the time to offer your suggestions.

HellRaiZeR: I've read somewhere (possibly another DS presentation) that light volume optimisations can be done in a manner very similar to shadow volume stencilling, so I think your solution definitely fits the bill.

I'll give that a try as it sounds fairly straightforward.
Actually, that approach is functionally equivalent to mine. Think about how it works... for failing backfaces it decreases the stencil value, for failing frontfaces it increases it. This produces the same "cancellation" effect as inversion. The number of frontfaces (F) and the number of backfaces (B) rendered per pixel are only ever going to be off by one at most (for volumes), which means that you really don't have to worry about individual frontfaces and backfaces, but rather the sum F + B. If it's even, it implies F = B and the point is outside the volume; if it's odd, it implies F = B - 1 and the point is inside the volume (the standard horizontal ray-trace method for determining if a point is inside a polygon). So if we let the number of inversions be equal to F + B, the final stencil value will be REF for an even number of inversions, and ~REF for an odd number of inversions.
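The parity argument can be sketched in a few lines of C++. This is only an illustration under stated assumptions: an 8-bit stencil buffer that starts at REF, with hypothetical function names, where F and B are the numbers of front and back faces that fail the depth test at a pixel.

```cpp
#include <cstdint>

// Hypothetical model of StencilZFail = Invert on an 8-bit stencil buffer.
// The stencil is assumed to start at 'ref'; each failing face inverts it.
std::uint8_t stencilAfterInversions(std::uint8_t ref, int failingFaces)
{
    std::uint8_t stencil = ref;
    for (int i = 0; i < failingFaces; ++i)
        stencil = static_cast<std::uint8_t>(~stencil);
    return stencil;
}

// F = failing front faces, B = failing back faces at this pixel.
// An odd F + B leaves ~REF in the stencil, meaning "inside the volume".
bool insideVolume(int failingFront, int failingBack)
{
    const std::uint8_t ref = 1;
    return stencilAfterInversions(ref, failingFront + failingBack) != ref;
}
```

Note that with an 8-bit buffer and REF = 1, the inverted value is 0xFE rather than 0, so under this model the shading pass would compare against ~REF instead of 0.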
Yeah I'm just going with the solution that seems most recognisable to me -- I'm not saying yours wouldn't work :)

I've got it mostly working now but my stencil isn't behaving itself (the light is drawn correctly with no double shading, but every pixel is passing the stencil test regardless of whether it is meant to be considered for shading). I'll post some code later if I can't get it working.
Got it working :) Thanks for helping me.

Click to see stencil clipping mask

Well I learned something today and it was more to do with effect files than not. I assumed the effect framework used default effect states where they are not specified, but it turns out that they carry from previous effects.

I had set the two-sided stencil to do the stencil buffer update, but then on the shading pass I didn't change it back to single sided.
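The carry-over behaviour described here can be illustrated with a toy C++ model. This is not how the D3DX effect framework is implemented; it is a hypothetical state table showing that a pass which only writes the states it lists will inherit everything else from whatever ran before it.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the device's render states (name -> value).
using RenderStates = std::map<std::string, int>;

// Applying a pass only touches the states that the pass lists;
// every other state keeps its previous value.
void applyPass(RenderStates& device, const RenderStates& pass)
{
    for (const auto& state : pass)
        device[state.first] = state.second;
}

// Returns the two-sided stencil setting that is active while the
// shading pass runs, with or without an explicit reset in that pass.
int twoSidedDuringShading(bool shadingPassResetsIt)
{
    RenderStates device = { { "TwoSidedStencilMode", 0 } };

    RenderStates stencilPass = { { "TwoSidedStencilMode", 1 },
                                 { "StencilEnable", 1 } };
    RenderStates shadingPass = { { "StencilEnable", 1 } };
    if (shadingPassResetsIt)
        shadingPass["TwoSidedStencilMode"] = 0;

    applyPass(device, stencilPass);
    applyPass(device, shadingPass);
    return device["TwoSidedStencilMode"];
}
```

In this model `twoSidedDuringShading(false)` still reports two-sided stencil as enabled, which mirrors the bug described above; listing the state explicitly in the shading pass clears it.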

Anyway here's the gist of my code. Please just ignore the vPos stuff -- I couldn't get my head around the maths required to take the mesh and, from its homogeneous coordinates, get the 2d texcoords required to look up the g-buffer attributes. I'm not a fan of cutting & pasting equations when I can't understand them so I used the vPos register as an easy alternative, but I'll get around to it at some stage (as I'd imagine that it's cheaper than doing a divide for every pixel being shaded). Also ignore my attenuation calculation because it's crap. I'll fix that in future too.

// Deferred.fx
technique StencilConvexLight
{
    pass Pass0_DoubleSidedStencil
    {
        VertexShader        = compile vs_2_0 pos_vs_main();
        PixelShader         = null;

        ColorWriteEnable    = 0x0;
        CullMode            = none;

        // Disable writing to the frame buffer
        AlphaBlendEnable    = true;
        SrcBlend            = Zero;
        DestBlend           = One;

        // Disable writing to depth buffer
        ZWriteEnable        = false;
        ZEnable             = true;
        ZFunc               = Less;

        // Setup stencil states
        StencilEnable       = true;
        TwoSidedStencilMode = true;

        StencilRef          = 1;
        StencilMask         = 0xFFFFFFFF;
        StencilWriteMask    = 0xFFFFFFFF;

        // stencil settings for front facing triangles
        StencilFunc         = Always;
        StencilZFail        = Incr;
        StencilPass         = Keep;

        // stencil settings for back facing triangles
        Ccw_StencilFunc     = Always;
        Ccw_StencilZFail    = Decr;
        Ccw_StencilPass     = Keep;
    }

    pass Pass1_PointLightDiffuse
    {
        VertexShader        = compile vs_3_0 pos_vs_main();
        PixelShader         = compile ps_3_0 Pass4_diffuse_point_ps_main();

        ZEnable             = false;
        ZWriteEnable        = false;

        AlphaBlendEnable    = true;
        SrcBlend            = One;
        DestBlend           = One;

        CullMode            = CW;

        ColorWriteEnable    = 0xFFFFFFFF;
        StencilEnable       = true;
        TwoSidedStencilMode = false;
        StencilFunc         = Equal;
        StencilFail         = Keep;
        StencilZFail        = Keep;
        StencilPass         = Keep;
        StencilRef          = 0;
        StencilMask         = 0xFFFFFFFF;
        StencilWriteMask    = 0xFFFFFFFF;
    }

    pass Pass2_ShowStencilResult
    {
        VertexShader        = compile vs_3_0 pos_vs_main();
        PixelShader         = compile ps_3_0 stencil_bright_light_ps_main();

        ZEnable             = false;
        ZWriteEnable        = false;

        AlphaBlendEnable    = true;
        SrcBlend            = One;
        DestBlend           = One;

        CullMode            = CW;

        ColorWriteEnable    = 0xFFFFFFFF;
        StencilEnable       = true;
        TwoSidedStencilMode = false;
        StencilFunc         = Equal;
        StencilFail         = Keep;
        StencilZFail        = Keep;
        StencilPass         = Keep;
        StencilRef          = 0;
        StencilMask         = 0xFFFFFFFF;
        StencilWriteMask    = 0xFFFFFFFF;
    }
}


// Shaders
struct PS_INPUT_TEXVPOS
{
    float2 TexCoords : TEXCOORD0;
    float2 vPos : VPOS;
};

VS_OUTPUT_POS pos_vs_main( VS_INPUT_POS Input )
{
    VS_OUTPUT_POS Out;
    Out.Position = mul( Input.Position, matWorldViewProjection );
    return Out;
}

float4 stencil_bright_light_ps_main( PS_INPUT_TEXVPOS Input ) : COLOR
{
    float2 coords = Input.vPos.xy / ScreenSize.xy;

    half4 pixelDiffuse = tex2D( diffuseSampler, coords );
    float4 colour = pixelDiffuse * float4(0.0f, 0.0f, 1.0f, 1.0f);
    return colour;
}

float4 Pass4_diffuse_point_ps_main( PS_INPUT_TEXVPOS Input ) : COLOR
{
    // Co-ords for texture lookups = pixel rasterisation pos / screen dimensions (e.g. 512/1024 = 0.5f)
    float2 coords = Input.vPos.xy / ScreenSize.xy;

    // retrieve data from g-buffer
    half3 pixelViewPos    = tex2D( positionSampler, coords ).xyz;
    half3 pixelViewNormal = tex2D( normalSampler, coords ).xyz;
    half4 pixelDiffuse    = tex2D( diffuseSampler, coords );

    // calculate a direction vector going from the pixel's view space position to the view space light position
    half3 fvPixelToLight = normalize( fvViewSpaceLightPosition - pixelViewPos );
    half intensity = saturate( dot( pixelViewNormal, fvPixelToLight ));

    // linear falloff.  Translate distance from light and maxdist into the range 0...1.
    // e.g. if a light has a range of 100 and the object is 20 units away from it:
    // (100 - 20) / 100 = 80 / 100 = 0.8 of max brightness.
    half attenuation = ( fLightMaxRange - distance( fvViewSpaceLightPosition, pixelViewPos )) / fLightMaxRange;

    // clamp to 0 if attenuation is negative
    attenuation = saturate( attenuation );

    float4 colour = pixelDiffuse * fvLightColour * intensity * attenuation;

    return colour;
}


// D3DDeferredRenderer.cpp
//
//--------------------------------------------------------------//
// Point Light Pass (Stencil optimised)
//--------------------------------------------------------------//
D3DXHANDLE hStencilConvexLight = pEffect->GetTechniqueByName( "StencilConvexLight" );
assert( hStencilConvexLight );

if( SUCCEEDED ( pEffect->SetTechnique( hStencilConvexLight )))
{
    UINT uiNumPasses;
    pEffect->Begin( &uiNumPasses, 0 );

    DWORD dwMeshFVF = m_pSphereMesh->GetFVF();
    m_pDevice->SetFVF( dwMeshFVF );

    for( list< LightPoint_s* >::iterator i = pScene->m_pLightPoint.begin(); i != pScene->m_pLightPoint.end(); i++ )
    {
        m_pDevice->Clear(0, NULL, D3DCLEAR_STENCIL, D3DCOLOR_ARGB(0, 0, 0, 0), 1.0f, 1 ); // clear stencil buffer to hold 1 as default value

        // set light position in world & transform into view space then set up light colour
        D3DXVECTOR3 lightWorldPos = (*i)->position;
        D3DXVECTOR3 lightViewPos;
        D3DXVec3TransformCoord( &lightViewPos, &lightWorldPos, &matView );

        pEffect->SetFloatArray( hLightPosition, (float*)&lightViewPos, 3 );
        pEffect->SetVector( hLightColour, &(*i)->colour );

        float fScale = (*i)->maxrange;
        pEffect->SetFloat( hLightMaxRange, fScale );

        // scale light volume mesh
        D3DXMATRIX scale, translate, sphereMatWVP;
        D3DXMatrixTranslation( &translate, lightWorldPos.x, lightWorldPos.y, lightWorldPos.z );
        D3DXMatrixScaling( &scale, fScale, fScale, fScale );
        sphereMatWVP = scale * translate * matView * matProjPersp;
        pEffect->SetMatrix( hWVP, &sphereMatWVP );

        pEffect->CommitChanges( );

        // render light volume to stencil buffer, leaving 0s where the light intersects scene geometry
        pEffect->BeginPass( 0 );
        m_pSphereMesh->DrawSubset( 0 );
        pEffect->EndPass();

        // render second pass, evaluating the lighting contribution for each pixel with stencil bit set to 0
        pEffect->BeginPass( 1 );
        m_pSphereMesh->DrawSubset( 0 );
        pEffect->EndPass();
    }

    pEffect->End();
}
Out of curiosity (since I'm always looking for ways to optimize stuff), how much of a boost do you get doing this? Currently, I'm just using a basic scissor rect over my projected light quad, but this covers things like objects in front of the light volume, as well. I'm curious how much boost these extra steps give you?
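For comparison, a scissor rectangle for a point light might be computed along these lines. The post does not say how the rectangle is derived, so this C++ sketch is only one conservative possibility: it projects the view-space bounding box of the light's sphere. It assumes a left-handed view space with +z forward, and takes `xScale` and `yScale` to be the first two diagonal entries of the perspective projection matrix. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

struct ScissorRect { int left, top, right, bottom; };

// Conservative screen-space rectangle for a sphere at view-space position
// (cx, cy, cz) with the given radius. Falls back to the full screen when
// the sphere touches or crosses the near plane.
ScissorRect lightScissorRect(float cx, float cy, float cz, float radius,
                             float xScale, float yScale, float nearZ,
                             int width, int height)
{
    const ScissorRect fullScreen = { 0, 0, width, height };
    if (cz - radius <= nearZ)
        return fullScreen;

    // Project the 8 corners of the sphere's bounding box and track the
    // extents in normalised device coordinates.
    float minX = 1.0f, maxX = -1.0f, minY = 1.0f, maxY = -1.0f;
    for (int i = 0; i < 8; ++i)
    {
        const float x = cx + ((i & 1) ? radius : -radius);
        const float y = cy + ((i & 2) ? radius : -radius);
        const float z = cz + ((i & 4) ? radius : -radius);
        const float ndcX = x * xScale / z;
        const float ndcY = y * yScale / z;
        minX = std::min(minX, ndcX);
        maxX = std::max(maxX, ndcX);
        minY = std::min(minY, ndcY);
        maxY = std::max(maxY, ndcY);
    }

    // Clamp to the visible range.
    minX = std::max(minX, -1.0f);
    maxX = std::min(maxX, 1.0f);
    minY = std::max(minY, -1.0f);
    maxY = std::min(maxY, 1.0f);

    // NDC y points up while screen y points down, so top comes from maxY.
    ScissorRect rect;
    rect.left   = static_cast<int>(std::floor((minX * 0.5f + 0.5f) * width));
    rect.right  = static_cast<int>(std::ceil((maxX * 0.5f + 0.5f) * width));
    rect.top    = static_cast<int>(std::floor((0.5f - maxY * 0.5f) * height));
    rect.bottom = static_cast<int>(std::ceil((0.5f - minY * 0.5f) * height));
    return rect;
}
```

A rectangle like this bounds the light in x and y only. It says nothing about depth, which is why it still covers objects in front of the light volume, as noted above.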
Quote:Original post by Defrag
Please just ignore the vPos stuff -- I couldn't get my head around the maths required to take the mesh and, from its homogeneous coordinates, get the 2d texcoords required to look up the g-buffer attributes. I'm not a fan of cutting & pasting equations when I can't understand them so I used the vPos register as an easy alternative, but I'll get around to it at some stage (as I'd imagine that it's cheaper than doing a divide for every pixel being shaded).

Using vPos is the best way IMHO... doing the math yourself is just repeating the work that the hardware did, and it's actually a bit unstable on light volume polygon borders sometimes. I see no reason not to just use vPos, as it should never be slower than computing the equivalent yourself.

Quote:Original post by xycsoscyx
Out of curiosity (since I'm always looking for ways to optimize stuff), how much of a boost do you get doing this? Currently, I'm just using a basic scissor rect over my projected light quad, but this covers things like objects in front of the light volume, as well. I'm curious how much boost these extra steps give you?

I haven't run many tests yet so I can't say. My application has some rudimentary performance counters and it dumps the average fps though, so I'll run some tests tomorrow and post a few results. Stencil isn't an automatic performance boost from what I've read, but I have a feeling it should do the business, especially when you're inside the light volume or there's a lot of geometry that it overlaps, but doesn't shade.

Quote:Using vPos is the best way IMHO... doing the math yourself is just repeating the work that the hardware did, and it's actually a bit unstable on light volume polygon borders sometimes. I see no reason not to just use vPos, as it should never be slower than computing the equivalent yourself.

I was under the impression that since you only calculate various things per vertex and the divide can be done via a projected texture lookup in the pixel shader (essentially free if I recall correctly?) then the vPos method would be slower (since you have to do vPos / screenres for every shaded pixel) but I guess it would possibly depend on the screen coverage, too.

Small coverage: Per vertex calculations would probably cost more than the divide per pixel
Large coverage: Per pixel divide would probably cost more than the per-vertex calculations

I could be talking nonsense, though. :P I suppose that's the great thing about deferred shading though: it'd be easy enough to find out as the screen space coverage of lights makes the costs fairly predictable.

This topic is closed to new replies.
