Shadow volume optimization problem

Started by
9 comments, last by mohamed adel 19 years, 6 months ago
Hi all,

I'm trying to optimize my code for FPS, since I think it's terribly slow. As far as I can tell, the main bottleneck is the stencil shadow part. I've created a benchmark that rotates the camera around an object in 1-degree steps, so I have 360 RenderScreen calls, and I measure the total time (and fps).

The scene consists of one object with 5534 vertices and 3088 tris. It uses about 5 materials, and the parts are rendered sorted by material. I have one pass for the non-transparent parts and another for the transparent ones. I use only one light source.

If I don't switch on anything except transparency, I get 2.35 secs (152 fps). That's not great, but it would be enough. But when I switch on z-fail shadows, it drops to 23.68 secs (15.2 fps). So I tried commenting out all render state changes connected with the stencil shadows and rendered the shadow volume as it is (white); I got 9.3 secs (38 fps). The object and the shadow volume together consist of 21100 vertices and 11536 tris.

I'm posting the relevant code here; maybe you can spot something. (It compiles with both dx8 and dx9 via some #defines. I'm talking about the dx9 part now, since that's what I'm measuring.)

Device creation params:

void DDRENDERING_VIEW::BuildPresentParamsFromSettings(CD3DSettings& d3dSettings)
{
    d3dParams.Windowed               = true;
    d3dParams.BackBufferCount        = 1;	// means double buffering (1 back buffer)
    d3dParams.MultiSampleType        = D3DMULTISAMPLE_NONE;

    d3dParams.SwapEffect             = D3DSWAPEFFECT_DISCARD;
    d3dParams.EnableAutoDepthStencil = true;
    d3dParams.hDeviceWindow          = m_pView->GetSafeHwnd();
#if defined(DX81M_MODULE)
	d3dParams.Flags					 = 0;
	d3dParams.FullScreen_PresentationInterval = 0;
	d3dParams.SwapEffect			 = D3DSWAPEFFECT_COPY;
#elif defined(DX9M_MODULE)
	d3dParams.Flags					 = D3DPRESENTFLAG_DISCARD_DEPTHSTENCIL;
    d3dParams.MultiSampleQuality     = d3dSettings.MultisampleQuality();
	d3dParams.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE; 
#endif
	d3dParams.AutoDepthStencilFormat = d3dSettings.DepthStencilBufferFormat();
	d3dParams.BackBufferWidth		= clientRect.right - clientRect.left;
	d3dParams.BackBufferHeight		= clientRect.bottom - clientRect.top;
	d3dParams.BackBufferFormat		= d3dSettings.PDeviceCombo()->BackBufferFormat;
	d3dParams.FullScreen_RefreshRateInHz = 0;
} //BuildPresentParamsFromSettings


This is how I create the shadow volume to render. (The shadow volume is computed in our own geometry module, and the data is loaded from there. That code is irrelevant here, since the volume is computed only once and just rendered afterwards: static world, only the camera moves.) I'm posting this part because of the D3DXCreateMeshFVF and OptimizeInplace calls:

void GEOMETRYELEMDD::InitializeShadowVolumes(void)
{
	EM::GeometryElem* ge = GetGeometryElem();
	assert(ge != NULL);
	if (ge == NULL)
		return;

	for (long iShadowVol = 0; iShadowVol < ge->GetShadowVolumeCount(); iShadowVol++) {
		if (ge->GetShadowVolume(iShadowVol) == NULL) 
			continue;

		const Mesh3D& mesh = *(ge->GetShadowVolume(iShadowVol));

		nShadowTriangleVertices	= mesh.GetTriVertexCount();
		nShadowTriangles			= mesh.GetTriangleCount();

		HRESULT hr;
		ID3DXMesh* shadowVolumeMesh = NULL;
		if (FAILED(hr = D3DXCreateMeshFVF(nShadowTriangles, nShadowTriangleVertices, D3DXMESH_SYSTEMMEM | D3DXMESH_WRITEONLY, D3DFVF_XYZ, rView->pd3dDevice, &shadowVolumeMesh)))
			continue;

		assert(shadowVolumeMesh != NULL);
		if (shadowVolumeMesh == NULL)
			continue;

		// II.1. Fill in the vertex data
		D3DXVECTOR3*  triVertices = NULL;
		if (FAILED(hr = shadowVolumeMesh->LockVertexBuffer(0, (DX89(BYTE,void)**)&triVertices)))
			continue;

		for (long i = 0; i < nShadowTriangleVertices; i++) {
			const GM::TriVertex3D& trivertex = mesh.GetTriVertex(i);
			CCVector3D pos = trivertex.pos;
			ConvertVector3D2D3DXVector(pos, triVertices[i]); 
			ConvertMeter2MiliMeter(triVertices[i]);
		}

		// II.2. Fill in index, texture data
		WORD* triangleIndices = NULL;
		m_userIdPerTriangle = new long[nTriangles];
		if (FAILED(hr = shadowVolumeMesh->LockIndexBuffer(0, (DX89(BYTE,void)**)&triangleIndices)))
			continue;

		for (long i = 0; i < nShadowTriangles; i++) {
			const GM::Triangle3D& triangle = mesh.GetTriangle(i);
			triangleIndices[3 * i]	    = triangle.triVert1Id;
			triangleIndices[3 * i + 1]	= triangle.triVert2Id;
			triangleIndices[3 * i + 2]	= triangle.triVert3Id;
		}

		shadowVolumeMesh->UnlockVertexBuffer();
		shadowVolumeMesh->UnlockIndexBuffer();

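		// optimize the mesh in place (compact, attribute sort, vertex cache order)
		DWORD *adjacency = new DWORD[3*shadowVolumeMesh->GetNumFaces()];
		hr = shadowVolumeMesh->GenerateAdjacency(0.0f, adjacency);
		hr = shadowVolumeMesh->OptimizeInplace(D3DXMESHOPT_COMPACT | D3DXMESHOPT_ATTRSORT | D3DXMESHOPT_VERTEXCACHE, adjacency, NULL, NULL, NULL);
		delete[] adjacency;	// the adjacency array is only needed during optimization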

		dxShadowVolumeMeshes.Add(shadowVolumeMesh);
	}
} // InitializeShadowVolumes


This is where I set up things for shadow rendering (both single and two sided stencil):

void DDRENDERING_VIEW::RenderZFailShadow(const EM::Camera& camera)
{
	CCDX9::CD3DStateGuard guard(pd3dDevice);

	long currLight = 0;
	long maxLights = GetShadowCastableLightMaxIndex();
	for (long iLight = 0; iLight < maxLights; iLight++) {
		EM::Light* light = m_elementManager->GetLight(iLight);
		if (light == NULL || !light->IsCastShadow())
			continue;
		SetLight(iLight, m_graphicsSettings.MustRenderShadow());
		//pd3dDevice->LightEnable(0, FALSE);

		// depth buffer writing OFF
		guard.SetRenderState( D3DRS_ZWRITEENABLE, FALSE );
		// clear stencil buffer
		//pd3dDevice->Clear(0, NULL, D3DCLEAR_STENCIL, D3DCOLOR_XRGB(0,0,0), 1.0f, 0);
		// turn OFF colour buffer (now we wanna write to the stencil buffer only)
		guard.SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
		guard.SetRenderState(D3DRS_SRCBLEND, D3DBLEND_ZERO); 
		guard.SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE); 
		// disable lighting (not needed for stencil writes anyway)
		guard.SetRenderState(D3DRS_LIGHTING, FALSE);
		// turn ON stencil buffer
		guard.SetRenderState( D3DRS_STENCILENABLE, TRUE );
		guard.SetRenderState( D3DRS_STENCILFUNC,  D3DCMP_ALWAYS );
		guard.SetRenderState( D3DRS_CCW_STENCILFUNC,  D3DCMP_ALWAYS );

		if (m_use2SidedStencil) {
			//set stencil to increment if z test fails, else keep
			guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR );
			//set stencil to decrement if z test fails, else keep
			guard.SetRenderState( D3DRS_CCW_STENCILPASS, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_CCW_STENCILFAIL, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_CCW_STENCILZFAIL, D3DSTENCILOP_DECR );
			//render back faces
			guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_NONE );
			guard.SetRenderState( D3DRS_TWOSIDEDSTENCILMODE, TRUE );

			// render shadow volumes for this light
			for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
				EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
				if(elem == NULL || !elem->IsDrawable() || (elem->GetShowType() != EM::Shaded))
					continue;
				guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(m_zSlopeScaleZFail));
				guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(m_zBiasZFail)); // 0.000001 works quite well
				RenderShadow(camera, elem, currLight);
				guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(0.0));
				guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(0.0));
			}

			// switch off 2 sided stencil
			guard.SetRenderState( D3DRS_TWOSIDEDSTENCILMODE, FALSE );
			// from now, render front faces again (normal operation)
			guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CCW );
		}
		else {
			//set stencil to increment if z test fails, else keep
			guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR );
			//render back faces
			guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CW );

			// render shadow volumes for this light
			for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
				EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
				if(elem == NULL || !elem->IsDrawable() || (elem->GetShowType() != EM::Shaded))
					continue;
				guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(m_zSlopeScaleZFail));
				guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(m_zBiasZFail));
				RenderShadow(camera, elem, currLight);
				guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(0.0));
				guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(0.0));
			}

			//set stencil to decrement if z test fails, else keep
			guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
			guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_DECR );
			//render front faces
			guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CCW );

			// render shadow volumes for this light
			for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
				EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
				if(elem == NULL || !elem->IsDrawable() || (elem->GetShowType() != EM::Shaded))
					continue;
				guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(m_zSlopeScaleZFail));
				guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(m_zBiasZFail));
				RenderShadow(camera, elem, currLight);
				guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(0.0));
				guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(0.0));
			}

			// from now, render front faces again (normal operation)
			guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CCW );
		} // if (m_use2SidedStencil) 

		// switch lighting ON
		guard.SetRenderState(D3DRS_LIGHTING, TRUE);

		// now draw only if stencil buffer enables it
		guard.SetRenderState( D3DRS_STENCILENABLE, TRUE );

		// reset stencil ops
		guard.SetRenderState( D3DRS_STENCILREF,  0x0 );
		guard.SetRenderState( D3DRS_STENCILFUNC, D3DCMP_EQUAL );
		guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
		guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
		guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_KEEP );
		guard.SetRenderState( D3DRS_CCW_STENCILPASS, D3DSTENCILOP_KEEP );
		guard.SetRenderState( D3DRS_CCW_STENCILFAIL, D3DSTENCILOP_KEEP );
		guard.SetRenderState( D3DRS_CCW_STENCILZFAIL, D3DSTENCILOP_KEEP );

		// turn on the current light, disable all others
		EM::DirectionalLight* dlight = dynamic_cast<EM::DirectionalLight*>(m_elementManager->GetLight(iLight));
		if (dlight != NULL && iLight == 0 && dlight->GetDirection().z > 0)		// Sun doesn't shine at night
			pd3dDevice->LightEnable(0, FALSE);
		else
			pd3dDevice->LightEnable(0, TRUE);
		guard.SetRenderState( D3DRS_AMBIENT, 0x00000000 );

		//turn ON colour buffer, additive
		guard.SetRenderState( D3DRS_ALPHABLENDENABLE, TRUE );
		guard.SetRenderState( D3DRS_SRCBLEND, D3DBLEND_ONE );
		guard.SetRenderState( D3DRS_DESTBLEND, D3DBLEND_ONE );

		// grass, render if only no background
		Grid*	grid = m_elementManager->GetGrid();
		if (m_backgroundDD == NULL && grid->GetShowType() == EM::Shaded && grid->IsDrawable())
			Render(grid);

		// render scene
		for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
			EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
			if (elem == NULL || !elem->IsDrawable() || elem->GetShowType() == EM::LinesOnly)
				continue;
			Render(elem);
		}

		SetLight(iLight, false);
		currLight++;
	}	// END FOR

	SetLight(0, m_graphicsSettings.MustRenderShadow());
}

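For readers following the stencil setup above, here is a minimal CPU-side sketch of what the z-fail counting does at a single pixel. It is not part of the posted code: `VolumeFragment`, `ZFailStencil` and `IsLit` are hypothetical names, the depth test is assumed to be less-or-equal, and depth bias is ignored. The counter wraps around the way D3DSTENCILOP_INCR and D3DSTENCILOP_DECR do on an 8-bit stencil.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One shadow volume fragment covering the pixel under test (hypothetical type).
struct VolumeFragment {
    float depth;      // distance from the camera
    bool  backFacing; // true for faces pointing away from the camera
};

// Returns the stencil value left at one pixel after the z-fail pass.
// sceneDepth is the depth already stored in the z-buffer by the scene pass.
std::uint8_t ZFailStencil(float sceneDepth,
                          const std::vector<VolumeFragment>& fragments)
{
    std::uint8_t stencil = 0; // 8-bit stencil, as in a D24S8 buffer
    for (const VolumeFragment& f : fragments) {
        // Assumed less-or-equal depth test: a fragment behind the scene fails.
        const bool depthTestFails = f.depth > sceneDepth;
        if (!depthTestFails)
            continue;   // depth test passed: stencil op is KEEP
        if (f.backFacing)
            ++stencil;  // back face failed the depth test: increment (wraps)
        else
            --stencil;  // front face failed the depth test: decrement (wraps)
    }
    return stencil;
}

// The final lit pass draws only where the stencil equals the reference value 0.
bool IsLit(float sceneDepth, const std::vector<VolumeFragment>& fragments)
{
    return ZFailStencil(sceneDepth, fragments) == 0;
}
```

With a volume whose front face is at depth 3 and whose back face is at depth 8, a surface at depth 5 ends up with a nonzero stencil (shadowed), while surfaces at depth 2 or depth 10 end up with zero (lit). Because the counter wraps, swapping which facing increments and which decrements does not change the zero/nonzero result.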

And this is the actual shadow volume rendering (invoked by the RenderShadow call above):

void GEOMETRYELEMDD::RenderShadowVolume(long iLight)
{
	if (iLight < 0 || iLight >= dxShadowVolumeMeshes.GetCount())
		return;

	SetTransform(NULL);
	dxShadowVolumeMeshes[iLight]->DrawSubset(0);

	rView->m_verticesDrawn += nShadowTriangleVertices;
	rView->m_trisDrawn += nShadowTriangles;
}


I've now tried everything I've ever read about on forums and in tutorials. I know that DrawSubset isn't that good, but it has to draw the entire shadow volume here, so I don't think anything else would perform significantly better. As you can see, I tried optimizing the mesh and creating the vertex and index buffers write-only. I don't think I have too many render state changes (I can't do with fewer; I need all of them for correct operation).

So why is this code so terribly slow? Any ideas?

Thanks for your kind help,
Peter
------------------------------------------------------------Neo, the Matrix should be 16-byte aligned for better performance!
For your information:

The application runs windowed at around 1100x800, with R8G8B8 color and a D24S8 depth/stencil buffer.
The machine has a Radeon 9600 and a P4 at 2.4 GHz.

Peter
Does the frame rate increase when you rotate the camera away from the shadow volume (so that the shadow is not visible)?

I have only one suggestion.

In my engine I use a shadow buffer for every critical VB. That means I keep another copy in system memory, so I never need to lock the buffer to read the primitives back.
I think it would boost your fps too.
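
One possible reading of the "shadow buffer" suggestion above, sketched without any Direct3D calls: the CPU keeps its own copy of the vertex data and only ever writes to the GPU buffer. `WriteOnlyBuffer` is a hypothetical stand-in for a vertex buffer created write-only, and `ShadowedVertexBuffer`, `SetVertex` and `Flush` are illustrative names, not anything from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a GPU vertex buffer that is only ever written to.
class WriteOnlyBuffer {
public:
    explicit WriteOnlyBuffer(std::size_t vertexCount) : storage_(vertexCount) {}

    void Upload(const Vec3* src, std::size_t count) {
        assert(count <= storage_.size());
        if (count > 0)
            std::memcpy(storage_.data(), src, count * sizeof(Vec3));
        ++uploads_;
    }
    std::size_t UploadCount() const { return uploads_; }

private:
    std::vector<Vec3> storage_;
    std::size_t uploads_ = 0;
};

// Keeps a system-memory copy next to the GPU buffer. All reads are served
// from the copy, so the GPU buffer never has to be locked for reading.
class ShadowedVertexBuffer {
public:
    explicit ShadowedVertexBuffer(std::vector<Vec3> vertices)
        : cpuCopy_(std::move(vertices)), gpu_(cpuCopy_.size()) {
        gpu_.Upload(cpuCopy_.data(), cpuCopy_.size()); // initial one-time upload
    }

    const std::vector<Vec3>& Vertices() const { return cpuCopy_; }

    void SetVertex(std::size_t i, const Vec3& v) {
        cpuCopy_[i] = v;
        dirty_ = true; // defer the upload until Flush()
    }

    // Pushes pending edits to the GPU buffer; does nothing if nothing changed.
    void Flush() {
        if (!dirty_)
            return;
        gpu_.Upload(cpuCopy_.data(), cpuCopy_.size());
        dirty_ = false;
    }

    const WriteOnlyBuffer& Gpu() const { return gpu_; }

private:
    std::vector<Vec3> cpuCopy_;
    WriteOnlyBuffer   gpu_;
    bool              dirty_ = false;
};
```

For a static shadow volume like the one in this thread the copy is uploaded once and `Flush` never triggers again, so the benefit would mainly show up in code that reads geometry back (for example, when recomputing silhouettes).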
Ironically, I was just reading an entire chapter in the book GPU Gems about optimizing shadow volumes. I would pick it up at the book store and check it out. It has numerous pages on this subject and is very informative. I'm sure that if you follow the guide you'll achieve significant improvements in performance.
Thanks for your answers, guys!

Mohamed: Yes, fps decreases when the shadows are visible. What's the implication then?

EverIce: how do I do that exactly?

toysnob: I'll try to find the book. Unfortunately, books like this aren't sold in book stores here. Thanks for the tip!

Peter
Just one thing I want to make sure of in your message: when you disable the stencil buffer, the frame rate is 38 fps? Does this mean that when you render a simple mesh (or meshes) with no materials or effects, you get 38 fps?

I get 38 fps when I render the meshes as they are and render the shadow volumes as simple meshes too (that is, I don't switch on stenciling, and I don't change lights, alpha blending, z-bias, or anything else). This way the shadow volumes appear as white meshes.

What are you thinking about? I'm very interested now :)

Peter
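
To make the three benchmark totals easier to compare, they can be converted to time per frame. The sketch below assumes the 360-frame run described in the first post; `MsPerFrame`, `FramesPerSecond` and `BreakDown` are hypothetical helper names. Treating the differences between runs as separate costs is only a rough approximation, since CPU and GPU work overlap.

```cpp
#include <cassert>
#include <cmath>

constexpr int kFrames = 360; // one frame per degree of camera rotation

// Average time per frame in milliseconds for a run of `frames` frames.
double MsPerFrame(double totalSeconds, int frames = kFrames)
{
    return totalSeconds * 1000.0 / frames;
}

// Average frames per second for the same run.
double FramesPerSecond(double totalSeconds, int frames = kFrames)
{
    return frames / totalSeconds;
}

struct ShadowCostBreakdown {
    double sceneMs;             // scene without shadows
    double volumeGeometryMs;    // extra time for drawing the volume as a plain mesh
    double stencilAndRelightMs; // extra time once the full z-fail path is enabled
};

// Splits the per-frame time into rough buckets from three benchmark totals.
ShadowCostBreakdown BreakDown(double sceneOnlySec,
                              double plainVolumeSec,
                              double fullShadowSec)
{
    const double scene = MsPerFrame(sceneOnlySec);
    const double plain = MsPerFrame(plainVolumeSec);
    const double full  = MsPerFrame(fullShadowSec);
    return {scene, plain - scene, full - plain};
}
```

With the totals from the thread (2.35 s, 9.3 s and 23.68 s) this gives roughly 6.5 ms per frame for the scene, about 19.3 ms more for drawing the shadow volume as a plain white mesh, and about 39.9 ms more for the full z-fail path. The last bucket is not pure stencil cost: RenderZFailShadow also draws the scene again with additive blending for each shadow-casting light.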
I am thinking that the problem is not in the shadow volume rendering but in the meshes themselves.
The impact of shadow volume rendering should be far smaller than what you are seeing, especially since your shadow volume is a static mesh and your card has a high fill rate.
Try to find out why rendering the shadow volume mesh itself is slow (regardless of any shadow volume issues), because I couldn't find a reason why it is so slow.
Another thing: do you get any messages from the debug runtime in the output window?

Thanks for your answer. I will try rendering only the shadow volume and nothing else, but unfortunately not until tomorrow.
Since the code isn't here I can't tell you exactly, but as far as I can remember, I get some redundant render state warnings (only a few), and sometimes "unable to create hardware indexbuffer", which I was told on the dxdev list not to worry about. Maybe I should care?

Peter
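
Regarding the redundant render state warnings mentioned above, one common way to silence them is a small cache that forwards a state change only when the value actually differs. This is a sketch under assumptions: `RenderStateCache` is a hypothetical class, the setter callback stands in for the real device call, and whether the CD3DStateGuard in the posted code already does something similar is not known from the thread. Filtering a handful of redundant calls is unlikely to explain a slowdown of this size on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Forwards a render state change only if it differs from the last value set.
class RenderStateCache {
public:
    using Setter = std::function<void(std::uint32_t state, std::uint32_t value)>;

    explicit RenderStateCache(Setter setter) : setter_(std::move(setter)) {}

    // Returns true if the call was forwarded, false if it was redundant.
    bool Set(std::uint32_t state, std::uint32_t value)
    {
        auto it = current_.find(state);
        if (it != current_.end() && it->second == value)
            return false;        // same value as last time: skip the call
        current_[state] = value; // remember the new value
        setter_(state, value);   // forward to the real device call
        return true;
    }

private:
    Setter setter_;
    std::unordered_map<std::uint32_t, std::uint32_t> current_;
};
```

The cache only stays correct if every state change goes through it; a direct device call that bypasses it would leave the remembered value stale.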