Hi all,
I'm trying to optimize my code for FPS, since I think it's terribly slow. As far as I can tell, the main bottleneck is the stencil shadow part. I've created a benchmark that rotates the camera around an object by 1 degree per frame, so it performs 360 RenderScreen calls, and I measure the total time (and FPS).
The scene consists of one object with 5534 vertices and 3088 triangles. It uses about 5 materials, and its parts are rendered sorted by material.
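The benchmark described above can be sketched roughly like this: one camera position per degree on a circle around the object, with the whole orbit timed at once. This is only an illustration under assumptions: Vec3, OrbitPositions, TimeOrbit and the renderFrame callback are hypothetical names, z is assumed to be the up axis, and renderFrame stands in for the application's own RenderScreen call.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical minimal vector type (z assumed to be "up").
struct Vec3 { double x, y, z; };

// Camera positions for one full orbit around `target`: one position every
// `stepDegrees`, at horizontal distance `radius` and `height` above the target.
std::vector<Vec3> OrbitPositions(const Vec3& target, double radius, double height,
                                 double stepDegrees = 1.0) {
    assert(stepDegrees > 0.0);
    const double kPi = 3.14159265358979323846;
    const int steps = static_cast<int>(std::lround(360.0 / stepDegrees));
    std::vector<Vec3> out;
    out.reserve(steps);
    for (int i = 0; i < steps; ++i) {
        const double angle = i * stepDegrees * kPi / 180.0; // degrees -> radians
        out.push_back({target.x + radius * std::cos(angle),
                       target.y + radius * std::sin(angle),
                       target.z + height});
    }
    return out;
}

// Renders one frame per camera position and returns the total wall-clock time
// in seconds. `renderFrame` is a hypothetical stand-in for RenderScreen.
double TimeOrbit(const std::vector<Vec3>& cameras,
                 const std::function<void(const Vec3&)>& renderFrame) {
    const auto start = std::chrono::steady_clock::now();
    for (const Vec3& cam : cameras)
        renderFrame(cam);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Timing the whole orbit rather than individual frames is likely the more reliable choice, because Direct3D queues work for the GPU and a single frame's wall-clock time says little on its own.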
I have one pass for rendering non-transparent parts and another for rendering transparent ones.
I use only one light source.
If I don't switch on anything except transparency, the 360 frames take 2.35 s (152 fps). That's not a lot, but it would be enough.
But when I switch on z-fail shadows, it drops to 23.68 s (15.2 fps). So I tried commenting out all render state changes related to the stencil shadows and rendering the shadow volume as it is (in white); that gave 9.3 s (38 fps).
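Because FPS is not linear in cost, the timings above are easier to compare as milliseconds per frame. A minimal sketch of that arithmetic, assuming the 360-frame run and the totals quoted above; the function names are made up for illustration.

```cpp
#include <cassert>

// Average frame time in milliseconds for a run of `frames` frames taking `totalSeconds`.
double MsPerFrame(double totalSeconds, int frames) {
    assert(frames > 0);
    return totalSeconds * 1000.0 / frames;
}

// Average frame rate over the same run.
double Fps(double totalSeconds, int frames) {
    assert(totalSeconds > 0.0);
    return frames / totalSeconds;
}

// Extra milliseconds per frame that a feature costs relative to a baseline run.
double OverheadMs(double baselineSeconds, double featureSeconds, int frames) {
    return MsPerFrame(featureSeconds, frames) - MsPerFrame(baselineSeconds, frames);
}
```

With 360 frames, the baseline works out to about 6.5 ms per frame and the z-fail run to about 65.8 ms, so the shadows add 59.25 ms per frame; the white-volume run comes to about 25.8 ms, i.e. roughly 19.3 ms above the baseline.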
The object and the shadow volume together consist of 21100 vertices and 11536 tris.
I'm posting the relevant code here; maybe you can spot something.
(It compiles with both DX8 and DX9 via some #defines. I'm talking about the DX9 part here, since that's what I'm measuring.)
Device creation parameters:
void DDRENDERING_VIEW::BuildPresentParamsFromSettings(CD3DSettings& d3dSettings)
{
    d3dParams.Windowed = true;
    d3dParams.BackBufferCount = 1; // means double buffering (1 back buffer)
    d3dParams.MultiSampleType = D3DMULTISAMPLE_NONE;
    d3dParams.SwapEffect = D3DSWAPEFFECT_DISCARD;
    d3dParams.EnableAutoDepthStencil = true;
    d3dParams.hDeviceWindow = m_pView->GetSafeHwnd();
#if defined(DX81M_MODULE)
    d3dParams.Flags = 0;
    d3dParams.FullScreen_PresentationInterval = 0;
    d3dParams.SwapEffect = D3DSWAPEFFECT_COPY;
#elif defined(DX9M_MODULE)
    d3dParams.Flags = D3DPRESENTFLAG_DISCARD_DEPTHSTENCIL;
    d3dParams.MultiSampleQuality = d3dSettings.MultisampleQuality();
    d3dParams.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;
#endif
    d3dParams.AutoDepthStencilFormat = d3dSettings.DepthStencilBufferFormat();
    d3dParams.BackBufferWidth = clientRect.right - clientRect.left;
    d3dParams.BackBufferHeight = clientRect.bottom - clientRect.top;
    d3dParams.BackBufferFormat = d3dSettings.PDeviceCombo()->BackBufferFormat;
    d3dParams.FullScreen_RefreshRateInHz = 0;
} //BuildPresentParamsFromSettings
This is how I create the shadow volume mesh for rendering:
(The shadow volume is computed in our own geometry module, and the data is loaded from there. That code is irrelevant now: it runs only once, and afterwards the volume is just rendered (static world, only the camera moves).)
I'm posting this part because of the D3DXCreateMeshFVF and OptimizeInplace calls.
void GEOMETRYELEMDD::InitializeShadowVolumes(void)
{
    EM::GeometryElem* ge = GetGeometryElem();
    assert(ge != NULL);
    if (ge == NULL)
        return;
    for (long iShadowVol = 0; iShadowVol < ge->GetShadowVolumeCount(); iShadowVol++) {
        if (ge->GetShadowVolume(iShadowVol) == NULL)
            continue;
        const Mesh3D& mesh = *(ge->GetShadowVolume(iShadowVol));
        nShadowTriangleVertices = mesh.GetTriVertexCount();
        nShadowTriangles = mesh.GetTriangleCount();
        HRESULT hr;
        ID3DXMesh* shadowVolumeMesh = NULL;
        if (FAILED(hr = D3DXCreateMeshFVF(nShadowTriangles, nShadowTriangleVertices, D3DXMESH_SYSTEMMEM | D3DXMESH_WRITEONLY, D3DFVF_XYZ, rView->pd3dDevice, &shadowVolumeMesh)))
            continue;
        assert(shadowVolumeMesh != NULL);
        if (shadowVolumeMesh == NULL)
            continue;
        // II.1. Fill in the vertex data
        D3DXVECTOR3* triVertices = NULL;
        if (FAILED(hr = shadowVolumeMesh->LockVertexBuffer(0, (DX89(BYTE,void)**)&triVertices)))
            continue;
        for (long i = 0; i < nShadowTriangleVertices; i++) {
            const GM::TriVertex3D& trivertex = mesh.GetTriVertex(i);
            CCVector3D pos = trivertex.pos;
            // write into slot i of the locked vertex buffer
            ConvertVector3D2D3DXVector(pos, &triVertices[i]);
            ConvertMeter2MiliMeter(&triVertices[i]);
        }
        // II.2. Fill in the index data
        WORD* triangleIndices = NULL;
        m_userIdPerTriangle = new long[nTriangles];
        if (FAILED(hr = shadowVolumeMesh->LockIndexBuffer(0, (DX89(BYTE,void)**)&triangleIndices)))
            continue;
        for (long i = 0; i < nShadowTriangles; i++) {
            const GM::Triangle3D& triangle = mesh.GetTriangle(i);
            triangleIndices[3 * i] = triangle.triVert1Id;
            triangleIndices[3 * i + 1] = triangle.triVert2Id;
            triangleIndices[3 * i + 2] = triangle.triVert3Id;
        }
        shadowVolumeMesh->UnlockVertexBuffer();
        shadowVolumeMesh->UnlockIndexBuffer();
        // optimize: compact, sort by attribute, reorder for the vertex cache
        DWORD* adjacency = new DWORD[3 * shadowVolumeMesh->GetNumFaces()];
        hr = shadowVolumeMesh->GenerateAdjacency(0.0f, adjacency);
        hr = shadowVolumeMesh->OptimizeInplace(D3DXMESHOPT_COMPACT | D3DXMESHOPT_ATTRSORT | D3DXMESHOPT_VERTEXCACHE, adjacency, NULL, NULL, NULL);
        delete[] adjacency; // only needed for the optimization step
        dxShadowVolumeMeshes.Add(shadowVolumeMesh);
    }
} // InitializeShadowVolumes
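The vertex and index fill loops above can be sketched against plain arrays to make two details explicit: the destination has to advance with the loop index, and WORD indices only work while the mesh has at most 65536 vertices (beyond that, D3DXMESH_32BIT and DWORD indices would be needed). Float3, Tri and FillShadowBuffers are hypothetical stand-ins for D3DXVECTOR3, GM::Triangle3D and the loops themselves, and the factor of 1000 is an assumption about what ConvertMeter2MiliMeter does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for D3DXVECTOR3 and GM::Triangle3D.
struct Float3 { float x, y, z; };
struct Tri { long v1, v2, v3; };

// A 16-bit (WORD) index can address vertices 0..65535.
constexpr std::size_t kMaxVerticesFor16BitIndices = 65536;

// Assumed scale factor applied by ConvertMeter2MiliMeter.
constexpr float kMetresToMillimetres = 1000.0f;

// Copies vertices (scaled to millimetres) and triangle indices into buffers that
// are already locked. Returns false without writing anything if the vertex count
// is too large for 16-bit indices.
bool FillShadowBuffers(const std::vector<Float3>& srcVerts,
                       const std::vector<Tri>& srcTris,
                       Float3* dstVerts, std::uint16_t* dstIndices) {
    if (srcVerts.size() > kMaxVerticesFor16BitIndices)
        return false;
    for (std::size_t i = 0; i < srcVerts.size(); ++i) {
        // Write slot i of the destination, not slot 0 on every iteration.
        dstVerts[i] = {srcVerts[i].x * kMetresToMillimetres,
                       srcVerts[i].y * kMetresToMillimetres,
                       srcVerts[i].z * kMetresToMillimetres};
    }
    const long vertexCount = static_cast<long>(srcVerts.size());
    for (std::size_t i = 0; i < srcTris.size(); ++i) {
        const Tri& t = srcTris[i];
        // Every index must refer to an existing vertex.
        assert(t.v1 >= 0 && t.v1 < vertexCount);
        assert(t.v2 >= 0 && t.v2 < vertexCount);
        assert(t.v3 >= 0 && t.v3 < vertexCount);
        dstIndices[3 * i] = static_cast<std::uint16_t>(t.v1);
        dstIndices[3 * i + 1] = static_cast<std::uint16_t>(t.v2);
        dstIndices[3 * i + 2] = static_cast<std::uint16_t>(t.v3);
    }
    return true;
}
```

If the totals quoted earlier split as object plus volume, the shadow volume has about 21100 - 5534 = 15566 vertices, comfortably inside the 16-bit limit.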
This is where I set up shadow rendering (for both single- and two-sided stencil):
void DDRENDERING_VIEW::RenderZFailShadow(const EM::Camera& camera)
{
    CCDX9::CD3DStateGuard guard(pd3dDevice);
    long currLight = 0;
    long maxLights = GetShadowCastableLightMaxIndex();
    for (long iLight = 0; iLight < maxLights; iLight++) {
        EM::Light* light = m_elementManager->GetLight(iLight);
        if (light == NULL || !light->IsCastShadow())
            continue;
        SetLight(iLight, m_graphicsSettings.MustRenderShadow());
        //pd3dDevice->LightEnable(0, FALSE);
        // depth buffer writing OFF
        guard.SetRenderState( D3DRS_ZWRITEENABLE, FALSE );
        // clear stencil buffer
        //pd3dDevice->Clear(0, NULL, D3DCLEAR_STENCIL, D3DCOLOR_XRGB(0,0,0), 1.0f, 0);
        // turn OFF colour buffer (now we wanna write to the stencil buffer only)
        guard.SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
        guard.SetRenderState(D3DRS_SRCBLEND, D3DBLEND_ZERO);
        guard.SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
        // disable lighting (not needed for stencil writes anyway)
        guard.SetRenderState(D3DRS_LIGHTING, FALSE);
        // turn ON stencil buffer
        guard.SetRenderState( D3DRS_STENCILENABLE, TRUE );
        guard.SetRenderState( D3DRS_STENCILFUNC, D3DCMP_ALWAYS );
        guard.SetRenderState( D3DRS_CCW_STENCILFUNC, D3DCMP_ALWAYS );
        if (m_use2SidedStencil) {
            // clockwise faces: increment if z test fails, else keep
            guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR );
            // counter-clockwise faces: decrement if z test fails, else keep
            guard.SetRenderState( D3DRS_CCW_STENCILPASS, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_CCW_STENCILFAIL, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_CCW_STENCILZFAIL, D3DSTENCILOP_DECR );
            // render front and back faces in a single pass
            guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_NONE );
            guard.SetRenderState( D3DRS_TWOSIDEDSTENCILMODE, TRUE );
            // render shadow volumes for this light
            for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
                EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
                if(elem == NULL || !elem->IsDrawable() || (elem->GetShowType() != EM::Shaded))
                    continue;
                guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(m_zSlopeScaleZFail));
                guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(m_zBiasZFail)); // 0.000001 works quite well
                RenderShadow(camera, elem, currLight);
                guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(0.0));
                guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(0.0));
            }
            // switch off 2 sided stencil
            guard.SetRenderState( D3DRS_TWOSIDEDSTENCILMODE, FALSE );
            // from now, render front faces again (normal operation)
            guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CCW );
        }
        else {
            //set stencil to increment if z test fails, else keep
            guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR );
            //render back faces
            guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CW );
            // render shadow volumes for this light
            for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
                EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
                if(elem == NULL || !elem->IsDrawable() || (elem->GetShowType() != EM::Shaded))
                    continue;
                guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(m_zSlopeScaleZFail));
                guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(m_zBiasZFail));
                RenderShadow(camera, elem, currLight);
                guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(0.0));
                guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(0.0));
            }
            //set stencil to decrement if z test fails, else keep
            guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
            guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_DECR );
            //render front faces
            guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CCW );
            // render shadow volumes for this light
            for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
                EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
                if(elem == NULL || !elem->IsDrawable() || (elem->GetShowType() != EM::Shaded))
                    continue;
                guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(m_zSlopeScaleZFail));
                guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(m_zBiasZFail));
                RenderShadow(camera, elem, currLight);
                guard.SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, F2DW(0.0));
                guard.SetRenderState(D3DRS_DEPTHBIAS, F2DW(0.0));
            }
            // from now, render front faces again (normal operation)
            guard.SetRenderState( D3DRS_CULLMODE, D3DCULL_CCW );
        } // if (m_use2SidedStencil)
        // switch lighting ON
        guard.SetRenderState(D3DRS_LIGHTING, TRUE);
        // now draw only if stencil buffer enables it
        guard.SetRenderState( D3DRS_STENCILENABLE, TRUE );
        // reset stencil ops
        guard.SetRenderState( D3DRS_STENCILREF, 0x0 );
        guard.SetRenderState( D3DRS_STENCILFUNC, D3DCMP_EQUAL );
        guard.SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
        guard.SetRenderState( D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP );
        guard.SetRenderState( D3DRS_STENCILZFAIL, D3DSTENCILOP_KEEP );
        guard.SetRenderState( D3DRS_CCW_STENCILPASS, D3DSTENCILOP_KEEP );
        guard.SetRenderState( D3DRS_CCW_STENCILFAIL, D3DSTENCILOP_KEEP );
        guard.SetRenderState( D3DRS_CCW_STENCILZFAIL, D3DSTENCILOP_KEEP );
        // turn on the current light, disable all others
        EM::DirectionalLight* dlight = dynamic_cast<EM::DirectionalLight*>(m_elementManager->GetLight(iLight));
        if (dlight != NULL && iLight == 0 && dlight->GetDirection().z > 0) // Sun doesn't shine at night
            pd3dDevice->LightEnable(0, FALSE);
        else
            pd3dDevice->LightEnable(0, TRUE);
        guard.SetRenderState( D3DRS_AMBIENT, 0x00000000 );
        //turn ON colour buffer, additive
        guard.SetRenderState( D3DRS_ALPHABLENDENABLE, TRUE );
        guard.SetRenderState( D3DRS_SRCBLEND, D3DBLEND_ONE );
        guard.SetRenderState( D3DRS_DESTBLEND, D3DBLEND_ONE );
        // grass: render only if there is no background
        Grid* grid = m_elementManager->GetGrid();
        if (m_backgroundDD == NULL && grid->GetShowType() == EM::Shaded && grid->IsDrawable())
            Render(grid);
        // render scene
        for (long i = 0; i < m_elementManager->GetElemCount(); i++) {
            EM::Renderable* elem = dynamic_cast<EM::Renderable*>(m_elementManager->GetElem(i));
            if (elem == NULL || !elem->IsDrawable() || elem->GetShowType() == EM::LinesOnly)
                continue;
            Render(elem);
        }
        SetLight(iLight, false);
        currLight++;
    } // END FOR
    SetLight(0, m_graphicsSettings.MustRenderShadow());
}
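To check the stencil logic independently of Direct3D, here is a small CPU model of what the z-fail passes do to a single pixel. It assumes the default D3DCMP_LESSEQUAL depth test, a stencil value that starts at 0, and 8-bit wrapping behaviour for D3DSTENCILOP_INCR and D3DSTENCILOP_DECR (in Direct3D 9 these wrap; the clamping variants are D3DSTENCILOP_INCRSAT and D3DSTENCILOP_DECRSAT). VolumeFace, ZFailStencil and IsLit are hypothetical names. The model follows the single-sided path (back faces increment, front faces decrement); as far as I can tell the two-sided path uses the opposite signs, which should not change the outcome, because the lighting pass only tests for EQUAL 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One shadow-volume face covering the pixel (hypothetical type).
struct VolumeFace {
    float depth;      // depth of the face at this pixel
    bool frontFacing; // true for front faces, false for back faces
};

// Stencil value after the z-fail passes for one pixel whose scene depth is
// `sceneDepth`. Models wrapping 8-bit INCR/DECR and a LESSEQUAL depth test.
std::uint8_t ZFailStencil(const std::vector<VolumeFace>& faces, float sceneDepth) {
    std::uint8_t stencil = 0;
    for (const VolumeFace& f : faces) {
        const bool depthFails = f.depth > sceneDepth; // face lies behind the scene
        if (!depthFails)
            continue;            // STENCILPASS = KEEP
        if (f.frontFacing)
            --stencil;           // front faces: DECR (wraps 0 -> 255)
        else
            ++stencil;           // back faces: INCR (wraps 255 -> 0)
    }
    return stencil;
}

// The lighting pass draws only where STENCILFUNC = EQUAL and STENCILREF = 0.
bool IsLit(const std::vector<VolumeFace>& faces, float sceneDepth) {
    return ZFailStencil(faces, sceneDepth) == 0;
}
```

Because the wrapping ops make the count order-independent, the result is the same whichever face is rasterized first, which is what allows two-sided stencil to handle both face orientations in one pass.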
And this is the actual shadow volume rendering, which is what the RenderShadow(camera, elem, currLight) calls above end up in:
void GEOMETRYELEMDD::RenderShadowVolume(long iLight)
{
    if (iLight < 0 || iLight >= dxShadowVolumeMeshes.GetCount())
        return;
    SetTransform(NULL);
    dxShadowVolumeMeshes[iLight]->DrawSubset(0);
    rView->m_verticesDrawn += nShadowTriangleVertices;
    rView->m_trisDrawn += nShadowTriangles;
}
I've now tried everything I've ever read about on forums and in tutorials. I know DrawSubset isn't ideal, but here it has to draw the entire shadow volume anyway, so I don't think anything else would perform significantly better.
As you can see, I tried optimizing the mesh and creating the vertex and index buffers write-only. I don't think I have too many render state changes either (I can't do it with fewer; I need all of them for correct operation).
So why is this code so terribly slow? Any ideas?
Thanks for your kind help,
Peter