SSAO performance problems


16 replies to this topic

#1 newtechnology   Members   -  Reputation: 789


Posted 07 April 2014 - 01:49 AM

With SSAO, even when drawing nothing, I get 50 fps. I think this is because of depth buffer overdraw.

The author of the book sets the depth comparison to EQUAL to prevent overdraw:

void Direct3D::DrawScene()
{
 
m_Smap->BindDsvAndSetNullRenderTarget(pDeviceContext);
 
DrawInstancedModelsToShadowMap();
 
DrawModelsToShadowMap();
 
//m_Land.DrawShadowMap(m_Cam);
 
pDeviceContext->RSSetState(0);
 
pDeviceContext->ClearDepthStencilView(m_DepthStencilView, D3D11_CLEAR_DEPTH|D3D11_CLEAR_STENCIL, 1.0f, 0);
pDeviceContext->RSSetViewports(1, &m_ViewPort);
m_Ssao->SetNormalDepthRenderTarget(m_DepthStencilView);
 
DrawModelsToSsaoMap();
 
//
// Now compute the ambient occlusion.
//
 
m_Ssao->ComputeSsao(m_Cam);
m_Ssao->BlurAmbientMap(4);
 
RestoreRenderTarget();
 
pDeviceContext->ClearRenderTargetView(m_RenderTargetView, reinterpret_cast<const float*>(&Colors::Silver));
//pDeviceContext->ClearDepthStencilView(m_DepthStencilView, D3D11_CLEAR_DEPTH|D3D11_CLEAR_STENCIL, 1.0f, 0);
 
 
m_Land.Draw(pDeviceContext, m_Cam, mDirLights);
 
// We already laid down scene depth to the depth buffer in the Normal/Depth map pass,
// so we can set the depth comparison test to “EQUALS.”  This prevents any overdraw
// in this rendering pass, as only the nearest visible pixels will pass this depth
// comparison test.
 
pDeviceContext->OMSetDepthStencilState(RenderStates::EqualsDSS, 0);
 
if (GetAsyncKeyState('1'))
pDeviceContext->RSSetState(RenderStates::WireframeRS);
 
DrawInstancedModels();
 
DrawModels();
//....
}

But this does not work well for me. I get this result when I set EqualsDSS:
6KArmWR.png
 
 
But with this
void Direct3D::DrawScene()
{
 
m_Smap->BindDsvAndSetNullRenderTarget(pDeviceContext);
 
DrawInstancedModelsToShadowMap();
 
DrawModelsToShadowMap();
 
//m_Land.DrawShadowMap(m_Cam);
 
pDeviceContext->RSSetState(0);
 
pDeviceContext->ClearDepthStencilView(m_DepthStencilView, D3D11_CLEAR_DEPTH|D3D11_CLEAR_STENCIL, 1.0f, 0);
pDeviceContext->RSSetViewports(1, &m_ViewPort);
m_Ssao->SetNormalDepthRenderTarget(m_DepthStencilView);
 
DrawModelsToSsaoMap();
 
//
// Now compute the ambient occlusion.
//
 
m_Ssao->ComputeSsao(m_Cam);
m_Ssao->BlurAmbientMap(4);
 
RestoreRenderTarget();
 
pDeviceContext->ClearRenderTargetView(m_RenderTargetView, reinterpret_cast<const float*>(&Colors::Silver));
//pDeviceContext->ClearDepthStencilView(m_DepthStencilView, D3D11_CLEAR_DEPTH|D3D11_CLEAR_STENCIL, 1.0f, 0);
 
 
m_Land.Draw(pDeviceContext, m_Cam, mDirLights);
 
// We already laid down scene depth to the depth buffer in the Normal/Depth map pass,
// so we can set the depth comparison test to “EQUALS.”  This prevents any overdraw
// in this rendering pass, as only the nearest visible pixels will pass this depth
// comparison test.
 
//pDeviceContext->OMSetDepthStencilState(RenderStates::EqualsDSS, 0);
 
if (GetAsyncKeyState('1'))
pDeviceContext->RSSetState(RenderStates::WireframeRS);
 
DrawInstancedModels();
 
DrawModels(); //..........
It works well.
1536570_862403237119365_6259900933246690
 
The only problem is performance.
 

Edited by newtechnology, 07 April 2014 - 01:52 AM.



#2 newtechnology   Members   -  Reputation: 789


Posted 07 April 2014 - 02:15 AM

Also, here is what the debug output says:

D3D11: ERROR: ID3D11DeviceContext::OMSetRenderTargets: The RenderTargetView at slot 0 is not compatable with the DepthStencilView. DepthStencilViews may only be used with RenderTargetViews if the effective dimensions of the Views are equal, as well as the Resource types, multisample count, and multisample quality. The RenderTargetView at slot 0 has (w:492,h:473,as:1), while the Resource is a Texture2D with (mc:1,mq:0). The DepthStencilView has (w:492,h:473,as:1), while the Resource is a Texture2D with (mc:4,mq:0). D3D11_RESOURCE_MISC_TEXTURECUBE factors into the Resource type, unless GetFeatureLevel() returns D3D_FEATURE_LEVEL_10_1 or greater. [ STATE_SETTING ERROR #388: OMSETRENDERTARGETS_INVALIDVIEW ]

EDIT: When I viewed the texture for debugging, it was empty, i.e. the models are not getting drawn to the SSAO map.
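One possible explanation for the EqualsDSS result in the first post, offered as an assumption rather than a confirmed diagnosis: if the normal/depth pass never writes scene depth, the depth buffer still holds the cleared value 1.0, so an EQUAL comparison rejects nearly every pixel in the main pass, while the default LESS comparison accepts them. A minimal software model of the comparison (the enum and function are illustrative and not D3D11 calls):

```cpp
#include <cassert>

enum class DepthFunc { Less, Equal };

// Software model of a single depth comparison; this is not a D3D11 call.
bool depthTestPasses(DepthFunc func, float fragmentDepth, float storedDepth)
{
    switch (func)
    {
    case DepthFunc::Less:  return fragmentDepth < storedDepth;
    case DepthFunc::Equal: return fragmentDepth == storedDepth;
    }
    return false;
}
```

With a stored depth of 1.0 (cleared) and a fragment depth of 0.4, Less passes and Equal fails. Equal only passes once an earlier pass has stored exactly 0.4 at that pixel.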


Edited by newtechnology, 07 April 2014 - 02:24 AM.


#3 Juliean   GDNet+   -  Reputation: 2748


Posted 07 April 2014 - 04:18 AM


EDIT: When I viewed the texture for debugging, it was empty, i.e. the models are not getting drawn to the SSAO map.

 

As the debug output says, the render target view is not compatible with the depth stencil view: your render target texture does not match the depth stencil buffer. They have to match in size, multisample count, and so on; otherwise nothing gets rendered. In your log, the render target has a sample count of 1 (mc:1) while the depth buffer has a sample count of 4 (mc:4).
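The rule in the error message can be written down as a small check. This is only a sketch: SurfaceInfo is a hypothetical plain struct holding the values printed in the message, not a D3D11 type, and the resource-type condition from the message is left out.

```cpp
#include <cassert>

// Hypothetical plain struct mirroring the properties named in the debug
// message; it is not a D3D11 type.
struct SurfaceInfo
{
    unsigned width;
    unsigned height;
    unsigned arraySize;      // "as" in the message
    unsigned sampleCount;    // "mc" in the message
    unsigned sampleQuality;  // "mq" in the message
};

// True when a render target and a depth buffer satisfy the dimension and
// multisample conditions quoted in the error text.
bool canBindTogether(const SurfaceInfo& rt, const SurfaceInfo& ds)
{
    return rt.width == ds.width && rt.height == ds.height &&
           rt.arraySize == ds.arraySize &&
           rt.sampleCount == ds.sampleCount &&
           rt.sampleQuality == ds.sampleQuality;
}
```

Filling in the logged values (492x473, as:1, mc:1 against mc:4) makes the check fail on the sample count alone.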

 

EDIT: Also, concerning performance, we need to see your SSAO-shader.


Edited by Juliean, 07 April 2014 - 04:19 AM.


#4 Hodgman   Moderators   -  Reputation: 31943


Posted 07 April 2014 - 04:29 AM


With SSAO, even when drawing nothing, I get 50 fps.
This means that either your total CPU time per frame, or your total GPU time per frame is ~20ms. It could well be that running your SSAO algorithm for every pixel is taking ~20ms of GPU time...
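The arithmetic behind the ~20ms figure: frame time is the reciprocal of the frame rate. A minimal sketch (frameTimeMs is an illustrative helper name, not something from this thread):

```cpp
#include <cassert>

// Converts a frame rate to a per-frame time in milliseconds.
// Hypothetical helper, for illustration only.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

frameTimeMs(50.0) gives 20 ms. Because the relationship is not linear, milliseconds are usually easier to reason about than fps: dropping from 200 to 100 fps and dropping from 50 to 40 fps both add 5 ms per frame.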

#5 newtechnology   Members   -  Reputation: 789


Posted 07 April 2014 - 07:53 AM

Thanks, I changed the multisample settings of the render target at slot 0 to match the depth buffer, and it worked.

There is now another problem.
Why is this happening?

x0mkZPt.png

ILnE7MA.png

 

EDIT: Also here is the SSAO shader

//=============================================================================
// Ssao.fx by Frank Luna (C) 2011 All Rights Reserved.
//
// Computes SSAO map.
//=============================================================================
 
cbuffer cbPerFrame
{
float4x4 gViewToTexSpace; // Proj*Texture
float4   gOffsetVectors[14];
float4   gFrustumCorners[4];
 
// Coordinates given in view space.
float    gOcclusionRadius    = 0.5f;
float    gOcclusionFadeStart = 0.2f;
float    gOcclusionFadeEnd   = 2.0f;
float    gSurfaceEpsilon     = 0.05f;
};
 
// Nonnumeric values cannot be added to a cbuffer.
Texture2D gNormalDepthMap;
Texture2D gRandomVecMap;
 
SamplerState samNormalDepth
{
Filter = MIN_MAG_LINEAR_MIP_POINT;
 
// Set a very far depth value if sampling outside of the NormalDepth map
// so we do not get false occlusions.
AddressU = BORDER;
AddressV = BORDER;
BorderColor = float4(0.0f, 0.0f, 0.0f, 1e5f);
};
 
SamplerState samRandomVec
{
Filter = MIN_MAG_LINEAR_MIP_POINT;
AddressU  = WRAP;
    AddressV  = WRAP;
};
 
struct VertexIn
{
float3 PosL            : POSITION;
float3 ToFarPlaneIndex : NORMAL;
float2 Tex             : TEXCOORD;
};
 
struct VertexOut
{
    float4 PosH       : SV_POSITION;
    float3 ToFarPlane : TEXCOORD0;
float2 Tex        : TEXCOORD1;
};
 
VertexOut VS(VertexIn vin)
{
VertexOut vout;
 
// Already in NDC space.
vout.PosH = float4(vin.PosL, 1.0f);
 
// We store the index to the frustum corner in the normal x-coord slot.
vout.ToFarPlane = gFrustumCorners[vin.ToFarPlaneIndex.x].xyz;
 
// Pass onto pixel shader.
vout.Tex = vin.Tex;
 
    return vout;
}
 
// Determines how much the sample point q occludes the point p as a function
// of distZ.
float OcclusionFunction(float distZ)
{
//
// If depth(q) is "behind" depth(p), then q cannot occlude p.  Moreover, if 
// depth(q) and depth(p) are sufficiently close, then we also assume q cannot
// occlude p because q needs to be in front of p by Epsilon to occlude p.
//
// We use the following function to determine the occlusion.  
// 
//
//       1.0     -------------\
//               |           |  \
//               |           |    \
//               |           |      \ 
//               |           |        \
//               |           |          \
//               |           |            \
//  ------|------|-----------|-------------|---------|--> zv
//        0     Eps          z0            z1        
//
 
float occlusion = 0.0f;
if(distZ > gSurfaceEpsilon)
{
float fadeLength = gOcclusionFadeEnd - gOcclusionFadeStart;
 
// Linearly decrease occlusion from 1 to 0 as distZ goes 
// from gOcclusionFadeStart to gOcclusionFadeEnd. 
occlusion = saturate( (gOcclusionFadeEnd-distZ)/fadeLength );
}
 
return occlusion; 
}
 
float4 PS(VertexOut pin, uniform int gSampleCount) : SV_Target
{
// p -- the point we are computing the ambient occlusion for.
// n -- normal vector at p.
// q -- a random offset from p.
// r -- a potential occluder that might occlude p.
 
// Get viewspace normal and z-coord of this pixel.  The tex-coords for
// the fullscreen quad we drew are already in uv-space.
float4 normalDepth = gNormalDepthMap.SampleLevel(samNormalDepth, pin.Tex, 0.0f);
 
float3 n = normalDepth.xyz;
float pz = normalDepth.w;
 
//
// Reconstruct full view space position (x,y,z).
// Find t such that p = t*pin.ToFarPlane.
// p.z = t*pin.ToFarPlane.z
// t = p.z / pin.ToFarPlane.z
//
float3 p = (pz/pin.ToFarPlane.z)*pin.ToFarPlane;
 
// Extract random vector and map from [0,1] --> [-1, +1].
float3 randVec = 2.0f*gRandomVecMap.SampleLevel(samRandomVec, 4.0f*pin.Tex, 0.0f).rgb - 1.0f;
 
float occlusionSum = 0.0f;
 
// Sample neighboring points about p in the hemisphere oriented by n.
[unroll]
for(int i = 0; i < gSampleCount; ++i)
{
// Our offset vectors are fixed and uniformly distributed (so that our offset vectors
// do not clump in the same direction).  If we reflect them about a random vector
// then we get a random uniform distribution of offset vectors.
float3 offset = reflect(gOffsetVectors[i].xyz, randVec);
 
// Flip offset vector if it is behind the plane defined by (p, n).
float flip = sign( dot(offset, n) );
 
// Sample a point near p within the occlusion radius.
float3 q = p + flip * gOcclusionRadius * offset;
 
// Project q and generate projective tex-coords.  
float4 projQ = mul(float4(q, 1.0f), gViewToTexSpace);
projQ /= projQ.w;
 
// Find the nearest depth value along the ray from the eye to q (this is not
// the depth of q, as q is just an arbitrary point near p and might
// occupy empty space).  To find the nearest depth we look it up in the depthmap.
 
float rz = gNormalDepthMap.SampleLevel(samNormalDepth, projQ.xy, 0.0f).a;
 
// Reconstruct full view space position r = (rx,ry,rz).  We know r
// lies on the ray of q, so there exists a t such that r = t*q.
// r.z = t*q.z ==> t = r.z / q.z
 
float3 r = (rz / q.z) * q;
 
//
// Test whether r occludes p.
//   * The product dot(n, normalize(r - p)) measures how much in front
//     of the plane(p,n) the occluder point r is.  The more in front it is, the
//     more occlusion weight we give it.  This also prevents self shadowing where 
//     a point r on an angled plane (p,n) could give a false occlusion since they
//     have different depth values with respect to the eye.
//   * The weight of the occlusion is scaled based on how far the occluder is from
//     the point we are computing the occlusion of.  If the occluder r is far away
//     from p, then it does not occlude it.
// 
 
float distZ = p.z - r.z;
float dp = max(dot(n, normalize(r - p)), 0.0f);
float occlusion = dp * OcclusionFunction(distZ);
 
occlusionSum += occlusion;
}
 
occlusionSum /= gSampleCount;
 
float access = 1.0f - occlusionSum;
 
// Sharpen the contrast of the SSAO map to make the SSAO effect more dramatic.
return saturate(pow(access, 4.0f));
}
 
technique11 Ssao
{
    pass P0
    {
SetVertexShader( CompileShader( vs_4_0, VS() ) );
SetGeometryShader( NULL );
        SetPixelShader( CompileShader( ps_4_0, PS(14) ) );
    }
}
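For experimenting with the falloff outside the shader, OcclusionFunction can be ported to the CPU. The sketch below assumes the default constants from the posted cbuffer (epsilon 0.05, fade start 0.2, fade end 2.0); the C++ function is an illustrative port, not part of the book's code.

```cpp
#include <algorithm>
#include <cassert>

// CPU-side port of OcclusionFunction using the default constants from the
// posted cbuffer. Intended for experimenting with the falloff only.
float occlusionFunction(float distZ,
                        float surfaceEpsilon = 0.05f,
                        float fadeStart = 0.2f,
                        float fadeEnd = 2.0f)
{
    assert(fadeEnd > fadeStart);
    float occlusion = 0.0f;
    if (distZ > surfaceEpsilon)
    {
        const float fadeLength = fadeEnd - fadeStart;
        // saturate() in HLSL clamps to [0, 1]; min/max does the same here.
        occlusion = std::min(1.0f, std::max(0.0f, (fadeEnd - distZ) / fadeLength));
    }
    return occlusion;
}
```

With these defaults, an occluder less than 0.05 units in front of p contributes nothing, one between 0.05 and 0.2 units contributes fully, and the contribution fades linearly to zero at 2.0 units.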

Here is how I draw:

void Model::RenderNormalDepthMap(CXMMATRIX World, CXMMATRIX ViewProj)
{
ID3DX11EffectTechnique* activeTech = Effects::SsaoNormalDepthFX->NormalDepthTech;
 
pDeviceContext->IASetInputLayout(InputLayouts::Basic32);
pDeviceContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST); 
 
XMMATRIX view = d3d->m_Cam.View();
 
XMMATRIX world = World;
XMMATRIX worldInvTranspose = MathHelper::InverseTranspose(world);
XMMATRIX worldView   = world* view;
XMMATRIX worldInvTransposeView = worldInvTranspose*view;
XMMATRIX worldViewProj = world * ViewProj;
 
 
D3DX11_TECHNIQUE_DESC techDesc;
    activeTech->GetDesc(&techDesc);
 
float blendFactor[4] = {0.0f, 0.0f, 0.0f, 0.0f};
 
    for(UINT p = 0; p < techDesc.Passes; ++p)
    {
for (UINT i = 0; i < mModel.mSubsetCount; i++)
   {    
Effects::SsaoNormalDepthFX->SetWorldView(worldView);
       Effects::SsaoNormalDepthFX->SetWorldInvTransposeView(worldInvTransposeView);
       Effects::SsaoNormalDepthFX->SetWorldViewProj(worldViewProj);
       Effects::SsaoNormalDepthFX->SetTexTransform(XMMatrixIdentity());
 
   activeTech->GetPassByIndex(p)->Apply(0, pDeviceContext);
 
   mModel.Mesh.Draw(i);
 
   }
}
}
 
void Model::Render(CXMMATRIX World, CXMMATRIX ViewProj)
{
ID3DX11EffectTechnique* activeTech;
 
pDeviceContext->IASetInputLayout(InputLayouts::Basic32);
pDeviceContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST); 
 
XMMATRIX W = World;
XMMATRIX worldInvTranspose = MathHelper::InverseTranspose(W);
XMMATRIX WorldViewProj = W * ViewProj;
XMMATRIX TexTransform = XMMatrixIdentity();
XMMATRIX ShadowTransform = W * XMLoadFloat4x4(&d3d->m_ShadowTransform);
 
// Transform NDC space [-1,+1]^2 to texture space [0,1]^2
XMMATRIX toTexSpace(
0.5f, 0.0f, 0.0f, 0.0f,
0.0f, -0.5f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.5f, 0.5f, 0.0f, 1.0f);
 
Effects::BasicFX->SetDirLights(Lights);
Effects::BasicFX->SetEyePosW(d3d->m_Cam.GetPosition());
 
if (mInfo.AlphaClip)
{
   if (mInfo.NumLights == 1)
   activeTech = Effects::BasicFX->Light1TexAlphaClipTech;
   else if (mInfo.NumLights == 2)
       activeTech = Effects::BasicFX->Light2TexAlphaClipTech;
   else if (mInfo.NumLights == 3)
   activeTech = Effects::BasicFX->Light3TexAlphaClipTech;
   else 
   activeTech = Effects::BasicFX->Light0TexAlphaClipTech;
}
else
{
if (mInfo.NumLights == 1)
   activeTech = Effects::BasicFX->Light1TexTech;
   else if (mInfo.NumLights == 2)
       activeTech = Effects::BasicFX->Light2TexTech;
   else if (mInfo.NumLights == 3)
   activeTech = Effects::BasicFX->Light3TexTech;
   else 
   activeTech = Effects::BasicFX->Light0TexTech;
 
}
 
if (!mInfo.BackfaceCulling)
pDeviceContext->RSSetState(RenderStates::NoCullRS);
 
 
Effects::BasicFX->SetShadowMap(d3d->GetShadowMap());
Effects::BasicFX->SetSsaoMap(d3d->m_Ssao->AmbientSRV());
Effects::BasicFX->SetWorld(W);
Effects::BasicFX->SetWorldInvTranspose(worldInvTranspose);
Effects::BasicFX->SetWorldViewProj(WorldViewProj);
Effects::BasicFX->SetWorldViewProjTex(WorldViewProj * toTexSpace);
Effects::BasicFX->SetTexTransform(TexTransform);
Effects::BasicFX->SetShadowTransform(ShadowTransform);
 
 
D3DX11_TECHNIQUE_DESC techDesc;
    activeTech->GetDesc(&techDesc);
 
float blendFactor[4] = {0.0f, 0.0f, 0.0f, 0.0f};
 
    for(UINT p = 0; p < techDesc.Passes; ++p)
    {
for (UINT i = 0; i < mModel.mSubsetCount; i++)
   {   
   Effects::BasicFX->SetMaterial(Materials[i]);
Effects::BasicFX->SetDiffuseMap(DiffuseMapSRV[i]);
 
if (mInfo.AlphaToCoverage)
pDeviceContext->OMSetBlendState(RenderStates::AlphaToCoverageBS, blendFactor, 0xffffffff);
 
   activeTech->GetPassByIndex(p)->Apply(0, pDeviceContext);
 
   mModel.Mesh.Draw(i);
 
if (mInfo.AlphaToCoverage)
   pDeviceContext->OMSetBlendState(0, blendFactor, 0xffffffff);
   }
}
 
if (!mInfo.BackfaceCulling)
   pDeviceContext->RSSetState(0);
}

Edited by newtechnology, 07 April 2014 - 07:57 AM.


#6 newtechnology   Members   -  Reputation: 789


Posted 12 April 2014 - 01:32 AM

I fixed the problem, but SSAO is still the bottleneck (performance problems).

1534330_864713560221666_8278732862303078

 

I reduced the sample count from 14 to 5 and the blur from 4 passes to 2, and I'm skipping pixels that are too far from the camera (if (p.z > 100) return 1.0f;). Yet I only get 40 fps (at a 1024x768 screen resolution, with the SSAO map at half of that) when I'm very near the model that uses SSAO.
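To put a rough number on what the sample reduction saves, one can count the texture fetches visible in the posted pixel shader: one normal/depth read for p, one random-vector read, and one normal/depth read per sample. The sketch below assumes "half" means a 512x384 SSAO map, ignores the p.z early-out, and does not count the blur passes, since the blur shader is not shown in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Texture fetches issued by one SSAO pass, counting only the fetches visible
// in the posted pixel shader: 2 fixed reads plus one read per sample.
std::uint64_t ssaoFetchesPerFrame(std::uint32_t mapWidth,
                                  std::uint32_t mapHeight,
                                  std::uint32_t sampleCount)
{
    assert(mapWidth > 0 && mapHeight > 0);
    const std::uint64_t perPixel = 2u + sampleCount;
    return static_cast<std::uint64_t>(mapWidth) * mapHeight * perPixel;
}
```

Under those assumptions, 14 samples come to 3,145,728 fetches per frame and 5 samples to 1,376,256, a bit less than half. If the frame rate barely changes after such a cut, the sample loop is probably not the only cost.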


Edited by newtechnology, 12 April 2014 - 01:32 AM.


#7 TheChubu   Crossbones+   -  Reputation: 4802


Posted 12 April 2014 - 07:09 AM

40fps with what hardware precisely? (I'm always surprised that no one seems to ask about the hardware when somebody thinks he has performance issues).


"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#8 Jason Z   Crossbones+   -  Reputation: 5428


Posted 12 April 2014 - 09:33 AM

If you think that your SSAO is the bottleneck, then try rendering your frames with the occlusion buffer not being rendered into, and don't do the calculations for determining the occlusion value - then see what the performance is.  This would be your baseline that you can start to reason about what is taking up the most time with your rendering scheme.  Then add back in the rendering to the occlusion buffer, and finally you can add back in your occlusion calculations.  See what each step is costing you before you try to optimize the code.

 

Realtime rendering performance is a tricky thing to understand.  You might be CPU bound due to lots of API calls.  If that is the case, changing your SSAO calculations won't help you one bit.  You could be texture bandwidth bound, in which case changing your calculations won't help either.  Or you could be GPU computation bound, which means that changing your SSAO calculations will probably help some.

 

The point is, there are lots of reasons that you might end up with performance problems - you can't fix them until you have identified what the problem is.
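A simple way to start on the measuring described above is to time each pass with and without the step in question. The sketch below is a generic CPU-side timer; averageCpuMs is a hypothetical helper. Since Direct3D 11 queues GPU work asynchronously, a CPU timer mostly captures API and driver overhead. GPU time would need timestamp queries (D3D11_QUERY_TIMESTAMP) or a GPU profiler instead.

```cpp
#include <cassert>
#include <chrono>

// Runs `pass` `frames` times and returns the average wall-clock time per
// call in milliseconds. This measures CPU-side time only.
template <typename Pass>
double averageCpuMs(Pass&& pass, int frames)
{
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < frames; ++i)
        pass();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frames;
}
```

Calling it once with the occlusion pass disabled and once with it enabled, from the same camera position, gives the CPU-side baseline and the difference.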



#9 Styves   Members   -  Reputation: 1078


Posted 12 April 2014 - 10:43 AM

We need to know the system specs as well. :)



#10 Jason Z   Crossbones+   -  Reputation: 5428


Posted 12 April 2014 - 01:45 PM

40fps with what hardware precisely? (I'm always surprised that no one seems to ask about the hardware when somebody thinks he has performance issues).

 

We need to know the system specs as well. :)

 

The system specs don't really matter.  If he is trying to get his system running faster, the bottleneck is probably not going to be apparent from the type of GPU, CPU or how much memory he has.  He needs to indicate directionally what the issue is, and then we can help him drill down and find solutions to the problems.



#11 TheChubu   Crossbones+   -  Reputation: 4802


Posted 12 April 2014 - 02:33 PM


The system specs don't really matter. 

What if the OP has a GeForce 8400GT and wants 300 fps out of it? "Performance problems" are relative. For some hardware 40 fps might be poor, for some other hardware it might be very fast, for some people it might be not enough, for some people they'd be fine with just 30 fps.

 

Before saying this is a performance problem, you have to define it first (what it would be the desired performance) and give it context (what hardware is being used), then you can say "Yep, this is running slow" or "Nope, numbers like those are expected with that configuration".


Edited by TheChubu, 12 April 2014 - 02:33 PM.


#12 newtechnology   Members   -  Reputation: 789


Posted 13 April 2014 - 12:32 AM

My GPU is average and it can render terrain with tessellation and patch culling at 500-800 fps.

When I add shadows it drops to 200-400 fps, but when I look at the sky it's 700 fps (because of culling).

 

Anyway, my CPU is:

 

Intel® Core™ i3-3220 CPU @ 3.30 GHz (4 CPUs), ~3.3GHz

 

GPU:  (from dxdiag -> display)

Name: Intel® HD Graphics

Approx. Total memory: 1555 MB

 

RAM: 4 GB

 

The performance I want is 100 fps at 1024x768 with SSAO (as soon as I disable computing SSAO, the fps increases to about 200), because when I add other things such as physics and collision, the fps is going to drop further (maybe to 60-70 fps).
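Stated as a time budget, the target looks like this. The sketch assumes the SSAO cost simply adds to the rest of the frame, which is a simplification, and effectBudgetMs is an illustrative name.

```cpp
#include <cassert>

// Milliseconds left for an effect if the frame must hit targetFps and
// everything else already runs at baselineFps. Hypothetical helper.
double effectBudgetMs(double targetFps, double baselineFps)
{
    assert(targetFps > 0.0 && baselineFps > 0.0);
    return 1000.0 / targetFps - 1000.0 / baselineFps;
}
```

With the numbers from this post, effectBudgetMs(100.0, 200.0) is 5 ms. The SSAO pass and its blur passes together would have to fit in roughly 5 ms per frame, and in less once physics and collision are added.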

 

Also, I'll try what Jason Z said.


Edited by newtechnology, 13 April 2014 - 12:37 AM.


#13 newtechnology   Members   -  Reputation: 789


Posted 13 April 2014 - 01:09 AM

With this ssao shader

float4 PS(VertexOut pin, uniform int gSampleCount) : SV_Target
{
/* // p -- the point we are computing the ambient occlusion for.
// n -- normal vector at p.
// q -- a random offset from p.
// r -- a potential occluder that might occlude p.
 
// Get viewspace normal and z-coord of this pixel.  The tex-coords for
// the fullscreen quad we drew are already in uv-space.
float4 normalDepth = gNormalDepthMap.SampleLevel(samNormalDepth, pin.Tex, 0.0f);
 
float3 n = normalDepth.xyz;
float pz = normalDepth.w;
 
//
// Reconstruct full view space position (x,y,z).
// Find t such that p = t*pin.ToFarPlane.
// p.z = t*pin.ToFarPlane.z
// t = p.z / pin.ToFarPlane.z
//
float3 p = (pz/pin.ToFarPlane.z)*pin.ToFarPlane;
 
if (p.z > 100)
return 1.0f;
 
 
// Extract random vector and map from [0,1] --> [-1, +1].
float3 randVec = 2.0f*gRandomVecMap.SampleLevel(samRandomVec, 4.0f*pin.Tex, 0.0f).rgb - 1.0f;
 
float occlusionSum = 0.0f;
 
// Sample neighboring points about p in the hemisphere oriented by n.
[unroll]
for(int i = 0; i < gSampleCount; ++i)
{
// Our offset vectors are fixed and uniformly distributed (so that our offset vectors
// do not clump in the same direction).  If we reflect them about a random vector
// then we get a random uniform distribution of offset vectors.
float3 offset = reflect(gOffsetVectors[i].xyz, randVec);
 
// Flip offset vector if it is behind the plane defined by (p, n).
float flip = sign( dot(offset, n) );
 
// Sample a point near p within the occlusion radius.
float3 q = p + flip * gOcclusionRadius * offset;
 
// Project q and generate projective tex-coords.  
float4 projQ = mul(float4(q, 1.0f), gViewToTexSpace);
 
projQ /= projQ.w;
 
// Find the nearest depth value along the ray from the eye to q (this is not
// the depth of q, as q is just an arbitrary point near p and might
// occupy empty space).  To find the nearest depth we look it up in the depthmap.
 
float rz = gNormalDepthMap.SampleLevel(samNormalDepth, projQ.xy, 0.0f).a;
 
// Reconstruct full view space position r = (rx,ry,rz).  We know r
// lies on the ray of q, so there exists a t such that r = t*q.
// r.z = t*q.z ==> t = r.z / q.z
 
float3 r = (rz / q.z) * q;
 
//
// Test whether r occludes p.
//   * The product dot(n, normalize(r - p)) measures how much in front
//     of the plane(p,n) the occluder point r is.  The more in front it is, the
//     more occlusion weight we give it.  This also prevents self shadowing where 
//     a point r on an angled plane (p,n) could give a false occlusion since they
//     have different depth values with respect to the eye.
//   * The weight of the occlusion is scaled based on how far the occluder is from
//     the point we are computing the occlusion of.  If the occluder r is far away
//     from p, then it does not occlude it.
// 
 
float distZ = p.z - r.z;
float dp = max(dot(n, normalize(r - p)), 0.0f);
float occlusion = dp * OcclusionFunction(distZ);
 
occlusionSum += occlusion;
}
 
occlusionSum /= gSampleCount;
 
float access = 1.0f - occlusionSum;
 
// Sharpen the contrast of the SSAO map to make the SSAO effect more dramatic.
return saturate(pow(access, 4.0f)); */
 
 
return 1.0f;
}
rABHi8l.png
The performance is still the same, so the problem is not in this shader code.
 
When I'm looking the other way or I'm too far away, I disable computing the SSAO map with this code.
 
bool ComputeSSAOThisFrame = false;
 
for (USHORT i = 0; i < ModelInstances.size(); ++i) 
{
if (ModelInstances[i].ComputeSSAO) 
{
if (ModelInstances[i].Visible) 
{
XMVECTOR campos = m_Cam.GetPositionXM();
 
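// Use a named XMFLOAT3: taking the address of a temporary is non-standard C++.
XMFLOAT3 modelposF(ModelInstances[i].World._41, 
ModelInstances[i].World._42, ModelInstances[i].World._43);
XMVECTOR modelpos = XMLoadFloat3(&modelposF);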
 
XMVECTOR dist = modelpos - campos;
float distf;
 
XMStoreFloat(&distf, XMVector3LengthSq(dist));
 
if (!(distf > 10000)) //sqrt(10000) = 100
{
ComputeSSAOThisFrame = true;
break;
}
}
}
}
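The distance check above compares squared lengths, which avoids a square root per model. The same idea without DirectXMath, as a minimal sketch (Vec3 is a hypothetical stand-in for XMFLOAT3):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for XMFLOAT3

// True when b lies within maxDistance of a. Compares squared lengths, so no
// square root is needed.
bool withinDistance(const Vec3& a, const Vec3& b, float maxDistance)
{
    assert(maxDistance >= 0.0f);
    const float dx = b.x - a.x;
    const float dy = b.y - a.y;
    const float dz = b.z - a.z;
    return dx * dx + dy * dy + dz * dz <= maxDistance * maxDistance;
}
```

A model exactly 100 units away counts as in range, matching the !(distf > 10000) test in the posted loop.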
Now the performance when the model is not visible:
uF9BaHk.png
And when it is too far away:
FiXGCFi.png
 
Now when I'm looking at the sky, everything is culled using intersection tests (shadows are culled too, i.e. the rendering to the shadow map), and the terrain is culled in the constant hull shader.
Now the performance is:
opkRSeN.png
 
This code is causing the performance issues, because when I don't call it the FPS increases from 40 to 136.
void Ssao::ComputeSsao(const Camera& camera)
{
// Bind the ambient map as the render target.  Observe that this pass does not bind 
// a depth/stencil buffer--it does not need it, and without one, no depth test is
// performed, which is what we want.
ID3D11RenderTargetView* renderTargets[1] = {mAmbientRTV0};
mDC->OMSetRenderTargets(1, renderTargets, 0);
mDC->ClearRenderTargetView(mAmbientRTV0, reinterpret_cast<const float*>(&Colors::Black));
mDC->RSSetViewports(1, &mAmbientMapViewport);
 
// Transform NDC space [-1,+1]^2 to texture space [0,1]^2
static const XMMATRIX T(
0.5f, 0.0f, 0.0f, 0.0f,
0.0f, -0.5f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.5f, 0.5f, 0.0f, 1.0f);
 
XMMATRIX P  = camera.Proj();
XMMATRIX PT = XMMatrixMultiply(P, T);
 
Effects::SsaoFX->SetViewToTexSpace(PT);
Effects::SsaoFX->SetOffsetVectors(mOffsets);
Effects::SsaoFX->SetFrustumCorners(mFrustumFarCorner);
Effects::SsaoFX->SetNormalDepthMap(mNormalDepthSRV);
Effects::SsaoFX->SetRandomVecMap(mRandomVectorSRV);
 
UINT stride = sizeof(Vertex::Basic32);
    UINT offset = 0;
 
mDC->IASetInputLayout(InputLayouts::Basic32);
    mDC->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
mDC->IASetVertexBuffers(0, 1, &mScreenQuadVB, &stride, &offset);
mDC->IASetIndexBuffer(mScreenQuadIB, DXGI_FORMAT_R16_UINT, 0);
 
ID3DX11EffectTechnique* tech = Effects::SsaoFX->SsaoTech;
D3DX11_TECHNIQUE_DESC techDesc;
 
tech->GetDesc( &techDesc );
for(UINT p = 0; p < techDesc.Passes; ++p)
    {
tech->GetPassByIndex(p)->Apply(0, mDC);
mDC->DrawIndexed(6, 0, 0);
    }
}
 
void Ssao::BlurAmbientMap(int blurCount)
{
for(int i = 0; i < blurCount; ++i)
{
// Ping-pong the two ambient map textures as we apply
// horizontal and vertical blur passes.
BlurAmbientMap(mAmbientSRV0, mAmbientRTV1, true);
BlurAmbientMap(mAmbientSRV1, mAmbientRTV0, false);
}
}
 
void Ssao::BlurAmbientMap(ID3D11ShaderResourceView* inputSRV, ID3D11RenderTargetView* outputRTV, bool horzBlur)
{
ID3D11RenderTargetView* renderTargets[1] = {outputRTV};
mDC->OMSetRenderTargets(1, renderTargets, 0);
mDC->ClearRenderTargetView(outputRTV, reinterpret_cast<const float*>(&Colors::Black));
mDC->RSSetViewports(1, &mAmbientMapViewport);
 
Effects::SsaoBlurFX->SetTexelWidth(1.0f / mAmbientMapViewport.Width );
Effects::SsaoBlurFX->SetTexelHeight(1.0f / mAmbientMapViewport.Height );
Effects::SsaoBlurFX->SetNormalDepthMap(mNormalDepthSRV);
Effects::SsaoBlurFX->SetInputImage(inputSRV);
 
ID3DX11EffectTechnique* tech;
if(horzBlur)
{
tech = Effects::SsaoBlurFX->HorzBlurTech;
}
else
{
tech = Effects::SsaoBlurFX->VertBlurTech;
}
 
UINT stride = sizeof(Vertex::Basic32);
    UINT offset = 0;
 
mDC->IASetInputLayout(InputLayouts::Basic32);
    mDC->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
mDC->IASetVertexBuffers(0, 1, &mScreenQuadVB, &stride, &offset);
mDC->IASetIndexBuffer(mScreenQuadIB, DXGI_FORMAT_R16_UINT, 0);
 
D3DX11_TECHNIQUE_DESC techDesc;
tech->GetDesc( &techDesc );
for(UINT p = 0; p < techDesc.Passes; ++p)
    {
tech->GetPassByIndex(p)->Apply(0, mDC);
mDC->DrawIndexed(6, 0, 0);
 
// Unbind the input SRV as it is going to be an output in the next blur.
Effects::SsaoBlurFX->SetInputImage(0);
tech->GetPassByIndex(p)->Apply(0, mDC);
    }
}
When I'm too far away or the model is not visible, I do this and the performance suddenly increases.

if (ComputeSSAOThisFrame) //don't compute if all models are not visible or all models are far.
{
  // Now compute the ambient occlusion.
   m_Ssao->ComputeSsao(m_Cam);
   m_Ssao->BlurAmbientMap(2);
}
 

 

 

 

NOTE: The resolution is 1024x768. At 800x600 or 500x500 it reaches about 70-100 fps with SSAO, and 216 fps when not computing SSAO. My goal is to reach 100 fps with SSAO at 1024x768.

 

EDIT: Here are the SSAO blur and SSAO normal/depth shaders.

 

SSAO normal depth: http://pastebin.com/DUXjGxYd

SSAO Blur: http://pastebin.com/KQe2MP1C


Edited by newtechnology, 13 April 2014 - 02:31 AM.


#14 Jason Z   Crossbones+   -  Reputation: 5428

Posted 13 April 2014 - 08:07 AM

 


The system specs don't really matter. 

What if the OP has a GeForce 8400GT and wants 300 fps out of it? "Performance problems" are relative. For some hardware 40 fps might be poor, for other hardware it might be very fast; for some people it might not be enough, while others would be fine with just 30 fps.

 

Before saying this is a performance problem, you have to define it first (what the desired performance would be) and give it context (what hardware is being used). Then you can say "Yep, this is running slow" or "Nope, numbers like those are expected with that configuration".

 

Maybe you can guess what the appropriate performance for a given algorithm on particular hardware is, but I doubt that you can get any closer than within an order of magnitude of the true 'max' performance.  At least in my opinion, the specs may be interesting to hear and compare with your own experiences, but there are far too many variables in play for them to be a useful input into a performance discussion.  He indicated that he gets 40 FPS - now that he has given his hardware specs, what is your estimate of what his performance should be?  What if he said he gets 80 FPS, or 160 FPS - what should he be getting before you consider it a performance problem with his system?  How would you suggest he improve his performance based on specs?

 

In reality, you need so much more information that only running tests will tell you.

 

@newtechnology: It is fairly common for SSAO performance to scale with the number of pixels in a scene, since the computation and texture lookups are linearly related to the number of pixels in the render target.  When you run your tests, do so from a single camera viewpoint - don't move around or change how many objects you are rendering.  This will give a more stable estimate of your performance and let you optimize effectively.
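
To put rough numbers on that scaling, here is a minimal C++ sketch. It assumes the SSAO and blur passes are purely per-pixel work and that nothing else limits the frame; the helper names PixelCount and PixelWorkRatio are made up for illustration and are not part of the code posted in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: number of pixels in a render target of the given size.
constexpr std::uint64_t PixelCount(std::uint32_t width, std::uint32_t height)
{
    return static_cast<std::uint64_t>(width) * height;
}

// Hypothetical helper: how much more per-pixel work resolution A needs
// compared with resolution B, assuming cost is linear in the pixel count.
constexpr double PixelWorkRatio(std::uint32_t widthA, std::uint32_t heightA,
                                std::uint32_t widthB, std::uint32_t heightB)
{
    return static_cast<double>(PixelCount(widthA, heightA)) /
           static_cast<double>(PixelCount(widthB, heightB));
}
```

With the resolutions mentioned earlier, PixelCount(1024, 768) is 786,432 and PixelCount(800, 600) is 480,000, so PixelWorkRatio(1024, 768, 800, 600) is about 1.64. Under the linear assumption, a pass that costs t milliseconds at 800x600 would cost roughly 1.64 t at 1024x768.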

 

So once you have picked a representative view of your scene, take a measurement of your average frame time.  Then do the same measurement with your occlusion buffer calculations from above disabled.  Now you can selectively re-enable parts of your algorithm and see what effect they are having on the performance.
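
One possible way to take that measurement is sketched below in C++. It is only an illustration: FrameTimer and FpsToMs are hypothetical names, and the timer measures CPU-side wall-clock time between frames with std::chrono rather than GPU time.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: accumulates frame durations and reports their average.
class FrameTimer
{
public:
    // Call once at the start of every frame; the first call only sets a baseline.
    void Tick()
    {
        const auto now = Clock::now();
        if (mHasPrev)
        {
            AddSampleMs(std::chrono::duration<double, std::milli>(now - mPrev).count());
        }
        mPrev = now;
        mHasPrev = true;
    }

    // Add an already measured frame duration in milliseconds.
    void AddSampleMs(double ms)
    {
        mTotalMs += ms;
        ++mFrames;
    }

    // Average frame time in milliseconds, or 0 if nothing was recorded yet.
    double AverageMs() const
    {
        return mFrames ? mTotalMs / static_cast<double>(mFrames) : 0.0;
    }

    std::size_t FrameCount() const { return mFrames; }

private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point mPrev{};
    bool mHasPrev = false;
    double mTotalMs = 0.0;
    std::size_t mFrames = 0;
};

// Convert frames per second to milliseconds per frame.
inline double FpsToMs(double fps)
{
    return 1000.0 / fps;
}
```

Comparing milliseconds rather than fps makes the A/B test easier to read: 50 fps is 20 ms per frame and 100 fps is 10 ms, so the difference between the average with a pass enabled and the average with it disabled is the cost of that pass.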



#15 phil_t   Crossbones+   -  Reputation: 4109

Posted 13 April 2014 - 01:31 PM

Your blur algorithm takes 24 texture samples (12 samples  * 2 textures) per pixel. That's 48 texture samples for horizontal + vertical blur passes (i.e. a huge amount!). If you remove the blur, what happens to your frame time?
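
To see how quickly those samples add up over a frame, here is a small C++ sketch of the arithmetic. It assumes the figure of 24 samples per pixel per pass quoted above, that each blur iteration runs one horizontal and one vertical pass, and that the ambient map is as large as the 1024x768 back buffer (it may well be smaller, since the blur uses its own viewport). BlurFetchesPerFrame is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: total texture fetches spent on blurring in one frame.
// samplesPerPixelPerPass: 24 in the post above (12 samples * 2 textures).
// blurIterations: the count passed to BlurAmbientMap(n); every iteration is
// assumed to run one horizontal and one vertical pass.
constexpr std::uint64_t BlurFetchesPerFrame(std::uint32_t mapWidth,
                                            std::uint32_t mapHeight,
                                            std::uint32_t samplesPerPixelPerPass,
                                            std::uint32_t blurIterations)
{
    const std::uint64_t pixels = static_cast<std::uint64_t>(mapWidth) * mapHeight;
    const std::uint64_t passes = 2ULL * blurIterations; // horizontal + vertical
    return pixels * samplesPerPixelPerPass * passes;
}
```

Under those assumptions, BlurFetchesPerFrame(1024, 768, 24, 4) is 150,994,944 fetches per frame for BlurAmbientMap(4), and BlurFetchesPerFrame(1024, 768, 24, 2) is 75,497,472 for BlurAmbientMap(2). Halving the iteration count or the ambient map size reduces the blur cost proportionally.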



#16 newtechnology   Members   -  Reputation: 789

Posted 14 April 2014 - 02:00 AM

This is my fps with no blur. (92 - 165)

[Screenshots: LcaSOtX.png, MU4FbVZ.png]



#17 phil_t   Crossbones+   -  Reputation: 4109

Posted 14 April 2014 - 02:51 AM

So now you know that your blur is a big part of your performance problem.

 

A couple of quick thoughts:

- Do you really need so many taps? Note that if you're using bilinear filtering, you can sample from between two pixels to get contributions from both without making two texture samples. It doesn't look like you're doing that, from a quick look at your code.

- Are you using all the channels of your occlusion texture (samInputImage)? You're really just interested in one channel, right? Then maybe you could squeeze your depth and normal into the remaining 3 channels (you'd need to squeeze your normal into 2, I guess), so you don't have to sample from 2 textures.

- Starting from the center texel, once you hit a discontinuity in the blur, you don't need to continue in that direction, right? So you might be able to use dynamic branching to bail out of some of the calculations. This could actually end up making performance worse, but there's a chance it could improve things and might be worth experimenting with.
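
The bilinear filtering idea from the first point can be sketched on the CPU side as follows. This is a minimal C++ illustration, not the shader from this thread: Tap and MergeTapsForBilinear are hypothetical names, and the merge is exact only for a blur with fixed weights. A blur that rejects taps by comparing normals and depth per texel would only be approximated by it.

```cpp
#include <cstddef>
#include <vector>

// One texture fetch: offset from the center in texels, and its blur weight.
struct Tap
{
    float offset;
    float weight;
};

// Merge neighbouring taps on one side of a 1D kernel so that each pair can be
// fetched with a single bilinear sample. weights[0] belongs to the center
// texel and weights[i] to the texel i texels away from it.
std::vector<Tap> MergeTapsForBilinear(const std::vector<float>& weights)
{
    std::vector<Tap> taps;
    if (weights.empty())
    {
        return taps;
    }

    taps.push_back({0.0f, weights[0]}); // the center texel is fetched on its own

    std::size_t i = 1;
    for (; i + 1 < weights.size(); i += 2)
    {
        const float w0 = weights[i];
        const float w1 = weights[i + 1];
        const float sum = w0 + w1;
        // Sampling at the weighted average position makes the hardware blend
        // the two texels in the ratio w0 : w1.
        const float offset = (sum > 0.0f)
            ? (static_cast<float>(i) * w0 + static_cast<float>(i + 1) * w1) / sum
            : static_cast<float>(i);
        taps.push_back({offset, sum});
    }

    // An unpaired last texel keeps its own tap.
    if (i < weights.size())
    {
        taps.push_back({static_cast<float>(i), weights[i]});
    }
    return taps;
}
```

For example, one-sided weights {0.5, 0.25, 0.25} become two taps: the center with weight 0.5, and a single fetch at offset 1.5 with weight 0.5 that stands in for the texels at offsets 1 and 2. The same taps are mirrored to the other side of the center, and the offsets are multiplied by the texel width or height before sampling.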





