SSAO slow when geometry is near camera



Hi,

I implemented SSAO based on some tutorials and it does work correctly. However, it is very slow (resulting in low FPS) if the geometry is near (< 0.5 meters) the camera (e.g. facing a wall in first person view). Can anyone explain why? And is there a solution to this?

float main(PixelInputType input) : SV_TARGET
{
    uint3 TexCoords = uint3(input.position.xy, 0);

    // Note: the original post omits these fetches; the buffer names other than
    // positionTexture are assumed from how the variables are used below.
    float4 Position = positionTexture.Load(TexCoords);        // view-space position (w == -1 marks the skybox)
    float3 normal = normalTexture.Load(TexCoords).xyz;        // view-space normal
    float3 rvec = float3(noiseTexture.Load(TexCoords).xy, 0); // per-pixel random vector

    if (Position.w != -1.0f)
    {
        // Reorder components so rvec is never (nearly) parallel to the normal.
        if (abs(normal.x) > 0.75f)
        {
            rvec = float3(0, rvec.x, rvec.y);
        }
        else if (abs(normal.y) > 0.75f)
        {
            rvec = float3(rvec.x, 0, rvec.y);
        }

        // Gram-Schmidt: build a randomly rotated tangent basis around the normal.
        float3 tangent = normalize(rvec - normal * dot(rvec, normal));
        float3 bitangent = cross(normal, tangent);

        float3x3 tbn = float3x3(tangent, bitangent, normal); // rows

        float Occlusion = 0.0;

        for (uint i = 0; i < KernelSize; ++i)
        {
            float3 lKernelData = KernelData[i].xyz;

            // Rotate the hemisphere kernel sample into the surface's tangent frame.
            float3 RotatedKernel = mul(lKernelData, tbn);

            if (dot(RotatedKernel, normal) > 0.1f)
            {
                float3 Sample = RotatedKernel * Radius + Position.xyz;

                // Project the sample back to screen space to look up the stored geometry.
                float4 offset = mul(float4(Sample, 1.0), ViewProjectionMatrix);
                offset.xyz /= offset.w;

                if (offset.z > 0)
                {
                    offset.xy = offset.xy * 0.5 + 0.5;
                    offset.y = 1 - offset.y;

                    float4 SamplePos = positionTexture.Sample(SampleTypePoint, offset.xy);

                    if (SamplePos.w != -1.0f)
                    {
                        float SquaredDifference = dot(Position.xyz - SamplePos.xyz, Position.xyz - SamplePos.xyz);
                        // Range check & accumulate: the sample occludes if the stored
                        // geometry is closer than the sample point and within Radius.
                        Occlusion += ((dot(SamplePos.xyz, SamplePos.xyz) <= dot(Sample.xyz, Sample.xyz) && SquaredDifference < Radius * Radius && SquaredDifference != 0) ? 1.0 : 0.0);
                    }
                }
            }
        }

        return min(1.0f, (Occlusion / KernelSize) * 1.25);
    }
    else
    {
        return 0.0f;
    }
}



The position is stored relative to the camera, so I do depth calculations based on the position. Position.w != -1 is true for actual geometry, as opposed to the skybox.

Is it slow due to cache misses because the shader looks up texels far from the one rendered? That's what I would guess, but I'm not sure.

Best regards,

Magogan



Cache misses are not a concern on the GPU as long as everything fits into VRAM, since the GPU has very high memory bandwidth designed for sampling textures.

Where the texture is sampled does not matter much for performance, as long as the coordinates are not random noise that pushes anisotropic filtering to its maximum sample count. More likely, you are doing the expensive calculation at too high a resolution, or using a sampler state that does unwanted interpolation, mipmap blending, or anisotropic filtering that is not really needed.

Most post effects are done at a low resolution and then merged into the final image while upscaling the result, because per-pixel calculations are expensive.

You may get some ugly artifacts along the edges from upscaling; these have to be covered with some extra post-processing to avoid using an ambient-occlusion intensity from a far-away depth.
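A common way to hide those edge artifacts is a depth-aware (bilateral) upsample: when merging the low-resolution AO into the full-resolution image, weight each low-res sample by how close its depth is to the full-res pixel's depth. A rough sketch in HLSL, with all resource and function names hypothetical:

```hlsl
// Hypothetical depth-aware upsample of half-resolution AO (sketch only).
Texture2D<float> aoHalfRes;     // half-resolution AO result
Texture2D<float> depthHalfRes;  // half-resolution linear depth
SamplerState     pointSampler;

float UpsampleAO(float2 uv, float fullResDepth, float2 halfResTexelSize)
{
    float aoSum = 0.0;
    float weightSum = 0.0;

    // The 4 nearest half-res texels around the full-res pixel.
    [unroll]
    for (int y = 0; y < 2; ++y)
    for (int x = 0; x < 2; ++x)
    {
        float2 sampleUV = uv + (float2(x, y) - 0.5) * halfResTexelSize;
        float sampleDepth = depthHalfRes.SampleLevel(pointSampler, sampleUV, 0);

        // Down-weight samples whose depth differs from the shaded pixel,
        // so AO does not bleed across depth discontinuities.
        float w = 1.0 / (0.001 + abs(sampleDepth - fullResDepth));

        aoSum     += aoHalfRes.SampleLevel(pointSampler, sampleUV, 0) * w;
        weightSum += w;
    }
    return aoSum / weightSum;
}
```

This keeps the AO cheap to compute at half resolution while avoiding the halo of far-away AO leaking onto foreground silhouettes.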

Edited by Dawoodoz

Share on other sites

That is definitely about cache misses. Texture caches work spatially, so if samples are scattered across the screen you get bad performance, and that is exactly the case when something is near the camera: with camera-space sampling, a sample 1 meter away can land 1000 texels away when the current pixel is close to the camera. There is a really clever way to solve this using depth mipmaps; then performance is constant no matter how close or far objects are.

http://graphics.cs.williams.edu/papers/SAOHPG12/

I have implemented this technique and it works really well.
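The key trick in that paper is to read far-away samples from coarser mips of a depth pyramid, so texture accesses stay cache-local no matter how far the sample is in screen space. A rough sketch of the mip selection, with hypothetical resource names and constants loosely following the paper:

```hlsl
// Sketch of SAO-style mip selection (hypothetical names; the idea follows
// the McGuire et al. paper linked above). depthPyramid is assumed to be a
// linear-depth texture with precomputed mip levels.
Texture2D<float> depthPyramid;

static const int LOG_MAX_OFFSET = 3; // switch to a coarser mip beyond 2^3 texels
static const int MAX_MIP_LEVEL  = 5;

float FetchDepth(int2 pixelPos, int2 samplePos)
{
    // Distance from the shaded pixel to the sample, in texels.
    int2 offset = samplePos - pixelPos;
    float screenDist = max(abs(offset.x), abs(offset.y));

    // Pick a mip so distant samples read from a coarser level, keeping
    // texture-cache accesses local even when geometry is near the camera.
    int mip = clamp((int)log2(max(screenDist, 1.0)) - LOG_MAX_OFFSET, 0, MAX_MIP_LEVEL);

    return depthPyramid.Load(int3(samplePos >> mip, mip));
}
```

Because the sample footprint grows with the offset anyway, reading the averaged coarse-mip depth loses very little quality compared to the full-resolution fetch.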


Intel recently released an article and sample code that builds upon Morgan McGuire's work for performance-scalable SSAO. I would recommend checking that out as well if you're interested in improving your performance.


My opinion is that SSAO is better with temporal smoothing instead of spatial smoothing. All SSAO algorithms that rely on depth-aware blurs can be quite noisy with foliage, and they also smooth out normal-map details, which makes indirect lighting a bit boring. Deinterleaved rendering isn't needed if you are using depth mips. A depth-aware blur is usually quite expensive too, compared to SSAO with a few samples.
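Temporal smoothing of this kind usually amounts to an exponential moving average over reprojected history. A minimal sketch, assuming motion vectors already give the previous frame's UV and a validity flag (all names hypothetical):

```hlsl
// Hypothetical temporal accumulation of a noisy per-frame SSAO term (sketch
// only). aoHistory holds the AO accumulated over previous frames.
Texture2D<float> aoHistory;
SamplerState     linearSampler;

float TemporalSmoothAO(float currentAO, float2 prevFrameUV, bool historyValid)
{
    // Reprojected history sample; prevFrameUV comes from motion vectors.
    float history = aoHistory.SampleLevel(linearSampler, prevFrameUV, 0);

    // Exponential moving average: a small blend weight averages the noise
    // over many frames. Fall back to the current sample when reprojection
    // fails (disocclusion, off-screen).
    const float blend = 0.1; // weight of the new frame
    return historyValid ? lerp(history, currentAO, blend) : currentAO;
}
```

This averages the noise over many frames without blurring across geometry, so normal-map detail in the AO survives; the cost is ghosting on disocclusions, which the validity test is meant to limit.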