Weird performance problem with SSAO

6 comments, last by JoeJ 6 years, 8 months ago

Hey folks. I'm having a problem where, if my camera is close to a surface, the SSAO pass suddenly spikes to around 16 milliseconds.

When I'm still looking at the same surface but from farther away, the frame time recovers and becomes regular again.

This happens with ANY surface of my model, and I'm a bit clueless as to what could cause it. Any ideas?

In attached image: y axis is time in ms, x axis is current frame. The dips in SSAO milliseconds are when I moved away from the surface, the peaks happen when I am very close to the surface.

ss+(2017-08-19+at+07.40.00).png

 

Edit: I've done some more in-depth profiling with Nvidia Nsight. These are the facts from my results:

The count of command buffers goes from 4 (far away from the surface) to ~20 (close to the surface).

The command buffer duration in % goes from ~30% to ~99%.

Sometimes the CPU duration is as high as 0.016 to 0.03 milliseconds per frame, while it usually takes around 0.002 milliseconds.

I am using a vertex shader that generates my full-screen quad, and afterwards I do my SSAO calculations in the pixel shader. Could this be a GPU driver bug? I'm a bit lost myself. It seems there could be a CPU/GPU resource stall, but why would the number of command buffers vary with the distance from a surface?

 

 

Edit 2: Any resolution above 720p starts to have this issue, and I am fairly certain my SSAO is not so performance-heavy that it would crap itself at a slightly higher resolution.

 


This is common. You solve it by using mip maps for the depth buffer, so you can sample a larger area with fewer samples.

Do you have any sources for this? Not that I don't trust you, but I can't find anything on this on Google.

Maybe this (did not read it): http://research.nvidia.com/publication/scalable-ambient-obscurance

However, what I mean is simple:

Being close to the camera means you need to sample a large area in screen space, so the samples get spread out in memory, and the sample count can also increase (depending on the algorithm).

If you have a mip map pyramid of the depth you can pick a higher mip map level so performance remains constant independent of distance.
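A minimal sketch of how the mip level could be picked, written in Python for readability rather than as shader code. The function name and the parameters `max_texel_offset` and `max_mip` are assumptions made for illustration and do not come from any particular SSAO implementation. The idea is that each mip level halves the resolution, so a screen-space radius of `radius_px` pixels only spans `radius_px / 2**level` texels at that level.

```python
import math


def depth_mip_level(radius_px: float,
                    max_texel_offset: float = 8.0,
                    max_mip: int = 5) -> int:
    """Pick a depth mip so the sampling radius stays a few texels wide.

    radius_px        -- screen-space sampling radius in full-resolution pixels
    max_texel_offset -- hypothetical radius (in texels) we are happy to sample
                        at full resolution before switching to a coarser mip
    max_mip          -- coarsest level available in the (assumed) depth pyramid
    """
    if radius_px <= max_texel_offset:
        return 0  # small radius: full-resolution depth is already coherent
    # Every extra level halves the radius measured in texels.
    level = math.floor(math.log2(radius_px / max_texel_offset))
    return min(level, max_mip)


if __name__ == "__main__":
    for r in (4.0, 16.0, 100.0, 1000.0):
        print(r, "->", depth_mip_level(r))
```

With these assumed defaults, the radius measured in texels of the chosen level stays below `2 * max_texel_offset` until `max_mip` is reached, which is what keeps the memory access pattern roughly constant as the camera moves closer.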

 

Edit: Are you sure the increasing command buffer count comes from SSAO? That makes no sense.

And I am not sure if the command buffer count comes from SSAO, but what I do know is that SSAO takes up most of my performance (as you can see in the graph) and in those frames command buffer counts increase as well.

 

Edit: And I think you are talking about cache misses from texture samples? And I don't really understand your mip map pyramid. I believe that if you downsample a depth texture it does not really make sense anymore, since it will linearly interpolate between the values during downsampling?

 

Edit 3: I lowered my sample rate and the framerate improves a lot, so I guess the number of samples leads to too much random memory access, which causes cache misses as you stated :)

Link to screenshot: 21ad9b39ba.png

The cost of a texture sample depends on whether you hit the cache or not, which depends on whether your sampling is coherent (e.g. do neighbouring pixels sample neighbouring texels?). If your SSAO changes its sampling radius based on the distance to the surface, then this is a predictable result. At long range, your pixels might be sampling a small 3x3 area of texels, which is quite predictable, but at near range perhaps you start sampling a 1000x1000 area of texels (about 111k times larger), which is very incoherent, and the cache suddenly can't help you any more.

These kinds of variable-radius effects need either a way to reduce the size of the data set that they're sampling, such as the mipmaps mentioned above (a hierarchical structure), or a simple clamp on the filter radius with "min".
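To illustrate why the radius blows up near a surface and what the "min" clamp does, here is a rough sketch in Python. It assumes a standard perspective projection and a fixed world-space AO radius; the function names and the `max_radius_px` value are made up for the example and are not taken from the original poster's shader.

```python
import math


def projected_radius_px(world_radius: float, view_depth: float,
                        fov_y_radians: float, screen_height_px: int) -> float:
    """Screen-space size (in pixels) of a world-space radius at a given depth."""
    # Pixels covered by one world unit at a view depth of 1 (perspective projection).
    proj_scale = screen_height_px / (2.0 * math.tan(fov_y_radians / 2.0))
    # The projected size is inversely proportional to the view-space depth.
    return world_radius * proj_scale / view_depth


def clamped_radius_px(world_radius: float, view_depth: float,
                      fov_y_radians: float, screen_height_px: int,
                      max_radius_px: float = 64.0) -> float:
    """Same as above, but never larger than an assumed pixel budget."""
    radius = projected_radius_px(world_radius, view_depth,
                                 fov_y_radians, screen_height_px)
    return min(radius, max_radius_px)


if __name__ == "__main__":
    fov = math.radians(90.0)
    for depth in (10.0, 1.0, 0.1):
        print(depth,
              projected_radius_px(0.5, depth, fov, 1080),
              clamped_radius_px(0.5, depth, fov, 1080))
```

Under these assumptions a 0.5 unit radius covers about 27 pixels at depth 10 but about 2700 pixels at depth 0.1, which also fits the observation that higher resolutions make the problem worse, since the pixel radius scales with the screen height. The clamp trades some AO range up close for a bounded cost.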

9 hours ago, Mercesa said:

Edit: And I think you are talking about cache misses from texture samples? And I don't really understand your mip map pyramid. I believe that if you downsample a depth texture it does not really make sense anymore, since it will linearly interpolate between the values during downsampling?

Unlike with shadow maps, downsampling depth with interpolation should actually increase quality for SSAO, as it prefilters (e.g. VSM shadow maps also benefit from downsampling). You could even implement your own trilinear filtering by blending results from two mips, or use dithering to avoid banding from switching mips... if the switch becomes visible at all.
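As a rough CPU-side illustration of the pyramid and the two-mip blend, here is a small Python sketch. It uses plain 2x2 averaging for the downsample, which is just one possible filter, and nearest-texel fetches inside each level; the function names are invented for the example. On the GPU the same thing would normally be done with a mip chain and an explicit-LOD texture fetch.

```python
def downsample_2x2(depth):
    """Average each 2x2 block of a depth image (assumes even width and height)."""
    h, w = len(depth), len(depth[0])
    return [
        [(depth[y][x] + depth[y][x + 1]
          + depth[y + 1][x] + depth[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def build_pyramid(depth, levels):
    """Return [level0, level1, ...] where each level is half the previous size."""
    pyramid = [depth]
    for _ in range(levels):
        pyramid.append(downsample_2x2(pyramid[-1]))
    return pyramid


def sample_blended(pyramid, x, y, lod):
    """Fetch depth at full-res pixel (x, y), blending the two mips around lod."""
    lod = max(0.0, min(lod, len(pyramid) - 1))
    lo = int(lod)                        # finer of the two levels
    hi = min(lo + 1, len(pyramid) - 1)   # coarser of the two levels
    t = lod - lo                         # blend weight toward the coarser level
    d_lo = pyramid[lo][y >> lo][x >> lo]
    d_hi = pyramid[hi][y >> hi][x >> hi]
    return d_lo * (1.0 - t) + d_hi * t


if __name__ == "__main__":
    depth = [[1, 1, 3, 3],
             [1, 1, 3, 3],
             [5, 5, 7, 7],
             [5, 5, 7, 7]]
    pyr = build_pyramid(depth, 2)
    print(pyr[1], pyr[2], sample_blended(pyr, 3, 0, 1.5))
```

Blending by the fractional part of the level is what hides the hard switch between mips; dithering the level per pixel would be the cheaper alternative mentioned above.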

You should probably distribute your samples over multiple frames so you get a high sample count and quality for free, similar to temporal anti-aliasing. That should bring you down to 1-3 ms or so. High-quality methods can use 4-5 ms, but IMO that's a real waste even on $1000 GPUs :)

I guess the varying command buffer count could be caused by frustum / occlusion culling or NPCs running around?
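One possible way to spread the samples over frames, sketched in Python: evaluate a different slice of the sample kernel each frame and blend the result into a history value with an exponential moving average. The function names, the round-robin scheme, and the `blend` factor are assumptions for illustration. A real implementation would also need to reproject the history buffer when the camera moves and reject stale history, which is left out here.

```python
def frame_sample_indices(frame_index, total_samples, samples_per_frame):
    """Which kernel taps to evaluate this frame (simple round-robin slices)."""
    start = (frame_index * samples_per_frame) % total_samples
    return [(start + i) % total_samples for i in range(samples_per_frame)]


def accumulate(history_ao, current_ao, blend=0.1):
    """Exponential moving average of the AO term across frames."""
    return history_ao * (1.0 - blend) + current_ao * blend


if __name__ == "__main__":
    # 16 taps in total, but only 4 are evaluated in any single frame.
    for frame in range(5):
        print(frame, frame_sample_indices(frame, 16, 4))
    print(accumulate(1.0, 0.0, blend=0.25))
```

With 16 taps and 4 per frame, every tap is visited once every 4 frames, so the per-frame cost is a quarter of the full kernel while the accumulated result converges toward the full sample count on a static view.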

 
