Hello.
TL;DR: I have an SSAO renderer that is too slow for (EDIT: Wow, I messed up the TL;DR... xD)
EDIT: I have written a summary of my findings further down.
I have a home-made SSAO shader that I currently run at a reduced resolution and then upscale using a bilateral upsampling shader to maintain the sharpness of edges. The shader relies on normals calculated from the depth buffer to reduce the noise from normal mapping, which can cause flickering due to undersampling at lower SSAO resolutions.
We recently added a significant amount of vegetation to the game, and sadly the SSAO looks horrible on it. The depth buffer essentially becomes extremely noisy, yielding worthless normals. During the bilateral upsampling pass, the normals don't match, and with so many different depth values there simply isn't a good match in the low-resolution SSAO texture, so there isn't enough resolution to give the vegetation a good SSAO effect. The result is a noisy, aliased mess that flickers badly in motion. Even the temporal supersampling I have can't do much to reduce the impact of the effect.
My SSAO renderer has 4 main passes:
1. In the first pass I pack the normal and the linear depth into a low-resolution single GL_RGBA16F texture, usually at half resolution, though this can be changed with a setting. The normal is calculated from the depths of the 5 pixels in a cross shape using a cross product and stored in RGB, and the linear depth of each pixel is packed into the alpha channel. The cost of this pass is offset by the savings in the next two passes.
2. In the second pass, the SSAO value is calculated and stored in a GL_R8 texture. See the shader code below for details; it's pretty straight-forward.
3. In the third pass, the SSAO value is blurred using a depth- and normal-aware separable 9x9 blur. This is applied twice. This pass benefits a lot from the normal+depth packed texture from pass 1, completely offsetting the cost of generating it.
4. As part of a big full-resolution shader that handles many other things, the 4 closest SSAO values are read and a weighted sum of them is computed based on the depth and normal of the full-resolution pixel being processed.
At 1920x1080, my GTX 770 gets the following performance numbers:
- At half resolution:
The additional cost of doing this at full resolution is simply way too high. My goal is to get this running at 1-2ms at full resolution. Reducing the blur passes to one instead of two would save around 0.6ms, while getting rid of the bilateral upsample would save ~0.15ms. At full res, I also wouldn't have to calculate the normal from the depth buffer, which would save another fraction of a ms. That leaves me at around 2.9ms, still 1-2ms too high.
Here is my shader: http://pastebin.com/xYFmbEP3
The blur and pack shaders are essentially as fast as they can be.
I am looking for other SSAO algorithms that are more efficient/cache friendly, better sampling patterns so I can reduce the noise, optimizations to my current code, and the like.