SSAO and skybox artifact

10 comments, last by MJP 11 years, 4 months ago
I'm running into an ugly artifact with my SSAO where geometry next to the skybox is occluded by the skybox, and vice versa. I was able to fix the skybox being occluded by simply skipping the SSAO calculation as soon as I know the pixel belongs to the skybox. I still have the problem of the skybox occluding my geometry, which creates an ugly dotted line. The skybox is black in the g-buffer and at depth 1.0, but after trying uselessly to skip sampling the skybox using a step function, I decided to ask here. Anyway, here is my GLSL shader code, based heavily on code from here:



uniform sampler2D depth_texture;
uniform sampler2D color_texture;
uniform sampler2D normal_texture;
uniform float scr_w;
uniform float scr_h;

uniform vec3 pSphere[10] = vec3[](vec3(-0.010735935, 0.01647018, 0.0062425877),
                                  vec3(-0.06533369, 0.3647007, -0.13746321),
                                  vec3(-0.6539235, -0.016726388, -0.53000957),
                                  vec3(0.40958285, 0.0052428036, -0.5591124),
                                  vec3(-0.1465366, 0.09899267, 0.15571679),
                                  vec3(-0.44122112, -0.5458797, 0.04912532),
                                  vec3(0.03755566, -0.10961345, -0.33040273),
                                  vec3(0.019100213, 0.29652783, 0.066237666),
                                  vec3(0.8765323, 0.011236004, 0.28265962),
                                  vec3(0.29264435, -0.40794238, 0.15964167));
varying vec2 vTexCoord;
#define STRENGTH 0.09
#define FALLOFF 0.0 //0.00002
#define RAD 0.006
#define SAMPLES 10
#define INVSAMPLES 1.0/SAMPLES
vec4 height_normal(in vec2 texcoord)
{
    vec4 normaltexel;
    normaltexel.rgb = (texture2D(normal_texture, texcoord).xyz * 2.0) - vec3(1.0);
    normaltexel.a = texture2D(depth_texture, texcoord).x;

    return normaltexel;
}

void main(void)
{
    // get a random normal, tiling the 64x64 noise texture across the screen
    vec3 fres = normalize((texture2D(color_texture, vTexCoord * vec2(scr_w, scr_h) / 64.0).xyz * 2.0) - vec3(1.0));

    // grab depth and a normal vector for the current pixel
    vec4 currentPixelSample = height_normal(vTexCoord);

    vec3 samplepos = vec3(vTexCoord.xy, currentPixelSample.a);

    float blacklevel = 0.0;

    float depthDiff;
    vec4 occluderFragment;
    vec3 ray;

    if (length(currentPixelSample.xyz) <= 1.0) // skip the SSAO calculation for skybox pixels
    {
        for (int i = 0; i < SAMPLES; ++i)
        {
            // reflect the i-th sample vector around the random normal, scaled by depth
            ray = (RAD / samplepos.z) * reflect(pSphere[i], fres);

            // get the potential occluder, flipping the ray into the visible hemisphere
            occluderFragment = height_normal(samplepos.xy + (sign(dot(ray, currentPixelSample.xyz)) * ray.xy));

            depthDiff = samplepos.z - occluderFragment.a;

            blacklevel += step(FALLOFF, depthDiff) * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz)) * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
        }
    }

    // output the result
    gl_FragColor = vec4(vec3(1.0 - (blacklevel * INVSAMPLES)), 1.0);
}


Attached is a picture of my problem with the offending pixels circled. Anyone know how to fix this?
If the skybox is at exactly 1.0 (zfar), e.g. by writing gl_FragDepth = 1.0 in the atmosphere shader, you can avoid it by using a branch: if (depth < 0.99) { do stuff }
It will reduce the performance of your SSAO shader, but I think it will work :)

In your example this could be:
if (normaltexel.a < 0.999)
{
    // do ssao
}
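That branch handles the center pixel, which the original post says is already fixed. The remaining dotted line comes from occluder samples landing on the skybox, so the same depth test can also be applied per sample inside the loop. A sketch against the posted shader, not a definitive fix; the 0.999 threshold assumes the sky is written at (or very near) depth 1.0:

```glsl
// inside the sampling loop, after fetching occluderFragment:
// weight out any occluder sample that landed on the skybox, so the
// sky never darkens foreground geometry along silhouette edges
float notSky = 1.0 - step(0.999, occluderFragment.a); // 0.0 if occluder is sky

blacklevel += notSky
            * step(FALLOFF, depthDiff)
            * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz))
            * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
```

Because this is a multiply rather than a branch, it rejects sky samples without introducing any divergence inside the loop.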

It will reduce the performance of your SSAO shader

It actually will improve the performance of his shader. Here are the rules for if's in shaders:
- If's get compiled away if they can be evaluated at compile time and thus don't reduce your performance
- If's don't reduce your performance if they are using values from a constant buffer (Except the additional instructions for checking the condition)
- If's don't reduce your performance if they don't have a second code path like you would by having else. (Except the additional instructions for checking the condition)
- If's don't reduce your performance if the whole warp chooses the same code path. A warp consists of either 16 or 32 threads grouped together. So even a whole thread group / block diverging into different code paths won't negatively affect performance, as long as each warp itself takes only a single code path. (Except the additional instructions for checking the condition)
- When none of these conditions are met, your if's will reduce the performance.

In his case, there's only one code path, so a warp either takes it or it doesn't. If all the threads inside a warp are working on pixels associated with the sky, the whole warp actually skips all the code inside the if, which results in a performance increase.
When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception is when something is known at compile time that allows the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So your additional cost is that of the if statement, and in your example you are adding one extra if instruction, which is essentially zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement in the middle thinking it would speed up your code, you would get NO speedup, because ALL paths will be executed.
Do a depth-bounds test (it can be set up engine-side, with no if branches in the shader) with a max range of 0.99999f; this will ensure you're not computing SSAO on the sky (which should be at 1.0). I'm not familiar enough with OpenGL to know the setup for a depth-bounds test there (though I'm sure it can be done), but in D3D you can do this on the CPU side.
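In OpenGL the depth-bounds test is exposed through the EXT_depth_bounds_test extension, which must be queried at runtime. A minimal sketch of the setup, assuming a GLEW-based loader; the 0.99999 upper bound mirrors the value suggested above:

```c
/* Sketch: engine-side depth-bounds setup via EXT_depth_bounds_test.
 * Fragments whose stored depth falls outside [0.0, 0.99999] -- i.e.
 * the sky at 1.0 -- are rejected before the SSAO pass shades them. */
#include <GL/glew.h>

void enable_sky_depth_bounds(void)
{
    if (GLEW_EXT_depth_bounds_test) {
        glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
        glDepthBoundsEXT(0.0, 0.99999);
    }
    /* extension unavailable: fall back to a shader-side branch */
}
```

Remember to glDisable(GL_DEPTH_BOUNDS_TEST_EXT) after the SSAO pass so later passes aren't clipped.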

ALL code is executed, including ALL branches, all function calls, etc. (...) This is how all graphics cards work, AMD, NVIDIA, etc.


I'm not quite sure what you base your information on, but almost no graphics card from the last 3 or 4 years works this way. Here's a quote from NVIDIA:
Any flow control instruction (if, switch, do, for, while) can significantly affect the instruction throughput by causing threads of the same warp to diverge; that is, to follow different execution paths. If this happens, the different execution paths must be serialized, since all of the threads of a warp share a program counter; this increases the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.

To obtain best performance in cases where the control flow depends on the thread ID, the controlling condition should be written so as to minimize the number of divergent warps. This is possible because the distribution of the warps across the block is deterministic as mentioned in SIMT Architecture of the CUDA C Programming Guide. A trivial example is when the controlling condition depends only on (threadIdx / WSIZE) where WSIZE is the warp size. In this case, no warp diverges because the controlling condition is perfectly aligned with the warps.

The different execution paths only get serialized when threads inside a warp actually diverge into different branches.
Yes, CryZe is mostly right, although you forgot one other potential source of slowdown from adding branches: register count. The compiler needs to statically determine the worst-case number of temporary registers needed for intermediate computation, accounting for any code path. If the alternate path introduced by a branch causes the number of registers needed to increase, then the total register count of the shader can be higher (even when that branch is never taken). When running the shader, each instance (thread or similar construct, depending on which HW vendor's or API's terminology you're using) needs that many registers. Basically, shaders that use more registers get fewer instances running in parallel.

A warp consists of either 16 or 32 threads grouped together.


I think you mean "32 or 64" :P

When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception is when something is known at compile time that allows the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So your additional cost is that of the if statement, and in your example you are adding one extra if instruction, which is essentially zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement in the middle thinking it would speed up your code, you would get NO speedup, because ALL paths will be executed.


This is completely wrong, even for relatively old GPUs (even the first-gen DX9 GPUs supported branching on shader constants, although in certain cases it was implemented through driver-level shenanigans). I'm not sure how you could even come to such a conclusion, considering it's really easy to set up a test case that shows otherwise.
So I take it that the if statement can stay.

This topic is closed to new replies.
