# SSAO and skybox artifact


## Recommended Posts

I'm running into an ugly artifact with my SSAO: geometry against the skybox is occluded by the skybox, and vice versa. I was able to fix the skybox being occluded simply by skipping the SSAO calculation as soon as I know the pixel belongs to the skybox. I still have the problem of the skybox occluding my geometry, which creates an ugly dotted line. The skybox is black in the g-buffer and at depth 1.0, but after trying uselessly to skip sampling the skybox using a step function, I decided to ask here. Anyway, here is my GLSL shader, based heavily on code from here:

```glsl
uniform sampler2D depth_texture;
uniform sampler2D color_texture;
uniform sampler2D normal_texture;
uniform float scr_w;
uniform float scr_h;

uniform vec3 pSphere[10] = vec3[](
    vec3(-0.010735935, 0.01647018, 0.0062425877),
    vec3(-0.06533369, 0.3647007, -0.13746321),
    vec3(-0.6539235, -0.016726388, -0.53000957),
    vec3(0.40958285, 0.0052428036, -0.5591124),
    vec3(-0.1465366, 0.09899267, 0.15571679),
    vec3(-0.44122112, -0.5458797, 0.04912532),
    vec3(0.03755566, -0.10961345, -0.33040273),
    vec3(0.019100213, 0.29652783, 0.066237666),
    vec3(0.8765323, 0.011236004, 0.28265962),
    vec3(0.29264435, -0.40794238, 0.15964167));

varying vec2 vTexCoord;

#define STRENGTH 0.09
#define FALLOFF 0.0 // 0.00002
#define RAD 0.006
#define SAMPLES 10
#define INVSAMPLES (1.0 / SAMPLES)

vec4 height_normal(in vec2 texcoord)
{
    vec4 normaltexel;
    normaltexel.rgb = (texture2D(normal_texture, texcoord).xyz * 2.0) - vec3(1.0);
    normaltexel.a = texture2D(depth_texture, texcoord).x;
    return normaltexel;
}

void main(void)
{
    // get a random normal from the noise texture
    vec3 fres = normalize((texture2D(color_texture, vTexCoord * (scr_w / 64.0)).xyz * 2.0) - vec3(1.0));

    // grab depth and a normal vector
    vec4 currentPixelSample = height_normal(vTexCoord);
    vec3 samplepos = vec3(vTexCoord.xy, currentPixelSample.a);

    float blacklevel = 0.0;
    float depthDiff;
    vec4 occluderFragment;
    vec3 ray;

    if (length(currentPixelSample.xyz) <= 1.0) // skip the SSAO calculation when the pixel is in the skybox
    {
        for (int i = 0; i < SAMPLES; ++i)
        {
            // trace a ray from a random normal to a random position
            ray = (RAD / samplepos.z) * reflect(pSphere[i], fres);

            // get the position of the occluder
            occluderFragment = height_normal(samplepos.xy + (sign(dot(ray, currentPixelSample.xyz)) * ray.xy));
            depthDiff = samplepos.z - occluderFragment.a;

            blacklevel += step(FALLOFF, depthDiff)
                        * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz))
                        * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
        }
    }

    // output the result
    gl_FragColor = vec4(vec3(1.0 - (blacklevel * INVSAMPLES)), 1.0);
}
```

Attached is a picture of my problem with the offending pixels circled. Does anyone know how to fix this?

##### Share on other sites
If the skybox is exactly at 1.0 (zfar), e.g. by writing gl_FragDepth = 1.0 in the atmosphere shader, you can avoid it by using a branch: if (depth < 0.99) { do stuff }. It will reduce performance of your SSAO shader, but I think it will work.

In your example this could be:

```glsl
if (normaltexel.a < 0.999)
{
    // do ssao
}
```

Edited by Kaptein
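Folding that suggestion into the shader from the first post might look like the sketch below. The 0.999 cutoff is an assumption (it relies on the skybox really landing at depth 1.0), and the second guard is my addition: rejecting occluder samples that hit the sky, since a skybox texel acting as an occluder is what draws the dotted line along the horizon.

```glsl
// Early out: the current pixel is skybox, so write full brightness and stop.
vec4 currentPixelSample = height_normal(vTexCoord);
if (currentPixelSample.a > 0.999)
{
    gl_FragColor = vec4(1.0);
    return;
}

// Inside the sampling loop: ignore occluders that are skybox texels.
// occluderFragment = height_normal(...);
float isScene = step(occluderFragment.a, 0.999); // 0.0 when the sample hit the sky
blacklevel += isScene
            * step(FALLOFF, depthDiff)
            * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz))
            * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
```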

##### Share on other sites
When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception for this is if something is known at compile time that will allow the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So, your additional cost is the if statement, and in your example, you are adding an extra if instruction. This is a zero cost on GPUs. If you want to read on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement to the middle thinking that it would speed up your code, you would get no speedup, because ALL paths will be executed.

##### Share on other sites
Do a depth-bounds test (it can be set up on the engine side, no shader if-branches) with a max range of 0.99999f; this will ensure you're not computing SSAO on the sky (which should be at 1.0). I'm not familiar with OpenGL, so I don't know the setup for a depth-bounds test there (but I'm sure it can be done), but you can do this on the CPU side in D3D.
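For what it's worth, OpenGL exposes this through the EXT_depth_bounds_test extension. A minimal sketch of the engine-side setup, assuming the extension is available and entry points are loaded (e.g. via GLEW):

```glsl
/* Engine-side sketch (C with GLEW), not shader code: */
/*
#include <GL/glew.h>

void enable_sky_rejection(void)
{
    if (GLEW_EXT_depth_bounds_test)
    {
        // Fragments whose stored depth falls outside [0.0, 0.99999]
        // are rejected before the SSAO shader runs, so the sky
        // (at depth 1.0) is never shaded.
        glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
        glDepthBoundsEXT(0.0, 0.99999);
    }
}
*/
```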

##### Share on other sites

> ALL code is executed, including ALL branches, all function calls, etc. (...) This is how all graphics cards work, AMD, NVIDIA, etc.

I'm not quite sure what you're basing your information on, but almost no graphics card from the last 3 or 4 years works this way. Here's a quote from NVIDIA:
> Any flow control instruction (if, switch, do, for, while) can significantly affect the instruction throughput by causing threads of the same warp to diverge; that is, to follow different execution paths. If this happens, the different execution paths must be serialized, since all of the threads of a warp share a program counter; this increases the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.
>
> To obtain best performance in cases where the control flow depends on the thread ID, the controlling condition should be written so as to minimize the number of divergent warps. This is possible because the distribution of the warps across the block is deterministic as mentioned in SIMT Architecture of the CUDA C Programming Guide. A trivial example is when the controlling condition depends only on (threadIdx / WSIZE) where WSIZE is the warp size. In this case, no warp diverges because the controlling condition is perfectly aligned with the warps.

Serialization is only needed when the threads inside a warp diverge into different branches; only then do the different execution paths get serialized. Edited by CryZe

##### Share on other sites
Yes, CryZe is mostly right, although you forgot one other potential source of slowdown from adding branches: register count. The compiler needs to statically determine the worst-case number of temporary registers needed for intermediate computation, accounting for every code path. If the alternate path introduced by a branch increases the number of registers needed, then the total register count of the shader can be higher (even when that branch is never taken). When running the shader, each instance (thread or similar construct, depending on which HW vendor or API terminology you're using) needs that many registers, so shaders that use more registers get fewer instances running in parallel.

##### Share on other sites

> A warp consists of either 16 or 32 threads grouped together.

I think you mean "32 or 64"

##### Share on other sites

> When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception for this is if something is known at compile time that will allow the compiler to remove a particular piece of code.
>
> This is how all graphics cards work, AMD, NVIDIA, etc. So, your additional cost is the if statement, and in your example, you are adding an extra if instruction. This is a zero cost on GPUs. If you want to read on it, check out vector processors and data hazards.
>
> If you somehow split your shader up and added an if statement to the middle thinking that it would speed up your code, you would get no speedup, because ALL paths will be executed.

This is completely wrong, even for relatively old GPUs (even the first-gen DX9 GPUs supported branching on shader constants, although in certain cases it was implemented through driver-level shenanigans). I'm not sure how you could even come to such a conclusion, considering it's really easy to set up a test case that shows otherwise. Edited by MJP

##### Share on other sites
So I take it that the if statement can stay.
