SSAO and skybox artifact

10 comments, last by MJP 11 years, 4 months ago
I'm running into an ugly artifact with my SSAO where geometry next to the skybox is occluded by the skybox, and vice versa. I was able to fix the skybox being occluded by simply skipping the SSAO calculation as soon as I know the pixel belongs to the skybox. I still have the problem of the skybox occluding my geometry, which creates an ugly dotted line. The skybox is black in the g-buffer and at depth 1.0, but after trying uselessly to skip sampling the skybox using a step function, I decided to ask here. Anyway, here is my GLSL shader code, based heavily on code from here:



uniform sampler2D depth_texture;
uniform sampler2D color_texture;
uniform sampler2D normal_texture;
uniform float scr_w;
uniform float scr_h;

uniform vec3 pSphere[10] = vec3[](vec3(-0.010735935, 0.01647018, 0.0062425877),
                                  vec3(-0.06533369, 0.3647007, -0.13746321),
                                  vec3(-0.6539235, -0.016726388, -0.53000957),
                                  vec3(0.40958285, 0.0052428036, -0.5591124),
                                  vec3(-0.1465366, 0.09899267, 0.15571679),
                                  vec3(-0.44122112, -0.5458797, 0.04912532),
                                  vec3(0.03755566, -0.10961345, -0.33040273),
                                  vec3(0.019100213, 0.29652783, 0.066237666),
                                  vec3(0.8765323, 0.011236004, 0.28265962),
                                  vec3(0.29264435, -0.40794238, 0.15964167));
varying vec2 vTexCoord;
#define STRENGTH 0.09
#define FALLOFF 0.0 //0.00002
#define RAD 0.006
#define SAMPLES 10
#define INVSAMPLES 1.0/SAMPLES
vec4 height_normal(in vec2 texcoord)
{
    vec4 normaltexel;
    normaltexel.rgb = (texture2D(normal_texture, texcoord).xyz * 2.0) - vec3(1.0);
    normaltexel.a = texture2D(depth_texture, texcoord).x;

    return normaltexel;
}

void main(void)
{
    // get a random normal, tiling the 64x64 noise texture across the screen
    vec3 fres = normalize((texture2D(color_texture, vTexCoord * vec2(scr_w, scr_h) / 64.0).xyz * 2.0) - vec3(1.0));

    // grab depth and a normal vector for the current pixel
    vec4 currentPixelSample = height_normal(vTexCoord);

    vec3 samplepos = vec3(vTexCoord.xy, currentPixelSample.a);

    float blacklevel = 0.0;

    float depthDiff;
    vec4 occluderFragment;
    vec3 ray;

    if (length(currentPixelSample.xyz) <= 1.0) // skip the SSAO calculation for skybox pixels
    {
        for (int i = 0; i < SAMPLES; ++i)
        {
            // reflect the i-th sample vector around the random normal, scaled by depth
            ray = (RAD / samplepos.z) * reflect(pSphere[i], fres);

            // get the potential occluder, flipping the ray into the visible hemisphere
            occluderFragment = height_normal(samplepos.xy + (sign(dot(ray, currentPixelSample.xyz)) * ray.xy));

            depthDiff = samplepos.z - occluderFragment.a;

            blacklevel += step(FALLOFF, depthDiff) * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz)) * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
        }
    }

    // output the result
    gl_FragColor = vec4(vec3(1.0 - (blacklevel * INVSAMPLES)), 1.0);
}


Attached is a picture of my problem with the offending pixels circled. Anyone know how to fix this?
If the skybox is at exactly 1.0 (zfar), e.g. by writing gl_FragDepth = 1.0 in the atmosphere shader, you can avoid it by using a branch: if (depth < 0.99) { do stuff }
It will reduce the performance of your SSAO shader, but I think it will work :)

In your example this could be:
if (normaltexel.a < 0.999)
{
    // do ssao
}
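That branch handles the center pixel, which the original post says is already fixed. The remaining dotted line comes from occluder samples landing on the skybox, so the same depth test can also be applied per sample inside the loop. A sketch against the posted shader, not a definitive fix; the 0.999 threshold assumes the sky is written at (or very near) depth 1.0:

```glsl
// inside the sampling loop, after fetching occluderFragment:
// weight out any occluder sample that landed on the skybox, so the
// sky never darkens foreground geometry along silhouette edges
float notSky = 1.0 - step(0.999, occluderFragment.a); // 0.0 if occluder is sky

blacklevel += notSky
            * step(FALLOFF, depthDiff)
            * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz))
            * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
```

Because this is a multiply rather than a branch, it rejects sky samples without introducing any divergence inside the loop.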

It will reduce the performance of your SSAO shader

It actually will improve the performance of his shader. Here are the rules for if's in shaders:
- If's get compiled away if they can be evaluated at compile time and thus don't reduce your performance
- If's don't reduce your performance if they are using values from a constant buffer (Except the additional instructions for checking the condition)
- If's don't reduce your performance if they don't have a second code path like you would by having else. (Except the additional instructions for checking the condition)
- If's don't reduce your performance if the whole warp chooses the same code path. A warp consists of either 16 or 32 threads grouped together. So even a whole thread group / block diverging into different code paths won't negatively affect performance, as long as each warp itself takes only a single code path. (Except the additional instructions for checking the condition)
- When none of these conditions are met, your if's will reduce the performance.

In his case, there's only one code path, so a warp either takes it or it doesn't. If all the threads inside a warp are working on pixels associated with the sky, the whole warp actually skips all the code inside the if, which results in a performance increase.
When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception is when something is known at compile time that allows the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So your additional cost is that of the if statement, and in your example you are adding one extra if instruction, which is essentially zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement in the middle thinking it would speed up your code, you would get NO speedup, because ALL paths will be executed.
Do a depth-bounds test (it can be set up engine-side, with no if branches in the shader) with a max range of 0.99999f; this will ensure you're not computing SSAO on the sky (which should be at 1.0). I'm not familiar enough with OpenGL to know the setup for a depth-bounds test there (though I'm sure it can be done), but in D3D you can do this on the CPU side.
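In OpenGL the depth-bounds test is exposed through the EXT_depth_bounds_test extension, which must be queried at runtime. A minimal sketch of the setup, assuming a GLEW-based loader; the 0.99999 upper bound mirrors the value suggested above:

```c
/* Sketch: engine-side depth-bounds setup via EXT_depth_bounds_test.
 * Fragments whose stored depth falls outside [0.0, 0.99999] -- i.e.
 * the sky at 1.0 -- are rejected before the SSAO pass shades them. */
#include <GL/glew.h>

void enable_sky_depth_bounds(void)
{
    if (GLEW_EXT_depth_bounds_test) {
        glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
        glDepthBoundsEXT(0.0, 0.99999);
    }
    /* extension unavailable: fall back to a shader-side branch */
}
```

Remember to glDisable(GL_DEPTH_BOUNDS_TEST_EXT) after the SSAO pass so later passes aren't clipped.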

ALL code is executed, including ALL branches, all function calls, etc. (...) This is how all graphics cards work, AMD, NVIDIA, etc.


I'm not quite sure what you base your information on, but almost no graphics card from the last 3 or 4 years works this way. Here's a quote from NVIDIA:
Any flow control instruction (if, switch, do, for, while) can significantly affect the instruction throughput by causing threads of the same warp to diverge; that is, to follow different execution paths. If this happens, the different execution paths must be serialized, since all of the threads of a warp share a program counter; this increases the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.

To obtain best performance in cases where the control flow depends on the thread ID, the controlling condition should be written so as to minimize the number of divergent warps. This is possible because the distribution of the warps across the block is deterministic as mentioned in SIMT Architecture of the CUDA C Programming Guide. A trivial example is when the controlling condition depends only on (threadIdx / WSIZE) where WSIZE is the warp size. In this case, no warp diverges because the controlling condition is perfectly aligned with the warps.

The different execution paths only get serialized when threads inside a warp actually diverge into different branches.
Yes, CryZe is mostly right, although you forgot one other potential source of slowdown from adding branches: register count. The compiler needs to statically determine the worst-case number of temporary registers needed for intermediate computation, accounting for any code path. If the alternate path introduced by a branch causes the number of registers needed to increase, then the total register count of the shader can be higher (even when that branch is never taken). When running the shader, each instance (thread or similar construct, depending on which HW vendor's or API's terminology you're using) needs that many registers. Basically, shaders that use more registers get fewer instances running in parallel.

A warp consists of either 16 or 32 threads grouped together.


I think you mean "32 or 64" :P

When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception is when something is known at compile time that allows the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So your additional cost is that of the if statement, and in your example you are adding one extra if instruction, which is essentially zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement in the middle thinking it would speed up your code, you would get NO speedup, because ALL paths will be executed.


This is completely wrong, even for relatively old GPUs (even the first-gen DX9 GPUs supported branching on shader constants, although in certain cases it was implemented through driver-level shenanigans). I'm not sure how you could even come to such a conclusion, considering it's really easy to set up a test case that shows otherwise.
So I take it that the if statement can stay.

This topic is closed to new replies.
