SSAO and skybox artifact


11 replies to this topic

#1 ic0de   Members   -  Reputation: 808

Likes: 0

Posted 05 December 2012 - 09:59 PM

I'm running into an ugly artifact with my SSAO: geometry that borders the skybox gets occluded by the skybox, and vice versa. I was able to fix the skybox being occluded by simply skipping the SSAO calculation as soon as I know that the pixel belongs to the skybox. I still have the problem of the skybox occluding my geometry, which creates an ugly dotted line. The skybox is black in the G-buffer and sits at depth 1.0, but after trying in vain to skip sampling the skybox using a step function, I decided to ask here. Anyway, here is my GLSL shader code, based heavily on code from here:


uniform sampler2D depth_texture;
uniform sampler2D color_texture;
uniform sampler2D normal_texture;
uniform float scr_w;
uniform float scr_h;

uniform vec3 pSphere[10] = vec3[]( vec3(-0.010735935, 0.01647018, 0.0062425877),
		 vec3(-0.06533369, 0.3647007, -0.13746321),
		 vec3(-0.6539235, -0.016726388, -0.53000957),
		 vec3(0.40958285, 0.0052428036, -0.5591124),
		 vec3(-0.1465366, 0.09899267, 0.15571679),
		 vec3(-0.44122112, -0.5458797, 0.04912532),
		 vec3(0.03755566, -0.10961345, -0.33040273),
		 vec3(0.019100213, 0.29652783, 0.066237666),
		 vec3(0.8765323, 0.011236004, 0.28265962),
		 vec3(0.29264435, -0.40794238, 0.15964167));
varying vec2 vTexCoord;
#define STRENGTH 0.09
#define FALLOFF 0.0 //0.00002
#define RAD 0.006
#define SAMPLES 10
#define INVSAMPLES (1.0 / float(SAMPLES))

// Returns the decoded normal in .rgb and the depth in .a.
vec4 height_normal(in vec2 texcoord)
{
    vec4 normaltexel;
    normaltexel.rgb = (texture2D(normal_texture, texcoord).xyz * 2.0) - vec3(1.0);
    normaltexel.a = texture2D(depth_texture, texcoord).x;

    return normaltexel;
}

void main(void)
{
    // get a random normal
    vec3 fres = normalize((texture2D(color_texture, vTexCoord * (scr_w / 64.0)).xyz * 2.0) - vec3(1.0));

    // grab depth and a normal vector
    vec4 currentPixelSample = height_normal(vTexCoord);

    vec3 samplepos = vec3(vTexCoord.xy, currentPixelSample.a);

    float blacklevel = 0.0;

    float depthDiff;
    vec4 occluderFragment;
    vec3 ray;

    // Only run SSAO for geometry. The skybox is black in the normal buffer, so its
    // decoded normal is (-1, -1, -1), whose length is > 1.0, and the loop is skipped.
    if (length(currentPixelSample.xyz) <= 1.0)
    {
        for (int i = 0; i < SAMPLES; ++i)
        {
            // trace a ray from a random normal to a random position
            ray = (RAD / samplepos.z) * reflect(pSphere[i], fres);

            // get the position of the occluder
            occluderFragment = height_normal(samplepos.xy + (sign(dot(ray, currentPixelSample.xyz)) * ray.xy));

            depthDiff = samplepos.z - occluderFragment.a;

            blacklevel += step(FALLOFF, depthDiff) * (1.0 - dot(currentPixelSample.xyz, occluderFragment.xyz)) * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
        }
    }

    // output the result
    gl_FragColor = vec4(vec3(1.0 - (blacklevel * INVSAMPLES)), 1.0);
}

Attached is a picture of my problem with the offending pixels circled. Does anyone know how to fix this?

Attached Thumbnails

  • marked up ssao.png

you know you program too much when you start ending sentences with semicolons;



#2 Kaptein   Prime Members   -  Reputation: 1844

Likes: 1

Posted 06 December 2012 - 03:11 PM

If the skybox is exactly at 1.0 (zfar), for example because you write gl_FragDepth = 1.0 in the atmosphere shader, you can avoid it by using a branch: if (depth < 0.99) { do stuff }
It will reduce performance of your SSAO shader, but I think it will work :)

In your example this could be:
if (normaltexel.a < 0.999) // depth of the current pixel (currentPixelSample.a inside main)
{
    // do ssao
}
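
For illustration, here is a minimal sketch of how that depth branch might look when applied to the original shader's inputs. It assumes the sky really is written at depth 1.0. The `SKY_DEPTH` threshold name and the per-sample `notSky` mask are hypothetical additions that do not come from the thread, and nothing here has been verified against the artifact in the screenshot.

```glsl
#version 120
// Sketch only: depth-based sky rejection for the SSAO shader from post #1.
// SKY_DEPTH and notSky are assumptions introduced for this example.
uniform sampler2D depth_texture;
uniform sampler2D normal_texture;
uniform sampler2D color_texture; // random-normal source, as in the original
uniform float scr_w;

varying vec2 vTexCoord;

#define SKY_DEPTH 0.999 // depths at or beyond this value are treated as sky
#define STRENGTH  0.09
#define FALLOFF   0.0
#define RAD       0.006
#define SAMPLES   10

// Same sample kernel as the original shader.
const vec3 pSphere[SAMPLES] = vec3[](
    vec3(-0.010735935, 0.01647018, 0.0062425877),
    vec3(-0.06533369, 0.3647007, -0.13746321),
    vec3(-0.6539235, -0.016726388, -0.53000957),
    vec3(0.40958285, 0.0052428036, -0.5591124),
    vec3(-0.1465366, 0.09899267, 0.15571679),
    vec3(-0.44122112, -0.5458797, 0.04912532),
    vec3(0.03755566, -0.10961345, -0.33040273),
    vec3(0.019100213, 0.29652783, 0.066237666),
    vec3(0.8765323, 0.011236004, 0.28265962),
    vec3(0.29264435, -0.40794238, 0.15964167));

void main(void)
{
    float depth = texture2D(depth_texture, vTexCoord).x;
    float blacklevel = 0.0;

    // The suggested branch: sky pixels (depth close to 1.0) skip the loop.
    if (depth < SKY_DEPTH)
    {
        vec3 normal = texture2D(normal_texture, vTexCoord).xyz * 2.0 - vec3(1.0);
        vec3 fres = normalize(texture2D(color_texture, vTexCoord * (scr_w / 64.0)).xyz * 2.0 - vec3(1.0));

        for (int i = 0; i < SAMPLES; ++i)
        {
            vec3 ray = (RAD / depth) * reflect(pSphere[i], fres);
            vec2 uv = vTexCoord + sign(dot(ray, normal)) * ray.xy;

            float occDepth = texture2D(depth_texture, uv).x;
            vec3 occNormal = texture2D(normal_texture, uv).xyz * 2.0 - vec3(1.0);
            float depthDiff = depth - occDepth;

            // Assumption: also ignore samples that land on the sky.
            // step(edge, x) is 0.0 for x < edge, so notSky is 0.0 on sky samples.
            float notSky = 1.0 - step(SKY_DEPTH, occDepth);

            blacklevel += notSky
                        * step(FALLOFF, depthDiff)
                        * (1.0 - dot(normal, occNormal))
                        * (1.0 - smoothstep(FALLOFF, STRENGTH, depthDiff));
        }
    }

    gl_FragColor = vec4(vec3(1.0 - blacklevel / float(SAMPLES)), 1.0);
}
```

Testing the depth value rather than the length of the decoded normal keeps the sky check independent of how the normal buffer is cleared or filtered. Whether that matters for this particular artifact is a guess.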

Edited by Kaptein, 06 December 2012 - 03:13 PM.


#3 CryZe   Members   -  Reputation: 768

Likes: 5

Posted 07 December 2012 - 02:15 AM

Kaptein wrote: "It will reduce performance of your SSAO shader"

It will actually improve the performance of his shader. Here are the rules for ifs in shaders:
- Ifs get compiled away if they can be evaluated at compile time, and thus don't reduce your performance.
- Ifs don't reduce your performance if they use values from a constant buffer (except for the additional instructions that check the condition).
- Ifs don't reduce your performance if they don't have a second code path, as you would get with an else (except for the additional instructions that check the condition).
- Ifs don't reduce your performance if the whole warp chooses the same code path. A warp consists of either 16 or 32 threads grouped together. So even a whole thread group / block that diverges into different code paths might not negatively affect the performance, as long as each warp only chooses one code path (except for the additional instructions that check the condition).
- When none of these conditions are met, your ifs will reduce the performance.

In his case, there's only one code path, so a warp either takes it or it doesn't. If all the threads inside the warp are working on pixels that belong to the sky, the whole warp skips all the code inside the if, which results in a performance increase.
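
The branch categories above can be illustrated with a small fragment shader. This is only a sketch: `enable_ao`, `DEBUG_VIEW` and `expensiveAO` are hypothetical names, a GLSL uniform is used as the rough counterpart of a constant-buffer value, and how a given driver actually compiles each branch is not guaranteed.

```glsl
#version 120
// Illustration of the branch categories from the list above.
// All names besides depth_texture and vTexCoord are hypothetical.
uniform sampler2D depth_texture;
uniform bool enable_ao;   // same value for every pixel of a draw call
varying vec2 vTexCoord;

#define DEBUG_VIEW 0      // known at compile time

// Stand-in for an expensive SSAO sampling loop.
float expensiveAO(vec2 uv)
{
    float sum = 0.0;
    for (int i = 1; i <= 8; ++i)
        sum += texture2D(depth_texture, uv + vec2(float(i) * 0.001)).x;
    return sum / 8.0;
}

void main(void)
{
    float depth = texture2D(depth_texture, vTexCoord).x;
    float ao = 1.0;

    // 1) Compile-time condition: the compiler can drop this branch entirely.
    if (DEBUG_VIEW == 1)
        ao = depth;

    // 2) Condition read from a uniform: every thread takes the same path.
    if (enable_ao)
    {
        // 3) Per-pixel condition without an else: a warp made up only of
        //    sky pixels can skip the call, while a warp that mixes sky and
        //    geometry pixels still pays for it.
        if (depth < 0.999)
            ao = expensiveAO(vTexCoord);
    }

    gl_FragColor = vec4(vec3(ao), 1.0);
}
```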

Edited by CryZe, 07 December 2012 - 02:17 AM.


#4 smasherprog   Members   -  Reputation: 428

Likes: -2

Posted 07 December 2012 - 09:07 AM

When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception for this is if something is known at compile time that will allow the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So, your additional cost is that of the if statement, and in your example, you are adding an extra if instruction. This is a zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement to the middle thinking that it would speed up your code, you would get NO speedup, because ALL paths will be executed.
Wisdom is knowing when to shut up, so try it.
--Game Development http://nolimitsdesigns.com: Reliable UDP library, Threading library, Math Library, UI Library. Take a look, its all free.

#5 Styves   Members   -  Reputation: 950

Likes: 1

Posted 07 December 2012 - 10:48 AM

Do a depth bounds test (it can be set up on the engine side, with no branches in the shader) with a max range of 0.99999f. This will ensure you're not computing SSAO on the sky (which should be at 1.0). I'm not familiar with OpenGL, so I don't know how to set up a depth bounds test there (but I'm sure it can be done); in D3D you can do this on the CPU side.

#6 CryZe   Members   -  Reputation: 768

Likes: 3

Posted 07 December 2012 - 03:18 PM

smasherprog wrote: "ALL code is executed, including ALL branches, all function calls, etc. (...) This is how all graphics cards work, AMD, NVIDIA, etc."

I'm not quite sure what you are basing your information on, but almost all graphics cards from the last 3 or 4 years work the way I described. Here's a quote from NVIDIA:

"Any flow control instruction (if, switch, do, for, while) can significantly affect the instruction throughput by causing threads of the same warp to diverge; that is, to follow different execution paths. If this happens, the different execution paths must be serialized, since all of the threads of a warp share a program counter; this increases the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.

To obtain best performance in cases where the control flow depends on the thread ID, the controlling condition should be written so as to minimize the number of divergent warps.

This is possible because the distribution of the warps across the block is deterministic as mentioned in SIMT Architecture of the CUDA C Programming Guide. A trivial example is when the controlling condition depends only on (threadIdx / WSIZE) where WSIZE is the warp size.

In this case, no warp diverges because the controlling condition is perfectly aligned with the warps."


The different execution paths only get serialized when serialization is actually needed, which is when threads inside a warp diverge into different branches.

Edited by CryZe, 07 December 2012 - 03:20 PM.


#7 osmanb   Crossbones+   -  Reputation: 1458

Likes: 3

Posted 07 December 2012 - 07:33 PM

Yes, CryZe is mostly right, although you forgot one other potential source of slowdown from adding branches: register count. The compiler needs to statically determine the worst-case number of temporary registers needed for intermediate computation, accounting for every code path. If the alternate path introduced by a branch increases the number of registers needed, then the total register count of the shader can be higher (even when that branch is never taken). When running the shader, each instance (thread or similar construct, depending on which HW vendor or API terminology you're using) needs that many registers. Basically, shaders that use more registers get fewer instances running in parallel.

#8 MJP   Moderators   -  Reputation: 10243

Likes: 0

Posted 07 December 2012 - 08:39 PM

CryZe wrote: "A warp consists of either 16 or 32 threads grouped together."

I think you mean "32 or 64" :)

#9 MJP   Moderators   -  Reputation: 10243

Likes: 2

Posted 07 December 2012 - 08:45 PM

smasherprog wrote: "When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception for this is if something is known at compile time that will allow the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So, your additional cost is that of the if statement, and in your example, you are adding an extra if instruction. This is a zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement to the middle thinking that it would speed up your code, you would get NO speedup, because ALL paths will be executed."

This is completely wrong, even for relatively old GPUs (even the first-gen DX9 GPUs supported branching on shader constants, although in certain cases it was implemented through driver-level shenanigans). I'm not sure how you could even come to such a conclusion, considering it's really easy to set up a test case that shows otherwise.

Edited by MJP, 07 December 2012 - 08:46 PM.


#10 ic0de   Members   -  Reputation: 808

Likes: 0

Posted 07 December 2012 - 10:28 PM

So I take it that the if statement can stay.



#11 CryZe   Members   -  Reputation: 768

Likes: 0

Posted 08 December 2012 - 07:04 AM


CryZe wrote: "A warp consists of either 16 or 32 threads grouped together."
MJP wrote: "I think you mean '32 or 64'"

I thought a wavefront on AMD's architecture consists of 16 execution units. Or am I wrong? (I just used warp as a general term, because I like it more :D)

Edited by CryZe, 08 December 2012 - 07:04 AM.


#12 MJP   Moderators   -  Reputation: 10243

Likes: 1

Posted 08 December 2012 - 01:49 PM



CryZe wrote: "I thought a wavefront on AMD's architecture consists of 16 execution units. Or am I wrong? (I just used warp as a general term, because I like it more)"

Nah, there are 64 threads in a wavefront. In their latest architecture (GCN) the SIMDs are 16-wide, but they execute each instruction 4 times to complete it for the entire wavefront (so a single-cycle instruction actually takes 4 cycles to execute).

Edited by MJP, 08 December 2012 - 01:50 PM.




