My new SSAO & some help


Hello! I'm new here. I've recently been experimenting with screen space ambient occlusion, but I was quite disappointed with the results; it's very difficult to get it to look like it does in Crysis. So I tried to implement it myself without following the Crytek method. I ended up creating this shader, which implements a type of SSAO I have dubbed coherent SSAO (CSSAO for friends). It looks far better than Crysis's (or at least that's what I think). Now I need help, because I want to extend it to full ambient occlusion (still in image space, but taking hidden geometry into account). I'll post the shader and explain the method first.

The method is quite simple: a convolution kernel samples pixels from a depth buffer and a normal buffer, calculates occlusion for each pixel using mainly the angular difference between normals, and then modulates it by depth. Usually this would only produce a cartoon-outline type of shader, because it would shade "outside" as well as "inside" creases. But then I apply a cosine check between the normal and the direction of sampling (a simple dot product between normalized vectors; this is the "coherence" test), which returns >0 for inside (concave) creases and <0 for outside (convex) ones, so I only occlude concave creases.

The resulting AO doesn't need to be blurred to look good, doesn't suffer from halos, can be used for local as well as global AO (local looks better ;)), and is quite fast to compute. A 3x3 kernel is more than enough for a game. Now for the code (needs to be cleaned up). GLSL:
#define NUM_SAMPLES 8

uniform sampler2D som;    //the depth buffer
uniform sampler2D normal; //the normal buffer

//noise function to reduce banding (got it from someone else's shader):
float rand(vec2 co){
    return 0.5+(fract(sin(dot(co.xy, vec2(12.9898,78.233))) * 43758.5453))*0.5;
}

void main()
{
    float sum = 0.0;

    //far and near clip planes:
    float zFar = 80.0;
    float zNear = 0.5;

    //get depth at the current pixel and linearize it:
    float prof = texture2D(som, gl_TexCoord[0].st).x;
    prof = zFar * zNear / (prof * (zFar - zNear) - zFar);

    //unpack the camera-space normal from [0,1] to [-1,1]:
    vec3 norm = normalize(texture2D(normal, gl_TexCoord[0].st).xyz * 2.0 - vec3(1.0));

    int hf = NUM_SAMPLES/2;

    //calculate sampling rates (8.0 is the radius):
    float incx = (1.0/160.0)*8.0;
    float incy = (1.0/120.0)*8.0;

    for(int i=-hf; i < hf; i++){
        for(int j=-hf; j < hf; j++){

            //scale the offset by the linearized depth so the radius shrinks with distance:
            vec2 coords = vec2(float(i)*incx, float(j)*incy)/prof;

            float prof2 = texture2D(som, gl_TexCoord[0].st + coords*rand(gl_TexCoord[0].st)).x;
            prof2 = zFar * zNear / (prof2 * (zFar - zNear) - zFar); //linearize z sample

            if (prof2 > prof){

                vec3 norm2 = normalize(texture2D(normal, gl_TexCoord[0].st + coords*rand(gl_TexCoord[0].st)).xyz * 2.0 - vec3(1.0));

                //calculate approximate pixel distance:
                vec3 dist = vec3(coords, prof - prof2);

                //coherence between the sample's normal and the sampling direction:
                float coherence = dot(normalize(-coords), normalize(norm2.xy));

                //if there is coherence, the crease is concave: calculate occlusion
                if (coherence > 0.0){
                    //approximate form factor (4.0 is the depth scale):
                    float pformfactor = 0.5*(1.0 - dot(norm, norm2))/(3.1416*pow(abs(length(dist*4.0)), 2.0) + 0.5);
                    sum += clamp(pformfactor*2.0, 0.0, 1.0); //2.0 is the AO intensity
                }
            }
        }
    }

    float occlusion = 1.0-(sum/float(NUM_SAMPLES));
    gl_FragColor = vec4(occlusion, occlusion, occlusion, 1.0);
}

Now the question: I have a vague idea of using a second camera to capture the back faces of the geometry, or using a second pass with front-face culling and a reversed normal buffer, to calculate occlusion from the hidden parts of the image. But I've tried to imagine it and write it down on paper, and I have a feeling that it won't work. Any ideas? Thanks, and sorry for my weird English.

PS: normals are supplied in camera space and the z-buffer is not linear. Images below use a 3x3 filter and radius = 7.

---
Another idea to increase performance would be to reduce the kernel size (along with the radius, which already shrinks) as depth increases. I'll try to add it to the code; see the sketch below.
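For what it's worth, a minimal sketch of that idea, assuming the same `som` depth buffer and `NUM_SAMPLES` define as in the shader above (the `samples` variable and the 0.5 falloff constant are assumptions, not final code):

//sketch: shrink the sample count as raw depth grows
float prof = texture2D(som, gl_TexCoord[0].st).x;           //raw depth in [0,1]
int samples = int(float(NUM_SAMPLES)/(0.5 + prof) + 0.5);   //fewer samples far away
int hf = samples/2;                                         //loop bounds stay -hf..hf-1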

---
Quote:
Original post by n00body
Can we get a bigger version of the full-color shot? Thanks. :)

Yes, I had trimmed the size a bit in order to keep the post relatively small. Note that I'm multiplying the AO on top of the scene, so it can look weird in less illuminated areas. I will use it in the lighting equation once I complete the shader.
Here are another two screenshots.

With very local AO (radius = 2):

With more global AO (radius = 9), contrast increased in Photoshop:

Both images keep the kernel size at 3x3. More samples yield more quality. With these settings (kernel = 3x3, radius = 9) it runs at 190-210 fps on an 8800GT in a moderately complex scene (terrain + skybox + Sponza atrium + physics going on). Any ideas about the backface occlusion thing?

Thanks :)

---
I was messing around with the shader, and I tried to convert it to a color bleeding shader (global illumination is too big a topic to speak about here :P).

The results are quite nifty for a convolution, I think:

Settings are the same as with global SSAO. The changes made to the shader are minimal; I will post it tomorrow. Hopefully I will find a way to merge both shaders into one, doing the common computations only once (different kernel sizes for AO & color bleeding are a problem).

Still thinking about a way to take invisible geometry into account...

---
Good job!

I am developing a similar technique (called ISR, 'image-space radiosity') for my PhD thesis. I hope I will be able to share my work soon.

For the back faces you can try a second pass with inverted culling; it should give you more precise results. A rough sketch follows.
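A minimal sketch of such a pass, assuming a camera-space normal is passed in from the vertex shader; the `vNormal` name and the host-side glCullFace(GL_FRONT) call are assumptions, not code from this thread:

//hypothetical back-face normal pass; host code sets glCullFace(GL_FRONT)
varying vec3 vNormal; //camera-space normal from the vertex shader (assumption)

void main()
{
    //front faces are culled, so every fragment here belongs to hidden geometry;
    //flip the normal so it faces the viewer before packing it like the regular pass:
    vec3 n = normalize(-vNormal);
    gl_FragColor = vec4(n*0.5 + 0.5, 1.0); //pack [-1,1] into [0,1]
}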

---
For the back geometry, you'd need something like "deep shadow maps". Order-independent transparency needs that feature, too.

Depth peeling, that was the name.

Should help a bit to google around. (A minimal sketch of the idea is below.)
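For reference, a rough depth-peeling sketch, assuming a `prevDepth` texture holds the depth of the previously peeled layer; all names and the bias constant here are assumptions:

//hypothetical depth-peeling pass: keep only fragments behind the previous layer
uniform sampler2D prevDepth;  //depth of the last peeled layer (assumption)
uniform vec2 screenSize;      //viewport size in pixels (assumption)

void main()
{
    vec2 uv = gl_FragCoord.xy / screenSize;
    //reject anything at or in front of the previously peeled surface,
    //so the regular depth test then keeps the next-nearest layer:
    if (gl_FragCoord.z <= texture2D(prevDepth, uv).x + 0.0001)
        discard;
    gl_FragColor = vec4(1.0); //write this layer's color/normals as usual
}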

I really like your work. Is it free to use? :)

---
Quote:
I really like your work. Is it free to use? :)


Yes, it is free to use. Just share the results if you achieve something better :). The depth peeling idea sounds promising; I had heard about it before but never really implemented it, or found a practical use for it aside from transparency. I'll google for info.

---
Here it is, some sort of screen space GI, with color bleeding & ambient occlusion. It only looks good if it is local enough, because things outside of the view are not used in the calculations. That means that if a wall is red because some large red object is in front of it, and the object then disappears from the view, the wall loses its color.

Despite that, it is very usable in realtime. And because most computations are made using only the angular difference between normals (which doesn't change with the viewpoint position), the results are largely (though not completely) view-independent. I'm quite pleased with the results; however, the resulting shader code is very long.

There's still room for some optimizations. Now learning new techniques to couple it with deferred shading and soft shadows :D

---
As for color bleeding SSAO, there are a couple of posts on this guy's blog:

http://drkappa.blogspot.com/

And an interesting series of comments on wolf's:

http://diaryofagraphicsprogrammer.blogspot.com/2008/06/screen-space-global-illumination.html

Your implementation looks cool!

---
Quote:
As for color bleeding SSAO, there are a couple of posts on this guy's blog

http://drkappa.blogspot.com/


His algorithm looks like mine, but it seems to have some issues with halos and depth.

On wolf's blog I found a paper about a GI implementation based on NVIDIA's horizon-based SSAO, but it needs raytracing to compute the normal component of the GI, like the original algorithm :S. However, it looks really cool and doesn't seem to have any halo issues. Gotta try that one. Thanks!!

---
Haven't looked that closely at the code... but how about sampling the directions of illumination too, and then using them with the per-pixel normal to get ambient bumpmapping? Similar to the way HL2 does bumpmapping with the lightmaps. (A rough sketch of the idea is below.)

This could help hide bleeding/blurring issues, too, as it adds high-frequency detail.
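One way to read that suggestion: accumulate the average unoccluded direction (a "bent normal") while sampling, then use it in the ambient term. A rough sketch under those assumptions, reusing `hf`, `incx`, `incy` and `prof` from the shader above; `occluded`, `lightDir` and the 0.5 constants are made up:

//hypothetical bent-normal accumulation, piggybacking on the AO sampling loop:
vec3 bent = vec3(0.0);
for (int i = -hf; i < hf; i++){
    for (int j = -hf; j < hf; j++){
        vec2 coords = vec2(float(i)*incx, float(j)*incy)/prof;
        float occluded = 0.0; //would come from the usual coherence/occlusion test
        //unoccluded directions pull the bent normal toward them:
        bent += (1.0 - occluded) * normalize(vec3(-coords, 0.5));
    }
}
vec3 bentNormal = normalize(bent);
//then modulate the ambient term with it instead of a flat constant:
//ambient = ambientColor * max(dot(bentNormal, lightDir), 0.0);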

Hm... I'm motivated now... where's my XNA project? :)

---
Have you tried scaling the sampling radius with the current pixel's distance from the camera?

On paper, sampling with a fixed radius (say 9 pixels) could result in severe artifacts when moving your camera around: 9 pixels at a far distance can easily turn into completely different pixels being sampled by your kernel in subsequent frames.

Correct me if I'm wrong, but IIRC some implementations pick a "real" sphere as a reference, generate points lying on its surface, and then project them back into 2D to get a distance-scaled sample.

If this sounds heavier than necessary, as it probably is, you could just do a simple mul to scale your kernel size; see the sketch below.
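A minimal sketch of the "simple mul" version, assuming the same `zNear`/`zFar` linearization and `ratex`/`ratey` texel sizes as in the shaders in this thread; `worldRadius` is a made-up tuning constant:

//hypothetical distance-scaled offset: a fixed world-space radius
//projects to fewer and fewer pixels as linear depth grows
float linearZ = zFar * zNear / (prof * (zFar - zNear) - zFar);
vec2 step = vec2(ratex, ratey) * worldRadius / abs(linearZ);
vec2 coords = vec2(float(i), float(j)) * step;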

---
Quote:
Have you tried scaling the sampling radius with the current pixel's distance from the camera? [...]


Currently I only scale the radius with the depth. Since I am linearizing the depth buffer samples, a simple division seems to do the job. This leads to unnecessary oversampling for far away objects; the next step is to decrease the kernel size, as you say.

Right now there are no visible artifacts except for some noise in the sampling (which is used to hide banding). But I think I can get rid of most of the noise and banding :).

---
Well, here's the finished code. It only needs to have some variables as uniforms instead of hard-coded values, but that's minor work.

This is only the pixel shader; the vertex shader is completely standard stuff.

/*
 * CSSGI shader (Coherent Screen Space Global Illumination)
 * This shader requires a depth pass and a normal map pass.
 */

#define NUM_SAMPLES 8

uniform sampler2D som;    //depth buffer
uniform sampler2D normal; //normal buffer
uniform sampler2D color;  //color buffer

//noise function to eliminate banding (got it from someone else's shader):
float rand(vec2 co){
    return 0.5+(fract(sin(dot(co.xy, vec2(12.9898,78.233))) * 43758.5453))*0.5;
}

void main()
{
    //calculate sampling rates:
    float ratex = (1.0/800.0);
    float ratey = (1.0/600.0);

    //initialize occlusion sum and gi color:
    float sum = 0.0;
    vec3 fcolor = vec3(0.0);

    //far and near clip planes:
    float zFar = 80.0;
    float zNear = 0.5;

    //get depth at current pixel:
    float prof = texture2D(som, gl_TexCoord[0].st).x;
    //scale sample count with depth (round to nearest):
    int samples = int(float(NUM_SAMPLES)/(0.5+prof) + 0.5);
    prof = zFar * zNear / (prof * (zFar - zNear) - zFar); //linearize z sample

    //obtain normal and color at current pixel:
    vec3 norm = normalize(texture2D(normal, gl_TexCoord[0].st).xyz*2.0 - vec3(1.0));
    vec3 dcolor1 = texture2D(color, gl_TexCoord[0].st).rgb;

    int hf = samples/2;

    //calculate kernel steps:
    float incx = ratex*30.0; //gi radius
    float incy = ratey*30.0;

    float incx2 = ratex*8.0; //ao radius
    float incy2 = ratey*8.0;

    //do the actual calculations:
    for(int i=-hf; i < hf; i++){
        for(int j=-hf; j < hf; j++){

            if (i != 0 || j != 0) {

                vec2 coords  = vec2(float(i)*incx,  float(j)*incy )/prof;
                vec2 coords2 = vec2(float(i)*incx2, float(j)*incy2)/prof;

                float prof2 = texture2D(som, gl_TexCoord[0].st + coords*rand(gl_TexCoord[0].st)).x;
                prof2 = zFar * zNear / (prof2 * (zFar - zNear) - zFar); //linearize z sample

                float prof2g = texture2D(som, gl_TexCoord[0].st + coords2*rand(gl_TexCoord[0].st)).x;
                prof2g = zFar * zNear / (prof2g * (zFar - zNear) - zFar); //linearize z sample

                vec3 norm2g = normalize(texture2D(normal, gl_TexCoord[0].st + coords2*rand(gl_TexCoord[0].st)).xyz*2.0 - vec3(1.0));

                vec3 dcolor2 = texture2D(color, gl_TexCoord[0].st + coords*rand(gl_TexCoord[0].st)).rgb;

                //OCCLUSION:

                //calculate approximate pixel distance:
                vec3 dist2 = vec3(coords2, prof - prof2g);

                //calculate normal and sampling direction coherence:
                float coherence2 = dot(normalize(-coords2), normalize(norm2g.xy));

                //if there is coherence, calculate occlusion:
                if (coherence2 > 0.0){
                    float pformfactor2 = 0.5*(1.0 - dot(norm, norm2g))/(3.1416*pow(abs(length(dist2*2.0)), 2.0) + 0.5); //2.0 is the depth scale
                    sum += clamp(pformfactor2*0.2, 0.0, 1.0); //0.2 is the ao intensity
                }

                //COLOR BLEEDING:

                if (length(dcolor2) > 0.3){ //color threshold

                    vec3 norm2 = normalize(texture2D(normal, gl_TexCoord[0].st + coords*rand(gl_TexCoord[0].st)).xyz*2.0 - vec3(1.0));

                    //calculate approximate pixel distance:
                    vec3 dist = vec3(coords, abs(prof - prof2));

                    //calculate normal and sampling direction coherence:
                    float coherence = dot(normalize(-coords), normalize(norm2.xy));

                    //if there is coherence, calculate bleeding:
                    if (coherence > 0.0){
                        float pformfactor = (1.0 - dot(norm, norm2))/(3.1416*pow(abs(length(dist*2.0)), 2.0) + 0.5); //2.0 is the depth scale
                        fcolor += dcolor2*clamp(pformfactor, 0.0, 1.0);
                    }
                }
            }
        }
    }

    vec3 bleeding = (fcolor/float(samples))*0.5;
    float occlusion = 1.0 - (sum/float(samples));
    gl_FragColor = vec4(dcolor1*occlusion + bleeding*0.5, 1.0);
}

There's also a video of it in action on YouTube.

The code is free to use and modify. I would like you to tell me if you're going to use it, though, so that I can see whether it's useful to people or not :)

Thanks everyone for the tips about depth peeling, kernel scaling (already implemented) and front-face culling, I'll experiment a bit with them!

---
Seriously, that looks amazing! It really gives each of the spheres a sense of weight in the scene. They look like they belong, and aren't just some arbitrary shape injected in to show off an effect. I think you may have just made dynamic GI accessible to the realtime graphics world. ;)

This will definitely simplify my life a lot. I already planned to have SSAO in my renderer project. Suddenly having that simplified, and getting color bleeding as an added bonus, will save me a lot of work. I was originally afraid that I would have to settle for just SSAO, since most GI techniques I've seen require grids of cubemaps, SH coefficients, etc., and are either static or a pain to update. However, the results of this are more than acceptable, considering how little extra complexity it adds.

Thanks man. :D

EDIT: Any chance of a downloadable higher quality video? =)

EDIT2: Curious, what hardware are you running this on? If the specs aren't too high, that would just be icing on the cake. :p

---
Quote:
Original post by ArKano22
There's also a video of it in action on YouTube.
That looks bloody awesome; the increase in realism is incredible. Thanks for sharing the implementation. Some of the details escaped me last time I tried to implement SSAO.

---
It looks great. One doesn't really notice it at first (as it looks natural: no wow effect, just... the way we're used to seeing it)... till the moment you turn it off.

Awesome :)

Can't wait to implement this.

---
Quote:
Seriously, that looks amazing! It really gives each of the spheres a sense of weight in the scene. They look like they belong, and aren't just some arbitrary shape injected in to show off an effect. I think you may have just made dynamic GI accessible to the realtime graphics world. ;)


Thanks! I think it is useful to give an extra "bump" to the images, but real full-scale GI still has to be done through light probes or the mysterious Bunnell method (Danger Planet, anyone?).

EDIT: Has anyone tried this? Put a cubemap with color & depth at the viewer's position, average the color of the cubemap, then obtain world positions from the depth, and place equally spaced deferred lights there, with the color of the cubemap, to simulate the bleeding of GI. I think it would work...

Quote:

EDIT: Any chance of a downloadable higher quality video? =)

EDIT2: Curious, what hardware are you running this on? If the specs aren't too high, that would just be icing on the cake. :p


If I can find a way of compressing the video so that its size is reasonable and it doesn't look too bad, I'll post it somewhere. The YouTube video is a bit blurred, so you can't see that the bleeding color has a bit of noise (it is a tradeoff between banding and noise; with a low radius the GI looks very smooth, so no noise is needed).

My specs: Intel quad core, 2 GB RAM, NVIDIA 8800GT. I have tested it on a Core 2 / 8800GTS and it runs equally smooth. However, it still has to be tested in a game with all the stuff going on: AI, physics, terrain rendering, water, etc.

---
Hmm. That has me concerned then. If anyone gets this up and running on a single-core CPU and an SM 3.0 GPU, please let me know. Also, can you tell me what kind of frame rates you're getting with this trick?

---
As Matt said, the effect is a shader (executed completely on the GPU), so the CPU doesn't matter; I gave its specs only for completeness. In fact my engine doesn't use multithreading, so it's using only 1 CPU. The other 3 cores are at 5-10% while the one running the app is at 80-90%.

Without GI the framerate is 95-96 fps, and 90-92 fps with GI enabled (3x3). So you can see it has very little impact on performance. Bigger kernels pretty much kill the framerate, though:
5x5 kernel: 50-55 fps
7x7 kernel: 19-20 fps

So it's better to stick with a fixed kernel size and mess around with the radius and noise function.

EDIT: The resolution (important!) is 800x600. However, since GI is pretty low-frequency, you can calculate it at a much smaller resolution (320x240) and then scale the textured quad to fullscreen. That should give good results at higher resolutions, and you get an extra blur pass over it for free if you use linear texture filtering. (A sketch of such a composite pass is below.)
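A minimal sketch of that composite pass, assuming the GI/AO result was rendered into a low-res texture beforehand; the sampler names and the channel layout (rgb = bleeding, a = occlusion) are assumptions:

//hypothetical fullscreen composite: upscale low-res GI with hardware bilinear filtering
uniform sampler2D sceneColor; //full-res scene color (assumption)
uniform sampler2D giLowRes;   //e.g. 320x240 target: rgb = bleeding, a = occlusion (assumption)

void main()
{
    vec2 uv = gl_TexCoord[0].st;
    vec3 scene = texture2D(sceneColor, uv).rgb;
    //linear filtering on the low-res target acts as a free blur while upscaling:
    vec4 gi = texture2D(giLowRes, uv);
    gl_FragColor = vec4(scene*gi.a + gi.rgb*0.5, 1.0);
}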


