the best ssao i've seen

Sorry if the topic title is a bit pretentious, but that's what I think when I look at it :D. I've been struggling quite some time to get a good-looking SSAO, and this is the end of my quest! I've implemented the famous Bunnell GI disk method (http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter14.html) but in screen space (so in the end it's only form factors), and the results speak for themselves.

The screens with contrast increased are so you can see artifacts & granularity. None of the screens have any blurring applied, just the raw SSAO output. The banding you can see in some places is due to precision problems in my gbuffer, bear with me. Otherwise, it looks incredibly good: consistent when you move the camera, very little haloing (but some haloing anyway), no gray flat surfaces, and it captures fine details well. Plus you get one bounce of local GI, like in the last two screens.

Speed-wise, without GI it is as fast as any other SSAO implementation, maybe a bit slower. With GI, more or less a 30-35% speed decrease. I'm going to post the full GLSL code soon, together with some speed comparisons. To implement it you'll need to have a gbuffer set up, or at least normal, diffuse+directlight and depth buffers.
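To give an idea of what "only form factors" means in practice, here is a simplified sketch of the per-sample occlusion term (the full shader is posted below; the names here are just for illustration):

// n  = normal at the current pixel, ns = normal at the sample
// v  = normalized vector from the current pixel to the sample, d = distance between them
float formFactorAO(vec3 n, vec3 ns, vec3 v, float d)
{
    return (1.0 - clamp(dot(ns, -v), 0.0, 1.0)) *   // samples facing away from the receiver occlude more
           clamp(dot(n, v), 0.0, 1.0) *             // only samples in the receiver's hemisphere count
           (1.0 - 1.0/sqrt(1.0/(d*d) + 1.0));       // distance falloff of a unit-area disk
}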
Looks great...keep up the good work, looking forward to checking out the code.
Here's the code. It does not have any magic numbers or parameters, so it should be pretty "plug and play".

Things to note:
-View-space position is reconstructed from depth using the texture coordinates of the fullscreen quad, no frustum corners required.
-The depth buffer is expected to be linear, so do not use the default OpenGL depth buffer; create your own depth shader (see the sketch after the main shader below).
-The randomization texture is this one:


uniform sampler2D gnormals;
uniform sampler2D gdepth;
uniform sampler2D gdiffuse;
uniform sampler2D grandom;

// Read a view-space normal from the gbuffer (stored packed into [0,1]).
vec3 readNormal(in vec2 coord)
{
    return normalize(texture2D(gnormals, coord).xyz*2.0 - 1.0);
}

// Reconstruct a view-space position from the linear depth buffer,
// using only the fullscreen quad's texture coordinates.
vec3 posFromDepth(vec2 coord)
{
    float d = texture2D(gdepth, coord).r;
    vec3 tray = mat3(gl_ProjectionMatrixInverse)*vec3((coord.x-0.5)*2.0, (coord.y-0.5)*2.0, 1.0);
    return tray*d;
}

// Ambient occlusion form factor between the current pixel and a sample.
float aoFF(in vec3 ddiff, in vec3 cnorm, in float c1, in float c2)
{
    vec3 vv = normalize(ddiff);
    float rd = length(ddiff);
    return (1.0 - clamp(dot(readNormal(gl_TexCoord[0].st + vec2(c1, c2)), -vv), 0.0, 1.0)) *
           clamp(dot(cnorm, vv), 0.0, 1.0) *
           (1.0 - 1.0/sqrt(1.0/(rd*rd) + 1.0));
}

// GI form factor: same idea, but energy is transferred instead of occluded.
float giFF(in vec3 ddiff, in vec3 cnorm, in float c1, in float c2)
{
    vec3 vv = normalize(ddiff);
    float rd = length(ddiff);
    return clamp(dot(readNormal(gl_TexCoord[0].st + vec2(c1, c2)), -vv), 0.0, 1.0) *
           clamp(dot(cnorm, vv), 0.0, 1.0) /
           (rd*rd + 1.0);
}

void main()
{
    // Read current normal, position and color.
    vec3 n   = readNormal(gl_TexCoord[0].st);
    vec3 p   = posFromDepth(gl_TexCoord[0].st);
    vec3 col = texture2D(gdiffuse, gl_TexCoord[0].st).rgb;

    // Randomization texture, tiled over the screen.
    vec2 fres = vec2(800.0/128.0*5.0, 600.0/128.0*5.0);
    vec3 random = texture2D(grandom, gl_TexCoord[0].st*fres.xy).rgb;
    random = random*2.0 - vec3(1.0);

    // Initialize variables.
    float ao = 0.0;
    vec3  gi = vec3(0.0);
    float incx = 1.0/800.0*0.1;
    float incy = 1.0/600.0*0.1;
    float pw = incx;
    float ph = incy;
    float cdepth = texture2D(gdepth, gl_TexCoord[0].st).r;

    // 3 rounds of 8 samples each.
    for (float i = 0.0; i < 3.0; ++i)
    {
        // Jitter the sampling radius and scale it by depth so the
        // kernel covers roughly the same area regardless of distance.
        float npw = (pw + 0.0007*random.x)/cdepth;
        float nph = (ph + 0.0007*random.y)/cdepth;

        vec3 ddiff  = posFromDepth(gl_TexCoord[0].st + vec2( npw,  nph)) - p;
        vec3 ddiff2 = posFromDepth(gl_TexCoord[0].st + vec2( npw, -nph)) - p;
        vec3 ddiff3 = posFromDepth(gl_TexCoord[0].st + vec2(-npw,  nph)) - p;
        vec3 ddiff4 = posFromDepth(gl_TexCoord[0].st + vec2(-npw, -nph)) - p;
        vec3 ddiff5 = posFromDepth(gl_TexCoord[0].st + vec2( 0.0,  nph)) - p;
        vec3 ddiff6 = posFromDepth(gl_TexCoord[0].st + vec2( 0.0, -nph)) - p;
        vec3 ddiff7 = posFromDepth(gl_TexCoord[0].st + vec2( npw,  0.0)) - p;
        vec3 ddiff8 = posFromDepth(gl_TexCoord[0].st + vec2(-npw,  0.0)) - p;

        ao += aoFF(ddiff,  n,  npw,  nph);
        ao += aoFF(ddiff2, n,  npw, -nph);
        ao += aoFF(ddiff3, n, -npw,  nph);
        ao += aoFF(ddiff4, n, -npw, -nph);
        ao += aoFF(ddiff5, n,  0.0,  nph);
        ao += aoFF(ddiff6, n,  0.0, -nph);
        ao += aoFF(ddiff7, n,  npw,  0.0);
        ao += aoFF(ddiff8, n, -npw,  0.0);

        gi += giFF(ddiff,  n,  npw,  nph)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2( npw,  nph)).rgb;
        gi += giFF(ddiff2, n,  npw, -nph)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2( npw, -nph)).rgb;
        gi += giFF(ddiff3, n, -npw,  nph)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2(-npw,  nph)).rgb;
        gi += giFF(ddiff4, n, -npw, -nph)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2(-npw, -nph)).rgb;
        gi += giFF(ddiff5, n,  0.0,  nph)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2( 0.0,  nph)).rgb;
        gi += giFF(ddiff6, n,  0.0, -nph)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2( 0.0, -nph)).rgb;
        gi += giFF(ddiff7, n,  npw,  0.0)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2( npw,  0.0)).rgb;
        gi += giFF(ddiff8, n, -npw,  0.0)*texture2D(gdiffuse, gl_TexCoord[0].st + vec2(-npw,  0.0)).rgb;

        // Increase the sampling area each round.
        pw += incx;
        ph += incy;
    }
    ao /= 24.0;
    gi /= 24.0;

    gl_FragColor = vec4(col - vec3(ao) + gi*5.0, 1.0);
}
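Since the depth buffer has to be linear, here is a rough sketch of a matching depth pass. The "far" uniform is just an assumed camera far-plane distance, not something taken from my code; adjust the scaling to however you set up your reconstruction:

// --- vertex shader ---
varying float vDepth;
void main()
{
    vec4 viewPos = gl_ModelViewMatrix * gl_Vertex;
    vDepth = -viewPos.z;                          // positive view-space distance
    gl_Position = gl_ProjectionMatrix * viewPos;
}

// --- fragment shader ---
uniform float far;                                // assumed far-plane distance
varying float vDepth;
void main()
{
    gl_FragColor = vec4(vec3(vDepth / far), 1.0); // linear depth in [0,1]
}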


EDIT: there was a bug in the code, now it's fixed.

[Edited by - ArKano22 on December 14, 2009 7:33:16 PM]
Are normals and positions supposed to be in view space? And does OpenGL view space mean the same thing as in D3D?
Quote:Original post by Matt Aufderheide
Are normals and positions supposed to be in view space? And does OpenGL view space mean the same thing as in D3D?


Yes, normals and position (depth, from which you reconstruct the position) are in view space.

In OpenGL, view space means relative to the camera, so I think it is the same no matter what API you're using.
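If it helps, here is a minimal sketch of a normal gbuffer pass that matches what readNormal() expects (view-space normals packed into [0,1]). It assumes old fixed-function style matrices, so adapt it to your own setup:

// --- vertex shader ---
varying vec3 vNormal;
void main()
{
    vNormal = gl_NormalMatrix * gl_Normal;        // normal in view/eye space
    gl_Position = ftransform();
}

// --- fragment shader ---
varying vec3 vNormal;
void main()
{
    gl_FragColor = vec4(normalize(vNormal)*0.5 + 0.5, 1.0); // pack into [0,1]
}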
Do you have any notions of rendering time? How efficient is this method compared to standard SSAO implementations?
Bookmarked for future study :D. Great job, man!

EDIT: I have a question: if I use the frustum corner method to draw the screen quad, will I still be able to use this SSAO algorithm? Thanks.

[Edited by - b_thangvn on December 16, 2009 1:56:32 AM]
Fantastic shader. Really sweet.

Any chance of a HLSL version of the shader? That would be incredible. :D
Quote:Original post by b_thangvn
I have a question: if I use the frustum corner method to draw the screen quad, will I still be able to use this SSAO algorithm? Thanks.
I've been thinking the same thing. It would be *much* faster to not reproject every sample, but of course we then need a way to move that ray around accurately as well as the UV coordinate, so we can get an accurate position.

No clue how to do that yet. I've tried some back of envelope math on it and it didn't work, but then again my math sucks in stuff like this. :(
------------------------------
Great Little War Game
Quote:Original post by Rubicon
Quote:Original post by b_thangvn
I have a question: if I use the frustum corner method to draw the screen quad, will I still be able to use this SSAO algorithm? Thanks.
I've been thinking the same thing. It would be *much* faster to not reproject every sample, but of course we then need a way to move that ray around accurately as well as the UV coordinate, so we can get an accurate position.

No clue how to do that yet. I've tried some back of envelope math on it and it didn't work, but then again my math sucks in stuff like this. :(


You can use it if you use the corner method, but you will have to recalculate the projection as I do, based on texture coordinates. I think it would be possible to avoid reprojecting every sample and use the center sample to calculate the rest of them somehow, but the math involved might not save that much. I haven't tried it, so I don't know for sure.

About speed, this is what I measured:
Sponza scene: 120 fps with no SSAO, 98 with SSAO (no color bleeding); with color bleeding, 58-60 fps. Note that I'm taking 24 samples each for GI and AO, 48 in total. Taking fewer samples and blurring afterwards might increase performance (see the sketch at the end of this post).

For some reason the sampling pattern I use is faster than other methods I've tried; maybe it is because of cache coherency?
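About the blur suggestion: I'm not using any blur for the screens above, but if you wanted to trade samples for a blur pass, a simple box blur over the AO/GI result could look roughly like this (the "gssao" sampler and "texel" uniform are made-up names for this example):

uniform sampler2D gssao;   // hypothetical texture holding the SSAO/GI output
uniform vec2 texel;        // hypothetical uniform: vec2(1.0/width, 1.0/height)
void main()
{
    vec3 sum = vec3(0.0);
    // 3x3 box blur around the current pixel
    for (int x = -1; x <= 1; ++x)
        for (int y = -1; y <= 1; ++y)
            sum += texture2D(gssao, gl_TexCoord[0].st + vec2(float(x), float(y))*texel).rgb;
    gl_FragColor = vec4(sum/9.0, 1.0);
}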

