The best SSAO I've seen

238 replies to this topic

#1 ArKano22   Members   -  Reputation: 630

Posted 13 December 2009 - 10:54 AM

Sorry if the topic title is a bit pretentious, but that's what I think when I look at it :D. I've been struggling for quite some time to get a good-looking SSAO and this is the end of my quest! I've implemented the famous Bunnell GI disk method (http://http.develope..._chapter14.html) but in screen space (so in the end it's only form factors), and the results speak for themselves.

Some of the screens have the contrast increased so you can see artifacts and granularity. None of the screens have any blurring applied; they are just the SSAO output. The banding you can see in some places is due to precision problems in my G-buffer, so bear with me.

Otherwise it looks incredibly good: it is consistent when you move the camera, has very little haloing (but haloing anyway), has no gray flat surfaces, and captures fine details well. Plus you get one-bounce local GI, like in the last two screens.

Speed-wise, without GI it is as fast as every other SSAO implementation, maybe a bit slower. With GI there is more or less a 30-35% speed decrease. I'm going to post the full GLSL code soon, together with some speed comparisons. To implement it you'll need a G-buffer set up, or at least normal, diffuse+direct light and depth buffers.

#2 dgreen02   Members   -  Reputation: 912

Posted 13 December 2009 - 10:55 AM

Looks great...keep up the good work, looking forward to checking out the code.

#3 ArKano22   Members   -  Reputation: 630

Posted 13 December 2009 - 11:33 AM

Here's the code. It does not have any magic numbers or parameters so it should be pretty "plug and play".

Things to note:
-View space position is reconstructed from depth using the texture coordinates of the fullscreen quad; no frustum corners required.
-The depth buffer is expected to be linear, so do not use the OpenGL one; write linear depth from your own depth shader.
-The randomization texture is this one:



uniform sampler2D gnormals; // view-space normals, packed into [0,1]
uniform sampler2D gdepth;   // linear depth
uniform sampler2D gdiffuse; // diffuse + direct light
uniform sampler2D grandom;  // randomization texture

vec3 readNormal(in vec2 coord)
{
    return normalize(texture2D(gnormals, coord).xyz*2.0 - 1.0);
}

//Reconstruct view space position from linear depth and the quad's texture coordinates.
//Caveat: for a glFrustum/gluPerspective-style projection the third row of this 3x3
//block of the inverse is zero, so tray.z comes out as 0; check it against your own setup.
vec3 posFromDepth(vec2 coord)
{
    float d = texture2D(gdepth, coord).r;
    vec3 tray = mat3x3(gl_ProjectionMatrixInverse)*vec3((coord.x-0.5)*2.0,(coord.y-0.5)*2.0,1.0);
    return tray*d;
}

//Ambient Occlusion form factor:
//ddiff = sample position - current position, cnorm = current normal, (c1,c2) = sample offset.
float aoFF(in vec3 ddiff, in vec3 cnorm, in float c1, in float c2)
{
    vec3 vv = normalize(ddiff);
    float rd = length(ddiff);
    return (1.0-clamp(dot(readNormal(gl_TexCoord[0].st+vec2(c1,c2)),-vv),0.0,1.0)) *
           clamp(dot( cnorm,vv ),0.0,1.0)*
           (1.0 - 1.0/sqrt(1.0/(rd*rd) + 1.0));
}

//GI form factor:
float giFF(in vec3 ddiff, in vec3 cnorm, in float c1, in float c2)
{
    vec3 vv = normalize(ddiff);
    float rd = length(ddiff);
    return 1.0*clamp(dot(readNormal(gl_TexCoord[0].st+vec2(c1,c2)),-vv),0.0,1.0)*
           clamp(dot( cnorm,vv ),0.0,1.0)/
           (rd*rd+1.0);
}

void main()
{
    vec2 uv = gl_TexCoord[0].st;

    //read current normal,position and color.
    vec3 n = readNormal(uv);
    vec3 p = posFromDepth(uv);
    vec3 col = texture2D(gdiffuse, uv).rgb;

    //randomization texture (tiling factors are hard-coded for an 800x600 target)
    vec2 fres = vec2(800.0/128.0*5.0,600.0/128.0*5.0);
    vec3 random = texture2D(grandom, uv*fres.xy).xyz;
    random = random*2.0-vec3(1.0);

    //initialize variables:
    float ao = 0.0;
    vec3 gi = vec3(0.0,0.0,0.0);
    float incx = 1.0/800.0*0.1;
    float incy = 1.0/600.0*0.1;
    float pw = incx;
    float ph = incy;
    float cdepth = texture2D(gdepth, uv).r;

    //3 rounds of 8 samples each.
    for(float i=0.0; i<3.0; ++i)
    {
        //sampling offsets shrink with depth so the radius stays roughly constant in view space
        float npw = (pw+0.0007*random.x)/cdepth;
        float nph = (ph+0.0007*random.y)/cdepth;

        vec3 ddiff = posFromDepth(uv+vec2(npw,nph))-p;
        vec3 ddiff2 = posFromDepth(uv+vec2(npw,-nph))-p;
        vec3 ddiff3 = posFromDepth(uv+vec2(-npw,nph))-p;
        vec3 ddiff4 = posFromDepth(uv+vec2(-npw,-nph))-p;
        vec3 ddiff5 = posFromDepth(uv+vec2(0.0,nph))-p;
        vec3 ddiff6 = posFromDepth(uv+vec2(0.0,-nph))-p;
        vec3 ddiff7 = posFromDepth(uv+vec2(npw,0.0))-p;
        vec3 ddiff8 = posFromDepth(uv+vec2(-npw,0.0))-p;

        ao+= aoFF(ddiff,n,npw,nph);
        ao+= aoFF(ddiff2,n,npw,-nph);
        ao+= aoFF(ddiff3,n,-npw,nph);
        ao+= aoFF(ddiff4,n,-npw,-nph);
        ao+= aoFF(ddiff5,n,0.0,nph);
        ao+= aoFF(ddiff6,n,0.0,-nph);
        ao+= aoFF(ddiff7,n,npw,0.0);
        ao+= aoFF(ddiff8,n,-npw,0.0);

        gi+= giFF(ddiff,n,npw,nph)*texture2D(gdiffuse, uv+vec2(npw,nph)).rgb;
        gi+= giFF(ddiff2,n,npw,-nph)*texture2D(gdiffuse, uv+vec2(npw,-nph)).rgb;
        gi+= giFF(ddiff3,n,-npw,nph)*texture2D(gdiffuse, uv+vec2(-npw,nph)).rgb;
        gi+= giFF(ddiff4,n,-npw,-nph)*texture2D(gdiffuse, uv+vec2(-npw,-nph)).rgb;
        gi+= giFF(ddiff5,n,0.0,nph)*texture2D(gdiffuse, uv+vec2(0.0,nph)).rgb;
        gi+= giFF(ddiff6,n,0.0,-nph)*texture2D(gdiffuse, uv+vec2(0.0,-nph)).rgb;
        gi+= giFF(ddiff7,n,npw,0.0)*texture2D(gdiffuse, uv+vec2(npw,0.0)).rgb;
        gi+= giFF(ddiff8,n,-npw,0.0)*texture2D(gdiffuse, uv+vec2(-npw,0.0)).rgb;

        //increase sampling area:
        pw += incx;
        ph += incy;
    }
    ao/=24.0;
    gi/=24.0;

    gl_FragColor = vec4(col-vec3(ao)+gi*5.0,1.0);
}
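For anyone who wants to sanity-check the AO form factor outside a shader, here is a minimal CPU-side sketch in Python (not GLSL, so it can be run on its own). It mirrors the three terms of aoFF above; the helper names and the explicit `emitter_normal` argument are assumptions of this sketch, since the shader reads that normal from the G-buffer instead.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def clamp01(x):
    return max(0.0, min(1.0, x))

def ao_ff(ddiff, receiver_normal, emitter_normal):
    """Mirror of aoFF: ddiff points from the shaded point to the sample (must be non-zero)."""
    vv = normalize(ddiff)
    rd = math.sqrt(dot(ddiff, ddiff))
    # Samples whose normal faces the shaded point contribute less occlusion.
    facing_away = 1.0 - clamp01(dot(emitter_normal, tuple(-c for c in vv)))
    # Cosine term at the receiver: samples behind the surface contribute nothing.
    cos_receiver = clamp01(dot(receiver_normal, vv))
    # Distance falloff, equal to 1 - rd / sqrt(1 + rd*rd).
    falloff = 1.0 - 1.0 / math.sqrt(1.0 / (rd * rd) + 1.0)
    return facing_away * cos_receiver * falloff

near = ao_ff((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
far = ao_ff((0.0, 0.0, 100.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
behind = ao_ff((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
print(near, far, behind)
```

The three calls show the intended behaviour: a nearby sample along the normal occludes noticeably, the same sample far away barely occludes, and a sample behind the surface contributes zero.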






EDIT: there was a bug in the code, now it's fixed.

[Edited by - ArKano22 on December 14, 2009 7:33:16 PM]

#4 Matt Aufderheide   Members   -  Reputation: 99

Posted 13 December 2009 - 01:09 PM

Are normals and positions supposed to be in View space? And does opengl view space mean the same thing in D3D?

#5 ArKano22   Members   -  Reputation: 630

Posted 13 December 2009 - 08:55 PM

Quote:
Original post by Matt Aufderheide
Are normals and positions supposed to be in View space? And does opengl view space mean the same thing in D3D?


Yes, normals and position (depth, because you reconstruct the position) are in view space.

In OpenGL, view space means relative to the camera position, so I think it is the same no matter what API you're using.
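One detail that can differ between setups is handedness: OpenGL code conventionally uses a right-handed view space with the camera looking down -Z, while many Direct3D setups use a left-handed one looking down +Z (both APIs can be configured either way through the matrices you supply). A minimal Python sketch of the conversion, assuming that is the only difference between the two conventions in your engine:

```python
def flip_handedness(v):
    """Negate z to move a view-space vector between a -Z-forward and a +Z-forward convention."""
    x, y, z = v
    return (x, y, -z)

# A point 5 units in front of the camera under the usual OpenGL convention (-Z forward).
gl_point = (1.0, 2.0, -5.0)
lh_point = flip_handedness(gl_point)  # same point under a +Z-forward convention
print(lh_point)
```

The same flip applies to view-space normals, which matters when the normals and the reconstructed positions come from different code paths.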


#6 Jason Z   GDNet+   -  Reputation: 1136

Posted 14 December 2009 - 01:58 PM

Do you have any notions of rendering time? How efficient is this method compared to standard SSAO implementations?

#7 b_thangvn   Members   -  Reputation: 176

Posted 15 December 2009 - 03:56 AM

Bookmarked for future study :D. Great job, man!

EDIT: I have a question: so if I use the frustum corner method to draw screen quad, will I still be able to use this SSAO algorithm? Thanks.

[Edited by - b_thangvn on December 16, 2009 1:56:32 AM]

#8 Styves   Members   -  Reputation: 185

Posted 17 December 2009 - 09:01 AM

Fantastic shader. Really sweet.

Any chance of an HLSL version of the shader? That would be incredible. :D

#9 Rubicon   Members   -  Reputation: 289

Posted 18 December 2009 - 10:44 PM

Quote:
Original post by b_thangvn
I have a question: so if I use the frustum corner method to draw screen quad, will I still be able to use this SSAO algorithm? Thanks.
I've been thinking the same thing. It would be *much* faster to not reproject every sample, but of course we then need a way to move that ray around accurately as well as the UV coordinate, so we can get an accurate position.

No clue how to do that yet. I've tried some back of envelope math on it and it didn't work, but then again my math sucks in stuff like this. :(

#10 ArKano22   Members   -  Reputation: 630

Posted 19 December 2009 - 01:09 AM

Quote:
Original post by Rubicon
Quote:
Original post by b_thangvn
I have a question: so if I use the frustum corner method to draw screen quad, will I still be able to use this SSAO algorithm? Thanks.
I've been thinking the same thing. It would be *much* faster to not reproject every sample, but of course we then need a way to move that ray around accurately as well as the UV coordinate, so we can get an accurate position.

No clue how to do that yet. I've tried some back of envelope math on it and it didn't work, but then again my math sucks in stuff like this. :(


You can use it if you use the corner method, but you will have to recalculate the projection as I do, based on texture coordinates. Well, I think it would be possible to avoid having to reproject every sample and use the center sample to calculate the rest of them somehow, but the math involved in it might not save that much. I haven't tried to do it so I don't know for sure.
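On deriving the other rays from the center one: the ray in posFromDepth is a matrix times (x, y, 1), so it is linear in the texture coordinate, and the ray at an offset UV equals the center ray plus a fixed multiple of the matrix's first two columns. A small Python check of that identity follows; the 3x3 matrix is a made-up stand-in, not a real inverse projection, and the function names are hypothetical.

```python
def mat3_mul_vec3(m, v):
    # m is a 3x3 matrix given as three rows.
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def ray_from_uv(m, u, v):
    """Per-sample reprojection, as posFromDepth does before scaling by depth."""
    return mat3_mul_vec3(m, ((u - 0.5) * 2.0, (v - 0.5) * 2.0, 1.0))

def ray_from_center(m, center_ray, du, dv):
    """Same ray, built from the center ray plus offsets along columns 0 and 1."""
    col0 = tuple(m[r][0] for r in range(3))
    col1 = tuple(m[r][1] for r in range(3))
    return tuple(center_ray[r] + 2.0 * du * col0[r] + 2.0 * dv * col1[r]
                 for r in range(3))

M = ((0.75, 0.0, 0.1),   # arbitrary stand-in values
     (0.0, 0.5, 0.2),
     (0.0, 0.0, -1.0))

center = ray_from_uv(M, 0.3, 0.6)
direct = ray_from_uv(M, 0.3 + 0.01, 0.6 - 0.02)
derived = ray_from_center(M, center, 0.01, -0.02)
print(direct, derived)
```

In a shader this would replace one matrix multiply per sample with two multiply-adds, which supports the guess above that the saving is probably small next to the texture fetches.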

About speed, this is what i measured:
Sponza scene: 120 fps with no SSAO, 98 fps with SSAO (no color bleeding), and 58-60 fps with color bleeding. Note that I'm taking 24 samples for each (GI and AO), 48 in total. Taking fewer samples and blurring afterwards might increase performance.

For some reason the method I use to take the samples is faster than other methods I've tried. Maybe it is because of cache coherency?
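Frame rates are awkward to compare directly, so here is a short Python sketch that converts the figures above into per-frame cost in milliseconds. It assumes 59 fps as the midpoint of the 58-60 range and that the passes are the only thing that changed between measurements.

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

base_ms = frame_ms(120.0)   # no SSAO
ao_ms = frame_ms(98.0)      # SSAO only
ao_gi_ms = frame_ms(59.0)   # SSAO + color bleeding (midpoint of 58-60 fps)

ao_cost = ao_ms - base_ms       # about 1.87 ms for the AO samples
gi_cost = ao_gi_ms - ao_ms      # about 6.75 ms more for the GI samples

print(round(ao_cost, 2), round(gi_cost, 2))
```

Seen this way, the color bleeding part costs several times more than the AO part, even though both take 24 samples.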

#11 Rubicon   Members   -  Reputation: 289

Posted 19 December 2009 - 03:06 AM

I might try your method above in a bit, thanks for sharing it. I would expect that the reprojection of every point isn't *that* much worse than the technique I'm currently using, and indeed there might even be a net win due to better cache access.

The biggest speed up I got was in dropping down to doing all the calculations in a quarter sized RT and then blurring it up to full size afterwards. Might not even need the blur tbh - your screenies have a lot less random speckling than other techniques - I like it a lot.
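To put a rough number on the quarter-size saving, here is a small Python sketch that counts AO samples per frame. It assumes an 800x600 screen (the resolution hard-coded in the GLSL above), 24 samples per pixel, and that "quarter sized" means half the width and half the height.

```python
SAMPLES_PER_PIXEL = 24  # 3 rounds of 8 samples, as in the shaders in this thread

def ao_samples(width, height, scale):
    """Total AO samples per frame when the AO target is `scale` times the screen size per axis."""
    rt_w = int(width * scale)
    rt_h = int(height * scale)
    return rt_w * rt_h * SAMPLES_PER_PIXEL

full = ao_samples(800, 600, 1.0)
quarter = ao_samples(800, 600, 0.5)
print(full, quarter, full // quarter)
```

The sampling work drops by a factor of four before counting the cost of the blur or upsample pass.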

I'll be doing this in DX9 btw, so if I get anywhere, I'll put up some source for the guy asking. Although in truth glsl and hlsl are pretty much identical barring some annoying syntactic differences. Though I do wish DX supplied those custom variables like the projection matrix without us having to dick about.

#12 LotusExigeS1   Members   -  Reputation: 133

Posted 20 December 2009 - 06:19 AM

Has anyone managed to get this working as an HLSL shader? I have a couple of questions about the setup and about GLSL.

Can someone confirm how they are calculating the normal and depth? I am using the calculations from my deferred renderer but I don't think they are right. Can someone post their calcs so I can compare, please?

Also, what does ".st" mean after the texture coords in GLSL?

Thanks.

#13 Rubicon   Members   -  Reputation: 289

Posted 20 December 2009 - 11:21 AM

I've got mine working in HLSL and will post it shortly - got a few tidy ups to make first.

The .st is simply the uv values you'd expect them to be. GLSL has some really quite nice pre-existing globals you can just reference directly, rather than passing it all up yourself. (It still has to pass them, but it does it for you in the background and is a damn nice feature. God I hate it when GL does something nice so I can't moan, lol!)

#14 LotusExigeS1   Members   -  Reputation: 133

Posted 20 December 2009 - 11:29 AM

Sorry, just PM'd you as you posted. Could you just post up the shader you use to build targets for this shader to use. I am curious about what I am doing wrong. Thanks.

#18 Rubicon   Members   -  Reputation: 289

Posted 20 December 2009 - 11:33 AM

Hope this helps. Most of it should be obvious enough. My code is built in code iyswim, but I've removed the "AddLine blah" stuff. There'll still be odd places where I put in constant names using sprintf stylee.

There's an optimisation to be had here to reduce the amount of samples (there are currently 2 reads per sampling, where the 1st one should cache the read for the 2nd one but you have to do something for yourself ;)

uniform float4   Params:register(c%i);  // .w = about 10 or so (pixel radius) whilst .x and .y are RT dimensions and .z = viewport far
uniform float4x4 InvProj:register(c%i); // Full-fat 4x4 projection matrix that's been inverted, then transposed, then set row by row into these 4x constants
uniform sampler  SampleMap:register(s0); // .xy = packed normal, .zw = depth split over two channels
uniform sampler  RandNorms:register(s1);

float3 PosFromDepth (float2 UV)
{
    float4 Sample=tex2D(SampleMap,UV);
    float Depth=(Sample.z+Sample.w/255)*Params.z;
    float4 Pos=float4((UV.x-0.5)*2,(0.5-UV.y)*2,1,1);
    float4 Ray=mul(Pos,InvProj);
    return Ray.xyz*Depth;
}

float3 ReadNormal (float2 UV)
{
    float4 Sample=tex2D(SampleMap,UV);
    float3 Normal;
    Normal.xy=Sample.xy*2-1;
    Normal.z=-sqrt(1-dot(Normal.xy,Normal.xy)); // z is rebuilt from x and y
    return Normal;
}

float aoFF (float3 ddiff,float3 cnorm,float2 UV)
{
    float3 vv=normalize(ddiff);
    float rd=length(ddiff);
    return (1-clamp(dot(ReadNormal(UV),-vv),0,1))*clamp(dot(cnorm,vv),0,1)*(1-1/sqrt(1/(rd*rd)+1));
}

float4 main (float2 UV:TEXCOORD0):COLOR
{
    float3 n=ReadNormal(UV);
    float3 p=PosFromDepth(UV);
    float3 Random=0.005F*Params.w*(tex2D(RandNorms,UV*5).xyz*2-1); // That initial value affects how "wooly" the randomiser is. Smaller numbers = grainier

    float ao=0;
    float incx=1.0F/Params.x*Params.w;
    float incy=1.0F/Params.y*Params.w;
    float pw=incx;
    float ph=incy;

    float4 Sample=tex2D(SampleMap,UV);
    float CDepth=1.0F/((Sample.z+Sample.w/255)*Params.z); // reciprocal of the centre depth

    for (float i=0;i<3;++i)
    {
        float npw = (pw+Random.x)*CDepth;
        float nph = (ph+Random.y)*CDepth;

        float2 UV1=UV+float2(npw,nph);
        float3 Dif1=PosFromDepth(UV1)-p;
        ao+= aoFF(Dif1,n,UV1);

        float2 UV2=UV+float2(npw,-nph);
        float3 Dif2=PosFromDepth(UV2)-p;
        ao+= aoFF(Dif2,n,UV2);

        float2 UV3=UV+float2(-npw,nph);
        float3 Dif3=PosFromDepth(UV3)-p;
        ao+= aoFF(Dif3,n,UV3);

        float2 UV4=UV+float2(-npw,-nph);
        float3 Dif4=PosFromDepth(UV4)-p;
        ao+= aoFF(Dif4,n,UV4);

        float2 UV5=UV+float2(0,nph);
        float3 Dif5=PosFromDepth(UV5)-p;
        ao+= aoFF(Dif5,n,UV5);

        float2 UV6=UV+float2(0,-nph);
        float3 Dif6=PosFromDepth(UV6)-p;
        ao+= aoFF(Dif6,n,UV6);

        float2 UV7=UV+float2(npw,0);
        float3 Dif7=PosFromDepth(UV7)-p;
        ao+= aoFF(Dif7,n,UV7);

        float2 UV8=UV+float2(-npw,0);
        float3 Dif8=PosFromDepth(UV8)-p;
        ao+= aoFF(Dif8,n,UV8);

        pw += incx;
        ph += incy;
    }

    float Val=(ao/24);
    return float4(Val,Val,Val,1);
}
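The shader above unpacks depth from two 8-bit channels and rebuilds the normal's z from x and y. The setup pass that writes those channels is not shown in this thread, so the Python sketch below is only an assumed encoder that is consistent with the decode line `(Sample.z+Sample.w/255)*Params.z`; the function names are hypothetical.

```python
import math

def pack_depth(d):
    """Split normalized depth d in [0, 1] into two 8-bit channel values (0..255)."""
    t = d * 255.0
    hi = min(int(math.floor(t)), 255)      # coarse part, stored in .z
    lo = int(round((t - hi) * 255.0))      # fractional part, stored in .w
    return hi, lo

def unpack_depth(hi, lo, far):
    """Mirror of the shader decode: the sampler returns channel / 255."""
    z = hi / 255.0
    w = lo / 255.0
    return (z + w / 255.0) * far

def decode_normal(x8, y8):
    """Mirror of ReadNormal for 8-bit x and y channels."""
    x = x8 / 255.0 * 2.0 - 1.0
    y = y8 / 255.0 * 2.0 - 1.0
    # max() guards against a slightly negative argument caused by 8-bit rounding;
    # the posted HLSL has no such guard, it is an addition in this sketch.
    z = -math.sqrt(max(0.0, 1.0 - (x * x + y * y)))
    return (x, y, z)

hi, lo = pack_depth(0.1234)
print(hi, lo, unpack_depth(hi, lo, 1.0))
print(decode_normal(128, 128))
```

With this split the round-trip error stays below 1/(255*255) of the far distance, which is in line with the remark in this thread that a 32-bit target is good enough.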







#19 LotusExigeS1   Members   -  Reputation: 133

Posted 20 December 2009 - 11:37 AM

Hi, do you have the shader you are using to build the targets that this one is using? Thanks and sorry for being a pain.

#20 Rubicon   Members   -  Reputation: 289

Posted 20 December 2009 - 11:42 AM

Quote:
Original post by LotusExigeS1
Hi do you have the shader you are using to build the targets that this one is using ? Thanks and sorry for being a pain.
All that is on a tutorial I have up on my company website at Rubicon Dev under "for developers". I've obviously totally replaced the final pass with the above, but all the setup pass stuff remains good, I didn't need to touch it.

You should also note here that my source texture that I'm rendering into is a RGBA32 and not one of those fatter formats. There's really no need to choke your bandwidth here by using wider formats. It's all explained on the site. As proof that 32-bit is good enough, here's my output from the posted code. Note that it's a bit blocky because, to really save bandwidth and GPU time, I do all this at half resolution. (Wish I knew how to make my pics just appear in the post.)





