The best SSAO I've seen

Took a quick look at your RenderMonkey project and it looks great! Runs pretty fast on my 5870 too... when I get some time I'd like to write up a compute shader version where some local neighborhood of the depth buffer is cached in local memory to save on bandwidth and see how that performs.

One quick note: in DX9 you need to offset by a half pixel when sampling the G-buffer texture to get a 1:1 pixel-to-texel mapping. Add this code to the end of your "do ssao" vertex shader to get rid of the diagonal line in your output:
o.position.xy -= g_inv_screen_size;
Note that this isn't required in DX10+ or OpenGL.
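
For reference, a minimal sketch of a DX9 full-screen quad vertex shader with that offset applied; g_inv_screen_size = (1/width, 1/height) is assumed to be set by the application:

float2 g_inv_screen_size; // assumed uniform: (1/width, 1/height)

struct VSOut
{
	float4 position : POSITION0;
	float2 uv       : TEXCOORD0;
};

VSOut FullScreenVS(float4 pos : POSITION0, float2 uv : TEXCOORD0)
{
	VSOut o;
	o.position = pos;
	o.uv = uv;
	o.position.xy -= g_inv_screen_size; // half-pixel correction for DX9
	return o;
}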

Again, very neat! I'll try to find some time to dive in properly and take a look at the math in the next little while. Cheers!
Quote:Original post by AndyTX
Took a quick look at your RenderMonkey project and it looks great! Runs pretty fast on my 5870 too... when I get some time I'd like to write up a compute shader version where some local neighborhood of the depth buffer is cached in local memory to save on bandwidth and see how that performs.

I've written just such a compute shader for a chapter in Game Programming Gems 8 (for a more traditional SSAO algorithm). The net result is that using the group shared memory as a cache gives a speedup that grows roughly in proportion to the number of samples. The trick is restricting the sampling area to the locally cached grid: for pixels that are close to the viewer the sampling radius can become quite large, so you either need to restrict the sample location to the cache or fetch the sample directly from the texture (and dynamically decide when to do that, too...). In practice, you don't notice it too much if the sampling radius is limited.
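
To make that concrete, here is a rough, untested sketch of the caching scheme in an HLSL compute shader; this is not the GPG8 chapter's code, and the resource names, the 16x16 group size, the apron radius, and the toy occlusion term are all illustrative assumptions:

#define GROUP_DIM     16
#define KERNEL_RADIUS 8
#define CACHE_DIM     (GROUP_DIM + 2 * KERNEL_RADIUS)

Texture2D<float>   DepthBuffer : register(t0); // linear depth
RWTexture2D<float> AOResult    : register(u0);

groupshared float DepthCache[CACHE_DIM * CACHE_DIM];

[numthreads(GROUP_DIM, GROUP_DIM, 1)]
void SSAO_CS(uint3 groupId : SV_GroupID,
             uint3 groupThreadId : SV_GroupThreadID,
             uint3 dispatchId : SV_DispatchThreadID)
{
	// Cooperatively load the group's tile plus an apron into shared memory.
	int2 tileOrigin = int2(groupId.xy) * GROUP_DIM - KERNEL_RADIUS;
	for (uint i = groupThreadId.y * GROUP_DIM + groupThreadId.x;
	     i < CACHE_DIM * CACHE_DIM;
	     i += GROUP_DIM * GROUP_DIM)
	{
		int2 coord = tileOrigin + int2(i % CACHE_DIM, i / CACHE_DIM);
		DepthCache[i] = DepthBuffer.Load(int3(coord, 0));
	}
	GroupMemoryBarrierWithGroupSync();

	// All sampling below reads the cache, never the texture; offsets are
	// bounded by KERNEL_RADIUS, so they stay inside the cached grid
	// (the restriction on the sampling area mentioned above).
	int2 local = int2(groupThreadId.xy) + KERNEL_RADIUS;
	float center = DepthCache[local.y * CACHE_DIM + local.x];

	static const int2 offsets[4] = { int2( KERNEL_RADIUS, 0), int2(-KERNEL_RADIUS, 0),
	                                 int2(0,  KERNEL_RADIUS), int2(0, -KERNEL_RADIUS) };
	float ao = 0.0f;
	[unroll]
	for (int k = 0; k < 4; ++k)
	{
		int2 s = local + offsets[k];
		float d = DepthCache[s.y * CACHE_DIM + s.x];
		ao += saturate(center - d); // nearer neighbors count as occluders
	}
	AOResult[dispatchId.xy] = 1.0f - 0.25f * ao;
}

The cooperative load means each depth texel in the tile is fetched from memory once and then reused by every thread in the group.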

I've added a modified version of the demo to my engine if you are interested in taking a peek (link in signature below).
Quote:Original post by AndyTX
Took a quick look at your RenderMonkey project and it looks great! Runs pretty fast on my 5870 too... when I get some time I'd like to write up a compute shader version where some local neighborhood of the depth buffer is cached in local memory to save on bandwidth and see how that performs.

One quick note: in DX9 you need to offset by a half pixel when sampling the G-buffer texture to get a 1:1 pixel-to-texel mapping. Add this code to the end of your "do ssao" vertex shader to get rid of the diagonal line in your output:
o.position.xy -= g_inv_screen_size;
Note that this isn't required in DX10+ or OpenGL.

Again, very neat! I'll try to find some time to dive in properly and take a look at the math in the next little while. Cheers!


Double thank you!! One for the compliments and the other for the hint about DX9! I was having some self-occlusion issues when using very small radii and kept wondering why they appeared. Now I know I was sampling incorrectly.

Btw, I am not very familiar with recent GPU hardware architecture; I'm still in the DX9/GL2.0 era. I really should read about geometry and compute shaders... they could really be helpful for SSAO. Can anyone point me to some good texts on the subject?

Quote:Original post by AgentSnoop
Quote:Original post by Shael
Sorry to be a pain in the ass. I've finally got the first RM project code working outside of RM. But the only way to do it was to use exactly the same render target setup. I cannot for the life of me get it to work using R32F for the linear depth and A8R8G8B8 for the normals, and then reconstructing the position using a frustum ray.

I've checked my code over and over and I can't see what is wrong. The position and normals, when output to screen, look the same as when I output them in RM. It's driving me insane.

Could you set up either a small DX application or another RM project that uses this kind of render target setup with position reconstruction, to see if you can get it working?

Thanks a lot!


I'm using that kind of set up currently. If you post your code, I can try plugging it in to see what I get, and see if there are any major differences between the two.


OK, this is the current code I'm trying to get working using position reconstruction.

// Shader constants
float g_sample_rad = 2;
float g_intensity = 500.0f;
float g_scale = 180.0f;
float g_bias = 0.085f;

float fViewportWidth;   // set by the application
float fViewportHeight;  // set by the application

texture DepthBuffer;
texture NormalBuffer;
texture RandomTexture;

sampler2D DepthSampler = sampler_state
{
	Texture = <DepthBuffer>;
	MinFilter = LINEAR;
	MipFilter = NONE;
	MagFilter = LINEAR;
	ADDRESSU = CLAMP;
	ADDRESSV = CLAMP;
};

sampler2D NormalSampler = sampler_state
{
	Texture = <NormalBuffer>;
	MinFilter = LINEAR;
	MipFilter = NONE;
	MagFilter = LINEAR;
	ADDRESSU = CLAMP;
	ADDRESSV = CLAMP;
};

sampler2D RandomSampler = sampler_state
{
	Texture = <RandomTexture>;
	MinFilter = LINEAR;
	MipFilter = NONE;
	MagFilter = LINEAR;
	ADDRESSU = WRAP;
	ADDRESSV = WRAP;
};

// ----------------------------------------------------------------------------
// HELPER FUNCTIONS
// ----------------------------------------------------------------------------
float3 getPosition(in float3 ray, in float2 uv)
{
	return ray * tex2D(DepthSampler, uv).r;
}

float3 getNormal(in float2 uv)
{
	return normalize(tex2D(NormalSampler, uv).xyz * 2.0f - 1.0f);
}

float2 getRandom(in float2 uv)
{
	return normalize(tex2D(RandomSampler, float2(fViewportWidth, fViewportHeight) * uv / 64.0f).xy * 2.0f - 1.0f);
}

float doAmbientOcclusion(in float3 ray, in float2 tcoord, in float2 uv, in float3 p, in float3 cnorm)
{
	float3 diff = getPosition(ray, tcoord + uv) - p;
	const float3 v = normalize(diff);
	const float  d = length(diff) * g_scale;
	return max(0.0, dot(cnorm, v) - g_bias) * (1.0 / (1.0 + d));
}

struct OutputVS
{
	float4 Position   : POSITION0;
	float2 uv         : TEXCOORD0;
	float3 FrustumRay : TEXCOORD1;
};

// ----------------------------------------------------------------------------
// halfPixel() and FSQ_GetFrustumRay() are helpers defined elsewhere in the framework.
OutputVS SSAO_VS(float4 pos : POSITION0, float4 tex : TEXCOORD0)
{
	OutputVS Out = (OutputVS)0;
	Out.Position = pos;
	Out.uv = tex + halfPixel();
	Out.FrustumRay = FSQ_GetFrustumRay(tex);
	return Out;
}

float4 SSAO_PS(OutputVS i) : COLOR0
{
	float4 result = float4(1, 1, 1, 1);
	const float2 vec[4] = { float2(1,0), float2(-1,0), float2(0,1), float2(0,-1) };

	float3 p = getPosition(i.FrustumRay, i.uv);
	float3 n = getNormal(i.uv);
	float2 rand = getRandom(i.uv);
	float ao = 0.0f;

	//**SSAO Calculation**//
	float iterations = 4.0;
	for (int j = 0; j < iterations; ++j)
	{
		float2 coord1 = reflect(vec[j], rand) * g_sample_rad / p.z;
		float2 coord2 = float2(coord1.x*0.707 - coord1.y*0.707, coord1.x*0.707 + coord1.y*0.707);

		ao += doAmbientOcclusion(i.FrustumRay, i.uv, coord1*0.25, p, n);
		ao += doAmbientOcclusion(i.FrustumRay, i.uv, coord2*0.5,  p, n);
		ao += doAmbientOcclusion(i.FrustumRay, i.uv, coord1*0.75, p, n);
		ao += doAmbientOcclusion(i.FrustumRay, i.uv, coord2,      p, n);
	}
	ao /= 32.0;
	//**END**//

	result.rgb -= saturate(ao * g_intensity);
	return result;
}


This is how I set up the depth and normal buffers:
VS_DN_OUT VS_DepthNorm( VS_MESH_INPUT input )
{
	VS_DN_OUT output = (VS_DN_OUT)0;

	float4x4 wv  = mul(mWorld, mView);
	float4x4 wvp = mul(wv, mProj);

	output.Position = mul(input.Pos, wvp);

	float4 vPositionVS = mul(input.Pos, wv);
	output.Depth = vPositionVS.z;

	output.Tex0 = input.Tex0;
	output.Normal = mul(input.Normal, wv);

	return output;
}

PS_DN_OUT PS_DepthNorm( VS_DN_OUT In )
{
	PS_DN_OUT output = (PS_DN_OUT)1;

	// We store a linear depth term, to be reconstructed using the frustum ray method.
	// NOTE: I have tried dividing by the far clip and multiplying the depth by it later on, but still get shocking results.
	output.Depth = float4(In.Depth, 0.0f, 0.0f, 0.0f);

	// Move into [0, 1] range
	output.Normals.rgb = 0.5f * (normalize(In.Normal) + 1.0f);
	output.Normals.a = 1.0f;

	return output;
}


Quote:Original post by Shael
Quote:Original post by AgentSnoop
Quote:Original post by Shael
Sorry to be a pain in the ass. I've finally got the first RM project code working outside of RM. But the only way to do it was to use exactly the same render target setup. I cannot for the life of me get it to work using R32F for the linear depth and A8R8G8B8 for the normals, and then reconstructing the position using a frustum ray.

I've checked my code over and over and I can't see what is wrong. The position and normals, when output to screen, look the same as when I output them in RM. It's driving me insane.

Could you set up either a small DX application or another RM project that uses this kind of render target setup with position reconstruction, to see if you can get it working?

Thanks a lot!


I'm using that kind of set up currently. If you post your code, I can try plugging it in to see what I get, and see if there are any major differences between the two.


OK, this is the current code I'm trying to get working using position reconstruction.

*** Source Snippet Removed ***

This is how I set up the depth and normal buffers:
*** Source Snippet Removed ***


I think the problem is not the code but the actual concept of what you're trying to do. You are reconstructing position using the frustum corners. When you need to sample several positions around the current pixel, and not only the pixel itself, the frustum ray is no longer valid, because it is interpolated from the vertex shader for the current pixel only. I believe this causes the reconstructed sample positions to be wrong.

There's a little code snippet somewhere in a paper called "Deferred Lighting in the Leadwerks Engine" that allows you to reconstruct position without the frustum ray. Copy-pasted from it:

"------------------
The fragment coordinate was used to calculate the screen space position. Here, the buffersize uniform is used to pass the screen width and height to the shader:

vec3 screencoord; screencoord = vec3(((gl_FragCoord.x/buffersize.x)-0.5) * 2.0,((-gl_FragCoord.y/buffersize.y)+0.5) * 2.0 / (buffersize.x/buffersize.y),DepthToZPosition( depth )); screencoord.x *= screencoord.z; screencoord.y *= -screencoord.z;

The depth was converted to a screen space z position with the following function. The camerarange uniform stores the camera near and far clipping distances:

float DepthToZPosition(in float depth) { return camerarange.x / (camerarange.y - depth * (camerarange.y - camerarange.x)) * camerarange.y; }
"------------------

Doing all this for each sample might be slow, though.
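
If it helps, here is a rough, untested HLSL (DX9) transliteration of that snippet; buffersize and camerarange are assumed uniforms matching the paper's, and depending on your UV and projection conventions the y term may need its sign flipped:

float2 buffersize;  // (screen width, screen height)
float2 camerarange; // (near clip, far clip)

float DepthToZPosition(in float depth)
{
	return camerarange.x / (camerarange.y - depth * (camerarange.y - camerarange.x)) * camerarange.y;
}

// uv is the [0,1] screen-space coordinate of the sample being reconstructed,
// so this can be evaluated per sample, not just for the current pixel.
float3 ScreenToViewPosition(in float2 uv, in float depth)
{
	float3 screencoord = float3((uv.x - 0.5f) * 2.0f,
	                            (0.5f - uv.y) * 2.0f / (buffersize.x / buffersize.y),
	                            DepthToZPosition(depth));
	screencoord.x *= screencoord.z;
	screencoord.y *= -screencoord.z;
	return screencoord;
}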

[Edited by - ArKano22 on March 18, 2010 3:00:07 PM]
Quote:Original post by Shael
Quote:Original post by AgentSnoop
Quote:Original post by Shael
Sorry to be a pain in the ass. I've finally got the first RM project code working outside of RM. But the only way to do it was to use exactly the same render target setup. I cannot for the life of me get it to work using R32F for the linear depth and A8R8G8B8 for the normals, and then reconstructing the position using a frustum ray.

I've checked my code over and over and I can't see what is wrong. The position and normals, when output to screen, look the same as when I output them in RM. It's driving me insane.

Could you set up either a small DX application or another RM project that uses this kind of render target setup with position reconstruction, to see if you can get it working?

Thanks a lot!


I'm using that kind of set up currently. If you post your code, I can try plugging it in to see what I get, and see if there are any major differences between the two.


OK, this is the current code I'm trying to get working using position reconstruction.

*** Source Snippet Removed ***

This is how I set up the depth and normal buffers:
*** Source Snippet Removed ***


Alright, so I plugged in the SSAO pixel shader code, for the most part just changing variable names slightly as I did, and the result was pretty much identical. The only difference I saw in the code was in how the depth buffer is set up.

I have depth = -In.Depth / farClipDist.
I also don't multiply it by farClipDist in the SSAO pixel shader, just the frustum ray.

I don't know if that will change anything for you.
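
In code, the difference described above would look something like this (a sketch of the description, with assumed names; the minus sign presumes a view space where visible points have negative z):

// Depth pass: store view-space depth normalized by the far clip distance.
output.Depth = float4(-In.Depth / farClipDist, 0.0f, 0.0f, 0.0f);

// SSAO pass: the interpolated corner ray already reaches the far plane,
// so scaling it by the normalized depth lands on the view-space position
// without any extra multiply by farClipDist.
float3 getPosition(in float3 frustumRay, in float2 uv)
{
	return frustumRay * tex2D(DepthSampler, uv).r;
}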
ArKano22:

I haven't tried that method as I hadn't seen it before, only the frustum ray method. I'll give it a go if I can't work out the ray method. Have you tried either method?

By the way, have you noticed the latest revision of your code doesn't look as good on the simple box/sphere/cylinder mesh you originally had? It looks good on the Hebe model, though.

AgentSnoop:

I had tried dividing the depth by the far clip but the results were still crap. I didn't negate it though, as I thought that was only needed for a right-handed coordinate system, and since DirectX uses a left-handed system I didn't think it was necessary. I will try it again though and see what happens.

It might be the frustum ray that is messed up? Can you show me how you get the ray in the application and pass it to the shader?

I use CML for my math lib and so I use these frustum functions:

http://cmldev.net/?p=582

I then get the far clip plane corners in world space and transform them to view space by multiplying them by the camera view matrix. I pass the 4 vectors to the shader and use shader math to determine which one to get, e.g.:

float3 GetFrustumRay(in float2 texCoord)
{
	// texCoord is (0,0), (1,0), (0,1) or (1,1) at the quad corners,
	// so this maps each corner to an index 0-3.
	float index = texCoord.x + (texCoord.y * 2);
	return FrustumCorners[index];
}
Quote:Original post by Shael
ArKano22:

I haven't tried that method as I hadn't seen it before, only the frustum ray method. I'll give it a go if I can't work out the ray method. Have you tried either method?

AgentSnoop:

I had tried dividing the depth by the far clip but the results were still crap. I didn't negate it though, as I thought that was only needed for a right-handed coordinate system, and since DirectX uses a left-handed system I didn't think it was necessary. I will try it again though and see what happens.

It might be the frustum ray that is messed up? Can you show me how you get the ray in the application and pass it to the shader?

I use CML for my math lib and so I use these frustum functions:

http://cmldev.net/?p=582

I then get the far clip plane corners in world space and transform them to view space by multiplying them by the camera view matrix. I pass the 4 vectors to the shader and use shader math to determine which one to get, e.g.:

*** Source Snippet Removed ***


Here is how I create the view-space frustum. Since this doesn't change unless you change your projection matrix, you don't have to update this much.

It should all work, but to be honest, I'm not sure if I've tried all the paths. Essentially, I have different paths for whether it's an orthographic projection or not, and then for whether the horizontal and vertical FOVs are set independently or not. At the end, I put the corners in a matrix and send the matrix to the shader. I put the index for the frustum ray in the position vector when I create the vertex buffer.

float nearX, nearY, nearZ;
float farX, farY, farZ;
float farH, farW;
float nearH, nearW;

if (ortho)
{
	farH = 2.0f * plane[Plane_Top];
	farW = 2.0f * plane[Plane_Right];
	nearH = 2.0f * plane[Plane_Top];
	nearW = 2.0f * plane[Plane_Right];

	farX = plane[Plane_Right];
	farY = plane[Plane_Top];
	farZ = plane[Plane_Far];

	nearX = plane[Plane_Right];
	nearY = plane[Plane_Top];
	nearZ = plane[Plane_Near];
}
else
{
	float tanVertical = 2.0f * tanf(vFOVrad / 2.0f);
	float tanHorizontal = 2.0f * tanf(hFOVrad / 2.0f);

	farH = tanVertical * plane[Plane_Far];
	if (hFOVrad == 0)
		farW = farH * aspectRatio;
	else
		farW = tanHorizontal * plane[Plane_Far];

	nearH = tanVertical * plane[Plane_Near];
	if (hFOVrad == 0)
		nearW = nearH * aspectRatio;
	else
		nearW = tanHorizontal * plane[Plane_Near];

	nearX = nearW * 0.5f;
	nearY = nearH * 0.5f;
	nearZ = plane[Plane_Near];

	farX = farW * 0.5f;
	farY = farH * 0.5f;
	farZ = plane[Plane_Far];
}

/*
//near clip plane first, top-left then clockwise
Vector3 ntl = Vector3(-nearX,  nearY, -nearZ);
Vector3 ntr = Vector3( nearX,  nearY, -nearZ);
Vector3 nbr = Vector3( nearX, -nearY, -nearZ);
Vector3 nbl = Vector3(-nearX, -nearY, -nearZ);
Vector3 ftl = Vector3(-farX,  farY, -farZ);
Vector3 ftr = Vector3( farX,  farY, -farZ);
Vector3 fbr = Vector3( farX, -farY, -farZ);
Vector3 fbl = Vector3(-farX, -farY, -farZ);
*/

//ftr
vsFarFrustumCorners._11 = farX;
vsFarFrustumCorners._12 = farY;
vsFarFrustumCorners._13 = -farZ;
//ftl
vsFarFrustumCorners._21 = -farX;
vsFarFrustumCorners._22 = farY;
vsFarFrustumCorners._23 = -farZ;
//fbl
vsFarFrustumCorners._31 = -farX;
vsFarFrustumCorners._32 = -farY;
vsFarFrustumCorners._33 = -farZ;
//fbr
vsFarFrustumCorners._41 = farX;
vsFarFrustumCorners._42 = -farY;
vsFarFrustumCorners._43 = -farZ;

vsFarFrustumCorners._14 = vsFarFrustumCorners._24 = vsFarFrustumCorners._34 = vsFarFrustumCorners._44 = 1.0f;
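
And a hypothetical sketch of the shader side, showing how the corner index stored in the position vector can select a row of that matrix (the names here are illustrative, not from the engine above):

float4x4 vsFarFrustumCorners; // rows = ftr, ftl, fbl, fbr (as built above)

struct QuadVSOut
{
	float4 pos : POSITION0;
	float2 uv  : TEXCOORD0;
	float3 ray : TEXCOORD1;
};

QuadVSOut QuadVS(float3 pos : POSITION0, float2 uv : TEXCOORD0)
{
	QuadVSOut o;
	o.pos = float4(pos.xy, 0.0f, 1.0f);
	o.uv  = uv;
	// pos.z carries the corner index written into the vertex buffer.
	o.ray = vsFarFrustumCorners[(int)pos.z].xyz;
	return o;
}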

@ArKano22:
I tried your newest shader. It runs a bit faster than the previous versions for me (so it is the fastest of them). I had to tinker with the parameters a little bit. For example, you multiply the sample radius by the texel size in one version but not in the other. Or is that a change I made myself and don't remember?
Anyway, the quality is good and I like that occluded areas don't shrink to "lines" when you get close. On the other hand, occlusion is non-existent further away. For objects at medium distance, the occlusion effect is pretty similar. It's a very informal description, but I would say the new shader produces better results for close objects while the old one produces better results for distant objects. Maybe I can find something in between. I'll stick with the new shader for now.
Thanks again for sharing!
Shael:

I tried both methods (for deferred lighting) and I find the frustum one is faster. However, the one I suggested is easier to set up and works equally well.

About the SSAO: yes, it looks worse in the old scene. However, for me it looks better on Sponza, for example :S. I guess each works better for different scenes.
