SSAO in Direct3D10


I've been working on adding SSAO support to my Direct3D10 program, but I'm a bit confused when it comes to using the normal and depth maps to build the occlusion buffer, which is then blended with the scene. From my understanding, this is the process:

(Pass 1): Generate the normal and depth maps (I use one pass and put the normal in RGB and the depth in A)

(Pass 2): Generate the AO map using the view space normal/depth map

(Pass 3): Render the actual scene using the occlusion factor from the AO map generated in pass 2
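(Pass 3 itself I'm fairly comfortable with; roughly something like this, where aoMap is the pass 2 output, screenSize is the backbuffer size, and ambientColor/lightDir/diffuseColor are stand-ins for my usual lighting constants:)


Texture2D aoMap;        // result of pass 2
float2 screenSize;      // backbuffer size in pixels
float3 ambientColor;
float3 lightDir;        // directional light, normalized
float3 diffuseColor;

SamplerState AOSampler
{
    Filter = MIN_MAG_MIP_POINT;
    AddressU = Clamp;
    AddressV = Clamp;
};

float4 ps_Scene(float4 pos : SV_POSITION, float3 normal : TEXCOORD0) : SV_Target
{
    // look up the occlusion factor for this pixel and use it to darken the ambient term
    // (assuming the AO map stores 1 = unoccluded, 0 = fully occluded)
    float2 screenUV = pos.xy / screenSize;
    float ao = aoMap.Sample(AOSampler, screenUV).r;

    float3 lit = ambientColor * ao + diffuseColor * saturate(dot(normalize(normal), -lightDir));
    return float4(lit, 1.0f);
}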

I'm confused when it comes to pass 2. I'm attempting to follow the tutorial here: http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/a-simple-and-practical-approach-to-ssao-r2753 but I got confused with how to implement the shader. Namely, where does the


float2 uv
 

parameter come from in the pixel shader, and how is it calculated?

This is my shader for creating the normal/depth map:


matrix MatrixPalette[255];
matrix worldMatrix;
matrix viewMatrix;
matrix projMatrix;

//Vertex Input
struct VS_INPUT_SKIN
{
     float4 position : POSITION;
     float3 normal    : NORMAL;
     float2 tex0    : TEXCOORD;
     float3 boneLinks : BONELINKS;    
};
struct SSAO_PS_INPUT
{
    float4 pos : SV_POSITION;
    float3 normal : TEXCOORD0;
    float depth : TEXCOORD1;
};
SSAO_PS_INPUT vs_SSAO(VS_INPUT_SKIN IN)
{
    SSAO_PS_INPUT OUT;

    float4 skinnedPos = skinVert(IN.position, IN.boneLinks[2], MatrixPalette[IN.boneLinks[0]], MatrixPalette[IN.boneLinks[1]]);
    float3 skinnedNormal = skinNorm(IN.normal, IN.boneLinks[2], MatrixPalette[IN.boneLinks[0]], MatrixPalette[IN.boneLinks[1]]);

    float4 worldPos = mul(skinnedPos, worldMatrix);

    OUT.pos = mul(worldPos, viewMatrix);
    OUT.pos = mul(OUT.pos, projMatrix);

    OUT.normal = mul(skinnedNormal, (float3x3)worldMatrix);
    OUT.normal = mul(OUT.normal, (float3x3)viewMatrix);
    OUT.normal = normalize(OUT.normal);

    OUT.depth = mul(worldPos, viewMatrix).z;

    return OUT;
}
float4 ps_SSAO(SSAO_PS_INPUT IN) : SV_Target
{
    return float4(IN.normal,IN.depth);
}
technique10 SSAO_T
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, vs_SSAO() ) );
        SetPixelShader( CompileShader( ps_4_0, ps_SSAO() ) );
        SetGeometryShader( NULL );
    }
}

Basically, I'm confused as to how to execute pass 2 using my generated map.


You need to render a full screen quad to run the SSAO shader - the UV is just the screen-space UV (0,0 is top left, 1,1 is bottom right). You can even generate it from SV_POSITION by scaling it by the inverse of the resolution.
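Something like this, for example (screenSize here being your render target size in pixels):


float2 screenSize;   // render target size in pixels

float4 ps_Occlusion(float4 pos : SV_POSITION) : SV_Target
{
    // SV_POSITION arrives in pixel coordinates, so scaling by the inverse
    // resolution gives the 0-1 UV the article's shader expects
    float2 uv = pos.xy / screenSize;

    // ...sample the normal/depth maps with 'uv' and compute the occlusion here
    return float4(uv, 0.0f, 1.0f);
}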

Styves is right, you need a full screen quad (it's mentioned in the article, just not fleshed out) or a full screen triangle. Most post-processing works this way.

By the way, with this trick you don't need constants, nor any vertices/indices at all.
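For reference, the vertex shader for that trick looks roughly like this - you just call Draw(3, 0) with no buffers and no input layout, and the UV falls out of the vertex id:


struct FS_OUT
{
    float4 pos : SV_POSITION;
    float2 uv  : TEXCOORD0;
};

FS_OUT vs_Fullscreen(uint id : SV_VertexID)
{
    FS_OUT o;
    // ids 0,1,2 expand into a single triangle covering the whole screen in clip space
    o.uv  = float2((id << 1) & 2, id & 2);
    o.pos = float4(o.uv * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f), 0.0f, 1.0f);
    return o;
}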

Alright, so I've got my UV coordinates for the full screen quad (dividing the SV_Position variable by the screen size), and I'm able to get the algorithm pretty much working. I switched my map generation shader over to render to multiple render targets, one each for position, normals, and depth. My problem now is that when I calculate the AO factor, the screen seems to be split into 4 quadrants, and the render fades almost completely to black the further you go down the screen. I'll post screenshots once I get back to my computer.

EDIT: I just did some more testing. It appears that the resulting SSAO map changes in strange ways depending on what color I use to clear the pos/normal maps before rendering to them (green was what I was experiencing the problem I described above with). It looks like clearing with black is the right way to go. Now I'm noticing a lot of linear artefacting. Is there really any way to fix this outside of fully uv unwrapping the model? I attached screenshots below.

Here is my shader to generate the maps:


matrix MatrixPalette[255];
matrix worldMatrix;
matrix viewMatrix;

//Vertex Input
struct VS_INPUT_SKIN
{
     float4 position : POSITION;
     float3 normal    : NORMAL;
     float2 tex0    : TEXCOORD;
     float3 boneLinks : BONELINKS;    
};
struct SSAO_PS_INPUT
{
    float4 pos : SV_POSITION;
    float3 actPos : TEXCOORD0;
    float3 normal : TEXCOORD1;
    float depth : TEXCOORD2;
};
struct SSAO_PS_OUTPUT
{
    float4 posMap : SV_Target0;
    float4 normalMap : SV_Target1;
    float4 depthMap : SV_Target2;
};
float4 skinVert(float4 vert, float fact, matrix bone1, matrix bone2)
{
    float4 p = float4(0.0f, 0.0f, 0.0f, 1.0f);

    //vertex skinning
    float bone1Weight = fact;
    float bone2Weight = 1.0f - bone1Weight;
    p += bone1Weight * mul(vert,bone1);
    p += bone2Weight * mul(vert,bone2);
    p.w = 1.0f;

    return p;
}
float3 skinNorm(float3 vert, float fact, matrix bone1, matrix bone2)
{
    float3 norm = float3(0.0f, 0.0f, 0.0f);
    float bone1Weight = fact;
    float bone2Weight = 1.0f - bone1Weight;
    norm += bone1Weight * mul(vert, (float3x3)bone1);
    norm += bone2Weight * mul(vert, (float3x3)bone2);
    norm = normalize(norm);

    return norm;
}
SSAO_PS_INPUT vs_SSAO(VS_INPUT_SKIN IN)
{
    SSAO_PS_INPUT OUT;

    float4 skinnedPos = skinVert(IN.position, IN.boneLinks[2], MatrixPalette[IN.boneLinks[0]], MatrixPalette[IN.boneLinks[1]]);
    float3 skinnedNormal = skinNorm(IN.normal, IN.boneLinks[2], MatrixPalette[IN.boneLinks[0]], MatrixPalette[IN.boneLinks[1]]);

    float4 worldPos = mul(skinnedPos, worldMatrix);

    OUT.pos = mul(worldPos, viewMatrix);
    OUT.pos = mul(OUT.pos, projMatrix);

    OUT.actPos = mul(worldPos, viewMatrix).xyz;

    OUT.normal = mul(skinnedNormal, (float3x3)worldMatrix);
    OUT.normal = mul(OUT.normal, (float3x3)viewMatrix);

    OUT.depth = normalize(mul(worldPos, viewMatrix).z);

    return OUT;
}
SSAO_PS_OUTPUT ps_SSAO(SSAO_PS_INPUT IN)
{
    SSAO_PS_OUTPUT OUT;

    OUT.posMap = float4(IN.actPos, 1);
    OUT.normalMap = float4(IN.normal, 1);
    OUT.depthMap = float4(IN.depth, IN.depth, IN.depth, 1);

    return OUT;
}
technique10 SSAO_T
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, vs_SSAO() ) );
        SetPixelShader( CompileShader( ps_4_0, ps_SSAO() ) );
        SetGeometryShader( NULL );
    }
}
 

And here is my shader to generate the AO map:


matrix MatrixPalette[255];
matrix worldMatrix;
matrix viewMatrix;
matrix projMatrix;
matrix lightViewMatrix;

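// (note: despite the name, this ends up holding the render target size in pixels - ps_lighting divides SV_Position by it)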
float2 texelSize;

Texture2D posMap;
Texture2D normalMap;
Texture2D depthMap;
Texture2D randomTexture;

SamplerState DifferredSampler
{
    Filter = MIN_MAG_MIP_LINEAR;
    AddressU = Clamp;
    AddressV = Clamp;
};

struct VS_OUTPUT
{
     float4 position : SV_POSITION;
     float3 normal : NORMAL;
};

float3 getPosition(in float2 uv)
{
    return posMap.Sample(DifferredSampler, uv).xyz;
}
float3 getNormal(in float2 uv)
{
    return normalize(normalMap.Sample(DifferredSampler, uv).xyz * 2.0f - 1.0f);
}
float getDepth(in float2 uv)
{
    return depthMap.Sample(DifferredSampler, uv).w;
}
float3 getRandom(in float2 uv)
{
    return randomTexture.Sample(DifferredSampler, texelSize*uv/float2(64,64)).xyz * 2.0f - 1.0f;
}
float g_sample_rad = 3.0f;
float g_intensity = 3.0f;
float g_scale = 1.0f;
float g_bias = 0.001f;
float doAmbientOcclusion(in float2 tcoord,in float2 uv, in float3 p, in float3 cnorm)
{
    float3 diff = getPosition(tcoord + uv) - p;
    const float3 v = normalize(diff);
    const float d = length(diff)*g_scale;
    return max(0.0,dot(cnorm,v)-g_bias)*(1.0/(1.0+d))*g_intensity;
}

float4 getOcclusion(float2 uv)
{
    float4 o;
    o.rgb = 1.0f;
    o.a = 1.0f;

    const float2 vec[4] = {float2(1,0),float2(-1,0),
                float2(0,1),float2(0,-1)};

    float3 p = getPosition(uv);
    float3 n = getNormal(uv);
    float2 rand = getRandom(uv).xy;

    float ao = 0.0f;
    float rad = g_sample_rad/p.z;

    
    int iterations = 4;
    for (int j = 0; j < iterations; ++j)
    {
      float2 coord1 = reflect(vec[j],rand)*rad;
      float2 coord2 = float2(coord1.x*0.707 - coord1.y*0.707,
                  coord1.x*0.707 + coord1.y*0.707);
 
      ao += doAmbientOcclusion(uv,coord1*0.25, p, n);
      ao += doAmbientOcclusion(uv,coord2*0.5, p, n);
      ao += doAmbientOcclusion(uv,coord1*0.75, p, n);
      ao += doAmbientOcclusion(uv,coord2, p, n);
    }
    ao/=(float)iterations*4.0;

    o.rgb = ao;

    return o;
}

float4 ps_lighting(VS_OUTPUT IN) : SV_Target
{
    float4 ao = float4(1.0f, 1.0f, 1.0f, 1.0f);
    float2 uv = IN.position.xy;
    uv.x /= texelSize[0];
    uv.y /= texelSize[1];
    ao = getOcclusion(uv);
    return ao;
}
VS_OUTPUT vs_Skinning(VS_INPUT_SKIN IN)
{
        VS_OUTPUT OUT = (VS_OUTPUT)0;

        float4 p = float4(0.0f, 0.0f, 0.0f, 1.0f);
        float3 norm = float3(0.0f, 0.0f, 0.0f);

        //vertex skinning
        float bone1Weight = IN.boneLinks[2];
        float bone2Weight = 1.0f - bone1Weight;
        p += bone1Weight * mul(IN.position,MatrixPalette[IN.boneLinks[0]]);
        p += bone2Weight * mul(IN.position,MatrixPalette[IN.boneLinks[1]]);
        p.w = 1.0f;
        norm += bone1Weight * mul(IN.normal,MatrixPalette[IN.boneLinks[0]]);
        norm += bone2Weight * mul(IN.normal,MatrixPalette[IN.boneLinks[1]]);
        norm = normalize(norm);
        norm = mul(norm, worldMatrix);
        OUT.normal = normalize(mul(norm, lightViewMatrix)); 

        //move pos to worldviewproj space
        float4 worldPos = mul(p, worldMatrix);
        OUT.position = mul(worldPos, viewMatrix);
        OUT.position = mul(OUT.position, projMatrix);
        
        return OUT;
}
 


Position map with buffer cleared to black:

posnw.png

Normal map with buffer cleared to black:

normalep.png

Depth map with buffer cleared to black:

depthd.png

Resulting SSAO map with all buffers cleared to green:

ssaogreen.jpg

Resulting SSAO map with all buffers cleared to white:

ssaowhite.jpg

Resulting SSAO map with all buffers cleared to black:

ssaoblack.jpg

A couple of things that spring to mind:
  • The dependence on the clear color is strange. Normally it shouldn't matter except for the background, since you want to disable blending altogether when generating the G-buffers. So: do you have blending enabled?
  • Your normals are bad since you forgot to encode them. Either output 0.5 * normal + 0.5 for your G-buffer or use a signed format (e.g. SNorm), so no encode/decode is needed (see the sketch after this list).
  • Sampling a G-buffer is usually done with point sampling (MIN_MAG_MIP_POINT). Note that the pixel shader input SV_Position is off by half a pixel.
  • Though the depth map isn't needed later, it is generated wrong. Normalizing a scalar will result in +/- 1, so in this case it's always +1.
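For the second and third points, a quick sketch (the encode/decode is only needed if you stay on a UNorm target; with an SNorm or float format you can write the normal as-is):


SamplerState PointSampler
{
    Filter = MIN_MAG_MIP_POINT;
    AddressU = Clamp;
    AddressV = Clamp;
};

// G-buffer side: map the -1..1 view-space normal into the 0..1 range of the target
float4 encodeNormal(float3 viewNormal)
{
    return float4(viewNormal * 0.5f + 0.5f, 1.0f);
}

// SSAO side: undo it when reading - this is what your getNormal already expects
float3 decodeNormal(Texture2D normalTex, float2 uv)
{
    return normalize(normalTex.Sample(PointSampler, uv).xyz * 2.0f - 1.0f);
}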

Thanks for the help!

I haven't explicitly done anything to disable blending, but I do not have a blend state set for the technique to generate the G-buffer. Does this suffice, or is there something else I should do?

EDIT: I changed the technique to do this and the clear color doesn't affect the output anymore:


BlendState DisableBlending
{
    BlendEnable[0] = FALSE;
    RenderTargetWriteMask[0] = 1 | 2 | 4 | 8;
};
technique10 SSAO_T
{
    pass P0
    {
        SetBlendState(DisableBlending, float4(0.0f, 0.0f, 0.0f, 0.0f), 0xFFFFFFFF);
        SetVertexShader( CompileShader( vs_4_0, VS_Dif() ) );
        SetPixelShader( CompileShader( ps_4_0, PS_Dif() ) );
        SetGeometryShader( NULL );
    }
}

 

About the normals, I was a bit confused as to what to do about that. I saw some code samples just writing the normals out, and others encoding them. I changed mine to do the encoding just now.

I've also changed the sampler to use point sampling. Does this automatically correct for the 0.5 offset, or do I need to do that myself?

I noticed a problem which I also fixed, and this seemed to have the most effect on the output. The textures I was rendering to were created with DXGI_FORMAT_R8G8B8A8_UNORM, and I changed that to DXGI_FORMAT_R16G16B16A16_FLOAT, as that is what is needed to hold the G-buffer.

This is my output right now, any idea where that height-map like effect is coming from?

ssaotest.jpg

You need more precision for your depth/position buffers. To lower memory requirements, devs will usually store only depth in an R32F buffer and reconstruct positions from it (either using a MUL in the pixel shader or the frustum-corners trick - look up MJP's "Position from depth" blog posts or forum posts here on the forums).

Also consider storing 0-1 linear w depth into your buffer instead of what you're doing now. This will give you better results even with 16bit float precision targets. You also shouldn't normalize it.

Here's an example:


OUT.depth = OUT.pos.w * InverseFarPlane;
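And the reconstruction side then looks roughly like this (the per-pixel version of the frustum-ray approach). tanHalfFovY, aspect and farPlaneDist are just placeholders for whatever you used to build your projection, and uv is the full screen quad UV:


float tanHalfFovY;    // tan(fovY * 0.5) from your projection
float aspect;         // width / height
float farPlaneDist;   // the far plane distance the depth was divided by

float3 reconstructViewPos(float2 uv, float linearDepth)
{
    // view-space ray through this pixel out to the far plane
    float2 ndc = float2(uv.x * 2.0f - 1.0f, 1.0f - uv.y * 2.0f);
    float3 farRay = float3(ndc.x * tanHalfFovY * aspect,
                           ndc.y * tanHalfFovY,
                           1.0f) * farPlaneDist;

    // the stored 0-1 linear depth scales the ray back to the surface position
    return farRay * linearDepth;
}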

I haven't explicitly done anything to disable blending, but I do not have a blend state set for the technique to generate the G-buffer. Does this suffice, or is there something else I should do?

The default blend state is opaque, so it should be ok. Although, if you have set another one at any time before, it will still be active, so you should reset it to NULL in the G-buffer generation pass.

About the normals, I was a bit confused as to what to do about that. I saw different code samples just writing the normals out, and some encoding it. I changed mine to do the encoding just now.

As said, depends on the format. A UNorm format can only store values from 0 to 1, that's why the encode/decode is needed.

I've also changed the sampler to use point sampling. Does this automatically correct for the 0.5 offset, or do I need to do that myself?

It does not correct the offset actually, it just doesn't matter anymore since no interpolation is happening. So yeah, fine.

I noticed a problem which I also fixed, and this seemed to have the most effect on the output. The textures I was rendering to were created with DXGI_FORMAT_R8G8B8A8_UNORM, and I changed that to DXGI_FORMAT_R16G16B16A16_FLOAT, as that is what is needed to hold the G-buffer.

DXGI_FORMAT_R8G8B8A8_UNORM is definitely too low for the position buffer. Since your output smells like there's still a quantization problem, start with full-blown 32-bit floats for all buffers until it works and lower the precision only afterwards. This format is also good for debugging (e.g. comparing outputs to full precision). Not sure if this is actually the problem, though, I think your shader is fine. Maybe I'll give it a shot tomorrow.

Edit: And ninja-ed. Also corrected the link, since this is D3D10.

Yep, I changed it to be 32-bit float and it worked perfectly. I'm going to go through and do some optimization now, but with my computer as it is I can run the algorithm with 255 samples and still retain 30 fps. Still, I think some optimization would help. I'll probably start by going through MJP's post. Thanks for the help!

Glad to hear it's working.

Just as an aside: one can read hardware depth since D3D10 (in D3D9 it only worked through vendor extensions). It's a bit tricky to get the formats right (watch the debug output, it will tell you what to do).
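Shader-side it's then just a Load plus undoing the projection, roughly like this (assuming the usual left-handed perspective matrix and the row-vector mul convention you're already using):


matrix projMatrix;
Texture2D<float> hwDepth;   // the depth buffer bound as an SRV

float linearViewDepth(uint2 pixel)
{
    // raw 0..1 hardware depth, no sampler needed for a Load
    float z = hwDepth.Load(int3(pixel, 0));

    // invert the projection's z mapping: z = proj._33 + proj._43 / viewZ
    return projMatrix._43 / (z - projMatrix._33);
}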
