# DX11 SSAO Using 32-bit pixel format as NormalDepth Texturemap

Hey guys,
I'm doing Exercise 5 of Chapter 22 (SSAO) in Frank Luna's DX11 book. I replaced DXGI_FORMAT_R16G16B16A16_FLOAT with DXGI_FORMAT_R8G8B8A8_UNORM when building the normal/depth texture map.
With DXGI_FORMAT_R16G16B16A16_FLOAT, I store the view-space normal in the RGB channels and the view-space depth (z-coordinate) in the alpha channel. With DXGI_FORMAT_R8G8B8A8_UNORM, I store the normal's x- and y-coordinates in the RG channels, and the B and A channels together hold a 16-bit depth value.
I reconstruct the normal's z-coordinate as nz = -sqrt(1 - x^2 - y^2).
To store the view-space depth across two 8-bit UNORM channels, I normalize z to [0, 1] by dividing by the far-plane depth zFar. Then I use a little trick to save the 8 most significant and 8 least significant bits in the B and A channels (see the code below).
When rendering the scene's normal and depth values to the DXGI_FORMAT_R8G8B8A8_UNORM 2D texture, the main code is

cbuffer cbPerScene
{
float gZFar;
};

struct VertexIn
{
float3 PosL    : POSITION;
float3 NormalL : NORMAL;
float2 Tex     : TEXCOORD;
};

struct VertexOut
{
float4 PosH       : SV_POSITION;
float3 PosV       : POSITION;
float3 NormalV    : NORMAL;
float2 Tex        : TEXCOORD0;
};

VertexOut VS(VertexIn vin)
{
VertexOut vout;

// Transform to view space.
vout.PosV    = mul(float4(vin.PosL, 1.0f), gWorldView).xyz;
vout.NormalV = mul(vin.NormalL, (float3x3)gWorldInvTransposeView);

// Transform to homogeneous clip space.
vout.PosH = mul(float4(vin.PosL, 1.0f), gWorldViewProj);

// Output vertex attributes for interpolation across triangle.
vout.Tex = mul(float4(vin.Tex, 0.0f, 1.0f), gTexTransform).xy;

return vout;
}

float4 PS(VertexOut pin, uniform bool gAlphaClip) : SV_Target
{
// Interpolating normal can unnormalize it, so normalize it.
pin.NormalV = normalize(pin.NormalV);

if(gAlphaClip)
{
float4 texColor = gDiffuseMap.Sample( samLinear, pin.Tex );

clip(texColor.a - 0.1f);
}

float4 normalDepth = float4(0, 0, 0, 0);
normalDepth.rg = pin.NormalV.rg;
float depth = pin.PosV.b;
float z = depth / gZFar;
normalDepth.ba = float2(z, frac(256.0f*z));
return normalDepth;
}

technique11 NormalDepth
{
pass P0
{
}
}


When using this DXGI_FORMAT_R8G8B8A8_UNORM texture to build SSAO, the main code is

cbuffer cbPerFrame
{
float4x4 gViewToTexSpace; // Proj*Texture
float4   gOffsetVectors[14];
float4   gFrustumCorners[4];
float     gZFar;

// Coordinates given in view space.
float    gSurfaceEpsilon     = 0.05f;
};

Texture2D gNormalDepthMap;
Texture2D gRandomVecMap;

SamplerState samNormalDepth
{
Filter = MIN_MAG_LINEAR_MIP_POINT;

// Set a very far depth value if sampling outside of the NormalDepth map
// so we do not get false occlusions.
BorderColor = float4(0.0f, 0.0f, 0.0f, 1e5f);
};

SamplerState samRandomVec
{
Filter = MIN_MAG_LINEAR_MIP_POINT;
};

struct VertexIn
{
float3 PosL            : POSITION;
float3 ToFarPlaneIndex : NORMAL;
float2 Tex             : TEXCOORD;
};

struct VertexOut
{
float4 PosH       : SV_POSITION;
float3 ToFarPlane : TEXCOORD0;
float2 Tex        : TEXCOORD1;
};

VertexOut VS(VertexIn vin)
{
VertexOut vout;

vout.PosH = float4(vin.PosL, 1.0f);

// We store the index to the frustum corner in the normal x-coord slot.
vout.ToFarPlane = gFrustumCorners[vin.ToFarPlaneIndex.x].xyz;

vout.Tex = vin.Tex;

return vout;
}

// Determines how much the sample point q occludes the point p as a function
// of distZ.
float OcclusionFunction(float distZ)
{
//
// If depth(q) is "behind" depth(p), then q cannot occlude p.  Moreover, if
// depth(q) and depth(p) are sufficiently close, then we also assume q cannot
// occlude p because q needs to be in front of p by Epsilon to occlude p.
//
// We use the following function to determine the occlusion.
//
//
//       1.0     -------------\
//               |           |  \
//               |           |    \
//               |           |      \
//               |           |        \
//               |           |          \
//               |           |            \
//  ------|------|-----------|-------------|---------|--> zv
//        0     Eps          z0            z1
//

float occlusion = 0.0f;
if(distZ > gSurfaceEpsilon)
{

// Linearly decrease occlusion from 1 to 0 as distZ goes
}

return occlusion;
}

float4 PS(VertexOut pin, uniform int gSampleCount) : SV_Target
{
// p -- the point we are computing the ambient occlusion for.
// n -- normal vector at p.
// q -- a random offset from p.
// r -- a potential occluder that might occlude p.

// Get viewspace normal and z-coord of this pixel.  The tex-coords for
float4 normalDepth = gNormalDepthMap.SampleLevel(samNormalDepth, pin.Tex, 0.0f);

float2 nxy = normalDepth.rg;
float nz = sqrt(1 - pow(nxy.r, 2) - pow(nxy.g, 2));
nz = -nz;
float3 n = float3(nxy, nz);
float pz = normalDepth.b + normalDepth.a/256.0f;
pz *= gZFar;

//
// Reconstruct full view space position (x,y,z).
// Find t such that p = t*pin.ToFarPlane.
// p.z = t*pin.ToFarPlane.z
// t = p.z / pin.ToFarPlane.z
//
float3 p = (pz/pin.ToFarPlane.z)*pin.ToFarPlane;

// Extract random vector and map from [0,1] --> [-1, +1].
float3 randVec = 2.0f*gRandomVecMap.SampleLevel(samRandomVec, 4.0f*pin.Tex, 0.0f).rgb - 1.0f;

float occlusionSum = 0.0f;

// Sample neighboring points about p in the hemisphere oriented by n.
[unroll]
for(int i = 0; i < gSampleCount; ++i)
{
// Our offset vectors are fixed and uniformly distributed (so that our offset vectors
// do not clump in the same direction).  If we reflect them about a random vector
// then we get a random uniform distribution of offset vectors.
float3 offset = reflect(gOffsetVectors[i].xyz, randVec);

// Flip offset vector if it is behind the plane defined by (p, n).
float flip = sign( dot(offset, n) );

// Sample a point near p within the occlusion radius.
float3 q = p + flip * gOcclusionRadius * offset;

// Project q and generate projective tex-coords.
float4 projQ = mul(float4(q, 1.0f), gViewToTexSpace);
projQ /= projQ.w;

// Find the nearest depth value along the ray from the eye to q (this is not
// the depth of q, as q is just an arbitrary point near p and might
// occupy empty space).  To find the nearest depth we look it up in the depthmap.

float2 rz = gNormalDepthMap.SampleLevel(samNormalDepth, projQ.xy, 0.0f).ba;
float rpz = rz.r + rz.g/256.0f;
rpz *= gZFar;

// Reconstruct full view space position r = (rx,ry,rz).  We know r
// lies on the ray of q, so there exists a t such that r = t*q.
// r.z = t*q.z ==> t = r.z / q.z

float3 r = (rpz / q.z) * q;

//
// Test whether r occludes p.
//   * The product dot(n, normalize(r - p)) measures how much in front
//     of the plane(p,n) the occluder point r is.  The more in front it is, the
//     more occlusion weight we give it.  This also prevents self shadowing where
//     a point r on an angled plane (p,n) could give a false occlusion since they
//     have different depth values with respect to the eye.
//   * The weight of the occlusion is scaled based on how far the occluder is from
//     the point we are computing the occlusion of.  If the occluder r is far away
//     from p, then it does not occlude it.
//

float distZ = p.z - r.z;
float dp = max(dot(n, normalize(r - p)), 0.0f);
float occlusion = dp * OcclusionFunction(distZ);

occlusionSum += occlusion;
}

occlusionSum /= gSampleCount;

float access = 1.0f - occlusionSum;

// Sharpen the contrast of the SSAO map to make the SSAO effect more dramatic.
return saturate(pow(access, 4.0f));
}

technique11 Ssao
{
pass P0
{
}
}


When I check the SSAO texture before blurring, with the camera at an angle, the image is

[attachment=18446:2013-10-19_164907.jpg]

and after I move the camera to the right, the image is

[attachment=18447:2013-10-19_170115.jpg]

Basically, when I move the camera, the black and white areas in the SSAO image change heavily. It looks like an annoying amount of haloing on these surfaces.

The image below is the original SSAO image before blurring, using DXGI_FORMAT_R16G16B16A16_FLOAT

[attachment=18448:2013-10-19_170332.jpg]

The artifacts seem to depend on the view position and orientation. I tried modifying the constants used by the occlusion test, such as gOcclusionRadius, but it made no apparent difference.

How can I get rid of the wrong dark areas where nothing is occluded? What could be causing this?

Thank you very much.

##### Share on other sites

Since the only things you changed were the format of the texture and the corresponding code for reading and writing it, I would assume that either there is an issue in that code somewhere, or there is an inherent problem with using a lower-precision texture for the data you need.

Have you tried to visualize the depth/normal texture before it is used?  This will likely give you good insight into whether you are accurately reproducing the same data.  Create a simple shader that reads the normal information and displays it, for both the old code and the new code, so you can quickly see the differences visually.  If that looks reasonably similar, then check the depth channels as well with a similar before-and-after comparison.

It is just a hunch, but since you said you use a 'clever' trick for storing the upper and lower 8 bits into separate channels, I would suspect this as a potential issue.  Have you validated your technique with some test values?  Done any shader debugging to watch what value comes out of the reading functions?  Start here, and you should be able to find the issue.
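
A quick way to run such test values without a GPU is to simulate the 8-bit storage on the CPU. The Python sketch below is an illustration, not the book's code: `to_unorm8` assumes round-to-nearest UNORM quantization, `encode_posted`/`decode_posted` mirror the two shader lines from the first post, and `encode_split`/`decode_split` are a hypothetical alternative that subtracts the fine part from the coarse channel before storing it.

```python
import math

def to_unorm8(x):
    """Simulate writing a float to an 8-bit UNORM channel and reading it back.
    Assumption: clamp to [0, 1], then round to the nearest of 256 levels."""
    x = min(max(x, 0.0), 1.0)
    return math.floor(x * 255.0 + 0.5) / 255.0

def encode_posted(z):
    # Mirrors the posted pixel shader: float2(z, frac(256.0f*z))
    return to_unorm8(z), to_unorm8((256.0 * z) % 1.0)

def decode_posted(b, a):
    # Mirrors the posted SSAO shader: b + a/256.0f
    return b + a / 256.0

def encode_split(z):
    # Hypothetical alternative: the coarse channel keeps only a multiple of
    # 1/255, and the fine channel keeps the remainder scaled up to [0, 1).
    lo = (255.0 * z) % 1.0
    hi = z - lo / 255.0
    return to_unorm8(hi), to_unorm8(lo)

def decode_split(b, a):
    return b + a / 255.0

def max_error(encode, decode, samples=10000):
    """Largest round-trip error over evenly spaced depths in [0, 1)."""
    return max(abs(decode(*encode(i / samples)) - i / samples)
               for i in range(samples))

posted_err = max_error(encode_posted, decode_posted)
split_err = max_error(encode_split, decode_split)
print("posted scheme max error:", posted_err)
print("split scheme max error: ", split_err)
```

With these helpers the posted pair should show an error of several thousandths of the depth range (the high channel already holds all of z, so adding a/256 counts the fine part twice), while the split variant stays close to 16-bit precision.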

##### Share on other sites

Could it be that your x,y position values are just too inaccurate? Consider that there are only 256 different values for each of the x,y positions, and your buffer resolution is already larger than that.

Why not just store the depth and reconstruct the position from the screen-space x,y positions and the depth?
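
For reference, that kind of reconstruction can be sketched on the CPU as follows. It mirrors the `p = (pz/pin.ToFarPlane.z)*pin.ToFarPlane` line in the SSAO shader above, so the posted code already works this way for positions. The function name and the far-plane values are made up for illustration.

```python
def reconstruct_view_pos(depth, to_far_plane):
    """Scale the eye-to-far-plane vector so that its z equals the stored depth."""
    t = depth / to_far_plane[2]
    return tuple(t * c for c in to_far_plane)

# Hypothetical interpolated far-plane point for one pixel (view space),
# with the far plane at z = 1000.
to_far = (350.0, -200.0, 1000.0)

# View-space position of a pixel whose stored depth is 25 units.
p = reconstruct_view_pos(25.0, to_far)
print(p)
```

Only the depth has to survive the trip through the texture; x and y come from the interpolated frustum-corner vector at full float precision.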

Cheers!

##### Share on other sites

Hey guys, I'm back.

As Jason Z suggested, I visualized the normal/depth texture before and after it is used in the next stage. I compared the two on the R, G, B, and A channels, which hold normal_x, normal_y, depthz_hi8bits, and depthz_lo8bits respectively. The results are shown below.

[attachment=18458:2013-10-20_red.jpg]

upper-right is after,  lower-right is before [R]

[attachment=18459:2013-10-20_green.jpg]

upper-right is after,  lower-right is before [G]

[attachment=18460:2013-10-20_blue.jpg]

upper-right is after,  lower-right is before [B]

[attachment=18461:2013-10-20_alpha.jpg]

upper-right is after,  lower-right is before [A]

Looking at these results, the before and after images match each other. However, the A channel shows zebra-like stripes, and that pattern shows up in the final image. The A channel stores the low 8 bits of the depth value, which inherently vary rapidly even across a single surface, so I think the haloing in the final image is related to it. I'll keep looking into it.
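
The stripes in A are expected by themselves: `frac(256.0f*z)` is a sawtooth that restarts 256 times between the camera and the far plane. What may matter more is that the SSAO shader adds this sawtooth (divided by 256) to B, and B already holds the whole of z rather than only its high bits. The sketch below estimates the resulting depth error in view-space units. It ignores 8-bit rounding, and `z_far = 1000` and the sample depths are made-up values.

```python
def frac(x):
    """Fractional part of a non-negative float, like HLSL frac() for x >= 0."""
    return x - int(x)

z_far = 1000.0           # hypothetical far-plane distance
surface_epsilon = 0.05   # gSurfaceEpsilon from the posted SSAO shader

# View-space depths sampled along one smooth, receding surface.
depths = [10.0 + 0.5 * i for i in range(40)]

# What the A channel would hold for each sample (before 8-bit rounding).
low = [frac(256.0 * d / z_far) for d in depths]

# Count how often the sawtooth restarts along the surface (the "zebra" stripes).
wraps = sum(1 for a, b in zip(low, low[1:]) if b < a)

# Depth decoded as in the posted shader (b + a/256, then * gZFar) minus true depth.
errors = [(d / z_far + l / 256.0) * z_far - d for d, l in zip(depths, low)]

print("stripes crossed:", wraps)
print("largest depth error (view-space units):", max(errors))
```

Under these assumptions the error reaches a few view-space units, far larger than `gSurfaceEpsilon`, and it drops back toward zero at every stripe, which would be consistent with occlusion bands that move with the camera.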

##### Share on other sites

> Could it be that your x,y position values are just too inaccurate? Consider that there are only 256 different values for each of the x,y positions, and your buffer resolution is already larger than that.
>
> Why not just store the depth and reconstruct the position from the screen-space x,y positions and the depth?
>
> Cheers!

Hi kauna,

Based on the tests with the images above, I think storing the normal's x and y values in two 8-bit channels is OK. Maybe the weird display is due to how the low 8 bits of the depth value are processed.

Thanks anyway!
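
One thing that may still be worth double-checking on the normal side: a UNORM channel can only hold values in [0, 1], and the posted pixel shader writes `pin.NormalV.rg` directly, so negative x or y components would be clamped to 0. The Python sketch below simulates this on the CPU and compares it with the usual `0.5*n + 0.5` remap. The helper names are hypothetical, and round-to-nearest quantization is an assumption.

```python
import math

def to_unorm8(x):
    """Simulate an 8-bit UNORM channel: clamp to [0, 1], round to 256 levels."""
    x = min(max(x, 0.0), 1.0)
    return math.floor(x * 255.0 + 0.5) / 255.0

def store_raw(nx, ny):
    # As in the posted shader: components written without remapping.
    return to_unorm8(nx), to_unorm8(ny)

def store_remapped(nx, ny):
    # Map [-1, 1] to [0, 1] before storing.
    return to_unorm8(0.5 * nx + 0.5), to_unorm8(0.5 * ny + 0.5)

def load_remapped(r, g):
    # Undo the remap, then rebuild z for a normal facing the camera.
    nx, ny = 2.0 * r - 1.0, 2.0 * g - 1.0
    # Quantization can push x^2 + y^2 slightly above 1, so clamp before sqrt.
    nz = -math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return nx, ny, nz

n = (-0.6, 0.0, -0.8)   # example view-space normal with a negative x
raw = store_raw(n[0], n[1])
decoded = load_remapped(*store_remapped(n[0], n[1]))
print("raw storage:     ", raw)
print("remapped + load: ", decoded)
```

In this simulation the raw store loses the sign of x entirely, while the remapped round trip recovers the normal to within 8-bit precision.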
