Killerregenwurm

Summed Area Table for variance Shadow Mapping, again...


Hello guys,

For a couple of weeks I've been trying to implement a summed-area table (SAT) for my shadow map, as described in GPU Gems 3. I found a lot of articles and discussions about it here and via Google, but I can't find an answer to my problem.
I'm using DirectX 11 with DXUT and have written an HLSL shader that generates a SAT image from the shadow map.

C++:
[source lang="cpp"]RenderableTexture* renderSAT::createSAT(ID3D11DeviceContext* pd3dContext, RenderableTexture* Input)
{
    shaderEffect->GetVariableByName("Samples")->AsScalar()->SetInt(SATRenderSamples);
    pd3dContext->RSSetViewports(1, &m_Viewport);
    m_original = Input;

    //HorizontalPass
    for(int i = 1; i < imgWidth; i *= SATRenderSamples)
    {
        int PassOffset[2] = { i, 0};

        int done = i/SATRenderSamples;
        D3D11_RECT region = { done, 0, imgWidth, imgHeight};
        //only first pass
        m_sourceEV->SetResource(i == 1 ? m_original->GetShaderResource() : m_src->GetShaderResource());
        shaderEffect->GetVariableByName("PassOffset")->AsVector()->SetIntVector(PassOffset);
        //avoids warning
        pd3dContext->OMSetRenderTargets(0, NULL, NULL);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);

        ID3D11RenderTargetView* pRTV = m_dest->GetRenderTarget();
        pd3dContext->OMSetRenderTargets(1, &pRTV, NULL);
        pd3dContext->RSSetScissorRects(1, &region);
        pd3dContext->IASetInputLayout(NULL);
        pd3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);
        pd3dContext->Draw(1,0);
        // Swap pointers (ping pong)
        std::swap(m_dest, m_src);
    }

    //vertical Pass
    for(int i = 1; i < imgHeight; i *= SATRenderPasses)
    {
        int PassOffset[2] = { 0,i};

        int done = i/SATRenderSamples;
        D3D11_RECT region = { 0, done, imgWidth, imgHeight};
        ID3D11ShaderResourceView* source = m_src->GetShaderResource();
        m_sourceEV->SetResource(source);
        shaderEffect->GetVariableByName("PassOffset")->AsVector()->SetIntVector(PassOffset);
        //avoids warning
        pd3dContext->OMSetRenderTargets(0, NULL, NULL);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);

        ID3D11RenderTargetView* pRTV = m_dest->GetRenderTarget();
        pd3dContext->OMSetRenderTargets(1, &pRTV, NULL);
        pd3dContext->RSSetScissorRects(1, &region);
        pd3dContext->IASetInputLayout(NULL);
        pd3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);
        pd3dContext->Draw(1,0);
        // Swap pointers (ping pong)
        std::swap(m_dest, m_src);
    }
    return m_dest;
}
[/source]
And the pixel shader (runs on a fullscreen quad):
[source lang="cpp"]float4 generateSAT(Vertex In) : SV_Target0
{
    float2 dimension;
    Source.GetDimensions(dimension.x, dimension.y);
    float2 currentOffset = float2(PassOffset)/dimension;// (1, 0)/dimension or (0, 1)/dimension
    float4 result;
    for(int i = 0; i < Samples; i++)
    {
        result += Source.SampleLevel(samBorderPoint, In.Tex - i*currentOffset, 0);
    }
    return result;
}
[/source]

But if I sample the SAT image with a 1x1 filter, I get this beautiful image. The image used for the generation is displayed correctly.
[attachment=11469:Unbenannt.PNG]
Left: SAT with 1x1 filter. Right: original shadow map.
For drawing on screen I use a border color of float4(0,0,0,1).

I think it looks as if the scissor rect doesn't fit, but I can't imagine why.
Does anyone know what I could do to solve this problem?

Hi,

The way you compute your scan is quite expensive. I don't see the bug in your code, but I would approach it differently anyway. You could compute the complete SAT with only two compute shaders, one for the horizontal scan pass and one for the vertical scan pass, as described by [url="http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html"]Harris et al.[/url]

Perhaps have a look at NVIDIA's GPU Computing SDK. It contains a [url="http://developer.nvidia.com/cuda/cuda-cc-sdk-code-samples#scan"]sample[/url] for the parallel scan (from Harris) in CUDA and OpenCL. Translating it to DirectCompute should be straightforward.

Here is a naive scan implementation I used in CUDA to implement a SAT filter. I'm sure you can translate it to DirectCompute. (It's also good practice, since a lot of useful code is currently only written in CUDA.)

Naive parallel scan (no up and down sweep):
[CODE]
__global__ void scan_naive(float *g_odata, float *g_idata)
{
    // Allocated shared memory for scan kernels
    __shared__ float temp[2*N];
    int thid = threadIdx.x;
    int bid = blockIdx.x;
    int pout = 0;
    int pin = 1;
    // Cache the computational window in shared memory
    temp[pout*N + thid] = (thid > 0) ? g_idata[bid * N + thid-1] : 0;
    for (int offset = 1; offset < N; offset *= 2)
    {
        pout = 1 - pout;
        pin = 1 - pout;
        __syncthreads();
        temp[pout*N+thid] = temp[pin*N+thid];
        if (thid >= offset)
            temp[pout*N+thid] += temp[pin*N+thid - offset];
    }
    __syncthreads();
    g_odata[bid * N + thid] = temp[pout*N+thid];
}
[/CODE]
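N is the image width (a compile-time constant). The kernel is meant to run with one thread block per image row and N threads per block, and it produces an exclusive scan: each output is the sum of the elements to its left.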
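When porting the kernel, a CPU reference to compare against can be handy. The following is a minimal C++ sketch, not part of the original code, that mirrors the kernel's double-buffered loop for a single row. The name `scan_row_reference` and its parameters are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU reference for one row of the naive scan (hypothetical helper).
// 'image' is row-major with 'width' floats per row; returns the exclusive
// prefix sum of row 'row', computed with the same ping-pong scheme.
std::vector<float> scan_row_reference(const std::vector<float>& image,
                                      std::size_t width, std::size_t row)
{
    assert((row + 1) * width <= image.size());
    const float* src = image.data() + row * width;

    std::vector<float> in(width), out(width);

    // Shift right by one element so the result is an exclusive scan.
    for (std::size_t i = 0; i < width; ++i)
        out[i] = (i > 0) ? src[i - 1] : 0.0f;

    for (std::size_t offset = 1; offset < width; offset *= 2)
    {
        std::swap(in, out);  // the previous output becomes the new input
        for (std::size_t i = 0; i < width; ++i)
        {
            out[i] = in[i];
            if (i >= offset)
                out[i] += in[i - offset];
        }
    }
    return out;
}
```

Comparing each row of the GPU output against this function (with a small tolerance for float rounding on real data) should make it easier to see whether a wrong result comes from the scan itself or from the surrounding setup.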

And here is a simple kernel that transposes an image, so that the same scan code can be used for both the horizontal and the vertical pass (assuming width == height).
[CODE]
__global__ void transpose(float *g_odata, float *g_idata)
{
    int thid = threadIdx.x;
    int bid = blockIdx.x;
    g_odata[bid * N + thid] = g_idata[thid * N + bid];
}
[/CODE]
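Put together, the full SAT would come from the sequence scan, transpose, scan, transpose. Below is a minimal CPU sketch in C++ under that assumption: a square image and an exclusive scan, as in the kernel above. `Image`, `scan_rows`, `build_sat` and `box_sum` are illustrative names, and the sequential row scan only stands in for the GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // square n x n image, row-major

// Exclusive prefix sum along every row (sequential stand-in for the scan kernel).
Image scan_rows(const Image& src, std::size_t n)
{
    Image dst(src.size());
    for (std::size_t y = 0; y < n; ++y)
    {
        float sum = 0.0f;
        for (std::size_t x = 0; x < n; ++x)
        {
            dst[y * n + x] = sum;
            sum += src[y * n + x];
        }
    }
    return dst;
}

// Same index swap as the transpose kernel.
Image transpose(const Image& src, std::size_t n)
{
    Image dst(src.size());
    for (std::size_t y = 0; y < n; ++y)
        for (std::size_t x = 0; x < n; ++x)
            dst[y * n + x] = src[x * n + y];
    return dst;
}

// Horizontal scan, transpose, scan again (now along columns), transpose back.
Image build_sat(const Image& src, std::size_t n)
{
    assert(src.size() == n * n);
    Image rows = scan_rows(src, n);
    Image cols = scan_rows(transpose(rows, n), n);
    return transpose(cols, n);
}

// Sum of the source pixels in [x0, x1) x [y0, y1) from four SAT lookups.
float box_sum(const Image& sat, std::size_t n,
              std::size_t x0, std::size_t y0, std::size_t x1, std::size_t y1)
{
    assert(x0 <= x1 && x1 < n && y0 <= y1 && y1 < n);
    return sat[y1 * n + x1] - sat[y0 * n + x1]
         - sat[y1 * n + x0] + sat[y0 * n + x0];
}
```

Because the scan is exclusive, `sat[y * n + x]` holds the sum of all pixels strictly above and to the left of (x, y), so the last row and column of the source never enter the table. Padding the input by one texel or switching to an inclusive scan avoids that.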

Also, consider using the [url="http://msdn.microsoft.com/en-us/library/windows/desktop/bb509694%28v=vs.85%29.aspx"]Load[/url] intrinsic if you read exact texels. It bypasses texture filtering and is therefore a bit faster.

Best regards! Edited by Tsus
