Killerregenwurm

Summed Area Table for variance Shadow Mapping, again...


Hello guys,

For a couple of weeks I've been trying to implement a summed-area table (SAT) for my shadow map, as described in GPU Gems 3. I found a lot of articles and discussions about it, here and through Google, but I can't find an answer to my problem.
I'm using DirectX 11 with DXUT and have been working on an HLSL shader which generates a SAT image from the shadow map.

C++:
[source lang="cpp"]RenderableTexture* renderSAT::createSAT(ID3D11DeviceContext* pd3dContext, RenderableTexture* Input)
{
    shaderEffect->GetVariableByName("Samples")->AsScalar()->SetInt(SATRenderSamples);
    pd3dContext->RSSetViewports(1, &m_Viewport);
    m_original = Input;

    // Horizontal pass
    for (int i = 1; i < imgWidth; i *= SATRenderSamples)
    {
        int PassOffset[2] = { i, 0 };

        int done = i / SATRenderSamples;
        D3D11_RECT region = { done, 0, imgWidth, imgHeight };
        // Only the first pass reads from the original shadow map
        m_sourceEV->SetResource(i == 1 ? m_original->GetShaderResource() : m_src->GetShaderResource());
        shaderEffect->GetVariableByName("PassOffset")->AsVector()->SetIntVector(PassOffset);
        // Avoids warning
        pd3dContext->OMSetRenderTargets(0, NULL, NULL);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);

        ID3D11RenderTargetView* pRTV = m_dest->GetRenderTarget();
        pd3dContext->OMSetRenderTargets(1, &pRTV, NULL);
        pd3dContext->RSSetScissorRects(1, &region);
        pd3dContext->IASetInputLayout(NULL);
        pd3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);
        pd3dContext->Draw(1, 0);
        // Swap pointers (ping pong)
        std::swap(m_dest, m_src);
    }

    // Vertical pass
    for (int i = 1; i < imgHeight; i *= SATRenderPasses)
    {
        int PassOffset[2] = { 0, i };

        int done = i / SATRenderSamples;
        D3D11_RECT region = { 0, done, imgWidth, imgHeight };
        ID3D11ShaderResourceView* source = m_src->GetShaderResource();
        m_sourceEV->SetResource(source);
        shaderEffect->GetVariableByName("PassOffset")->AsVector()->SetIntVector(PassOffset);
        // Avoids warning
        pd3dContext->OMSetRenderTargets(0, NULL, NULL);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);

        ID3D11RenderTargetView* pRTV = m_dest->GetRenderTarget();
        pd3dContext->OMSetRenderTargets(1, &pRTV, NULL);
        pd3dContext->RSSetScissorRects(1, &region);
        pd3dContext->IASetInputLayout(NULL);
        pd3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);
        pd3dContext->Draw(1, 0);
        // Swap pointers (ping pong)
        std::swap(m_dest, m_src);
    }
    return m_dest;
}
[/source]
And the pixel shader (it runs on a full-screen quad):
[source lang="cpp"]float4 generateSAT(Vertex In) : SV_Target0
{
    float2 dimension;
    Source.GetDimensions(dimension.x, dimension.y);
    float2 currentOffset = float2(PassOffset) / dimension; // (1, 0)/dimension or (0, 1)/dimension
    float4 result;
    for (int i = 0; i < Samples; i++)
    {
        result += Source.SampleLevel(samBorderPoint, In.Tex - i * currentOffset, 0);
    }
    return result;
}
[/source]

But if I sample the SAT image with a 1x1 filter, I get this beautiful image, even though the image used for the generation is displayed correctly.
[attachment=11469:Unbenannt.PNG]
Left: SAT with 1x1 filter. Right: original shadow map.
For drawing to the screen I use a border color of float4(0,0,0,1).

I think it looks as if the scissor rect doesn't fit, but I can't imagine why.
Does anyone know what I could do to solve this problem?
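
For checking the GPU output, a CPU model of the same multi-tap recursive doubling can be compared against a plain running sum. The following is only a minimal sketch and not the code from the post: `recursiveDoublingRow` and its parameters are hypothetical names, it handles a single row, it treats out-of-range taps as a zero border, and it rewrites the whole row on every pass instead of using a scissor rect.

```cpp
#include <cassert>
#include <vector>

// CPU model of the multi-tap recursive-doubling scan for one row.
// `samples` plays the role of the shader's Samples constant (assumed >= 2).
std::vector<float> recursiveDoublingRow(std::vector<float> src, int samples)
{
    assert(samples >= 2);
    const int width = static_cast<int>(src.size());
    std::vector<float> dst(width);

    for (int offset = 1; offset < width; offset *= samples)
    {
        for (int x = 0; x < width; ++x)
        {
            float sum = 0.0f; // accumulator starts at zero
            for (int i = 0; i < samples; ++i)
            {
                const int tap = x - i * offset;
                if (tap >= 0)
                    sum += src[tap]; // taps left of the row contribute nothing
            }
            dst[x] = sum;
        }
        src.swap(dst); // ping-pong: the newest result is now in src
    }
    return src; // inclusive prefix sum of the input row
}
```

After each swap the newest result sits in the buffer that was just written, so in any ping-pong scheme it is worth double-checking which of the two buffers ends up being returned.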

Hi,

The way you compute your scan is quite expensive. I don't see the bug in your code, but I would approach it differently anyway. You could compute the complete SAT with only two compute shaders by using one vertical and one horizontal scan pass, as described by Harris et al.

Perhaps have a look at NVIDIA's GPU Computing SDK. It contains a sample for the parallel scan (from Harris) in CUDA and OpenCL. Translating it to DirectCompute should be straightforward.

Here is a naive scan implementation I used in CUDA to implement a SAT filter. I'm sure you can translate it to DirectCompute. (It's also good practice, since a lot of useful code is currently only written in CUDA.)

Naive parallel scan (no up and down sweep):

[source lang="cpp"]// N (the row length) must be a compile-time constant.
// Launch with N threads per block and one block per image row.
__global__ void scan_naive(float *g_odata, float *g_idata)
{
    // Shared memory: two buffers of N floats, used in ping-pong fashion
    __shared__ float temp[2*N];
    int thid = threadIdx.x;
    int bid = blockIdx.x;
    int pout = 0;
    int pin = 1;
    // Cache the computational window in shared memory,
    // shifted right by one element (this makes the scan exclusive)
    temp[pout*N + thid] = (thid > 0) ? g_idata[bid * N + thid-1] : 0;
    for (int offset = 1; offset < N; offset *= 2)
    {
        // Swap the roles of the two buffers
        pout = 1 - pout;
        pin = 1 - pout;
        __syncthreads();
        temp[pout*N+thid] = temp[pin*N+thid];
        if (thid >= offset)
            temp[pout*N+thid] += temp[pin*N+thid - offset];
    }
    __syncthreads();
    g_odata[bid * N + thid] = temp[pout*N+thid];
}
[/source]

N is the image width.
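
To make the data flow of the kernel easier to follow, here is a rough CPU model of it in plain C++. It is a sketch under assumptions: `scanNaiveRow` is a hypothetical helper name, it processes a single row (one block), and the inner loop over `thid` stands in for the threads. Running the "threads" one after another is safe here only because each step reads from one half of `temp` and writes to the other.

```cpp
#include <cassert>
#include <vector>

// CPU model of the double-buffered naive scan for one row (one block).
std::vector<float> scanNaiveRow(const std::vector<float>& in)
{
    const int n = static_cast<int>(in.size());
    assert(n > 0);
    std::vector<float> temp(2 * n); // two ping-pong buffers of n floats
    int pout = 0;
    int pin = 1;

    // Load the row shifted right by one element (exclusive scan)
    for (int thid = 0; thid < n; ++thid)
        temp[pout * n + thid] = (thid > 0) ? in[thid - 1] : 0.0f;

    for (int offset = 1; offset < n; offset *= 2)
    {
        // Swap the roles of the two buffers
        pout = 1 - pout;
        pin = 1 - pout;
        for (int thid = 0; thid < n; ++thid)
        {
            temp[pout * n + thid] = temp[pin * n + thid];
            if (thid >= offset)
                temp[pout * n + thid] += temp[pin * n + thid - offset];
        }
    }
    // Copy the half that was written last
    return std::vector<float>(temp.begin() + pout * n,
                              temp.begin() + (pout + 1) * n);
}
```

Note that the result is an exclusive prefix sum: element `x` holds the sum of all inputs strictly to its left, so the first output is always 0.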

And here is a simple kernel that transposes an image, so that we can apply the same scan code for the vertical and horizontal passes (in case width == height).

[source lang="cpp"]__global__ void transpose(float *g_odata, float *g_idata)
{
    int thid = threadIdx.x;
    int bid = blockIdx.x;
    g_odata[bid * N + thid] = g_idata[thid * N + bid];
}
[/source]
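
How the two pieces combine into a SAT can be sketched on the CPU as: scan the rows, transpose, scan the rows again, transpose back. The code below is an illustrative model, not the GPU implementation: `Image`, `scanRows`, `transposeSquare`, `buildSat` and `boxSum` are hypothetical names, the row scan is a sequential stand-in for the kernel, and the image is assumed to be square. Because the scan is exclusive, so is the resulting table.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<float>; // square n x n image, row-major

// Exclusive prefix sum along every row (sequential stand-in for the scan).
Image scanRows(const Image& in, int n)
{
    Image out(in.size());
    for (int y = 0; y < n; ++y)
    {
        float running = 0.0f;
        for (int x = 0; x < n; ++x)
        {
            out[y * n + x] = running;
            running += in[y * n + x];
        }
    }
    return out;
}

// Same index mapping as the transpose kernel: out[bid][thid] = in[thid][bid].
Image transposeSquare(const Image& in, int n)
{
    Image out(in.size());
    for (int bid = 0; bid < n; ++bid)
        for (int thid = 0; thid < n; ++thid)
            out[bid * n + thid] = in[thid * n + bid];
    return out;
}

// Exclusive SAT: sat[y][x] = sum of in[j][i] over all i < x and j < y.
Image buildSat(const Image& in, int n)
{
    assert(static_cast<int>(in.size()) == n * n);
    Image rows = scanRows(in, n);                       // horizontal pass
    Image cols = scanRows(transposeSquare(rows, n), n); // vertical pass
    return transposeSquare(cols, n);
}

// Sum over x in [x0, x1) and y in [y0, y1) from four table lookups.
// With an n x n exclusive table this needs x1 < n and y1 < n.
float boxSum(const Image& sat, int n, int x0, int y0, int x1, int y1)
{
    assert(0 <= x0 && x0 <= x1 && x1 < n);
    assert(0 <= y0 && y0 <= y1 && y1 < n);
    return sat[y1 * n + x1] - sat[y0 * n + x1]
         - sat[y1 * n + x0] + sat[y0 * n + x0];
}
```

The four-lookup `boxSum` is what makes the table useful for filtering: the cost of averaging a region no longer depends on the region's size. One consequence of the exclusive form is that an n x n table cannot cover boxes that include the last row or column; an inclusive scan or a table with one extra row and column would avoid that.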


Also, consider using the Load intrinsic if you read exactly at texel positions. It bypasses texture filtering and is therefore a bit faster.

Best regards! Edited by Tsus
