Summed Area Table for variance Shadow Mapping, again...



#1 Killerregenwurm


Posted 26 September 2012 - 11:24 AM

Hello guys,

I've been trying to implement a summed area table (SAT) for my shadow map for a couple of weeks, following GPU Gems 3. I found a lot of articles and discussions about it here and via Google, but I can't find an answer to my problem.
I'm using DirectX 11 with DXUT and wrote an HLSL shader that generates a SAT image from the shadow map.
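To check what the GPU produces, it helps to have a CPU reference of the table itself. The sketch below assumes a row-major, single-channel float image; `Image` and `buildSAT` are hypothetical names, not part of the code in this thread. It uses the same separable structure as the GPU passes: a prefix sum along each row, then along each column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major single-channel image. For variance shadow maps you would build one
// table per moment (depth and depth squared), or store both in two channels.
using Image = std::vector<float>;

// Builds an inclusive summed area table:
// sat(x, y) = sum of in(i, j) for all i <= x and j <= y.
Image buildSAT(const Image& in, std::size_t width, std::size_t height) {
    assert(in.size() == width * height);
    Image sat(in);
    // Horizontal pass: running sum along each row.
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 1; x < width; ++x)
            sat[y * width + x] += sat[y * width + x - 1];
    // Vertical pass: running sum down each column of the row sums.
    for (std::size_t y = 1; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            sat[y * width + x] += sat[(y - 1) * width + x];
    return sat;
}
```

For the 3x2 image {1, 2, 3, 4, 5, 6} this yields {1, 3, 6, 5, 12, 21}; reading the GPU result back and comparing against such a table narrows down which pass goes wrong.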

C++:
[source lang="cpp"]
RenderableTexture* renderSAT::createSAT(ID3D11DeviceContext* pd3dContext, RenderableTexture* Input)
{
    shaderEffect->GetVariableByName("Samples")->AsScalar()->SetInt(SATRenderSamples);
    pd3dContext->RSSetViewports(1, &m_Viewport);
    m_original = Input;

    // Horizontal pass
    for (int i = 1; i < imgWidth; i *= SATRenderSamples)
    {
        int PassOffset[2] = { i, 0 };
        int done = i / SATRenderSamples;
        // D3D11_RECT is { left, top, right, bottom }. Texels outside this rect are
        // not written, so m_dest keeps whatever an earlier pass left there.
        D3D11_RECT region = { done, 0, imgWidth, imgHeight };

        // Only the first pass reads the original shadow map
        m_sourceEV->SetResource(i == 1 ? m_original->GetShaderResource() : m_src->GetShaderResource());
        shaderEffect->GetVariableByName("PassOffset")->AsVector()->SetIntVector(PassOffset);

        // Unbind the render target before Apply binds the source texture (avoids the input/output hazard warning)
        pd3dContext->OMSetRenderTargets(0, NULL, NULL);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);

        ID3D11RenderTargetView* pRTV = m_dest->GetRenderTarget();
        pd3dContext->OMSetRenderTargets(1, &pRTV, NULL);
        pd3dContext->RSSetScissorRects(1, &region);
        pd3dContext->IASetInputLayout(NULL);
        pd3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);
        // A single point; presumably expanded to a full-screen quad by the effect (not shown)
        pd3dContext->Draw(1, 0);

        // Swap pointers (ping-pong): afterwards m_src holds this pass's output
        std::swap(m_dest, m_src);
    }

    // Vertical pass
    // (the original stepped by SATRenderPasses here; assumed to be a typo for
    //  SATRenderSamples, which the horizontal loop and the done/offset math use)
    for (int i = 1; i < imgHeight; i *= SATRenderSamples)
    {
        int PassOffset[2] = { 0, i };
        int done = i / SATRenderSamples;
        D3D11_RECT region = { 0, done, imgWidth, imgHeight };

        ID3D11ShaderResourceView* source = m_src->GetShaderResource();
        m_sourceEV->SetResource(source);
        shaderEffect->GetVariableByName("PassOffset")->AsVector()->SetIntVector(PassOffset);

        // Unbind the render target before Apply binds the source texture
        pd3dContext->OMSetRenderTargets(0, NULL, NULL);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);

        ID3D11RenderTargetView* pRTV = m_dest->GetRenderTarget();
        pd3dContext->OMSetRenderTargets(1, &pRTV, NULL);
        pd3dContext->RSSetScissorRects(1, &region);
        pd3dContext->IASetInputLayout(NULL);
        pd3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);
        SATRenderTechnique->GetPassByIndex(0)->Apply(0, pd3dContext);
        pd3dContext->Draw(1, 0);

        // Swap pointers (ping-pong)
        std::swap(m_dest, m_src);
    }

    // After the final swap, m_src holds the last pass's output
    // (the original returned m_dest, which holds the previous pass's result)
    return m_src;
}
[/source]
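And the pixel shader (it runs on a full-screen quad):
[source lang="cpp"]
float4 generateSAT(Vertex In) : SV_Target0
{
    float2 dimension;
    Source.GetDimensions(dimension.x, dimension.y);
    // (i, 0)/dimension for the horizontal pass, (0, i)/dimension for the vertical pass
    float2 currentOffset = float2(PassOffset) / dimension;

    float4 result = 0; // must be initialized; the original left it undefined
    for (int i = 0; i < Samples; i++)
    {
        // Assumes samBorderPoint has border color (0,0,0,0), so taps outside the texture add nothing
        result += Source.SampleLevel(samBorderPoint, In.Tex - i * currentOffset, 0);
    }
    return result;
}
[/source]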
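The pass loop and the pixel shader above can be emulated on the CPU to check the pass logic on a single row. This is a minimal sketch under stated assumptions: reads left of the row return 0 (a zero border color), both ping-pong targets start zeroed, and `satPass` / `recursiveDoublingScan` are hypothetical names. The `useScissor` flag switches between writing only texels `x >= i / samples` (as the scissor rect does) and rewriting the whole row every pass.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<float>;

// One generateSAT pass on a single row: each written texel sums `samples`
// texels at stride `offset`; reads left of the row count as 0.
// Only texels x >= firstX are written, like the scissor rect in createSAT.
void satPass(const Row& source, Row& dest, int offset, int samples, int firstX) {
    const int width = static_cast<int>(source.size());
    for (int x = firstX; x < width; ++x) {
        float result = 0.0f;  // explicitly zeroed accumulator
        for (int i = 0; i < samples; ++i) {
            const int sx = x - i * offset;
            if (sx >= 0) result += source[sx];
        }
        dest[x] = result;
    }
}

// The horizontal loop of createSAT with two zero-initialized ping-pong targets.
// Returns the target written by the last pass.
Row recursiveDoublingScan(const Row& input, int samples, bool useScissor) {
    assert(samples >= 2);
    const int width = static_cast<int>(input.size());
    if (width < 2) return input;  // a single texel is already its own prefix sum
    Row dest(input.size(), 0.0f), src(input.size(), 0.0f);
    for (int i = 1; i < width; i *= samples) {
        const int done = useScissor ? i / samples : 0;
        satPass(i == 1 ? input : src, dest, i, samples, done);
        std::swap(dest, src);  // src now holds the newest result
    }
    return src;
}
```

In this emulation, a row of eight ones with two samples per pass gives 1, 2, ..., 8 without the scissor. With the scissor, the texel at x = 4 ends up as 4 instead of 5: x = 0 of the second target is never written, and the pass with stride 4 reads that stale 0. This only models the logic, not the GPU, but it suggests checking what the skipped texels in each ping-pong target contain.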

But if I sample the SAT image with a 1x1 filter, I get this "beautiful" image, while the shadow map used to generate it is displayed correctly.
[Attached image: Unbenannt.PNG] Left: SAT sampled with a 1x1 filter. Right: original shadow map.
For drawing on screen I use a border color of float4(0,0,0,1).
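The 1x1 test is a good sanity check: with a correct table, a box filter built from four SAT lookups returns the original texel when the kernel is 1x1 (up to floating-point error; for large maps a 32-bit float table loses precision, so small differences on the GPU are expected). A CPU sketch of that lookup, assuming a row-major inclusive table and treating reads left of or above the image as 0; `satAt`, `boxSum` and `boxFilter` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reads an inclusive, row-major SAT at (x, y). Coordinates left of or above the
// image return 0 (in every channel, including alpha), which is what makes the
// four-tap formula correct at the borders.
float satAt(const std::vector<float>& sat, int width, int x, int y) {
    if (x < 0 || y < 0) return 0.0f;
    return sat[static_cast<std::size_t>(y) * width + x];
}

// Sum of the original texels in the inclusive rectangle [x0, x1] x [y0, y1].
float boxSum(const std::vector<float>& sat, int width,
             int x0, int y0, int x1, int y1) {
    assert(x0 <= x1 && y0 <= y1);
    return satAt(sat, width, x1, y1) - satAt(sat, width, x0 - 1, y1)
         - satAt(sat, width, x1, y0 - 1) + satAt(sat, width, x0 - 1, y0 - 1);
}

// Average over a (2r+1) x (2r+1) kernel centred on (x, y), clamped to the image.
// With r = 0 (a 1x1 filter) this returns the original texel.
float boxFilter(const std::vector<float>& sat, int width, int height,
                int x, int y, int r) {
    const int x0 = std::max(x - r, 0), x1 = std::min(x + r, width - 1);
    const int y0 = std::max(y - r, 0), y1 = std::min(y + r, height - 1);
    const float area = static_cast<float>((x1 - x0 + 1) * (y1 - y0 + 1));
    return boxSum(sat, width, x0, y0, x1, y1) / area;
}
```

For example, with the table {1, 3, 6, 5, 12, 21} of the 3x2 image {1, 2, 3, 4, 5, 6}, `boxFilter(sat, 3, 2, 1, 1, 0)` computes 12 - 5 - 3 + 1 = 5, the original texel at (1, 1).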

I think it looks like the scissor rect doesn't fit, but I can't see why.
Does anyone know what I could do to solve this problem?


#2 Tsus


Posted 29 September 2012 - 02:49 AM

Hi,

The way you compute your scan is quite expensive: it takes several full-screen passes per direction, each doing several texture fetches per pixel. I don't see the bug in your code, but I would approach it differently anyway. You could compute the complete SAT with only two compute shader passes, one horizontal and one vertical scan, as described by Harris et al.

Perhaps have a look at NVIDIA's GPU Computing SDK. It contains a sample of the parallel scan (from Harris) in CUDA and OpenCL. Translating it to DirectCompute should be straightforward.
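For orientation, the work-efficient scan by Harris et al. splits into an up-sweep (reduction) and a down-sweep phase. The sketch below is a sequential C++ emulation, not the SDK code: each inner loop corresponds to one parallel step, with one GPU thread per iteration, and it assumes a power-of-two length. `blellochExclusiveScan` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work-efficient exclusive scan (up-sweep / down-sweep), emulated sequentially.
// Each inner loop is one parallel step on the GPU. Length must be a power of two.
std::vector<float> blellochExclusiveScan(std::vector<float> a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Up-sweep: build a tree of partial sums in place.
    for (std::size_t d = 1; d < n; d *= 2)
        for (std::size_t k = 0; k < n; k += 2 * d)
            a[k + 2 * d - 1] += a[k + d - 1];
    // Clear the root, then down-sweep to distribute the prefix sums.
    a[n - 1] = 0.0f;
    for (std::size_t d = n / 2; d >= 1; d /= 2)
        for (std::size_t k = 0; k < n; k += 2 * d) {
            const float t = a[k + d - 1];
            a[k + d - 1] = a[k + 2 * d - 1];
            a[k + 2 * d - 1] += t;
        }
    return a;
}
```

For {1, 2, 3, 4} this returns {0, 1, 3, 6}. It does O(n) additions in total, versus O(n log n) for the naive variant, which matters once rows get long.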

Here is a naive scan implementation I used in CUDA to implement a SAT filter. I'm sure you can translate it to DirectCompute. (It's also good practice, since a lot of useful code is currently only written in CUDA.)

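Naive parallel scan (no up-sweep and down-sweep):
// N (the image width) must be a compile-time constant; 512 is only an example value
#ifndef N
#define N 512
#endif

// Exclusive scan of one row per block: output[thid] = sum of input[0 .. thid-1]
__global__ void scan_naive(float *g_odata, float *g_idata)
{
	// Double-buffered shared memory for the scan (two rows of N floats)
	__shared__  float temp[2*N];
	int thid = threadIdx.x;
	int bid = blockIdx.x;
	int pout = 0;
	int pin = 1;
	// Cache this row in shared memory, shifted by one element (this makes the scan exclusive)
	temp[pout*N + thid] = (thid > 0) ? g_idata[bid * N + thid-1] : 0;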
	for (int offset = 1; offset < N; offset *= 2)
	{
		pout = 1 - pout;
		pin  = 1 - pout;
		__syncthreads();
		temp[pout*N+thid] = temp[pin*N+thid];
		if (thid >= offset)
			temp[pout*N+thid] += temp[pin*N+thid - offset];
	}
	__syncthreads();
	g_odata[bid * N + thid] = temp[pout*N+thid];
}
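N is the image width. The kernel is launched with one block per image row and N threads per block, so N must not exceed the maximum number of threads per block.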
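Note that this kernel computes an exclusive scan: because of the shifted load, thread thid ends up with the sum of the elements before it, not including its own. A SAT as used for box filtering is normally inclusive, so one option is to add the input back afterwards. Below is a sequential C++ emulation of one block, assuming each loop over thid stands in for the threads running between two __syncthreads() calls; `scanNaiveEmulated` and `inclusiveScan` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential emulation of one block of scan_naive (double-buffered Hillis-Steele
// scan). temp[0] and temp[1] play the two halves of the shared array.
std::vector<float> scanNaiveEmulated(const std::vector<float>& row) {
    const std::size_t n = row.size();
    std::vector<float> temp[2] = {std::vector<float>(n), std::vector<float>(n)};
    int pout = 0;
    // Shifted load: "thread" thid stores row[thid - 1], making the scan exclusive.
    for (std::size_t thid = 0; thid < n; ++thid)
        temp[pout][thid] = thid > 0 ? row[thid - 1] : 0.0f;
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;
        const int pin = 1 - pout;
        // All reads come from temp[pin], all writes go to temp[pout],
        // so no element is overwritten while another thread still needs it.
        for (std::size_t thid = 0; thid < n; ++thid) {
            temp[pout][thid] = temp[pin][thid];
            if (thid >= offset) temp[pout][thid] += temp[pin][thid - offset];
        }
    }
    return temp[pout];
}

// Turns the exclusive result into the inclusive prefix sum a SAT is built from.
std::vector<float> inclusiveScan(const std::vector<float>& row) {
    std::vector<float> out = scanNaiveEmulated(row);
    for (std::size_t i = 0; i < row.size(); ++i) out[i] += row[i];
    return out;
}
```

For {1, 2, 3, 4} the emulation gives {0, 1, 3, 6}, and `inclusiveScan` gives {1, 3, 6, 10}. Alternatively, the load line in the kernel could read `g_idata[bid * N + thid]` directly to get an inclusive scan.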

And here is a simple kernel that transposes an image, so that the same scan code can be used for both the horizontal and the vertical pass (assuming width == height).
// Launch with N blocks of N threads: block bid reads column bid and writes it as row bid
__global__ void transpose(float *g_odata, float *g_idata)
{
	int thid = threadIdx.x;
	int bid = blockIdx.x;
	g_odata[bid * N + thid] = g_idata[thid * N + bid];
}
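Put together, the data flow for a square shadow map is: scan rows, transpose, scan rows again (these are the original columns), transpose back. A CPU sketch of that pipeline, assuming inclusive row scans and a square row-major image; `Square`, `scanRows`, `transposeSquare` and `satViaTranspose` are hypothetical names, and the transpose uses the same indexing as the kernel above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square, row-major n x n image (the transpose trick assumes width == height).
using Square = std::vector<float>;

// Inclusive prefix sum along every row.
void scanRows(Square& img, std::size_t n) {
    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t col = 1; col < n; ++col)
            img[row * n + col] += img[row * n + col - 1];
}

// Same indexing as the transpose kernel: out[bid * N + thid] = in[thid * N + bid].
Square transposeSquare(const Square& in, std::size_t n) {
    Square out(n * n);
    for (std::size_t bid = 0; bid < n; ++bid)
        for (std::size_t thid = 0; thid < n; ++thid)
            out[bid * n + thid] = in[thid * n + bid];
    return out;
}

// Scan rows, transpose, scan rows again, transpose back: the full SAT.
Square satViaTranspose(Square img, std::size_t n) {
    assert(img.size() == n * n);
    scanRows(img, n);
    img = transposeSquare(img, n);
    scanRows(img, n);
    return transposeSquare(img, n);
}
```

For the 3x3 image {1, ..., 9} this produces {1, 3, 6, 5, 12, 21, 12, 27, 45}. The final transpose could be skipped if the lookup code reads the table with swapped coordinates.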

Also, consider using the Load intrinsic when you read exactly at texel positions. It takes integer texel coordinates and bypasses the sampler and its filtering, so it can be a bit faster.

Best regards!

Edited by Tsus, 29 September 2012 - 02:50 AM.




