SSAO running slow

Started by
6 comments, last by MJP 5 years, 9 months ago

Hi everyone :)

I'm trying to implement SSAO in D3D12 (based on the implementation at learnopengl.com, https://learnopengl.com/Advanced-Lighting/SSAO), but I seem to have a performance problem.

Here is part of the SSAO pixel shader:


Texture2D PositionMap : register(t0);
Texture2D NormalMap : register(t1);
Texture2D NoiseMap : register(t2);

SamplerState s1 : register(s0);

// Hard-coded just for the test
const static int kernel_size = 64;
// Screen size (632 x 449 here) / noise texture size (4 x 4): tiles the noise over the screen
const static float2 noise_scale = float2(632.0 / 4.0, 449.0 / 4.0);
const static float radius = 0.5;
const static float bias = 0.025;

cbuffer ssao_cbuf : register(b0)
{
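    float4x4 gProjectionMatrix;
    // In a cbuffer every array element starts on a new 16-byte register, so each
    // float3 below occupies 16 bytes; the CPU-side data must be padded to match.
    float3 SSAO_SampleKernel[64];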
}

float main(VS_OUTPUT input) : SV_TARGET
{
[....]

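  float occlusion = 0.0;
  for (int i = 0; i < kernel_size; i++)
  {
      // Kernel sample: tangent space -> view space, placed around the fragment
      float3 ksample = mul(TBN, SSAO_SampleKernel[i]);
      ksample = pos + ksample * radius;

      // View space -> clip space -> NDC
      float4 offset = float4(ksample, 1.0);
      offset = mul(gProjectionMatrix, offset);
      offset.xyz /= offset.w;
      // NDC -> texture coordinates. D3D's texture origin is top-left (OpenGL's,
      // used by the tutorial, is bottom-left), so y must be flipped.
      offset.x = offset.x * 0.5 + 0.5;
      offset.y = -offset.y * 0.5 + 0.5;

      // View-space depth of the geometry stored in the G-buffer at that pixel
      float sampleDepth = PositionMap.Sample(s1, offset.xy).z;

      // Fade out occlusion from geometry far outside the sampling radius
      float rangeCheck = smoothstep(0.0, 1.0, radius / abs(pos.z - sampleDepth));
      // Same test as the tutorial, which assumes a right-handed view space
      // (camera looking down -Z); a left-handed view space reverses the comparison.
      occlusion += (sampleDepth >= ksample.z + bias ? 1.0 : 0.0) * rangeCheck;
  }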

[....]
}
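To make the coordinate mapping concrete, here is a CPU-side C++ sketch of what the loop does between the projection and the texture lookup. The learnopengl tutorial maps NDC to texture coordinates with `ndc * 0.5 + 0.5` on both axes, which suits OpenGL's bottom-left texture origin; Direct3D's texture origin is the top-left corner, so y has to be flipped when porting. `ClipToUV` and `Float2` are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical 2D vector type for this sketch.
struct Float2 { float x, y; };

// Converts a clip-space position (x, y, z, w) to Direct3D texture coordinates.
// NDC y points up, but D3D texture V grows downward, so y is negated before
// remapping from [-1, 1] to [0, 1]. z is not needed for the lookup.
Float2 ClipToUV(const std::array<float, 4>& clip) {
    assert(clip[3] != 0.0f && "w must be non-zero for the perspective divide");
    const float ndcX = clip[0] / clip[3];
    const float ndcY = clip[1] / clip[3];
    return { ndcX * 0.5f + 0.5f, -ndcY * 0.5f + 0.5f };
}
```

In the shader, the equivalent after the divide would be `float2 uv = float2(offset.x, -offset.y) * 0.5 + 0.5;`. This assumes the G-buffer was rendered with the usual D3D conventions; it is a correctness issue when porting, not the cause of the slowdown discussed in this thread.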

The problem is this for loop. With it, drawing the frame (a simple torus knot...) takes around 140 ms on a GTX 770; without it, 5 ms. Keeping the loop but removing the PositionMap sampling and the matrix multiplication brings it to around 25 ms. I understand that matrix multiplication and texture sampling are "expensive", but I don't think that justifies such a slow frame.

I assume the shader code from the tutorial works, so unless I've done something terribly stupid that I can't see, my problem must come from something I did wrong in D3D12 without realizing it (I just started learning D3D12).

Both PositionMap and NormalMap are G-buffer render targets. For each one I created two descriptor heaps, one of type D3D12_DESCRIPTOR_HEAP_TYPE_RTV and one of type D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, and called both CreateRenderTargetView and CreateShaderResourceView.

The NoiseMap only has one descriptor heap, of type D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV.

Before calling DrawIndexedInstanced for the SSAO pass, I copy the relevant descriptors into a single descriptor heap, which I then bind:


// Copy the three SRVs into consecutive slots of the shader-visible SSAO heap:
// slot 0 = PositionMap (t0), slot 1 = NormalMap (t1), slot 2 = NoiseMap (t2)
CD3DX12_CPU_DESCRIPTOR_HANDLE ssao_heap_hdl(_pSSAOPassDesciptorHeap->GetCPUDescriptorHandleForHeapStart());
device->CopyDescriptorsSimple(1, ssao_heap_hdl, _gBuffer.PositionMap().GetDescriptorHeap()->GetCPUDescriptorHandleForHeapStart(),
                              D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
ssao_heap_hdl.Offset(CBV_descriptor_inc_size);
device->CopyDescriptorsSimple(1, ssao_heap_hdl, _gBuffer.NormalMap().GetDescriptorHeap()->GetCPUDescriptorHandleForHeapStart(),
                              D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
ssao_heap_hdl.Offset(CBV_descriptor_inc_size);
device->CopyDescriptorsSimple(1, ssao_heap_hdl, _ssaoPass.GetNoiseTexture().GetDescriptorHeap()->GetCPUDescriptorHandleForHeapStart(),
                              D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

ID3D12DescriptorHeap* descriptor_heaps[] = { _pSSAOPassDesciptorHeap };

pCommandList->SetDescriptorHeaps(1, descriptor_heaps);
// Root parameter 0: descriptor table starting at slot 0 (t0-t2)
pCommandList->SetGraphicsRootDescriptorTable(0, _pSSAOPassDesciptorHeap->GetGPUDescriptorHandleForHeapStart());
// Root parameter 1: root CBV holding ssao_cbuf (b0)
pCommandList->SetGraphicsRootConstantBufferView(1, _cBuffSamplesKernel[0].GetVirtualAddress());
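The thread doesn't show how `_cBuffSamplesKernel` is filled, so here is a hedged C++ sketch of one way to do it. HLSL gives each element of a cbuffer array its own 16-byte register, so `float3 SSAO_SampleKernel[64]` takes 64 × 16 bytes and the CPU data needs one padding float per sample; uploading a tightly packed `float3` array would misplace every sample after the first. The kernel itself follows the learnopengl tutorial's generator (random directions in the +z hemisphere, random length, scaled by lerp(0.1, 1.0, t²)), with `std::mt19937` and an explicit seed swapped in for reproducibility. `SsaoConstants`, `KernelSample` and `FillKernel` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

constexpr int kKernelSize = 64; // matches SSAO_SampleKernel[64] in the shader

// One kernel sample as HLSL lays it out in a cbuffer: xyz plus 4 bytes of padding.
struct alignas(16) KernelSample {
    float x, y, z;
    float pad; // unused, keeps each element at 16 bytes
};

// Hypothetical CPU mirror of ssao_cbuf.
struct SsaoConstants {
    float projection[16];              // gProjectionMatrix (64 bytes)
    KernelSample samples[kKernelSize]; // SSAO_SampleKernel[64]
};
static_assert(sizeof(KernelSample) == 16, "each element must fill one 16-byte register");
static_assert(offsetof(SsaoConstants, samples) == 64, "kernel starts right after the matrix");
static_assert(sizeof(SsaoConstants) == 64 + 16 * kKernelSize, "unexpected cbuffer size");

// Hemisphere kernel in the style of the learnopengl tutorial.
void FillKernel(SsaoConstants& cb, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> rnd(0.0f, 1.0f);
    for (int i = 0; i < kKernelSize; ++i) {
        // Random direction with z >= 0 (tangent space), redrawn if degenerate
        float x, y, z, len;
        do {
            x = rnd(gen) * 2.0f - 1.0f;
            y = rnd(gen) * 2.0f - 1.0f;
            z = rnd(gen);
            len = std::sqrt(x * x + y * y + z * z);
        } while (len == 0.0f);
        // Random length, scaled by lerp(0.1, 1.0, t*t) so samples cluster near the origin
        const float t = float(i) / float(kKernelSize);
        const float s = rnd(gen) * (0.1f + 0.9f * t * t) / len;
        cb.samples[i] = {x * s, y * s, z * s, 0.0f};
    }
}
```

The filled struct can then be copied into the upload buffer behind the root CBV; the `static_assert`s catch a layout mismatch at compile time rather than as subtly wrong occlusion.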

Debug and Release builds give me the same results, and so does compiling the shaders with or without optimisation.

So, does anyone see something weird in my code that would explain the slowness?

By the way, when I run the pixel shader in the graphics debugger, this line:


offset.xyz /= offset.w;

does not seem to produce the expected result. Here are the values of offset (float4) in the debugger before and after that line executes:

Before: x = -1.631761000, y = 1.522913000, z = 2.634875000, w = 2.634875000
After:  x = -0.619293700, y = 0.577983000, z = 2.634875000, w = 2.634875000

So x and y are correct, but z is not: since z equals w here, it should have become 1.0.
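For reference, a minimal C++ check of the arithmetic using the debugger values above: the perspective divide scales x, y and z by 1/w and leaves w alone, so z should come out as exactly 1.0 here. `PerspectiveDivide` is a made-up name for this sketch. The thread never explains the debugger display; one unconfirmed possibility is that the compiler skipped the z division because only `offset.xy` is used for the texture lookup.

```cpp
#include <array>
#include <cassert>

// Perspective divide: x, y and z are divided by w; w itself is unchanged.
std::array<float, 4> PerspectiveDivide(std::array<float, 4> clip) {
    assert(clip[3] != 0.0f && "w must be non-zero");
    for (int i = 0; i < 3; ++i) {
        clip[i] /= clip[3];
    }
    return clip;
}
```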

 

Please tell me if you need more info/code.

Thank you for your help !


Have you tried using PIX For Windows to gain some insight into the performance of this Draw Call?

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

What happens if you force the compiler to NOT unroll the loop?

1 hour ago, ajmiles said:

Have you tried using PIX For Windows to gain some insight into the performance of this Draw Call?

I didn't know that tool! So I downloaded it to try it, and when I clicked the start analysis button, a message box popped up telling me "Hey, you're stupid!". Seriously, it said "This capture was created on a different GPU (Microsoft Basic Render Driver..." I hadn't noticed, but I was requesting D3D_FEATURE_LEVEL_11_1, while the GTX 770 only supports up to D3D_FEATURE_LEVEL_11_0, so everything was running on the Microsoft Basic Render Driver (WARP, Microsoft's software rasterizer) instead of the GPU -.-

So thank you very much, ajmiles, you solved my issue!!
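A sketch of adapter selection that would have surfaced this immediately, assuming (the thread doesn't show the device-creation code) that the failed 11_1 request ended in a silent fallback to WARP. The types below are hypothetical stand-ins for what DXGI and D3D12 report, not the real API types; the enum values mirror those of `D3D_FEATURE_LEVEL`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for D3D_FEATURE_LEVEL (same numeric values).
enum class FeatureLevel { L11_0 = 0xb000, L11_1 = 0xb100, L12_0 = 0xc000, L12_1 = 0xc100 };

// Hypothetical summary of one adapter.
struct AdapterInfo {
    std::string description; // e.g. "NVIDIA GeForce GTX 770"
    FeatureLevel maxLevel;   // highest feature level the adapter supports
    bool isSoftware;         // true for WARP ("Microsoft Basic Render Driver")
};

// Returns the first hardware adapter supporting at least minLevel, or nothing.
// It never falls back to WARP, so the caller has to opt in to it explicitly.
std::optional<AdapterInfo> PickHardwareAdapter(const std::vector<AdapterInfo>& adapters,
                                               FeatureLevel minLevel) {
    for (const AdapterInfo& a : adapters) {
        if (a.isSoftware) continue;
        if (static_cast<int>(a.maxLevel) >= static_cast<int>(minLevel)) return a;
    }
    return std::nullopt;
}
```

With the GTX 770 at 11_0, asking for 11_1 returns no adapter and the application can report that instead of quietly rendering in software. In real code, the minimum level is the second argument of `D3D12CreateDevice`, and a software adapter can be recognized by `DXGI_ADAPTER_FLAG_SOFTWARE` in `DXGI_ADAPTER_DESC1::Flags` from `IDXGIAdapter1::GetDesc1`.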

37 minutes ago, JoeJ said:

What happens if you force the compiler to NOT unroll the loop?

Thanks for your answer, JoeJ! I tried specifying [roll], but it did not change the render time (HLSL has no [roll] attribute; the one that keeps a loop rolled is [loop]). Setting [unroll], on the other hand, reduced it to around 73 ms!

Sorry I wasted your time on something so stupid...

Thanks a lot to both of you !

WARP can be surprisingly fast sometimes. I've written and run little samples for days at a time, having forgotten that I'd hard-coded my "warp=true" codepath on, and only discovered my error later! Throw enough work at it, though, and you eventually realise you're not running on a GPU at all :)
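Following the same idea, a tiny sketch of making WARP an explicit opt-in through a command-line flag rather than a hard-coded `warp=true`, plus a startup banner that makes a software-rasterizer run obvious. The `--warp` flag and both function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// True only if the hypothetical "--warp" flag was passed on the command line.
bool WantsWarp(int argc, const char* const argv[]) {
    for (int i = 1; i < argc; ++i) {
        if (std::string_view(argv[i]) == "--warp") return true;
    }
    return false;
}

// Startup message naming the adapter, loudly flagging a software adapter.
std::string AdapterBanner(const std::string& description, bool isSoftware) {
    return std::string(isSoftware ? "WARNING: software rasterizer (WARP): " : "GPU: ") + description;
}
```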


4 minutes ago, ajmiles said:

WARP can be surprisingly fast sometimes.

Yes, that's what I thought when I realized it !

By the way, do I have to close the thread? I don't see any close-thread button :|

5 hours ago, GMCommand said:

Yes, that's what I thought when I realized it !

By the way, do I have to close the thread? I don't see any close-thread button :|

There's no option to do that here. Sometimes people like to jump in with an extra question or comment even after the initial question was answered, so we leave the threads open. :)

This topic is closed to new replies.
