  • Similar Content

    • By cozzie
      Hi all,
      As a part of the debug drawing system in my engine, I want to add support for rendering simple text on screen (aka HUD/HUD style). From what I've read there are a few options, in short:
      1. Write your own font sprite renderer
      2. Using Direct2D/Directwrite, combine with DX11 rendertarget/ backbuffer
      3. Use an external library, like the directx toolkit etc.
      I want to go for number 2, but the articles and documentation confused me a bit. Some say you need to create a DX10 device to be able to do this, because it doesn't work directly with the DX11 device. But other articles say that this was 'patched' later on and should work now.
      Can someone shed some light on this and ideally provide me an example or article on how to set this up?
      All input is appreciated.
    • By stale
      I've just started learning about tessellation from Frank Luna's DX11 book. I'm getting some very weird behavior when I try to render a tessellated quad patch if I also render a mesh in the same frame. The tessellated quad patch renders just fine if it's the only thing I'm rendering. This is pictured below:
      However, when I attempt to render the same tessellated quad patch along with the other entities in the scene (which are simple triangle-lists), I get the following error:

      I have no idea why this is happening, and google searches have given me no leads at all. I use the following code to render the tessellated quad patch:
      ID3D11DeviceContext* dc = GetGFXDeviceContext();
      dc->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_4_CONTROL_POINT_PATCHLIST);
      dc->IASetInputLayout(ShaderManager::GetInstance()->m_JQuadTess->m_InputLayout);

      float blendFactors[] = { 0.0f, 0.0f, 0.0f, 0.0f }; // only used with D3D11_BLEND_BLEND_FACTOR
      dc->RSSetState(m_rasterizerStates[RSWIREFRAME]);
      dc->OMSetBlendState(m_blendStates[BSNOBLEND], blendFactors, 0xffffffff);
      dc->OMSetDepthStencilState(m_depthStencilStates[DSDEFAULT], 0);

      ID3DX11EffectTechnique* activeTech = ShaderManager::GetInstance()->m_JQuadTess->Tech;
      D3DX11_TECHNIQUE_DESC techDesc;
      activeTech->GetDesc(&techDesc);

      for (unsigned int p = 0; p < techDesc.Passes; p++)
      {
          TerrainVisual* terrainVisual = (TerrainVisual*)entity->m_VisualComponent;
          UINT stride = sizeof(TerrainVertex);
          UINT offset = 0;
          GetGFXDeviceContext()->IASetVertexBuffers(0, 1, &terrainVisual->m_VB, &stride, &offset);

          Vector3 eyePos = Vector3(cam->m_position);
          Matrix rotation = Matrix::CreateFromYawPitchRoll(entity->m_rotationEuler.x, entity->m_rotationEuler.y, entity->m_rotationEuler.z);
          Matrix model = rotation * Matrix::CreateTranslation(entity->m_position);
          Matrix view = cam->GetLookAtMatrix();
          Matrix MVP = model * view * m_ProjectionMatrix;

          ShaderManager::GetInstance()->m_JQuadTess->SetEyePosW(eyePos);
          ShaderManager::GetInstance()->m_JQuadTess->SetWorld(model);
          ShaderManager::GetInstance()->m_JQuadTess->SetWorldViewProj(MVP);

          activeTech->GetPassByIndex(p)->Apply(0, GetGFXDeviceContext());
          GetGFXDeviceContext()->Draw(4, 0);
      }

      dc->RSSetState(0);
      dc->OMSetBlendState(0, blendFactors, 0xffffffff);
      dc->OMSetDepthStencilState(0, 0);

      I draw my scene by looping through the list of entities and calling the associated draw method depending on the entity's "visual type":
      for (unsigned int i = 0; i < scene->GetEntityList()->size(); i++)
      {
          Entity* entity = scene->GetEntityList()->at(i);
          if (entity->m_VisualComponent->m_visualType == VisualType::MESH)
              DrawMeshEntity(entity, cam, sun, point);
          else if (entity->m_VisualComponent->m_visualType == VisualType::BILLBOARD)
              DrawBillboardEntity(entity, cam, sun, point);
          else if (entity->m_VisualComponent->m_visualType == VisualType::TERRAIN)
              DrawTerrainEntity(entity, cam);
      }
      HR(m_swapChain->Present(0, 0));

      Any help/advice would be much appreciated!
    • By KaiserJohan
      I am trying a barebones tessellation shader and getting an unexpected result when increasing the tessellation factor. I am rendering a group of quads and trying to apply tessellation to them.
      OutsideTess = (1,1,1,1), InsideTess= (1,1)

      OutsideTess = (1,1,1,1), InsideTess= (2,1)

      I expected 4 triangles in the quad, not two. Any idea what's wrong?
      struct PatchTess
      {
          float mEdgeTess[4] : SV_TessFactor;
          float mInsideTess[2] : SV_InsideTessFactor;
      };

      struct VertexOut
      {
          float4 mWorldPosition : POSITION;
          float mTessFactor : TESS;
      };

      struct DomainOut
      {
          float4 mWorldPosition : SV_POSITION;
      };

      struct HullOut
      {
          float4 mWorldPosition : POSITION;
      };

      Hull shader:
      PatchTess PatchHS(InputPatch<VertexOut, 3> inputVertices)
      {
          PatchTess patch;
          patch.mEdgeTess[ 0 ] = 1;
          patch.mEdgeTess[ 1 ] = 1;
          patch.mEdgeTess[ 2 ] = 1;
          patch.mEdgeTess[ 3 ] = 1;
          patch.mInsideTess[ 0 ] = 2;
          patch.mInsideTess[ 1 ] = 1;
          return patch;
      }

      [domain("quad")]
      [partitioning("fractional_odd")]
      [outputtopology("triangle_ccw")]
      [outputcontrolpoints(4)]
      [patchconstantfunc("PatchHS")]
      [maxtessfactor( 64.0 )]
      HullOut hull_main(InputPatch<VertexOut, 3> verticeData, uint index : SV_OutputControlPointID)
      {
          HullOut ret;
          ret.mWorldPosition = verticeData[index].mWorldPosition;
          return ret;
      }
      Domain shader:
      [domain("quad")]
      DomainOut domain_main(PatchTess patchTess, float2 uv : SV_DomainLocation, const OutputPatch<HullOut, 4> quad)
      {
          DomainOut ret;
          const float MipInterval = 20.0f;
          ret.mWorldPosition.xz = quad[ 0 ].mWorldPosition.xz * ( 1.0f - uv.x ) * ( 1.0f - uv.y )
                                + quad[ 1 ].mWorldPosition.xz * uv.x * ( 1.0f - uv.y )
                                + quad[ 2 ].mWorldPosition.xz * ( 1.0f - uv.x ) * uv.y
                                + quad[ 3 ].mWorldPosition.xz * uv.x * uv.y;
          ret.mWorldPosition.y = quad[ 0 ].mWorldPosition.y;
          ret.mWorldPosition.w = 1;
          ret.mWorldPosition = mul( gFrameViewProj, ret.mWorldPosition );
          return ret;
      }
      Any ideas what could be wrong with these shaders?
    • By simco50
      I've stumbled upon Urho3D engine and found that it has a really nice and easy to read code structure.
      I think the graphics abstraction looks really interesting and I like the idea of how it defers pipeline state changes until just before the draw call to resolve redundant state changes.
      This is done by saving the state changes (blendEnabled/SRV changes/RTV changes) in member variables and just before the draw, apply the actual state changes using the graphics context.
      It looks something like this (pseudo):
      void PrepareDraw()
      {
          if (renderTargetsDirty)
          {
              pD3D11DeviceContext->OMSetRenderTarget(mCurrentRenderTargets);
              renderTargetsDirty = false;
          }
          if (texturesDirty)
          {
              pD3D11DeviceContext->PSSetShaderResourceView(..., mCurrentSRVs);
              texturesDirty = false;
          }
          .... //Some more state changes
      }

      This all looked like a great design at first, but I've found one big issue with it. I don't understand how it is solved in their case, or how I would tackle it.
      I'll explain it by example, imagine I have two rendertargets: my backbuffer RT and an offscreen RT.
      Say I want to render my backbuffer to the offscreen RT and then back to the backbuffer (Just for the sake of the example).
      You would do something like this:
      //Render to the offscreen RT
      pGraphics->SetRenderTarget(pOffscreenRT->GetRTV());
      pGraphics->SetTexture(diffuseSlot, pDefaultRT->GetSRV());
      pGraphics->DrawQuad();
      pGraphics->SetTexture(diffuseSlot, nullptr); //Remove the default RT from input

      //Render to the default (screen) RT
      pGraphics->SetRenderTarget(nullptr); //Default RT
      pGraphics->SetTexture(diffuseSlot, pOffscreenRT->GetSRV());
      pGraphics->DrawQuad();

      The problem is that the second time the application loop comes around, the offscreen render target is still bound as an input ShaderResourceView when it gets set as a RenderTargetView. In Urho3D, the RenderTargetView state is always applied before the ShaderResourceViews (see the top code snippet), so this happens even when I set the SRV to nullptr before using the resource as an RTV, as above. That causes errors, because a resource can't be bound as both input and render target.
      What is usually the solution to this?
    • By MehdiUBP
      I wrote a MatCap shader following this idea:
      Given the image representing the texture, we compute the sample point by taking the dot product of the vertex normal and the camera position and remapping this to [0,1].
      This seems to work well when I look straight at an object with this shader. However, in cases where the camera points slightly on the side, I can see the texture stretch a lot.
      Could anyone give me a hint as how to get a nice matcap shader ?
      Here's what I wrote:
      Shader "Unlit/Matcap"
      {
          Properties
          {
              _MainTex ("Texture", 2D) = "white" {}
          }
          SubShader
          {
              Tags { "RenderType"="Opaque" }
              LOD 100

              Pass
              {
                  CGPROGRAM
                  #pragma vertex vert
                  #pragma fragment frag
                  // make fog work
                  #include "UnityCG.cginc"

                  struct appdata
                  {
                      float4 vertex : POSITION;
                      float3 normal : NORMAL;
                  };

                  struct v2f
                  {
                      float2 worldNormal : TEXCOORD0;
                      float4 vertex : SV_POSITION;
                  };

                  sampler2D _MainTex;

                  v2f vert (appdata v)
                  {
                      v2f o;
                      o.vertex = UnityObjectToClipPos(v.vertex);
                      o.worldNormal = mul((float3x3)UNITY_MATRIX_V, UnityObjectToWorldNormal(v.normal)).xy*0.3 + 0.5;  //UnityObjectToClipPos(v.normal)*0.5 + 0.5;
                      return o;
                  }

                  fixed4 frag (v2f i) : SV_Target
                  {
                      // sample the texture
                      fixed4 col = tex2D(_MainTex, i.worldNormal);
                      // apply fog
                      return col;
                  }
                  ENDCG
              }
          }
      }

DX11 Weird performance problem with SSAO


Hey folks. So I'm having this problem: if my camera is close to a surface, the SSAO pass suddenly spikes to around 16 milliseconds.

When I am still looking towards the same surface, but from further away, the framerate recovers and becomes regular again.

This happens with ANY surface of my model, I am a bit clueless in regards to what could cause this. Any ideas?

In attached image: y axis is time in ms, x axis is current frame. The dips in SSAO milliseconds are when I moved away from the surface, the peaks happen when I am very close to the surface.



Edit: So I've done some more in-depth profiling with Nvidia Nsight. These are the facts from my results:

The count of command buffers goes from 4 (far away from the surface) to ~20 (close to the surface).

The command buffer duration in % goes from around ~30% to ~99%.

Sometimes the CPU duration is 0.016 to 0.03 milliseconds per frame, whereas it usually takes around 0.002 milliseconds.

I am using a vertex shader which generates my full-screen quad, and afterwards I do my SSAO calculations in my pixel shader. Could this be a GPU driver bug? I'm a bit lost myself. It seems there could be a CPU/GPU resource stall. But why would the number of command buffers vary depending on the distance from a surface?



Edit n2: Any resolution above 720p starts to have this issue, and I am fairly certain my SSAO is not so performance heavy that it would crap itself at slightly higher resolutions.


Edited by Mercesa


This is common. You solve it by using mip maps for the depth buffer, so you can sample a larger area with fewer samples.


Maybe this (did not read it): http://research.nvidia.com/publication/scalable-ambient-obscurance

However, what I mean is simple:

Close to camera means you need to sample a large area in screen space, so samples get spread in memory and also the sample count can increase (depending on algorithm).

If you have a mip map pyramid of the depth you can pick a higher mip map level so performance remains constant independent of distance.
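A minimal sketch of how such a mip pick could look, written as plain C++ rather than shader code so it can be run on its own. The idea is that a sample far from the shaded pixel reads from a coarser depth mip, so the offset measured in texels of that mip stays bounded. The function name `depthMipForOffset` and the constant `kMaxOffsetTexels` are assumptions made up for illustration, not something from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical tuning constant: the largest sample offset, measured in texels
// of the chosen mip, that we still treat as cache friendly.
constexpr float kMaxOffsetTexels = 8.0f;

// Picks a depth mip level from the distance (in full-resolution pixels)
// between the shaded pixel and one AO sample. Each mip halves the resolution,
// so every step up halves the offset measured in texels of that mip.
int depthMipForOffset(float offsetPixels, int maxMip) {
    assert(maxMip >= 0);
    if (offsetPixels <= kMaxOffsetTexels) {
        return 0; // close samples read the full-resolution depth
    }
    const float ratio = offsetPixels / kMaxOffsetTexels;
    const int mip = static_cast<int>(std::floor(std::log2(ratio)));
    return std::max(0, std::min(mip, maxMip));
}
```

With these assumed numbers, an offset of 20 pixels picks mip 1 and an offset of 100 pixels picks mip 3, so the offset in texels of the chosen mip stays below 16 until the top of the pyramid is reached.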


Edit: Are you sure increasing command buffer count comes from SSAO? Makes no sense.

Edited by JoeJ


And I am not sure if the command buffer count comes from SSAO, but what I do know is that SSAO takes up most of my performance (as you can see in the graph) and in those frames command buffer counts increase as well.


Edit: And I think you are talking about cache misses from texture samples? And I don't really understand your mip map pyramid, I believe if you downsample a depth texture it does not really make sense anymore? Since it will linear interpolate between the values during downsampling?


Edit2: I lowered my sample rate and the framerate does improve a lot, so I guess the number of samples contributes to too much random memory access, which causes cache misses as you stated :)

Link to scrnshot: 21ad9b39ba.png

Edited by Mercesa


The cost of a texture sample depends on whether you hit the cache or not, which depends on whether your sampling is coherent or not (e.g. do neighbouring pixels sample neighbouring texels). If your SSAO changes its sampling radius based on the distance to the surface, then this is a predictable result. At long range, your pixels might be sampling a small 3x3 area of texels, which is quite predictable, but at near range perhaps you start sampling a 1000x1000 area of pixels (111k times larger), which is very incoherent and the cache suddenly can't help you any more.
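To make the distance dependence concrete, here is a small sketch assuming a standard perspective projection and an SSAO radius that is fixed in view space. Whether the original poster's shader works this way is an assumption. The function names and parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Screen-space radius, in pixels, of a sphere with view-space radius
// worldRadius centred at distance viewDepth in front of the camera.
// fovY is the vertical field of view in radians, screenHeight is in pixels.
float projectedRadiusPixels(float worldRadius, float viewDepth,
                            float fovY, float screenHeight) {
    assert(viewDepth > 0.0f);
    // Pixels covered by one view-space unit at depth 1.
    const float projScale = screenHeight / (2.0f * std::tan(fovY * 0.5f));
    return worldRadius * projScale / viewDepth;
}

// How many times more texels the near footprint covers than the far one.
// Area grows with the square of the radius.
float footprintRatio(float nearRadiusPixels, float farRadiusPixels) {
    assert(farRadiusPixels > 0.0f);
    const float r = nearRadiusPixels / farRadiusPixels;
    return r * r;
}
```

For example, with a 90 degree vertical field of view at 720p, a 0.5 unit radius projects to about 18 pixels at depth 10 but about 900 pixels at depth 0.2, a footprint roughly 2500 times larger. The 3x3 versus 1000x1000 comparison above works out the same way: 1,000,000 / 9 is about 111k.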

These kinds of variable radius effects either need a way to reduce the size of the data set that they're sampling on, such as the mipmaps mentioned above (a hierarchical structure) or simply clamping your filter radius with "min".
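The "min" clamp could look roughly like this. It is a sketch under the same assumptions as a perspective-projected radius. `maxRadiusPixels` is a hypothetical tuning value, and the second function only shows the trade-off: once the clamp kicks in, the effect covers a smaller radius in view space.

```cpp
#include <algorithm>
#include <cassert>

// Clamps the screen-space sampling radius so the texel footprint stays
// bounded no matter how close the camera gets to a surface.
float clampRadiusPixels(float radiusPixels, float maxRadiusPixels) {
    assert(maxRadiusPixels > 0.0f);
    return std::min(radiusPixels, maxRadiusPixels);
}

// View-space radius that a (possibly clamped) pixel radius still covers at
// viewDepth. projScale is screenHeight / (2 * tan(fovY / 2)), i.e. pixels
// per view-space unit at depth 1.
float effectiveWorldRadius(float radiusPixels, float viewDepth, float projScale) {
    assert(projScale > 0.0f);
    return radiusPixels * viewDepth / projScale;
}
```

With an assumed cap of 64 pixels and a projection scale of 360, a pixel at depth 0.2 ends up with an effective radius of about 0.036 units instead of the requested 0.5, so very close surfaces get a tighter AO in exchange for a bounded cost.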

9 hours ago, Mercesa said:

Edit: And I think you are talking about cache misses from texture samples? And I don't really understand your mip map pyramid, I believe if you downsample a depth texture it does not really make sense anymore? Since it will linear interpolate between the values during downsampling?

Unlike with shadow maps, downsampling depth with interpolation should actually increase quality for SSAO, as it prefilters (for example, VSM shadow maps also benefit from downsampling). You could even implement your own trilinear filtering by blending results from two mips, or use dithering to avoid banding from switching mips... if the switch becomes visible at all.
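As a rough CPU-side illustration of those two ideas (on the GPU this would be a downsample pass plus a lerp in the shader), the sketch below builds one mip by averaging 2x2 blocks and blends two per-mip results by the fractional mip level. The `DepthImage` type and both function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for one level of a depth texture.
struct DepthImage {
    int width = 0;
    int height = 0;
    std::vector<float> texels; // row-major, width * height entries

    float at(int x, int y) const {
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Builds the next mip by averaging each 2x2 block (a box filter).
// Assumes even dimensions to keep the sketch short.
DepthImage downsample2x2(const DepthImage& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    DepthImage dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.texels.reserve(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const float sum = src.at(2 * x, 2 * y) + src.at(2 * x + 1, 2 * y) +
                              src.at(2 * x, 2 * y + 1) + src.at(2 * x + 1, 2 * y + 1);
            dst.texels.push_back(sum * 0.25f);
        }
    }
    return dst;
}

// "Manual trilinear": blends results computed from two adjacent mips using
// the fractional part of the desired mip level.
float blendMipResults(float resultLowerMip, float resultHigherMip, float fractionalMip) {
    const float t = fractionalMip - std::floor(fractionalMip);
    return resultLowerMip * (1.0f - t) + resultHigherMip * t;
}
```

Calling `downsample2x2` repeatedly until the image is 1x1 (or no longer even) gives the pyramid. `blendMipResults(a, b, 1.25f)` weights the mip 1 result by 0.75 and the mip 2 result by 0.25.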

Probably you should distribute your samples over multiple frames so you get a high sample count and quality for free, similar to temporal anti-aliasing. That should bring you down to 1-3 ms or something. High quality methods can use 4-5 ms, but IMO that's a real waste even on $1000 GPUs :)
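One way to read "distribute your samples over multiple frames" is sketched below: each frame evaluates only one slice of the full sample kernel and blends its result into a history value. This is a simplified sketch with hypothetical names. A real implementation would also need to reproject the history when the camera moves and reject stale history, which is not shown here.

```cpp
#include <cassert>

// Slice of the full sample kernel that one frame evaluates.
struct SampleRange {
    int first;
    int count;
};

// Splits totalSamples evenly over framesPerCycle frames and returns the
// slice for the given frame. Assumes totalSamples divides evenly.
SampleRange samplesForFrame(int totalSamples, int framesPerCycle, unsigned frameIndex) {
    assert(framesPerCycle > 0 && totalSamples % framesPerCycle == 0);
    const int perFrame = totalSamples / framesPerCycle;
    const int slice = static_cast<int>(frameIndex % static_cast<unsigned>(framesPerCycle));
    return SampleRange{slice * perFrame, perFrame};
}

// Exponential moving average of the AO term. blend is the weight of the
// current frame; smaller values are smoother but react more slowly.
float accumulateAO(float history, float current, float blend) {
    assert(blend >= 0.0f && blend <= 1.0f);
    return history + (current - history) * blend;
}
```

With 64 samples spread over 4 frames, every frame pays for 16 samples, and frame 5 evaluates samples 16 to 31.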

I guess the varying commandbuffer count could be caused by frustum / occlusion culling or NPCs running around?


