360GAMZ

DX11
Puzzling Fill Rate

3 posts in this topic

I've written a deferred renderer for DX11 and am optimizing its performance. While benchmarking fill rate, I ran into something puzzling. When I stub out one of the shaders used to render the scene (pre-lighting) so that it essentially has a null VS and its PS simply writes out constants to the GBuffers, I see a pretty significant increase in performance when rendering the scene. That seemed reasonable.

So then I started thinking about doing a depth prepass, but before going through the work of implementing it, I decided to first try a simple test to see if it had potential. For the test, I simply cleared the depth buffer to 0 instead of 1. The idea is that every pixel in the scene would then be rejected before the PS and I would see pretty much the same significant increase in performance as with the null shader above. However, I saw absolutely no speed increase at all.

Does this imply that my PS is being executed even if it's occluded by closer depth values? How is that possible?

Here's the PS and the functions it calls (see PSMain for the pixel shader):

[code]
StructuredBuffer<SpecBuf_Params_s> SpecBuf_Params;
SpecBuf_Params_s GetParams( in uint uInstIndex, in uint uSubMtlIndex ) {
    return SpecBuf_Params[ Spec_GetParamsIndex( uInstIndex, uSubMtlIndex ) ];
}

float4 Spec_Motif( SpecMotif_s Motif ) {
    float4 vTableColor = g_avMotifColor[ Motif.m_uMotifIndex ];
    float4 vBiasedColor = (vTableColor * Motif.m_fScale + Motif.m_fOffset) * Motif.m_vBaseColor;
    float3 vFinalColor = (Motif.m_uFlags & 1) ? Motif.m_vBaseColor.rgb : vBiasedColor.rgb;
    float fFinalAlpha = (Motif.m_uFlags & 2) ? Motif.m_vBaseColor.a : vBiasedColor.a;
    return float4( vFinalColor, fFinalAlpha );
}

uint Spec_AlphaToCoverage( float fUnitAlpha ) {
    uint uCoverage;
#if SPEC_MSAA_COUNT == 2
    if( fUnitAlpha < (1.0f / 3.0f) ) {
        uCoverage = 0;
    } else if( fUnitAlpha < (2.0f / 3.0f) ) {
        uCoverage = 1;
    } else {
        uCoverage = 3;
    }
#elif SPEC_MSAA_COUNT == 4
    if( fUnitAlpha < (1.0f / 5.0f) ) {
        uCoverage = 0;
    } else if( fUnitAlpha < (2.0f / 5.0f) ) {
        uCoverage = 1;
    } else if( fUnitAlpha < (3.0f / 5.0f) ) {
        uCoverage = 3;
    } else if( fUnitAlpha < (4.0f / 5.0f) ) {
        uCoverage = 7;
    } else {
        uCoverage = 15;
    }
#else
    uCoverage = 0xffffffff;
#endif
    return uCoverage;
}

SpecRawGBuffer_s Spec_PackGBuffer( in SpecGBufferSource_s Source ) {
    SpecRawGBuffer_s RawGBuffer;
    // Compute flags field for Tex2...
    uint uEdgePixel = any( frac( Source.m_vCentroidPosXY_SS ) - 0.5f );
    uint uFlags = (Source.m_uNoAO << 5) | ((uEdgePixel & 1) << 4) | max( min( uint( Source.m_fSpecUnitSharpness * 15.0f ), 15 ), 1 );
    // Store values in packed GBuffer...
    RawGBuffer.m_vTex0 = float4( Source.m_vDiffuseColor, 0 );
    RawGBuffer.m_vTex1 = float4( Source.m_vEmissiveColor, Source.m_fSpecUnitIntensity );
    RawGBuffer.m_vuTex2 = uint4( (255.0f/2.0f) + (255.0f/2.0f)*Source.m_vUnitNorm_WS, uFlags );
    return RawGBuffer;
}

SpecRawGBuffer_s PSMain( VS_Out Input, out uint uCoverage : SV_Coverage ) : SV_TARGET {
    SpecBuf_Params_s Params = GetParams( Input.uInstanceID, Input.uSubMtlIndex );
    uint uFlags = Params.m_uFlags;
    uint uFlag_VtxRGB_Tint = uFlags & FLAG_VTX_RGB_TINT;
    uint uFlag_VtxRGB_Emis = uFlags & FLAG_VTX_RGB_EMIS;
    uint uFlag_VtxA_Tint = uFlags & FLAG_VTX_A_TINT;
    uint uFlag_VtxA_Emis = uFlags & FLAG_VTX_A_EMIS;
    uint uFlag_VtxA_Opac = uFlags & FLAG_VTX_A_OPAC;
    uint uFlag_VtxA_Glos = uFlags & FLAG_VTX_A_GLOS;
    uint uFlag_BaseA_Emis = uFlags & FLAG_BASE_A_EMIS;
    uint uFlag_BaseA_Opac = uFlags & FLAG_BASE_A_OPAC;
    uint uFlag_BaseA_Glos = uFlags & FLAG_BASE_A_GLOS;

    float4 vMotifTintOpac = Spec_Motif( Params.m_MotifTintOpac );
    float4 vMotifEmisGlos = Spec_Motif( Params.m_MotifEmisGlos );
    float3 vVtxTint = (uFlag_VtxRGB_Tint ? Input.vColorVtx.rgb : 1) * (uFlag_VtxA_Tint ? Input.vColorVtx.a : 1);
    float3 vVtxEmis = Params.m_fAddEmis + (uFlag_VtxRGB_Emis ? Input.vColorVtx.rgb : 0) + (uFlag_VtxA_Emis ? Input.vColorVtx.a : 0);
    float fVtxOpac = (uFlag_VtxA_Opac ? Input.vColorVtx.a : 1);
    float fVtxGlos = (uFlag_VtxA_Glos ? Input.vColorVtx.a : 0);
    float4 vVtxTintOpac = float4( vVtxTint, fVtxOpac ) * vMotifTintOpac;
    float4 vVtxEmisGlos = float4( vVtxEmis, fVtxGlos );

    // Compute normal...
    float3 vUnitNorm_WS = normalize( Input.vNormal_WS );

    // Compute base color...
    float3 vTC_BaseRGB = float3( Input.vTC_Base.xy, Params.m_uTexSliceIndexBaseRGB );
    float3 vTC_BaseA = float3( Params.m_fTexCoordScale_BaseA * Input.vTC_Base.zw, Params.m_uTexSliceIndexBaseA );
    float3 vTexColorBaseRGB = TexBase.Sample( SamplerBase, vTC_BaseRGB ).rgb;
    float fTexColorBaseA = TexBase.Sample( SamplerBase, vTC_BaseA ).a;
    float fDetailMult = lerp( 1, 2 * fTexColorBaseA, Params.m_Switch_fBaseA_Detl );
    vTexColorBaseRGB *= fDetailMult;

    // Compute reflection color...
    float4 vMotifCube = Spec_Motif( Params.m_MotifCube );
    float3 vUnitVtxToCam_WS = normalize( Spec_GetCamPos() - Input.vPos_WS );
    float3 vUnitReflect_WS = reflect( -vUnitVtxToCam_WS, vUnitNorm_WS );
    float3 vReflectionColor = vMotifCube.rgb * TexCube.Sample( SamplerCube, float4( vUnitReflect_WS, Params.m_uTexSliceIndexCube ) ).rgb;

    // Compute final values...
    float fFinalOpac = Input.fUnitFadeAlpha * vVtxTintOpac.a * (uFlag_BaseA_Opac ? fTexColorBaseA : 1);
    float fFinalGlos = vMotifEmisGlos.a * (vVtxEmisGlos.a + (uFlag_BaseA_Glos ? fTexColorBaseA : 0));
    float3 vFinalDiff = saturate( vVtxTintOpac.rgb * vTexColorBaseRGB ) + fFinalGlos * vReflectionColor;
    float3 vFinalEmis = vMotifEmisGlos.rgb * (vVtxEmisGlos.rgb + (uFlag_BaseA_Emis ? fTexColorBaseA : 0));
    uCoverage = Spec_AlphaToCoverage( fFinalOpac );

    // Store everything into our gbuffers...
    SpecGBufferSource_s GBufSource = Spec_GetDefaultGBufSource();
    GBufSource.m_vDiffuseColor = vFinalDiff;
    GBufSource.m_vEmissiveColor = vFinalDiff * vFinalEmis;
    GBufSource.m_vCentroidPosXY_SS = Input.vPos_HS.xy;
    GBufSource.m_vUnitNorm_WS = vUnitNorm_WS;
    GBufSource.m_fSpecUnitSharpness = Params.m_fSpecUnitSharpness;
    GBufSource.m_fSpecUnitIntensity = fFinalGlos;
    return Spec_PackGBuffer( GBufSource );
}
[/code]

Edit: After some more testing, it's really looking like early Z rejection just isn't working. Though, I'm not sure why yet. Do NVIDIA and ATI provide docs that describe the conditions which must be met to keep early Z rejection enabled?

NVIDIA has some guidelines in [url="http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf"]this doc[/url], but they might be a bit out of date depending on which hardware you're working with. Alpha to coverage or outputting SV_Coverage can definitely mess with Z cull, so you might want to try disabling that to see if it makes a difference.

Also, if you want to see whether your pixel shader is actually running, you can issue a D3D11_QUERY_PIPELINE_STATISTICS query; the D3D11_QUERY_DATA_PIPELINE_STATISTICS structure it returns includes the number of pixel shader invocations (PSInvocations).
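
To try the SV_Coverage suggestion without rewriting the shader, one option is to compile the coverage output in or out with a preprocessor switch and compare the PS invocation counts between the two builds. The following is only a minimal sketch: the SPEC_OUTPUT_COVERAGE define, the PS_In struct, and the PSMain_CoverageTest name are hypothetical stand-ins, not part of the shader posted above, and whether removing the output restores early Z depends on the hardware and driver.

[code]
// Hypothetical compile-time switch; define as 1 to restore the SV_Coverage output.
#ifndef SPEC_OUTPUT_COVERAGE
#define SPEC_OUTPUT_COVERAGE 0
#endif

// Minimal stand-in for the thread's VS_Out; only the fields used here.
struct PS_In {
    float4 vPos_HS   : SV_Position;
    float4 vColorVtx : COLOR0;
};

float4 PSMain_CoverageTest( PS_In Input
#if SPEC_OUTPUT_COVERAGE
    , out uint uCoverage : SV_Coverage
#endif
    ) : SV_Target
{
#if SPEC_OUTPUT_COVERAGE
    // Full coverage mask as a placeholder; the real shader derives it from opacity.
    uCoverage = 0xffffffff;
#endif
    return Input.vColorVtx;
}
[/code]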

Thanks for the info, MJP. I was indeed not meeting some of those requirements. However, even after fixing things up, the query reports no change in the number of PS invocations.

- PS no longer outputs SV_Coverage.
- Using ClearDepthStencilView() to clear the depth buffer.
- PS doesn't write depth.
- The direction of the depth test is <= while both writing and comparing the depth buffer and doesn't change in between.
- Depth buffer is a Texture2DMS (no array).
- The PS uses the XY components of the SV_Position semantic, but not the z component.
- The PS doesn't use clip, texkill, or discard.
- Alpha to coverage is disabled.
- SampleMask is always 0xFFFFFFFF.

The depth buffer is, however, DXGI_FORMAT_D32_FLOAT, but the NVIDIA doc doesn't list that as a reason early Z would be disabled. I don't have a stencil buffer. I am using 2x MSAA render targets. I can see in PIX that the depth buffer has been written to by the pre-pass.

When writing the depth, I bind a read/write DSV to the pipeline and use this state:

D3D11_BLEND_DESC:
AlphaToCoverageEnable = 0
IndependentBlendEnable = 0
BlendEnable = 0
SrcBlend = ONE
DestBlend = ZERO
BlendOp = ADD
SrcBlendAlpha = ONE
DestBlendAlpha = ZERO
BlendOpAlpha = ADD
RenderTargetWriteMask = 0

D3D11_DEPTH_STENCIL_DESC:
DepthEnable = 1
DepthWriteMask = ALL
DepthFunc = LESS_EQUAL
(all stencil members are 0)

D3D11_RASTERIZER_DESC:
FillMode = SOLID
CullMode = BACK
FrontCounterClockwise = 1
DepthBias = 0
DepthBiasClamp = 0
SlopeScaledDepthBias = 0
DepthClipEnable = 1
ScissorEnable = 0
MultisampleEnable = 0
AntialiasedLineEnable = 0


When rendering the scene, I bind a read-only DSV to the pipeline and use this state:

D3D11_BLEND_DESC:
AlphaToCoverageEnable = 0
IndependentBlendEnable = 0
BlendEnable = 0
SrcBlend = ONE
DestBlend = ZERO
BlendOp = ADD
SrcBlendAlpha = ONE
DestBlendAlpha = ZERO
BlendOpAlpha = ADD
RenderTargetWriteMask = 15

D3D11_DEPTH_STENCIL_DESC:
DepthEnable = 1
DepthWriteMask = ZERO
DepthFunc = LESS_EQUAL
(all stencil members are 0)

D3D11_RASTERIZER_DESC:
FillMode = SOLID
CullMode = BACK
FrontCounterClockwise = 1
DepthBias = 0
DepthBiasClamp = 0
SlopeScaledDepthBias = 0
DepthClipEnable = 1
ScissorEnable = 0
MultisampleEnable = 0
AntialiasedLineEnable = 0

Here's the depth prepass shader:

[code]
// Constant buffer with cam info:
cbuffer SpecBuf_Camera {
    row_major float4x4 g_ProjCamMtx;
};

// Vertex in:
struct VS_In {
    float3 vPos_WS : POSITION;
};

// Vertex out:
struct VS_Out {
    float4 vPos_HS : SV_POSITION;
};

// Vertex shader:
VS_Out VS( VS_In Input ) {
    VS_Out Output;
    Output.vPos_HS = mul( g_ProjCamMtx, float4( Input.vPos_WS, 1 ) );
    return Output;
}

// Pixel shader:
float4 PS( VS_Out Input ) : SV_TARGET {
    return 0;
}

// Technique:
technique11 Terrain {
    pass P1 {
        SetVertexShader( CompileShader( vs_5_0, VS() ) );
        SetPixelShader( CompileShader( ps_5_0, PS() ) );
    }
}
[/code]


What else could I be doing to turn off early Z?

Matt, thanks to your book for pointing out the [earlydepthstencil] attribute for pixel shaders! I tried this and the query now reports over a million invocations of the PS have been eliminated (without a depth pre-pass - just drawing the scene from front to back as much as possible). Though, I didn't notice much of a gain in terms of performance. Doing a full depth prepass increases the draw call count substantially which negatively impacts frame rate.

If anyone's interested, the book I'm referring to is called Practical Rendering & Computation with Direct3D 11, and it's proven to be a valuable resource to me during my adventure through DX11.
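
For anyone who wants to try the same thing, here is a minimal sketch of how the [earlydepthstencil] attribute is applied to a Shader Model 5 pixel shader. The GBufferOut and PS_In structs and the PSMain_EarlyZ name are hypothetical placeholders for illustration, not the G-buffer layout used earlier in this thread.

[code]
// Hypothetical two-target G-buffer output, for illustration only.
struct GBufferOut {
    float4 vTex0 : SV_Target0;
    float4 vTex1 : SV_Target1;
};

// Hypothetical pixel shader input.
struct PS_In {
    float4 vPos_HS : SV_Position;
    float4 vColor  : COLOR0;
};

// Requests that depth/stencil testing run before the pixel shader executes.
[earlydepthstencil]
GBufferOut PSMain_EarlyZ( PS_In Input )
{
    GBufferOut Out;
    Out.vTex0 = float4( Input.vColor.rgb, 0 );
    Out.vTex1 = float4( 0, 0, 0, Input.vColor.a );
    return Out;
}
[/code]

Compile it with the ps_5_0 profile. Comparing the PS invocation count from the pipeline statistics query with and without the attribute shows whether it changes anything on a given GPU.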