### Similar Content

• By stale
I'm continuing to learn more about terrain rendering, and so far I've managed to load in a heightmap and render it as a tessellated wireframe (following Frank Luna's DX11 book). However, I'm getting some really weird behavior where a large section of the wireframe is being rendered with a yellow color, even though my pixel shader is hard coded to output white.

The parts of the mesh that are discolored change as well, as pictured below (the mesh is being clipped by the far plane).

Here is my pixel shader. As mentioned, I simply hard code it to output white:

    float PS(DOUT pin) : SV_Target
    {
        return float4(1.0f, 1.0f, 1.0f, 1.0f);
    }

I'm completely lost on what could be causing this, so any help in the right direction would be greatly appreciated. If I can help by providing more information, please let me know.

• Hello,
I am trying to implement voxel cone tracing in my game engine.
As a first step I am trying to implement the easiest "poor man's" method:
a. My test scene, "Sponza Atrium", is voxelized completely into a static 128^3 voxel grid (a structured buffer containing albedo).
b. I don't care about conservative rasterization and don't use any sparse voxel access structure.
c. Every voxel has the same color on every side (top, bottom, front, ...).
d. One directional light injects light into the voxels (another structured buffer).
I will try to describe what I think is correct (please correct me).
GI lighting of a given vertex with an ideal method
A. We would shoot many (e.g. 1000) rays into the hemisphere oriented around the normal of that vertex.
B. We would take every occluder into account (which is a very heavy workload) and sample the color at the hit point.
C. We would weight each color by the cosine of the angle between the ray and the vertex normal, sum up all samples, and divide by the number of rays.
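Steps A to C can be sketched on the CPU as follows. This is only an illustration under simplifying assumptions: the normal is fixed to +Z, directions are drawn uniformly over the hemisphere, and `incomingColor` is a hypothetical stand-in for step B (a real version would trace the ray against the scene and return the color at the hit point).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Step A: uniform random direction on the hemisphere around +Z.
// Picking cos(theta) uniformly in [0, 1] gives a uniform distribution over the hemisphere.
Vec3 sampleHemisphereZ(std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);
    const float cosTheta = uniform01(rng);
    const float sinTheta = std::sqrt(std::max(0.0f, 1.0f - cosTheta * cosTheta));
    const float phi = 6.2831853f * uniform01(rng);
    return { sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta };
}

// Step B (hypothetical placeholder): every direction sees a white surface.
Vec3 incomingColor(const Vec3& /*direction*/) {
    return { 1.0f, 1.0f, 1.0f };
}

// Step C: cosine-weight each sample, sum, and divide by the ray count.
Vec3 gatherIndirect(int rayCount, unsigned seed) {
    assert(rayCount > 0);
    const Vec3 normal = { 0.0f, 0.0f, 1.0f };  // assumed vertex normal
    std::mt19937 rng(seed);
    Vec3 sum = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < rayCount; ++i) {
        const Vec3 dir = sampleHemisphereZ(rng);
        const float weight = std::max(0.0f, dot(dir, normal));
        const Vec3 color = incomingColor(dir);
        sum.x += weight * color.x;
        sum.y += weight * color.y;
        sum.z += weight * color.z;
    }
    const float invCount = 1.0f / static_cast<float>(rayCount);
    return { sum.x * invCount, sum.y * invCount, sum.z * invCount };
}
```

With uniformly distributed directions, the plain average of the cosine weights tends towards 0.5 for a constant white environment, so an estimator that should return 1 in that case would scale the result by 2.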
Voxel GI lighting
In principle we want to do the same thing with our voxel structure.
Even if we knew the correct hit points for the vertex, we would still have to calculate the weighted sum of many voxels.
Saving time when summing the weighted voxel colors
To save time when summing the weighted colors of the voxels, we build bricks or clusters.
Every 8 neighbouring voxels form a "cluster voxel" of level 1 (this is done recursively for many levels).
The color of a side of a "cluster voxel" is the average of the colors of the four contained voxel sides with the same orientation.

After doing this we can sample the far-away parts just by sampling the corresponding "cluster voxel" at the corresponding level to get the summed-up color.
In practice this is done by mipmapping a texture that contains the voxel colors, which also places the colors of neighbouring voxels close together in the texture.
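The clustering step can be sketched as a plain 2x2x2 box filter on a dense grid. This is a minimal sketch, assuming the simplified setup from point c (one color per voxel rather than one per side), so all eight children are averaged instead of four sides; `VoxelGrid` and `downsample` are hypothetical names, not part of any engine or API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Dense cubic grid of n^3 voxels, stored with x varying fastest.
struct VoxelGrid {
    std::size_t n;
    std::vector<Color> data;

    explicit VoxelGrid(std::size_t size)
        : n(size), data(size * size * size, Color{ 0.0f, 0.0f, 0.0f }) {}

    Color& at(std::size_t x, std::size_t y, std::size_t z) {
        return data[(z * n + y) * n + x];
    }
    const Color& at(std::size_t x, std::size_t y, std::size_t z) const {
        return data[(z * n + y) * n + x];
    }
};

// Build the next coarser level: each coarse voxel is the average of its 8 children.
VoxelGrid downsample(const VoxelGrid& fine) {
    assert(fine.n >= 2 && fine.n % 2 == 0);
    VoxelGrid coarse(fine.n / 2);
    for (std::size_t z = 0; z < coarse.n; ++z)
        for (std::size_t y = 0; y < coarse.n; ++y)
            for (std::size_t x = 0; x < coarse.n; ++x) {
                Color sum = { 0.0f, 0.0f, 0.0f };
                for (std::size_t dz = 0; dz < 2; ++dz)
                    for (std::size_t dy = 0; dy < 2; ++dy)
                        for (std::size_t dx = 0; dx < 2; ++dx) {
                            const Color& c = fine.at(2 * x + dx, 2 * y + dy, 2 * z + dz);
                            sum.r += c.r;
                            sum.g += c.g;
                            sum.b += c.b;
                        }
                coarse.at(x, y, z) = { sum.r * 0.125f, sum.g * 0.125f, sum.b * 0.125f };
            }
    return coarse;
}
```

Calling `downsample` repeatedly on a 128^3 grid yields the 64^3, 32^3, ... levels. Note that empty voxels are averaged in as black here, which darkens coarse levels; storing an occupancy or alpha value alongside the color is one way to account for that.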
Cone tracing, how to?
Here my understanding gets confused. How is the voxel structure traced efficiently?
I simply cannot understand how the occlusion problem is solved quickly, so that we know which single voxel or "cluster voxel" of which level we have to sample.
Suppose I am in a dark room filled with many boxes of different sizes, and I have a flashlight with, e.g., a pyramid-shaped light cone:
- I would see some single voxels, near or far.
- I would also see many boxes ("clustered voxels") of different sizes that are partly occluded.
How do I compute a weighted sum over this lit area?
E.g. if I want to sample a "clustered voxel" of level 4, I have to take into account what percentage of its area is occluded.
Please be patient with me. I am really trying to understand, but maybe I need more explanation than others.
Best regards, evelyn

• Hi guys, when I do picking followed by ray-plane intersection the results are all wrong. I am pretty sure my ray-plane intersection is correct so I'll just show the picking part. Please take a look:

    // get projection_matrix
    DirectX::XMFLOAT4X4 mat;
    DirectX::XMStoreFloat4x4(&mat, projection_matrix);

    float2 v;
    v.x = (((2.0f * (float)mouse_x) / (float)screen_width) - 1.0f) / mat._11;
    v.y = -(((2.0f * (float)mouse_y) / (float)screen_height) - 1.0f) / mat._22;

    // get inverse of view_matrix
    DirectX::XMMATRIX inv_view = DirectX::XMMatrixInverse(nullptr, view_matrix);
    DirectX::XMStoreFloat4x4(&mat, inv_view);

    // create ray origin (camera position)
    float3 ray_origin;
    ray_origin.x = mat._41;
    ray_origin.y = mat._42;
    ray_origin.z = mat._43;

    // create ray direction
    float3 ray_dir;
    ray_dir.x = v.x * mat._11 + v.y * mat._21 + mat._31;
    ray_dir.y = v.x * mat._12 + v.y * mat._22 + mat._32;
    ray_dir.z = v.x * mat._13 + v.y * mat._23 + mat._33;

That should give me a ray origin and direction in world space, but when I do the ray-plane intersection the results are all wrong.
If I click on the bottom half of the screen, ray_dir.z becomes negative (more so as I click lower). I don't understand how that can be. Shouldn't it always be pointing down the z-axis?
I had this working in the past but I can't find my old code.
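For reference, the intersection step that the post leaves out can be sketched as below. This is a generic ray-plane test, not the poster's code; `Vec3` and `intersectRayPlane` are hypothetical names, and the plane is assumed to be given as dot(normal, p) = d in the same (world) space as the ray.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Intersects the ray origin + t * dir (t >= 0) with the plane dot(normal, p) = d.
// Returns false if the ray is parallel to the plane or the hit lies behind the origin.
// The direction does not need to be normalized.
bool intersectRayPlane(const Vec3& origin, const Vec3& dir,
                       const Vec3& normal, float d, Vec3& hit) {
    const float denom = dot(normal, dir);
    if (std::fabs(denom) < 1e-6f) {
        return false;  // ray runs (almost) parallel to the plane
    }
    const float t = (d - dot(normal, origin)) / denom;
    if (t < 0.0f) {
        return false;  // plane is behind the ray origin
    }
    hit = { origin.x + t * dir.x, origin.y + t * dir.y, origin.z + t * dir.z };
    return true;
}
```

For example, a ray starting at (0, 10, 0) with direction (0, -1, 1) hits the ground plane y = 0 at (0, 0, 10). Since the ray direction here is in world space, its z component depends on how the camera is oriented and is not guaranteed to be positive.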

• Hi,
I finally managed to get the Vulkan device that emulates my DX11 one working, but everything is flipped vertically now because Vulkan has a different clip space. What are the best practices out there to keep these implementations consistent? I tried using a vertically flipped viewport, and while it works on an Nvidia 1050, the Vulkan debug layer is throwing error messages saying that this is not supported in the spec, so it might not work on other hardware. There is also the possibility of flipping the clip space position Y coordinate before writing it out in the vertex shader, but that requires changing and recompiling every shader. I could also bake it into the camera projection matrices, though I want to avoid that because then I need to track down where matrices are uploaded across the whole engine... Any chance of an easy extension or something? If not, I will probably go with changing the vertex shaders.
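The "bake it into the projection matrix" option mentioned above can be sketched like this. It assumes the row-vector convention used by DirectXMath (clip = v * M), in which the clip-space Y output is produced by the second column of the matrix; `Mat4`, `Vec4`, `transform`, and `flipClipY` are hypothetical helper names for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// Row-vector transform: out = v * m.
Vec4 transform(const Vec4& v, const Mat4& m) {
    Vec4 out = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            out[c] += v[r] * m[r][c];
    return out;
}

// Negating the second column negates the clip-space Y of every transformed vertex.
Mat4 flipClipY(Mat4 m) {
    for (auto& row : m) {
        row[1] = -row[1];
    }
    return m;
}
```

Applying `flipClipY` once, where the projection (or combined view-projection) matrix is built, leaves X, Z, and W untouched. Flipping Y also reverses the apparent triangle winding, so the front-face setting would need to be swapped to match.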

• Hello,
In my game engine I want to implement my own bone weight painting tool, that is, a virtual brush painting tool for a mesh.
I have already implemented my own "dual quaternion skinning" animation system with "morphs" (= blend shapes) and "bone driven" "corrective morphs" (= a morph that depends on a bending or twisting bone).
But now I have no idea what the best method is for implementing a brush painting system.
Just some proposals:
a. I would build a kind of additional "vertex structure" that helps me find the indices of the surrounding (neighbouring) vertices for a given "central vertex" index.
b. The structure should also give information about the distance from the neighbouring vertices to the given "central vertex".
c. Calculate the strength of the color added to the "central vertex" and the neighbouring vertices using a formula with linear or quadratic distance falloff.
d. The central vertex would be detected as the vertex that is hit by an orthogonal projection from my cursor (= brush) in world space onto the mesh.
But my problem is that several vertices can be hit simultaneously. E.g. if I want to paint the inward side of the left leg, the right leg will also be hit.
I think this problem is quite typical and there are standard approaches that I don't know.
Any help or tutorials are welcome.
P.S. I am working with SharpDX, DirectX11.
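The falloff from proposal c can be sketched as follows. This is a minimal sketch, assuming straight-line (Euclidean) distance from the brush center rather than distance along the mesh surface, and it is written in C++ although the post mentions SharpDX; `brushWeight` and `applyBrush` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

enum class Falloff { Linear, Quadratic };

// Weight is 1 at the brush center and drops to 0 at the brush radius.
float brushWeight(float distance, float radius, Falloff falloff) {
    assert(radius > 0.0f);
    if (distance >= radius) {
        return 0.0f;
    }
    const float t = 1.0f - distance / radius;
    return falloff == Falloff::Linear ? t : t * t;
}

// Adds strength * falloff to every vertex weight inside the brush radius,
// clamping the result to 1.
void applyBrush(const std::vector<Vec3>& positions, std::vector<float>& weights,
                const Vec3& center, float radius, float strength, Falloff falloff) {
    assert(positions.size() == weights.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        const float dx = positions[i].x - center.x;
        const float dy = positions[i].y - center.y;
        const float dz = positions[i].z - center.z;
        const float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
        const float w = strength * brushWeight(distance, radius, falloff);
        weights[i] = std::min(1.0f, weights[i] + w);
    }
}
```

Looping over all vertices like this avoids the neighbour structure from proposals a and b at the cost of a linear scan per brush stroke, which may or may not be acceptable depending on mesh size.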

# DX11 Puzzling Fill Rate

This topic is 2233 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I've written a deferred renderer for DX11 and am optimizing its performance. While benchmarking fill rate, I ran into something puzzling. When I stub out one of the shaders used to render the scene (pre-lighting) so that it essentially has a null VS and its PS simply writes out constants to the GBuffers, I see a pretty significant increase in performance when rendering the scene. That seemed reasonable.

So then I started thinking about doing a depth prepass, but before going through the work of implementing it, I decided to first try a simple test to see if it had potential. For the test, I simply cleared the depth buffer to 0 instead of 1. The idea is that every pixel in the scene would then be rejected before the PS and I would see pretty much the same significant increase in performance as with the null shader above. However, I saw absolutely no speed increase at all.
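The reasoning behind that test can be written out as a tiny CPU model. This is only a sketch of the comparison itself, assuming a LESS_EQUAL depth function and ignoring depth writes; `depthTestLessEqual` and `countPassing` are hypothetical names and say nothing about when the hardware actually performs the test relative to the pixel shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// LESS_EQUAL depth function: a fragment passes when it is at or in front of the stored depth.
bool depthTestLessEqual(float fragmentDepth, float storedDepth) {
    return fragmentDepth <= storedDepth;
}

// Counts how many fragments would pass against a buffer cleared to clearValue.
std::size_t countPassing(const std::vector<float>& fragmentDepths, float clearValue) {
    std::size_t passed = 0;
    for (float depth : fragmentDepths) {
        if (depthTestLessEqual(depth, clearValue)) {
            ++passed;
        }
    }
    return passed;
}
```

With fragment depths such as 0.1, 0.5, and 0.9, all three pass against a clear value of 1 and none pass against a clear value of 0, which is why clearing to 0 should reject every pixel whose depth is greater than 0.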

Does this imply that my PS is being executed even if it's occluded by closer depth values? How is that possible?

Here's the PS and the functions it calls (see PSMain for the pixel shader):

    StructuredBuffer<SpecBuf_Params_s> SpecBuf_Params;

    SpecBuf_Params_s GetParams( in uint uInstIndex, in uint uSubMtlIndex )
    {
        return SpecBuf_Params[ Spec_GetParamsIndex( uInstIndex, uSubMtlIndex ) ];
    }

    float4 Spec_Motif( SpecMotif_s Motif )
    {
        float4 vTableColor = g_avMotifColor[ Motif.m_uMotifIndex ];
        float4 vBiasedColor = (vTableColor * Motif.m_fScale + Motif.m_fOffset) * Motif.m_vBaseColor;
        float3 vFinalColor = (Motif.m_uFlags & 1) ? Motif.m_vBaseColor.rgb : vBiasedColor.rgb;
        float fFinalAlpha = (Motif.m_uFlags & 2) ? Motif.m_vBaseColor.a : vBiasedColor.a;
        return float4( vFinalColor, fFinalAlpha );
    }

    uint Spec_AlphaToCoverage( float fUnitAlpha )
    {
        uint uCoverage;

    #if SPEC_MSAA_COUNT == 2
        if( fUnitAlpha < (1.0f / 3.0f) ) { uCoverage = 0; }
        else if( fUnitAlpha < (2.0f / 3.0f) ) { uCoverage = 1; }
        else { uCoverage = 3; }
    #elif SPEC_MSAA_COUNT == 4
        if( fUnitAlpha < (1.0f / 5.0f) ) { uCoverage = 0; }
        else if( fUnitAlpha < (2.0f / 5.0f) ) { uCoverage = 1; }
        else if( fUnitAlpha < (3.0f / 5.0f) ) { uCoverage = 3; }
        else if( fUnitAlpha < (4.0f / 5.0f) ) { uCoverage = 7; }
        else { uCoverage = 15; }
    #else
        uCoverage = 0xffffffff;
    #endif

        return uCoverage;
    }

    SpecRawGBuffer_s Spec_PackGBuffer( in SpecGBufferSource_s Source )
    {
        SpecRawGBuffer_s RawGBuffer;

        // Compute flags field for Tex2...
        uint uEdgePixel = any( frac( Source.m_vCentroidPosXY_SS ) - 0.5f );
        uint uFlags = (Source.m_uNoAO << 5) | ((uEdgePixel & 1) << 4) | max( min( uint( Source.m_fSpecUnitSharpness * 15.0f ), 15 ), 1 );

        // Store values in packed GBuffer...
        RawGBuffer.m_vTex0 = float4( Source.m_vDiffuseColor, 0 );
        RawGBuffer.m_vTex1 = float4( Source.m_vEmissiveColor, Source.m_fSpecUnitIntensity );
        RawGBuffer.m_vuTex2 = uint4( (255.0f/2.0f) + (255.0f/2.0f)*Source.m_vUnitNorm_WS, uFlags );

        return RawGBuffer;
    }

    SpecRawGBuffer_s PSMain( VS_Out Input, out uint uCoverage : SV_Coverage ) : SV_TARGET
    {
        SpecBuf_Params_s Params = GetParams( Input.uInstanceID, Input.uSubMtlIndex );

        uint uFlags = Params.m_uFlags;
        uint uFlag_VtxRGB_Tint = uFlags & FLAG_VTX_RGB_TINT;
        uint uFlag_VtxRGB_Emis = uFlags & FLAG_VTX_RGB_EMIS;
        uint uFlag_VtxA_Tint = uFlags & FLAG_VTX_A_TINT;
        uint uFlag_VtxA_Emis = uFlags & FLAG_VTX_A_EMIS;
        uint uFlag_VtxA_Opac = uFlags & FLAG_VTX_A_OPAC;
        uint uFlag_VtxA_Glos = uFlags & FLAG_VTX_A_GLOS;
        uint uFlag_BaseA_Emis = uFlags & FLAG_BASE_A_EMIS;
        uint uFlag_BaseA_Opac = uFlags & FLAG_BASE_A_OPAC;
        uint uFlag_BaseA_Glos = uFlags & FLAG_BASE_A_GLOS;

        float4 vMotifTintOpac = Spec_Motif( Params.m_MotifTintOpac );
        float4 vMotifEmisGlos = Spec_Motif( Params.m_MotifEmisGlos );

        float3 vVtxTint = (uFlag_VtxRGB_Tint ? Input.vColorVtx.rgb : 1) * (uFlag_VtxA_Tint ? Input.vColorVtx.a : 1);
        float3 vVtxEmis = Params.m_fAddEmis + (uFlag_VtxRGB_Emis ? Input.vColorVtx.rgb : 0) + (uFlag_VtxA_Emis ? Input.vColorVtx.a : 0);
        float fVtxOpac = (uFlag_VtxA_Opac ? Input.vColorVtx.a : 1);
        float fVtxGlos = (uFlag_VtxA_Glos ? Input.vColorVtx.a : 0);

        float4 vVtxTintOpac = float4( vVtxTint, fVtxOpac ) * vMotifTintOpac;
        float4 vVtxEmisGlos = float4( vVtxEmis, fVtxGlos );

        // Compute normal...
        float3 vUnitNorm_WS = normalize( Input.vNormal_WS );

        // Compute base color...
        float3 vTC_BaseRGB = float3( Input.vTC_Base.xy, Params.m_uTexSliceIndexBaseRGB );
        float3 vTC_BaseA = float3( Params.m_fTexCoordScale_BaseA * Input.vTC_Base.zw, Params.m_uTexSliceIndexBaseA );
        float3 vTexColorBaseRGB = TexBase.Sample( SamplerBase, vTC_BaseRGB ).rgb;
        float fTexColorBaseA = TexBase.Sample( SamplerBase, vTC_BaseA ).a;
        float fDetailMult = lerp( 1, 2 * fTexColorBaseA, Params.m_Switch_fBaseA_Detl );
        vTexColorBaseRGB *= fDetailMult;

        // Compute reflection color...
        float4 vMotifCube = Spec_Motif( Params.m_MotifCube );
        float3 vUnitVtxToCam_WS = normalize( Spec_GetCamPos() - Input.vPos_WS );
        float3 vUnitReflect_WS = reflect( -vUnitVtxToCam_WS, vUnitNorm_WS );
        float3 vReflectionColor = vMotifCube.rgb * TexCube.Sample( SamplerCube, float4( vUnitReflect_WS, Params.m_uTexSliceIndexCube ) ).rgb;

        // Compute final values...
        float fFinalOpac = Input.fUnitFadeAlpha * vVtxTintOpac.a * (uFlag_BaseA_Opac ? fTexColorBaseA : 1);
        float fFinalGlos = vMotifEmisGlos.a * (vVtxEmisGlos.a + (uFlag_BaseA_Glos ? fTexColorBaseA : 0));
        float3 vFinalDiff = saturate( vVtxTintOpac.rgb * vTexColorBaseRGB ) + fFinalGlos * vReflectionColor;
        float3 vFinalEmis = vMotifEmisGlos.rgb * (vVtxEmisGlos.rgb + (uFlag_BaseA_Emis ? fTexColorBaseA : 0));

        uCoverage = Spec_AlphaToCoverage( fFinalOpac );

        // Store everything into our gbuffers...
        SpecGBufferSource_s GBufSource = Spec_GetDefaultGBufSource();
        GBufSource.m_vDiffuseColor = vFinalDiff;
        GBufSource.m_vEmissiveColor = vFinalDiff * vFinalEmis;
        GBufSource.m_vCentroidPosXY_SS = Input.vPos_HS.xy;
        GBufSource.m_vUnitNorm_WS = vUnitNorm_WS;
        GBufSource.m_fSpecUnitSharpness = Params.m_fSpecUnitSharpness;
        GBufSource.m_fSpecUnitIntensity = fFinalGlos;

        return Spec_PackGBuffer( GBufSource );
    }

Edit: After some more testing, it's really looking like early Z rejection just isn't working. Though, I'm not sure why yet. Do NVIDIA and ATI provide docs that describe the conditions which must be met to keep early Z rejection enabled?

##### Share on other sites
Nvidia has some guidelines in this doc, but they might be a bit out of date depending on which hardware you're working with. Alpha to coverage or outputting SV_Coverage can definitely mess with Z cull, so you might want to try disabling that to see if it makes a difference.

Also, if you want to see whether your pixel shader is actually running, you can issue a D3D11_QUERY_PIPELINE_STATISTICS query and read PSInvocations from the returned D3D11_QUERY_DATA_PIPELINE_STATISTICS to get the number of pixel shader invocations.

##### Share on other sites
Thanks for the info MJP. I was indeed not meeting some of those requirements. However, even after fixing things up the query reports no change in the number of PS invocations.

- PS no longer outputs SV_Coverage.
- Using ClearDepthStencilView() to clear the depth buffer.
- PS doesn't write depth.
- The direction of the depth test is <= while both writing and comparing the depth buffer and doesn't change in between.
- Depth buffer is a Texture2DMS (no array).
- The PS uses the XY components of the SV_Position semantic, but not the z component.
- The PS doesn't use clip, texkill, or discard.
- Alpha to coverage is disabled.

The depth buffer is, however, DXGI_FORMAT_D32_FLOAT, but the NVIDIA doc doesn't list that as a reason early Z would be disabled. I don't have a stencil buffer. I am using 2x MSAA render targets. I can see in PIX that the depth buffer has been written to by the pre-pass.

When writing the depth, I bind a read/write DSV to the pipeline and use this state:

D3D11_BLEND_DESC:
AlphaToCoverageEnable = 0
IndependentBlendEnable = 0
BlendEnable = 0
SrcBlend = ONE
DestBlend = ZERO
SrcBlendAlpha = ONE
DestBlendAlpha = ZERO

D3D11_DEPTH_STENCIL_DESC:
DepthEnable = 1
DepthFunc = LESS_EQUAL
(all stencil members are 0)

D3D11_RASTERIZER_DESC:
FillMode = SOLID
CullMode = BACK
FrontCounterClockwise = 1
DepthBias = 0
DepthBiasClamp = 0
SlopeScaledDepthBias = 0
DepthClipEnable = 1
ScissorEnable = 0
MultisampleEnable = 0
AntialiasedLineEnable = 0

When rendering the scene, I bind a read-only DSV to the pipeline and use this state:

D3D11_BLEND_DESC:
AlphaToCoverageEnable = 0
IndependentBlendEnable = 0
BlendEnable = 0
SrcBlend = ONE
DestBlend = ZERO
SrcBlendAlpha = ONE
DestBlendAlpha = ZERO

D3D11_DEPTH_STENCIL_DESC:
DepthEnable = 1
DepthFunc = LESS_EQUAL
(all stencil members are 0)

D3D11_RASTERIZER_DESC:
FillMode = SOLID
CullMode = BACK
FrontCounterClockwise = 1
DepthBias = 0
DepthBiasClamp = 0
SlopeScaledDepthBias = 0
DepthClipEnable = 1
ScissorEnable = 0
MultisampleEnable = 0
AntialiasedLineEnable = 0

    // Constant buffer with cam info:
    cbuffer SpecBuf_Camera
    {
        row_major float4x4 g_ProjCamMtx;
    };

    // Vertex in:
    struct VS_In
    {
        float3 vPos_WS : POSITION;
    };

    // Vertex out:
    struct VS_Out
    {
        float4 vPos_HS : SV_POSITION;
    };

    // Vertex shader:
    VS_Out VS( VS_In Input )
    {
        VS_Out Output;
        Output.vPos_HS = mul( g_ProjCamMtx, float4( Input.vPos_WS, 1 ) );
        return Output;
    }

    // Pixel shader:
    float4 PS( VS_Out Input ) : SV_TARGET
    {
        return 0;
    }

    // Technique:
    technique11 Terrain
    {
        pass P1
        {
            SetVertexShader( CompileShader( vs_5_0, VS() ) );
            SetPixelShader( CompileShader( ps_5_0, PS() ) );
        }
    }

What else could I be doing to turn off early Z?