About 360GAMZ

  1. DX11 [DX11] Update StructuredBuffer

    I wonder if this is a data alignment issue. Try aligning your array to a 16-byte boundary.
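A minimal C++ sketch of that suggestion, assuming the CPU-side data lives in an array of a hypothetical `ParticleData` struct (the field layout is made up for illustration). `alignas(16)` both aligns the start of the array and pads the element size to a multiple of 16, so every element lands on a 16-byte boundary; the `static_assert`s catch layout drift at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element type for a StructuredBuffer upload (layout is an assumption).
// alignas(16) raises the struct's alignment, which also pads sizeof to a multiple of 16.
struct alignas(16) ParticleData {
    float position[3];   // 12 bytes
    float size;          // 4 bytes
    float color[4];      // 16 bytes
    std::uint32_t flags; // 4 bytes; the struct is padded out to 48 bytes
};

// If the stride is a multiple of 16 and the first element is aligned,
// every element in an array starts on a 16-byte boundary.
static_assert(sizeof(ParticleData) % 16 == 0, "stride must be a multiple of 16");
static_assert(alignof(ParticleData) == 16, "struct must be 16-byte aligned");

// Staging array whose base address is 16-byte aligned.
alignas(16) ParticleData g_staging[256];

// Returns true if the pointer sits on a 16-byte boundary.
inline bool IsAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```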
  2. Jason, FWIW, I hope Microsoft addresses this in a future release. It would be nice if, when you bind a resource, DX automatically unbound it from whatever it was previously bound to. If DX did this, we'd all benefit from it instead of each of us having to write that management layer!
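A sketch of what such a management layer might look like, with the D3D11 calls left out: a hypothetical `BindingTracker` that remembers which resource occupies each shader resource (SRV) slot and each render target slot, and clears the conflicting binding before making a new one. `ResourceId` stands in for a real resource pointer, and the slot counts are arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a resource handle (e.g. an ID3D11Resource*).
using ResourceId = int;
constexpr ResourceId kNone = -1;

// Minimal bookkeeping: tracks what is bound where, and unbinds a resource
// from the opposite role (input vs. output) before binding it again.
class BindingTracker {
public:
    static constexpr std::size_t kSrvSlots = 8;
    static constexpr std::size_t kRtvSlots = 8;

    BindingTracker() {
        srv_.fill(kNone);
        rtv_.fill(kNone);
    }

    // Bind as shader input: first remove it from any render target slot.
    void BindSrv(std::size_t slot, ResourceId res) {
        for (auto& r : rtv_)
            if (r == res) r = kNone;
        srv_[slot] = res;
    }

    // Bind as render target: first remove it from any shader input slot.
    void BindRtv(std::size_t slot, ResourceId res) {
        for (auto& s : srv_)
            if (s == res) s = kNone;
        rtv_[slot] = res;
    }

    ResourceId Srv(std::size_t slot) const { return srv_[slot]; }
    ResourceId Rtv(std::size_t slot) const { return rtv_[slot]; }

private:
    std::array<ResourceId, kSrvSlots> srv_;
    std::array<ResourceId, kRtvSlots> rtv_;
};
```

In a real renderer, each place that resets a slot to `kNone` would also issue the matching D3D11 call (for example binding a null SRV to that slot), so the runtime never sees the same resource as both input and output.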
  3. DX11 Puzzling Fill Rate

    Matt, thanks to your book for pointing out the [earlydepthstencil] attribute for pixel shaders! I tried this, and the query now reports that over a million PS invocations have been eliminated (without a depth pre-pass, just drawing the scene from front to back as much as possible). However, I didn't notice much of a performance gain. Doing a full depth pre-pass increases the draw call count substantially, which negatively impacts frame rate. If anyone's interested, the book I'm referring to is Practical Rendering & Computation with Direct3D 11, and it's proven to be a valuable resource during my adventure through DX11.
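Drawing front to back is what lets early depth testing reject the occluded pixels of later draws. A minimal C++ sketch of that sort, assuming each opaque draw has been reduced to a hypothetical `DrawItem` with a precomputed view-space depth (for example, of its bounding-sphere center):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw record: which mesh to draw and its view-space depth.
struct DrawItem {
    int meshId;
    float viewDepth;
};

// Sort opaque draws nearest-first so closer surfaces fill the depth buffer
// before farther ones are rasterized.
inline void SortFrontToBack(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.viewDepth < b.viewDepth;
                     });
}
```

`std::stable_sort` keeps items at equal depth in their original submission order, so the draw order stays deterministic from frame to frame.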
  4. DX11 Puzzling Fill Rate

    Thanks for the info, MJP. I was indeed not meeting some of those requirements. However, even after fixing things up, the query reports no change in the number of PS invocations:

    - PS no longer outputs SV_Coverage.
    - Using ClearDepthStencilView() to clear the depth buffer.
    - PS doesn't write depth.
    - The direction of the depth test is <= both while writing and while comparing against the depth buffer, and it doesn't change in between.
    - Depth buffer is a Texture2DMS (no array).
    - The PS uses the XY components of the SV_Position semantic, but not the Z component.
    - The PS doesn't use clip, texkill, or discard.
    - Alpha to coverage is disabled.
    - SampleMask is always 0xFFFFFFFF.

    The depth buffer is, however, DXGI_FORMAT_D32_FLOAT, but the NVIDIA doc doesn't list that as a reason early Z would be disabled. I don't have a stencil buffer. I am using 2x MSAA render targets. I can see in PIX that the depth buffer has been written to by the pre-pass.

    When writing the depth, I bind a read/write DSV to the pipeline and use this state:

    - D3D11_BLEND_DESC: AlphaToCoverageEnable = 0, IndependentBlendEnable = 0, BlendEnable = 0, SrcBlend = ONE, DestBlend = ZERO, BlendOp = ADD, SrcBlendAlpha = ONE, DestBlendAlpha = ZERO, BlendOpAlpha = ADD, RenderTargetWriteMask = 0
    - D3D11_DEPTH_STENCIL_DESC: DepthEnable = 1, DepthWriteMask = ALL, DepthFunc = LESS_EQUAL (all stencil members are 0)
    - D3D11_RASTERIZER_DESC: FillMode = SOLID, CullMode = BACK, FrontCounterClockwise = 1, DepthBias = 0, DepthBiasClamp = 0, SlopeScaledDepthBias = 0, DepthClipEnable = 1, ScissorEnable = 0, MultisampleEnable = 0, AntialiasedLineEnable = 0

    When rendering the scene, I bind a read-only DSV to the pipeline and use this state:

    - D3D11_BLEND_DESC: same as the pre-pass except RenderTargetWriteMask = 15
    - D3D11_DEPTH_STENCIL_DESC: DepthEnable = 1, DepthWriteMask = ZERO, DepthFunc = LESS_EQUAL (all stencil members are 0)
    - D3D11_RASTERIZER_DESC: same as the pre-pass

    Here's the depth pre-pass shader:

    [code]
    // Constant buffer with cam info:
    cbuffer SpecBuf_Camera
    {
        row_major float4x4 g_ProjCamMtx;
    };

    // Vertex in:
    struct VS_In
    {
        float3 vPos_WS : POSITION;
    };

    // Vertex out:
    struct VS_Out
    {
        float4 vPos_HS : SV_POSITION;
    };

    // Vertex shader:
    VS_Out VS( VS_In Input )
    {
        VS_Out Output;
        Output.vPos_HS = mul( g_ProjCamMtx, float4( Input.vPos_WS, 1 ) );
        return Output;
    }

    // Pixel shader:
    float4 PS( VS_Out Input ) : SV_TARGET
    {
        return 0;
    }

    // Technique:
    technique11 Terrain
    {
        pass P1
        {
            SetVertexShader( CompileShader( vs_5_0, VS() ) );
            SetPixelShader( CompileShader( ps_5_0, PS() ) );
        }
    }
    [/code]

    What else could I be doing to turn off early Z?
  5. I've written a deferred renderer for DX11 and am optimizing its performance. While benchmarking fill rate, I ran into something puzzling. When I stub out one of the shaders used to render the scene (pre-lighting) so that it essentially has a null VS and its PS simply writes out constants to the GBuffers, I see a pretty significant increase in performance when rendering the scene. That seemed reasonable.

    So then I started thinking about doing a depth prepass, but before going through the work of implementing it, I decided to first try a simple test to see if it had potential. For the test, I simply cleared the depth buffer to 0 instead of 1. The idea is that every pixel in the scene would then be rejected before the PS, and I would see pretty much the same significant increase in performance as with the null shader above. However, I saw absolutely no speed increase at all. Does this imply that my PS is being executed even if it's occluded by closer depth values? How is that possible?

    Here's the PS and the functions it calls (see PSMain for the pixel shader):

    [code]
    StructuredBuffer<SpecBuf_Params_s> SpecBuf_Params;

    SpecBuf_Params_s GetParams( in uint uInstIndex, in uint uSubMtlIndex )
    {
        return SpecBuf_Params[ Spec_GetParamsIndex( uInstIndex, uSubMtlIndex ) ];
    }

    float4 Spec_Motif( SpecMotif_s Motif )
    {
        float4 vTableColor = g_avMotifColor[ Motif.m_uMotifIndex ];
        float4 vBiasedColor = (vTableColor * Motif.m_fScale + Motif.m_fOffset) * Motif.m_vBaseColor;
        float3 vFinalColor = (Motif.m_uFlags & 1) ? Motif.m_vBaseColor.rgb : vBiasedColor.rgb;
        float fFinalAlpha = (Motif.m_uFlags & 2) ? Motif.m_vBaseColor.a : vBiasedColor.a;
        return float4( vFinalColor, fFinalAlpha );
    }

    uint Spec_AlphaToCoverage( float fUnitAlpha )
    {
        uint uCoverage;
    #if SPEC_MSAA_COUNT == 2
        if( fUnitAlpha < (1.0f / 3.0f) ) { uCoverage = 0; }
        else if( fUnitAlpha < (2.0f / 3.0f) ) { uCoverage = 1; }
        else { uCoverage = 3; }
    #elif SPEC_MSAA_COUNT == 4
        if( fUnitAlpha < (1.0f / 5.0f) ) { uCoverage = 0; }
        else if( fUnitAlpha < (2.0f / 5.0f) ) { uCoverage = 1; }
        else if( fUnitAlpha < (3.0f / 5.0f) ) { uCoverage = 3; }
        else if( fUnitAlpha < (4.0f / 5.0f) ) { uCoverage = 7; }
        else { uCoverage = 15; }
    #else
        uCoverage = 0xffffffff;
    #endif
        return uCoverage;
    }

    SpecRawGBuffer_s Spec_PackGBuffer( in SpecGBufferSource_s Source )
    {
        SpecRawGBuffer_s RawGBuffer;

        // Compute flags field for Tex2...
        uint uEdgePixel = any( frac( Source.m_vCentroidPosXY_SS ) - 0.5f );
        uint uFlags = (Source.m_uNoAO << 5) | ((uEdgePixel & 1) << 4) | max( min( uint( Source.m_fSpecUnitSharpness * 15.0f ), 15 ), 1 );

        // Store values in packed GBuffer...
        RawGBuffer.m_vTex0 = float4( Source.m_vDiffuseColor, 0 );
        RawGBuffer.m_vTex1 = float4( Source.m_vEmissiveColor, Source.m_fSpecUnitIntensity );
        RawGBuffer.m_vuTex2 = uint4( (255.0f/2.0f) + (255.0f/2.0f)*Source.m_vUnitNorm_WS, uFlags );
        return RawGBuffer;
    }

    SpecRawGBuffer_s PSMain( VS_Out Input, out uint uCoverage : SV_Coverage ) : SV_TARGET
    {
        SpecBuf_Params_s Params = GetParams( Input.uInstanceID, Input.uSubMtlIndex );
        uint uFlags = Params.m_uFlags;
        uint uFlag_VtxRGB_Tint = uFlags & FLAG_VTX_RGB_TINT;
        uint uFlag_VtxRGB_Emis = uFlags & FLAG_VTX_RGB_EMIS;
        uint uFlag_VtxA_Tint = uFlags & FLAG_VTX_A_TINT;
        uint uFlag_VtxA_Emis = uFlags & FLAG_VTX_A_EMIS;
        uint uFlag_VtxA_Opac = uFlags & FLAG_VTX_A_OPAC;
        uint uFlag_VtxA_Glos = uFlags & FLAG_VTX_A_GLOS;
        uint uFlag_BaseA_Emis = uFlags & FLAG_BASE_A_EMIS;
        uint uFlag_BaseA_Opac = uFlags & FLAG_BASE_A_OPAC;
        uint uFlag_BaseA_Glos = uFlags & FLAG_BASE_A_GLOS;
        float4 vMotifTintOpac = Spec_Motif( Params.m_MotifTintOpac );
        float4 vMotifEmisGlos = Spec_Motif( Params.m_MotifEmisGlos );
        float3 vVtxTint = (uFlag_VtxRGB_Tint ? Input.vColorVtx.rgb : 1) * (uFlag_VtxA_Tint ? Input.vColorVtx.a : 1);
        float3 vVtxEmis = Params.m_fAddEmis + (uFlag_VtxRGB_Emis ? Input.vColorVtx.rgb : 0) + (uFlag_VtxA_Emis ? Input.vColorVtx.a : 0);
        float fVtxOpac = (uFlag_VtxA_Opac ? Input.vColorVtx.a : 1);
        float fVtxGlos = (uFlag_VtxA_Glos ? Input.vColorVtx.a : 0);
        float4 vVtxTintOpac = float4( vVtxTint, fVtxOpac ) * vMotifTintOpac;
        float4 vVtxEmisGlos = float4( vVtxEmis, fVtxGlos );

        // Compute normal...
        float3 vUnitNorm_WS = normalize( Input.vNormal_WS );

        // Compute base color...
        float3 vTC_BaseRGB = float3( Input.vTC_Base.xy, Params.m_uTexSliceIndexBaseRGB );
        float3 vTC_BaseA = float3( Params.m_fTexCoordScale_BaseA * Input.vTC_Base.zw, Params.m_uTexSliceIndexBaseA );
        float3 vTexColorBaseRGB = TexBase.Sample( SamplerBase, vTC_BaseRGB ).rgb;
        float fTexColorBaseA = TexBase.Sample( SamplerBase, vTC_BaseA ).a;
        float fDetailMult = lerp( 1, 2 * fTexColorBaseA, Params.m_Switch_fBaseA_Detl );
        vTexColorBaseRGB *= fDetailMult;

        // Compute reflection color...
        float4 vMotifCube = Spec_Motif( Params.m_MotifCube );
        float3 vUnitVtxToCam_WS = normalize( Spec_GetCamPos() - Input.vPos_WS );
        float3 vUnitReflect_WS = reflect( -vUnitVtxToCam_WS, vUnitNorm_WS );
        float3 vReflectionColor = vMotifCube.rgb * TexCube.Sample( SamplerCube, float4( vUnitReflect_WS, Params.m_uTexSliceIndexCube ) ).rgb;

        // Compute final values...
        float fFinalOpac = Input.fUnitFadeAlpha * vVtxTintOpac.a * (uFlag_BaseA_Opac ? fTexColorBaseA : 1);
        float fFinalGlos = vMotifEmisGlos.a * (vVtxEmisGlos.a + (uFlag_BaseA_Glos ? fTexColorBaseA : 0));
        float3 vFinalDiff = saturate( vVtxTintOpac.rgb * vTexColorBaseRGB ) + fFinalGlos * vReflectionColor;
        float3 vFinalEmis = vMotifEmisGlos.rgb * (vVtxEmisGlos.rgb + (uFlag_BaseA_Emis ? fTexColorBaseA : 0));
        uCoverage = Spec_AlphaToCoverage( fFinalOpac );

        // Store everything into our gbuffers...
        SpecGBufferSource_s GBufSource = Spec_GetDefaultGBufSource();
        GBufSource.m_vDiffuseColor = vFinalDiff;
        GBufSource.m_vEmissiveColor = vFinalDiff * vFinalEmis;
        GBufSource.m_vCentroidPosXY_SS = Input.vPos_HS.xy;
        GBufSource.m_vUnitNorm_WS = vUnitNorm_WS;
        GBufSource.m_fSpecUnitSharpness = Params.m_fSpecUnitSharpness;
        GBufSource.m_fSpecUnitIntensity = fFinalGlos;
        return Spec_PackGBuffer( GBufSource );
    }
    [/code]

    Edit: After some more testing, it's really looking like early Z rejection just isn't working, though I'm not sure why yet. Do NVIDIA and ATI provide docs that describe the conditions which must be met to keep early Z rejection enabled?
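When reading that shader, note that `PSMain` writes `SV_Coverage` through `Spec_AlphaToCoverage()`; outputting coverage from the pixel shader is one of the conditions that can disable early depth rejection. To check the coverage mapping itself on the CPU, here is a straight C++ port of that helper. The only change is that the MSAA count is a function argument instead of the `SPEC_MSAA_COUNT` define.

```cpp
#include <cassert>
#include <cstdint>

// C++ port of Spec_AlphaToCoverage(): maps a 0..1 alpha to a sample coverage
// mask, turning on one more sample per alpha band.
inline std::uint32_t AlphaToCoverage(float unitAlpha, int msaaCount) {
    if (msaaCount == 2) {
        if (unitAlpha < 1.0f / 3.0f) return 0x0;
        if (unitAlpha < 2.0f / 3.0f) return 0x1;
        return 0x3;
    }
    if (msaaCount == 4) {
        if (unitAlpha < 1.0f / 5.0f) return 0x0;
        if (unitAlpha < 2.0f / 5.0f) return 0x1;
        if (unitAlpha < 3.0f / 5.0f) return 0x3;
        if (unitAlpha < 4.0f / 5.0f) return 0x7;
        return 0xF;
    }
    return 0xFFFFFFFFu; // any other sample count: all samples covered
}
```

The masks grow one bit at a time (0, 1, 3 for 2x; 0, 1, 3, 7, 15 for 4x), so each alpha band covers exactly one more sample than the band below it.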
  6. [quote name='mhagain' timestamp='1328194547' post='4908717'] So right now I'm just casting the pData member of D3D11_MAPPED_SUBRESOURCE to my vertex structure type, incrementing the pointer by the required amount, and writing in. Like I said, that seems to work and it doesn't cause any performance loss, but in the absence of any documented description of what's happening (and it could be as simple as "yeah, do this" and "no, don't do that") it feels an awful lot like throwing magic pixie dust in the air and seeing what comes down. That's not fun. D3D9 had a clear and well-documented approach for this buffer usage pattern (and let's leave out buffer types to which it doesn't apply), D3D11 doesn't. [/quote] What you're describing sounds correct to me and consistent with the link provided by Erik above. The DX11 docs also suggest that you use NO_OVERWRITE along with DISCARD to ensure you won't be overwriting any data the GPU may be using. It's the same as with DX9, but the DX9 docs explain it a lot better (i.e. use DISCARD when you reach the end of the buffer). As you pointed out, the difference between the DX9 and DX11 APIs is that in DX9 you specified the region to lock, whereas DX11 gives you a pointer to the start of the buffer and you must apply the offset yourself. I agree that the docs could explain all of this better.
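The append-then-discard pattern described above, as a minimal C++ sketch. The D3D11 `Map`/`Unmap` calls are left out: `MapMode` is a local stand-in for `D3D11_MAP_WRITE_DISCARD` and `D3D11_MAP_WRITE_NO_OVERWRITE`, and `DynamicRing` is a hypothetical helper that returns which flag to map with and the byte offset to add to `pData`.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for the two D3D11 map flags used by this pattern.
enum class MapMode { Discard, NoOverwrite };

struct Allocation {
    MapMode mode;
    std::size_t offset; // bytes to add to D3D11_MAPPED_SUBRESOURCE::pData
};

// Hypothetical ring allocator for a DYNAMIC vertex buffer: append with
// NO_OVERWRITE while data fits, DISCARD and restart at 0 when it doesn't.
class DynamicRing {
public:
    explicit DynamicRing(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // Assumes bytes <= capacity.
    Allocation Allocate(std::size_t bytes) {
        MapMode mode = MapMode::NoOverwrite;
        if (first_ || head_ + bytes > capacity_) {
            // The GPU may still be reading earlier data, so request a fresh buffer.
            mode = MapMode::Discard;
            head_ = 0;
            first_ = false;
        }
        Allocation a{mode, head_};
        head_ += bytes;
        return a;
    }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
    bool first_ = true; // the first map of the frame/buffer uses DISCARD
};
```

After mapping with the returned mode, you'd write at `static_cast<char*>(mapped.pData) + alloc.offset`. Keeping each allocation a multiple of the vertex stride lets the draw call use `alloc.offset / stride` as its start vertex.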
  7. For vertex & index buffers, you should be able to use the NO_OVERWRITE/DISCARD locking pattern just as in the DX9 days. You'll just need to make sure you create the resources with DYNAMIC usage and CPU write access. For Structured Buffers, Raw Buffers, and normal buffers, NO_OVERWRITE isn't supported, which is very unfortunate. So the only way to incrementally update one of these types of buffers is through UpdateSubresource(). In this case, the resource needs to be created with DEFAULT usage and no CPU write access.
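For the `UpdateSubresource()` route, updating only part of a buffer means passing a destination box. For buffers, the box's `left`/`right` are byte offsets, and the other two dimensions span exactly one unit. A small C++ helper, using a local `Box` struct with the same field order as `D3D11_BOX` so the sketch compiles without the SDK headers:

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in with the same field order as D3D11_BOX.
struct Box {
    std::uint32_t left, top, front, right, bottom, back;
};

// Destination box covering elements [firstElement, firstElement + elementCount)
// of a buffer whose elements are elementStride bytes apart.
inline Box BufferRangeBox(std::uint32_t firstElement, std::uint32_t elementCount,
                          std::uint32_t elementStride) {
    const std::uint32_t begin = firstElement * elementStride;
    const std::uint32_t end = begin + elementCount * elementStride;
    // For buffers: x is in bytes; y and z must be [0, 1).
    return Box{begin, 0, 0, end, 1, 1};
}
```

With a real `D3D11_BOX` filled the same way, the call would be `context->UpdateSubresource(buffer, 0, &box, srcData, 0, 0);`, where `srcData` points at the first updated element.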
  8. [quote name='360GAMZ' timestamp='1328051880' post='4908205'] I'm unclear as to whether our Structured Buffer should be created as DEFAULT or DYNAMIC for best performance with UpdateSubresource. [/quote] Update: UpdateSubresource requires that the resource have been created as DEFAULT without CPU write access.
  9. On a related note, D3D11_MAP_WRITE_NO_OVERWRITE is not allowed with Structured Buffers, Raw Buffers, or regular buffers. This is unfortunate when you want to dynamically append data to what you've already written. DX 11.1 fixes this, but I'm not sure when it'll be released, whether it'll be available on Windows 7, and whether existing DX11 graphics cards will support it.

    Assuming that DX 11.1 isn't an option, the only ways we have of updating a Structured Buffer are D3D11_MAP_WRITE_DISCARD or UpdateSubresource. For our app, DISCARD won't work because it makes all previously written data unavailable to further draw calls, and our shaders need access to the entire buffer. So that leaves UpdateSubresource as our only option, I believe.

    I'm unclear as to whether our Structured Buffer should be created as DEFAULT or DYNAMIC for best performance with UpdateSubresource. Structured Buffers created as DYNAMIC reside permanently in system memory. The data is streamed to the graphics card over the bus as the shader needs it. I assume that when UpdateSubresource is called, the driver makes a copy of the data in temp memory and then simply copies it to the buffer in system memory when it's safe to. Contrast this with D3D11_MAP_WRITE_NO_OVERWRITE on vertex buffers, where the app can write directly to the destination memory location without the driver making a copy. Structured Buffers created as DEFAULT reside in video memory with faster access by the shader. When UpdateSubresource is called, I assume the driver still needs to make a copy of the data in temp memory and then upload it to video memory at a safe time. Still a lot of copying. If anyone has experience with the fastest way to incrementally update Structured Buffers, I'd love to hear it!
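One way to cut down the number of `UpdateSubresource()` calls, assuming the app keeps a CPU-side shadow copy of the whole structured buffer: record the lowest and highest element touched during the frame, then upload that single range once before drawing. A hypothetical `DirtyRange` helper in C++:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: tracks the smallest contiguous element range that
// covers every write since the last upload.
class DirtyRange {
public:
    // Record a write of `count` elements starting at `first`.
    void Mark(std::size_t first, std::size_t count) {
        if (count == 0) return;
        begin_ = std::min(begin_, first);
        end_ = std::max(end_, first + count);
    }

    bool Empty() const { return begin_ >= end_; }
    std::size_t Begin() const { return begin_; }
    std::size_t End() const { return end_; } // one past the last dirty element

    // Call after uploading [Begin(), End()) from the shadow copy.
    void Reset() {
        begin_ = std::numeric_limits<std::size_t>::max();
        end_ = 0;
    }

private:
    std::size_t begin_ = std::numeric_limits<std::size_t>::max();
    std::size_t end_ = 0;
};
```

The trade-off is that untouched elements between two distant writes are re-uploaded too; if writes are sparse, flushing several smaller ranges may be cheaper.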
  10. [quote name='MJP' timestamp='1316110690' post='4862152'] BC6 and BC7 are really awesome (HDR, and hi-quality LDR respectively), but only available on DX11-class hardware. There's also not really any tools support for it yet. The D3DX library can encode to it, but it's super slow. There's also a sample in the SDK that does the encoding on the GPU using a compute shader, but it's pretty bare bones and doesn't support cube maps or mipmaps [/quote] MJP, this post wasn't all that long ago, but I'm wondering whether you've since come across a decent compression tool that supports BC6/7.
  11. Could somebody point me to documentation that describes what the MSAA sample offsets are in HLSL SM5 for various MSAA sample counts (2, 4, etc.)? I'm having trouble locating it. Thanks!
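For reference, a C++ table of the standard D3D11 sample positions for 2x and 4x MSAA, in 1/16-pixel units relative to the pixel center. These values are transcribed from memory of the D3D11 documentation for `D3D11_STANDARD_MULTISAMPLE_PATTERN`, so verify them there; they only apply to render targets created with that standard-pattern quality level. In SM5 HLSL the positions can also be queried at run time with `Texture2DMS::GetSamplePosition()` or `GetRenderTargetSamplePosition()`.

```cpp
#include <cassert>

// Offset of one sample from the pixel center, in 1/16-pixel units.
struct SampleOffset {
    int x, y;
};

// Standard-pattern positions (from memory; check the D3D11 docs).
constexpr SampleOffset kStandard2x[2] = {{4, 4}, {-4, -4}};
constexpr SampleOffset kStandard4x[4] = {{-2, -6}, {6, -2}, {-6, 2}, {2, 6}};

// Convert a 1/16-pixel offset to a fraction of a pixel (e.g. 4 -> 0.25).
constexpr float ToPixels(int sixteenths) { return sixteenths / 16.0f; }
```

Both patterns are symmetric about the pixel center, which is why their offsets sum to zero.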
  12. Thanks for the incredibly helpful reply, MJP! [quote] ...but you couldn't also use other blending modes like multiply or screen. [/quote] It's not clear to me why rendering translucent geo into a render target with the blend mode set to multiply wouldn't work. [quote]Just sampling the first subsample and outputting it to SV_Depth should work well enough. Obviously you don't get MSAA with your transparents if you go this route.[/quote] So I bind the depth buffer as a SRV and run the pixel shader at per-pixel frequency by not specifying SV_SampleIndex as an input to the shader? Then, just simply read the depth texture and write it out to SV_Depth? It sounds like this method (depth buffer resolve shader) is a better choice for our application. We draw a lot of translucent particles like smoke and so rendering that into a non-MSAA buffer sounds like less bandwidth. And since the particles tend to have smooth texture edges, MSAA probably wouldn't benefit us much.
  13. I've run into a problem trying to render translucent objects into the scene after the deferred rendering has finished with the opaque objects. Since a picture is worth a thousand words, here's my current DX11 rendering pipeline:

    [Image: current DX11 rendering pipeline]

    Since the translucent objects need to sort against the opaque scene, I want to reuse the depth buffer created during the deferred pass. However, the depth buffer is MSAA while the final render target is non-MSAA, so they can't be used together. Here's one possible solution:

    [Image: proposed alternative pipeline]

    Here, the Lauritzen resolve shader is replaced with a shader that converts the flat StructuredBuffer into an MSAA render target (compute shaders cannot write to MSAA buffers, which is why Lauritzen uses a flat StructuredBuffer that holds all MSAA samples of the image). Since the lit render target is now MSAA, it can be used in conjunction with the MSAA depth buffer to render translucent objects. Finally, the ID3D11DeviceContext::ResolveSubresource() method is used to resolve the MSAA buffer to a non-MSAA buffer such as the back buffer.

    Before I undertake this approach, I thought it would be a good idea to get feedback from the gurus here on this approach vs. any others that may come up. Here are a few questions:

    1) Is it possible to write such a shader to convert the flat buffer to a hardware-compliant MSAA render target (meaning something the hardware can resolve to a non-MSAA buffer)? I'm not so sure this is possible since the flat buffer contains only the sample colors and no coverage mask.
    2) If this method isn't possible, what are my alternatives? Can a depth buffer be resolved with ID3D11DeviceContext::ResolveSubresource()? If so, then Method 1 becomes much easier.

    [EDIT]: I've confirmed that an MSAA depth buffer cannot be resolved to non-MSAA.
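A sketch of the flat-buffer side of this in C++. The layout is an assumption (samples of a pixel stored contiguously, pixels in row-major order), not necessarily what Lauritzen's demo uses. `FlatIndex()` is the addressing a conversion shader would need: a pixel shader that takes `SV_SampleIndex` as an input runs once per sample, so it can read element `FlatIndex(x, y, s, ...)` and write it to sample `s` of the MSAA target, which should not require a separate coverage mask. `ResolveAverage()` shows a box-filter resolve of the same data, which is typically what a hardware resolve produces for ordinary color formats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed flat layout: all samples of a pixel are contiguous, pixels row-major.
inline std::size_t FlatIndex(std::size_t x, std::size_t y, std::size_t s,
                             std::size_t width, std::size_t sampleCount) {
    return (y * width + x) * sampleCount + s;
}

// Box-filter resolve of one channel: average every sample of each pixel.
inline std::vector<float> ResolveAverage(const std::vector<float>& flat,
                                         std::size_t width, std::size_t height,
                                         std::size_t sampleCount) {
    std::vector<float> out(width * height, 0.0f);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (std::size_t s = 0; s < sampleCount; ++s)
                sum += flat[FlatIndex(x, y, s, width, sampleCount)];
            out[y * width + x] = sum / static_cast<float>(sampleCount);
        }
    }
    return out;
}
```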
  14. Deferred Tech

    Just a place to put some diagrams related to deferred rendering.
  15. Ardilla, I'm not sure about consoles, but on PC, tile-based has been superior in my experience. Andrew Lauritzen has a paper and a full demo with source code that lets you play around with various methods, including tile-based vs. quad-based: [url="http://visual-computing.intel-research.net/art/publications/deferred_rendering/"]http://visual-computing.intel-research.net/art/publications/deferred_rendering/[/url]
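To illustrate what tile-based means here: the screen is split into fixed-size tiles, each tile gets a list of the lights whose screen-space bounds overlap it, and each pixel then loops only over its tile's list. A CPU-side C++ sketch of the binning step, assuming each light has already been projected to a hypothetical pixel rectangle `PixelRect`. It ignores per-tile depth bounds and is not code from Lauritzen's demo; the 16-pixel tile size is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical screen-space bounds of a light, in pixels (min inclusive, max exclusive).
struct PixelRect {
    int minX, minY, maxX, maxY;
};

// Returns, for each tile (row-major), the indices of the lights overlapping it.
inline std::vector<std::vector<int>> BinLightsToTiles(const std::vector<PixelRect>& lights,
                                                      int screenW, int screenH,
                                                      int tileSize = 16) {
    const int tilesX = (screenW + tileSize - 1) / tileSize;
    const int tilesY = (screenH + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(static_cast<std::size_t>(tilesX * tilesY));
    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        const PixelRect& r = lights[i];
        // Clamp the rectangle to the screen.
        const int x0 = std::max(r.minX, 0), y0 = std::max(r.minY, 0);
        const int x1 = std::min(r.maxX, screenW), y1 = std::min(r.maxY, screenH);
        if (x0 >= x1 || y0 >= y1) continue; // off-screen or empty
        // Convert pixel bounds to an inclusive tile range and append the light.
        for (int ty = y0 / tileSize; ty <= (y1 - 1) / tileSize; ++ty)
            for (int tx = x0 / tileSize; tx <= (x1 - 1) / tileSize; ++tx)
                tiles[static_cast<std::size_t>(ty * tilesX + tx)].push_back(i);
    }
    return tiles;
}
```

Quad-based shading instead draws a screen-space quad per light and re-reads the G-buffer for every light that touches a pixel; binning lets a tile read the G-buffer once and accumulate all of its lights.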