DX11 Does the "discard" statement incur slowdown?


I've been reading in a number of articles that using the discard statement in a pixel shader is a bad idea. 


My question is this: if I already have an if() statement in my shader, will adding discard inside the if block cause any slowdown? Or should I just set the alpha of that pixel to zero?


I'm using DX11 to display 2D graphics, so I'm not doing anything fancy, and the shader that would contain the discard statement is only used to display text in one specific case. My tests have shown no difference, but I'm afraid I might be missing something.




I am not sure whether it is relevant to what you asked, but from what I learned during my internship: on GCN, killing a pixel in the shader disables early-Z, which means your shader is executed even for occluded pixels (unless you use Re-Z, which is still slower than early-Z). That will be slow if your pixel shader is expensive and you have a lot of overlapping pixels.

Edited by Mr_Fox


It varies. In some cases it can speed things up, and in others it can reduce performance. You need to test for your specific usage, on the hardware that you care about, and see what happens.


One way it can speed things up is when you're hitting memory bandwidth limits on writing to the frame buffer. In my experience this mostly affects lower end graphics cards. In those cases using discard to implement an alpha test so you don't write out fully transparent pixels can improve performance. This might apply to the rendering of a particle system, for example.


On the other hand, in shaders that write to the depth buffer, using discard can hurt performance. This is because using discard has a side effect of disabling some of the hardware's depth buffer optimizations, because it makes the depth output of the shader impossible to predict before the shader runs. Disabling those optimizations can make future draw calls go more slowly, especially ones that would fail the depth test.
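To make the depth-optimization point concrete, here is a minimal CPU-side sketch in C++, not real driver or GPU behavior. It assumes a toy model in which a shader without discard gets its depth test before shading, while a shader with discard falls back to testing after shading. The names `Fragment` and `countShaderInvocations` are made up for this illustration, smaller depth means closer, and every `pixel` index is assumed to be below `pixelCount`.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// One rasterized fragment: the pixel it lands on and its depth (smaller = closer).
struct Fragment {
    std::size_t pixel;
    float depth;
};

// Counts how many fragments would run the pixel shader in this toy model.
// earlyZ == true : the depth test runs before shading, so occluded fragments are skipped.
// earlyZ == false: every fragment is shaded and the depth test runs afterwards,
//                  which is the fallback assumed here for shaders that use discard.
std::size_t countShaderInvocations(const std::vector<Fragment>& fragments,
                                   std::size_t pixelCount, bool earlyZ) {
    std::vector<float> depthBuffer(pixelCount,
                                   std::numeric_limits<float>::infinity());
    std::size_t invocations = 0;
    for (const Fragment& f : fragments) {
        const bool passes = f.depth < depthBuffer[f.pixel];
        if (earlyZ && !passes) {
            continue;  // rejected before the shader ever runs
        }
        ++invocations;  // the pixel shader executes for this fragment
        if (passes) {
            depthBuffer[f.pixel] = f.depth;  // depth write (no discard taken here)
        }
    }
    return invocations;
}
```

With several overlapping layers drawn front to back, the early-Z count stays close to the number of visible pixels while the late-Z count grows with overdraw. That is why the cost shows up mostly with expensive shaders and heavy overlap.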


In addition, note that enabling alpha blending can also have a performance cost: it uses more memory bandwidth than opaque rendering, because it has to do a read-modify-write of the frame buffer instead of just a write.
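As a back-of-the-envelope illustration of the bandwidth argument, here is a small C++ sketch. It assumes a deliberately naive model: an opaque pixel costs one frame buffer write, a blended pixel costs one read plus one write, and a discarded pixel costs nothing. Real hardware has caches and frame buffer compression that this ignores, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Naive frame buffer traffic for opaque rendering: one write per covered pixel.
std::uint64_t opaqueFrameBufferBytes(std::uint64_t coveredPixels,
                                     std::uint64_t bytesPerPixel) {
    return coveredPixels * bytesPerPixel;
}

// Naive frame buffer traffic for a blended layer: read + write per pixel.
// If useDiscard is true, fully transparent pixels are killed in the shader
// and are assumed to generate no frame buffer traffic at all.
std::uint64_t blendedFrameBufferBytes(std::uint64_t coveredPixels,
                                      std::uint64_t fullyTransparentPixels,
                                      std::uint64_t bytesPerPixel,
                                      bool useDiscard) {
    // Clamp so the subtraction below can never underflow.
    const std::uint64_t transparent =
        std::min(fullyTransparentPixels, coveredPixels);
    const std::uint64_t blended =
        useDiscard ? coveredPixels - transparent : coveredPixels;
    return blended * 2 * bytesPerPixel;
}
```

Under these assumptions, a particle quad whose texture is mostly empty saves a large share of its blend traffic by discarding the empty texels. Whether that outweighs the depth-related costs is exactly what has to be measured on the target hardware.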

Edited by Adam_42

Typically a shader binary will include a flag internally that specifies whether the program contains a discard instruction or not, because it impacts the GPU at a pretty high level.

Mobile and desktop have very different perf characteristics here. Mobile GPUs tend to use "deferred" architectures internally, which don't actually execute pixel shaders immediately after rasterization - instead they record which triangle covers each pixel, and then run one PS for each pixel at the end.
'Discard' completely messes with this, so these mobile GPUs have to fall back to a less optimized approach.

On PC, 'discard' often messes with early-Z / Hi-Z optimizations, which affects the impact of overdraw in your scenes.

In both cases, it's common to try to render all your opaque objects first, followed by your objects that make use of the discard instruction, followed by transparent/blended objects.
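The ordering described above can be sketched on the CPU side roughly like this. It is a minimal example in C++ that assumes each draw is tagged with a pass type up front. `Pass`, `DrawItem` and `sortByPass` are hypothetical names, not part of any D3D11 API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Pass categories in the order they should be submitted.
enum class Pass { Opaque = 0, AlphaTested = 1, Blended = 2 };

struct DrawItem {
    std::string name;  // label used only for this illustration
    Pass pass;
};

// Orders draws as: opaque first, then shaders that use discard, then blended.
// stable_sort keeps the existing relative order inside each category, so any
// front-to-back or back-to-front sorting done earlier is preserved.
void sortByPass(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return static_cast<int>(a.pass) <
                                static_cast<int>(b.pass);
                     });
}
```

A real renderer would usually fold this into a larger sort key (depth, material, and so on), but the pass category would typically sit in the most significant bits so this ordering always wins.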

Despite these impacts, as mentioned above, discard can help in other cases. If a pixel is discarded early, any texture fetches that occur after the discard statement should have no memory bandwidth impact, and the final PS output should also have no bandwidth impact, assuming you're not using a write-mask.


Here is a good article about pixel shaders in which the author also describes discard.


It says:



Another pixel shader specific is the discard instruction. A pixel shader can decide to “kill” the current pixel, which means it won’t get written. Again, if all pixels inside a batch get discarded, the shader unit can stop and go to another batch; but if there’s at least one thread left standing, the rest will be dragged along. DX11 adds more fine-grained control here by way of writing the output pixel coverage from the pixel shader (this is always ANDed with the original triangle/Z-test coverage, to make sure that a shader can’t write outside its primitive, for sanity). This allows the shader to discard individual samples instead of whole pixels; it can be used to implement Alpha-to-Coverage with a custom dithering algorithm in the shader, for example.


I bolded the relevant part for your question.
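The two mechanisms in that quote can be modelled with plain bit masks. This is a small C++ sketch, not GPU code: it assumes one bit per lane (or per sample) in a 32-bit mask, and the function names are invented for the illustration. In HLSL, the shader-written coverage corresponds to the `SV_Coverage` output.

```cpp
#include <cstdint>

// Removes the lanes that executed discard from the set of live lanes.
std::uint32_t discardLanes(std::uint32_t activeMask, std::uint32_t discardMask) {
    return activeMask & ~discardMask;
}

// A batch can stop early only when no lane is left alive. If even one lane
// survives, the whole batch keeps executing and the dead lanes are dragged along.
bool batchCanRetire(std::uint32_t activeMask) {
    return activeMask == 0;
}

// Coverage written by the shader is ANDed with the rasterizer/Z-test coverage,
// so the shader can turn samples off but can never turn on a sample that lies
// outside the primitive.
std::uint32_t finalCoverage(std::uint32_t rasterCoverage,
                            std::uint32_t shaderCoverage) {
    return rasterCoverage & shaderCoverage;
}
```

For example, discarding three of four lanes leaves the batch running for the last one, so the shader cost is saved only when every lane in the batch takes the discard path.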
