Why does discarding pixels take a noticeable performance hit?

Started by
7 comments, last by xlope01 10 years ago

Something as simple as this:


if(textureColor.a < 0.5f)
	discard;

takes around a 25% hit in frame rate (assuming all the objects in your scene use it).


Normally, GPUs can do depth testing and writing before running a fragment shader. When there's a possibility that a fragment might be discarded, the GPU has to do the depth test/write after running the fragment shader (otherwise it risks writing pixels to the z-buffer that end up discarded). This means that this simple addition can cause a lot of extra work to be done.

Also, GPUs have various optimisations for depth buffers, such as hierarchical depth and depth-buffer compression, going on under the hood; it's possible that these are also being bypassed by the possibility of a discard.

That's just a simple explanation. For more detail this article is great: http://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/ (parts 7, 8 and 9 look relevant)
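As an aside not raised in the thread: Shader Model 5 HLSL has an [earlydepthstencil] attribute that forces the depth/stencil test to run before the shader. A hedged sketch of how it interacts with discard (the texture, sampler, and struct names here are illustrative, not from the thread):

```hlsl
Texture2D diffuseTexture : register(t0);
SamplerState linearSampler : register(s0);

struct PixelInput
{
    float4 position : SV_Position;
    float2 uv       : TEXCOORD0;
};

// [earlydepthstencil] forces the depth/stencil test before this shader runs.
// Beware: combined with discard, depth is then written even for pixels the
// shader later discards, which is usually NOT what alpha-tested foliage wants.
[earlydepthstencil]
float4 PSMain(PixelInput input) : SV_Target
{
    float4 textureColor = diffuseTexture.Sample(linearSampler, input.uv);
    if (textureColor.a < 0.5f)
        discard;
    return textureColor;
}
```

In other words, you can get early-z back, but only by changing what discard means for the depth buffer, so it's rarely a drop-in fix.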

For a discarded pixel you've already paid the cost of its fragment shader even though it doesn't write anything. Further, you'll still have to run the fragment shader for the visible geometry behind it, so that's an added cost. This really adds up if there are lots of overlapping discarded pixels: lots of fragment shader computation performed, with the results thrown away.

Things that you may try in this case:

- use discard / clip only with objects that absolutely need it; fully opaque objects should use a pixel shader without discard / clip in it

- optimize meshes - this works best with particles, but trimming the mesh's size / shape to fit the texture may reduce pixel shader invocations

Cheers!

Some other tips:

  • If you can replace the discard call with an alpha blending function you'll be able to remove it entirely. You'll pay some overhead for blending of course, but it should balance out in your favour. Look at the step function for one way of handling "< 0.5" cases. For your example, something like "output.a *= step(0.5, textureColor.a)" with an alpha blending mode would be appropriate.
  • If you have a GS active, and if you can move the comparison you're making the discard on to the GS, you could have that output a zero-area triangle instead of discarding in the pixel shader. You'll pay some overhead for the GS branching here, but the triangle won't even get rasterized, so again you should win.
  • If you're always comparing texture alpha to a fixed value then you could just hack the texture alpha values at load time - if they're less than 0.5 slam them to 0 there.
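The step-based replacement from the first tip might look like the following in HLSL. This is only a sketch of the idea; the texture, sampler, and struct names are illustrative, and it assumes an alpha blend state (SrcAlpha / InvSrcAlpha) with depth writes disabled:

```hlsl
Texture2D diffuseTexture : register(t0);
SamplerState linearSampler : register(s0);

struct PixelInput
{
    float4 position : SV_Position;
    float2 uv       : TEXCOORD0;
};

float4 PSFoliage(PixelInput input) : SV_Target
{
    float4 textureColor = diffuseTexture.Sample(linearSampler, input.uv);
    // step(0.5, a) is 0 when a < 0.5 and 1 otherwise, so texels the original
    // shader would have discarded become fully transparent instead.
    textureColor.a *= step(0.5f, textureColor.a);
    return textureColor;
}
```

The shader itself no longer branches or discards; the "kill" is moved into the blend stage.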
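The load-time version of the last tip could be a simple pass over the decoded RGBA8 pixels. A minimal sketch, assuming tightly packed RGBA8 data; the function name and the byte threshold of 128 (≈ 0.5) are illustrative:

```c
#include <stddef.h>

/* Hypothetical load-time fixup: for an RGBA8 image, force alpha values below
 * the 0.5 cutoff (128 as a byte) down to fully transparent, baking the
 * shader's comparison result into the texture data itself. */
void slam_alpha_below_half(unsigned char *rgba, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        if (rgba[i * 4 + 3] < 128)  /* alpha is the 4th byte of each pixel */
            rgba[i * 4 + 3] = 0;
    }
}
```

This only helps if the shader's threshold is fixed; a per-material cutoff would still need the comparison at draw time.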

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

One reason to do what the OP does is that it can give smooth edges on magnified textures, and it can easily be used regardless of rendering order. The reasons for the slowdown seem well outlined by others; I just wanted to point out that I have seen the opposite behaviour, where adding discard for alpha < 0.5 increased performance for alpha-blended triangles that have large areas with alpha = 0. However, the alpha-blended geometry in my case was drawn back to front, so there was no need for depth writes and hence no conditional depth writes to pay for.

Using the technique only on triangles that need it (and if the reason for it is not rendering order, possibly combined with drawing affected geometry last) should limit the performance impact.

  • If you can replace the discard call with an alpha blending function you'll be able to remove it entirely. You'll pay some overhead for blending of course, but it should balance out in your favour. Look at the step function for one way of handling "< 0.5" cases. For your example, something like "output.a *= step (0.5, textureColor.a)" with an alpha blending mode would be appropriate.

I'm not sure I agree with this advice. The main issue with switching to alpha blending isn't the blending itself, but the fact that you can no longer write to your depth buffer while using it. This can result in overdraw, which increases your shader costs, and it also poses problems for deferred techniques or post-processing stages that rely on the depth buffer.

That depends on exactly what the OP is trying to achieve. It would be useful to have that info; right now it feels like we've jumped into the middle of a problem-solving exercise and are trying to assist with the OP's chosen solution, without knowing the original objective that prompted it. There may be other, completely different approaches.

I need to discard the pixels so that the alpha part of the texture is not visible; I use this for foliage. I can minimize the performance hit by using "discard" only on the parts that need it, but doing that means breaking objects apart, creating more objects and therefore more CPU load. (For example, a building with some foliage on it is currently one object; now it would become two objects.)

This topic is closed to new replies.
