Does the "discard" statement incur a slowdown?

4 comments, last by Eric F. 7 years, 3 months ago

I've read in a number of articles that using the discard statement in a pixel shader is a bad idea.

My question is this: if I already have an if() statement in my shader, will adding discard inside the if block cause any slowdown? Or should I just set the alpha of that pixel to zero?

I'm using DX11 to display 2D graphics, so I'm not doing anything fancy, and the shader I'd use discard in will only be used to display text in one specific case. My tests have shown no difference, but I'm afraid I might be missing something.

Thanks!


I'm not sure whether this is relevant to your question, but here's what I learned during my internship: on GCN, killing a pixel in the shader disables early-Z. That means your shader will be executed even for occluded pixels (unless you use ReZ, which is still slower than early-Z), and that will be slow if your pixel shader is expensive and you have a lot of overlapping pixels.
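To make the overdraw point concrete, here is a toy C++ model (my own simplification, not a description of GCN internals) that takes the claim above literally: with early-Z, occluded layers are rejected before the pixel shader runs; when discard forces the depth test to happen after the shader, every overlapping layer gets shaded first. The function names are hypothetical.

```cpp
#include <cassert>

// Toy model (assumption): `layers` opaque surfaces cover the same pixel and
// are drawn front to back. Returns how many times the pixel shader runs there.
int pixelShaderInvocations(int layers, bool earlyZ) {
    assert(layers >= 0);
    if (layers == 0) return 0;
    // Early-Z: the depth test runs before shading, so everything behind the
    // nearest layer is rejected without invoking the pixel shader.
    if (earlyZ) return 1;
    // Late-Z (e.g. the shader contains discard): the shader runs first and the
    // depth test happens afterwards, so every layer is shaded.
    return layers;
}

// Shading work for one pixel, in arbitrary cost units per invocation.
int shadingCost(int layers, bool earlyZ, int costPerInvocation) {
    return pixelShaderInvocations(layers, earlyZ) * costPerInvocation;
}
```

For example, four overlapping layers cost four times the shading work in the late-Z case, which only matters if the shader is expensive. ReZ and other partial optimizations are not modelled here.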

It varies. In some cases it can speed things up, and in others it can reduce performance. You need to test for your specific usage, on the hardware that you care about, and see what happens.

One way it can speed things up is when you're hitting memory bandwidth limits when writing to the frame buffer. In my experience this mostly affects lower-end graphics cards. In those cases, using discard to implement an alpha test, so that you don't write out fully transparent pixels, can improve performance. This might apply when rendering a particle system, for example.

On the other hand, in shaders that write to the depth buffer, using discard can hurt performance. Discard has the side effect of disabling some of the hardware's depth-buffer optimizations, because it makes the depth output of the shader impossible to predict before the shader runs. Disabling those optimizations can make later draw calls run more slowly, especially ones whose pixels would fail the depth test.

Note also that enabling alpha blending has its own performance cost: it uses more memory bandwidth than opaque rendering, because it has to do a read-modify-write of the frame buffer instead of just a write.
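A rough way to compare the two options from the original question (discard vs. writing alpha = 0 with blending enabled) is to count color-buffer bytes. This sketch assumes a 4-byte RGBA8 render target and ignores caches, framebuffer compression and depth traffic entirely; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4-byte RGBA8 color target; no caches or compression modelled.
constexpr std::uint64_t kBytesPerPixel = 4;

enum class PixelOutcome {
    Discarded,    // shader called discard: no color-buffer access
    OpaqueWrite,  // blending off: one write
    Blended       // blending on: read-modify-write (one read + one write)
};

std::uint64_t colorBytes(PixelOutcome outcome) {
    switch (outcome) {
        case PixelOutcome::Discarded:   return 0;
        case PixelOutcome::OpaqueWrite: return kBytesPerPixel;
        case PixelOutcome::Blended:     return 2 * kBytesPerPixel;
    }
    return 0;
}

// Option 1: keep fully transparent pixels and rely on blending with alpha = 0.
// Every covered pixel still pays for a read and a write.
std::uint64_t spriteBytesAlphaZero(std::uint64_t covered) {
    return covered * colorBytes(PixelOutcome::Blended);
}

// Option 2: discard the `transparent` pixels so only visible ones are blended.
std::uint64_t spriteBytesDiscard(std::uint64_t covered, std::uint64_t transparent) {
    assert(transparent <= covered);
    return (covered - transparent) * colorBytes(PixelOutcome::Blended) +
           transparent * colorBytes(PixelOutcome::Discarded);
}
```

With a 64x64 sprite where 3072 of the 4096 pixels are fully transparent, the alpha = 0 path touches 32768 bytes of color buffer and the discard path touches 8192. Whether that saving outweighs any lost depth optimizations is exactly what you'd have to measure on your target hardware.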

Typically a shader binary will include a flag internally that specifies whether the program contains a discard instruction or not, because it impacts the GPU at a pretty high level.

Mobile and desktop have very different perf characteristics here. Mobile GPUs tend to use "deferred" architectures internally, which don't actually execute pixel shaders immediately after rasterization - instead they record which triangle covers each pixel, and then run one PS for each pixel at the end.
'Discard' completely messes with this, so these mobile GPUs have to fall back to a less optimized approach.

On PC, 'discard' often messes with early-Z / Hi-Z optimizations, which affects the impact of overdraw in your scenes.

In both cases, it's common to try to render all your opaque objects first, followed by your objects that make use of the discard instruction, followed by transparent/blended objects.

Despite these impacts, as mentioned above, discard can help in other cases. If a pixel is discarded early, any texture fetches that occur after the discard statement should have no memory bandwidth impact, and the final PS output should also have no bandwidth impact, assuming you're not using a write-mask.
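The render order described above (opaque, then discard/alpha-tested, then blended) could be enforced with a simple bucket sort before submitting draws. This is only a sketch: DrawItem and RenderBucket are hypothetical types, and a real renderer would usually also sort opaque draws front to back and blended draws back to front within their buckets.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical draw-list entry; a real renderer would store buffers,
// shaders and pipeline state here instead of a name.
enum class RenderBucket { Opaque = 0, AlphaTested = 1, Blended = 2 };

struct DrawItem {
    std::string name;
    RenderBucket bucket;
};

// Orders draws as opaque -> alpha-tested (discard) -> blended.
// std::stable_sort preserves submission order inside each bucket, which
// matters if blended geometry was already sorted back to front.
void sortForSubmission(std::vector<DrawItem>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return static_cast<int>(a.bucket) <
                                static_cast<int>(b.bucket);
                     });
}
```

Using a stable sort keeps the choice of within-bucket ordering with whoever built the list, rather than hiding it inside the submission step.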

Here's a good article about pixel shaders in which the author also covers discard.

It says:

Another pixel shader specific is the discard instruction. A pixel shader can decide to “kill” the current pixel, which means it won’t get written. Again, if all pixels inside a batch get discarded, the shader unit can stop and go to another batch; but if there’s at least one thread left standing, the rest will be dragged along. DX11 adds more fine-grained control here by way of writing the output pixel coverage from the pixel shader (this is always ANDed with the original triangle/Z-test coverage, to make sure that a shader can’t write outside its primitive, for sanity). This allows the shader to discard individual samples instead of whole pixels; it can be used to implement Alpha-to-Coverage with a custom dithering algorithm in the shader, for example.

I bolded the relevant part for your question.
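The batch behaviour described in the quote can be pictured as a bitmask per batch, one bit per pixel. This is a toy C++ model under stated assumptions: the 32-pixel batch width and all function names are mine, not from the article or any API. In HLSL, the per-sample coverage output the article mentions is the SV_Coverage semantic.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one bit per pixel in a 32-pixel batch; set bit = pixel alive.
// Real batch (wave/warp) sizes vary between GPUs.
using LaneMask = std::uint32_t;

// Pixels whose bit is set in `discardMask` executed discard and drop out.
LaneMask applyDiscard(LaneMask live, LaneMask discardMask) {
    return live & ~discardMask;
}

// The shader unit can move on early only when no pixel in the batch is left;
// a single surviving pixel drags the rest of the batch along.
bool batchCanStopEarly(LaneMask live) {
    return live == 0;
}

// Per-sample coverage written by the shader is ANDed with the coverage from
// rasterization and the depth test, so a shader can remove samples but never
// add samples outside its primitive.
std::uint32_t finalSampleCoverage(std::uint32_t shaderCoverage,
                                  std::uint32_t rasterCoverage) {
    return shaderCoverage & rasterCoverage;
}
```

In this model, discarding 31 of 32 pixels saves no time for the batch; only discarding all of them does, which is why coherent discard (whole regions failing together) tends to pay off more than scattered discard.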

Many thanks for the link and the information. I will read up and run tests on a few different machines.

Much appreciated.

