discard in FS vs stencil test speed question.

Started by
5 comments, last by kRogue 16 years, 6 months ago
A simple question for those out there (and if anyone says "profile it yourself", that would only test it on _my_ hardware and not others, so please, experience wanted with this issue). There are a few ways one can discard fragments: depth testing, alpha testing, stencil testing, and finally discard from within a fragment shader. Assuming the GPU does dynamic branching in the fragment shader (so GeForce FX and lower are not considered here), how much slower is it to have, as the first line in a fragment shader, an if() which discards when true, versus doing a stencil test?

I ask this because I am doing a pretty advanced deferred shading system, where each pixel has a lighting shader and an optional FX shader (which is done after lighting). Right now I am packing the stencil buffer with which shaders to use, but the stencil buffer is only 8 bits. Making matters worse, I need at least 3 of those bits for another purpose, so I am left with only five(!) bits. As of now I use only 1 bit for lighting shader selection and the remaining 4 for FX, which means I have 2 different lighting shaders and 15 FX shaders (as value 0 means no FX). That is kind of tight. But if I used one bit for FX (indicating whether there is any FX) and the remaining 4 bits for lighting, that would be great; which FX to run would be encoded in one of the FX buffer's channels, and at the beginning of each FX shader I'd simply do "if(int(texture.a)!=fx_buffer_id) discard;". Before coding this up (as it is non-trivial), I am wondering: how much does that extra test at the very beginning of the fragment shader cost?

A second question for those with experience: how many texture lookups can one do per pixel before pushing your luck? Right now my lighting shaders do up to 17 (of which 9 are dependent on a first texture lookup). How slow are texture lookups in the fragment shader?

[Edited by - kRogue on October 15, 2007 12:23:16 AM]
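To make the bit budget concrete, here is a CPU-side sketch of the stencil packing described above. The exact bit positions and the function names are my own invention for illustration; only the 3 + 1 + 4 split comes from the post:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit stencil layout (positions invented for illustration):
   bits 0-2: the 3 bits reserved for another purpose
   bit  3:   lighting shader select (1 bit -> 2 lighting shaders)
   bits 4-7: FX shader id (4 bits; 0 means "no FX", 1-15 select an FX shader) */
static uint8_t pack_stencil(uint8_t reserved, uint8_t light, uint8_t fx)
{
    return (uint8_t)((reserved & 0x7u) | ((light & 0x1u) << 3) | ((fx & 0xFu) << 4));
}

static uint8_t unpack_light(uint8_t s) { return (uint8_t)((s >> 3) & 0x1u); }
static uint8_t unpack_fx(uint8_t s)    { return (uint8_t)(s >> 4); }
```

With this layout, each deferred pass would set glStencilFunc's reference value to the packed byte (with an appropriate mask) to select which pixels a given shader runs on.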
Close this Gamedev account, I have outgrown Gamedev.
An interesting counter-question: what exactly is glDiscard?

Projects: Top Down City: http://mathpudding.com/

I probably meant using the "discard" keyword within a fragment shader; my bad on calling it glDiscard.
Close this Gamedev account, I have outgrown Gamedev.
You've got two questions, right?

1. How fast is the command 'discard'?
Well, this could be quite slow from what I've heard so far (a matter of early-z rejection not working properly once discard is used). Try to avoid the discard command and instead use alpha testing or some other mechanism to reject your fragment.

2. How fast is dynamic branching?
This depends on your hardware and your shader. But on newer hardware, avoiding expensive shader calculations via dynamic branching should be much faster than a multipass approach or always executing all shaders in a single pass.

Your code should look something like this:

if (cond1) {
    // shader 1
} else if (cond2) {
    // shader 2
// ... further branches ...
} else {
    // "discard" the fragment by setting alpha to 0
    final_color.a = 0.0;
}


--
Ashaman


Quote:Original post by Ashaman73
You've got two questions, right?

1. How fast is the command 'discard'?
Well, this could be quite slow from what I've heard so far (a matter of early-z rejection not working properly once discard is used). Try to avoid the discard command and instead use alpha testing or some other mechanism to reject your fragment.


In case you didn't know, alpha testing has the same issue.
discard is better on SM3 hardware because it can kill a fragment early and stop running the fragment shader.

I think stencil or depth testing is a quicker way to get rid of fragments, as long as you don't modify the fragment's depth value in the shader. Not sure about this one.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);
Quote:Original post by kRogue
a second question for those with experience, how many textures look ups can one do per-pixel before pushing your luck? right now my lighting shaders do up to 17 ( of which 9 are dependent on a first texture lookup) how slow are texture lookups in the fragment shader?


Texture lookups are one of the slowest operations, but what's the alternative?
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);
The use of discard here is for deferred shading, so early z-kill does not matter. To be precise, my rendering system draws stuff to some offscreen buffers (via framebuffer_ext). Each pixel chooses which lights, which deferred lighting shader, and which deferred FX shader (if any) to run. All lights for objects are stored in a texture; the lighting shaders look up the position, color, attenuation, direction, and other properties of the lights from that texture. From there it does the lighting calculation on each pixel. Note that I can have different light sources for every object, but the lighting is all done by drawing one rectangle to the screen.

The choices of lighting shader and FX shader are stored in the stencil buffer. So if a total of N lighting shaders are in use, I draw N rectangles, each with a different reference value for the stencil test, and for each FX shader in use I draw a rectangle with a different stencil value for the stencil test. Notice that at this stage the depth buffer is not in use at all, so depth testing and early z-cull do not come into play.

But now the question: which would be faster, the current system of setting the stencil reference value and using stencil testing for each shader, or adding an "if() discard;" to each lighting and FX shader? (In truth, chances are I will only have 2 lighting shaders in use, but possibly many different FX shaders.)
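For what it's worth, the decode step that the proposed "if(int(texture.a)!=fx_buffer_id) discard;" relies on can be sketched on the CPU. One assumption worth flagging: if the FX id is stored in a normalized 8-bit alpha channel as id/255.0, GLSL's int() truncates, so rounding to the nearest integer (shown below) is the safer decode. The function names here are hypothetical:

```c
#include <assert.h>

/* Simulates, on the CPU, the per-fragment FX-selection test proposed above.
   Assumes the FX id was written into an 8-bit normalized channel as id/255.0.
   Rounding (rather than truncating) guards against the id coming back as
   e.g. 8.9999 after quantization and filtering. */
static int decode_fx_id(float alpha)
{
    return (int)(alpha * 255.0f + 0.5f); /* round to nearest */
}

/* Nonzero means the shader would execute "discard" for this fragment. */
static int should_discard(float alpha, int fx_buffer_id)
{
    return decode_fx_id(alpha) != fx_buffer_id;
}
```

In the real shader this would be the first statement of each FX shader, so on SM3-class hardware the fragment can be killed before the expensive part of the shader runs.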

P.S. I made a mistake on how many texture lookups I have to do: I need between 1 and 8 lookups for the properties of the pixel, plus an additional 4 texture lookups _per_ light. Currently my system gets 20 FPS on my hardware (Athlon XP, GeForce 6600GT) almost regardless of what is on the screen (same results if I have 20+ portalled rooms, each with over 40 MD5 models); adding more stuff does not really drop the frame rate. It does eventually, but lots of stuff has to be added. So I *think* my bottleneck is probably at the deferred shading stage. If having discard in the fragment shaders is the same speed as the stencil tests, then I want to do it, but there is a fair amount of work needed to get there, and I'd hate to spend many hours on something only to find it was a waste of time when someone with experience could have told me "discard would have sucked big time".
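Just to pin down the corrected numbers: with up to 8 property lookups plus 4 lookups per light, the per-pixel total works out as below (counts taken from the post; the function is only an illustration of the arithmetic):

```c
#include <assert.h>

/* Worst-case texture lookups per pixel for the deferred pass described above:
   1-8 lookups for the pixel's stored properties, plus 4 lookups per light
   for that light's position/color/attenuation/direction data. */
static int lookups_per_pixel(int property_lookups, int num_lights)
{
    return property_lookups + 4 * num_lights;
}
```

So even a modest 4-light pixel with all 8 property lookups costs 24 texture fetches, which is why the fetch count dominates this kind of shader.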


Edit: I found an nVidia white paper that states that early stencil culling is disabled as soon as the stencil function, mask, or reference value is changed, or when the stencil value can change on a failed stencil test. Naturally, it stays disabled until a glClear is issued... so using early stencil culling is non-trivial!

[Edited by - kRogue on October 16, 2007 7:03:14 AM]
Close this Gamedev account, I have outgrown Gamedev.

This topic is closed to new replies.
