3TATUK2

peculiar request

3TATUK2    714

This is a long shot, because I strongly believe what I'm attempting is not even possible...

 

But does anyone happen to know of a way, even a "hackish" one, to cap the number of samples/fragments/pixels a render draw call outputs? Specifically, even capping it at 1 pixel, with the rest somehow discarded...

 

 

I'm using this for a visibility test: I'm rendering octree bounding boxes and only need to know whether they're visible. Obviously I wrap the draw in an occlusion query... But, again, I only need to know if even *one* pixel passes. All the pixels rendered after the first passed pixel are superfluous and negatively affect performance.

Edited by 3TATUK2

mhagain    13430

Using GL_ANY_SAMPLES_PASSED may allow your driver to perform the more efficient test you're looking for, but this is really just a hint rather than explicit behaviour.  The same is even more true for GL_ANY_SAMPLES_PASSED_CONSERVATIVE.  I'm not aware of any explicit way of requesting this behaviour, and I suspect that even if one did exist, it may not be possible for some implementations without bypassing optimizations elsewhere; i.e. it may turn out to be slower than just doing the full thing.  I've no evidence for that, just a hunch.

Ohforf sake    2052
I would strongly advise against this, unless the driver offers this as an option.

If this is purely for visibility testing of octree nodes, then every fragment should only read a single depth value and not read or write any colors.
You could introduce a global "Has any fragment already passed the depth test" variable (and I'm pretty sure this is possible in OpenGL 4 / DX 11) and let the fragment program discard immediately, if this global flag has already been set.
However you would then have another read (doubling the number of reads per fragment) and force an actual fragment program to be executed for each fragment. I don't know if this still holds, but nVidia cards used to render twice as fast when no fragment program was needed.
Also, I believe discarding kills your hiZ, so you would most likely increase the number of rasterized fragments, instead of decreasing them.

Are you sure, that rendering a couple of boxes without any fragment programs is actually your bottleneck, and that you are not mistaking the latency for the actual rendering speed?

mhagain    13430

Ohforf sake wrote: "Are you sure ... that you are not mistaking the latency for the actual rendering speed?"

 

This point is key; I'm assuming that this question is related to the OP's other question here.  The term "rudimentary occlusion culling" in that question leads me to suspect that the main cause of the performance differential is the OP fetching the query results in the same frame as the queries are executed.  This is almost guaranteed to stall the pipeline, as in most normal cases the results won't actually be available until a frame or two later.  Trying to fetch them in the same frame will cause all pending GL operations to flush immediately, and everything will stall until they've completed: in other words, it completely breaks CPU/GPU asynchronous processing.

 

A better approach is to test whether the results are ready yet (by querying GL_QUERY_RESULT_AVAILABLE with glGetQueryObjectuiv) and, if not, use the last-known-good result.  If there is no last-known-good result, assume that the object is visible.  This is of course a more complex and more conservative approach that will on occasion draw some things that shouldn't be visible, but it's better than introducing pipeline stalls.
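
The fallback logic described above can be sketched roughly as follows. This is only a minimal CPU-side sketch, not anyone's actual engine code: `NodeVisibility`, `QueryPoll`, `resolveVisibility` and `exampleFrames` are hypothetical names, and the two booleans in `QueryPoll` stand in for what a real program would read back with glGetQueryObjectuiv (GL_QUERY_RESULT_AVAILABLE and GL_QUERY_RESULT), so no GL context is needed to follow it.

```cpp
#include <cassert>

// Hypothetical per-node bookkeeping; these names are not from the thread.
struct NodeVisibility {
    bool hasResult = false;   // has any query result ever been read back?
    bool lastVisible = true;  // last-known-good answer
};

// One poll of a node's occlusion query.
// `available` stands in for the GL_QUERY_RESULT_AVAILABLE readback.
// `anyPassed` stands in for (GL_QUERY_RESULT != 0) and is only
// meaningful when `available` is true.
struct QueryPoll {
    bool available;
    bool anyPassed;
};

// Decide whether to draw the node this frame without ever waiting on the GPU.
inline bool resolveVisibility(NodeVisibility& node, const QueryPoll& poll) {
    if (poll.available) {
        // A fresh result has arrived: it becomes the last-known-good value.
        node.lastVisible = poll.anyPassed;
        node.hasResult = true;
    }
    // With no result at all, conservatively assume the node is visible;
    // otherwise reuse the last-known-good value.
    return node.hasResult ? node.lastVisible : true;
}

// Usage example: three consecutive frames for a single node.
inline void exampleFrames() {
    NodeVisibility node;
    assert(resolveVisibility(node, {false, false}) == true);  // nothing read back yet
    assert(resolveVisibility(node, {true, false}) == false);  // result arrived: occluded
    assert(resolveVisibility(node, {false, true}) == false);  // not ready: reuse last result
}
```

The cost of this design is the one already mentioned: a node can be drawn (or skipped) for a frame or two after its true visibility changes, in exchange for never blocking on a result.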

michalferko    796

If you still insist on this idea (despite everyone advising against it), you could use an atomic counter (initial value 0) and increment it when the first fragment shader invocation executes. Each invocation would test the counter at the start and discard its fragment if the counter is already nonzero. But it's an ugly hack.
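
To make the trade-off concrete, here is a rough CPU-side model of that per-fragment logic. It is not shader code and not a measurement: `DrawStats` and `simulateDraw` are hypothetical names, a `std::atomic` stands in for the GLSL atomic counter, and the fragments are processed one after another, whereas a real GPU runs many invocations concurrently (so more than one fragment could slip through before the flag is seen). The model also ignores any effect on Hi-Z.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical tallies for one simulated draw call.
struct DrawStats {
    unsigned invocations = 0;  // fragment-program runs (still one per fragment)
    unsigned flagReads = 0;    // extra reads of the shared flag
    unsigned kept = 0;         // fragments that were not discarded
};

// Sequential model of "discard once any fragment has already passed".
// `fragmentsPassingDepth` is the number of fragments that reach the shader.
inline DrawStats simulateDraw(unsigned fragmentsPassingDepth) {
    // Stands in for the atomic counter, assumed to be reset to 0 before the draw.
    std::atomic<unsigned> passed{0};
    DrawStats stats;
    for (unsigned i = 0; i < fragmentsPassingDepth; ++i) {
        ++stats.invocations;  // the shader has to run to make the decision
        ++stats.flagReads;    // every invocation reads the flag first
        if (passed.load() != 0) {
            continue;         // models `discard`
        }
        passed.fetch_add(1);  // first fragment through sets the flag
        ++stats.kept;
    }
    return stats;
}
```

In this model `simulateDraw(1000)` keeps a single fragment but still counts 1000 invocations and 1000 flag reads, which is the cost Ohforf sake pointed out: the extra work is added per fragment rather than removed.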

3TATUK2    714


mhagain wrote: "Using GL_ANY_SAMPLES_PASSED may allow your driver to perform the more efficient test you're looking for"

 

This isn't a solution for what I'm requesting. This will simply draw or not draw the entire thing based on a *previous* draw result. I need to not draw specific fragments based on a single fragment passing in *the same* draw call.

 


Ohforf sake wrote: "Are you sure, that rendering a couple of boxes without any fragment programs is actually your bottleneck"

I'm not 100% sure, but I'm *pretty* sure it at least adds to the performance loss. The bounding boxes take up the entire screen since they are recursively nested within each other, so it ends up being something like 4*resolution pixels processed. I've tested with fewer pixels processed in this manner and it does indeed affect performance.

 


mhagain wrote: "Trying to fetch them in the same frame"

I don't. But I don't use GL_QUERY_RESULT_AVAILABLE either, because the result is unavailable so rarely that it doesn't really affect performance. Instead I just grab the query result for the last frame, which is typically already available, like you suggest, right before the begin/end of the next query for the current frame.

 


michalferko wrote: "you could use an atomic counter"

 

I've considered this and it's probably what Ohforf was suggesting... But it's only v4.2+ so not amazing compatibility factor. I'd like to avoid using stuff that's entirely restricted to extremely modern versions only. :)

 

Thanks all.

Hodgman    51234

mhagain wrote: "Using GL_ANY_SAMPLES_PASSED may allow your driver to perform the more efficient test you're looking for"

3TATUK2 wrote: "This isn't a solution for what I'm requesting. This will simply draw or not draw the entire thing based on a *previous* draw result. I need to not draw specific fragments based on a single fragment passing in *the same* draw call."

The only difference between samples-passed and any-samples-passed is that the first returns an integer counter of how many pixels passed the depth test, while the latter returns a Boolean indicating whether any pixels passed the depth test (basically, returning 'counter>0').

If the GPU is capable of short-circuiting a draw call as you're requesting, then using this 'any' version query is a hint to the driver that it should go ahead and perform this short-circuit optimization.

The any-conservative query is the same, but tells the driver that it's allowed to perform the test against the Hi-Z buffer instead of the Z buffer, which will be quicker but less accurate (may return true when the ground-truth answer is false).

So, mhagain has answered the original question perfectly ;-D

The other solutions, implementing an atomic test in the fragment shader, will always be slower, because in a typical occlusion-query situation the fragment shader does absolutely zero work anyway.

FWIW though, in my experience, GPU occlusion queries are a terrible solution for occlusion culling if you're after performance. I'd personally still recommend CPU based solutions...

3TATUK2    714

Oh, I wasn't aware that simply using _ANY_ would cause such an early-out... I tend to avoid it though, because it doesn't seem to work on one of my legacy Linux/Intel laptops, whereas regular _SAMPLES_ works... Furthermore, I just tested by swapping in _ANY_ and I get the same framerate :] Also, even if _ANY_ does early-exit, it will only be for the query test... The other pixels will still obviously get processed, because if you have the color mask enabled, you obviously still want to SEE them.

Edited by 3TATUK2
