3TATUK2

peculiar request


This is a long shot, because I strongly believe what I'm attempting is not even possible...

But does anyone happen to know of a way, even a hackish one, to restrict the maximum number of samples/fragments/pixels a draw call outputs? Specifically, to cap it at a single pixel, after which the rest are somehow discarded...

I'm using this for a visibility test: I'm rendering octree bounding boxes and only need to know whether they're visible. Obviously I wrap the draw in an occlusion query... But, again, I only need to know if even *one* pixel passes. Every pixel rendered after the first one that passes is superfluous and negatively affects performance.


Using GL_ANY_SAMPLES_PASSED may allow your driver to perform the more efficient test you're looking for, but this is really just a hint rather than explicit behaviour.  The same is even more true for GL_ANY_SAMPLES_PASSED_CONSERVATIVE.  I'm not aware of any explicit way of requesting this behaviour, and I suspect that even if one did exist, it may not be possible for some implementations without bypassing optimizations elsewhere; i.e. it may turn out to be slower than just doing the full thing.  I've no evidence for that, just a hunch.

I would strongly advise against this, unless the driver offers this as an option.

If this is purely for visibility testing of octree nodes, then every fragment should only read a single depth value and not read or write any colors.
You could introduce a global "Has any fragment already passed the depth test" variable (and I'm pretty sure this is possible in OpenGL 4 / DX 11) and let the fragment program discard immediately, if this global flag has already been set.
However, you would then have another read (doubling the number of reads per fragment) and force an actual fragment program to execute for every fragment. I don't know if this still holds, but NVIDIA cards used to render twice as fast when no fragment program was needed.
Also, I believe discarding disables Hi-Z (hierarchical depth) culling, so you would most likely increase the number of rasterized fragments instead of decreasing it.

Are you sure that rendering a couple of boxes without any fragment programs is actually your bottleneck, and that you are not mistaking the latency for the actual rendering speed?

Are you sure ... that you are not mistaking the latency for the actual rendering speed?

This point is key. I'm assuming this question is related to the OP's other question; the term "rudimentary occlusion culling" in that question leads me to suspect that the main cause of the performance difference is the OP fetching the query results in the same frame in which the queries are issued. This is almost guaranteed to stall the pipeline, because in most normal cases the results won't actually be available until a frame or two later. Fetching them in the same frame forces all pending GL operations to flush immediately, and everything stalls until they've completed: in other words, it completely breaks asynchronous CPU/GPU processing.

A better approach is to check whether the results are ready yet (query GL_QUERY_RESULT_AVAILABLE with glGetQueryObjectuiv) and, if not, use the last-known-good result. If there is no last-known-good result, assume the object is visible. This is of course a more complex and more conservative approach that will occasionally draw things that aren't actually visible, but it's better than introducing pipeline stalls.


If you still insist on your idea (despite everyone advising against it), you could use an atomic counter variable (initialized to 0) that the first fragment shader to run increments. Each fragment shader would test the atomic variable at the beginning and, if it has already been set to 1, discard the fragment immediately. But it's an ugly hack.



Using GL_ANY_SAMPLES_PASSED may allow your driver to perform the more efficient test you're looking for

This isn't a solution for what I'm requesting. This will simply draw or not draw the entire thing based on a *previous* draw result. I need to avoid drawing specific fragments based on a single fragment passing in *the same* draw call.

Are you sure, that rendering a couple of boxes without any fragment programs is actually your bottleneck

I'm not 100% sure, but I'm *pretty* sure it at least contributes to the performance loss. The bounding boxes cover the entire screen since they are recursively nested inside each other, so it ends up being something like 4 times the screen resolution in pixels processed. I've tested with fewer pixels processed this way and it does indeed affect performance.

Trying to fetch them in the same frame

I don't. But I don't use GL_QUERY_RESULT_AVAILABLE either, because the result is so rarely unavailable that it doesn't really affect performance. Instead, right before the begin/end of the next query for the current frame, I just grab the query result from the last frame, which, as you suggest, is typically already available.

you could use an atomic counter

I've considered this, and it's probably what Ohforf was suggesting... But atomic counters require OpenGL 4.2+, so compatibility isn't great. I'd like to avoid features that are restricted to very recent versions only. :)

Thanks all.


Using GL_ANY_SAMPLES_PASSED may allow your driver to perform the more efficient test you're looking for

This isn't a solution for what I'm requesting. This will simply draw or not draw the entire thing based on a *previous* draw result. I need to not draw specific fragments based on a single fragment passing in *the same* draw call.
The only difference between samples-passed and any-samples-passed is that the first returns an integer counter of how many pixels passed the depth test, while the latter returns a Boolean indicating whether any pixels passed the depth test (basically, returning 'counter>0').

If the GPU is capable of short-circuiting a draw call as you're requesting, then using this 'any' version query is a hint to the driver that it should go ahead and perform this short-circuit optimization.

The any-conservative query is the same, but tells the driver that it's allowed to perform the test against the Hi-Z buffer instead of the Z buffer, which will be quicker but less accurate (may return true when the ground-truth answer is false).

So, mhagain has answered the original question perfectly ;-D

The other solutions, which implement an atomic test in the fragment shader, will always be slower, because in a typical occlusion-query situation the fragment shader does absolutely zero work anyway.

FWIW though, in my experience, GPU occlusion queries are a terrible solution for occlusion culling if you're after performance. I'd personally still recommend CPU-based solutions...

Oh, I wasn't aware that simply using _ANY_ would allow such an early-out... I tend to avoid it though, because it doesn't seem to work on one of my legacy Linux/Intel laptops, whereas the regular _SAMPLES_ query works. Furthermore, I just tested swapping in _ANY_ and I get the same framerate :] Also, even if _ANY_ does exit early, that only applies to the query test... The other pixels will obviously still get processed, because if color writes are enabled you obviously still want to SEE them.

