kRogue

Occlusion query costs and depth buffer writing

I have often seen it recommended that, before doing any "real" rendering, one first draws everything to the z-buffer only (with a fragment shader that just does gl_FragColor = vec4(1,1,1,1)), and then draws again with the "real" fragment shader, so that no more than one lighting calculation is computed per pixel. Since one is drawing to the z-buffer anyway, why not also run an occlusion query for each object, and only draw the object in the second pass if enough of its pixels passed? The advantage is that one does not have to reprocess the vertices of objects which are "almost" hidden. But the question comes up: how expensive is an occlusion query?

Technically speaking, an occlusion query itself isn't expensive, as it's performed inline with drawing.

The problem comes with reading the results back: the geometry has to be rendered and the results sent back from the card, and with command queues and frame buffering this can be a significant delay. If you take the naive approach and try to do test-readback-render, you'll be stalling both the GPU and the CPU.

GPU Gems 2 has a good chapter on overcoming this problem, however (I happened to read it just the other night, in fact). The short version is that you have to make some assumptions based on the way the last frame was rendered, relying on the fact that very little really changes between frames.

As for the optimisation you mention, I've only heard of it being done with a z-only pass or a z-pass with a very basic colour pass (an ambient term, for example), never with a constant colour.
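A minimal sketch of the "assume last frame's answer until the new one arrives" idea, not the GPU Gems 2 implementation. PendingQuery, TrackedObject, shouldDraw and endFrame are hypothetical names, and the GPU latency is simulated with a frame counter so the logic can run without a GL context. In real OpenGL the query would be issued with glBeginQuery/glEndQuery(GL_SAMPLES_PASSED), polled with glGetQueryObjectiv(id, GL_QUERY_RESULT_AVAILABLE, ...) and read with glGetQueryObjectuiv(id, GL_QUERY_RESULT, ...).

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for one hardware occlusion query whose result
// arrives some frames after it was issued.
struct PendingQuery {
    unsigned samplesPassed = 0;  // what the GPU will eventually report
    int framesUntilReady = 0;    // simulated pipeline latency

    // Non-blocking poll: yields the count only once the result has arrived.
    std::optional<unsigned> poll() const {
        if (framesUntilReady > 0) return std::nullopt;
        return samplesPassed;
    }
};

struct TrackedObject {
    bool visibleLastKnown = true;       // assume visible until proven otherwise
    std::optional<PendingQuery> query;  // query still in flight, if any
};

// Decide whether to draw this frame without ever waiting on the GPU.
bool shouldDraw(TrackedObject& obj, unsigned minSamples) {
    // A threshold of 0 would make every object count as visible.
    assert(minSamples > 0);
    if (obj.query) {
        if (auto samples = obj.query->poll()) {
            obj.visibleLastKnown = (*samples >= minSamples);
            obj.query.reset();  // result consumed; a new query can be issued
        }
    }
    return obj.visibleLastKnown;  // falls back to the last known answer
}

// Advance the simulated GPU by one frame.
void endFrame(TrackedObject& obj) {
    if (obj.query && obj.query->framesUntilReady > 0) --obj.query->framesUntilReady;
}
```

An object whose result is still in flight keeps being drawn (or skipped) according to the last known answer, which is only safe because consecutive frames tend to look alike.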

What you are trying to do is something called early z testing. Typically speaking, you would clear depth, then render to depth only (i.e. no colour, no shaders, no state changes). Then turn depth test off and render to colour only, using shaders, states, etc.

That's what Doom3/Quake4 use to minimise shader calculations on occluded surfaces. Rather neat, I think :)

Regards,
DarkProphet

Quote:

What you are trying to do is something called early z testing. Typically speaking, you would clear depth, then render to depth only (i.e. no colour, no shaders, no state changes). Then turn depth test off and render to colour only, using shaders, states, etc.


Don't I need to keep depth testing on? Otherwise "any" object will draw to the pixel. I guess one would turn depth writing off, though.
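To make the difference concrete, here is a small software model of a single pixel, offered as a sketch rather than as how any driver works. Fragment, PixelResult, depthPrePass and colourPass are hypothetical names. In OpenGL terms the first pass corresponds roughly to glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE) with glDepthMask(GL_TRUE) and glDepthFunc(GL_LESS), and the second to colour writes on, glDepthMask(GL_FALSE), and the depth test left enabled with GL_EQUAL (GL_LEQUAL is also commonly used). The model assumes a failed depth test rejects the fragment before the shader runs.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// One fragment of one object landing on a single pixel.
struct Fragment {
    float depth;   // in [0, 1], smaller = closer to the camera
    int objectId;
};

struct PixelResult {
    int shadedObject = -1;  // object whose colour ends up in the pixel
    int shaderRuns = 0;     // how many times the "expensive" shader ran
};

// Pass 1: depth only. Colour writes off, depth test LESS, depth writes on.
float depthPrePass(const std::vector<Fragment>& frags) {
    float z = std::numeric_limits<float>::infinity();  // cleared depth
    for (const Fragment& f : frags) {
        assert(f.depth >= 0.0f && f.depth <= 1.0f);
        if (f.depth < z) z = f.depth;
    }
    return z;
}

// Pass 2: colour. Depth writes are off in both variants; the only difference
// is whether the depth test stays enabled (compare EQUAL) or is disabled.
PixelResult colourPass(const std::vector<Fragment>& frags, float z, bool depthTestOn) {
    PixelResult r;
    for (const Fragment& f : frags) {
        if (depthTestOn && f.depth != z) continue;  // rejected before shading
        ++r.shaderRuns;
        r.shadedObject = f.objectId;  // later draws overwrite earlier ones
    }
    return r;
}
```

With the test left on, only the nearest fragment is shaded, once. With the test off, every fragment is shaded and whichever object was drawn last wins the pixel, regardless of depth.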


Quote:

GPU Gems 2 has a good chapter on overcoming this problem, however (I happened to read it just the other night, in fact). The short version is that you have to make some assumptions based on the way the last frame was rendered, relying on the fact that very little really changes between frames.


So basically, one uses the previous frame to decide what to cull. Does one do this:

1st frame:
a) Draw everything, running one's occlusion tests.
b) Once all drawing commands are done, fetch the results of the occlusion tests and save them.

2nd frame:
a) Using the results of the occlusion tests from the previous frame, decide what to draw. When drawing, run the occlusion tests as well.
b) Possibly also draw the objects which "don't have enough pixels" to the depth buffer, running an occlusion test on them as well.

3rd frame: same,

etc...


The ugly fine-tuning part is b). I suppose lots of folk may in fact do HOMs (hierarchical occlusion maps) on the CPU. I have no wish to write a z-drawer in asm to do that, though... ewww.


But it sounds like the GPU does a great deal of buffering of drawing commands before it really draws. I suppose it probably flushes the drawing commands when one draws, changes shader, and draws again, or when one changes certain states (namely alpha blending). On that subject: notice that with the typical setup of depth testing on and no stencil buffer in use, the order in which drawing commands are executed does not matter, so I imagine that in those cases the driver may even reorder the drawing commands. But if stencil testing or blending is on, the GPU has to draw in the order of the draw commands, which probably means it "does" the commands whenever shaders change. What do you think? If so, one could fetch occlusion results whenever the shader changes, and then do this:

For each frame: for each object, run an occlusion query while drawing it to the z-buffer only. After all z-writing is done, fetch and save the occlusion results. Use these results to decide what to draw in the colour/lighting pass. The main effect here is that we avoid doing the vertex processing twice for those objects that are not visible.


Any thoughts?

Well, the command queue isn't flushed on a shader change. It'll cause a GPU state flush/update when that command is executed, but the stream continues anyway.

The problem with your method is that you are doing work you don't need to. Grouping objects by the nodes of a spatial hierarchy and walking down the nodes until you get to invisible ones will cost you less in the right scenes.
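A rough sketch of that traversal, under the assumption that a single visibility test on a node's bounding volume can stand in for testing everything beneath it. Node and cullHierarchy are hypothetical names, and the isVisible callback stands in for an occlusion query; the sketch ignores the readback latency and the temporal coherence that the GPU Gems 2 chapter deals with.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical node of a spatial hierarchy (octree, kD-tree, BVH, ...).
struct Node {
    int id;
    std::vector<Node> children;  // empty for a leaf
};

// Walks the tree in stored order, issuing one visibility test per visited
// node. Returns the number of tests issued; the ids of visible leaves are
// appended to toDraw.
int cullHierarchy(const Node& node,
                  const std::function<bool(const Node&)>& isVisible,
                  std::vector<int>& toDraw) {
    assert(isVisible && "a visibility test must be supplied");
    int tests = 1;
    if (!isVisible(node)) return tests;  // whole subtree skipped with one test
    if (node.children.empty()) {
        toDraw.push_back(node.id);
        return tests;
    }
    for (const Node& child : node.children)
        tests += cullHierarchy(child, isVisible, toDraw);
    return tests;
}
```

The saving only shows up when large subtrees are hidden: one failed test on an interior node replaces a test per object inside it. In a scene where nearly everything is visible, the interior tests are pure overhead.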

Also, the lack of temporal coherence in your method means that, again, you'll end up doing more work than required.

That said, a z-first pass with occlusion queries is a good combination indeed, but if you want to do it well and waste as little time as possible I do urge you to take a look at the article in GPU Gems 2 (it's also a good book in general), as it presents an excellent system, imo.

Quote:
Original post by phantom
That said, a z-first pass with occlusion queries is a good combination indeed, but if you want to do it well and waste as little time as possible I do urge you to take a look at the article in GPU Gems 2 (it's also a good book in general), as it presents an excellent system, imo.

Indeed, this is a great article. But if you don't have GPU Gems 2 lying around and don't want to buy it just for one article (it has many great articles, but you may not be interested in them), then check the online versions of both the article phantom mentioned and its source code from the book.

Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful
Source code from the chapter in GPU Gems 2

HellRaiZer

Hmm... I did some thinking earlier today and was considering this method. It avoids forcing the GPU to do a draw. Take a look and offer your input:

Frame 0:
a) glClear(DEPTH, COLOR)
b) calculate data for the "world and models" to render.
c) issue glDraw* commands to draw the world and models.
d) clear depth buffer.
e) for each node/object, start a unique occlusion query.
   1) draw the object to depth buffer only.
   2) stop query.
f) calculate data for the "world" to render for frame 1.
g) swap screen buffers or glFinish.
h) retrieve results of all occlusion queries.

Frame n (n >= 1):
a) glClear(COLOR)
b) for each node/object, draw the object to screen if and only if the associated occlusion query says "enough pixels visible".
c) calculate data for the "world" to render for frame n+1.
d) clear z-buffer.
e) for each node/object, start a unique occlusion query.
   1) draw the object to depth buffer only.
   2) stop query.
f) swap screen buffers or glFinish.
g) retrieve results of all occlusion queries.
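The frame loop above can be modelled without a GPU to see how the one-frame lag behaves. This is only a sketch: simulate is a hypothetical name, and samplesPerFrame is made-up input standing in for what the depth-only queries would report after the swap in each frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// samplesPerFrame[f][i] is the sample count the depth-only query for
// object i would report in frame f. Returns, per frame, the indices of the
// objects sent to the colour pass.
std::vector<std::vector<std::size_t>> simulate(
        const std::vector<std::vector<unsigned>>& samplesPerFrame,
        unsigned minSamples) {
    std::vector<std::vector<std::size_t>> drawn;
    std::vector<unsigned> previous;  // results fetched at the end of the last frame
    for (const auto& current : samplesPerFrame) {
        // The model assumes the object set does not change between frames.
        assert(previous.empty() || previous.size() == current.size());
        std::vector<std::size_t> frameDraws;
        for (std::size_t i = 0; i < current.size(); ++i) {
            // Frame 0 has no history, so everything is drawn.
            bool visible = previous.empty() || previous[i] >= minSamples;
            if (visible) frameDraws.push_back(i);
        }
        drawn.push_back(frameDraws);
        previous = current;  // depth-only pass with queries, then fetch results
    }
    return drawn;
}
```

Feeding it an object that becomes visible in frame 1 shows that the colour pass only picks it up in frame 2, and an object that becomes hidden is still drawn for one extra frame. Whether a single frame of popping is acceptable depends on the scene and the threshold.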



My thinking is that the glFinish/swap buffers forces the GPU to draw the stuff anyway, so if you ask for the results of the occlusion queries just after doing that, but before issuing any other drawing commands, you will not stall. Or at least I hope so. Here a node can be a whole room with stuff in it, which in turn has its own child nodes, etc.

The only part that seems fishy is that I do not take advantage of any temporal coherence.


Edit: note that this algorithm mostly just stops reprocessing the vertices of models which are totally occluded. For those models the early z-drawing does not cost anything extra, but for drawn models it still costs.

I took a look at that paper. The idea is reasonably simple and can be implemented with almost any hierarchy one has (it need not be kD-trees), and it tempts me the most *right* now. The mess I am making has portals, and I want to do an occlusion query on each portal to decide whether or not to draw beyond it. The paper gives a reasonable way to keep the GPU and CPU busy, and takes advantage of the previous frame to help decide which nodes are worth querying (in my case the occlusion query queue might be quite small, just the portals). But that method is friendly only as long as the drawing order of nodes does not matter too much. In my situation a parent node _MUST_ be rendered before a child node (the portals are like gateways, not just doorways), so it would complicate my life some: I cannot process a child until I have processed its parent. This is because the stencil buffer is used at each portal: stuff beyond the portal is rendered if and only if the pixel shows through the portal, so the stencil buffer must be written to beforehand.

[Edited by - kRogue on December 15, 2006 7:17:40 AM]
