
GL_occlusion_query... how good are they?


Let me first explain what I am doing. I have a binary tree (axis-aligned), and each node has an average poly count of around 200. I render the world from near to far to save fill rate, and I also do occlusion culling: I draw each node's bounding box (AABB), test whether it is visible, and only then render the polygons within it. But unfortunately, the FPS actually falls quite a lot when I turn on the occlusion culling; btw, I am using GL_NV_occlusion_query for the purpose. In one case, only 3 nodes were rendered with occlusion culling turned on and the FPS was around 100 (with VSYNC turned off, of course), but when I turn occlusion culling off the FPS shoots above 130 even while rendering around 50 nodes.

One issue could be the fact that NV_occlusion_query counts pixels, which I really don't need. All I need to know is whether the AABB is visible or not, and for that I only need to know whether at least one pixel ended up in the frame buffer. I don't need to go through the whole AABB rendering and then count pixels.

Just for your information:
1) I am rendering the tree from front to back.
2) I render the AABBs inside the NV_occlusion_query with back-face culling on and all effects turned off.

Does anyone have any suggestions or ideas on how to go about doing occlusion culling at runtime, without using editors and manually placing a lot of junk like "occluders"?
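Since the tree is axis-aligned, front-to-back order falls out of comparing the camera position against each node's splitting plane. A minimal sketch of that traversal decision (the `Node` layout and field names here are hypothetical, not from the poster's engine):

```c
#include <stddef.h>

/* Hypothetical node of an axis-aligned binary tree: space is split
   on one axis at 'split'; child[0] lies below the plane, child[1] above. */
typedef struct Node {
    int axis;               /* 0 = x, 1 = y, 2 = z */
    float split;            /* position of the splitting plane */
    struct Node *child[2];
} Node;

/* Index of the child containing (or nearer to) the camera: render it
   first for front-to-back order, then occlusion-test the far child. */
int near_child(const Node *n, const float cam[3])
{
    return cam[n->axis] > n->split ? 1 : 0;
}
```

Recursing into `near_child` first, and only issuing the AABB query for the far child, is what makes the front-to-back order useful for occlusion culling.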

I too am using the NV occlusion queries, and yes, they can kill your framerate; apparently it's from stalling the pipeline while waiting on the result (although I don't know a great deal about these things).
Anyway, I've been getting good results, which is rather the opposite of what people have stated on this site; then again, I'm doing things a little differently.
First, I frustum-cull my scene, marking non-visible nodes and occluders as I go (yes, I use occluders). Next I render those occluders and set up the occlusion queries: I traverse the tree nodes that are visible in the frustum and render each one's AABB. Finally, I read back the results and mark the non-visible nodes. The remaining nodes are the portions of the tree that are visible.

http://xout.blackened-interactive.com/dump/new/XenGine08.jpg

Simply put, when doing work with the graphics card, you either _send_ stuff or you _get_ stuff. Thus, when rendering 50 nodes all at once, you only have to _send_ stuff. If you query the card for information every once in a while during rendering, the pipeline has to "switch" between _send_ and _get_ and back again, which takes a lot of time.
The trick is to batch up all the _sends_ and all the _gets_ and do as few "switches" as possible: find _all_ the nodes you want to test at one moment, render their occlusion geometry, then query the results; then gather _all_ the next nodes you want to test, and so on. Also remember that conservative culling ("diffuse culling") can sometimes be faster than exact culling.
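The cost of those switches can be modelled very crudely: every _get_ that follows queued _sends_ forces the driver to flush and wait. This toy model (not real driver behaviour, just an illustration of why batching helps) counts the flushes in a command stream:

```c
/* Toy model of the driver's command queue: each 'g' (get, e.g. reading
   a query result) that follows pending 's' commands (sends: draws,
   query begin/end) forces one pipeline flush. */
int count_flushes(const char *cmds)
{
    int flushes = 0, pending = 0;
    for (; *cmds; ++cmds) {
        if (*cmds == 's') pending = 1;
        else if (*cmds == 'g' && pending) { flushes++; pending = 0; }
    }
    return flushes;
}
```

Testing each node separately ("sgsgsgsg") costs one flush per node, while issuing all queries first and reading all results afterwards ("ssssgggg") costs a single flush.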

Well... do you place those occluders manually with some editor, or do you somehow pre-calculate them? And if you do pre-calculate them, how do you do it?

The main reason that using occlusion culling slows things down is that it stalls the pipeline. Normally when you call a drawing function in the API the command just gets added to a command queue and the function returns before anything is actually drawn. The graphics card works its way through the command queue as fast as it can in parallel with the CPU. It's fairly common for the driver to have up to 3 frames worth of commands queued at any point. When you try and read data back from the GPU you break the parallelism - the CPU has to sit waiting for the GPU to 'catch up' by completing all the queued commands necessary to return the data you've asked for. It's not only occlusion queries that cause these kinds of stalls, but anything which involves reading data back which is why ATI and nVIDIA are always telling us not to do that. Things like locking the z-buffer and reading values, locking a render target and reading values, etc. all cause pipeline stalls.

The best way to avoid these kinds of stalls is to avoid reading anything back whenever you can and wait as long as possible for the answer when you absolutely have to read data back. The longer you wait between issuing the drawing command and trying to read back the data you need the better. Supposedly the occlusion query stuff is going to be made more useful by allowing draws that are conditional on the results of an earlier occlusion query (i.e. the driver / GPU can check the results of the query when they're ready to draw the potentially occluded geometry without any intervention from the application) but I don't know when that functionality is going to be exposed. I think it's only going to be available on the NV40 (and maybe the new ATI part when it comes out) anyway.

[edited by - mattnewport on April 26, 2004 10:58:43 PM]

I've read on this forum before that you can get better performance by drawing the previous frame before fetching the occlusion results. I can't verify this, but I intend to try it out.
I'll let you know how it goes.
In reply to the occluder question: I place mine manually with my map editor (Worldcraft). I just use simple shapes like boxes that sit inside large potential occluders, and I render all of the potential occluders in one sweep.
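Using the previous frame's results can be sketched as a double-buffered table of per-node visibility: queries issued this frame are only consumed next frame, so the CPU never waits on the GPU. All names below are hypothetical; real code would map query objects to nodes.

```c
#define MAX_NODES 64

/* Double-buffered visibility results: this frame's query results go
   into one half while rendering decisions read the other half (last
   frame's results), so we never stall on a readback. */
typedef struct {
    unsigned char visible[2][MAX_NODES];
    unsigned frame;
} QueryHistory;

/* Record this frame's query result for a node (pixel count > 0). */
void store_result(QueryHistory *h, int node, unsigned pixels)
{
    h->visible[h->frame & 1][node] = (pixels > 0);
}

/* Was the node visible according to the *previous* frame's query? */
int was_visible(const QueryHistory *h, int node)
{
    return h->visible[(h->frame + 1) & 1][node];
}

void end_frame(QueryHistory *h) { h->frame++; }
```

The trade-off is one frame of latency: a node that just became visible pops in a frame late, which is why the results should be treated as conservative hints rather than exact answers.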
