glOcclusion query and funny performance.

Started by
1 comment, last by biki_ 16 years, 11 months ago
I am sure this is not a shocker to those who use glOcclusion query, but I thought I'd share some experience I've had with it:

1) (obvious) Don't ask for the samples-passed value unless it is ready, otherwise you get a GPU stall and performance goes down the toilet.

2) If you have a hierarchy of visibility tests, make a stack of occlusion queries and regularly try to update the stack: check whether each occlusion query's result is available and, if so, fetch it. If not enough samples passed, mark every entry above that one on the stack as failed too. Also check the stack regularly while drawing, and if it says "not enough samples", bail out at that visibility layer.

But the next one made my jaw drop to the ground. I have a portalling system: behind each portal is a set of meshes and another set of portals. I made an "Occlusion Stack" which I push an element onto whenever I recurse into a portal. If I am at spot S with meshes S_M and portals S_P, first I quasi-draw the portals (i.e. with color and depth buffer writes masked), each with its own unique occlusion query id. Then for each mesh m of S_M I draw the mesh and check whether any element of the occlusion stack that I came through reports "not enough samples passed"; if so, I immediately return, otherwise I advance to the next mesh in S_M. Then for each portal I push the portal's occlusion query id onto my occlusion stack and recurse into the spot on the other side of the portal. Once that recursion returns I pop the occlusion stack and move on to the next portal. I thought that was a good system, since I never force a GPU stall...
But now the part where my jaw dropped to the ground: when I enabled optimizations, it appears that because the CPU spends almost no time between draw-mesh calls, the GPU does not have time to get anything done between them. All the DrawMesh calls get queued up by the video card driver and none of the occlusion queries are ready, so nothing is ever listed as occluded. But when I swap buffers the GPU has to finish everything, so the system slows down. In effect the optimized build runs much slower than the unoptimized build because of this.

I was wondering how to rectify this. One temptation is to run the physics and game rules in an entirely separate thread. When it comes time to draw, I would gather the drawing state (i.e. position to draw, state to draw, etc.) by querying the physics system for wherever each object currently is, and let the physics and game-rule thread run mostly on its own, independent of the rendering thread. The hope is that with two threads (and later a third for sound) the CPU would be busy enough between draw-mesh calls that enough of the occlusion queries are "ready".

Anyone have thoughts/suggestions?
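For readers following along, the "Occlusion Stack" described in this post could be sketched roughly as below. This is only an illustration, not the poster's actual code: the names OcclusionStack, PollFn and minSamples are made up here, and the poll callback is an assumption standing in for a non-blocking check (with real OpenGL it would wrap glGetQueryObjectiv with GL_QUERY_RESULT_AVAILABLE, then glGetQueryObjectuiv with GL_QUERY_RESULT only once the result is available).

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical non-blocking poll: returns true and fills `samples` only if the
// result for query `id` is already available; it must never wait on the GPU.
using PollFn = std::function<bool(unsigned id, unsigned& samples)>;

class OcclusionStack {
    enum State { Unknown, Visible, Hidden };
    struct Entry { unsigned id; State state; };

public:
    OcclusionStack(PollFn poll, unsigned minSamples)
        : poll_(poll), minSamples_(minSamples) {}

    // Called when recursing into a portal whose query has been issued.
    void push(unsigned queryId) { entries_.push_back(Entry{queryId, Unknown}); }

    // Called when the recursion through that portal returns.
    void pop() {
        assert(!entries_.empty());
        entries_.pop_back();
    }

    // Polls every still-unknown entry and reports whether any portal we came
    // through is known to have failed. Unknown results count as "not occluded".
    bool occluded() {
        for (Entry& e : entries_) {
            if (e.state == Unknown) {
                unsigned samples = 0;
                if (poll_(e.id, samples))
                    e.state = (samples >= minSamples_) ? Visible : Hidden;
            }
            if (e.state == Hidden) return true;
        }
        return false;
    }

private:
    PollFn poll_;
    unsigned minSamples_;
    std::vector<Entry> entries_;
};
```

The draw loop would call occluded() after each mesh and return early when it reports true. Note that this sketch never stalls, which is exactly why it shows the symptom described above: if no result is ever ready before the frame ends, occluded() simply never returns true.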
Close this Gamedev account, I have outgrown Gamedev.
Perhaps do something that's not too concerned with exact physical correctness,
e.g. particle systems:

draw occlusion stuff
do particle stuff on CPU // usually requires a bit of CPU work
get occlusion results
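
One possible reading of the ordering above, as a rough sketch: do the CPU work in small slices and poll the query between slices, so the result is picked up as soon as it becomes available. The function name workUntilReady and both callbacks are hypothetical; ready is assumed to be a non-blocking availability check, and doSlice is assumed to update one batch of particles (or whatever CPU work is at hand).

```cpp
// Runs up to `slices` pieces of CPU work, polling `ready` before each one.
// Returns how many slices were done before the result became available
// (or `slices` if it never did). Nothing here waits on the GPU.
template <class Ready, class Slice>
int workUntilReady(int slices, Ready ready, Slice doSlice) {
    int done = 0;
    while (done < slices && !ready()) {
        doSlice(done);  // e.g. update one batch of particles
        ++done;
    }
    return done;
}
```

Any slices left over once the result is ready can simply be run afterwards; the point is only that the CPU has something useful to do while the query is in flight.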

Try using a FIFO instead of a stack.
Here is a small example:
http://iubar.org/~biki/query2.zip
It's a polygon soup with a kd-tree and automatic portals,
and it uses occlusion queries and a FIFO.
Not perfect, but much better than a stack (more distance between
issuing an occlusion query and fetching its result).
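
A minimal sketch of the FIFO idea, for illustration only (this is not the code in the linked zip): queries are appended as they are issued and results are only ever read from the oldest end, so each query gets as much time as possible before anyone asks for it. QueryFifo, QueryResult and PollFn are hypothetical names; poll is again assumed to be a non-blocking availability check, and the sketch assumes results become available in roughly the order the queries were issued.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical non-blocking poll: returns true and fills `samples` only if the
// result for query `id` is already available.
using PollFn = std::function<bool(unsigned id, unsigned& samples)>;

struct QueryResult { unsigned id; unsigned samples; };

class QueryFifo {
public:
    explicit QueryFifo(PollFn poll) : poll_(poll) {}

    // Record a query right after it has been issued.
    void issued(unsigned id) { pending_.push_back(id); }

    // Collect finished results, oldest first, stopping at the first query
    // whose result is not ready yet. Never waits on the GPU.
    std::vector<QueryResult> drain() {
        std::vector<QueryResult> out;
        unsigned samples = 0;
        while (!pending_.empty() && poll_(pending_.front(), samples)) {
            out.push_back(QueryResult{pending_.front(), samples});
            pending_.pop_front();
        }
        return out;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    PollFn poll_;
    std::deque<unsigned> pending_;
};
```

Compared with a stack, the most recently issued query is the last one to be asked about rather than the first, which is where the extra distance between issuing and fetching comes from.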

