Occlusion again speed issues

29 comments, last by Soiled 19 years, 6 months ago
I’ve been re-reading the thread on "Occlusion Culling" where Yann L gives his golden inputs, but I just do not understand this part:

Quote:Original post by Yann L
Now, a few months ago, I decided to reimplement the occlusion system on the GPU, using this HP/NV occlusion query stuff. Well, to make it short, I trashed the code pretty fast. Due to the bubbles introduced into the command stream (even by using multiple deferred queries with the NV extension in OGL), it was slower than the software approach. Esp, if the later uses CPU/GPU concurrency.

Now, what are bubbles ?

Is it really so, that the occlusion queries in DirectX/OpenGL running on the GPU today in 2004 are that slow ? - hence forcing us to make an alternative software occlusion implementation or a PVS approach ... can that be right ?

Anyway, does anyone know which game Yann L ended up making? I would like to buy it :)
www.pokersniffer.org
Quote:Now, what are bubbles ?


bubbles are idle gaps that stall your rendering pipeline. they occur either when your CPU sits idle waiting for your GPU to finish rendering (usually inside a blocking driver call), or when your GPU sits idle waiting for something to render. or you can have both.
and as you probably guessed, bubbles are quite bad performance-wise...

the first kind typically occurs when you use a HW occlusion query (and have to wait for the hardware to give you the answer), glReadPixels, or anything else where you have to wait for the hardware to give you data back. and it can provoke the second kind too...
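To put rough numbers on the two kinds of bubble, here is a toy timing model, not a measurement. It assumes each frame costs a fixed amount of CPU time and a fixed amount of GPU time; `FrameCost` and the function names are made up for illustration. It compares a frame where the CPU blocks on a readback right after submitting its work with a frame where CPU and GPU run concurrently.

```cpp
#include <algorithm>
#include <cassert>

// Toy model only: one frame needs cpuMs of CPU work and gpuMs of GPU work.
// All names and numbers here are illustrative assumptions.
struct FrameCost {
    double cpuMs;
    double gpuMs;
};

// The CPU submits, then blocks until the GPU answers (for example an
// immediate occlusion-query readback). The two processors take turns,
// so each one idles while the other works: both kinds of bubble at once.
double serializedFrameMs(const FrameCost& f) {
    assert(f.cpuMs >= 0.0 && f.gpuMs >= 0.0);
    return f.cpuMs + f.gpuMs;
}

// The CPU prepares the next frame while the GPU draws the current one.
// In the steady state the slower processor sets the pace.
double overlappedFrameMs(const FrameCost& f) {
    assert(f.cpuMs >= 0.0 && f.gpuMs >= 0.0);
    return std::max(f.cpuMs, f.gpuMs);
}

// Time per frame lost to bubbles when running serialized instead of overlapped.
double bubbleMs(const FrameCost& f) {
    return serializedFrameMs(f) - overlappedFrameMs(f);
}
```

With 6 ms of CPU work and 10 ms of GPU work, the serialized frame takes 16 ms in this model and the overlapped one 10 ms, so 6 ms per frame is pure waiting. Real drivers buffer commands and queries in more complicated ways, so treat this only as intuition for why blocking readbacks hurt.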

using hw occlusion queries is feasible, but how well it works depends heavily on your engine structure.
actually I couldn't easily integrate them into mine without introducing too many rendering pipeline stalls, so I implemented an SSE-optimized software HOM, pretty much like the one Yann described. it took about 3 days to code the working version (I already had a software rasterizing background, but had to learn SSE; anyway it's quite fast to code), plus a few days tweaking misc things, and it works very well (and it's very fast).
anyway, when you get the time, you should implement and try both (considering how little time it takes to code the sw HOM, it's really worth it).
I don't know the hw occlusion query performance on the latest 6800 and x800, but it probably hasn't changed much; overall it will depend on your pipeline design...

hope that helps :)
Quote:Original post by Mille
Now, what are bubbles ?

Basically it's the stall of the graphics pipeline caused by requesting the result of an occlusion query immediately after issuing it - I think that's what he's referring to as a bubble (empty pipeline stages).

Quote:
Is it really so, that the occlusion queries in DirectX/OpenGL running on the GPU today in 2004 are that slow ?

Well there's an article in GPU Gems that uses it, but they read the results of queries issued in frame n during frame n+1, thus mostly minimising the bubble. The problem then becomes rendering artefacts as objects around corners pop into view - though with only one frame of deferring and a high frame rate they say it's not really noticeable - depends on your application I guess.
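The one-frame deferral can be sketched without any graphics API. This is only an illustration of the bookkeeping, not the code from the GPU Gems article: `DeferredVisibility` and its methods are hypothetical names, and `recordResult()` stands in for whatever a real occlusion query would eventually report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of "use frame n's query results during frame n+1".
// The GPU side is faked: recordResult() stands in for the answer a real
// occlusion query issued this frame would report later.
class DeferredVisibility {
public:
    explicit DeferredVisibility(std::size_t objectCount)
        : previous_(objectCount, true),  // objects with no answer yet count as visible
          current_(objectCount, true) {}

    // Called during the current frame: store the answer belonging to the
    // query issued for this object in this frame.
    void recordResult(std::size_t object, bool visible) {
        assert(object < current_.size());
        current_[object] = visible;
    }

    // Called during the current frame: decides from LAST frame's answer,
    // so the CPU never waits on a query issued this frame.
    bool shouldDraw(std::size_t object) const {
        assert(object < previous_.size());
        return previous_[object];
    }

    // Called once at the end of every frame.
    void endFrame() { previous_ = current_; }

private:
    std::vector<bool> previous_;  // answers to queries issued last frame
    std::vector<bool> current_;   // answers to queries issued this frame
};
```

The popping artefact is visible in the sketch itself: an object that becomes visible in frame n is still skipped by `shouldDraw()` until `endFrame()` has run, that is, until frame n+1.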

Quote:
- hence forcing us to make an alternative software occlusion implementation or a PVS approach ... can that be right ?

I vaguely recall Yann mentioning that deferring by one or more frames would've helped his situation a lot in terms of speed, but that he didn't like the artefacts - perhaps he can clear that up.
I'm personally going to try his software approach since I have an old rasteriser sitting around from some years back.

Thanks for the nice input; it cleared things up quite a lot (rate->up)

But could you give me some advice on how to implement a HOM approach? That is, do you know any good tutorials, links or books to help me get started (or must I just hit Google) ?

Thanks again ..
mh, I don't know any specific HOM tutorials, but you should have enough information in the original thread, or search google for hierarchical occlusion maps.
you should also get your rendering API's spec, so that your sw renderer follows the same rasterization rules.

the basic HOM rendering process could be split into:

engine loading:
- create the whole occlusion map hierarchy (base level + mipmaps down to a reasonable size; mipmaps like 1*1 or 2*2 or even 4*4 are pretty useless, but you can add them anyway if you feel like it)

runtime:
- clear base OM.
- render occluders:
  - rasterize occluder's triangles/quads/ngons/whatever primitives your sw renderer supports.
- generate occlusion hierarchy, which only consists of filling the mipmaps with the downsampled base OM.
- test occludees:
  - rasterize occludee's triangles/quads/..., using depth test with the first OM in the hierarchy (the smallest one)
  - as soon as a fragment passes the depth test, move to the next higher-resolution level of the HOM, until you reach the base level, or until no fragments are visible.
  - if there are fragments visible at the base level, well, the occlusion test failed, the primitive isn't occluded, and you'll have to render whatever it bounds...
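The runtime steps above can be sketched in a few dozen lines. This is a simplified illustration under several assumptions, not the poster's SSE implementation: the maps are square with a power-of-two size, larger depth means farther away, each coarse texel keeps the farthest depth of its four children (so a coarse test can only err towards "visible"), and the occludee is reduced to a screen-space rectangle with a single nearest depth instead of rasterized triangles. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One level of the hierarchy: a square depth map, larger depth = farther.
struct DepthLevel {
    int size;                  // width == height, a power of two
    std::vector<float> depth;  // size * size texels, row-major
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * size + x];
    }
};

// levels[0] is the base map; every further level halves the resolution and
// keeps the FARTHEST depth of its four children (conservative downsampling).
std::vector<DepthLevel> buildHierarchy(DepthLevel base, int smallestSize) {
    assert(smallestSize >= 1);
    assert(base.size > 0 && (base.size & (base.size - 1)) == 0);
    assert(base.depth.size() == static_cast<std::size_t>(base.size) * base.size);
    std::vector<DepthLevel> levels;
    levels.push_back(std::move(base));
    while (levels.back().size / 2 >= smallestSize) {
        const DepthLevel& fine = levels.back();
        const int half = fine.size / 2;
        DepthLevel coarse{half, std::vector<float>(static_cast<std::size_t>(half) * half)};
        for (int y = 0; y < half; ++y)
            for (int x = 0; x < half; ++x)
                coarse.depth[static_cast<std::size_t>(y) * half + x] =
                    std::max(std::max(fine.at(2 * x, 2 * y), fine.at(2 * x + 1, 2 * y)),
                             std::max(fine.at(2 * x, 2 * y + 1), fine.at(2 * x + 1, 2 * y + 1)));
        levels.push_back(std::move(coarse));  // 'fine' is not used after this point
    }
    return levels;
}

// Simplified occludee: inclusive rectangle in base-map pixels plus the depth
// of its nearest point.
struct Occludee { int x0, y0, x1, y1; float nearestDepth; };

// True if at least one texel under the rectangle lets the occludee through.
bool passesLevel(const DepthLevel& level, int shift, const Occludee& o) {
    for (int y = o.y0 >> shift; y <= (o.y1 >> shift); ++y)
        for (int x = o.x0 >> shift; x <= (o.x1 >> shift); ++x)
            if (o.nearestDepth < level.at(x, y)) return true;
    return false;
}

// Start at the smallest map and refine only while something still passes.
bool isOccluded(const std::vector<DepthLevel>& levels, const Occludee& o) {
    assert(!levels.empty());
    assert(o.x0 >= 0 && o.y0 >= 0 && o.x0 <= o.x1 && o.y0 <= o.y1);
    assert(o.x1 < levels[0].size && o.y1 < levels[0].size);
    for (int i = static_cast<int>(levels.size()) - 1; i >= 0; --i)
        if (!passesLevel(levels[i], i, o)) return true;  // hidden at this level
    return false;  // fragments still pass at the base level
}
```

In this sketch the base map would be cleared to the far depth (1.0f here, by assumption), occluders would write smaller depths into it, and `buildHierarchy()` would run once per frame after all occluders are rasterized. A real implementation would reuse preallocated levels instead of building new vectors each frame.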

now you can add some misc stuff to the occlusion test itself.
either you return visible as soon as a fragment is visible, or you continue to rasterize every fragment, and return the visibility ratio (number of visible fragments / total number of rasterized fragments), and use that to determine the visibility. for example, if the ratio is below 2%, you might want to consider the object as occluded... (doing so will obviously introduce artifacts)
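The ratio variant could look roughly like this. It is a sketch with assumed inputs: the two vectors hold, for every rasterized occludee fragment, the fragment's own depth and the depth stored in the base occlusion map at that pixel (smaller depth = nearer). The function names and the 2% default are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of occludee fragments that pass the depth test against the map.
// fragmentDepth[i] and mapDepth[i] refer to the same pixel.
double visibilityRatio(const std::vector<float>& fragmentDepth,
                       const std::vector<float>& mapDepth) {
    assert(fragmentDepth.size() == mapDepth.size());
    if (fragmentDepth.empty()) return 0.0;  // nothing rasterized, nothing visible
    std::size_t visible = 0;
    for (std::size_t i = 0; i < fragmentDepth.size(); ++i)
        if (fragmentDepth[i] < mapDepth[i]) ++visible;
    return static_cast<double>(visible) / static_cast<double>(fragmentDepth.size());
}

// Treat the object as occluded when too few of its fragments are visible.
// A threshold above zero trades correctness for speed, as noted above.
bool treatAsOccluded(double ratio, double threshold = 0.02) {
    return ratio < threshold;
}
```

Unlike the early-out version, this always touches every fragment, so it costs more per test; whether the extra culling pays for that depends on the scene.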

that's the basic idea.
if you have no sw rendering background, it's too long to explain here, I suggest you go read chris hecker's sw rendering articles, or just google for software rendering tutorials ;p
Yes, chris hecker's sw rendering articles, as sBibi mentioned, are a good place for a good explanation of a sw rasterizer. I used those articles when writing mine.
you don't do the occlusion queries every frame, right? you can check the angle & position difference to the last occlusion query frame, and only if they changed enough do another occlusion query pass
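A minimal sketch of that check, assuming the camera is described by a position and a unit-length forward vector. `CameraState`, the function name and the thresholds are hypothetical and would need tuning per application.

```cpp
#include <cassert>
#include <cmath>

// Camera pose at the time of an occlusion pass. The forward vector is
// assumed to be normalized.
struct CameraState {
    float px, py, pz;  // position
    float fx, fy, fz;  // forward direction, unit length
};

// True when the camera moved or turned far enough since the last occlusion
// pass that the cached visibility results should be refreshed.
bool needsNewOcclusionPass(const CameraState& last, const CameraState& now,
                           float maxMove, float maxTurnRadians) {
    assert(maxMove >= 0.0f && maxTurnRadians >= 0.0f);
    const float dx = now.px - last.px;
    const float dy = now.py - last.py;
    const float dz = now.pz - last.pz;
    if (dx * dx + dy * dy + dz * dz > maxMove * maxMove) return true;

    // For unit vectors the dot product is the cosine of the angle between
    // them; a smaller cosine means a larger turn.
    const float cosAngle = last.fx * now.fx + last.fy * now.fy + last.fz * now.fz;
    return cosAngle < std::cos(maxTurnRadians);
}
```

Note that this only tracks the camera. Moving objects can still become visible between passes, so reusing old results has the same popping risk as deferring query results by a frame.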

http://www.8ung.at/basiror/theironcross.html
Quote:Original post by Basiror
you don't do the occlusion queries every frame, right? you can check the angle & position difference to the last occlusion query frame, and only if they changed enough do another occlusion query pass


Good point. I actually just changed the occlusion query code yesterday so that it doesn't run every frame - damn, how stupid can one be ... sleeping and coding is not a good mix.

Anyway, the change gave >100 fps in my test scene with about 2000 triangles.

All right, I will look into chris hecker's docs!

With 2000 tris, there is no need to do occlusion culling. You probably need at least 100 000 or so for it to matter and even that is pretty low. Also, for a wealth of excellent information on software rendering tricks for occlusion culling, check out the dPVS manual here: http://www.hybrid.fi/main/dpvs/download/dpvs_online.pdf
I don't know why people around here are so excited about CPU-based approaches to occlusion culling. Is it because Yann does it?

There is a simple solution to pipeline stalls called Coherent Hierarchical Occlusion Culling. It allows you to significantly reduce the number of queries issued, and allows you to do useful work while you are waiting for outstanding queries to come back. You don't render on the CPU, why would you do occlusion culling there?

http://www.cg.tuwien.ac.at/research/vr/chcull/
"Math is hard" -Barbie

This topic is closed to new replies.
