Archived

This topic is now archived and is closed to further replies.

duhroach

Occlusion Culling + Octree

Recommended Posts

duhroach    225
Hey all, was wondering if anyone had any references to some good occlusion culling tutorials. I'm using an octree for my rendering, along with frustum culling and backface culling, but I'm still getting around 13fps when rendering a 20k poly scene. Anyhow, any links to some good occlusion tutorials would be helpful. Thanks all!

~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

v71    100
I'm working on that, but since it's a commercial project I can't talk about my algorithm. Anyway, when the application ships I plan to release its source code.

Yann L    1802
There are lots of different approaches to occlusion culling, but generally you can divide them into two distinct categories: geometric methods and image-space methods. The former work directly on the geometry of your scene; they are very precise but prone to accuracy problems and can't take advantage of 3D hardware. The latter can be hardware accelerated, can easily handle occluder fusion, and are stable but less precise.

For a general-purpose game engine, I'd recommend an image-space method. I'm not aware of any tutorial, but there are quite a few research papers out there (for example this one).

The basic steps are very simple:

1) Render potential occluders (the geometry that potentially hides the most objects) into a z-buffer. That's the occlusion map.

2) Use depth tests to determine if an object's bounding box is fully hidden behind the occluders. That's normally done by rasterizing the bounding box into the occlusion map and comparing depths.

3) Repeat (2) for all objects.

Step 2 can either be performed on the CPU alone, or can be HW accelerated on newer 3D cards.
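The three steps above can be sketched with a tiny software occlusion map. Everything here (the resolution, the axis-aligned rectangle "rasterizer", all function names) is illustrative, not from any particular engine:

```python
import numpy as np

RES = 128  # low-resolution occlusion map

def make_occlusion_map():
    """Fresh depth buffer, cleared to the far plane."""
    return np.full((RES, RES), np.inf)

def rasterize_occluder(zbuf, x0, y0, x1, y1, depth):
    """Step 1: splat an occluder's screen-space rectangle into the
    occlusion map, keeping the nearest depth per texel."""
    region = zbuf[y0:y1, x0:x1]
    np.minimum(region, depth, out=region)

def is_occluded(zbuf, x0, y0, x1, y1, depth):
    """Step 2: an object's screen-space bounding rectangle is hidden
    only if every texel it covers already holds a nearer depth."""
    return bool(np.all(zbuf[y0:y1, x0:x1] < depth))
```

Step 3 is then just a loop calling `is_occluded` on every object's projected bounding box.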

/ Yann

duhroach    225
Don't image-space comparisons cause you to actually render the scene multiple times per chosen occluder? Or is it just one pass for the close-Z occluders in the frustum, and then the rest is rendered or not based on its inclusion in the z-buffer? Also, doesn't that require multiple passes?

I've read over the HOM & IOM methods, and wasn't quite sure if image-based rendering was reliable for scene occlusion, mainly because isn't there a great number of cards that don't have any hardware support for the occlusion process? Which would force me to do plain image rendering, which would kind of defeat the purpose?

Ultimately I was thinking of using the Hierarchical Z-Buffering algorithm (Real-Time Rendering v2, pg. 383) because it's made specifically for octrees. Combined with an occlusion horizon method, it would produce some rather good results. And I'm using shadow maps, so I'm having to render the scene from the light's view anyhow; I was trying to cut down on renderings.

So is image-based occlusion what everyone's using nowadays?

~Main



==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

Mark Duffill    156
Hmm, I've been thinking about occlusion methods for a while. Multiple rendering of the scene is bad, and having to read back from the display/z-buffer is evil on most hardware.

So how about this:

Cull the scene using normal BSP/octree etc. methods. Each object of any significance (500+ polys, large pieces of buildings/terrain) has a *very* simple set of polys defining the "occluded" area it contains. Render all these occluders first to the z-buffer, but not the display buffer.

Then when you draw your normal geometry (front-to-back order) there will be big areas (hopefully) that are rejected in the z-compare, as a lot of modern gfx cards have a quick out if the z-test fails.

I.e. inside a house with a single window, the occluder would be a box the shape of the house, with a square for the window cut out.

So what do you think? (I'll have to get around to trying it soon!)

Yann L    1802
With image space occlusion culling, you don't need to render the whole scene multiple times. If you look at a typical 3D scene, you'll notice that only about 10% (or even less) of the faces contribute significantly to occlusion.

So the idea is to select a fixed maximum number of occlusion faces per frame, out of the set of those 10%. Say you have a scene with 100k faces, with around 10k showing a large contribution to occlusion. Each frame, you'd select about 100 to 500 faces as occluders. The performance impact will be minimal.

Depending on the type of your scene, you can even create special 'occlusion geometry'. That's what Mark Duffill had in mind in the previous post. This is especially interesting in scenes with lots of large buildings, e.g. a city model. You create a very simple and coarse 3D skin of your models, modelling only the occlusion. This skin is then rendered to the occlusion map instead of the (more complex) main geometry. Using this method, you can get away with 50-100 faces per frame and still get decent occlusion behaviour.

Another very important point of image space methods is the excellent handling of occluder fusion. Lots of small occluders that overlap will automatically form a big unified occluder with those methods. That's a lot harder to achieve with geometric methods.

It's true that most 3D cards do not support HW occlusion queries yet. But this feature will be supported by every new 3D card, so it's worth including in any engine at this point. A fallback function for older 3D cards is important, of course. The z-buffer readback is typically not that slow, considering that your occlusion map can be pretty low resolution (128*128 is more than enough for most scenes). If you want to completely avoid the readback, then get out your old software 3D rasterizer. It can be pretty fast to do a small 128*128 depth-only rendering on the CPU alone. No lighting, no colours, no texture, and only around 100 faces - a well-optimized SW rasterizer can do that in less time than a z-buffer readback would require.
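As a concrete illustration of such a CPU fallback, here is a minimal depth-only triangle rasterizer (barycentric point sampling, no clipping, no perspective correction; all names are invented for the sketch):

```python
import numpy as np

def raster_depth_tri(zbuf, tri):
    """Rasterize one triangle (three (x, y, z) screen-space vertices)
    into a depth buffer, keeping the nearest z per pixel. Depth is
    interpolated with barycentric weights."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    minx, maxx = int(min(x0, x1, x2)), int(max(x0, x1, x2)) + 1
    miny, maxy = int(min(y0, y1, y2)), int(max(y0, y1, y2)) + 1
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        return  # degenerate triangle
    for y in range(max(miny, 0), min(maxy, zbuf.shape[0])):
        for x in range(max(minx, 0), min(maxx, zbuf.shape[1])):
            # barycentric weights of pixel (x, y)
            w0 = ((x1 - x) * (y2 - y) - (x2 - x) * (y1 - y)) / area
            w1 = ((x2 - x) * (y0 - y) - (x0 - x) * (y2 - y)) / area
            w2 = 1.0 - w0 - w1
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # inside the triangle
                z = w0 * z0 + w1 * z1 + w2 * z2
                if z < zbuf[y, x]:
                    zbuf[y, x] = z
```

A real fallback would of course use an optimized scanline or SIMD version, but the logic (depth only, nearest wins) is the same.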

/ Yann

[edited by - Yann L on October 1, 2002 12:58:29 PM]

duhroach    225
Hum..
So, we:
a) Determine nodes of the octree in the view frustum (list: VIEWABLE)
b) Z-order the OBJECTS in VIEWABLE from front to back (list: Z-VIEW)
c) In Z-VIEW, determine qualifying OCCLUDERS (items closest to the camera in Z-VIEW? What's the best way to do this?)
d) Disable alpha, texture, lighting
e) Render our qualifying OCCLUDERS to the ZBUFFER
f) Test all other items in Z-VIEW against the ZBUFFER; if the test returns true (visible), set a RENDER flag in the object
g) Cycle through Z-VIEW and draw all OBJECTS flagged RENDER to the screen.

Is that about right?

And if it is, can we skip rendering to the zbuffer, and instead render to the screen and do a depth test there?

~Main



==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

[edited by - duhroach on October 1, 2002 6:55:02 PM]

Yann L    1802
Basically correct.

"b) Z-order the OBJECTS in VIEWABLE from front to back (list: Z-VIEW)"

You can do that, but it's not mandatory. It might give you the benefits of early z rejection, but make sure to check whether the overhead of sorting is worth it.

"c) In Z-VIEW, determine qualifying OCCLUDERS (items closest to the camera in Z-VIEW? What's the best way to do this?)"

Different heuristics are possible. The best occluders are the objects with the biggest projected area with respect to the camera. I.e., take the perspective-projected size (of the bounding boxes) and relative orientation to the camera into account. If you use the dedicated occlusion-geometry approach, then selection is even easier: just take the nearest ones, until your maximum face count has been reached.
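One purely illustrative way to encode that heuristic: approximate each occluder's projected area from a bounding-sphere radius, its distance, and a facing factor, then keep the top scorers up to a fixed budget. None of these names or numbers come from the thread; it's a sketch of the idea.

```python
import math

def occluder_score(bbox_radius, distance, facing):
    """Rough 'projected area' heuristic for ranking occluders:
    a sphere-approximated bounding volume of radius r at distance d
    subtends an area proportional to (r / d)^2, scaled by how
    directly the occluder faces the camera (facing in [0, 1])."""
    if distance <= bbox_radius:  # camera inside the bounds: top priority
        return math.inf
    return facing * (bbox_radius / distance) ** 2

def pick_occluders(objects, max_count):
    """Select the best-scoring occluders, up to a fixed budget.
    `objects` is a list of (name, radius, distance, facing) tuples."""
    ranked = sorted(objects,
                    key=lambda o: occluder_score(o[1], o[2], o[3]),
                    reverse=True)
    return [o[0] for o in ranked[:max_count]]
```

A real engine would also budget by face count rather than object count, as Yann describes for the dedicated occlusion-geometry case.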

quote:

And if it is, can we skip rendering to the zbuffer, and instead render to the screen and do a depth test there?


Hmm, I don't understand what you mean here. You can only do a depth test using a depth buffer. What do you mean by doing a depth test on the screen?

/ Yann

duhroach    225
Yann, good call on getting rid of step B in figuring out occluders.

As for doing a "depth" test on the screen: I mean, do we have to render to the depth buffer to determine what's visible? I.e., can we just render to the screen and use that?
Or do we have to render to the depth buffer, then render to the screen?

thanks for the help yann

~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

Mark Duffill    156
Hi,

What Yann is saying about rendering to a small z-buffer surface and using it to reject hidden objects is what I tried initially. However, I found that a lot of cards won't give you access to the z-buffer surface unless you only run in 16-bit modes (DX8). And if they do, you can take a bad frame-rate hit.

AFAIK only some HP gfx cards support occlusion querying so far, so I modified my process to render the occluding polys to a small render-target texture, resulting in a black/white image which could then be read back and tested.

However, even that is a big performance hit, since on most modern gfx hardware any sort of read-back generates stalls in the gfx card due to synchronisation issues.

So I could render the ~100 occluder polys in software and test that, but gfx cards (GeForce+) are so fast you could probably render several thousand polys in the same time, making it a waste of effort.

So that's why I thought of the idea of rendering the occluders to the z-buffer first, then the scene front to back, and thus taking advantage of the early z-buffer reject.

BTW, drawing front to back instead of back to front resulted in a 30% speed-up on a project which has just finished. This was for GeForce 3 equivalent cards. So it may be worth profiling what effect it has in your case.
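The front-to-back ordering itself can be as simple as sorting the draw list by each object's depth along the view direction (a hedged sketch; the object layout and names are invented):

```python
def sort_front_to_back(objects, cam_pos, view_dir):
    """Order objects by their projection onto the view direction so
    near objects draw first and fill the z-buffer early, maximising
    early-z rejection for everything behind them."""
    def view_depth(obj):
        cx, cy, cz = obj["center"]
        # vector from camera to object center, dotted with view_dir
        return ((cx - cam_pos[0]) * view_dir[0] +
                (cy - cam_pos[1]) * view_dir[1] +
                (cz - cam_pos[2]) * view_dir[2])
    return sorted(objects, key=view_depth)
```

Coarse per-object (or per-octree-node) sorting is enough here; exact per-polygon ordering isn't needed for early-z to pay off.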

Anyhow, roll on cards that support occlusion testing!

Yann L    1802
quote:

However I found that a lot of cards won't give you access to the z-buffer surface unless you only run in 16-bit modes. (DX8)


That's weird. I don't know too much about D3D, but in OpenGL you can get full access to the 24-bit depth buffer in all screen modes. I would guess that D3D offers the same functionality?

quote:

AFAIK only some HP gfx cards support occlusion querying so far,


All GeForce3+ cards support hardware occlusion queries. So does the new Radeon.

quote:

So that's why I thought of the idea of rendering the occluders to the z-buffer first, then the scene front to back, and thus taking advantage of the early z-buffer reject.


That's not true occlusion culling. In fact, you aren't culling anything; you are just rejecting pixels. A speedup of 30% is nice, but with true occlusion culling you can easily get speedups of 1000% or more (depending on your scene). So the performance hit of reading back the z-buffer (or SW rendering it) might very well be worth it. And keep in mind that we are talking about a fallback option here: on newer 3D cards, the whole process is HW assisted.

I just ran a little test on our engine. I forced fallback mode (SW rendering on the CPU (hand-optimized ASM), 500 occlusion faces max., 128² occlusion map, non-hierarchical), and timed a test scene (5 million faces). Without occlusion culling, I have around 1.25M faces in the view. With the culling, it gets down to 155k. That's 8 times fewer.

There is perhaps one additional thing to mention: the results of occlusion culling *highly* depend on the nature of your scene geometry. If you are already using a portal engine in a closed environment (Quake style), then occlusion culling won't be very effective at all, and the overhead might outweigh the benefits. But on complex, open and/or outdoor scenes, or scenes with lots of large (but complex) features (mountains, buildings, ...), occlusion culling can be incredibly efficient.

/ Yann

[edited by - Yann L on October 2, 2002 8:13:26 AM]

Mark Duffill    156
Hi,
I can't copy-paste quotes due to using a *gasp* WebTV to get on the internet at the moment. (Should have brought my computer with me!)

So :-

"Getting the z-buffer in 32 bits": yup, I also found OpenGL could do this, but not DX8! (What a pain.)

"GeForce 3+ has occlusion culling": didn't know that, but at the time of my research I only had a GeForce 2. That seems good, though.

However, the other points are valid, but developing a view-culling architecture is not that easy in my situation, since it has to run on consoles (the PC is just a test bed for design ideas). As such, some things which are dead quick on a PC are very slow on a PS2, for instance (not having an L2 cache, for instance).

As for the speed increase, it would not be that high for our current system, as it is a heavily portaled system. But I'm researching ways to remove portal systems, even though they are good, because it can be very time-consuming to generate "correct" geometry and portals. Especially if it's artists doing most of the work!

Anyhow an interesting thread.

(If I could only get to a proper computer; damn, been ill!)

duhroach    225
Heya guys, thanks for the input on the topic.
I've got the theory down; now to just figure out its code implementation.

thanks again!
~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

Yann L    1802
quote:

"Getting the z-buffer in 32 bits": yup, I also found OpenGL could do this, but not DX8! (What a pain.)


I didn't know that. Quite surprising that DX8 lacks such an important feature (e.g. for image compositing).

quote:

However, the other points are valid, but developing a view-culling architecture is not that easy in my situation, since it has to run on consoles (the PC is just a test bed for design ideas). As such, some things which are dead quick on a PC are very slow on a PS2, for instance (not having an L2 cache, for instance).


OK, in that case you'll obviously be more limited in your choice. I guess that SW rendering the occlusion map is no option on a PS2. But I don't know enough about the architecture and performance considerations of your target consoles to really be of any help here.

quote:

As for the speed increase, it would not be that high for our current system, as it is a heavily portaled system. But I'm researching ways to remove portal systems, even though they are good, because it can be very time-consuming to generate "correct" geometry and portals. Especially if it's artists doing most of the work!


Yep, I've also come to a point where portals are no option anymore. The concept is nice, but besides the drawbacks you mentioned, they are just too restrictive in the type of geometry they accept. The good news is that they can definitely be replaced by a good occlusion culling system, which will yield similar culling results, but on a wider range of geometry (no cells/portals required) and without preprocessing.

/ Yann

duhroach    225
PVS is a precomputation algorithm, i.e. you have to construct your PVS before you can render your map. So it could only work in portal/BSP/cell type engines.

Outdoor map formats (octree/heightmap) can't really do a PVS calculation, as there's no "cap" to where we should stop saying "I can see you." This is because there are no "rooms" in outdoor formats. You've got two miles worth of polygons; PVS would tell you that you can see every polygon in that two-mile radius, so we render all of them.

On another note: going over the HOM method, couldn't we use OpenGL's mipmap generation to create our Z-pyramid of hierarchy images? And if we can, can we access them the same as if we made them ourselves?
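One caveat worth noting on that idea: standard mipmap generation averages each 2x2 block of texels, whereas a hierarchical z-pyramid needs the maximum (farthest) depth of each block for the "fully hidden" test to stay conservative. A CPU-side sketch of the max-reduction (illustrative names, square power-of-two buffer assumed):

```python
import numpy as np

def build_z_pyramid(zbuf):
    """Build a hierarchical z-pyramid from a square power-of-two depth
    buffer. Each coarser level stores the MAX (farthest) depth of its
    2x2 children, so if an object's nearest depth is farther than a
    coarse texel, it is hidden everywhere under that texel.
    (Ordinary averaging mipmaps would break this guarantee.)"""
    levels = [zbuf]
    while levels[-1].shape[0] > 1:
        prev = levels[-1]
        h, w = prev.shape
        # group into 2x2 blocks and take the farthest depth of each
        coarse = prev.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
        levels.append(coarse)
    return levels
```

So whether GL's automatic mipmaps are usable depends on whether a max filter can be applied; the manual reduction above is the safe assumption.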

thanks

~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

[edited by - duhroach on October 4, 2002 2:36:32 AM]

Mark Duffill    156
Hi,
I did look into PVS, and it can be a nice system. I considered dividing up the world into squares, then finding the visible set from each section to its neighbours: basically "voxelise" the source section, do raycasts out from each "PVS voxel" to each of its 8 neighbours, and mark down which ones it could see into. This could be stored in a byte, with each bit for a given neighbour.

The resolution of your PVS grid for each section depends on the size/complexity of your scene. Then, to render, you find which section and PVS voxel(s) you are in, and "flood fill" outwards from these, clipping with the view frustum as well.
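The per-voxel visibility byte described above might look like this (one bit per neighbour; the neighbour ordering and all names are invented for the sketch):

```python
# Neighbour offsets of a grid cell, one bit position per entry.
NEIGHBOURS = [(-1, -1), (0, -1), (1, -1),
              (-1,  0),          (1,  0),
              (-1,  1), (0,  1), (1,  1)]

def pack_visibility(visible_flags):
    """Pack 8 booleans (one per neighbour, in NEIGHBOURS order)
    into a single byte, as described above."""
    byte = 0
    for bit, flag in enumerate(visible_flags):
        if flag:
            byte |= 1 << bit
    return byte

def can_see(byte, neighbour_index):
    """Query one neighbour's visibility bit from the packed byte."""
    return bool(byte & (1 << neighbour_index))
```

One byte per voxel is what makes the memory cost tractable at all; the grid resolution then dictates the total footprint, which is the console problem mentioned below.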

The only problem was memory: on a PC with 128MB+ it's not really an issue, but on a console that only has ~32MB it was too much overhead, or the resolution was so low it wasn't worth it.

Anyhow, the next big bottleneck in games is more the collision testing and AI; drawing the scene is less of an issue in comparison. (I.e. 60%-80% of CPU time is for the game code, not the draw code!)

Roll on hardware intersection routines!

Fidelio_    122
quote:
Original post by Anonymous Poster
why don't u guys think about PVS


It's traditionally used as a precomputed visibility map for indoor levels. However, if you use heightmapped terrain, it might be interesting to divide the terrain into 16x16 patches, compute visibility per patch, and use this for occlusion culling. It speeds things up and uses no CPU time in-game.
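The in-game side of that idea is just a table lookup. A hedged sketch, assuming an offline tool has already produced a patch-to-patch visibility table (all names invented):

```python
def visible_patches(vis_table, camera_patch, frustum_patches):
    """In-game lookup for precomputed terrain-patch PVS:
    intersect the offline visibility set of the patch the camera is in
    with the patches currently inside the view frustum. No per-frame
    occlusion work happens on the CPU."""
    return vis_table[camera_patch] & frustum_patches
```

The whole occlusion cost is paid offline when `vis_table` is built (e.g. by raycasting between patches), which is exactly why it uses no CPU time in-game.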
