Occlusion cull - Reloaded

Hi,
I have a simple hardware occlusion culler implemented, but it stalls my pipeline, forcing me to fetch the results in the next frame, which introduces latency.
I've heard people are using other types of occlusion culling methods, for example software occlusion culling on the CPU.
How is that implemented? A software floating-point depth buffer to test things against? A software rasterizer that renders coarse meshes to produce that depth buffer?
Are there other, purely geometric methods of occlusion culling? Some axis-aligned bounding box calculation in world, camera, or screen space?
Any hints, links, advice etc. will be appreciated.

Thank you.
There's a SIGGRAPH 09 presentation from DICE about software rasterization-based occlusion culling. It's a very interesting read. Urho3D implements a software-rasterization-based culler which I think is very close, if not identical, to the one in the presentation.

Other CPU methods that don't do rasterization tend to be some kind of portal- or sector-based scheme (PVS, BSP trees). They are nice too, but most of them are restricted to static geometry.
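
As a rough illustration, the runtime lookup side of a PVS scheme can be as simple as the following sketch (the precomputation happens offline; all names here are hypothetical):

#include <vector>

// Precomputed visibility: for each sector, a bit per other sector saying
// whether it can ever be seen from there (computed offline).
struct Sector {
    std::vector<bool> pvs;   // pvs[j] == true -> sector j potentially visible
};

// Collect the sectors worth considering this frame; frustum and finer
// culling then run only on their contents.
void GatherPotentiallyVisible(const std::vector<Sector>& sectors,
                              int cameraSector, std::vector<int>& out)
{
    const Sector& s = sectors[cameraSector];
    for (int j = 0; j < (int)sectors.size(); ++j)
        if (s.pvs[j])
            out.push_back(j);
}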
Thanks.
I've seen that paper and it's pretty much clear how they do it. I'm just not sure I like the idea of keeping a separate low-poly mesh for each occluder and rasterizing its triangles into a software depth buffer on the CPU.
The general approach with hardware occlusion queries is: if the latest set of results for an object are not ready yet (your API will provide a means for testing this without actually fetching the results and having to stall) you just reuse the last result that you got. Otherwise the results are ready so you can go ahead and fetch them without stalling. This exploits temporal coherence - the assumption that even in a quite dynamic scene, on a raw frame-to-frame basis things don't really change too much.

There are a couple of edge cases where you need to short-circuit this - e.g., if an object is new in the scene then you should assume that it's visible by default until a fetched query result tells you otherwise. Likewise, if an object moves out of the scene (e.g. by frustum culling) then you must also do the same when it next moves back in (being careful to make sure that you let any outstanding query on it run to completion anyway, otherwise Interesting Things might happen).

In general it works quite well and can give good results, but is a little more complex to set up and manage than a scheme that just always fetches the results.
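
A minimal sketch of that reuse pattern in OpenGL (the struct and helper names are illustrative, not from any particular engine):

// Assumes a GL 1.5+ context with query entry points loaded.
struct ObjectCullState {
    GLuint query = 0;
    bool   queryInFlight = false;
    bool   lastVisible = true;   // new objects default to visible
};

// Returns the most recent visibility verdict without ever stalling.
bool IsVisible(ObjectCullState& s)
{
    if (s.queryInFlight) {
        GLuint available = 0;
        glGetQueryObjectuiv(s.query, GL_QUERY_RESULT_AVAILABLE, &available);
        if (available) {
            GLuint samples = 0;
            glGetQueryObjectuiv(s.query, GL_QUERY_RESULT, &samples);
            s.lastVisible = (samples > 0);
            s.queryInFlight = false;
        }
        // Not ready yet: fall through and reuse the last known result.
    }
    return s.lastVisible;
}

// Issue a fresh query on cheap proxy geometry; never issue a second one
// while the previous is still outstanding.
void IssueQuery(ObjectCullState& s)
{
    if (s.queryInFlight) return;
    if (s.query == 0) glGenQueries(1, &s.query);
    glBeginQuery(GL_SAMPLES_PASSED, s.query);
    // DrawBoundingBox(s);  // hypothetical: render the object's AABB with
                            // color and depth writes disabled
    glEndQuery(GL_SAMPLES_PASSED);
    s.queryInFlight = true;
}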

The software-based method involves an obvious tradeoff, and relies on the cost of keeping and updating a software z-buffer (which is generally of much lower resolution than your real hardware one) being less than the cost of just drawing the objects anyway.
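
To make that tradeoff concrete, here's a sketch of the test side, assuming the coarse occluder meshes have already been rasterized into a small float depth buffer where larger values are farther away (all names hypothetical):

#include <algorithm>

// Conservative test of a screen-space bounding rectangle against a small
// software depth buffer. objMinZ is the nearest depth of the object's bounds.
bool RectPotentiallyVisible(const float* zbuf, int W, int H,
                            int x0, int y0, int x1, int y1, float objMinZ)
{
    x0 = std::max(x0, 0);      y0 = std::max(y0, 0);
    x1 = std::min(x1, W - 1);  y1 = std::min(y1, H - 1);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (objMinZ < zbuf[y * W + x])
                return true;   // nearer than the occluders at this pixel
    return false;              // fully behind occluders everywhere -> cull
}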


Thanks. I'm in the process of finishing a software rasterizer that renders to a small z-buffer.
I want to ask another question.
I have some problems with aggressive culling now.
Let me explain. I have frustum culling and then occlusion culling. I keep pointers to meshes and other objects in terrain patch nodes.
The terrain is space-partitioned via a quadtree, and when occlusion culling is enabled it culls large portions of the terrain geometry and the objects contained in those nodes.
The problem is that my frame rate fluctuates heavily. If the camera is looking at a wall that occludes large portions of the scene, the frame rate is around 200 FPS, but when the camera turns fast, lots of geometry comes into view, causing awful lag for a fraction of a second; sometimes the game even freezes for a whole second. Then the frame rate stabilizes again, but drops badly again if the camera moves in certain directions.

What can I do about such a problem?
Sometimes I'd prefer to render everything every frame to keep the frame rate constant, no matter how low it is.
It's very annoying to have smooth rendering and then all of a sudden everything freezes for a second, and when the game continues, you find yourself five feet away from where you were, having missed several frames of the simulation...
Are you creating GPU resources (vertex buffers, textures etc.) or doing other heavy operations in response to objects becoming visible? If so, can you instead pre-create everything at scene load time? Or do you have such a large amount of textures that they don't fit into GPU memory at once?
You can actually even use occlusion query results in the same frame, without pulling the result back, by using conditional rendering. I have an OpenGL example of that here: https://github.com/progschj/OpenGL-Examples/blob/master/10queries_conditional_render.cpp

I get a significant speedup on modern hardware (around a factor of 5 on my GTX 560 Ti and up to a factor of 10 on an AMD 7730M). The older GTX 260M actually lost performance from doing that, though.
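
For reference, the core of that pattern boils down to something like this (GL 3.0+; DrawBoundingBox and DrawFullMesh are hypothetical helpers):

// Issue the query on cheap proxy geometry.
glBeginQuery(GL_SAMPLES_PASSED, query);
DrawBoundingBox();   // hypothetical: AABB with color/depth writes off
glEndQuery(GL_SAMPLES_PASSED);

// Draw the real object only if the query passed. GL_QUERY_NO_WAIT tells
// the driver to just draw if the result isn't ready yet, so the CPU
// never stalls waiting for it.
glBeginConditionalRender(query, GL_QUERY_NO_WAIT);
DrawFullMesh();      // hypothetical
glEndConditionalRender();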
Thanks guys.
Well, I did some testing and it seems that for the amount of mesh and texture data I use, my video memory struggles. When I resized all of my textures by half - diffuse, normal maps, specular maps etc. - everything runs smoothly.
I guess when a certain limit is exceeded for my video card, it starts putting things in system memory or something, which slows things down.

