Occlusion cull - Reloaded


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

7 replies to this topic

#1 solenoidz   Members   -  Reputation: 519

Posted 16 June 2012 - 03:39 AM

Hi,
I have a simple hardware occlusion culler implemented, but it stalls my pipeline, forcing me to fetch the results in the next frame, which leads to latency.
I've heard people are using other types of occlusion culling, for example software occlusion culling on the CPU.
How is that implemented? A software floating-point depth buffer to check things against? A software rasterizer that renders coarse meshes to produce that depth buffer?
Are there other, purely geometric methods of occlusion culling? Some axis-aligned bounding box calculations in world, camera or screen space?
Any hints, links, advice etc. will be appreciated.

Thank you.

#2 clb   Members   -  Reputation: 1780

Posted 16 June 2012 - 04:13 AM

There's a Siggraph 09 presentation from DICE about software rasterization-based occlusion culling. It's a very interesting read. Urho3D implements a software rasterization based culler which I think is very close, if not identical to the one in the presentation.

Other CPU methods that don't do rasterization tend to be some kind of portal- or sector-based approach (PVS sets, BSP trees). They are nice too, but most often are restricted to static geometry.

#3 solenoidz   Members   -  Reputation: 519

Posted 16 June 2012 - 07:41 AM

Thanks.
I've seen that paper and it's pretty clear how they do it. I'm not really sure I like the idea of keeping a separate low-poly mesh for each occluder, just to draw that mesh's triangles into a software depth buffer on the CPU.

#4 mhagain   Crossbones+   -  Reputation: 7610

Posted 16 June 2012 - 01:34 PM

The general approach with hardware occlusion queries is: if the latest set of results for an object are not ready yet (your API will provide a means for testing this without actually fetching the results and having to stall) you just reuse the last result that you got. Otherwise the results are ready so you can go ahead and fetch them without stalling. This exploits temporal coherence - the assumption that even in a quite dynamic scene, on a raw frame-to-frame basis things don't really change too much.

There are a couple of edge cases where you need to short-circuit this - e.g., if an object is new in the scene then you should assume that it's visible by default until a fetched query result tells you otherwise. Likewise, if an object moves out of the scene (e.g. by frustum culling) then you must also do the same when it next moves back in (being careful to make sure that you let any outstanding query on it run to completion anyway, otherwise Interesting Things might happen).

In general it works quite well and can give good results, but is a little more complex to set up and manage than a scheme that just always fetches the results.
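
To make the bookkeeping concrete, here is a minimal sketch of the per-object state described above. It does not touch a real graphics API: `MockQuery` is a hypothetical stand-in for a GPU occlusion query (its `resultAvailable` and `fetch` play the role of whatever "is the result ready?" and "get the result" calls your API offers), and `OccludeeState` and `updateVisibility` are illustrative names, not taken from any engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GPU occlusion query. A real renderer would wrap
// the API's "result available?" and "get result" calls instead.
struct MockQuery {
    bool issued = false;
    int framesUntilReady = 0;        // simulated GPU latency, in frames
    std::uint32_t samplesPassed = 0; // simulated query result

    void issue(std::uint32_t samples, int latencyFrames) {
        issued = true;
        samplesPassed = samples;
        framesUntilReady = latencyFrames;
    }
    void tick() { if (issued && framesUntilReady > 0) --framesUntilReady; }
    bool resultAvailable() const { return issued && framesUntilReady == 0; }
    std::uint32_t fetch() {
        assert(resultAvailable()); // never fetch early: that is the stall
        issued = false;
        return samplesPassed;
    }
};

struct OccludeeState {
    MockQuery query;
    bool lastVisible = true;   // a new object is assumed visible
    bool wasInFrustum = false;
};

// Decides whether to draw the object this frame without ever stalling.
bool updateVisibility(OccludeeState& s, bool inFrustum) {
    // Always drain a finished query, even for objects that left the frustum,
    // so that no query is left outstanding forever.
    if (s.query.resultAvailable()) {
        s.lastVisible = s.query.fetch() > 0;
    }
    if (!inFrustum) {
        s.wasInFrustum = false;
        return false;
    }
    if (!s.wasInFrustum) {
        // New or re-entering object: any stored result is stale, so assume
        // visible until a fresh result says otherwise.
        s.lastVisible = true;
        s.wasInFrustum = true;
    }
    return s.lastVisible; // either a fresh result or the last known one
}
```

The caller would issue a new query for an object only when `query.issued` is false, and call `updateVisibility` once per frame. While a result is pending, the function simply keeps returning the previous answer, which is the temporal coherence assumption at work.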

The software-based method involves an obvious tradeoff, and relies on the cost of keeping and updating a software z-buffer (which is generally of much lower resolution than your real hardware one) being less than the cost of just drawing the objects anyway.
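
As a rough illustration of the software z-buffer idea, the sketch below keeps a small float depth buffer on the CPU and tests objects against it. To stay short it makes a big simplification: occluders and tested objects are both reduced to screen-space rectangles with a single depth value, whereas a real implementation (such as the one in the DICE presentation) rasterizes occluder triangles. The names `ScreenRect` and `SoftwareDepthBuffer` and the 0 = near, 1 = far depth convention are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space rectangle in depth-buffer pixels, covering [x0, x1) x [y0, y1).
// For an occluder, depth should be its farthest depth (conservative); for an
// object being tested, its nearest depth. 0 = near plane, 1 = far plane.
struct ScreenRect {
    int x0, y0, x1, y1;
    float depth;
};

class SoftwareDepthBuffer {
public:
    SoftwareDepthBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 1.0f) {
        assert(width > 0 && height > 0);
    }

    // Reset every pixel to the far plane at the start of a frame.
    void clear() { std::fill(depth_.begin(), depth_.end(), 1.0f); }

    // Write an occluder, keeping the nearest depth seen at each pixel.
    void drawOccluder(const ScreenRect& r) {
        const ScreenRect c = clampToBuffer(r);
        for (int y = c.y0; y < c.y1; ++y)
            for (int x = c.x0; x < c.x1; ++x) {
                float& d = depth_[index(x, y)];
                d = std::min(d, c.depth);
            }
    }

    // An object is visible if at least one covered pixel is not strictly
    // in front of it. Rectangles entirely off-screen report not visible.
    bool isVisible(const ScreenRect& r) const {
        const ScreenRect c = clampToBuffer(r);
        for (int y = c.y0; y < c.y1; ++y)
            for (int x = c.x0; x < c.x1; ++x)
                if (c.depth <= depth_[index(x, y)]) return true;
        return false;
    }

private:
    ScreenRect clampToBuffer(const ScreenRect& r) const {
        return { std::max(r.x0, 0), std::max(r.y0, 0),
                 std::min(r.x1, width_), std::min(r.y1, height_), r.depth };
    }
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width_)
             + static_cast<std::size_t>(x);
    }

    int width_, height_;
    std::vector<float> depth_;
};
```

With a 64x36 buffer, drawing a wall with `drawOccluder({10, 10, 40, 30, 0.3f})` and then calling `isVisible({15, 12, 30, 25, 0.6f})` reports the object as hidden, while the same rectangle at depth 0.2, or one that extends past the edge of the wall, is reported as visible. Whether this is a win still depends on the tradeoff above: the per-frame cost of filling and testing the buffer has to stay below the cost of just drawing the objects.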

Edited by mhagain, 16 June 2012 - 01:40 PM.


#5 solenoidz   Members   -  Reputation: 519

Posted 09 October 2012 - 08:58 AM

Thanks. I'm in the process of finishing a software renderer for a small z-buffer.
I want to ask another question.
I have some problems with aggressive culling now.
Let me explain. I have frustum culling and then occlusion culling. I keep pointers to meshes and other objects in terrain patch nodes.
The terrain is spatially partitioned with a quadtree, and when occlusion culling is enabled it culls large portions of the terrain geometry along with the objects contained in those nodes.
The problem is that my FPS fluctuates a lot. If the camera is looking at a wall that occludes large portions of the scene, the frame rate is around 200 fps, but when the camera turns fast, lots of geometry comes into view, causing awful lag for a fraction of a second, or the game even freezes for a whole second. Then the frame rate stabilizes again, but it drops really badly if the camera makes another move in a certain direction.

What can I do about such a problem?
Sometimes I'd prefer to render everything every frame to keep the frame rate constant, no matter how low it is.
It's very annoying to have smooth rendering and then all of a sudden everything freezes for a second, and when the game continues again, you find yourself five feet away from where you were, having missed several frames of the simulation...

#6 AgentC   Members   -  Reputation: 1262

Posted 10 October 2012 - 03:39 AM

Are you creating GPU resources (vertex buffers, textures etc.) or doing other heavy operations in response to objects becoming visible? If that's the case, can you instead pre-create everything at scene load time? Or do you have such a large amount of textures that they don't fit into your GPU memory at once?


#7 japro   Members   -  Reputation: 887

Posted 10 October 2012 - 04:33 AM

You can actually even use occlusion query results in the same frame without pulling the result by using conditional rendering. I have an OpenGL example for that here: https://github.com/progschj/OpenGL-Examples/blob/master/10queries_conditional_render.cpp

I get a significant speedup on modern hardware (a factor of 5 or so on my GTX560TI and up to a factor of 10 or so on an AMD 7730M). The older GTX260M actually lost performance from doing that, though.

#8 solenoidz   Members   -  Reputation: 519

Posted 12 October 2012 - 09:07 AM

Thanks guys.
Well, I did some testing and it seems that, for the amount of mesh and texture data I use, my video memory struggles. When I resized all of my textures by half (diffuse, normal maps, specular maps etc.), everything ran smoothly.
I guess that when a certain limit is exceeded for my video card, it tries to put things in system memory or something, which slows things down.
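
For a rough sense of why halving the textures helps so much, here is a small sketch that estimates the memory footprint of one texture. It assumes uncompressed texels and that "by half" means halving both width and height; `textureBytes` is a hypothetical helper for this example, and real drivers add padding, alignment and compression that it ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Estimated size in bytes of an uncompressed 2D texture, optionally including
// its full mip chain (each level halves both dimensions, down to 1x1).
std::uint64_t textureBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bytesPerTexel, bool withMipChain) {
    assert(width > 0 && height > 0 && bytesPerTexel > 0);
    std::uint64_t total = 0;
    for (;;) {
        total += static_cast<std::uint64_t>(width) * height * bytesPerTexel;
        if (!withMipChain || (width == 1 && height == 1)) break;
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
    }
    return total;
}
```

Under these assumptions, `textureBytes(2048, 2048, 4, false)` comes to 16 MiB while `textureBytes(1024, 1024, 4, false)` comes to 4 MiB, so halving each dimension cuts the footprint to a quarter. With a diffuse, a normal and a specular map per material, that difference adds up quickly.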



