good generalized frustum cull algo

Started by Norman Barrows April 11, 2014 11:56 PM

Norman Barrows

7,180

Author

April 11, 2014 11:56 PM

i've found some cases where my current frustum cull algo is a bit too aggressive.

is there a good / best generalized frustum cull algo that works pretty well for all cases, even things like large finite planes that span the frustum ?

preferably on the simpler side to implement. (of course! <g>)

worst case, i'd be doing maybe 50,000 culls per frame. a typical complex scene is currently about 15,000 cull calls per frame.

the current algo does some trivial rejection of stuff "behind" the camera, a far clip range rejection test, then does bounding sphere vs near, far, left and right planes, but not upper or lower (as there's usually little to cull above or below). it was originally designed for 1st person view, and works quite well there.

but i've found that in 3rd person view, zoomed all the way out (about ten meters), looking straight down, it appears that the terrain chunks "behind" the camera can sometimes get clipped when they should not. most likely when near a chunk edge.

so before i crack open the hood and start ripping at the guts of this puppy to fix it, i thought i'd see what the pros have to say.

i'm thinking bounding spheres vs all 6 planes, but they could be imprecise for large (~300 unit) planes (terrain chunks). also, i'm not sure about special cases like spanning and such. i recall some stuff like that from the last time i played with the frustum cull. the current algo is the 5th in the game so far.

so is there a handy silver bullet algo out there?

bspheres and special cases is my best bet?

reading this over, it just occurred to me that the cliprange and far plane tests are redundant. but a horizontal range clip upstream of the frustum cull is of little use looking up or down.

so i guess add arbitrary camera angles to the list of desirable features, along with all shapes, and all cases. i hope for too much, i suspect....

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

kalle_h

2,470

April 13, 2014 04:49 PM

http://pastebin.com/3uJmYXPT

Object space bounding box test. Accurate and fast.
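For readers who cannot reach the pastebin link, here is a minimal sketch of a box-versus-frustum test. It is not the linked code: it tests a world-space axis-aligned box rather than working in object space, and the `Vec3`, `Plane` and `Aabb` types, the function names, and the inward-pointing plane convention are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
// Assumed convention: dot(n, p) + d >= 0 means p is on the inner side.
struct Plane { Vec3 n; float d; };
struct Aabb { Vec3 lo, hi; };  // lo = minimum corner, hi = maximum corner

// True when the box lies entirely on the outer side of the plane.
inline bool boxOutsidePlane(const Plane& pl, const Aabb& b) {
    // Pick the box corner that is farthest along the plane normal.
    const Vec3 p {
        pl.n.x >= 0.0f ? b.hi.x : b.lo.x,
        pl.n.y >= 0.0f ? b.hi.y : b.lo.y,
        pl.n.z >= 0.0f ? b.hi.z : b.lo.z,
    };
    // If even that corner is behind the plane, the whole box is.
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d < 0.0f;
}

// True when the box can be culled: it is wholly behind at least one plane.
inline bool boxOutsideFrustum(const std::array<Plane, 6>& frustum,
                              const Aabb& b) {
    assert(b.lo.x <= b.hi.x && b.lo.y <= b.hi.y && b.lo.z <= b.hi.z);
    for (const Plane& pl : frustum) {
        if (boxOutsidePlane(pl, b)) return true;
    }
    return false;  // inside or spanning: draw it
}
```

A flat box fits a large terrain chunk far more tightly than a sphere does. The test is conservative: a box near a frustum corner can pass all six planes while still being off screen, which costs a wasted draw but never drops a visible object.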

Yourself

1,962

April 14, 2014 07:36 AM

http://pastebin.com/3uJmYXPT

Object space bounding box test. Accurate and fast.

might be accurate but can be a lot faster.

see http://dice.se/wp-content/uploads/CullingTheBattlefield.pdf

21st Century Moose

13,459

April 14, 2014 10:34 PM

I don't think you're going to get a good totally generalized solution here. Frustum culling is a matter of trading the CPU time to do the cull versus the CPU and GPU time to just draw the object, which obviously implies that a class of really simple object exists for which it's not worth bothering to do a cull test. How many of the objects you're drawing will fall into that class is one of those "it depends" answers, unfortunately: it depends on the objects, it depends on how complex your shaders are, it depends on your hardware.

15,000 cull tests, with worst case 50,000, per frame seems like a lot. That suggests to me that either (1) you haven't sorted your scene into some kind of tree structure, or (2) you're doing something crazy like frustum culling individual particles in a particle system. If (1) start doing it; if (2) stop doing it. :)

The key to performance with (1) is knowing that much of the time you don't need to actually run a cull test at all. The fastest code is the code you don't run. The benefit of a hierarchical tree is that if a parent node is outside of the frustum, then all of its child nodes are absolutely 100% guaranteed to also be outside of the frustum. Don't bother testing them: just reject them. Likewise if a parent node is fully inside of the frustum: don't bother testing: just accept. Only nodes which intersect the frustum need testing.

Using this approach, the actual culling algorithm you use for individual nodes is not so hugely important: you're going to be trivially rejecting or trivially accepting so much stuff that the number of actual cull tests you need to do is significantly lower: it becomes a micro-optimization.
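The trivial accept and reject idea above can be sketched roughly as follows. This is an illustration only: the `Node` layout, the `Cull` enum, the function names and the inward-pointing plane convention are assumptions, and a bounding sphere is used per node purely to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
// Assumed convention: dot(n, p) + d >= 0 means p is on the inner side.
struct Plane { Vec3 n; float d; };

enum class Cull { Outside, Intersects, Inside };

inline Cull classifySphere(const std::array<Plane, 6>& frustum,
                           const Vec3& c, float r) {
    bool fullyInside = true;
    for (const Plane& pl : frustum) {
        const float dist = pl.n.x * c.x + pl.n.y * c.y + pl.n.z * c.z + pl.d;
        if (dist < -r) return Cull::Outside;  // wholly behind one plane
        if (dist < r) fullyInside = false;    // straddles this plane
    }
    return fullyInside ? Cull::Inside : Cull::Intersects;
}

// Hypothetical scene node. Its sphere must enclose the node and all children.
struct Node {
    Vec3 center;
    float radius;
    int id;
    std::vector<Node> children;
};

// Appends ids of visible nodes to 'out'. Returns how many sphere tests ran.
inline int collectVisible(const Node& n, const std::array<Plane, 6>& frustum,
                          bool parentInside, std::vector<int>& out) {
    assert(n.radius >= 0.0f);
    int tests = 0;
    bool inside = parentInside;
    if (!parentInside) {
        ++tests;
        const Cull c = classifySphere(frustum, n.center, n.radius);
        if (c == Cull::Outside) return tests;  // reject whole subtree untested
        inside = (c == Cull::Inside);
    }
    out.push_back(n.id);
    for (const Node& child : n.children) {
        // Children of a fully inside parent are accepted without testing.
        tests += collectVisible(child, frustum, inside, out);
    }
    return tests;
}
```

Only subtrees whose parent straddles the frustum pay for further tests, so the returned count is normally far smaller than the number of objects in the scene.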

Don't worry too much if you accept objects that are borderline, particularly if doing so allows you to get better batching going on. The GPU is going to be doing its own frustum culling too (after the vertex shader) and adding 3, 4 or 5 objects to a batch of 50 or so isn't such a big deal. If the alternative is tighter culling but having to break that batch into multiple draw calls it's likely to end up more expensive. Sometimes sloppy can be the most efficient route.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

kalle_h

2,470

April 14, 2014 11:41 PM

http://pastebin.com/3uJmYXPT

Object space bounding box test. Accurate and fast.

might be accurate but can be a lot faster.

see http://dice.se/wp-content/uploads/CullingTheBattlefield.pdf

I have a branchless SIMD version of that algorithm too, and it's not much slower than the BF3 version, but its accuracy is way better. It's a general algorithm that just works, and its speed is good enough for thousands of objects.

Norman Barrows

7,180

Author

April 16, 2014 02:04 PM

15,000 cull tests, with worst case 50,000, per frame seems like a lot. That suggests to me that either (1) you haven't sorted your scene into some kind of tree structure, or (2) you're doing something crazy like frustum culling individual particles in a particle system. If (1) start doing it; if (2) stop doing it.

actually, you might say i'm doing both! : P <g>

terrain is divided up into chunks 300x300 in size. chunks are generated on the fly as needed from underlying world map data structures. a chunk is essentially an indexed render queue of all the meshes and textures in the chunk. when i render the scene, not only do i cull entire chunks, but each mesh in a chunk as well (IE each plant, rock, tree, and bush). i also cull stuff like clouds, sun/moon, objects and entities, etc. 15,000 culls is a complex jungle scene. 50,000 is a complex jungle scene with 35,000 individually culled rain particles.

as you say, the chunk cull alone should be sufficient. and it sounds as though i should not frustum cull rain particles. i suppose by the same token, i should do a single cull for multi-mesh models, instead of culling each mesh in the model.

i switched to a straight up b-sphere 6 plane frustum cull test. so far, no problems - even with arbitrary object sizes and camera orientations. the trick seems to be to get the b-sphere just big enough, and use the test: "if entirely outside any of the six planes, cull". tests like "if inside all six planes, don't cull" don't seem to cover all cases, such as spanning spheres.
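A minimal sketch of that kind of b-sphere test, for illustration. It is not the code from the game: the `Vec3` and `Plane` types, the function names, and the convention that unit-length plane normals point into the frustum are assumptions. `makeBoxFrustum` is a hypothetical helper that builds a simple box-shaped "frustum" so the test can be tried without a camera.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with n unit length and pointing
// toward the inside of the frustum (an assumed convention).
struct Plane { Vec3 n; float d; };

inline float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Returns true when the sphere lies entirely on the outer side of at
// least one plane, so it cannot touch the frustum and can be culled.
inline bool sphereOutsideFrustum(const std::array<Plane, 6>& frustum,
                                 const Vec3& center, float radius) {
    assert(radius >= 0.0f);
    for (const Plane& pl : frustum) {
        if (signedDistance(pl, center) < -radius) return true;
    }
    return false;  // inside or spanning: draw it
}

// Hypothetical helper: an axis-aligned box from -h to +h on each axis,
// expressed as six inward-facing planes.
inline std::array<Plane, 6> makeBoxFrustum(float h) {
    return {{
        {{ 1, 0, 0}, h}, {{-1, 0, 0}, h},
        {{ 0, 1, 0}, h}, {{ 0,-1, 0}, h},
        {{ 0, 0, 1}, h}, {{ 0, 0,-1}, h},
    }};
}
```

Because an object is rejected only when it is wholly behind a plane, a sphere that spans the frustum (larger than the view volume itself) is never culled by mistake.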

guess i need to turn off all those unnecessary calls to superclip5() ! <g>.

thanks for the tip, i probably would have never noticed it. it was only the false positives and not the speed of superclip4 that i noticed.

cool! more clock cycles to play with! <g>.

21st Century Moose

13,459

April 16, 2014 02:32 PM

Those rain particles are almost certainly a "hang 'em all and let god sort 'em out" case. If you're not culling them at all you can put them into a nice big static vertex buffer, billboard them and animate them in your VS based on time passed, and draw them all in a single batch too.
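The "animate them based on time passed" idea relies on each drop's position being a pure function of its fixed spawn point and the clock, so the vertex data itself never changes. A rough C++ stand-in for the math a vertex shader would do is shown below; the function name and the `fallSpeed` and `boxHeight` parameters are illustrative assumptions, not anything from the thread.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Position of one raindrop at time t (seconds). 'spawn' is the drop's
// position at t = 0, with spawn.y expected in (0, boxHeight].
// Drops fall straight down and wrap back to the top of the box.
inline Vec3 rainDropPosition(const Vec3& spawn, float t,
                             float fallSpeed, float boxHeight) {
    assert(boxHeight > 0.0f && fallSpeed >= 0.0f && t >= 0.0f);
    // Total distance fallen from the top, wrapped into [0, boxHeight).
    const float fallen =
        std::fmod((boxHeight - spawn.y) + fallSpeed * t, boxHeight);
    return { spawn.x, boxHeight - fallen, spawn.z };
}
```

Since nothing is stored per frame, the same static buffer of spawn points can be reused indefinitely, with only the time value changing between draws.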

Norman Barrows

7,180

Author

April 16, 2014 02:57 PM

Those rain particles are almost certainly a "hang 'em all and let god sort 'em out" case.

that's the kind of algo i like! but substitute the zbuffer for god <g>.

If you're not culling them at all you can put them into a nice big static vertex buffer, billboard them and animate them in your VS based on time passed, and draw them all in a single batch too.

well, so far, i haven't had to write a single line of shader code. been able to get by with fixed function so far. there's no techno or magic in the game, so mass quantities of whizz-bang special effects graphics are not required. so the graphics requirements are not that heavy. aniso mipmapping, alpha test, a little alpha blending for clouds, sun, and flames, one two stage texture blend for snow, that's it. i use render queues that use a 3 dimensional index, which essentially does an in-order bucket sort insertion of a mesh into the render queue. a state manager handles drawing the render queues. i generate 4 interleaved static ground meshes per terrain chunk when i generate a chunk. this avoids slow dynamic buffers. so by optimizing my data into a form which dx9 fixed function likes best, i've been able to get acceptable performance with no shaders.

21st Century Moose

13,459

April 16, 2014 07:06 PM

You don't need to have "whizz-bang special effects" to use shaders. You can use them for basic stuff too, sometimes very basic stuff, such as interpolated keyframe animation from static vertex buffers.
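As an illustration of how basic that can be, interpolated keyframe animation is just a per-vertex linear blend between two stored poses. The sketch below does the blend on the CPU in C++ to show the arithmetic; in a shader the same expression would run per vertex with both keyframes bound as static buffers. The `Vec3` type and `blendKeyframes` name are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Blend two keyframes of the same mesh: t = 0 gives 'a', t = 1 gives 'b'.
inline std::vector<Vec3> blendKeyframes(const std::vector<Vec3>& a,
                                        const std::vector<Vec3>& b,
                                        float t) {
    assert(a.size() == b.size());    // both poses must share a vertex count
    assert(t >= 0.0f && t <= 1.0f);
    std::vector<Vec3> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = { a[i].x + (b[i].x - a[i].x) * t,
                   a[i].y + (b[i].y - a[i].y) * t,
                   a[i].z + (b[i].z - a[i].z) * t };
    }
    return out;
}
```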

Norman Barrows

7,180

Author

April 16, 2014 07:25 PM

You don't need to have "whizz-bang special effects" to use shaders. You can use them for basic stuff too, sometimes very basic stuff, such as interpolated keyframe animation from static vertex buffers.

oh yes, i know. what can i say? i'm lazy. and perhaps too busy cutting to sharpen the saw? <g>.

when i started on caveman 3.0 in 2012, i hadn't written a line of code in 6 years, and my last directx experience was with dx8 as i recall. i was going to have to learn a new 3d modeling package, and a new paint program, and new audio tools, and some newer directx api. i figured i'd avoid adding shaders to all that if possible. so far so good.

i do realize i could probably draw a bit more with shaders.

part of me is reluctant to get into it for a very different reason:

back when i started, it was all custom software renderers, party on the bitmap. allocate a big array of unsigned chars, do your thing, then wait for vsync and memcpy to vidbuf (as i recall). so "partying on the bitmap" is a familiar occupation to me. as a perfectionist when on the clock, i also find it somewhat addictive. you can do anything you can come up with an algo for that you have hardware fast enough for. i'm afraid if i get into shaders, i'll like it too much, and playing with shaders will distract from building games and producing product for the company's lineup.

sooner or later i'll have to make that big move. i have yet to check into it, but i'm hoping the basic shader code i need is trivial to write or readily available. the next two titles planned after caveman have graphics requirements no greater than caveman - just a lot more alpha blending in SIMSpace (lasers and shields, you know <g>).

i _have_ made a directx 11 version of the basic startup and shutdown code from my game library, but clearscreen is about the limits of my directx11 experience so far. <g>.
