Why is collision cast expensive?

Started by
32 comments, last by EWClay 11 years, 2 months ago


dunno about their case, but my engine dynamically builds a BSP-like tree for sorting all this out (mostly by eliminating nearly everything that can be determined not to collide).


the trick here is figuring out how to quickly rebuild the BSP in real-time

BSP... BSP... BSP... what have you got dudes?
Realtime BSP. Are you serious?
Trying to make BSP realtime is like trying to adapt an internal combustion engine to run on electricity only. It's just easier to use an electric motor. Bullet uses SAP and AABB trees. The latter is quite a bit faster when dealing with a lot of moving geometry.
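
For reference, in Bullet this is literally just a choice of broadphase object (a minimal setup sketch; the SAP world bounds below are arbitrary values for illustration):

    #include <btBulletDynamicsCommon.h>

    int main() {
        btDefaultCollisionConfiguration config;
        btCollisionDispatcher dispatcher(&config);
        btSequentialImpulseConstraintSolver solver;

        // dynamic AABB tree broadphase: no fixed world bounds, copes well with lots of moving objects
        btDbvtBroadphase broadphase;
        // alternative, sweep-and-prune (SAP) with explicit world bounds:
        // btAxisSweep3 broadphase(btVector3(-1000, -1000, -1000), btVector3(1000, 1000, 1000));

        btDiscreteDynamicsWorld world(&dispatcher, &broadphase, &solver, &config);
        world.setGravity(btVector3(0, -9.8f, 0));
        world.stepSimulation(1.0f / 60.0f);
        return 0;
    }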


with a sensibly written construction algorithm, the rebuild time is O(n log2 n), and it is generally "plenty fast enough" for typical use cases (in my tests, it holds up fairly well with thousands of objects in a scene, ...).


I have typically also added in AABB tests for the coarser checks...
though, it is also notable that, for the most part, I still most often end up using simple AABBs.

I mean, that says it all!


not because of performance, but because, for the most part, is anything much beyond an AABB actually needed for most objects?...

do we need a precise bounding volume for a flying rocket, or a dropped item? not really.

with a sensibly written construction algorithm, the rebuild time is O(n log2 n), and it is generally "plenty fast enough" for typical use cases (in my tests, it holds up fairly well with thousands of objects in a scene, ...).

I have several doubts about this. id Software had to deal with this problem for at least three product iterations. I apologize, but I have a lot of trouble understanding what you mean by "sensibly written construction algorithm".

Reiterating the concept: AABB tree. Proven to work!

I don't understand you all. Why try to make BSP a dynamic algorithm when you have AABB trees?

Algorithmic complexity means nothing in the real world. It's not like two algorithms are equivalent because they have the same algorithmic complexity.

not because of performance, but because, for the most part, is anything much beyond an AABB actually needed for most objects?...

do we need a precise bounding volume for a flying rocket, or a dropped item? not really.

Yes, and by extension, do we need to crack our heads open trying to make a dynamic structure out of something which is well understood to be static in nature?

What's the deal with BSPs? What have they done for you all that is so good that we cannot just move on?

What's the deal with the "center of mass" and BSP?

Previously "Krohm"


with a sensibly written construction algorithm, the rebuild time is O(n log2 n), and it is generally "plenty fast enough" for typical use cases (in my tests, it holds up fairly well with thousands of objects in a scene, ...).

I have several doubts about this. id Software had to deal with this problem for at least three product iterations. I apologize, but I have a lot of trouble understanding what you mean by "sensibly written construction algorithm".
Reiterating the concept: AABB tree. Proven to work!
I don't understand you all. Why try to make BSP a dynamic algorithm when you have AABB trees?
Algorithmic complexity means nothing in the real world. It's not like two algorithms are equivalent because they have the same algorithmic complexity.

not because of performance, but because, for the most part, is anything much beyond an AABB actually needed for most objects?...

do we need a precise bounding volume for a flying rocket, or a dropped item? not really.

Yes, and by extension, do we need to crack our heads open trying to make a dynamic structure out of something which is well understood to be static in nature?

What's the deal with BSPs? What have they done for you all that is so good that we cannot just move on?

What's the deal with the "center of mass" and BSP?


if you don't know how the "center of mass" fits into the BSP building, then do you have any idea how the construction algorithm actually works in this case?...


a traditional BSP, like the ones id has often used, looks for the "optimal" division plane, generally by checking against "every possible plane within the given set". this is where the big cost comes in, as a person generally has to check a large number of planes and evaluate their relative goodness, working out to roughly O(n^3 log2 n).

but, if a person instead simply calculates a plane that "roughly divides things in half" (while not giving a crap how "good" the division plane is), then it is *much* faster (as the whole *expensive* part is simply removed).

and, it only takes O(n) to calculate an average and a relative distribution (or a 3D analogue of a standard deviation), and a single O(n) pass to divide the set, ...

stated another way: a few linked-list walks, and some vector math.
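
roughly, one simple variant of that plane pick can look like this (a rough sketch; the structures and names here are illustrative, not actual engine code; this version snaps the split normal to the axis of largest spread and puts the plane through the mean):

    #include <vector>

    struct Vec3 { float x, y, z; };
    struct Plane { Vec3 normal; float dist; };   // dot(normal, p) - dist => front/back

    Plane chooseSplitPlane(const std::vector<Vec3>& centers) {
        // O(n): average of the object centers
        Vec3 mean{0, 0, 0};
        for (const Vec3& c : centers) { mean.x += c.x; mean.y += c.y; mean.z += c.z; }
        float inv = 1.0f / static_cast<float>(centers.size());
        mean.x *= inv; mean.y *= inv; mean.z *= inv;

        // O(n): per-axis spread (the "3D analogue of a standard deviation")
        Vec3 var{0, 0, 0};
        for (const Vec3& c : centers) {
            var.x += (c.x - mean.x) * (c.x - mean.x);
            var.y += (c.y - mean.y) * (c.y - mean.y);
            var.z += (c.z - mean.z) * (c.z - mean.z);
        }

        // use the axis with the largest spread as the split normal, plane through the mean
        Vec3 n{1, 0, 0};
        if (var.y >= var.x && var.y >= var.z) n = Vec3{0, 1, 0};
        else if (var.z >= var.x && var.z >= var.y) n = Vec3{0, 0, 1};

        return Plane{n, n.x * mean.x + n.y * mean.y + n.z * mean.z};
    }

one more O(n) pass then sorts the objects onto the front or back side of the returned plane, and the build recurses on each half.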

but, if a person instead simply calculates a plane that "roughly divides things in half" (while not giving a crap how "good" the division plane is), then it is *much* faster (as the whole *expensive* part is simply removed).

and, it only takes O(n) to calculate an average and a relative distribution (or a 3D analogue of a standard deviation), and a single O(n) pass to divide the set, ...

stated another way: a few linked-list walks, and some vector math.

I'm about to implement some broad phase checks into my engine, and I'm curious about the approach you've listed here. In practice, how much better is it to "sensibly" construct a BSP-tree over "naively" dividing the scene by some notion of a midpoint (as in a typical quad- or oct-tree)?

So far, I've been leaning towards a purely naive oct-tree construction. Has your experience with the "sensible" BSP-tree determined that it's been worth it over a simple oct-tree?


but, if a person instead simply calculates a plane that "roughly divides things in half" (while not giving a crap how "good" the division plane is), then it is *much* faster (as the whole *expensive* part is simply removed).

and, it only takes O(n) to calculate an average and a relative distribution (or a 3D analogue of a standard deviation), and a single O(n) pass to divide the set, ...

stated another way: a few linked-list walks, and some vector math.




I'm about to implement some broad phase checks into my engine, and I'm curious about the approach you've listed here. In practice, how much better is it to "sensibly" construct a BSP-tree over "naively" dividing the scene by some notion of a midpoint (as in a typical quad- or oct-tree)?

So far, I've been leaning towards a purely naive oct-tree construction. Has your experience with the "sensible" BSP-tree determined that it's been worth it over a simple oct-tree?


I haven't really done any specific checks, but making an "educated guess", I would guess they would probably be fairly comparable performance-wise.


the main difference is that an oct-tree divides the space into 8 regions in each step, so it requires less recursion, but it has to do a few more checks at each step (since there are 8 possible regions to step into). dividing the space in a binary fashion allows simpler logic, but a larger number of internal nodes.

some forms of oct-tree also always divide the region into 8 uniformly sized spaces, which may lead to inefficient partitioning (vs the use of an explicit center point, which works better with typical "lumpy" distributions).
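
for illustration, stepping into an oct-tree child via an explicit split point is just 3 compares per level (a generic sketch, names and structures are illustrative):

    struct Vec3 { float x, y, z; };

    // pick which of the 8 child regions a point falls in, one compare per axis
    int octreeChildIndex(const Vec3& point, const Vec3& splitCenter) {
        int index = 0;
        if (point.x >= splitCenter.x) index |= 1;
        if (point.y >= splitCenter.y) index |= 2;
        if (point.z >= splitCenter.z) index |= 4;
        return index;   // 0..7
    }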


with a naive dividing-axis calculation, the BSP tended mostly to divide along "similar" plane vectors (generally somewhere in the (1, 1, 1) direction), making the dividing planes worse than with an oct-tree or KD-tree (basically cutting the world into a bunch of diagonal slices).

later on, I developed some slightly better calculations, causing it to use more axis-aligned partitioning when this was more appropriate.

the main difference between this and something like a KD-tree was more that it can align more nicely with various distributions.


FWIW, for the main server side of the engine, I don't use a single giant BSP, but rather divide things into "regions" which exist on roughly a uniform grid (512 meters / 16384 units), mostly to simplify incremental loading/unloading (FWIW, each region also partially has its own local coordinate space as well, partly because, in my experiments, for distances measured in km or so, the precision of floats actually starts to matter).
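
as a sketch of the region / local-coordinate idea (my own illustration, using the 16384-unit region size mentioned above, which works out to 32 units per meter; the names here are not from any actual engine):

    #include <cmath>

    const double REGION_SIZE = 16384.0;   // 512 meters at 32 units per meter

    struct RegionCoord {
        int rx, ry, rz;      // which region on the uniform grid
        float lx, ly, lz;    // position within that region's local space (stays small, so floats stay precise)
    };

    RegionCoord toRegionCoord(double wx, double wy, double wz) {
        RegionCoord r;
        r.rx = static_cast<int>(std::floor(wx / REGION_SIZE));
        r.ry = static_cast<int>(std::floor(wy / REGION_SIZE));
        r.rz = static_cast<int>(std::floor(wz / REGION_SIZE));
        r.lx = static_cast<float>(wx - r.rx * REGION_SIZE);
        r.ly = static_cast<float>(wy - r.ry * REGION_SIZE);
        r.lz = static_cast<float>(wz - r.rz * REGION_SIZE);
        return r;
    }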


or such...

I'm sorry man. I clearly don't understand. I'm having a lot of trouble fitting what you say together.

Previously "Krohm"

It's not about memory, it's about how much CPU time doing a lot of ray casting really takes.

One thing I've learned about collision, and pretty much anything in games, is that there is no one-size-fits-all way of doing things. A lot of things trade cost against precision, and differ in ways that suit one type of game better than another; there is no real "standard" way of checking collision that is good for every type of object or gameplay. Bounding boxes are a big one, but they obviously have precision issues in many cases and can have tunneling problems.

Anyway, the broad phase and narrow phase go hand in hand: if you eliminate a lot of objects but then do a thousand ray casts, you're still going to get awful performance, especially since, the way broad-phase algorithms tend to work, collision detection multiplies in complexity as more objects move within collision range of each other; that's rather unavoidable, unfortunately.

Generally, cheaper is better unless it causes major glitches; that's why a lot of games, especially older ones, have special areas on maps where you can abuse the collision and get through the geometry. The N64 was big on that.

Satharis, on 16 Feb 2013 - 06:07, said:
It's not about memory, it's about how much CPU time doing a lot of ray casting really takes.

One thing I've learned about collision, and pretty much anything in games, is that there is no one-size-fits-all way of doing things. A lot of things trade cost against precision, and differ in ways that suit one type of game better than another; there is no real "standard" way of checking collision that is good for every type of object or gameplay. Bounding boxes are a big one, but they obviously have precision issues in many cases and can have tunneling problems.

yep.

luckily, in an FPS-like setting or similar, "most" objects work pretty well as bounding boxes (and fancier objects are usually mostly cosmetic).

Quote
Anyway, the broad phase and narrow phase go hand in hand: if you eliminate a lot of objects but then do a thousand ray casts, you're still going to get awful performance, especially since, the way broad-phase algorithms tend to work, collision detection multiplies in complexity as more objects move within collision range of each other; that's rather unavoidable, unfortunately.

bounding boxes are often used in my case for first-pass elimination before more expensive collision checks (following the broad BSP query or similar).

most of the time, the bounding box can eliminate the collision well before having to do any more expensive checks, and it is one of the computationally cheaper checks (along with a sphere, but typically an AABB is a better fit than a sphere, so it will eliminate a lot more "false positives").

like, you don't need to check collisions against an object precisely when it is across the room; you only need to quickly eliminate it.
even in a large pile (say, a pile of bricks or similar), usually each brick is in close proximity only to those bricks it is touching, so something like an AABB can still separate each brick from most other bricks in the pile.

the BSP will not often clearly separate individual bricks, but will mostly just eliminate checks between bricks in one brick-pile and those in another similar pile elsewhere in the map.
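
for reference, the cheap first-pass test being described is basically just this (a generic sketch, not code from any particular engine):

    struct AABB { float minX, minY, minZ, maxX, maxY, maxZ; };

    // 6 compares; reject the pair as early as possible
    bool aabbOverlap(const AABB& a, const AABB& b) {
        if (a.maxX < b.minX || a.minX > b.maxX) return false;
        if (a.maxY < b.minY || a.minY > b.maxY) return false;
        if (a.maxZ < b.minZ || a.minZ > b.maxZ) return false;
        return true;   // boxes overlap; only now run the more expensive narrow-phase check
    }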


luckily, there is also a trick: if an object's net velocity gets low enough, an option is to "freeze" it and take away its internal velocity (it becomes non-moving until interacted with again by a faster-moving object). this both saves CPU cycles and helps avoid some ugly effects (things endlessly wobbling around, and piles of things randomly "exploding"), but it does cause a slight issue in that things can sometimes settle in an obviously "not physically plausible" manner (if this freezing occurs too aggressively, an object can freeze leaning over an edge, or it can result in stacks which "should" reasonably fall over, ...).
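
a minimal sketch of that freeze/sleep check (the thresholds and field names here are made up for illustration, not actual engine code):

    struct Body {
        float vx, vy, vz;   // linear velocity
        int slowTicks;      // how many ticks the body has been nearly at rest
        bool asleep;
    };

    void maybePutToSleep(Body& b, float sleepSpeed, int sleepDelayTicks) {
        float speedSq = b.vx * b.vx + b.vy * b.vy + b.vz * b.vz;
        if (speedSq < sleepSpeed * sleepSpeed) {
            // waiting several ticks before freezing helps avoid freezing too aggressively
            if (++b.slowTicks >= sleepDelayTicks) {
                b.asleep = true;              // stop simulating until something wakes it
                b.vx = b.vy = b.vz = 0.0f;    // take away its internal velocity
            }
        } else {
            b.slowTicks = 0;                  // moving again, reset the counter
        }
    }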

Quote
Generally, cheaper is better unless it causes major glitches; that's why a lot of games, especially older ones, have special areas on maps where you can abuse the collision and get through the geometry. The N64 was big on that.

yep, and that's also why I typically very rarely use meshes for non-static objects. mesh/mesh collision checks are very expensive.
most other checks against meshes are also expensive, but their being static means they can be excluded from "dynamic" checks (*1).


so, a ranking I think (for straight collision checks):
sphere: cheap (a single compare), often a poor fit, usually better as a "worst case" bound (but with a few partial exceptions);
AABB: still cheap (6 compares), often a much better fit than a sphere;
OBB: costs more than an AABB (up to 12 axis projections / 24 compares), but can rotate freely (very useful for things like crates, bricks, ...);
convex polyhedra: typically cost a little more than an OBB (n*m projections, 2*n*m compares, where usually n,m>=6), but can match arbitrary convex shapes, and can usefully approximate most small non-convex objects (both the OBB and polyhedron tests work on the "separating axis theorem"; see the sketch after this list);
mesh object: expensive, but can represent arbitrary shapes.
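
for reference, here is what one step of that separating-axis test looks like (a generic sketch; the structures and names are illustrative, not from any engine in this thread):

    #include <cfloat>
    #include <vector>

    struct Vec3 { float x, y, z; };

    static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    static void projectOntoAxis(const std::vector<Vec3>& verts, const Vec3& axis,
                                float& outMin, float& outMax) {
        outMin = FLT_MAX; outMax = -FLT_MAX;
        for (const Vec3& v : verts) {
            float d = dot(v, axis);
            if (d < outMin) outMin = d;
            if (d > outMax) outMax = d;
        }
    }

    // 2 projections + 2 compares per candidate axis: if the projected intervals
    // don't overlap, this axis separates the shapes and there is no collision.
    bool axisSeparates(const std::vector<Vec3>& a, const std::vector<Vec3>& b, const Vec3& axis) {
        float aMin, aMax, bMin, bMax;
        projectOntoAxis(a, axis, aMin, aMax);
        projectOntoAxis(b, axis, bMin, bMax);
        return aMax < bMin || bMax < aMin;
    }

a full test repeats this for each candidate axis (the face normals of both shapes and, for general 3D polyhedra, the edge-pair cross products), and only reports a collision if no axis separates the shapes.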

so, with these later forms, a person can keep a radius and a bounding-box around, for first-pass elimination.

there are also cylinders and capsules, but they are rarely used in my case, and not particularly stable (it is easier to get good contact behavior from face-based solids, like OBBs or polyhedra, via the use of CSG operations). (mesh/mesh object collisions are also a little problematic, as I am not currently aware of a particularly good way to calculate contact forces, so IIRC they just directly push away from each other, more like spheres).


*1: in my engine:
"static" means "object will not move". it is not checked against anything else, but moving (dynamic) objects may check against it (typically, this is for things like world-geometry).
"semi-static" means "object does not move on its own", but rather its movement is controlled externally (by program code), and typically will only check against dynamic objects (most typical entities fall in this camp, which currently includes pretty much everything from players/AIs/doors/platforms/rockets/... which are instead under the control of their "movetype", *2);
"dynamic" means "object moves freely and responds to physical forces".

*2: the "movetype" system is a nasty jury-rigged mess, but ironically, was written much later than the fancier physics (simpler Quake-like physics works a lot better / more-reliably for a lot of things, where originally I tried using fancy-physics for pretty much everything), and depends a lot more on the server-side game logic (not currently available to the physics engine). note that "solidtype" deals with both sets of solids (as far as the server-end is concerned).


sadly, my fancy physics code (its own evil mess of sorts) hasn't been updated to deal with voxel terrain, so ATM it isn't really all that useful (try to drop a brick or crate directly onto voxel terrain, and it will just keep on going). I have considered fixing this, but haven't gotten to it (hasn't seemed like a huge priority). likely, as-is, it would require API-level modifications to the physics engine (API is, errm, sort-of structured in mostly the same sort of way as legacy OpenGL, as it "seemed like a good idea at the time").

admittedly, I am almost tempted to just rewrite it all into something hopefully less of a nasty mess, like a single unified physics engine that deals with both types of physics, and hopefully improve the API design some (great evils lurk here...), but am presently torn some between the possible options (and, otherwise, just haven't really gotten around to it).

It's not about memory, it's about how much CPU time doing a lot of ray casting really takes.

Could you elaborate in more detail and give an example pertaining to CPU time? Interesting stuff.


It's not about memory, it's about how much CPU time doing a lot of ray casting really takes.



Could you elaborate in more detail and give an example pertaining to CPU time? Interesting stuff.


if your scene has 10000 objects:
you can check the line against each object (simple linear search), so, 10000 checks;
or, a tree-structure (BSP or oct-tree or similar) may get it down to around 10.

what if 1000 of these do a line-trace in a tick?
first option: 10000000 line/box checks;
second option: 10000.

what if each of these 1000 is an enemy, and performs a radius query, and each finds 100 objects in range, and line-traces to each?
first option: 1000000000 line/box checks;
second option: 1000000.

now, what if, for reliable detection, visibility is performed via multiple traces, say 9 (eyes trace to the center and each corner point of the target's visible bounding-box)? ...
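
for illustration, those 9 trace targets are just the target's AABB center plus its 8 corners (a hypothetical helper, not from the thread):

    #include <vector>

    struct Vec3 { float x, y, z; };
    struct AABB { Vec3 min, max; };

    std::vector<Vec3> visibilityTraceTargets(const AABB& box) {
        std::vector<Vec3> targets;
        targets.push_back(Vec3{(box.min.x + box.max.x) * 0.5f,
                               (box.min.y + box.max.y) * 0.5f,
                               (box.min.z + box.max.z) * 0.5f});   // center
        for (int i = 0; i < 8; i++) {                              // 8 corners
            targets.push_back(Vec3{(i & 1) ? box.max.x : box.min.x,
                                   (i & 2) ? box.max.y : box.min.y,
                                   (i & 4) ? box.max.z : box.min.z});
        }
        return targets;   // 9 points total; each gets a line-of-sight trace
    }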

these sorts of costs can add up fairly quickly (and this is just with all the enemies standing around doing nothing).

you don't really want to be doing 1000000000 of *anything* in a tick (and even 1000000 still isn't a good number).
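
as a rough sketch of why the tree query does so much less work (the structures here are illustrative, not any particular engine's): a line trace only recurses into the side(s) of each split plane that the segment actually touches, so most of the scene is never visited:

    #include <vector>

    struct Vec3 { float x, y, z; };
    struct Object;                               // whatever the narrow phase checks against

    struct BspNode {
        Vec3 normal; float dist;                 // split plane: dot(normal, p) - dist
        BspNode* front = nullptr;
        BspNode* back = nullptr;
        std::vector<Object*> objects;            // objects stored at a leaf
    };

    static float planeSide(const BspNode& n, const Vec3& p) {
        return n.normal.x * p.x + n.normal.y * p.y + n.normal.z * p.z - n.dist;
    }

    void traceLine(const BspNode* node, const Vec3& a, const Vec3& b,
                   std::vector<Object*>& outCandidates) {
        if (!node) return;
        if (!node->front && !node->back) {       // leaf: hand its objects to the narrow phase
            outCandidates.insert(outCandidates.end(), node->objects.begin(), node->objects.end());
            return;
        }
        float da = planeSide(*node, a);
        float db = planeSide(*node, b);
        if (da >= 0 || db >= 0) traceLine(node->front, a, b, outCandidates);  // segment touches front side
        if (da < 0 || db < 0)  traceLine(node->back, a, b, outCandidates);    // segment touches back side
    }

the linear-scan version, by contrast, would hand all 10000 objects to the line/box check every single time.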


games have often used tricks here, like only "activating" enemy AIs when a player passes through a certain door or collides with a certain trigger-region (or comes within a certain range), and only checking for the current player (the AI doesn't look for targets in range, but simply checks "can I trace a line to the player?", possibly rotating between players in multiplayer games).

likewise, early elimination and aggressive culling here can often mean the difference between frames-per-second, and seconds-per-frame.

likewise in a 3D renderer: much of the work isn't so much about getting things drawn quickly, but quickly figuring out what needs to be drawn.

This topic is closed to new replies.
