Nicholas Kong

Why is collision cast expensive?


Why is a collision cast expensive, and why shouldn't you do a collision cast on every frame? Why does this bring the PS3 to its knees? Does this apply to PC games too? When they say "expensive", are they saying the CPU uses a lot of memory?
 
The collision cast is brought up at 9:38 of the video.
 
 
20 Gauss? This is mentioned at 9:48.
 
Collision cast: tracing a line between two points to see if anything collides with the line.
 
Can I assume other collision techniques like bounding box collisions are
just as expensive?

It's not the cost of individual collision checks you have to worry about - if you are performing just one check, ray-vs-sphere is only a little more expensive than box-vs-box.

 

What you need to worry about is checking against every other object. If you have N objects in your scene, each ray you cast requires N-1 ray-vs-sphere checks, and you will need to cast multiple rays for each player and AI character in the level...
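To make that concrete, here is a minimal C++ sketch of the naive approach being described: one ray tested against every object in the scene, so each cast costs O(N). All of the names (Vec3, Sphere, raycastAll) are illustrative, not from any particular engine.

#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere { Vec3 center; float radius; };

// True if the ray (origin o, *unit-length* direction d) hits the sphere.
bool rayHitsSphere(Vec3 o, Vec3 d, const Sphere& s) {
    Vec3 m = sub(o, s.center);
    float b = dot(m, d);
    float c = dot(m, m) - s.radius * s.radius;
    if (c > 0.0f && b > 0.0f) return false;  // starts outside, points away
    return b * b - c >= 0.0f;                // discriminant test
}

// N objects means N ray-vs-sphere tests per cast - this loop is the cost.
bool raycastAll(Vec3 o, Vec3 d, const std::vector<Sphere>& objects) {
    for (const Sphere& s : objects)
        if (rayHitsSphere(o, d, s)) return true;
    return false;
}

Multiply that loop by the number of rays per character per frame, and the cost of casting "on every frame" becomes obvious.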


If you want a normal vector like the one he describes, then you need to know the polygon that the player is "standing" on. This requires a much finer calculation than a bounding box, as you need the exact polygon to get that normal vector. How expensive it is depends on the complexity of the world, but compared to the super-cheap alternative of subtracting two vectors, it's extremely expensive.
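As a hedged illustration of the cost gap being described (types and names are mine, not from the game in the video): the "super-cheap alternative" is the contact normal of a sphere, which is just a subtraction, while a polygon's face normal is only a cross product once you have the triangle - finding the exact triangle is the expensive part.

#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Cheap: the normal at a point on a sphere is (point - center), normalized.
Vec3 sphereNormal(Vec3 contactPoint, Vec3 sphereCenter) {
    return normalize(sub(contactPoint, sphereCenter));
}

// Cheap once you HAVE the triangle - the search for it is what costs.
Vec3 triangleNormal(Vec3 a, Vec3 b, Vec3 c) {
    return normalize(cross(sub(b, a), sub(c, a)));
}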

 

It's not that you shouldn't do collision casts per frame, as you do these all the time for things like shooting projectiles and such.  It's just that you only have so much processing power to spare if you want your game to run smoothly, so you'd rather save calculations for other things.  One of the guys mentions that the reason why there are only so many monsters in the game is because more would be too much for the hardware.

 

Also, keep in mind that this hardware is the PS1 (I believe) so we're talking about much worse hardware than today.


Just a little note - the game above was originally released on PS2 (on which the developers probably played the game), but it eventually got its release on PS3 in HD as well.

ok, I personally call this operation a 'traceline', but either way.


the main cost has to do with finding everything that might be located along the line between the two points (likewise for the box case), which includes both the world geometry and other objects.

dunno about their case, but my engine dynamically builds a BSP-like tree for sorting all this out (mostly by eliminating nearly everything that can be determined not to collide).


the trick here is figuring out how to quickly rebuild the BSP in real-time, which in my case mostly consists of a naive algorithm for finding the "center of mass" (average of the origins) and the mass distribution relative to that center. a naive solution is simply to take the sum of the squares of the distances from the center, but a slightly more advanced algorithm instead produces a 3x3 matrix, from which one can look for the vector with the greatest length. at each step, the set is split in half, and this is repeated for the halves, generally with any objects that cross the plane being kept inside the node in question.
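A minimal sketch of the simpler variant of this, under my own assumptions about the details: split through the centroid of the object origins, along the axis with the greatest spread. The 3x3-matrix version mentioned above would replace the per-axis variances with a covariance matrix and take its dominant eigenvector as the plane normal.

#include <vector>

struct Vec3 { float x, y, z; };

struct SplitPlane {
    int axis;     // 0 = x, 1 = y, 2 = z (axis-aligned for simplicity)
    float offset; // plane passes through the centroid on that axis
};

// Assumes origins is non-empty. Two O(n) passes, no search over planes.
SplitPlane chooseSplitPlane(const std::vector<Vec3>& origins) {
    // Pass 1: "center of mass" (average of the origins).
    Vec3 c = {0, 0, 0};
    for (const Vec3& p : origins) { c.x += p.x; c.y += p.y; c.z += p.z; }
    float inv = 1.0f / (float)origins.size();
    c.x *= inv; c.y *= inv; c.z *= inv;

    // Pass 2: sum of squared distances from the center, per axis.
    float var[3] = {0, 0, 0};
    for (const Vec3& p : origins) {
        var[0] += (p.x - c.x) * (p.x - c.x);
        var[1] += (p.y - c.y) * (p.y - c.y);
        var[2] += (p.z - c.z) * (p.z - c.z);
    }

    // Split along whichever axis is most spread out; we deliberately
    // don't care how "good" the plane is, which is what keeps it fast.
    int axis = 0;
    if (var[1] > var[axis]) axis = 1;
    if (var[2] > var[axis]) axis = 2;
    float off = (axis == 0) ? c.x : (axis == 1) ? c.y : c.z;
    return {axis, off};
}

Recursing on the two halves (with objects crossing the plane staying in the node) gives the O(n log2 n) rebuild mentioned below.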


otherwise, the cost of doing these sorts of line-check operations can easily go up in an O(n^2) relation to how many objects exist, whereas with a BSP tree it is more around O(n log2 n).

and, part of this is because my AIs are a bit lazy and generally do tracelines and similar checks (bounding-box move checks, ...) pretty much every tick. another solution here was that any idle AIs past a certain distance will "freeze" until a/the player moves within a certain distance of them, at which point their AIs will reactivate (they also go onto a slower 4Hz tick, *1).

for example, a moving AI may take into account the direction they are walking, applied gravity, ... and this will be fed into a function that tries to figure out where they will be able to move to (generally with some amount of probing), causing any number of collision queries to be performed.

also, AI enemy targeting is pretty much querying anything within their visibility area (a big sphere query), determining whether or not it is a potential enemy, and whether or not it is inside their view-frustum and they have a clear line-of-sight (more tracelines, it is considered a pass if any of them are not blocked), ...

so, these sorts of (small) costs can easily add up.


*1: different sets of 'ticks' happen at different rates.
4Hz: idle tick, for things which aren't really speed-critical (like activation/deactivation based on distance, ...);
10Hz: AI tick, generally used for enemy AI checks (checking for targets, movements, ...);
16Hz: movement physics tick, used for player movements and projectiles, and usual (non-fancy) physics;
100Hz (virtual): used by fancy physics (rigid body dynamics), only exists when entities using fancy-physics are active (*2).

*2: this is a little off, but related. doing collision checks/... at 100Hz is expensive, even with a BSP, so it is not done unless specially needed (usually because objects which use rotational movement are not exactly stable at much lower tick-rates). nearly everything else is either resting or sliding AABBs (normal bounding-boxes) though, which don't really require any fancy checks (for the most part).

certain bounding-box objects may still require subdivision in some cases though (mostly subdividing the movement of rockets or other fast-moving projectiles), to make sure they actually collide with things in their path. even at 16Hz a rocket can easily end up on opposite sides of an enemy or a wall between ticks without ever registering a collision, so a subdivided tick is used to minimize this (at the cost of making rockets a bit more expensive).
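A rough sketch of that subdivision idea (my assumptions, not the actual engine code): step the projectile through intermediate positions no farther apart than some maximum, and run the normal collision query at each one, so it can't skip across a thin wall in a single tick.

#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };

// 'blocked' stands in for whatever collision query the engine provides.
// 'maxStep' might be the projectile's bounding-box half-extent.
// Returns true (and fills *hitPos) if the projectile hit along the way.
bool sweepProjectile(Vec3 start, Vec3 end, float maxStep,
                     bool (*blocked)(Vec3), Vec3* hitPos) {
    Vec3 d = {end.x - start.x, end.y - start.y, end.z - start.z};
    float dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    int steps = std::max(1, (int)std::ceil(dist / maxStep));
    for (int i = 1; i <= steps; i++) {
        float t = (float)i / (float)steps;
        Vec3 p = {start.x + d.x * t, start.y + d.y * t, start.z + d.z * t};
        if (blocked(p)) {  // check every intermediate position
            *hitPos = p;
            return true;
        }
    }
    return false;
}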

16Hz was generally chosen for movement and physics mostly as it is a little less "jittery" for movement than 10Hz, and partly because my client-side LERP'ing was resulting in nasty "rubbery" interpolation effects at 10Hz, which were greatly reduced at 16Hz.


or such...

Quote
If you want a normal vector like the one he describes, then you need to know the polygon that the player is "standing" on. [...]

Great explanation. You got a point there.


Quote
dunno about their case, but my engine dynamically builds a BSP-like tree for sorting all this out (mostly by eliminating nearly everything that can be determined not to collide).

BSP as in binary space partitioning? I'm reading up on it right now. So that tree structure is supposed to focus on the areas that require collision testing and exclude the areas that don't?



Quote
BSP as in binary space partitioning? [...] So that tree structure is supposed to focus on the areas that require collision testing and exclude the areas that don't?


Yes, it is one of several spatial partitioning techniques. Quadtrees and Loose Quadtrees are also useful for items placed around a terrain.

Most major collision or physics systems will use two phases.

First is called a broad phase. This uses a BSP or Quadtree or some other spatial system. That way instead of testing against every item in the world, you are only testing against objects that are very likely to collide. The only compute cost involved is navigating a binary spatial tree, so it is possible to eliminate hundreds or thousands of items with just ten or so comparisons.

The second pass is called a narrow phase. This is where it tests the details of the model. It may check against the actual 3D mesh, or against a physics object like an invisible sphere.


As an example... Imagine casting a ray across a level with 800 objects in it. First comes the broad phase. It searches the binary tree very quickly and discovers only six items are close to the ray. Next comes the narrow phase. Those six items discovered in the broad phase are tested with a more precise mesh collision test. This phase may discover that the bullet went under the arm of one actor, over the shoulder of another actor, and was a head-shot on a third actor.
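The shape of that two-phase query, as a hedged C++ sketch (broadPhaseQuery and narrowPhaseHit are hypothetical stand-ins for the engine's own tree walk and mesh test):

#include <vector>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };
struct Object { /* whatever the engine stores per object */ };

// Broad phase: walk the BSP/quadtree and return only objects whose
// bounding volumes lie near the ray (e.g. 6 survivors out of 800).
std::vector<Object*> broadPhaseQuery(const Ray& ray) {
    return {};  // stand-in: a real engine walks its spatial tree here
}

// Narrow phase: precise test against the mesh or physics shape.
bool narrowPhaseHit(const Ray& ray, Object* obj, Vec3* hitPoint) {
    return false;  // stand-in: real mesh/primitive intersection here
}

bool raycast(const Ray& ray, Vec3* hitPoint) {
    for (Object* obj : broadPhaseQuery(ray))     // cheap culling first
        if (narrowPhaseHit(ray, obj, hitPoint))  // expensive, but rare
            return true;
    return false;
}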



Quote
First is called a broad phase. This uses a BSP or Quadtree or some other spatial system. [...] The second pass is called a narrow phase. This is where it tests the details of the model.


yep. the main notable thing is that for real-time tree-building, a quick-and-dirty BSP building strategy is preferable to a slower but more balanced BSP algo (the slower algos are often used for off-line BSP building).

I have typically also added in AABB tests for the coarser checks, since, even if we know a collision is likely, it is generally cheaper to eliminate possible collisions using AABB checks first, rather than using more expensive OBB/polyhedron/mesh checks up-front.


typically, for physics ticks, an entire movement will be boxed. for example, the starting and ending positions will be boxed (usually conservatively, assuming radius=box-size), then a bounding box will be made which includes the boxes at both the start and end positions. if the two movement boxes don't collide, then there is no real chance of the objects themselves colliding.

after this point, it may be possible to instead treat the movement as a pair of moving spheres. if at the CPA (closest point of approach) there is no collision (i.e., distance > (radius1+radius2)), then the collision can also be eliminated.

then, more precise collision checks can be done by checking the objects at the positions they would be in at the CPA, and if they still collide, then any collision handling can be done (such as applying contact forces and similar).
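A minimal sketch of that CPA test, assuming constant velocity over the tick (the math is standard; the names are mine): find the time in [0,1] at which the two moving spheres are closest, and rule the pair out if that closest distance exceeds radius1+radius2.

#include <algorithm>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// p0/p1: positions at tick start; d0/d1: displacements over the tick.
// Returns true if the pair can be safely eliminated for this tick.
bool cpaRulesOutCollision(Vec3 p0, Vec3 d0, float r0,
                          Vec3 p1, Vec3 d1, float r1) {
    Vec3 s = sub(p1, p0);   // relative position
    Vec3 v = sub(d1, d0);   // relative displacement
    float vv = dot(v, v);
    // Time of closest approach, clamped to the tick interval [0,1].
    float t = (vv > 0.0f) ? std::clamp(-dot(s, v) / vv, 0.0f, 1.0f) : 0.0f;
    Vec3 closest = {s.x + v.x * t, s.y + v.y * t, s.z + v.z * t};
    float r = r0 + r1;
    return dot(closest, closest) > r * r;  // distance > (radius1+radius2)
}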


though, it is also notable how, for the most part, I still most often end up using simple AABBs.

Quote
dunno about their case, but my engine dynamically builds a BSP-like tree for sorting all this out [...] the trick here is figuring out how to quickly rebuild the BSP in real-time

BSP... BSP... BSP... what have you got dudes?

Realtime BSP. Are you serious?

Trying to make BSP realtime is like trying to adapt an internal combustion engine to run on electricity only. It's just easier to use an electric motor. Bullet uses SAP (sweep-and-prune) and AABB trees. The latter is quite a bit faster when dealing with a lot of moving geometry.

Quote
I have typically also added in AABB tests for the coarser checks...
though, it is also notable how, for the most part, I still most often end up using simple AABBs.

I mean, that says it all!


Quote
Realtime BSP. Are you serious? [...] Bullet uses SAP and AABB trees. The latter is quite a bit faster when dealing with a lot of moving geometry.


with a sensibly written construction algorithm, the rebuild time is O(n log2 n), and it is generally "plenty fast enough" for typical use cases (in my tests, it holds up fairly well with thousands of objects in a scene, ...).


Quote
I mean, that says it all!


not because of performance, but because, for the most part, is anything much beyond an AABB actually needed for most objects?...

do we need a precise bounding volume for a flying rocket, or a dropped item? not really.

Quote
with a sensibly written construction algorithm, the rebuild time is O(n log2 n), and it is generally "plenty fast enough" for typical use cases [...]

I have several doubts about this. id Software had to deal with this problem for at least three product iterations. I apologize, but I have a lot of trouble understanding what you mean by "sensibly written construction algorithm".

Reiterating the concept: AABB tree. Proven to work!

I don't understand you all. Why try to make BSP a dynamic algorithm when you have AABB trees?

Algorithmic complexity means nothing in the real world. It's not like two algorithms are equivalent because they have the same algorithmic complexity.

 

Quote
not because of performance, but because, for the most part, is anything much beyond an AABB actually needed for most objects?...
do we need a precise bounding volume for a flying rocket, or a dropped item? not really.

Yes, and by extension, do we need to crack our heads open trying to make a dynamic structure out of something which is well understood to be static in nature?

 

What's the deal with BSPs? What have they done for you all that's so good we cannot just move on?

 

What's the deal with the "center of mass" and BSP?



Quote
Why try to make BSP a dynamic algorithm when you have AABB trees? [...] What's the deal with the "center of mass" and BSP?


if you don't know how the "center of mass" fits into the BSP building, then do you have any idea how the construction algorithm actually works in this case?...


a traditional BSP, like the ones id has often used, looks for the "optimal" division plane, generally by checking against "every possible plane within the given set". this is where the big cost comes in, as a person generally has to check a large number of planes and evaluate their relative goodness, working out to roughly O(n^3 log2 n).

but, if a person instead simply calculates a plane that "roughly divides things in half" (while not giving a crap how "good" the division plane is), then it is *much* faster (as the whole *expensive* part is simply removed).

and, it only takes O(n) to calculate an average and a relative distribution (or a 3D analogue of a standard deviation), and a single O(n) pass to divide the set, ...

stated another way: a few linked-list walks, and some vector math.

Quote
if a person instead simply calculates a plane that "roughly divides things in half" [...] it only takes O(n) to calculate an average and a relative distribution, and a single O(n) pass to divide the set

I'm about to implement some broad-phase checks into my engine, and I'm curious about the approach you've listed here. In practice, how much better is it to "sensibly" construct a BSP-tree over "naively" dividing the scene by some notion of a midpoint (as in a typical quad- or oct-tree)?

 

So far, I've been leaning towards a purely naive oct-tree construction. Has your experience with the "sensible" BSP-tree determined that it's been worth it over a simple oct-tree?



Quote
In practice, how much better is it to "sensibly" construct a BSP-tree over "naively" dividing the scene by some notion of a midpoint (as in a typical quad- or oct-tree)? [...] Has your experience with the "sensible" BSP-tree determined that it's been worth it over a simple oct-tree?


I haven't really done any specific checks, but making an "educated guess", I would guess they would probably be fairly comparable performance-wise.


the main difference is that an oct-tree divides the space into 8 regions at each step, so it requires less recursion, but has to do a few more checks at each step (since there are 8 possible regions to step into). dividing the space in a binary fashion allows simpler logic, but a larger number of internal nodes.

some forms of oct-tree also always divide the region into 8 uniformly sized spaces, which may lead to inefficient partitioning (vs the use of an explicit center point, which works better with typical "lumpy" distributions).


with a naive dividing-axis calculation, the BSP tended mostly to divide along "similar" plane vectors (generally somewhere in the (1, 1, 1) direction), making the dividing planes worse than with an oct-tree or KD-tree (basically cutting the world into a bunch of diagonal slices).

later on, I developed some slightly better calculations, causing it to use more axis-aligned partitioning when this was more appropriate.

the main difference between this and something like a KD-tree was more that it can more nicely align with various distributions.


FWIW, for the main server-side of the engine, I don't use a single giant BSP, but instead divide things into "regions" which exist on roughly a uniform grid (512 meters / 16384 units), mostly to simplify incremental loading/unloading (FWIW, each region also partially has its own local coordinate space as well; partly because, in my experiments, for distances measured in km or so, the precision of floats actually starts to matter).


or such...

It's not about memory, really - it's about how much CPU time doing a lot of ray casting takes.

 

One thing I've learned about collision, and pretty much anything in games, is that there is no one-size-fits-all way of doing things. A lot of things trade cost against precision, as well as just differing in ways that suit one type of game better; there is no real "standard" way of checking collision that is good for every type of object or gameplay. Bounding boxes are a big one, but they obviously have precision issues in many cases and can have tunneling problems.

 

Anyway, the broad phase and narrow phase go hand in hand: if you eliminate a lot of objects but then do a thousand ray casts, you're still going to get awful performance, especially since, the way broad-phase algorithms tend to work, the collision detection multiplies in complexity as more objects move within collision range of each other. That's rather unavoidable, unfortunately.

 

Generally cheaper is better unless it causes major glitches; that's why a lot of games, especially older ones, have special areas on maps where you can abuse the collision and get through the geometry. The N64 was big on that.


Satharis, on 16 Feb 2013 - 06:07, said:
It's not about memory, really - it's about how much CPU time doing a lot of ray casting takes. [...] Bounding boxes are a big one, but they obviously have precision issues in many cases and can have tunneling problems.

yep.

luckily, in an FPS-like setting or similar, "most" objects work pretty well as bounding-boxes (and fancier objects are usually mostly cosmetic).

Quote
Anyway, the broad phase and narrow phase go hand in hand: if you eliminate a lot of objects but then do a thousand ray casts, you're still going to get awful performance [...]

bounding boxes are often used in my case for first-pass elimination for more expensive collision checks (following the broad BSP query or similar).

most of the time, the bounding box can eliminate the collision well before having to do any more expensive checks, and it is one of the computationally cheaper checks (along with a sphere, but typically an AABB is a better fit than a sphere, so it will eliminate a lot more "false positives").

like, you don't need to check collisions against an object precisely when it is across the room, only to quickly eliminate it.
even in a large pile (say, a pile of bricks or similar), usually each brick is in close proximity only to those bricks it is touching, so something like an AABB can still separate each brick from most other bricks in the pile.

the BSP will not often clearly separate individual bricks, but more just eliminate checking between bricks in one brick-pile, and another similar pile elsewhere in the map.


luckily there is also a trick: if an object's net velocity gets low enough, an option is to "freeze" it and take away its internal velocity (it becomes non-moving until interacted with again by a faster-moving object). this both saves CPU cycles and helps avoid some ugly effects (things endlessly wobbling around, and piles of things randomly "exploding"), but it does create a slight issue in that things can sometimes settle in an obviously "not physically plausible" manner (if this freezing occurs too aggressively, an object can freeze leaning over an edge, or it can result in stacks which "should" reasonably fall over, ...).
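A rough sketch of that freeze trick, with thresholds invented for illustration (how aggressively to sleep is exactly the tuning problem described above):

struct Body {
    float vx, vy, vz;  // velocity
    int slowTicks;     // consecutive ticks below the speed threshold
    bool asleep;
};

void maybeFreeze(Body& b) {
    const float kSleepSpeedSq = 0.01f;  // assumed tuning value
    const int   kSleepTicks   = 10;     // wait a while, don't freeze eagerly
    float speedSq = b.vx * b.vx + b.vy * b.vy + b.vz * b.vz;
    if (speedSq < kSleepSpeedSq) {
        if (++b.slowTicks >= kSleepTicks) {
            b.vx = b.vy = b.vz = 0.0f;  // take away its internal velocity
            b.asleep = true;            // skip it until something hits it
        }
    } else {
        b.slowTicks = 0;
    }
}

// When a faster-moving object touches a sleeping one, wake it back up:
void wake(Body& b) { b.asleep = false; b.slowTicks = 0; }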

Quote
Generally cheaper is better unless it causes major glitches [...] the N64 was big on that.

yep, and also why I typically very rarely use meshes for non-static objects. mesh/mesh collision checks are very expensive.
most other checks against meshes are also expensive, but since the meshes are static they can be excluded from "dynamic" checks (*1).


so, a ranking, I think (for straight collision checks):
sphere: cheap (single compare), often a poor fit, usually better as a "worst case" bound (but with a few partial exceptions);
AABB: still cheap (6 compares), often a much better fit than a sphere;
OBB: costs more than an AABB (up to 12 axis projections / 24 compares), but can rotate freely (very useful for things like crates, bricks, ...);
convex polyhedra: typically cost a little more than an OBB (n*m projections, 2*n*m compares, where usually n,m>=6), but can match arbitrary convex shapes, and can usefully approximate most small non-convex objects (both OBBs and polyhedra work via the "separating axis theorem");
mesh object: expensive, but can represent arbitrary shapes.

so, with these later forms, a person can keep a radius and a bounding-box around, for first-pass elimination.
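For reference, the two first-pass tests being described, sketched in C++ (layout assumed, not from the poster's engine) - the sphere test boils down to a single compare against a squared radius, and the AABB overlap test is the 6 compares from the ranking above:

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// One compare: squared center distance vs squared radius sum.
bool spheresOverlap(Vec3 ca, float ra, Vec3 cb, float rb) {
    float dx = ca.x - cb.x, dy = ca.y - cb.y, dz = ca.z - cb.z;
    float r = ra + rb;
    return dx * dx + dy * dy + dz * dz <= r * r;
}

// Six compares: two per axis; any separated axis rejects the pair.
bool aabbsOverlap(const AABB& a, const AABB& b) {
    return a.min.x <= b.max.x && a.max.x >= b.min.x &&
           a.min.y <= b.max.y && a.max.y >= b.min.y &&
           a.min.z <= b.max.z && a.max.z >= b.min.z;
}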

there are also cylinders and capsules, but they are rarely used in my case, and are not particularly stable (it is easier to get good contact behavior from face-based solids, like OBBs or polyhedra, via the use of CSG operations). (mesh/mesh object collisions are also a little problematic, as I am not currently aware of a particularly good way to calculate contact forces, so IIRC they just directly push away from each other, more like spheres).


*1: in my engine:
"static" means "object will not move". it is not checked against anything else, but moving (dynamic) objects may check against it (typically, this is for things like world geometry).
"semi-static" means "object does not move on its own": its movement is controlled externally (by program code), and it will typically only check against dynamic objects (most typical entities fall into this camp, which currently includes pretty much everything from players/AIs/doors/platforms/rockets/..., all of which are instead under the control of their "movetype", *2);
"dynamic" means "object moves freely and responds to physical forces".

*2: the "movetype" system is a nasty jury-rigged mess, but ironically, it was written much later than the fancier physics (simpler Quake-like physics works a lot better / more reliably for a lot of things; originally I tried using fancy physics for pretty much everything), and it depends a lot more on the server-side game logic (not currently available to the physics engine). note that "solidtype" deals with both sets of solids (as far as the server end is concerned).


sadly, my fancy physics code (its own evil mess of sorts) hasn't been updated to deal with voxel terrain, so ATM it isn't really all that useful (try to drop a brick or crate directly onto voxel terrain, and it will just keep on going). I have considered fixing this, but haven't gotten to it (hasn't seemed like a huge priority). likely, as-is, it would require API-level modifications to the physics engine (API is, errm, sort-of structured in mostly the same sort of way as legacy OpenGL, as it "seemed like a good idea at the time").

admittedly, I am almost tempted to just rewrite it all into something hopefully less of a nasty mess, like a single unified physics engine that deals with both types of physics, and hopefully improve the API design some (great evils lurk here...), but am presently torn between the possible options (and, otherwise, just haven't really gotten around to it).

Quote
It's not about memory, really - it's about how much CPU time doing a lot of ray casting takes.

 

 

Could you elaborate in more detail and give an example pertaining to CPU time? Interesting stuff.


Quote
Could you elaborate in more detail and give an example pertaining to CPU time? Interesting stuff.


if your scene has 10000 objects:
you can check the line against each object (simple linear search), so, 10000 checks;
or, a tree-structure (BSP or oct-tree or similar) may get it down to around 10.

what if 1000 of these objects each do a line-trace in a tick?
first option: 10000000 line/box checks;
second option: 10000.

what if each of these 1000 is an enemy, and performs a radius query, and each finds 100 objects in range, and line-traces to each?
first option: 1000000000 line/box checks;
second option: 1000000.

now what if, for reliable detection, the visibility check is performed via multiple traces, say 9 (eyes trace to the center and to each corner point of the target's visible bounding-box)? ...

these sorts of costs can add up fairly quickly (and this is just with all the enemies standing around doing nothing).

you don't really want to be doing 1000000000 of *anything* in a tick (and even 1000000 still isn't a good number).


games have often used tricks here, like only "activating" enemy AIs when a player passes through a certain door or collides with a certain trigger-region (or comes within a certain range), and only checking for the current player (the AI doesn't look for targets in range, but simply checks "can I trace a line to the player?", possibly simply rotating between players for multiplayer games).

likewise, early elimination and aggressive culling here can often mean the difference between frames-per-second, and seconds-per-frame.

likewise in a 3D renderer: much of the work isn't so much about getting things drawn quickly, but about quickly figuring out what needs to be drawn.

Quote
Could you elaborate in more detail and give an example pertaining to CPU time? Interesting stuff.

As cr88192 pointed out, it's really just the CPU having to do "more stuff." CPUs aren't magic - they work in linear time, and the only real limit to them is how many things they can do over a period of time.

 

Often in games you have something like the logic-processing stage of the game loop. This is where the CPU takes control, doing mathematical comparisons, collision detection, AI choices, ray casting in this case, and so on. Basically it does everything to make game objects interact; subsequently, the more time the CPU spends on one set of tasks, the less time is available for other tasks. If you do 100k checks compared to 10k checks, you are spending 10x as much CPU time on that.

 

CPUs these days are pretty robust - it takes a lot of math work to bog them down - but in higher-end games it certainly becomes a problem. If you think about something like an AAA FPS, you may have an entire level loaded with geometry, AI soldiers or something running around and interacting, and the game is constantly checking their collision with things near them on top of every other task.

 

This is why games are often considered "CPU-bound" or "GPU-bound" - it is a linear sequence of steps (one game may have very little for the video card to process but a ton of CPU work), and for simplicity's sake most games do the logic processing, then the graphics updates. If the CPU is slow, it may take a long time to do all the processing required, and then the GPU only has so much time left to render while maintaining the framerate. But that's what it comes down to: the two real ways to optimize when coding are to do something in a way that takes less CPU time, or to do fewer things in general. As an example, one simple addition takes far fewer cycles than casting and checking 10k rays every logic update (which may be every 20 ms or so).

In a modern engine, ray casts are much more affordable, but often they have to be deferred.

Take a typical game logic thread. Usually it is single threaded and it has a budget to stick to. Doing thousands of ray casts would slow the thread down; they are not that cheap. But suppose the code can request a ray cast ahead of time? Then another thread can pick up the request, do the calculations and store the result to be used later.

Pretty much every system has multiple cores these days. It's not just how much work you can do, it's whether you can parallelise it.
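A hedged sketch of what such a deferred request might look like (names and the doRaycast stand-in are illustrative; real engines batch and schedule this more carefully): game code enqueues a request and keeps going, a worker thread resolves it, and the result is read once 'done' flips.

#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

struct Ray { float ox, oy, oz, dx, dy, dz; };

struct RaycastRequest {
    Ray ray;
    std::atomic<bool> done{false};
    bool hit = false;  // result; valid only once done == true
};

static bool doRaycast(const Ray&) { return false; }  // stand-in query

class RaycastQueue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<RaycastRequest*> pending;
public:
    // Game-logic thread: enqueue and return immediately.
    void request(RaycastRequest* r) {
        { std::lock_guard<std::mutex> lk(mtx); pending.push_back(r); }
        cv.notify_one();
    }
    // Worker thread (e.g. std::thread(&RaycastQueue::workerLoop, &q)):
    void workerLoop() {
        for (;;) {
            RaycastRequest* r;
            {
                std::unique_lock<std::mutex> lk(mtx);
                cv.wait(lk, [&] { return !pending.empty(); });
                r = pending.front(); pending.pop_front();
            }
            r->hit = doRaycast(r->ray);
            r->done.store(true, std::memory_order_release);
        }
    }
};

The game thread later checks r->done (with an acquire load) and uses r->hit, typically a frame or two after requesting it.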

Quote
In a modern engine, ray casts are much more affordable, but often they have to be deferred. [...] It's not just how much work you can do, it's whether you can parallelise it.

Unfortunately that depends entirely on the thing in question and whether it can run concurrently. There are actually a lot more things that can't run concurrently than can, due to the simple fact that even if you thread them, they just end up doing a task the main thread is waiting on, so it actually ends up as worse performance: context switches on top of the same wait.

 

Of course some things can be run concurrently, but that's where smart optimization comes in. Probably one of the worst things people do when they discover the power of threads is to just try to thread everything, and then end up with worse performance, deadlocks, memory corruption, and so on.

 

Doing less of something is always a straight performance gain, unless it somehow causes more work later on (which doesn't happen very often). Threading is a lot more... dependent.

Share on other sites
"Suppose the code can request a ray cast ahead of time"

It doesn't sit there and do nothing - it gets on with other work until the result is ready. Yes, it's a bit more complicated, but that's the world we live in, and that's what games have to do to get the best performance.

An old article, but it describes the current state of affairs:

http://www.gotw.ca/publications/concurrency-ddj.htm

Quote
Pretty much every system has multiple cores these days. It's not just how much work you can do, it's whether you can parallelise it.

while partly true, there are limits:
if a person manages to fill up all the cores with work, they will not get more performance out of more threads;
inter-thread communication typically introduces a certain level of latency (with tradeoffs regarding how things are organized, ...).

for a CPU-bound program, this means around a 2x-4x max speedup on a quad-core.
IMO, threads are generally better for coarse-grained asynchronous work, rather than fine-grained or synchronous tasks.

...
