# Large scale particle collision systems

This topic is 4513 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

What do people use to speed up large-scale particle collision, like thousands of particles? Something that is better than:
for(i=0; i < num_particles; i++) {
for(k = (i + 1); k < num_particles; k++) {
// if particle is within distance to particle[k], then repel them.
}
}

That's strange. I can't see my plus signs in the 'Show Preview'.

##### Share on other sites
I believe I'm looking for a type of optimization. Did some searching and found a list of some
interesting structures to use:

Octree
KD Tree
BSP
OBB tree (BVH)
K-dop tree
Uniform grid
Hashing
Dimension reduction

If my collisions are purely particle collisions, is there any one type of optimization that is
best to use?

##### Share on other sites
If we can assume that the particles share the same (or similar) radii, I would definitely go for a uniform grid and hashing hybrid.

Consider an infinite uniform grid and a mapping function:
f(x,y,z) -> grid's element.

One way to define f is:
f(x,y,z) = floor( x/sx, y/sy, z/sz ),
where sx, sy and sz are the grid's element size.

In this case, each grid element is represented by three integers.
Moreover, if the particle's radius is R, a good choice for the grid's element size would be:
sx = sy = sz = R.

Now, for each non empty element, we keep an entry inside a hash table. In our example, since we define each element as a triplet (a,b,c), a reasonable hashing function would be something like:
hfunc(a,b,c) = prime0*a + prime1*b + c.

Given some particle at (x,y,z), we can retrieve potentially colliding particles by looking up the appropriate grid's elements in the vicinity of f(x,y,z).

If all particles have the radius R and sx = sy = sz = R, we have to check at most 27 elements inside the hash table.

Of course we don't have to repeat the same process for each particle. Consider the case where several particles are located inside f(x,y,z). Repeating the same process for each particle means that we are virtually doing the same thing all over again.

This method is SLOWER than a regular uniform grid approach because of the overhead in using the hash table. We do have two very important advantages though:
1) The memory usage is reduced dramatically.
2) This approach allows us to work with infinite grids which is a big plus
if we are working inside a large environment.

##### Share on other sites
Thanks, I think I need to brush up my knowledge on hashing vertex techniques.

##### Share on other sites
Quote:
 If we can assume that the particles share the same (or similar) radii, I would definitely go for a uniform grid and hashing hybrid.

This will work fine when particles can collide with eachother ONLY.
If they need to collide against meshes and other collision primitives of very different sizes a grid/cell based approach might not be the best choise.

I'm using a dynamic loose aabb tree for all collision queries, to see what I'm talking about I've got some test exe's here (the text below is shamelessly copied from a previous post of mine):

Be aware that the fps is quite bad, mostly thanks to the slow/bad drawing of my debug objects.
The interseting numbers are the "Update took" and "Query took" counter, they are all in micro seconds (10e-6).

Press and hold the right mouse button to activate mouse look. W/A/S/D to move around (or arrows).
Press space to start/pause animation.
Press T to toggle tree hierarchy view on/off.

* FrustumTest_10000.exe, shows 10000 aabb's that moves around (Update took), aabb's that are within one of the three "frustums" are solid (Query took).
Tetrahedral frustums performs similar (just to lazy to draw them :)

* OverlapTest_2500.exe, shows 2500 aabb's that moves around (Update took), aabb's that overlaps ANY other aabb are solid (Query took).
Using bruteforce this would require over 3 million aabb vs aabb tests.
The tests takes about 12ms on my machine = 60+ fps.

* SimpleFrustumTest_16.exe, shows 16 aabb's that moves around (Update took), aabb's that overlaps the frustum (cube) are solid (Query took).
This sample is basicly where you'd like to hit T to see how the tree is created/modified.

I don't remember the hardware requirements for these demos, it's DX9 and uses SSE anyway (think you need a P4, using the Intel compiler).

Also note that the code isn't super optimized, but a rather flexible templated class.
You could probably optimize it a bit more by removing some templating stuff.
Also consider using spheres instead of aabb's.

##### Share on other sites
Quote:
 This will work fine when particles can collide with eachother ONLY.If they need to collide against meshes and other collision primitives of very different sizes a grid/cell based approach might not be the best choise.

I did mention that my system consisted only of particles. They all will be the same size, so it is ok.

##### Share on other sites
Quote:
Original post by eq
Quote:
 If we can assume that the particles share the same (or similar) radii, I would definitely go for a uniform grid and hashing hybrid.

This will work fine when particles can collide with eachother ONLY.
If they need to collide against meshes and other collision primitives of very different sizes a grid/cell based approach might not be the best choise.

You are correct. One way to handle widely varying radii, is to use several hashing tables with different resolutions. A good algorithm that does that was proposed by M. Overmars. Tell me if you want to hear more details.

I, for one, never had the chance to test both AABB trees and spatial hashing in the same environment so I don't really know which is better or faster.
I'd like to hear from someone who did.

##### Share on other sites
There are other considerations in your search, for example you will probably want to assign a range of characteristics to each particle:

Velocity
Direction
Orientation
Energy State
Temperature (K)
Neighbouring Entities (optimisation trick)
Charge
Spin

I would be more inclined towards a modified BSP format as it allows for conditional analysis and order sorting. A highspeed database essentially.

##### Share on other sites
Quote:
Original post by eq
Quote:
 If we can assume that the particles share the same (or similar) radii, I would definitely go for a uniform grid and hashing hybrid.

This will work fine when particles can collide with eachother ONLY.
If they need to collide against meshes and other collision primitives of very different sizes a grid/cell based approach might not be the best choise.

I'm using a dynamic loose aabb tree for all collision queries, to see what I'm talking about I've got some test exe's here (the text below is shamelessly copied from a previous post of mine):

Be aware that the fps is quite bad, mostly thanks to the slow/bad drawing of my debug objects.
The interseting numbers are the "Update took" and "Query took" counter, they are all in micro seconds (10e-6).

Press and hold the right mouse button to activate mouse look. W/A/S/D to move around (or arrows).
Press space to start/pause animation.
Press T to toggle tree hierarchy view on/off.

* FrustumTest_10000.exe, shows 10000 aabb's that moves around (Update took), aabb's that are within one of the three "frustums" are solid (Query took).
Tetrahedral frustums performs similar (just to lazy to draw them :)

* OverlapTest_2500.exe, shows 2500 aabb's that moves around (Update took), aabb's that overlaps ANY other aabb are solid (Query took).
Using bruteforce this would require over 3 million aabb vs aabb tests.
The tests takes about 12ms on my machine = 60+ fps.

* SimpleFrustumTest_16.exe, shows 16 aabb's that moves around (Update took), aabb's that overlaps the frustum (cube) are solid (Query took).
This sample is basicly where you'd like to hit T to see how the tree is created/modified.

I don't remember the hardware requirements for these demos, it's DX9 and uses SSE anyway (think you need a P4, using the Intel compiler).

Also note that the code isn't super optimized, but a rather flexible templated class.
You could probably optimize it a bit more by removing some templating stuff.
Also consider using spheres instead of aabb's.

eq - do you know of any links of information on how to implement the "dynamic loose aabb tree" you mention? I've read lot's of posts here that show how to create trees for static geometry/object, but no info for dynamic objects.

thanks
-Steve

##### Share on other sites
Quote:
 Original post by MooMansunI would be more inclined towards a modified BSP format as it allows for conditional analysis and order sorting. A highspeed database essentially.

What do you mean by "a modified BSP format"?

I have a very little experience with BSPs so please forgive me if what I am about to say is nonsense.

Classic BSP is used to partition space, so I guess that in order to incorporate BSP in a particle collision scheme you should enclose the particles with some sort of a convex polyhedron, say AABB.

(1) If we choose AABB, we have 6 partitioning planes for every particle.
(2) While traversing the BSP, we have to perform some (relatively fast)computation for each plane that we encounter, in order to determine where the colliding object stands relative to the plane.
(3) When the colliding object intersects with the partitioning plane, we have to traverse the BSP in both directions.
(4) Classic BSP was not designed to handle dynamic data. In fact, you would have to reconstruct it all over again for each frame.

Also, consider the following example:
All of your particles lie on the ground which is aligned with the X-Z plane.
If your particles have the same radius and we use AABB to encapsulate them,
then ALL of the particles have two identical planes!!! This means that performing the collision will be roughly a O(n^2) process because of (3).

I am not sure how well the algorithm scales because of (1), (2) and (3) as the number of particles become larger when compared to loose AABB and spatial hashing. Any thoughts on that?
Moreover, one has to understand that (3) might have a considerable impact in the event where we have dense clustering of the particles.

As for my example. Unless you have a solution to that, I wouldn't touch BSP with a 10 inch stick. :)

(4) is a very important issue since the construction of a BSP is not linear,
while both loose AABB and spatial hashing are. I guess that you probably have an answer to (4) so I would love to hear about it.

Tell me what you think.

##### Share on other sites
Quote:
 eq - do you know of any links of information on how to implement the "dynamic loose aabb tree" you mention? I've read lot's of posts here that show how to create trees for static geometry/object, but no info for dynamic objects.

Unfortunatly no :(
I got the idea from some sphere tree demo I saw (just an executable). It had a fixed number of levels in the tree, i.e two supernodes and leaf nodes.
I also didn't like the fact that finding the minimaly enclosing sphere for a set of spheres is a very complicated thing, that's why I choose aabb's.

My approach is totally dynamic, I've seen 20+ tree levels.
There are a few heuristics that can be modified, i.e how much to expand the nodes depending on an aabb's velocity etc, when to remove an aabb from the parent and so on.
As you can guess from the test exe's, my heuristics are based on the aabb volumes. If the ratio between the sum of the contained aabb's and the parent aabb, violates some threshold then do this or that and so on..
It was quite a challenge to implement the system from scratch but I liked the end result, it's quite clean and the expensive operations are used very seldom.

The "No update" in the demos is the number of times I didn't have to do anything when an aabb was updated (moved or resized). This is a very fast operation (basically: test if my new aabb is "visible" outside the parent aabb).
The "Grow node" is the number of times I had to grow an existing node in the tree (the tree structure is intact). This is also a quite fast operation, just calculating the "expansion" velocity of the updated aabb and merge it with the parent aabb.
The "Reinsert" is the number of times I need to remove the aabb from the tree, then reinsert it again from the root.
This is the only operation that reorganizes the tree structure and is relatively expensive.
In the 10000 aabb test, less then 2% of the updates performs the expensive path.

Sorry for not being able to point out some good papers on the subject :(

##### Share on other sites
Thanks for the info eq.

One more question - how do you find/update the overlapping pairs of AABBs? Is it similar to sweep and prune where you use frame coherency or do you rebuild from scratch each frame?

##### Share on other sites
In the samples I don't use any frame coherence.
For every aabb, I do an aabb query on the structure, this returns all overlapping aabb's in the tree.
In the example I have 2500 aabb queries, one for each aabb (not entirely true since if aabb X overlaps aabb Y, I mark Y as tested).

In my real applications I've only used the aabb tree for frustum culling and object picking (ray casting).
I haven't got the need for real collisions and physics at the moment, so I don't know how good it work in a real game situation.

To handle swept objects I guess you could store the swept aabb in the tree, I.e the aabb for the object that surronds the object at all timesteps from the previous frame to the next. In it's simplest form the aabb that encapsuls the aabb of the previous frame and the current.
Then for every object that moves, you input the swept aabb instead of the static aabb to get all overlapping aabb over the time.

##### Share on other sites
Quote:
 What do you mean by "a modified BSP format"?

BSP Particle Collision Format. It would adhere to most of what I describe in this post.

Quote:
 I have a very little experience with BSPs so please forgive me if what I am about to say is nonsense.Classic BSP is used to partition space, so I guess that in order to incorporate BSP in a particle collision scheme you should enclose the particles with some sort of a convex polyhedron, say AABB.

No, its not nonsense, very good questions.

Quote:
 (1) If we choose AABB, we have 6 partitioning planes for every particle.

I am considering that each of the following should be attributed to a dimension, or aspect, of each particle. I feel that the node based structure of BSP lends to that application. Saving state information, from a particular point of view (i.e. a particular electron) can be achieved using the 'Neighbouring Entities'. One thing I think is not studied enough in partical collisions is relativity and this method just seems nice and flexible for that application.

Velocity
Direction
Orientation
Energy State
Temperature (K)
Neighbouring Entities (optimisation trick)
Charge
Spin
(Numerous Others...)

"The partitioning can be represented as a tree, which has the partitioning planes at the internal nodes and objects of interest at the leaves."
http://66.102.9.104/search?q=cache:peBGp8xJ6CkJ:www.cc.gatech.edu/classes/AY2006/cs4451b_fall/bsp.pdf+partitioning+planes&hl=en

Quote:
 (2) While traversing the BSP, we have to perform some (relatively fast)computation for each plane that we encounter, in order to determine where the colliding object stands relative to the plane.

I did briefly consider this during my post, however, I felt that most simulations are pre-computed to provide accuracy and speed. I also felt that considering the number of dimensions, or aspects, to each particles current state, that the node structure would allow flexibility to alter, add or remove those. Finally, it also allows flexibility to increase the physical dimensions, 4, 10 or 26.

Quote:
 (3) When the colliding object intersects with the partitioning plane, we have to traverse the BSP in both directions.

Actually, with Neighbouring Entities, the BSP would be traversed in both directions multiple times. Again, I see this as acceptable for pre-computed data, especially because of the flexibility.

Quote:
 (4) Classic BSP was not designed to handle dynamic data. In fact, you would have to reconstruct it all over again for each frame.

Most of the dynamic aspect would be carried out in main memory, much like a standard BSP. The way I was thinking about this implementation was that the BSP file would be loaded into memory and manipulated by another file to achieve pre-computation of a desired reaction/collision. This would allow separation of the simulated test environment and the data used to manipulate it. Otherwise, a new test environment would need to be composed each and every time a new simulation was run. I don't view that lack of re-usability as acceptable.

Quote:
 Also, consider the following example:All of your particles lie on the ground which is aligned with the X-Z plane.If your particles have the same radius and we use AABB to encapsulate them,then ALL of the particles have two identical planes!!! This means that performing the collision will be roughly a O(n^2) process because of (3).

I'm afraid this is one of the problems with collision simulation. This is why most of it, even with super-computers, is pre-computed as much as possible. If you do not perform those calculations, your accuracy is questionable.

Quote:
 I am not sure how well the algorithm scales because of (1), (2) and (3) as the number of particles become larger when compared to loose AABB and spatial hashing. Any thoughts on that?Moreover, one has to understand that (3) might have a considerable impact in the event where we have dense clustering of the particles.

Neither am I, that is why I used the term 'modified' loosely.

Quote:
 As for my example. Unless you have a solution to that, I wouldn't touch BSP with a 10 inch stick. :)

There's life in that old BSP horse yet... :)

##### Share on other sites
I don't mind anyone hijacking my thread, as long as it addresses my specific problem,
and I don't see what this has to do with my problem.

##### Share on other sites
Quote:
 What do people use to speed up large-scale particle collision, like thousands of particles?

You never specified what exactly you wanted to monitor/alter. How you speed it up, depends on what information is being processed, as much as how it should be presented to the end user.

##### Share on other sites
An alternate approach, which would cache very well, is to sort along one axis, and march along, collecting duplicates. The sort is a fast N lg N, the marching is "approximately" N lg N as well.

Assuming all the particles have the same radius, you can sort on the midpoint; else you must sort on the left edge of the bounding volume of each particle.

// pseudo-codestruct ParticleInfo {  float x, y, z;  ActualParticle * p;};std::vector< ParticleInfo > particles;void check_collisions() {  // assume the particles have some spread along the x axis  sort_particles_on_x( particles );  std::deque< ParticleInfo * > active;  foreach ( ParticleInfo & pi in particles ) {    retire_active( active, pi.x-radius );    check_collisions( active, pi );    active.push_back( &pi );  }}void retire_active( deque active, float minx ) {  foreach( iterator i in active ) {    if( i->x <= minx ) {      active.erase( i );    }  }}void check_collisions( deque active, ParticleInfo & pi ) {  foreach( iterator i in active ) {    if( distance( pi, *i ) < 2*radius ) {      got_a_collision( pi, *i );    }  }}

Instead of a deque, you can use a list or a vector if the number of expected size of the container is small. In fact, vector may perform the best there, too, for many cases.

If you find that most of your particles bunch in the middle, you might want to do a secondary sort along another axis for the active list, and do a binary search when looking for collisions when adding the new particleinfo; this will reduce the average-case number of tests a little bit. However, it should only be necessary with thousands of particles that intersect a lot.

Note that the worst case is always N * N, because each particle may actually collide with every other particle, and thus you need to generate N * N / 2 actual collision pairs, thus that's the runtime of the algorithm in that case.

##### Share on other sites
Quote:
Original post by eq
Quote:
 eq - do you know of any links of information on how to implement the "dynamic loose aabb tree" you mention? I've read lot's of posts here that show how to create trees for static geometry/object, but no info for dynamic objects.

Unfortunatly no :(
I got the idea from some sphere tree demo I saw (just an executable). It had a fixed number of levels in the tree, i.e two supernodes and leaf nodes.
I also didn't like the fact that finding the minimaly enclosing sphere for a set of spheres is a very complicated thing, that's why I choose aabb's.

My approach is totally dynamic, I've seen 20+ tree levels.
There are a few heuristics that can be modified, i.e how much to expand the nodes depending on an aabb's velocity etc, when to remove an aabb from the parent and so on.
As you can guess from the test exe's, my heuristics are based on the aabb volumes. If the ratio between the sum of the contained aabb's and the parent aabb, violates some threshold then do this or that and so on..
It was quite a challenge to implement the system from scratch but I liked the end result, it's quite clean and the expensive operations are used very seldom.

The "No update" in the demos is the number of times I didn't have to do anything when an aabb was updated (moved or resized). This is a very fast operation (basically: test if my new aabb is "visible" outside the parent aabb).
The "Grow node" is the number of times I had to grow an existing node in the tree (the tree structure is intact). This is also a quite fast operation, just calculating the "expansion" velocity of the updated aabb and merge it with the parent aabb.
The "Reinsert" is the number of times I need to remove the aabb from the tree, then reinsert it again from the root.
This is the only operation that reorganizes the tree structure and is relatively expensive.
In the 10000 aabb test, less then 2% of the updates performs the expensive path.

Sorry for not being able to point out some good papers on the subject :(

I've thought of another question: How do you initially setup the tree? Do you create it as if you were creating a tree for static objects ie. perform a one step creation (slow) once all of your world objects are known? Also - once you have the tree running, how do you add new created objects to it? Do they have to go into an existing node? This maybe related, but how do you "reinsert" an object? I guess I'm confused as how you create new parts of the tree when removing/adding objects. Oh and if you haven't guessed by now I'm very curious about possibly using such a system to replace the sweep and prune system I currently have in my ongoing physics engine.

thanks!

##### Share on other sites
Quote:
 Original post by hplus0603An alternate approach, which would cache very well, is to sort along one axis, and march along, collecting duplicates. The sort is a fast N lg N, the marching is "approximately" N lg N as well.

If I went with an uniform grid with rectangular cells, how would I resolve collisions on
the boundaries of these cells? Putting the particles in the grid cells is fast, I just cast
their floating-point position to an integer, so no sorting is necessary.

I was thinking maybe of having overlapping rectangular cells or maybe overlapping
circular cells, but not quite sure how to place the particles in them.

##### Share on other sites
Quote:
 Original post by JakeMIf I went with an uniform grid with rectangular cells, how would I resolve collisions on the boundaries of these cells? Putting the particles in the grid cells is fast, I just cast their floating-point position to an integer, so no sorting is necessary.I was thinking maybe of having overlapping rectangular cells or maybe overlapping circular cells, but not quite sure how to place the particles in them.

If you do that, then you are better of using a loose AABB tree.

Uniform grids are usually used somewhat differently. Suppose that we enclose a particle with a box. After "placing" it in a grid cell we can find potentially intersecting particles by looking up all neighboring cells that intersect with our box.
In a perfect situation, where the cells are bigger than the particle radius, you have to test at most 27 cells, (1 for the cell that you are in and 26 for its neighbors). As suggested in my previous post, the process can be optimized. In fact, an approach similar to the one proposed by hplus0603 can be devised, only without the worst case scenario.

http://erinhastings.net/papers/hashing-sim.zip
http://www.merl.com/reports/docs/TR97-23.pdf

##### Share on other sites
Quote:
 I've thought of another question: How do you initially setup the tree? Do you create it as if you were creating a tree for static objects ie. perform a one step creation (slow) once all of your world objects are known?

These are the only methods that manage the tree:
Handle insert(const T& t, const Vec3& aabbMin, const Vec3& aabbMax);  // Inserts a new object of type T into the aabb tree, return a handle to the object.void update(const Handle& it, const Vec3& aabbMin, const Vec3& aabbMax); // Updates an existing object.void erase(const Handle& it); // Removes an object.

The handle works like an interator, except that you can't increase or decrease it (that's why I'm not calling it an iterator :).

When I load an instance of an "item", I simply call the insert method with the items aabb.
So, yes upon a "level" load, all loaded instances are inserted into the tree (just another cost of the load level phase).
It's not terrible slow (FrustumTest_10000.exe takes less then an second to start on my computer, including font loading, dx setup, 10000 tree insertions etc).
One thing to note is that the aabb tree is NOT "loose" for static aabb's.
This is important since you don't want to get too many false "hits" when do queries on the tree.
Try the SimpleFrustumTest_16.exe, and hit 'T' berfore animation is started and you'll see that the parent aabb's fit their children exactly.
It's not until one aabb moves that I increase it's parent size (if necesary).

These are some query methods:
template <class O> void testPolyhedra(const Vec4 planes[], const uint planeCount, O& overlapper); // For every aabb that's behind or intersection all planes, the overlapper functor is called.template <class O> void testAabb(const Vec3& aabbMin, const Vec3& aabbMax, O& overlapper); // For every aabb that's visible within the aabb, the overlapper functor is called.template <class O> void testLine(const Vec3& lineStart, const Vec3& lineEnd, O& overlapper); // For every aabb that's intesect the line, the overlapper functor is called.// A typical "overlapper"struct AddItemToRenderQueue{	AddItemToRenderQueue(Engine* const engine) : m_engine(engine)	{	}	inline bool operator()(EngineTree::Handle it, const Vec3& aabbMin, const Vec3& aabbMax)	{		return m_engine->addItemToRenderQueue(engine);	}	Engine* m_engine;};// Typical usageEngine::render(void){	m_aabbTree.testPolyhedra(m_frustumPlanes, 5, AddItemToRenderQueue(this));}

I've got other more specialized methods as well, that performs the same tests as above but sorts the objects in respect to a point.
I.e, for every node in the tree that's being traversed, list all childrens that's not rejected by the test, sort this list with respect to the point, traverse children in that order.
So my frustum test traverses the tree and gives me a list of all visible objects in roughly front to back order.
The extra sort costs nearly nothing and ray-cast are quite fast.
(I'd always liked to do a "ray tracer" that traces just the aabb's to see how fast it is, haven't profiled the early out ray test yet :).

##### Share on other sites
eq, I'm confused - I've only seen how to create a tree once you have all the known objects - I don't understand how you add items to an initially empty tree an object at a time to build it. Maybe you can shine some light on the inner workings of your "insert" function.

thanks

##### Share on other sites
I'm tired but, basicly I do:

Node = Root.loop:Merge Node aabb with insert aabb.Best child node = nullFor each child node  Calculate the score for a potential merge with this node (the node to insert and the child aabb).  If this score is the best so far     Best child node = child nodeIf the "best" child node is "good enough" to use  set Node = best child node  goto loopIf the best node is a leaf node  Replace leaf node with a tree node containing the previous child node and the node to insert.  Else  No nodes where good enought to use so insert node at this level

Try it out on paper.

The heuristics I use for how good a node is:
Sum of the volume of contained leaf nodes / Volume of aabb containing all leafs.
If this value is getting below a certain threshold it's considered "bad".

##### Share on other sites
Thanks eq - that's exactly what I was after. Much clearer in my head now. Rating += 10 for you :)