"Clumpiness" metric

Hi,
I have a bunch of objects/particles and need to process every pair that is closer than some interaction radius (so basically a spherical collision problem). At the moment I sort the objects along some axis to cut down on the n^2 iteration. I now want to determine the optimal axis (one of the coordinate axes, not an arbitrary one). Ideally I'd pick the one along which there is the least "clumping" of objects, but how do I measure that "clumpiness"?
At the moment I just calculate the variance along each axis and take the one with the highest value. But this has the disadvantage that two clumps that are far apart also produce a high variance, so I'm looking for something better.
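Roughly, what I do now looks like this (simplified sketch; Vec3 and the array layout are just stand-ins for my actual particle type):

```cpp
#include <cstddef>

struct Vec3 { double x, y, z; };

// Pick the coordinate axis (0 = x, 1 = y, 2 = z) with the largest positional
// variance; this is the axis I currently sort along.  Uses the simple
// two-accumulator form of the variance, which is fine for a sketch.
int pickSortAxis(const Vec3* p, std::size_t n)
{
    double sum[3] = {0, 0, 0};
    double sumSq[3] = {0, 0, 0};

    for (std::size_t i = 0; i < n; ++i) {
        const double c[3] = { p[i].x, p[i].y, p[i].z };
        for (int a = 0; a < 3; ++a) {
            sum[a]   += c[a];
            sumSq[a] += c[a] * c[a];
        }
    }

    int best = 0;
    double bestVar = -1.0;
    for (int a = 0; a < 3; ++a) {
        const double mean = sum[a] / n;
        const double var  = sumSq[a] / n - mean * mean;   // population variance
        if (var > bestVar) { bestVar = var; best = a; }
    }
    return best;
}
```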
Furthermore the method should be "fast" (preferably O(n), and at worst O(n*log(n))) since everything has to be pretty general purpose and I cannot assume temporal coherence.

Are there any known/clever methods for that? (googling "clumpiness metric" wasn't very useful :D)


Did you consider using the sweep'n'prune algorithm? In fact, objects are sorted on all three axes, so searching for an object is done in O(log n) per axis, which sums up to O(n*log n) overall.


To be honest, I don't really know how to implement that efficiently (also with regard to memory). The only more concrete explanation I found fills an n*n matrix with bool values (overlap or not), which might be fast but is unacceptable in my case since the number of particles can be quite excessive, and allocating that matrix is completely out of the question.
The most reasonable implementation I can think of would be:

1. Sort along each axis (n log(n))
2. For each particle (n times the following):
a. binary search on each index array and find the "neighborhood" (log(n)+m, where m is the size of the neighborhood along that axis)
b. find the intersection of the neighborhoods (m*log(m))

Is that how it is done?
If you bin your data into, say, 100 bins, you can compute a measure of clumpiness that is easy to understand: variance/mean (of the counts per bin). When particles are distributed randomly in the range, the counts follow a Poisson distribution, and this measure has an expected value of 1. A lower value means the particles are distributed more uniformly than random; a higher value means they are more clumped in some places.
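A rough sketch along one axis (the bin count is arbitrary and nothing here is tuned):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Index of dispersion (variance/mean of the counts per bin) along one axis.
// ~1 for a random (Poisson) spread, <1 for more uniform, >1 for clumped.
double clumpiness(const std::vector<double>& coords, int numBins = 100)
{
    if (coords.empty()) return 0.0;

    const auto [lo, hi] = std::minmax_element(coords.begin(), coords.end());
    const double range = *hi - *lo;
    if (range == 0.0)                              // everything at one spot:
        return std::numeric_limits<double>::infinity();   // maximally clumped

    std::vector<int> counts(numBins, 0);
    for (double c : coords) {
        int b = static_cast<int>((c - *lo) / range * numBins);
        if (b == numBins) b = numBins - 1;         // clamp the maximum coordinate
        ++counts[b];
    }

    const double mean = static_cast<double>(coords.size()) / numBins;
    double var = 0.0;
    for (int k : counts) var += (k - mean) * (k - mean);
    var /= numBins;

    return var / mean;                             // expected ~1 for Poisson counts
}
```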
[EDIT: Now I see that you want something fast, so I'm on the sweep-'n-prune boat. My original post follows...]

What about Shannon entropy?

Examples:
Case A.) 100 particles located all at the same position would have 0 bits of entropy.
Case B.) If you put 50 particles at one position and 50 particles at another, you'll get 1 bit of entropy -- not much more than Case A.
Case C.) If you put 100 particles uniformly distributed along an interval such that they have the same variance as Case B, you'll get log2(100) ~= 7 bits of entropy (well, sort of; this is under assumptions we won't actually want; the point is that it's more than Case B).

However, the word "entropy" by itself isn't an answer, because you're working with points in a continuous space, not symbols from a finite set, and it matters how close different points are. Here are a few ways to deal with that:
1.) Bin your points and use histograms, much like alvaro's suggestion, after which you'd just compute the discrete entropy (a quick sketch follows after this list).
2.) choose a smoothing kernel (e.g., Gaussian), and then compute the differential entropy of the distribution.
3.) An unclear idea: I think rate-distortion theory has some principled ways to deal with this.
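For method #1, the entropy computation itself is tiny; a rough sketch, given bin counts from whatever histogram you build:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Discrete (Shannon) entropy in bits of a histogram of bin counts.
// 0 bits if everything lands in one bin, log2(numBins) if the counts are uniform.
double histogramEntropyBits(const std::vector<int>& counts)
{
    std::size_t total = 0;
    for (int k : counts) total += k;
    if (total == 0) return 0.0;

    double h = 0.0;
    for (int k : counts) {
        if (k == 0) continue;                      // 0 * log(0) treated as 0
        const double p = static_cast<double>(k) / total;
        h -= p * std::log2(p);
    }
    return h;
}
```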

I think method #2 is fairly pretty, but it'll take some thought. After a quick google, it looks like you'd need to turn to approximation algorithms to compute the entropy of a Gaussian mixture, which seems annoying. Maybe you can just throw a simple Monte Carlo method at the differential entropy integral...
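A rough sketch of that Monte Carlo idea in 1-D (completely untuned; sigma and the sample count are knobs you'd have to pick, and it costs O(n) per sample):

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Monte Carlo estimate (in nats) of the differential entropy of a 1-D
// Gaussian-kernel mixture centered on the data points: draw samples from the
// mixture, then average -log(density) of the mixture at those samples.
double mixtureEntropyMC(const std::vector<double>& x, double sigma, int samples = 1000)
{
    const double PI = 3.14159265358979323846;
    std::mt19937 rng(12345);
    std::uniform_int_distribution<std::size_t> pick(0, x.size() - 1);
    std::normal_distribution<double> noise(0.0, sigma);

    const double norm = 1.0 / (x.size() * sigma * std::sqrt(2.0 * PI));

    double h = 0.0;
    for (int s = 0; s < samples; ++s) {
        const double xs = x[pick(rng)] + noise(rng);   // sample from the mixture
        double density = 0.0;
        for (double xi : x) {                          // evaluate the mixture at xs
            const double d = (xs - xi) / sigma;
            density += std::exp(-0.5 * d * d);
        }
        h -= std::log(density * norm);
    }
    return h / samples;
}
```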

Of course, the things I'm suggesting are expensive, so if this isn't a precomputation then you'd probably be advised to do something simpler. [EDIT: Actually, I see you are looking for something fast...] So, going back to your original problem, something like sweep 'n prune seems better.
Sweep'n'prune is very fast.


1. Sort along each axis (n log(n))

The basic idea is that your particles move relatively small distances per frame, which means you only need to "re-sort" each axis. When done with insertion sort, this can be done in roughly O(n).
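The re-sort could look roughly like this (just a sketch over an index array; key[] holds the coordinates along that axis):

```cpp
#include <cstddef>
#include <vector>

// Re-sort an index array along one axis with insertion sort.  When the order
// barely changes between frames (temporal coherence) this runs in ~O(n);
// worst case it degrades to O(n^2).  key[i] is the coordinate of particle i.
void resortAxis(std::vector<std::size_t>& order, const std::vector<double>& key)
{
    for (std::size_t i = 1; i < order.size(); ++i) {
        const std::size_t idx = order[i];
        std::size_t j = i;
        while (j > 0 && key[order[j - 1]] > key[idx]) {
            order[j] = order[j - 1];               // shift larger entries right
            --j;
        }
        order[j] = idx;
    }
}
```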


2. For each particle (n times the following):
a. binary search on each index array and find the "neighborhood" (log(n)+m, where m is the size of the neighborhood along that axis)

The other way around. I think you want to consider only other particles which are inside a certain bounding box (AABB). In this case you need to find the upper and lower bound of your AABB on each axis, which is O(6*log n) = O(log n).


b. find the intersection of the neighborhoods (m*log(m))

You interlink your axis arrays and your objects, i.e. each object knows its position in all three axis arrays (updated at sorting/insertion/deletion time).
After doing the AABB-bound binary search on all three axes, it is quite easy to get the intersection (see the sketch after these steps):


1. Get the axis with the smallest number of objects between its lower and upper bound.
2. For each object in that axis range:
check whether its index in each of the other axis arrays lies inside the lower/upper bound of the corresponding axis.
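Roughly like this in code (just a sketch; all names are made up, and the re-sorting part is left out):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-particle bookkeeping: its index in each of the three sorted axis arrays.
struct AxisLinks { std::size_t idx[3]; };

// Collect candidates inside an AABB: binary-search the bounds on each axis,
// walk the axis with the fewest hits, and keep only objects whose stored
// indices on the other two axes also fall inside those axes' bounds.
// axisOrder[a] is the particle-id array sorted along axis a, axisKey[a] the
// matching sorted coordinates, links[p] the back-links of particle p.
std::vector<std::size_t> queryAABB(
    const std::vector<std::size_t> axisOrder[3],
    const std::vector<double>      axisKey[3],
    const std::vector<AxisLinks>&  links,
    const double lo[3], const double hi[3])
{
    std::size_t first[3], last[3];
    for (int a = 0; a < 3; ++a) {
        first[a] = std::lower_bound(axisKey[a].begin(), axisKey[a].end(), lo[a]) - axisKey[a].begin();
        last[a]  = std::upper_bound(axisKey[a].begin(), axisKey[a].end(), hi[a]) - axisKey[a].begin();
    }

    // 1. axis with the smallest number of objects between its bounds
    int best = 0;
    for (int a = 1; a < 3; ++a)
        if (last[a] - first[a] < last[best] - first[best]) best = a;

    // 2. for each object in that range, check the other two axes via its back-links
    std::vector<std::size_t> hits;
    for (std::size_t i = first[best]; i < last[best]; ++i) {
        const std::size_t p = axisOrder[best][i];
        bool inside = true;
        for (int a = 0; a < 3; ++a)
            if (a != best && (links[p].idx[a] < first[a] || links[p].idx[a] >= last[a]))
                inside = false;
        if (inside) hits.push_back(p);
    }
    return hits;
}
```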

Sweep'n'prune is very fast.
...


Hmm, thanks for the explanation. Still, there is one thing I'm kinda unsure of. At the moment I do pretty much the same thing sweep and prune does, except I only consider one axis and don't build any data structures (I cannot assume temporal coherence, so no O(n) sorting for me; also, memory is an issue). Sweep and prune has to construct all the possible "candidates" along each axis at some point and then intersect those sets, right? So in case the configuration is degenerate along one axis (i.e. excessive bunching) I still end up with a potential n² iteration?
I guess I should have mentioned that this isn't for a game, it's for scientific particle simulations on clusters, so "collisions" aren't resolved. And there will always be many other particles within the "interaction radius" of each particle, so techniques that rely on the assumption that collisions are more the exception than the norm, or that the world is generally "sparsely populated", don't apply. Also, there isn't any expensive narrow-phase SAT or similar, only a distance condition and some interaction (gravitational/Coulomb force etc.).


So in case the configuration is degenerate along one axis (i.e. excessive bunching) I still end up with a potential n² iteration?

With snp you end up with n² iterations only if you have all particles inside your search volume (AABB); otherwise you take the axis with the fewest hits and start from there to limit it further. Assuming you have fewer than 2^32 particles, you get the following memory footprint:

array holding all unsorted particles (6-8 bytes for a pointer?)
3 axis arrays holding particle indices: 3*4 bytes
each particle holding its index into the three axis arrays: 3*4 bytes
that is 24 bytes per particle if you already have an indexed particle array,
else ~32 bytes per particle
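In code, the bookkeeping could look roughly like this (just a sketch of the layout I mean):

```cpp
#include <cstdint>
#include <vector>

// Per-particle back-links for sweep'n'prune with < 2^32 particles:
// where this particle currently sits in each of the three sorted axis arrays.
struct SnpLinks {
    std::uint32_t axisIndex[3];            // 3*4 bytes per particle
};

// Plus, per axis, a sorted array of particle indices:
// std::vector<std::uint32_t> sortedByAxis[3];   // another 3*4 bytes per particle
// => 24 bytes/particle on top of an already-indexed particle array.
```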



I guess I should have mentioned that this isn't for a game, it's for scientific particle simulations on clusters. [...] And there will always be many other particles within the "interaction radius" of each particle.

If m is the average number of particles within the influence range, you would end up with O(m*n*log n). In my opinion snp has quite a small memory footprint and is very fast (binary search on an array), even when sorting the array without benefiting from temporal coherence. But on the other hand, a scientific particle simulation has other requirements :wink:

it's for scientific particle simulations on clusters. [...] a distance condition and some interaction (gravitational/coulomb-force etc.).


Have you looked at the Fast Multipole Method and similar?

