
"Clumpiness" metric


japro    887
Hi,
I have a bunch of objects/particles and need to process every pair that is closer than some interaction radius (essentially a spherical collision problem). At the moment I sort the objects along some axis to cut down the n² iteration. I now want to determine the optimal axis (one of the coordinate axes, not an arbitrary one). Ideally I'd pick the one along which there is the least "clumping" of objects, but how do I measure that "clumpiness"?
At the moment I just calculate the variance along each axis and take the one with the highest value. But this has the disadvantage that two clumps that are far apart also produce high variance, so I'm looking for something better.
Furthermore, the method should be fast (preferably O(n), and at worst O(n log n)) since everything has to be pretty general purpose and I cannot assume temporal coherence.
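For illustration, the single-axis sort-and-sweep I'm doing now looks roughly like this (just a sketch, all names made up):

```python
def close_pairs_1d_sweep(points, radius, axis=0):
    """Find all pairs closer than `radius` by sorting along one axis
    and only comparing points whose separation on that axis is small."""
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    r2 = radius * radius
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            # Past this gap, no later particle can be within range on the axis.
            if points[j][axis] - points[i][axis] > radius:
                break
            if sum((p - q) ** 2 for p, q in zip(points[i], points[j])) <= r2:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```

The inner loop breaks as soon as the axis gap exceeds the radius, which is exactly why the choice of axis matters: on a "clumpy" axis the break rarely triggers and the loop degenerates toward n².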

Are there any known/clever methods for that? (googling "clumpiness metric" wasn't very useful :D)

Ashaman73    13715
[quote name='japro' timestamp='1310473178' post='4834261']
Hi,
I have a bunch of objects/particles and need to process every pair that is closer than some interaction radius (essentially a spherical collision problem). At the moment I sort the objects along some axis to cut down the n² iteration. I now want to determine the optimal axis (one of the coordinate axes, not an arbitrary one). Ideally I'd pick the one along which there is the least "clumping" of objects, but how do I measure that "clumpiness"?
At the moment I just calculate the variance along each axis and take the one with the highest value. But this has the disadvantage that two clumps that are far apart also produce high variance, so I'm looking for something better.
Furthermore, the method should be fast (preferably O(n), and at worst O(n log n)) since everything has to be pretty general purpose and I cannot assume temporal coherence.

Are there any known/clever methods for that? (googling "clumpiness metric" wasn't very useful :D)
[/quote]
Have you considered using the [url="http://en.wikipedia.org/wiki/Sweep_and_prune"]sweep'n'prune[/url] algorithm? There the objects are kept sorted on all three axes, so searching for an object takes O(log n) per axis, which sums up to O(n log n) over all objects.

japro    887
[quote name='Ashaman73' timestamp='1310476325' post='4834275']
Have you considered using the [url="http://en.wikipedia.org/wiki/Sweep_and_prune"]sweep'n'prune[/url] algorithm? There the objects are kept sorted on all three axes, so searching for an object takes O(log n) per axis, which sums up to O(n log n) over all objects.
[/quote]
To be honest, I don't really know how to implement that efficiently (also in terms of memory). The only concrete explanation I found filled an n×n matrix with boolean values (overlap or not), which might be fast but is unacceptable in my case, since the number of particles can be quite large and allocating that matrix is completely out of the question.
The most reasonable implementation I can think of would be:

1. Sort along each axis: O(n log n)
2. For each particle (n times the following):
a. binary search each index array to find the "neighborhood": O(log n + m), where m is the size of the neighborhood along that axis
b. find the intersection of the neighborhoods: O(m log m)

Is that how it is done?

alvaro    21247
If you bin your data into, say, 100 bins, you can compute a measure of clumpiness that is easy to understand: variance/mean of the counts per bin. When particles are distributed randomly over the range, the counts follow a Poisson distribution, and this measure has an expected value of 1. A lower value means the particles are distributed more uniformly than random; a higher value means they are more clumped in some places.
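A minimal sketch of that measure (the fixed bin range and edge clamping are my own choices):

```python
def clumpiness(values, lo, hi, bins=100):
    """Variance/mean of the per-bin counts: ~1 for uniformly random
    points (Poisson counts), below 1 for more-even-than-random,
    above 1 for clumped distributions."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # clamp v == hi
    mean = sum(counts) / bins
    var = sum((c - mean) ** 2 for c in counts) / bins
    return var / mean
```

For 100 particles piled into a single bin this returns 99; one particle per bin returns 0.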

Emergent    982
[EDIT: Now I see that you want something fast, so I'm on the sweep 'n' prune boat. My original post follows...]

What about Shannon entropy?

Examples:
Case A.) 100 particles located all at the same position would have 0 bits entropy.
Case B.) If you put 50 particles at one position and 50 particles at another, you'll get 1 bit entropy -- not much more than Case A.
Case C.) If you put 100 particles uniformly distributed along an interval such that they have the same variance as case B, you'll get log2(100) ~= 7 bits entropy (well, sort of; this is given assumptions we won't actually want; the point is it's more than case B.)

However, the word "entropy" by itself isn't an answer, because you're working with points in a continuous space, not symbols from a finite set, and it matters how close different points are. Here are a few ways to deal with that:
1.) bin your points and use histograms, much like alvaro's suggestion, after which you'd just compute the discrete entropy.
2.) choose a smoothing kernel (e.g., Gaussian), and then compute the differential entropy of the distribution.
3.) An unclear idea: I think rate-distortion theory has some principled ways to deal with this.
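Option #1 is only a few lines once you have the binning (a toy sketch; the bin count and range handling are arbitrary):

```python
import math

def histogram_entropy_bits(values, lo, hi, bins=100):
    """Discrete Shannon entropy (in bits) of the binned point set:
    bin the points, then H = -sum(p * log2(p)) over non-empty bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)
```

With the Case A and Case B inputs above, this gives 0 bits and 1 bit respectively.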

I think method #2 is fairly pretty, but it'll take some thought. After a quick google, it looks like you'd need to turn to approximation algorithms to compute the entropy of a Gaussian mixture, which seems annoying. Maybe you can just throw a simple Monte Carlo method at the differential entropy integral...

Of course, the things I'm suggesting are expensive, so if this isn't a precomputation then you'd probably be advised to do something simpler. [EDIT: Actually, I see you are looking for something fast...] So, going back to your original problem, something like sweep 'n prune seems better.

Ashaman73    13715
Sweep'n' prune is very fast.

[quote name='japro' timestamp='1310478927' post='4834294']
1. Sort along each axis (n log(n))
[/quote]
The basic idea is that your particles move relatively small distances per frame, which means you only need to "re-sort" nearly sorted axis arrays. Done with insertion sort, this is O(n).

[quote name='japro' timestamp='1310478927' post='4834294']
2. for each particle: (n* the following)
a. binary search on each index array and find the "neighborhood" (log(n)+m, where m is the size of the neighborhood along that axis)
[/quote]
The other way around: you only want to consider other particles that are inside a certain bounding box (AABB). In that case you need to find the upper and lower bound of your AABB on each axis, which is O(6 log n) = O(log n).

[quote name='japro' timestamp='1310478927' post='4834294']
b. find the intersection of the neighborhoods (m*log(m))
[/quote]
You interlink your axis arrays and your objects. That is, every object knows its position in all three axis arrays (updated at sorting/insertion/deletion time).
After doing the AABB bound binary search on all three axes, it is quite easy to get the intersection:

[code]
1. Pick the axis with the smallest number of objects between the lower and upper bound.
2. For each object in this axis range:
       check whether its index in each other axis array lies inside the
       lower/upper bound of that axis.
[/code]
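A toy version of the whole query, with the per-object ranks stored as described (all names invented, nothing optimized):

```python
import bisect

def aabb_query(points, lows, highs):
    """Keep three sorted axis arrays, give every object its rank on each
    axis, then scan the narrowest axis range and keep objects whose
    other two ranks also fall inside the corresponding bounds."""
    n = len(points)
    axes = []                           # per axis: (sorted coords, object ids)
    rank = [[0] * 3 for _ in range(n)]  # rank[i][d]: position of object i on axis d
    for d in range(3):
        order = sorted(range(n), key=lambda i: points[i][d])
        for r, i in enumerate(order):
            rank[i][d] = r
        axes.append(([points[i][d] for i in order], order))
    # Binary-search the AABB bounds on each axis: O(log n) per axis.
    bounds = [(bisect.bisect_left(axes[d][0], lows[d]),
               bisect.bisect_right(axes[d][0], highs[d])) for d in range(3)]
    # Scan the axis with the fewest candidates between its bounds.
    best = min(range(3), key=lambda d: bounds[d][1] - bounds[d][0])
    lo, hi = bounds[best]
    return [i for i in axes[best][1][lo:hi]
            if all(bounds[d][0] <= rank[i][d] < bounds[d][1] for d in range(3))]
```

In a real implementation the sorted arrays and ranks would of course persist between frames and be maintained incrementally rather than rebuilt per query.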

japro    887
[quote name='Ashaman73' timestamp='1310539978' post='4834645']
Sweep'n' prune is very fast.
...
[/quote]

Hmm, thanks for the explanation. Still, there is one thing I'm unsure about. At the moment I do pretty much the same thing sweep and prune does, except I only consider one axis and don't build any data structures (I cannot assume temporal coherence, so no O(n) sorting for me; memory is also an issue). Sweep and prune has to construct all the possible "candidates" along each axis at some point and then intersect those sets, right? So in case the configuration is degenerate along one axis (i.e. excessive bunching), I still end up with a potential n² iteration?
I guess I should have mentioned that this isn't for a game; it's for scientific particle simulations on clusters. So "collisions" aren't resolved. And there will always be many other particles within the interaction radius of each particle, so techniques that rely on the assumption that collisions are the exception rather than the norm, or that the world is generally sparsely populated, don't apply. Also, there isn't any expensive narrow phase (SAT or similar), only a distance condition and some interaction (gravitational/Coulomb force etc.).

Ashaman73    13715
[quote name='japro' timestamp='1310546048' post='4834686']
Hmm, thanks for the explanation. Still, there is one thing I'm unsure about. At the moment I do pretty much the same thing sweep and prune does, except I only consider one axis and don't build any data structures (I cannot assume temporal coherence, so no O(n) sorting for me; memory is also an issue). Sweep and prune has to construct all the possible "candidates" along each axis at some point and then intersect those sets, right? So in case the configuration is degenerate along one axis (i.e. excessive bunching), I still end up with a potential n² iteration?
[/quote]
With snp you only end up with n² iterations if all the particles are inside your search volume (AABB); otherwise you take the axis with the fewest hits and start from there to limit the candidates further. Assuming you have fewer than 2^32 particles, you get the following memory footprint:
[code]
array holding all unsorted particles: 6-8 bytes per pointer (?)
3 axis arrays holding particle indices: 3*4 bytes per particle
each particle holding its index in the 3 axis arrays: 3*4 bytes
=> 24 bytes per particle if you already have an indexed particle array,
   else ~32 bytes per particle
[/code]

[quote name='japro' timestamp='1310546048' post='4834686']
I guess I should have mentioned that this isn't for a game, it's for scientific particle simulations on clusters. So "collision" aren't resolved. And there will always be many other particles on the "interaction radius" of each particle, so techniques that rely on the assumptions that collisions are more the exception than the norm or that the world is generally "sparsely populated" don't apply. Also, there isn't any expensive narrow phase SAT or similar, only a distance condition and some interaction (gravitational/coulomb-force etc.).
[/quote]
If m is the average number of particles in the influence range, you end up with O(m*n*log n). In my opinion snp has quite a small memory footprint and it is very fast (binary search on arrays), even when sorting the arrays without benefiting from temporal coherence. But on the other hand, a scientific particle simulation has other requirements :wink:

Emergent    982
[quote name='japro' timestamp='1310546048' post='4834686']
it's for scientific particle simulations on clusters. [...] a distance condition and some interaction (gravitational/coulomb-force etc.).
[/quote]

Have you looked at the Fast Multipole Method and similar?

japro    887
[quote name='Emergent' timestamp='1310584565' post='4834953']
[quote name='japro' timestamp='1310546048' post='4834686']
it's for scientific particle simulations on clusters. [...] a distance condition and some interaction (gravitational/coulomb-force etc.).
[/quote]

Have you looked at the Fast Multipole Method and similar?
[/quote]
Kinda. It's an existing framework that right now mostly uses particle-in-cell methods, and my job at the moment is to implement a P^3M (particle-particle particle-mesh) solver, where the long-range interactions are calculated by scattering the density onto a mesh and solving Poisson's equation with an FFT method, while the short-range interactions are calculated directly with an n² iteration.

Emergent    982
[quote name='japro' timestamp='1310590104' post='4834990']
Kinda. It's an existing framework that right now mostly uses particle-in-cell methods, and my job at the moment is to implement a P^3M (particle-particle particle-mesh) solver, where the long-range interactions are calculated by scattering the density onto a mesh and solving Poisson's equation with an FFT method, while the short-range interactions are calculated directly with an n² iteration.
[/quote]

Oh, cool. Googling quickly... something like [url="http://hipacc.ucsc.edu/html/HIPACCLectures/lecture_gravity.pdf"]this[/url] ? Very interesting; I'll be looking at this some more...

