Selecting evenly distributed samples in a 10-dimensional space?

Hi all,

I'm sure this shouldn't be too hard, but for whatever reason my brain just isn't clicking today [headshake]

I have a 10-component vector where each element is a floating-point value in ±X (X could be 1.0, 3.0, 10.0, 500.0, etc.). I want to pick N evenly distributed samples as a seed for a searching algorithm, the idea being that if they're evenly distributed then I should have a fairly unbiased search that can cover most (or all) of the search space. In context, I think my current randomly sampled approach introduces a bias if/when it clusters the initial samples too close together.

Anyone got any ideas?

Cheers,
Jack

--
Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Assume rand() is a function that generates uniform random numbers between zero and one. You want those numbers to fit into the range [-X,X], so use X*(1.0 - 2.0*rand()). Do that for each element in your array, using the appropriate X for each element.
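
Something like this, as a minimal sketch (using a Mersenne Twister from C++'s <random> rather than rand(); the per-component bounds and the seed are just example values):

#include <random>

int main()
{
    std::mt19937 rng(12345);                                 // seeded Mersenne Twister
    std::uniform_real_distribution<double> uniform01(0.0, 1.0);

    // Example bounds only -- substitute your real per-component X values.
    const double X[10] = { 1.0, 3.0, 10.0, 500.0, 1.0, 2.0, 5.0, 1.0, 1.0, 1.0 };
    double sample[10];

    for (int i = 0; i < 10; ++i)
        sample[i] = X[i] * (1.0 - 2.0 * uniform01(rng));     // maps [0,1) onto (-X, X]
    return 0;
}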

--www.physicaluncertainty.com
--linkedin
--irc.freenode.net#gdnet

Quote:Original post by jjd
Assume rand() is a function that generates uniform random numbers between zero and one. You want those numbers to fit into the range [-X,X], so use X*(1.0 - 2.0*rand()). Do that for each element in your array, using the appropriate X for each element.
That's pretty much what I have already. I'm using a Mersenne Twister as my PRNG, so it should be uniformly distributed.

As an example of what I'm thinking...

If you had a two-dimensional search space and wanted 9 evenly distributed samples then you could easily work that out as being a 3x3 grid:

#-------#
| + + + |
| + + + |
| + + + |
#-------#


It's expanding that concept to N samples over a 10D space [smile]

Cheers,
Jack

--
Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Quote:Original post by jollyjeffers
Quote:Original post by jjd
Assume rand() is a function that generates uniform random numbers between zero and one. You want those numbers to fit into the range [-X,X], so use X*(1.0 - 2.0*rand()). Do that for each element in your array, using the appropriate X for each element.
That's pretty much what I have already. I'm using a Mersenne Twister as my PRNG, so it should be uniformly distributed.

As an example of what I'm thinking...

If you had a two-dimensional search space and wanted 9 evenly distributed samples then you could easily work that out as being a 3x3 grid:

#-------#
| + + + |
| + + + |
| + + + |
#-------#


It's expanding that concept to N samples over a 10D space [smile]

Cheers,
Jack



Ah, I see. Well, that's trickier [wink]

Perhaps you could use a Sobol sequence.
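
Proper Sobol direction numbers are a bit fiddly to reproduce here, but as an untested sketch of the same low-discrepancy idea, a Halton sequence (a simpler relative of Sobol) only needs a radical-inverse function and one prime base per dimension:

#include <cstdio>

// Radical inverse: reflect the base-b digits of index about the radix point.
double radicalInverse(unsigned index, unsigned base)
{
    double result = 0.0;
    double f = 1.0 / base;
    while (index > 0)
    {
        result += f * (index % base);
        index /= base;
        f /= base;
    }
    return result;   // in [0,1)
}

int main()
{
    const unsigned primes[10] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };  // one base per dimension

    for (unsigned n = 1; n <= 5; ++n)        // first few samples (index 0 is all zeros)
    {
        for (int d = 0; d < 10; ++d)
            std::printf("%.4f ", radicalInverse(n, primes[d]));   // in [0,1); rescale to [-X,X] as before
        std::printf("\n");
    }
    return 0;
}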

--www.physicaluncertainty.com
--linkedin
--irc.freenode.net#gdnet

Quote:Original post by jjd
Ah, I see. Well, that's trickier [wink]
[headshake] Why doesn't that surprise me!

Although, based on your first reply, would it be a reasonable assertion that good-quality uniformly distributed numbers would give me the pattern I desire?

I'm just wondering if using a Sobol sequence (or whatever else) might actually be a whole lot of effort for minimal reward...

Cheers,
Jack

--
Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

A Sobol sequence would be sexy... [smile] but my first port of call would be the simple random search. Having said that, you might be able to improve on a simple random search like that by using adaptive Monte Carlo or Markov-chain approaches. They're a bit more complicated, but not much. However, it depends upon whether you can get some kind of likelihood function to assist in the search, i.e. some measure of whether the thing you are searching for is likely to be closer or further away.
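
For instance, a bare-bones random-walk Metropolis step might look like the sketch below; the likelihood function here is a made-up placeholder (a Gaussian bump), since we don't know yet what's actually being searched:

#include <cmath>
#include <random>

const int DIM = 10;

// Placeholder likelihood only -- swap in whatever "closer/further"
// measure the real search provides.
double likelihood(const double x[DIM])
{
    double sq = 0.0;
    for (int i = 0; i < DIM; ++i)
        sq += x[i] * x[i];
    return std::exp(-0.5 * sq);
}

void metropolisStep(double x[DIM], double stepSize, std::mt19937& rng)
{
    std::normal_distribution<double> gauss(0.0, stepSize);
    std::uniform_real_distribution<double> uniform01(0.0, 1.0);

    double proposal[DIM];
    for (int i = 0; i < DIM; ++i)
        proposal[i] = x[i] + gauss(rng);       // random-walk proposal

    // Accept with probability min(1, L(proposal)/L(current));
    // otherwise the walker stays put.
    if (uniform01(rng) < likelihood(proposal) / likelihood(x))
        for (int i = 0; i < DIM; ++i)
            x[i] = proposal[i];
}

int main()
{
    std::mt19937 rng(42);
    double x[DIM] = { 0.0 };
    for (int step = 0; step < 1000; ++step)
        metropolisStep(x, 0.5, rng);
    return 0;
}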

What are you trying to compute?

--www.physicaluncertainty.com
--linkedin
--irc.freenode.net#gdnet

Quote:Original post by jollyjeffers
If you had a two-dimensional search space and wanted 9 evenly distributed samples then you could easily work that out as being a 3x3 grid:

#-------#
| + + + |
| + + + |
| + + + |
#-------#


It's expanding that concept to N samples over a 10D space [smile]


That concept in reverse is like saying "I want 3 samples on each axis, filling a 2-dimensional space". The number of samples you need is 3^2 = 9.

If you have a 10-dimensional space and you want 3 samples on each axis, you'll need 3^10 = 59049 samples in total.

Going the other way, from a number of samples to where to place them, is obviously much harder, but the difficulty does not lie in the fact that you are dealing with a space of greater dimensionality. What would you do if, to copy your earlier example, "you had a two-dimensional search space and wanted 7 evenly distributed samples"?

Put another way, is there any flexibility in the number of samples? If so, you could keep a number of "patterns" and choose the one with a number of samples closest to that requested. If you want a solution that will work for any arbitrary number of samples, I believe the problem is too general - your best bet will almost certainly be sticking to random samples.
Quote:Original post by jollyjeffers
In context, I think my current randomly sampled approach introduces a bias if/when it clusters the initial samples too close together.


If you are sampling an area for surface normal construction data, one method is to use a sphere of radius r in R^n as the guide for your sample point distribution.

The most basic sample would be a cross-shaped grid in R^n (the same as used for linear interpolation texture sampling).

Sample points in R^n (based at origin):

( 0,  0,  0,  0) // centre point
( 1,  0,  0,  0) // R1
(-1,  0,  0,  0)
( 0,  1,  0,  0) // R2
( 0, -1,  0,  0)
( 0,  0,  1,  0) // R3
( 0,  0, -1,  0)
( 0,  0,  0,  1) // R4
( 0,  0,  0, -1)
...
(0, 0, 0, 0, 0, 0, 0, 0, 0,  1) // R10
(0, 0, 0, 0, 0, 0, 0, 0, 0, -1)
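
Generating that cross pattern for arbitrary n is straightforward: the origin plus a pair of points at +/- r along each axis, 2n + 1 points in total. A sketch (crossSamples is just an illustrative name):

#include <vector>

std::vector< std::vector<double> > crossSamples(int n, double r)
{
    std::vector< std::vector<double> > points;
    points.push_back(std::vector<double>(n, 0.0));   // centre point
    for (int axis = 0; axis < n; ++axis)
    {
        std::vector<double> p(n, 0.0);
        p[axis] =  r;  points.push_back(p);          // +r along this axis
        p[axis] = -r;  points.push_back(p);          // -r along this axis
    }
    return points;
}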


Uniformly distributing points along a sphere's surface / inside a sphere's volume is a sticky problem that involves stochastic construction methods.

An intermediate step between the cross structure and the stochastic methods would be to use the vertices of a unit icosahedron in R^3, radius scaled to r. The distribution is perfect.

To take the icosahedron from a 2D surface in R^3 to a 3D space in R^3, one can make a duplicate set of vertices and scale them to, say, r/2. Any number of layers (hulls) can be used.

Linear distribution of the hull radii isn't necessary either; a weighting function can be used. For r <= 1.0 and 0.0 < scale_coeff <= 1.0:

f(r, scale_coeff) = r * sqrt(scale_coeff)
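
In code that weighting is a one-liner; a sketch assuming K hulls with scale_coeff = k/K (hullRadius is just an illustrative name):

#include <cmath>

// f(r, scale_coeff) = r * sqrt(scale_coeff), for 0.0 < scale_coeff <= 1.0
double hullRadius(double r, double scale_coeff)
{
    return r * std::sqrt(scale_coeff);
}

// e.g. with K hulls, scale the k-th vertex set by hullRadius(r, double(k) / K)
// for k = 1 .. K; the sqrt pushes hulls outward relative to linear spacing.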

Quote:Original post by bakery2k1
...the dificulty does not lie in the fact that you are dealing with a space of greater dimensionality.


Conceptually I would agree, but performance-wise it is better to use random sampling than creating a grid-like pattern. This is the so-called "curse of dimensionality."

--www.physicaluncertainty.com
--linkedin
--irc.freenode.net#gdnet

Perhaps I'm missing something, but doesn't selecting N evenly distributed samples in a 10D space mean that

n := ⌊N^(1/10)⌋

samples per dimension are available if all dimensions have the same range and N is an upper bound, or

n := ⌈N^(1/10)⌉

if N is a lower bound?

Thinking of each sample as "covering" the same area, the range R of each dimension is divided into n slices, so that each sample sits at the centre of its slice:

s_i := (0.5 + i) * R / n,  0 <= i < n

(That would become interesting if the ranges of each dimension are not the same...)
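
As a sketch, enumerating all n^d slice-centre samples just means counting in base n (with the caveat that d = 10, n = 3 is already 59049 points); per-dimension ranges would only need R to become an array. centredGrid is just an illustrative name:

#include <vector>

std::vector< std::vector<double> > centredGrid(int d, int n, double R)
{
    std::vector< std::vector<double> > points;
    std::vector<int> idx(d, 0);                    // a base-n counter, one digit per dimension

    for (;;)
    {
        std::vector<double> p(d);
        for (int k = 0; k < d; ++k)
            p[k] = (0.5 + idx[k]) * R / n;         // s_i := (0.5 + i) * R / n
        points.push_back(p);

        int k = 0;                                 // increment the counter
        while (k < d && ++idx[k] == n) { idx[k] = 0; ++k; }
        if (k == d) break;                         // wrapped around: all n^d points emitted
    }
    return points;
}

// Samples fall inside [0, R]; shift by the range's lower bound (e.g. -X) as needed.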

Quote:Original post by jjd
Conceptually I would agree, but performance-wise it is better to use random sampling than creating a grid-like pattern. This is the so-called "curse of dimensionality."

I've understood the OP as asking for a grid because of:
Quote:Original post by jollyjeffers
The idea being that if they're evenly distributed then I should have a fairly unbiased search that can cover most (or all) of the search space.

In context, I think my current randomly sampled approach introduces a bias if/when it clusters the initial samples too close together.

