• 15
• 15
• 11
• 9
• 10

kd-tree building

This topic is 4589 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

I was just wondering if anyone has any ideas for improving the performance of my kd-tree building method. I think I may be able to avoid sorting every time, but I'm not sure: Node* buildkdtree(vector<Point> P, int depth) { if (P.size()==0) return NULL; if (P.size()==1) return new Node(P[0]); Point L; vector<Point> P1, P2; if (depth%2 == 0) { // sort by x sortmethod=0; sort(P.begin(), P.end()); int n = P.size(); int medianindex = (n-1)/2; L = P[medianindex]; vector<Point>::iterator i = P.begin(); while (i != P.end()) { Point curp = *i++; if (curp <= L) P1.push_back(curp); else P2.push_back(curp); } } else { // sort by y sortmethod=1; sort(P.begin(), P.end()); int n = P.size(); int medianindex = (n-1)/2; L = P[medianindex]; vector<Point>::iterator i = P.begin(); while (i != P.end()) { Point curp = *i++; if (curp <= L) P1.push_back(curp); else P2.push_back(curp); } } Node *v = new Node(L); v->left = buildkdtree(P1, depth+1); v->right = buildkdtree(P2, depth+1); return v; } Here are some related classes: int sortmethod = 0; // sort by x struct Point { float x, y; bool operator<(Point &p2) { if (sortmethod == 0) { if (x != p2.x) return (x < p2.x); else return (y < p2.y); } else { if (y != p2.y) return (y < p2.y); else return (x < p2.x); } } bool operator==(Point &p2) { return (x == p2.x && y == p2.y); } bool operator<=(Point &p2) { return ((*this < p2) || (*this == p2)); } Point(float x, float y) { this->x = x; this->y = y; } Point() { } }; mike http://www.coolgroups.com/

Share on other sites
Actually, here's a less convoluted building method:

Node* buildkdtree(vector<Point> P, int depth)
{
if (P.size()==0) return NULL;

if (P.size()==1) return new Node(P[0]);

Point L;
vector<Point> P1, P2;

sortmethod=depth%2;
sort(P.begin(), P.end());
int n = P.size();
int medianindex = (n-1)/2;
L = P[medianindex];

vector<Point>::iterator i = P.begin();
while (i != P.end())
{
Point curp = *i++;
if (curp <= L) P1.push_back(curp);
else P2.push_back(curp);
}

Node *v = new Node(L);
v->left = buildkdtree(P1, depth+1);
v->right = buildkdtree(P2, depth+1);

return v;
}

mike
http://www.coolgroups.com/

Share on other sites
Short answer: no, I don't think you can avoid sorting every time, since presumably the positions of the points on each axis are uncorrelated. One thing though - I really wouldn't use a global variable to control sorting direction (that's what it looks like you're doing). I'd do this:

if (depth%2 == 0) std::sort(P.begin(), P.end(), PointSortX());else std::sort(P.begin(), P.end(), PointSortY());

with:

struct PointSortX{ bool operator () (Point& p1, Point &p2) {  return p1.x < p2.x; }};struct PointSortY{ bool operator () (Point& p1, Point &p2) {  return p1.y < p2.y }};

Share on other sites
You can sprt once for each axis at the beginning and keep, instead of one sorted list period, one sorted list for each axis. Each primitive must contain its index in the lists. That will allow you, if you are careful when you perfomr the split, to maintain sorted lists at eah level so as to avoid sorting each time.

Moreover, median split is NOT the best split method ; it's actually far from it. Read the "balancing considered harmful" paper on CiteSeer. It's from Ingo wal I believe.

Share on other sites
Thanks for the tips. BTW, janos, can you give me some quick intuition behind why splitting by the median isn't so good? Intuitively, it seems like the best way to me.

mike
http://www.coolgroups.com/

Share on other sites
well, it really depends what you want to do with a kd tree. but if you want to find close by neighbors, or if you want to raytrace, then median split is bad. here's why :
imagine you plan to sample your scene (by sending rays, for example), you are going to have to travel through your kdtree. You are going to visit mostly empty space, and in the end, maybe hit a surface. So if your empty space is consolidated (eg : you have empty nodes close to the root of the tree), then travelling through that empty space is going to be fast.

Now imagine you have a median split algorithm that you run on a dataset composed of two polygonal spheres (the more polygons the worse:) far away from each other.
Now draw what your algorithm will do. And look at what happens with all that empty space between the spheres : it keeps getting split. you end up with leaves that can be very long, with just a few primitives on one side. when you hit those boxes, you are going to have to test all the primitives in the box.

I can make you a drawing if it still does not sound clear.

J

Share on other sites
Okay. Also, can you explain a little more about sorting by x and y in the beginning? I realize this will theoretically work, but there seem to be a lot of details that complicate things.

For instance, let's say you're doing the first step, which is dividing the list based on x coordinates. How exactly do you divide it into two lists sorted by y coordinates? One way might be to delete the unused half from the sorted y list element by element, but is this the best way?

mike
http://www.coolgroups.com/

Share on other sites
Try this:
use D sorted lists, one for each coord (or dimension). : Lx, Ly, Lz, L...
Each list will be filled by your N primitives.
Create a storage that maps (somehow, there are many ways) each primitive with a integer vector of dimension D, that has, for every coordinate, the position in the list associated with the coordinate.
eg, for your ith primitive, you have a vector Vi(Px, Py, Pz, ...)
where Px is the position if the primitive i in the list Lx and so forth.

now consider you splitting recursive function. it will take, among other, two params, that will be startPrimitiveIndex and primitiveCount.

At the first level, you will call

RecursiveSplit(0, PrimitiveCount, [Lx, Ly, Lz])

The function will look like
RecursiveSplit(startPrimitiveIndex, primitiveCount, sortedListArray){   if (primitiveCount = 0) return;   (splitIndex, splitDimension) = FindSplitIndex(startPrimitiveIndex, primitiveCount)   Ls = sortedListArray[splitDimension]   for (d = 0; d < DIMENSION_COUNT; ++d)   {      if (d != splitDimension)       {         L[d].StablePartition(startPrimitiveIndex, splitIndex, primitiveCount}         // this function will perfom the following, it will spplit L[d] such that the left part of L[d] is sorted and matches the indices startPrimitiveIndex..splitIndex for splitDimension, and the right part is also sorted, and matches the indices splitIndex+1 .. startPrimitiveIndex+primitiveCount-1      }   }   RecursiveSplit(startPrimitiveIndex, splitIndex-startPrimitiveIndex, sortedListArray)   RecursiveSplit(splitIndex, startPrimitiveIndex+primitiveCount-splitIndex, sortedListArray)}

Stable Partition is O(n), as you will find out

Share on other sites
The whole trick is to maintain at one recursion level,
for the current range [startPrimitiveIndex, startPrimitiveIndex+PrimitiveCount-1]
the D lists sorted, and pointing toward the same subset of elements.
That will yield a O(PrimCount*ln(PrimCount)) algo instead of a O(PrimCount*ln(PrimCount)^2)

Share on other sites
Ok, I made the changes, and it seems to work. It's a little faster, but not drastically.

Can you take a look and see if I did it right? If you have any other speed improvements, I'd like to hear them.

Thanks.

Node* buildkdtree2main(const vector<Point> &P)
{
if (P.size()==0) return NULL;
if (P.size()==1) return new Node(P[0]);
vector<Point> xsorted(P);
sort(xsorted.begin(), xsorted.end(), PointSortX());
vector<Point> ysorted(P);
sort(ysorted.begin(), ysorted.end(), PointSortY());
return buildkdtree2(xsorted, ysorted, 0);
}

// depth is 0 for an x level
// depth is 1 for a y level
Node* buildkdtree2(vector<Point> &xsorted, vector<Point> &ysorted, int depth)
{
if (xsorted.size()==0) return NULL;
if (xsorted.size()==1) return new Node(xsorted[0]);

Point L;

int n = xsorted.size();
int medianindex = (n-1)/2;

if (!depth) {
L = xsorted[medianindex];
vector<Point> P1newxsorted(xsorted.begin(), xsorted.begin()+medianindex+1);
vector<Point> P2newxsorted(xsorted.begin()+medianindex+1, xsorted.end());
vector<Point> P1newysorted, P2newysorted;
vector<Point>::iterator i = ysorted.begin();
while (i != ysorted.end())
{
Point p = *i++;
if (lessthanx(p, L) || p == L) P1newysorted.push_back(p);
else P2newysorted.push_back(p);
}

Node *v = new Node(L);
v->left = buildkdtree2(P1newxsorted, P1newysorted, !depth);
v->right = buildkdtree2(P2newxsorted, P2newysorted, !depth);
return v;
}

else {
L = ysorted[medianindex];

vector<Point> P1newysorted(ysorted.begin(), ysorted.begin()+medianindex+1);
vector<Point> P2newysorted(ysorted.begin()+medianindex+1, ysorted.end());
vector<Point> P1newxsorted, P2newxsorted;

vector<Point>::iterator i = xsorted.begin();
while (i != xsorted.end())
{
Point p = *i++;
if (lessthany(p, L) || p == L) P1newxsorted.push_back(p);
else P2newxsorted.push_back(p);
}
Node *v = new Node(L);
v->left = buildkdtree2(P1newxsorted, P1newysorted, !depth);
v->right = buildkdtree2(P2newxsorted, P2newysorted, !depth);
return v;
}
}

mike
http://www.coolgroups.com/