Fast & Efficient Way to Find Closest Pnt

Started by
14 comments, last by thatguyfromthething 14 years, 2 months ago
Quote:Original post by thatguyfromthething

For a 2d object this is the natural way to store things anyway, in an array/bitmap/array of arrays/whatever. Like say if you have a bar graph (maybe for a graphing app). It's even kind of hard to wind up with something else to start with, or so I'd think.


You're describing the problem of 'Picking'. There are other, even hardware-supported solutions to that.
Advertisement
It's not clear whether he is. He's either describing rasterizing just the points to a grid, in which case the worst-case performance for finding the nearest point is O(n^2) in the size of the grid, or he's suggesting rasterizing the entire voronoi tessellation, which would be O(n^2) to build the grid. And, of course, either method would either take O(n^2) space to build, or use sparse structures which would make lookup performance even worse.

Either way, a bad idea overdefended. kD-trees have better performance in almost any way you'd care to measure, take less space, and there's several good premade implementations out there.
There are many things with the original code that can be optimised a lot. First thing to do though is to pass the vector by const-reference to avoid the unnecessary and expensive copy-construction of the vector.
Next, one can compare the distances squared instead of just the distances. You can do an early-out for one of the axes. I.e. if it's too far away horizontally, don't bother with the vertical distance.
We also don't need the range checking provided by .at() here because we're specifically stepping over exactly the valid range for the vector.
Next, reduce the work done each time we find a point closer than the last one. Instead of dereferencing a pointer twice over two lines of code, I just remember the pointer.
I also threw in a little const usage for good measure.

Here is the optimised result:
POINT findNearestPnt(const vector <POINT*> &v, int mouse_x, int mouse_y) {    // Post: Finds & returns the nearest point in vector v to the mouse position    // This algorithm more efficiently searches every element of v to find the closest point    POINT nearestPoint;    const POINT *nearestPointP = &nearestPoint;    int distSqr = INT_MAX;    for (size_t i = 0; i < v.size(); i++) {        const POINT* tempP = v;        int xSqr = (mouse_x - tempP->x) * (mouse_x - tempP->x);        if (xSqr > distSqr)            continue;        int ySqr = (mouse_y - tempP->y) * (mouse_y - tempP->y);        int myDistSqr = xSqr + ySqr;        if (myDistSqr < distSqr) {            distSqr = myDistSqr;            nearestPointP = tempP;        }             }    return *nearestPointP;  // return closest point to mouse position}

Note that I'm expecting the compiler to perform comon sub-expression elimination for the xSqr and ySqr calculations to avoid the double-dereferences. If not, that can be taken care of manually.
I hope POINT has an appropriate default constructor because otherwise this function (your version or mine) returns garbage when the vector is empty.
Sure it's still O(n), but this version should be significantly faster than your original one. Try it first before making other more drastic changes.

I do have to question why you store pointers to points in the vector instead of just the points themselves though.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by iMalc
Next, reduce the work done each time we find a point closer than the last one. Instead of dereferencing a pointer twice over two lines of code, I just remember the pointer.


That's a little awkward. But we can make it make sense interface-wise by returning the actual point instead of a copy of its value. Or, more generally, a handle for it.

Quote:Note that I'm expecting the compiler to perform comon sub-expression elimination for the xSqr and ySqr calculations to avoid the double-dereferences. If not, that can be taken care of manually.


Then you might as well write a function for the ugly part and expect it to be inlined. :) I'm doubtful that complicating the logic in order to test one dimension separately is a net win, for that matter. After all, there's an extra branch that way and I can't imagine it being terribly well-predicted. (Of course, the existing branch isn't terribly well-predicted either :) So maybe it wouldn't matter. I don't know.)

Quote:I hope POINT has an appropriate default constructor because otherwise this function (your version or mine) returns garbage when the vector is empty.


The version I propose below returns something unusable that has to be checked for. That's better IMO than replacing garbage with a "valid" but wrong value (i.e. a point at the origin). Of course, the calling code should check for this anyway.

Quote:I do have to question why you store pointers to points in the vector instead of just the points themselves though.


I expect this to make a bigger difference than many of the other changes. It costs memory, too, to store the actual pointers in addition to the POINTs.

The only reason you'd want to set up a vector of pointers like this, to something as simple as a POINT structure, would be for some kind of "index buffer" technique, where the points are potentially shared or repeated. But in that case, the points still have to be stored somewhere, and you should be searching that storage (since it doesn't have the duplicates).

Accordingly, I wrote the example assuming that this is fixed. ;)

typedef vector<POINT> points_t;int squaredDistanceBetween(const Point& a, const Point& b) {  int dx = a.x - b.x;  int dy = a.y - b.y;  return dx*dx + dy*dy;}// "mouse_" doesn't belong in parameter names here, because the function// doesn't care that the point in question came from a mouse.points_t::iterator nearestPointTo(const points_t& v, int x, int y) {  return nearestPointTo(v, POINT(x, y));}points_t::iterator nearestPointTo(const points_t& v, const Point& p) {  points_t::iterator result = v.begin(), end = v.end();  if (result == end) { return result; }  int bestSquaredDistance = squaredDistanceBetween(*result, p);  for (points_t::iterator it = ++result, end = v.end(); it != end; ++it) {    int squaredDistance = squaredDistanceBetween(*it, p);    if (squaredDistance < bestSquaredDistance) {      bestSquaredDistance = squaredDistance;      result = it;    }  }  return result;}


(And before you ask why I don't get really fancy and use std::min_element: the custom comparator supplied for that would end up evaluating the squared-distance twice for (almost) each Point.)
Quote:Original post by Zahlman
Quote:Original post by iMalc
Next, reduce the work done each time we find a point closer than the last one. Instead of dereferencing a pointer twice over two lines of code, I just remember the pointer.


That's a little awkward. But we can make it make sense interface-wise by returning the actual point instead of a copy of its value. Or, more generally, a handle for it.
If this were a vector of POINTs rather than pointers to them then I would have stored the iterator as you have. By remembering the pointed to value instead, you can see that I was merely saving having to re-obtain the pointer again.

Quote:
Quote:Note that I'm expecting the compiler to perform comon sub-expression elimination for the xSqr and ySqr calculations to avoid the double-dereferences. If not, that can be taken care of manually.


Then you might as well write a function for the ugly part and expect it to be inlined. :) I'm doubtful that complicating the logic in order to test one dimension separately is a net win, for that matter. After all, there's an extra branch that way and I can't imagine it being terribly well-predicted. (Of course, the existing branch isn't terribly well-predicted either :) So maybe it wouldn't matter. I don't know.)
I really should have written a 'square' function for it. I was just being lazy I guess.

I'm actually reasonably confident that the early-out test for one axis only would give a net win, but only because I believe the early-out would be taken much more often than not. I don't have his point data sets so I can't see how likely it is of course.
Think of a square filled with randomly generated points in it from a uniform distribution, and our mouse click is in the centre. It usually wouldn't take many iterations over the randomly ordered points before we find a point that is close-ish to the centre. Say within 10% of the width of the square from the mouse click after looking at around 5 points.
Draw two vertical lines down at say 40% and 60% of the way across the square, and 80% of the points are now outside of these lines, and rejected early. Of course this is all guessing here, but my point is that I do believe that such an early-out test is still usually a win and I would be somewhat surprised if it was slower. With today's processor pipelines though I can certainly appreciate that one would be best to try it out first, and you've probably caught me applying premature optimisation there anyway.

Quote:
Quote:I hope POINT has an appropriate default constructor because otherwise this function (your version or mine) returns garbage when the vector is empty.


The version I propose below returns something unusable that has to be checked for. That's better IMO than replacing garbage with a "valid" but wrong value (i.e. a point at the origin). Of course, the calling code should check for this anyway.
I certainly agree. I was definitely trying to get the original poster to be awoken to this possibility more than I was to show what the best way to handle it was. Returning the iterator is certainly what I'd consider the best option for handling it.
It's also possible that the code calling this function has already checked for v being empty (as it may have needed to do for other reasons), in which case a simple assert that the vector is not empty might suffice, or an assert that the returned iterator is not the end().

Quote:
Quote:I do have to question why you store pointers to points in the vector instead of just the points themselves though.


I expect this to make a bigger difference than many of the other changes. It costs memory, too, to store the actual pointers in addition to the POINTs.

The only reason you'd want to set up a vector of pointers like this, to something as simple as a POINT structure, would be for some kind of "index buffer" technique, where the points are potentially shared or repeated. But in that case, the points still have to be stored somewhere, and you should be searching that storage (since it doesn't have the duplicates).
Perhaps. It's possible that there is a large buffer of all points somewhere, and several such index lists, that each point to a subset of points from within that buffer. He may only want the closest point out of one of these subsets at a time. In that case he would need to do things in much the way he has. I was (perhaps foolishly) giving him the benefit of the doubt there.

I do like the way you question everything about the way something is written though, when many of us just try and patch up what has been provided. I really hope people appreciate the extra effort taken to do so.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by Antheus
You're describing the problem of 'Picking'. There are other, even hardware-supported solutions to that.


Why yes, I am. Which is the problem at hand here. "I have a mouse and it clicks a point on a 2d graph what is the closest point to my mouse click". And the responses by everyone but me are to have a tree.

Quote:Original post by Sneftel
It's not clear whether he is. He's either describing rasterizing just the points to a grid, in which case the worst-case performance for finding the nearest point is O(n^2) in the size of the grid, or he's suggesting rasterizing the entire voronoi tessellation, which would be O(n^2) to build the grid. And, of course, either method would either take O(n^2) space to build, or use sparse structures which would make lookup performance even worse.

Either way, a bad idea overdefended.

That's what I keep thinking, too.

Quote:
kD-trees have better performance in almost any way you'd care to measure, take less space, and there's several good premade implementations out there.

Eh, what?

voronoi tesselation? Are you just pulling random terms out of nowhere? A grid is just a grid, like an array.

And regardless of dimension, building grid is nlogn. That is, sorting. Performance is not the big issue, it's just crazy overkill for this problem at hand, but a tree with one or more allocations per node is not going to have better performance than a flat structure, especially not a flat structure with o(1) lookup and a tree with at best o(log n) lookup just to get the APPROXIMATE nearest neighbor.

This is my thread. There are many threads like it, but this one is mine.

This topic is closed to new replies.

Advertisement