Sign in to follow this  
gretty

Fast & Efficient Way to Find Closest Pnt

Recommended Posts

Hello I am trying to write/find a fast & efficient way to find the closest POINT to the mouse position in my graph application. I have an iterative method but to me it seems too inefficient & will be slow for lrg numbers. I have thought of a binary search but I am not sure it will work when I am trying to find the Closest Point to the mouse position, I know it will work if I am searching to see if the Exact Mouse Point is in the vector but not for what I am trying to do. Any suggestions for a fast way to do what I am doing? Here is my iterative method & the POINTS stored in the vector are not sorted or anything either:
POINT findNearestPnt(vector <POINT*> v, int mouse_x, int mouse_y) {
      
      // Post: Finds & returns the nearest point in vector v to the mouse position
      // This algorithm searches EVERY elements of v to find the closest pnt which isn't very 
      // efficient do you know of a faster way to do what I am trying to do?
      
      POINT nearestPoint;
      int dist = INT_MAX;
      
      for (int i=0; i<v.size(); i++) {
          
          POINT* tempP = v.at(i);
          int myDist = pointDistance(tempP->x,tempP->y,mouse_x,mouse_y); // finds distance tween pnts 
          
          if (myDist < dist) {
             dist = myDist;
             nearestPnt.x = tempP->x;
             nearestPnt.y = tempP->y;
          }         
      }
      
      return nearestPnt;  // return closest pnt to mouse position
}

Share this post


Link to post
Share on other sites
This is known as Nearest Neighbor search. Standard approach is a kD-tree, which takes O(n log n) time to build the structure, and O(log n) to query. If you cannot build a structure ahead of time, but have points sorted along an axis, you can use binary search to find an initial guess, and then use that guess as a bound on the interval to linearly search.

Share this post


Link to post
Share on other sites
Thanks, do I need a special library to use a NNS or is it like a binary search where I just write it myself or use algorithm library?

Any other suggestions of searching methods would be helpful also :)

Share this post


Link to post
Share on other sites
Quote:
Original post by gretty
Since this is in 2d it is really overkill to use a tree.

Heh, what? Spatial partitioning trees are intended specifically for, and only useful in, lower-dimensional situations. (Once you get into the hundreds of dimensions, they don't work so well.)

Share this post


Link to post
Share on other sites
Quote:
Original post by thatguyfromthething
Since this is in 2d it is really overkill to use a tree. You should be able to get a bounds for the graph and calculate an offset into it. If you don't have the points in any sort of order, well, you're going to have to for any algorithm you use.


It's not overkill if the OP is talking about a large number of points.

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
Quote:
Original post by gretty
Since this is in 2d it is really overkill to use a tree.

Heh, what? Spatial partitioning trees are intended specifically for, and only useful in, lower-dimensional situations. (Once you get into the hundreds of dimensions, they don't work so well.)


A tree requires sorting; it could even be called a sorting. So why not just sort in the first place and avoid need for a tree? If it is dynamic somehow, like it rotates on the screen, a tree is necessary. If not it's overkill.

Quote:
Original post by jwezorek

It's not overkill if the OP is talking about a large number of points.


It's actually more overkill, because then you have a tree with a large number of branches.

If this is a simple 2d object drawn on the screen that doesn't rotate, really it's unnecessary. I find it hard to even see how someone could wind up with a bunch of random 'points' that stored unordered in a vector. But maybe I am misreading problem or it was misstated.

Share this post


Link to post
Share on other sites
Quote:
Original post by thatguyfromthething
A tree requires sorting; it could even be called a sorting. So why not just sort in the first place and avoid need for a tree? If it is dynamic somehow, like it rotates on the screen, a tree is necessary. If not it's overkill.
Sorting doesn't address the problem. A sorted list of points will be sorted along one dimension, allowing you to find the closest point to a given point along that dimension, but that's not what the OP was asking for.

You should take a look at this article and see how a spatial partitioning tree differs from a linear sorting.

[Edited by - Sneftel on February 11, 2010 8:16:43 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
Quote:
Original post by thatguyfromthething
A tree requires sorting; it could even be called a sorting. So why not just sort in the first place and avoid need for a tree? If it is dynamic somehow, like it rotates on the screen, a tree is necessary. If not it's overkill.
Sorting doesn't address the problem. A sorted list of points will be sorted along one dimension, allowing you to find the closest point to a given point along that dimension, but that's not what the OP was asking for.

You should take a look at this article and see how a spatial partitioning tree differs from a linear sorting.


I have some idea of nearest neighbor problem and trees already (pretend for a minute I once wrote some graphics processing app or something). For sorting you have a concept of postman's sort and index.

For a 2d object this is the natural way to store things anyway, in an array/bitmap/array of arrays/whatever. Like say if you have a bar graph (maybe for a graphing app). It's even kind of hard to wind up with something else to start with, or so I'd think.

I guess for pie chart it might seem wasteful but that's how it is done in say like a picture or something, or any kind of 2d anything that makes much sense. Which is why I wonder about the vector of points thing. Not that I'm judging, and maybe there's some good reason for that.

But anyway it's like a big array. So once it's sorted you just look up the answer directly. If it's a pie chart instead of a bar graph, you probably seeth in rage at the very idea of the wasted array space but the array might have something crazy like run length encoding to deal with that, and it's still an order of magnitude less resource intensive than 99% of all tree implementations.

In fact this works pretty good for 3d, too, and will work quickly on meshes that would be impossible to put in a standard tree due to sheer size. But the problem is when you change perspectives. Then you need to have something like a tree to deal with that.

Share this post


Link to post
Share on other sites
Quote:
Original post by thatguyfromthething

For a 2d object this is the natural way to store things anyway, in an array/bitmap/array of arrays/whatever. Like say if you have a bar graph (maybe for a graphing app). It's even kind of hard to wind up with something else to start with, or so I'd think.


You're describing the problem of 'Picking'. There are other, even hardware-supported solutions to that.

Share this post


Link to post
Share on other sites
It's not clear whether he is. He's either describing rasterizing just the points to a grid, in which case the worst-case performance for finding the nearest point is O(n^2) in the size of the grid, or he's suggesting rasterizing the entire voronoi tessellation, which would be O(n^2) to build the grid. And, of course, either method would either take O(n^2) space to build, or use sparse structures which would make lookup performance even worse.

Either way, a bad idea overdefended. kD-trees have better performance in almost any way you'd care to measure, take less space, and there's several good premade implementations out there.

Share this post


Link to post
Share on other sites
There are many things with the original code that can be optimised a lot. First thing to do though is to pass the vector by const-reference to avoid the unnecessary and expensive copy-construction of the vector.
Next, one can compare the distances squared instead of just the distances. You can do an early-out for one of the axes. I.e. if it's too far away horizontally, don't bother with the vertical distance.
We also don't need the range checking provided by .at() here because we're specifically stepping over exactly the valid range for the vector.
Next, reduce the work done each time we find a point closer than the last one. Instead of dereferencing a pointer twice over two lines of code, I just remember the pointer.
I also threw in a little const usage for good measure.

Here is the optimised result:
POINT findNearestPnt(const vector <POINT*> &v, int mouse_x, int mouse_y) {
// Post: Finds & returns the nearest point in vector v to the mouse position
// This algorithm more efficiently searches every element of v to find the closest point

POINT nearestPoint;
const POINT *nearestPointP = &nearestPoint;
int distSqr = INT_MAX;

for (size_t i = 0; i < v.size(); i++) {
const POINT* tempP = v[i];
int xSqr = (mouse_x - tempP->x) * (mouse_x - tempP->x);
if (xSqr > distSqr)
continue;
int ySqr = (mouse_y - tempP->y) * (mouse_y - tempP->y);
int myDistSqr = xSqr + ySqr;
if (myDistSqr < distSqr) {
distSqr = myDistSqr;
nearestPointP = tempP;
}
}

return *nearestPointP; // return closest point to mouse position
}

Note that I'm expecting the compiler to perform comon sub-expression elimination for the xSqr and ySqr calculations to avoid the double-dereferences. If not, that can be taken care of manually.
I hope POINT has an appropriate default constructor because otherwise this function (your version or mine) returns garbage when the vector is empty.
Sure it's still O(n), but this version should be significantly faster than your original one. Try it first before making other more drastic changes.

I do have to question why you store pointers to points in the vector instead of just the points themselves though.

Share this post


Link to post
Share on other sites
Quote:
Original post by iMalc
Next, reduce the work done each time we find a point closer than the last one. Instead of dereferencing a pointer twice over two lines of code, I just remember the pointer.


That's a little awkward. But we can make it make sense interface-wise by returning the actual point instead of a copy of its value. Or, more generally, a handle for it.

Quote:
Note that I'm expecting the compiler to perform comon sub-expression elimination for the xSqr and ySqr calculations to avoid the double-dereferences. If not, that can be taken care of manually.


Then you might as well write a function for the ugly part and expect it to be inlined. :) I'm doubtful that complicating the logic in order to test one dimension separately is a net win, for that matter. After all, there's an extra branch that way and I can't imagine it being terribly well-predicted. (Of course, the existing branch isn't terribly well-predicted either :) So maybe it wouldn't matter. I don't know.)

Quote:
I hope POINT has an appropriate default constructor because otherwise this function (your version or mine) returns garbage when the vector is empty.


The version I propose below returns something unusable that has to be checked for. That's better IMO than replacing garbage with a "valid" but wrong value (i.e. a point at the origin). Of course, the calling code should check for this anyway.

Quote:
I do have to question why you store pointers to points in the vector instead of just the points themselves though.


I expect this to make a bigger difference than many of the other changes. It costs memory, too, to store the actual pointers in addition to the POINTs.

The only reason you'd want to set up a vector of pointers like this, to something as simple as a POINT structure, would be for some kind of "index buffer" technique, where the points are potentially shared or repeated. But in that case, the points still have to be stored somewhere, and you should be searching that storage (since it doesn't have the duplicates).

Accordingly, I wrote the example assuming that this is fixed. ;)


typedef vector<POINT> points_t;

int squaredDistanceBetween(const Point& a, const Point& b) {
int dx = a.x - b.x;
int dy = a.y - b.y;
return dx*dx + dy*dy;
}

// "mouse_" doesn't belong in parameter names here, because the function
// doesn't care that the point in question came from a mouse.
points_t::iterator nearestPointTo(const points_t& v, int x, int y) {
return nearestPointTo(v, POINT(x, y));
}

points_t::iterator nearestPointTo(const points_t& v, const Point& p) {
points_t::iterator result = v.begin(), end = v.end();
if (result == end) { return result; }

int bestSquaredDistance = squaredDistanceBetween(*result, p);
for (points_t::iterator it = ++result, end = v.end(); it != end; ++it) {
int squaredDistance = squaredDistanceBetween(*it, p);
if (squaredDistance < bestSquaredDistance) {
bestSquaredDistance = squaredDistance;
result = it;
}
}
return result;
}



(And before you ask why I don't get really fancy and use std::min_element: the custom comparator supplied for that would end up evaluating the squared-distance twice for (almost) each Point.)

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
Quote:
Original post by iMalc
Next, reduce the work done each time we find a point closer than the last one. Instead of dereferencing a pointer twice over two lines of code, I just remember the pointer.


That's a little awkward. But we can make it make sense interface-wise by returning the actual point instead of a copy of its value. Or, more generally, a handle for it.
If this were a vector of POINTs rather than pointers to them then I would have stored the iterator as you have. By remembering the pointed to value instead, you can see that I was merely saving having to re-obtain the pointer again.

Quote:
Quote:
Note that I'm expecting the compiler to perform comon sub-expression elimination for the xSqr and ySqr calculations to avoid the double-dereferences. If not, that can be taken care of manually.


Then you might as well write a function for the ugly part and expect it to be inlined. :) I'm doubtful that complicating the logic in order to test one dimension separately is a net win, for that matter. After all, there's an extra branch that way and I can't imagine it being terribly well-predicted. (Of course, the existing branch isn't terribly well-predicted either :) So maybe it wouldn't matter. I don't know.)
I really should have written a 'square' function for it. I was just being lazy I guess.

I'm actually reasonably confident that the early-out test for one axis only would give a net win, but only because I believe the early-out would be taken much more often than not. I don't have his point data sets so I can't see how likely it is of course.
Think of a square filled with randomly generated points in it from a uniform distribution, and our mouse click is in the centre. It usually wouldn't take many iterations over the randomly ordered points before we find a point that is close-ish to the centre. Say within 10% of the width of the square from the mouse click after looking at around 5 points.
Draw two vertical lines down at say 40% and 60% of the way across the square, and 80% of the points are now outside of these lines, and rejected early. Of course this is all guessing here, but my point is that I do believe that such an early-out test is still usually a win and I would be somewhat surprised if it was slower. With today's processor pipelines though I can certainly appreciate that one would be best to try it out first, and you've probably caught me applying premature optimisation there anyway.

Quote:
Quote:
I hope POINT has an appropriate default constructor because otherwise this function (your version or mine) returns garbage when the vector is empty.


The version I propose below returns something unusable that has to be checked for. That's better IMO than replacing garbage with a "valid" but wrong value (i.e. a point at the origin). Of course, the calling code should check for this anyway.
I certainly agree. I was definitely trying to get the original poster to be awoken to this possibility more than I was to show what the best way to handle it was. Returning the iterator is certainly what I'd consider the best option for handling it.
It's also possible that the code calling this function has already checked for v being empty (as it may have needed to do for other reasons), in which case a simple assert that the vector is not empty might suffice, or an assert that the returned iterator is not the end().

Quote:
Quote:
I do have to question why you store pointers to points in the vector instead of just the points themselves though.


I expect this to make a bigger difference than many of the other changes. It costs memory, too, to store the actual pointers in addition to the POINTs.

The only reason you'd want to set up a vector of pointers like this, to something as simple as a POINT structure, would be for some kind of "index buffer" technique, where the points are potentially shared or repeated. But in that case, the points still have to be stored somewhere, and you should be searching that storage (since it doesn't have the duplicates).
Perhaps. It's possible that there is a large buffer of all points somewhere, and several such index lists, that each point to a subset of points from within that buffer. He may only want the closest point out of one of these subsets at a time. In that case he would need to do things in much the way he has. I was (perhaps foolishly) giving him the benefit of the doubt there.

I do like the way you question everything about the way something is written though, when many of us just try and patch up what has been provided. I really hope people appreciate the extra effort taken to do so.

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
You're describing the problem of 'Picking'. There are other, even hardware-supported solutions to that.


Why yes, I am. Which is the problem at hand here. "I have a mouse and it clicks a point on a 2d graph what is the closest point to my mouse click". And the responses by everyone but me are to have a tree.

Quote:
Original post by Sneftel
It's not clear whether he is. He's either describing rasterizing just the points to a grid, in which case the worst-case performance for finding the nearest point is O(n^2) in the size of the grid, or he's suggesting rasterizing the entire voronoi tessellation, which would be O(n^2) to build the grid. And, of course, either method would either take O(n^2) space to build, or use sparse structures which would make lookup performance even worse.

Either way, a bad idea overdefended.

That's what I keep thinking, too.

Quote:

kD-trees have better performance in almost any way you'd care to measure, take less space, and there's several good premade implementations out there.

Eh, what?

voronoi tesselation? Are you just pulling random terms out of nowhere? A grid is just a grid, like an array.

And regardless of dimension, building grid is nlogn. That is, sorting. Performance is not the big issue, it's just crazy overkill for this problem at hand, but a tree with one or more allocations per node is not going to have better performance than a flat structure, especially not a flat structure with o(1) lookup and a tree with at best o(log n) lookup just to get the APPROXIMATE nearest neighbor.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this