[C++] remove duplicates from vector

Started by
6 comments, last by Adam_42 16 years, 1 month ago
I have some code a little like this

class Point2D {
int x,y;
};

vector<Point2D> testPoints;

// add some points......
testPoints.push_back(....);
I end up with a vector<> with many points in it with many duplicates. How can I remove these duplicates in an fast and efficient way? I could loop over the points myself comparing each point to all subsequent points to find duplicates but that seems slow to me. I don't think I can use a set<> or sort() and unique() as I can't define a meaningful less than operator for Point2D. The vector can be checked for duplicates as additional points are added or filtered afterwards. TIA.
Advertisement
You don't need a meaningful less-than operator, you merely need a sound one, which defines a strict order relationship over points.

In this particular case, you have access to lexicographical ordering:

bool operator<(const point &a, const point &b){  return (a.x < b.x) || (a.x == b.x) && (a.y < b.y);}
bool less_than(const Point2D & lhs, const Point2D & rhs) {  if (lhs.x < rhs.x) return true;  if (lhs.x > rhs.x) return false;  return lhs.y < rhs.y;}
Quote:Original post by coordz
I don't think I can use a set<> or sort() and unique() as I can't define a meaningful less than operator for Point2D.

The unique algorithm uses operator==().

You could also consider using std::unique_copy().
  class Point2D {  public:    bool operator==(const Point2D& rhs)    { return x == rhs.x && y == rhs.y; }  // ...  };  // ...  std::vector<Point2D>::iterator e = std::unique(testPoints.begin(), testPoints.end());  testPoints.erase(e, testPoints.end());

Stephen M. Webb
Professional Free Software Developer

unique() also requires the range to be sorted if it is to remove all duplicates.
Big thank you all :) I'm now using a set<> with a less than operator which I think(?) is faster than sort() followed by unique() applied to a vector.
Which is faster depends on your usage patterns. If you're not sure, code it both ways and use a profiler to determine which performs better. There's really no reason to guess.
You might also find that using a hash_map is even quicker (it should be O(N)), but it's also a non-standard stl extension, and a bit more work to implement.

This topic is closed to new replies.

Advertisement