Sign in to follow this  
coordz

[C++] remove duplicates from vector

Recommended Posts

I have some code a little like this
class Point2D {
int x,y;
};

vector<Point2D> testPoints;

// add some points......
testPoints.push_back(....);
I end up with a vector<> with many points in it with many duplicates. How can I remove these duplicates in an fast and efficient way? I could loop over the points myself comparing each point to all subsequent points to find duplicates but that seems slow to me. I don't think I can use a set<> or sort() and unique() as I can't define a meaningful less than operator for Point2D. The vector can be checked for duplicates as additional points are added or filtered afterwards. TIA.

Share this post


Link to post
Share on other sites
You don't need a meaningful less-than operator, you merely need a sound one, which defines a strict order relationship over points.

In this particular case, you have access to lexicographical ordering:

bool operator<(const point &a, const point &b)
{
return (a.x < b.x) || (a.x == b.x) && (a.y < b.y);
}

Share this post


Link to post
Share on other sites

bool less_than(const Point2D & lhs, const Point2D & rhs) {
if (lhs.x < rhs.x) return true;
if (lhs.x > rhs.x) return false;
return lhs.y < rhs.y;
}

Share this post


Link to post
Share on other sites
Quote:
Original post by coordz
I don't think I can use a set<> or sort() and unique() as I can't define a meaningful less than operator for Point2D.

The unique algorithm uses operator==().

You could also consider using std::unique_copy().

class Point2D {
public:
bool operator==(const Point2D& rhs)
{ return x == rhs.x && y == rhs.y; }
// ...
};

// ...

std::vector<Point2D>::iterator e = std::unique(testPoints.begin(), testPoints.end());
testPoints.erase(e, testPoints.end());

Share this post


Link to post
Share on other sites
Big thank you all :) I'm now using a set<> with a less than operator which I think(?) is faster than sort() followed by unique() applied to a vector.

Share this post


Link to post
Share on other sites
Which is faster depends on your usage patterns. If you're not sure, code it both ways and use a profiler to determine which performs better. There's really no reason to guess.

Share this post


Link to post
Share on other sites
You might also find that using a hash_map is even quicker (it should be O(N)), but it's also a non-standard stl extension, and a bit more work to implement.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this