• Advertisement
Sign in to follow this  

C++ vector: removing duplicates

This topic is 2255 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi, removing duplicates works fine with a simple vector int:

std::vector<int> vec;
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());


But I am not able to figure out how to do the same when you have a vector pointing to a structure, like this:

std::vector<myObjects::myObjectLib> vec;
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());

typedef struct {
int ID;
float X;
float Y;
float Z;
} myObjectLib;

I get several compilation errors like this:
/Developer/SDKs/MacOSX10.5.sdk/usr/include/c++/4.0.0/bits/stl_heap.h:216:0 /Developer/SDKs/MacOSX10.5.sdk/usr/include/c++/4.0.0/bits/stl_heap.h:216: error: no match for 'operator<' in '__first. __gnu_cxx::__normal_iterator<_Iterator, _Container>::operator+ [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >](((const ptrdiff_t&)((const ptrdiff_t*)(& __secondChild)))).__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >]() < __first. __gnu_cxx::__normal_iterator<_Iterator, _Container>::operator+ [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >](((const ptrdiff_t&)((const ptrdiff_t*)(&(__secondChild - 1l))))).__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >]()'

Share this post


Link to post
Share on other sites
Advertisement

Hi, removing duplicates works fine with a simple vector int:

std::vector<int> vec;
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());


But I am not able to figure out how to do the same when you have a vector pointing to a structure, like this:

std::vector<myObjects::myObjectLib> vec;
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());

typedef struct {
int ID;
float X;
float Y;
float Z;
} myObjectLib;

I get several compilation errors like this:
/Developer/SDKs/MacOSX10.5.sdk/usr/include/c++/4.0.0/bits/stl_heap.h:216:0 /Developer/SDKs/MacOSX10.5.sdk/usr/include/c++/4.0.0/bits/stl_heap.h:216: error: no match for 'operator<' in '__first. __gnu_cxx::__normal_iterator<_Iterator, _Container>::operator+ [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >](((const ptrdiff_t&)((const ptrdiff_t*)(& __secondChild)))).__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >]() < __first. __gnu_cxx::__normal_iterator<_Iterator, _Container>::operator+ [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >](((const ptrdiff_t&)((const ptrdiff_t*)(&(__secondChild - 1l))))).__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* [with _Iterator = myObjects::myObjectLib*, _Container = std::vector<myObjects::myObjectLib, std::allocator<myObjects::myObjectLib> >]()'


You need to specifiy a way to sort your objects.

try:


bool customCompare(myObject::myObjectLib obj1, myObject::myObjectLib obj2) {
return obj1.ID<obj2.ID;
}

std::sort(vec.begin(), vec.end(),customCompare);



If you want the sorting to work differently you can change the comparison function.

Share this post


Link to post
Share on other sites
std::sort() requires that operator< be defined your type or that you pass it a comparison function. std::unique() likewise requires that operator== be defined your type or that you pass it a comparison function. One admissible set of functions would be:

bool operator<(const myObjectLib & lhs, const myObjectLib & rhs) {
if (lhs.ID < rhs.ID) return true;
if (lhs.ID > rhs.ID) return false;
if (lhs.X < rhs.X) return true;
if (lhs.X > rhs.X) return false;
if (lhs.Y < rhs.Y) return true;
if (lhs.Y > rhs.Y) return false;
return lhs.Z < rhs.Z;
}

bool operator==(const myObjectLib & lhs, const myObjectLib & rhs) {
return (lhs.ID == rhs.ID) && (lhs.X == rhs.X) && (lhs.Y == rhs.Y) && (lhs.Z == rhs.Z);
}

However, given that your structure contains floating point variables this may or may not do what you want or expect.

Share this post


Link to post
Share on other sites
[size="5"]EDIT: no, that doesn't work, I tried with a larger number of items and duplicates are still there. I suppose I did something wrong... :angry:


Thanks for your suggestions, that worked just fine:

bool customCompare(myObjects::myObjectLib obj1, myObjects::myObjectLib obj2){
if (obj1.X==obj2.X && obj1.Y==obj2.Y && obj1.Z==obj2.Z) return true; //I just need to compare X,Y,Z
else return false;
}

bool operator==(const myObjects::myObjectLib & lhs, const myObjects::myObjectLib & rhs) {
return (lhs.X == rhs.X) && (lhs.Y == rhs.Y) && (lhs.Z == rhs.Z);
}


In the main() I call:

std::sort(myObject.begin(), myObject.end(),customCompare);
myObject.erase(std::unique(myObject.begin(), myObject.end()), myObject.end());


When I debug the output, the duplicates are indeed removed.

Share this post


Link to post
Share on other sites
What I don't get is: do I need to iterate from 0 to vector.size()-1 and run the functions:
std::sort(myObject.begin(), myObject.end(),customCompare);
myObject.erase(std::unique(myObject.begin(), myObject.end()), myObject.end());Or I just need to call those once?

Share this post


Link to post
Share on other sites
Your sort comparison function isn't a strict weak ordering. One of the properties that std::sort() expects from a comparison function is that if comp(a, b) then !comp(b, a). Your function doesn't satisfy this condition. Take a look at the operator< function that I posted.

Share this post


Link to post
Share on other sites
Ok, I see the point but the customCompare function just calls the == operator to check if:

obj1.X==obj2.X && obj1.Y==obj2.Y && obj1.Z==obj2.Z

How do I go from there to using the < operator ?

Share this post


Link to post
Share on other sites
Ok, now it works: I misunderstood the usage of the customCompare function. Sicrane wrote "std::unique() likewise requires that operator== be defined your type or that you pass it a comparison function". I was using, wrongly, both the ==operator and a customCompare function (in the std::sort call) at the same time.

So the final code is:

bool operator<(const myObjects::myObjectLib & lhs, const myObjects::myObjectLib & rhs) {
if (lhs.X < rhs.X) return true;
if (lhs.X > rhs.X) return false;
if (lhs.Y < rhs.Y) return true;
if (lhs.Y > rhs.Y) return false;
return lhs.Z < rhs.Z;
}

bool operator==(const myObjects::myObjectLib & lhs, const myObjects::myObjectLib & rhs) {
return (lhs.X == rhs.X) && (lhs.Y == rhs.Y) && (lhs.Z == rhs.Z);
}



std::sort(myObject.begin(), myObject.end());
myObject.erase(std::unique(myObject.begin(), myObject.end()), myObject.end());


Thanks again for your patience and kindness.

Share this post


Link to post
Share on other sites
Is that code really doing what you want it to though? Doing exact comparrison on floats is problematic in all situations except when you know you've directly assigned floatA = floatB. (and even then there are issues). If you want to erase objects that are really close together, there are more efficient and more stable ways of doing so.
If your program is duplicating objects and then later trying to determine which are duplicates, you might want to look into storing pointers and/or reference counting instead.

Share this post


Link to post
Share on other sites
You might also be interested in the following: http://stackoverflow.com/questions/1041620/most-efficient-way-to-erase-duplicates-and-sort-a-c-vector

Share this post


Link to post
Share on other sites

Is that code really doing what you want it to though? Doing exact comparrison on floats is problematic in all situations except when you know you've directly assigned floatA = floatB. (and even then there are issues). If you want to erase objects that are really close together, there are more efficient and more stable ways of doing so.
If your program is duplicating objects and then later trying to determine which are duplicates, you might want to look into storing pointers and/or reference counting instead.


Hi, the program I'm currently developing is actually parsing a wavefront .obj file, collecting vertices data into a vector struct; after that I need to remove duplicate vertices, so that items with identical X,Y and Z values need to be unique. The code does what I need, but of course if there is a better implementation I'm all hears. :)

Share this post


Link to post
Share on other sites

[quote name='taz0010' timestamp='1324147475' post='4894836']
Is that code really doing what you want it to though? Doing exact comparrison on floats is problematic in all situations except when you know you've directly assigned floatA = floatB. (and even then there are issues). If you want to erase objects that are really close together, there are more efficient and more stable ways of doing so.
If your program is duplicating objects and then later trying to determine which are duplicates, you might want to look into storing pointers and/or reference counting instead.


Hi, the program I'm currently developing is actually parsing a wavefront .obj file, collecting vertices data into a vector struct; after that I need to remove duplicate vertices, so that items with identical X,Y and Z values need to be unique. The code does what I need, but of course if there is a better implementation I'm all hears. :)[/quote]


One of the problems I was working on was mesh simplification - the process of joining polygon faces together so the geometry can be expressed with the least amount of redundant faces/vertices possible. My algorithm was designed to fuse vertices which were within a specified distance to each other. If you need to introduce a tolerance factor and regard vertices as equal even if they're not identical, then the vector + unique implementation won't work. But if your duplicate vertices are indeed identical then your solution works fine.


If you need to make it faster, employ an unordered_set instead of a vector. The stackoverflow article mentions sets being faster if the number of vertices exceed a certain threshold. Unordered_set is considerably faster than std::set, in part because it removes the artificial requirement of having to sort the vertices. As you parse the obj file, simply add each vertex to the set and check the return value to see if it's a duplicate.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement