Sign in to follow this  
iMalc

Help with an operation on std::set?

Recommended Posts

iMalc    2466
Hi everyone. This might show my lack of STL knowledge, but here goes... Today, I had a simple operation I wanted to perform. I eventually just wrote it the long way how I knew it would work, but I'm hoping that there's a simpler way (It just feels like there should be). What I have is three std::set's. The first set (lets call it setA), is a master set of all items. The second set (lets call it setB), is a set containing all of the items that I wish to remove from setA. The third set (lets call it setC), is a set containing all of the items that I wish to add to setA. I wish to essentially perform (in psuedocode): setA -= setB setA += setC If you know what I mean. Now, what I first tried (stupid me) was:
setA.erase(setB.begin(), setB.end());
setA.insert(setC.begin(), setC.end());
The second part on it's own is fine (so I don't need help with that part), but it seems you can't use erase with iterators from another set :-(, so the first part is bad. Actually I ended up with a set with less items than size() thought it contained! It would be nice if the documentation I had made it clear that I couldn't do this, I mean the reverse works fine using insert()..., but that's another matter. Alright, now I know about 'set_difference' etc. However that essentially performs the operation: setD = setA - setB After which I have to assign setD to setA. Obviously not very efficient! What I am looking for is something similiar to that, where it simply removes from setA, items that are in setB. My current option is to write a for-loop to iterate through setB, and erase each item from setA. Anyone else have any better ideas, or comments?

Share this post


Link to post
Share on other sites
Sharlin    864
EDIT: Removed a totally unworking piece of code. Apologies for posting something so stupid.

[Edited by - Sharlin on July 1, 2005 4:12:10 AM]

Share this post


Link to post
Share on other sites
iMalc    2466
Quote:
Original post by Sharlin
Try this out: (untested)

setA.erase(setA.find(*setB.begin()), setA.find(*setB.end()));

Basically, because sets are sorted by definition, the elements corresponding to those of setB form a continuous subsequence inside setA, so they can be erased in one pass.
Oooh, I didn't think of that!

Does anyone have anything to say about how the efficiency of that relates to this?...
for (someTypeOfSet::iterator it = setB.begin(); it != setB.end(); ++it)
setA.erase(*it);
Okay, I'm off to bed. Will check back in the morning.

Share this post


Link to post
Share on other sites
BitMaster    8651
Sharlin, that won't work. Assume A contains the numbers 1 to 10. Assume B only contains the numbers 1 and 10. Your method will remove all numbers from A, while you want to have a set containing the numbers 2 to 9.

Regarding the OP, what's wrong with:

for (MyIterator it = setB.begin(); it != setB.end(); ++it)
setA.erase(*it);


Edit: Apart from the fact that I was a bit too slow, consider the following code as an alternative if setA.size() < setB.size():

for (MyIteratir it = setA.begin(); it != setA.end();)
{
if (setB.find(*it) == setB.end())
++it;
else
it = setA.erase(it);
}

Share this post


Link to post
Share on other sites
Andrew Russell    1394
Sharlin's code absolutly would not work. For starters, you can't dereference the end iterator (use back() on the container to get the last element). Secondly, the logic of what it removes is totally wrong.


What you are looking for are the set algorithms in the STL, in your case I think what you are after is probably set_difference and merge (or inplace_merge). Take a look at what is available.

Alternitivly, you could use remove_if. The performance, however, is going to be pretty terrible.

Share this post


Link to post
Share on other sites
Sharlin    864
Quote:
Original post by BitMaster
Sharlin, that won't work. Assume A contains the numbers 1 to 10. Assume B only contains the numbers 1 and 10. Your method will remove all numbers from A, while you want to have a set containing the numbers 2 to 9.


Woah, indeed. What the hell was I thinking again...

Share this post


Link to post
Share on other sites
snk_kid    1312
Quote:
Original post by iMalc
Obviously not very efficient!


I don't think its as bad as it may appear to be, you can make doing both operations together better & efficient by using a single variable temp, swap (& clear) operation, and a custom pooled allocator to speed up allocations [grin]:


#include <cmath>
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>
#include <iostream>

#include <boost/pool/pool_alloc.hpp>

int main() {

typedef std::set<int, std::less<int>, boost::fast_pool_allocator<int> > seti;

seti i1, i2, i3;

seti::size_type N = 10;

std::generate_n(std::inserter(i1, i1.begin()), N, std::rand);
N = 6;
std::generate_n(std::inserter(i2, i2.begin()), N, std::rand);
N = 4;
std::generate_n(std::inserter(i3, i3.begin()), N, std::rand);

seti temp;

std::set_difference( i1.begin(), i1.end(),
i2.begin(), i2.end(),
std::inserter(temp, temp.begin()));
temp.swap(i1);
temp.clear();

std::set_union(i1.begin(), i1.end(), i3.begin(), i3.end(),
std::inserter(temp, temp.begin()));
temp.swap(i1);

std::copy(i1.begin(), i1.end(),
std::ostream_iterator<int>(std::cout, ", "));
}


Note that you don't need to make the "temp" a set container as the result will be sorted but this way we can use a cheap swap/clear op & possibly reuse of previous allocation so if all is well for doing the difference & intersection together give us a cost of just one extra allocation.

Quote:
Original post by iMalc
What I am looking for is something similiar to that, where it simply removes from setA, items that are in setB.

My current option is to write a for-loop to iterate through setB, and erase each item from setA. Anyone else have any better ideas, or comments?


The other alternative is:


#include <cmath>
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>
#include <iostream>

#include <boost/pool/pool_alloc.hpp>
#include <boost/bind.hpp>

int main() {

typedef std::set<int, std::less<int>, boost::fast_pool_allocator<int> > seti;
typedef seti::size_type (seti::*memfunc)(const seti::key_type&);

seti i1, i2, i3;

seti::size_type N = 10;

std::generate_n(std::inserter(i1, i1.begin()), N, std::rand);
N = 6;
std::generate_n(std::inserter(i2, i2.begin()), N, std::rand);
N = 4;
std::generate_n(std::inserter(i3, i3.begin()), N, std::rand);

std::for_each(i2.begin(), i2.end(),
boost::bind(static_cast<memfunc>(&seti::erase), boost::ref(i1), _1));

seti temp;
std::merge( i1.begin(), i1.end(),
i2.begin(), i2.end(),
std::inserter(temp, temp.begin()));

temp.swap(i1);

std::copy(i1.begin(), i1.end(),
std::ostream_iterator<int>(std::cout, ", "));

}


Profile and see which one is better.

Share this post


Link to post
Share on other sites

std::set<int> tmp( C);
std::set_difference( A.begin(), A.end(), B.begin(), B.end(), std::inserter( tmp, tmp.end()));
tmp.swap( A);


if you're allowed to trash C you can get rid of the temporary.

edit: beaten by the kid

Share this post


Link to post
Share on other sites
snk_kid    1312
Quote:
Original post by snk_kid
Profile and see which one is better.


Okay so i did it myself [grin], its a very brief profile on VC++ 8.0 mind you.

The version with set operations on average was: 0.00513417 (sec or msec i forget heh).

The other version, on average was: 0.00489294 (sec or msec)

So the one using set operations isn't bad at all.

Share this post


Link to post
Share on other sites
snk_kid    1312
Quote:
Original post by DigitalDelusion
@snk_kid: would you mind running my variation to? It does basicly the same thing as your first one without going the roundabout way using two set_functions.


Heh well done, completely missed that [grin].

DigitalDelusion version on average: 0.004123261 (don't ask me the units i've completely forgot lol).

EDIT: note with all the tests i used boost::fast_pool_allocator.

EDIT2: made a slight mistake in the code before, updated results.

[Edited by - snk_kid on July 1, 2005 6:38:36 AM]

Share this post


Link to post
Share on other sites
iMalc    2466
Thanks for al the responses. I realised just after I turned my PC off, that Sharlin's idea wouldn't work.

This is actually for a client/server program. The sever keeps track of a list of items, and the clients send through what items they want to add / remove / update.
I expect setA to be bigger than setB.
Previously I was going to do updates by issueing a remove then an add, however I've realised that I can get rid of the need to transmit the remove data in this case. Add will simply update if the item exists. So basically I have to remove from setA, items in setB or setC. Then add items from setC to setA.

I may use set_difference like DD shows, or maybe stick with the for loop. I'm not so much looking for the fastest solution, but the cleanest or simplest, just as long as it isn't some orders of magnitude slower than another alternative. It would have been so simple if erase could do what I first thought it could do.

I figured there would be a nice boost solution, but I wont be using boost.

My "obviously not very efficient" comment may have been slightly misplaced. I just figured that there was no need to copy all the items into another set, perform a swap and then destroy the original. It's probably not so bad though. It's far more important that it actually works correctly eh!

Thanks again!

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this