Representing sets and subsets

Started by
6 comments, last by P2 21 years, 7 months ago
I''m toying around with som graphalgorithms and realized that I need some way to represent sets. But I''m not really sure how to code it... I want to be able to do operations like union,intersection, merge,member and so on. First I was just thinking of a bitvector, a big array that is true if the number is a member. member and insert operations should work on constant time. union, intersection and operations like that would work on linear time, n, if the array is n long. This is fine but it might waste quite a lot of memory since the size is proportional to the sets maxsize. A linked list wouldn''t waste so much memory but it''ll slow down many of the operations and might need to be sorted in some way. Are there other ways to do this? How is this usually done? Thanks in advance, P2
Advertisement
What language are you coding in?
I''m coding in c++

/P2
I would think then that the STL vector class would be a good basis for a set since it is resizable. Create your own class that inherits from the vector class and adds in the methods (set operations) that you need.

Cheers,

Timkin
If you''re using STL, might as well use STL''s set
If you expect your sets to be dense, arrays of bits (std::vector) are a good way to represent them. If you expect them to be sparce, a red-black tree of numbers (std::set) is better. If you expect them to be just a few elements, a sorted vector of ints is better.

S = "size of your range of numbers"
N = "number of elements in the set"

The time required for each operation (in big-O notation) would be:
Operation        vector    set      vectorMembership test      1            log N           log NInsert               1            log N             NDelete               1            log N             NNegation             S              S               SUnion                S      (N1+N2)*log(N1+N2)?   N1+N2Intersection         S      (N1+N2)*log(N1+N2)?   N1+N2 


The times can be affected by very different constants, so every representation has its niche. You will have to time your code.

If you really like trouble (and speed) you can use several representations, using for each particular subset the most convinient. But then union and intersection have to be implemented for the n^2 possibilities. The times for union would be something like
                                        set 2                       ---------------------------------------                       vector    set      vector      | vector        S             N2             N2set 1 | set            N1     (N1+N2)*log(N1+N2)?  N1+N2      | vector         N1           N1+N2          N1+N2 


I am making these tables as I go, so they might be wrong.

Aaargggh! Let''s try using "source" instead of "code".


  Operation        vector<bool>    set<int>      vector<int>Membership test      1            log N           log NInsert               1            log N             NDelete               1            log N             NNegation             S              S               SUnion                S      (N1+N2)*log(N1+N2)?   N1+N2Intersection         S      (N1+N2)*log(N1+N2)?   N1+N2  



                                          set 2                       ---------------------------------------                       vector<bool>    set<int>      vector<int>      | vector<bool>        S             N2             N2set 1 | set<int>            N1     (N1+N2)*log(N1+N2)?  N1+N2      | vector<int>         N1           N1+N2          N1+N2  

Thanks for that very enlightening information alvaro!

Timkin

This topic is closed to new replies.

Advertisement