• Advertisement
Sign in to follow this  

Random Order Iterator for std::set

This topic is 4496 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Ok, so I've got an std::set in which I would like to visit / extract / iterate over some number N elements, without getting any duplicates. The key part is that I need the order (and first element) in which I visit / extract / iterate over these elements to be random, or some reasonable approximation thereof (nothing critical is going on). The set can be modified or not by the visiting process / iteration; it doesn't matter. I could copy everything from the set into a vector, then random_suffle and iterate from start until I've visted N elements, but that would involving copying all M elements in the set to the vector, and a shuffle, just to extract N elements, where N might be much smaller than M in general (though not always). I could also generate some random number Q in the range [0, set.size()) and iterate forward Q times, extract an element, repeat, and loop back to the start if I hit the end of the set, but this seems kind of kludgy as well... It seems like there should be a built-in, relatively simple, and/or obviously elegant but nonetheless eluding me way to do this, but Google hasn't been any help. Suggestions? Thanks.

Share this post


Link to post
Share on other sites
Advertisement
Instead of copying the elements to the vector, you could create a vector of iterators into the set and randomize the vector of iterators.

Share this post


Link to post
Share on other sites
This is an algorithm for choosing N random elements from a set of M items in order.
The description is recursive, but it is easy enough to implement iteratively:

Choose the first element of the set with probability N/M. If it was chosen, then your problem
reduces to finding N-1 random elements in the remaining M-1 elements of the set; if not, then
you'll have to find N elements in the M-1 remaining items.

This will require at most one full pass over the set.
Then shuffle the N random items if you really need random order.



Share this post


Link to post
Share on other sites
Quote:
Original post by Barius
Choose the first element of the set with probability N/M. If it was chosen, then your problem
reduces to finding N-1 random elements in the remaining M-1 elements of the set; if not, then
you'll have to find N elements in the M-1 remaining items.

This will require at most one full pass over the set.
Then shuffle the N random items if you really need random order.

Doing this, I'd have to do an average of rougly M/2 iterations over the set each time I pick an element, and I need to pick N items, so that would be order N*M operations.

I realize I can't pick *one* arbitrary item out of a set in order 1 time without a random access iterator, but I was hoping I could do better than order M * N total if I have to pick multiple items... This would presumably require a nonrecusive algorithm.

Quote:
Original post by SiCrane
Instead of copying the elements to the vector, you could create a vector of iterators into the set and randomize the vector of iterators.

That would require iterating over the whole set to generate all the iterators pointing to each element, then (I believe) the equivalent of another full iteration to do the shuffling in the vector, then as many order 1 operations as needed to use the the resulting de-sorted vector of iterators. I can't do this once then shuffle each time, as the set of elements is variable... but this would be better than copying all elements certainly. I suppose this is order M operations to make iterators, order M again to shuffle the whole vector, and order N to extarct, leaving me with order M+N operations total... which I guess is better than order M*N for the method above. I suppose I could also skip the full order M shuffle and just generate N unique indices in the rane [0,M-1] to pick my iterators.

Does my handwavey analysis make sense here?

Share this post


Link to post
Share on other sites
Quote:
Original post by Geoff the Medio
Doing this, I'd have to do an average of rougly M/2 iterations over the set each time I pick an element, and I need to pick N items, so that would be order N*M operations.


Only if you pick them one at a time. To do so only makes sense if you either don't know
N in advance or the set changes between single element selections.

Share this post


Link to post
Share on other sites
Depending on how stable the set is, you can cache your random permutation so that your initial O(M) setup is distributed over several iterations. This could easily lead to an amortized O(N) algorithm. Inserting elements into a set doesn't invalidate any iterators, so you can simply add a new element to your iterator array. Removal is equally safe, although it does require finding the element being removed. Obviously, the effectiveness of these depend entirely on how exactly you are using your set.

CM

Share this post


Link to post
Share on other sites
Quote:
Original post by Geoff the Medio
Quote:
Original post by Barius
Choose the first element of the set with probability N/M. If it was chosen, then your problem
reduces to finding N-1 random elements in the remaining M-1 elements of the set; if not, then
you'll have to find N elements in the M-1 remaining items.

This will require at most one full pass over the set.
Then shuffle the N random items if you really need random order.

Doing this, I'd have to do an average of rougly M/2 iterations over the set each time I pick an element, and I need to pick N items, so that would be order N*M operations.

I realize I can't pick *one* arbitrary item out of a set in order 1 time without a random access iterator, but I was hoping I could do better than order M * N total if I have to pick multiple items... This would presumably require a nonrecusive algorithm.


Even though the description is most easily stated recursively, the recursive implementation still does only one pass over the data.

Stated iteratively:

Let N = desired number of items.
Let M = length(set).
For item in set:
if random_chance(N/M):
Add item to selection
Decrement N
Decrement M
else:
Decrement M


If N ever gets to equal M, then this will automatically add all remaining items in the set, in order to make up the 'quota'. But because of the 'feedback' in the random probability, the distribution is unbiased overall.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
You can make a vector that points to the same elements as the set. Only add an element to the vector if it is successfully added to the set. Random elements can easily be selected from the vector (using shuffle or random access or whatever). If an element is removed from the set, it can be efficiently removed from the vector by swapping the element to the back of the vector since the order doesn't matter.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Oops, I just realized for that to work an element would need to know where it is inside the vector. It can still work, but the structures would need to be more complicated and the elements pointed at by the vector would need to be aware of their position in the vector.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement