
C++ Searching...best STL Algorithm

This topic is 3729 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I'm trying to find the most efficient searching algorithm. Looking at the C++ Standard Library, the search algorithms for the standard sequence containers all have linear complexity, whereas the associative containers (such as map) offer logarithmic lookup. For basic data searching, is it better to use a map and search on the key (leaving the value empty), or to use a sequence container (vector, list, deque)? --random

If you have a sorted sequence container, you can use std::lower_bound() or std::upper_bound(), which are also O(log n) operations (given random-access iterators, as with std::vector or std::deque). But if you use a std::map without a value, you might as well use a std::set. If you never need the elements to be ordered, you might also look into non-standard hash table containers.
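As a rough illustration of the sorted-sequence approach, here is a minimal sketch using std::lower_bound on a std::vector. The function names contains_sorted and insert_sorted_unique are made up for this example, and it assumes a C++11 compiler. Note that while the lookup is logarithmic, inserting into the middle of a vector is still linear.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if `value` occurs in `sorted`.
// `sorted` must already be in ascending order.
bool contains_sorted(const std::vector<unsigned long>& sorted, unsigned long value)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    // lower_bound returns the first position whose element is >= value.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    return it != sorted.end() && *it == value;
}

// Hypothetical helper: inserts `value` at its sorted position unless it is
// already present. Returns true if the value was inserted.
// The search is O(log n); the vector insertion itself is O(n).
bool insert_sorted_unique(std::vector<unsigned long>& sorted, unsigned long value)
{
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it != sorted.end() && *it == value)
        return false;
    sorted.insert(it, value);
    return true;
}
```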

What I want to do is generate a sequence of values one by one and, as each new one arrives, check it against the others to see if there are duplicates. There will not be any other use for the data, but I must keep the data in precisely the same order as it is produced (it cannot be sorted).

Otherwise, if it is more efficient, I could keep an 'image' of the original data pattern and use a parallel sorted set for my purposes.

--random

If there is no other use for the data, why keep it in sequence? Anyway, consider using a hash_set if your compiler comes with an implementation, or a set otherwise.

Quote:

Otherwise, if it is more efficient, I could keep an 'image' of the original data pattern and use a parallel sorted set for my purposes.


That sounds like a good idea: you could use a std::vector or std::deque of values for your result and a std::set to check whether the values already exist. If that isn't fast enough you could use a hash table, though the C++ standard library did not include one before C++11 added std::unordered_set.
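The vector-plus-set idea might look something like the sketch below. The class name OrderedUnique and its members are invented for illustration; it simply relies on std::set::insert reporting whether the value was new.

```cpp
#include <set>
#include <vector>

// Hypothetical wrapper: keeps values in arrival order and rejects repeats.
class OrderedUnique
{
public:
    // Appends `value` if it has not been seen before.
    // Returns true if the value was appended, false if it was a duplicate.
    bool push_back_if_new(unsigned long value)
    {
        // set::insert returns a pair; .second is false when the key exists.
        if (!seen_.insert(value).second)
            return false;
        order_.push_back(value);
        return true;
    }

    // The unique values, in the order they first arrived.
    const std::vector<unsigned long>& values() const { return order_; }

private:
    std::set<unsigned long> seen_;      // O(log n) duplicate check
    std::vector<unsigned long> order_;  // preserves insertion order
};
```

The cost is that every value is stored twice, which matters if the population gets very large.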

Quote:
Original post by random_thinker
But is this efficient?

"Efficient" is a relative term. Inserting n values into a std::set takes O(n log n) time. Inserting n values into a hash_set (a non-standard extension, so not actually in namespace std) takes O(n) time on average, assuming an acceptable hash.

Si...looking at this a bit more, I probably only need to keep an image of a moving group of 3 to 5 values. But the entire population would be needed in some other form for checking duplicates.

I could simply use:


unsigned long value = 1010101010UL;
std::set<unsigned long> mySet;
if (!mySet.insert(value).second) std::cout << "Duplicate of " << value << " found!";


And then maintain the ordered sequence image of 3 to 5 values in parallel, no?

--random
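One way to read the set-plus-moving-image idea is sketched below, assuming a std::deque for the window of recent values. DuplicateTracker and its member names are hypothetical, and the window size is whatever the caller picks (3 to 5 in the post above).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>

// Hypothetical helper: remembers every value ever seen (for duplicate
// checks) plus a short window of the most recent values in arrival order.
class DuplicateTracker
{
public:
    explicit DuplicateTracker(std::size_t window_size)
        : window_size_(window_size)
    {
        assert(window_size_ > 0);
    }

    // Records `value`. Returns true if it duplicates an earlier value.
    bool add(unsigned long value)
    {
        recent_.push_back(value);
        if (recent_.size() > window_size_)
            recent_.pop_front();  // drop the oldest value in the window
        return !seen_.insert(value).second;
    }

    // The last `window_size` values, oldest first.
    const std::deque<unsigned long>& recent() const { return recent_; }

private:
    std::size_t window_size_;
    std::deque<unsigned long> recent_;
    std::set<unsigned long> seen_;
};
```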

**fanfare for boost::multi_index_container**


#include <algorithm>
#include <iostream>
#include <iterator>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/identity.hpp>

namespace bmi = boost::multi_index;

template<typename Value>
class unique_list
{
private:
    typedef bmi::multi_index_container
    <
        Value,
        bmi::indexed_by
        <
            bmi::ordered_unique<bmi::identity<Value> >, // index 0: rejects duplicates
            bmi::sequenced<>                            // index 1: keeps insertion order
        >
    >
    container_type;

public:
    typedef typename container_type::template nth_index<1>::type::const_iterator const_iterator;

    unique_list() : container_() { }

    // Inserting through the unique index does nothing for a duplicate;
    // a new element is placed at the end of the sequenced index.
    void append_if_unique(const Value &element) { container_.template get<0>().insert(element); }

    // Iteration goes through the sequenced index, i.e. in insertion order.
    const_iterator begin() const { return container_.template get<1>().begin(); }

    const_iterator end() const { return container_.template get<1>().end(); }

private:
    container_type container_;
};


int main()
{
    unique_list<int> ulist;

    ulist.append_if_unique(2);
    ulist.append_if_unique(1);
    ulist.append_if_unique(2);
    ulist.append_if_unique(3);
    ulist.append_if_unique(1);
    ulist.append_if_unique(-6);

    // Should print 2, 1, 3, -6
    std::copy(ulist.begin(), ulist.end(), std::ostream_iterator<int>(std::cout, "\n"));

    return 0;
}


Thx All,

I'll have another look at this problem, and at boost::multi_index_container too, Edd. I think the best approach is to keep the moving 'image' of 3 to 5 sequential values alongside a heap or hash containing the population. In the heap or hash (I really don't know the correct term), each value should be unique, so that together they form a reference set for comparison against newly generated values. The most important information from this is how many of the newly generated values are already within the reference set, and this is really an algorithm speed/efficiency problem. There could be billions of values (integers) within some of these reference sets.

My intent is to use this as another method to test the quality of pseudo-random number algorithms. This approach would allow me to more efficiently spot the actual period length and the uniqueness of the numbers generated within that period.

--random
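To make the RNG-testing idea concrete, here is a toy sketch that counts how many outputs a generator produces before one repeats. TinyLcg is a deliberately tiny, made-up generator (x -> (5x + 3) mod 16) and draws_before_first_repeat is a hypothetical helper; both assume C++11. For a generator whose output is its entire state, the first repeated output marks the start of a new cycle, but that does not hold for generators that expose only part of their state.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical toy generator: x -> (5x + 3) mod 16.
// Its output is its whole state, so a repeated output means the cycle restarts.
struct TinyLcg
{
    unsigned state;
    unsigned next()
    {
        state = (5u * state + 3u) % 16u;
        return state;
    }
};

// Hypothetical helper: number of outputs drawn before the first output that
// duplicates an earlier one, or `max_draws` if no duplicate shows up in time.
// Any type with an `unsigned next()` member works as Generator.
template <typename Generator>
std::size_t draws_before_first_repeat(Generator gen, std::size_t max_draws)
{
    std::unordered_set<unsigned> seen;
    for (std::size_t n = 0; n < max_draws; ++n)
    {
        if (!seen.insert(gen.next()).second)
            return n;  // n distinct outputs came before this repeat
    }
    return max_draws;
}
```

With TinyLcg seeded at 0, all 16 possible values appear before the first repeat, so the helper reports 16.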

After looking into this info and doing a bit of testing, looks like Si's recommendation for hash_set has won out. I'm using GCC on linux, and for this combo it works something like this:


#include <iostream>
#include <ext/hash_set>

int main(int argc, const char* argv[])
{
    __gnu_cxx::hash_set<unsigned int> myHashSet;
    myHashSet.insert(1);
    myHashSet.insert(2);
    myHashSet.insert(3);
    // insert() returns a pair; .second is false if the value was already present
    if (!myHashSet.insert(3).second) std::cout << "\nInsert failed!" << std::endl;

    return 0;
}



Thx...

--random
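For anyone on a C++11 or later compiler, std::unordered_set offers the same insert-and-check pattern in the standard library. A minimal sketch, where count_duplicates is a made-up helper name:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts how many elements of `values` repeat a value
// that appeared earlier in the sequence.
std::size_t count_duplicates(const std::vector<unsigned int>& values)
{
    std::unordered_set<unsigned int> seen;
    seen.reserve(values.size());  // avoids rehashing while inserting

    std::size_t duplicates = 0;
    for (unsigned int v : values)
    {
        // insert() returns a pair; .second is false if v was already present.
        if (!seen.insert(v).second)
            ++duplicates;
    }
    return duplicates;
}
```

Each insertion takes constant time on average, so the whole pass is O(n) on average.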

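Well... if you're only going to use it to verify an RNG, then I assume it doesn't have to be completely accurate?
In that case I'd probably go with a Bloom filter, since you're going to need to manage "billions" of values anyway. That way you get constant-time insertions (linear time overall for n values), plus it's very space efficient. The trade-off is occasional false positives: a value may be reported as already seen when it is not, though a value that really was inserted is never missed.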
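A very rough Bloom filter sketch is below, just to show the shape of the idea. The class name, the double-hashing scheme, and the splitmix64-style mixing function are all choices made for this example rather than anything from a particular library, and the bit count and number of hashes would need tuning for a real population size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal Bloom filter over 64-bit integers.
class BloomFilter
{
public:
    BloomFilter(std::size_t bit_count, unsigned hash_count)
        : bits_(bit_count, false), hash_count_(hash_count)
    {
        assert(bit_count > 0 && hash_count > 0);
    }

    // Sets the bits for `value`. Returns true if every one of those bits was
    // already set, meaning the value was *possibly* inserted before (false
    // positives can occur). Returns false if the value is definitely new.
    bool insert(std::uint64_t value)
    {
        // Double hashing: derive hash_count_ bit positions from two hashes.
        const std::uint64_t h1 = mix(value);
        const std::uint64_t h2 = mix(h1) | 1u;  // forced odd so it is never zero

        bool possibly_seen = true;
        for (unsigned i = 0; i < hash_count_; ++i)
        {
            const std::size_t index =
                static_cast<std::size_t>((h1 + i * h2) % bits_.size());
            if (!bits_[index])
            {
                possibly_seen = false;
                bits_[index] = true;
            }
        }
        return possibly_seen;
    }

private:
    // 64-bit mixing function in the style of splitmix64 (an assumption made
    // for this sketch; any well-distributed hash would do).
    static std::uint64_t mix(std::uint64_t x)
    {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    std::vector<bool> bits_;  // one bit per slot
    unsigned hash_count_;
};
```

Each insert touches a fixed number of bits regardless of how many values are stored, which is where the constant-time behaviour comes from. For duplicate counting, every true result is only a "probably", so the count is an upper bound on real duplicates.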

