Sign in to follow this  
utilae

what's the fastest search algorithm

Recommended Posts

utilae    188
Hi, I am using C++ and vectors. I was wondering, what's is the fastest method of searching for a specific element (eg a=b) in a vector, where the vector is unsorted? Is there a faster way than iterating from start to end?

Share this post


Link to post
Share on other sites
Way Walker    745
Quote:
Original post by utilae
Hi, I am using C++ and vectors.

I was wondering, what's is the fastest method of searching for a specific element (eg a=b) in a vector, where the vector is unsorted?

Is there a faster way than iterating from start to end?


There are really only two search algorithms: linear (O(N)) and binary (O(log N)). The latter only works for sorted vectors. If you're going to be doing a lot of searching, sorting is a good idea (or never letting the vector become "unsorted").

Share this post


Link to post
Share on other sites
Troll    246
You can still get better than O(n) search time. If it's possible given your project, you could create a parallel container containing a sorted version of the data. If it's simple data (say ints), you store the value directly in there. Whether you do or not, hold an index of that item in the original vector. If it's an associative container do a find; otherwise, perform a binary search. Make sure to profile it as your ratio of vector push_backs, inserts, etc. to searches will determine whether or not you're saving time on the whole.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster   
Guest Anonymous Poster
Quote:
Original post by Way Walker
There are really only two search algorithms: linear (O(N)) and binary (O(log N)).
How do you search a phone book? If you have to look for a person who's last name is "X" as in "Mr. X", where would you begin? I hope it's not at the middle.

Hint: You're missing one important search algorithm from your list.

Share this post


Link to post
Share on other sites
iMalc    2466
There are other ways to search an array too:

1. For a sorted array, you can perform an interpolation search. Similiar to a binary search, but assuming an even distribution of values, can be faster.

2. A Probability search. If you do not wish to sort the data, then you can still improve upon the usual linear search. Every time you find the item you are looking for, simply swap it with its predecessor, making it a little faster to find next time (Obviously moving items has to be okay). Eventually given an uneven distribution, you'll find the common items much faster. In fact, with an enourmously skewed distribution, this can become faster on average than a binary search.

Best to either sort the vector, or use a set (internally a red-black tree), or best of all, insert the items into a hash-table.
You only have to want to search the data more than once for these techniques to be worthwhile.

Other options are having multiple CPUs search diferent portions of the data at once.

Or if you're into quantum physics you could look into "Grover's quantum searching"...

Share this post


Link to post
Share on other sites
ttdeath    100
If the vector is completely random then you can opt for simple parsing safely, there is no other faster way, probabilistically speaking. You could try and pick random positions out of the vector and perhaps hit the target sooner, but its all the same with the ordered liniary search, from the probability point of view.

On the other hand, nothing is really random when it comes to computers.
Can't you infere some laws on which to base your search?

In any case, look at the following link for more search types:
http://www.cs.dartmouth.edu/~brd/Teaching/AI/Lectures/Summaries/search.html

Share this post


Link to post
Share on other sites
Extrarius    1412
Quote:
Original post by Anonymous Poster
Quote:
Original post by Way Walker
There are really only two search algorithms: linear (O(N)) and binary (O(log N)).
How do you search a phone book? If you have to look for a person who's last name is "X" as in "Mr. X", where would you begin? I hope it's not at the middle.

Hint: You're missing one important search algorithm from your list.
Hint: Converting between bases when dealing with logarithms is a simple multiplication by a constant. Thus, binary search is algorithmically equivalent to an N-ary search, which is basically the way we search a phonebook (except we aren't good enough to have indexes to each section, so we instead guess, which is essentially just replacing the first many iterations of a binary search with a quick approximation).

Share this post


Link to post
Share on other sites
Quote:

Hint: Converting between bases when dealing with logarithms is a simple multiplication by a constant. Thus, binary search is algorithmically equivalent to an N-ary search, which is basically the way we search a phonebook (except we aren't good enough to have indexes to each section, so we instead guess, which is essentially just replacing the first many iterations of a binary search with a quick approximation).


I think he was referring to hashing where open hashing / separate chaining is usually the best variant for a unknown number if elements where it's in O(n/m) in the average case where n is the effective number of stored elements and m is the number of of slots in your hash table iirc. However I could imagine that finding a good hash-function for vectors could be a mayor pain in the butt.
Sorting vectors probably isn't that easy as well but more feasable - using some self balancing tree structure like AVL trees (moderate implementation effort) one could get pretty good results. Since usually vertices / vectors aren't generated in large numbers during runtime, so building the AVL tree at startup / load time shouldn't be that much of a problem really and you get all the benefits of searching in binary trees (already mentioned: O(log(n)))...

Share this post


Link to post
Share on other sites
oggialli    217
No need to implement it yourself - std::set (and/or std::multiset) are usually very well implemented - better not to reinvent the wheel at all.

Share this post


Link to post
Share on other sites
sanstereo    115
You're talking about two seperate operations, which would be less efficient.
It would have to run through the complete list first to sort & then again to search.
If it was O(n logn) + O(n) or even O(logn) + O(1) it'd still be less effiecient, but the only way you'll get that kind of performance (O(1)) is from something like hashing anyway.
Plus, if it's a tree, if you sort it first it'll simply become a linked list and that's adverse to the goal here.

Most of the searching algorithms I've seen around the net cater to text searches: SFS, Boyer-Moore, Knuth-Morris-Pratt, etc . . .

It'd be best to get some cheap books on Amazon.com that have multiple examples of algorithms. Your best friend in this field (programming, development) is physical, paper text and, luckily, this subject is popular and has aged quite well so you can find a lot of good books cheap.
That was the best lesson learned from a degree in computer science.

EDIT:
Trap beat me to it.
Got my reply . . .

Share this post


Link to post
Share on other sites
Optimal searching is a worst case O(log n) time complexity when we assume a sequential model of computation. A parallel model allows worst case O(1) time complexity. The key thing to keep in mind here is that when people make statements about optimality, they are inherently assuming a model of computation ... typically a sequential, random access machine. So when somebody says that optimal comparison sorts have O(n log n) complexity, this is only true on a sequential machine. Comparison networks can achieve better performance.

On the other hand, for your purposes assuming a sequential RAM computer works just fine.

Share this post


Link to post
Share on other sites
Alex Wallar    98
If you are going to search through the vector many times, it would be better to use Merge Sort and then use Binary Search to find your terms. The thing is that you are only going to need to use Merge Sort once (unless you are going to put more terms into the list). This means that you are going to have a time of O(n lg n) + kO(lg n) where k is the amount of times you search through the list. The time mentioned would be the time for that subdivision of the program to run.

Share this post


Link to post
Share on other sites
Guest
This topic is now closed to further replies.
Sign in to follow this