what's the fastest search algorithm

Started by
15 comments, last by Brother Bob 12 years, 1 month ago
Hi, I am using C++ and vectors. I was wondering, what's is the fastest method of searching for a specific element (eg a=b) in a vector, where the vector is unsorted? Is there a faster way than iterating from start to end?

HTML5, iOS and Android Game Development using Corona SDK and moai SDK

Advertisement
If the items are unsorted, all you've really got is a linear search. So yes, you just have to iterate from begining to end.

CM
Quote:Original post by utilae
Hi, I am using C++ and vectors.

I was wondering, what's is the fastest method of searching for a specific element (eg a=b) in a vector, where the vector is unsorted?

Is there a faster way than iterating from start to end?


There are really only two search algorithms: linear (O(N)) and binary (O(log N)). The latter only works for sorted vectors. If you're going to be doing a lot of searching, sorting is a good idea (or never letting the vector become "unsorted").
You can still get better than O(n) search time. If it's possible given your project, you could create a parallel container containing a sorted version of the data. If it's simple data (say ints), you store the value directly in there. Whether you do or not, hold an index of that item in the original vector. If it's an associative container do a find; otherwise, perform a binary search. Make sure to profile it as your ratio of vector push_backs, inserts, etc. to searches will determine whether or not you're saving time on the whole.
Quote:Original post by Way Walker
There are really only two search algorithms: linear (O(N)) and binary (O(log N)).
How do you search a phone book? If you have to look for a person who's last name is "X" as in "Mr. X", where would you begin? I hope it's not at the middle.

Hint: You're missing one important search algorithm from your list.
There are other ways to search an array too:

1. For a sorted array, you can perform an interpolation search. Similiar to a binary search, but assuming an even distribution of values, can be faster.

2. A Probability search. If you do not wish to sort the data, then you can still improve upon the usual linear search. Every time you find the item you are looking for, simply swap it with its predecessor, making it a little faster to find next time (Obviously moving items has to be okay). Eventually given an uneven distribution, you'll find the common items much faster. In fact, with an enourmously skewed distribution, this can become faster on average than a binary search.

Best to either sort the vector, or use a set (internally a red-black tree), or best of all, insert the items into a hash-table.
You only have to want to search the data more than once for these techniques to be worthwhile.

Other options are having multiple CPUs search diferent portions of the data at once.

Or if you're into quantum physics you could look into "Grover's quantum searching"...
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
If the vector is completely random then you can opt for simple parsing safely, there is no other faster way, probabilistically speaking. You could try and pick random positions out of the vector and perhaps hit the target sooner, but its all the same with the ordered liniary search, from the probability point of view.

On the other hand, nothing is really random when it comes to computers.
Can't you infere some laws on which to base your search?

In any case, look at the following link for more search types:
http://www.cs.dartmouth.edu/~brd/Teaching/AI/Lectures/Summaries/search.html

Quote:Original post by Anonymous Poster
Quote:Original post by Way Walker
There are really only two search algorithms: linear (O(N)) and binary (O(log N)).
How do you search a phone book? If you have to look for a person who's last name is "X" as in "Mr. X", where would you begin? I hope it's not at the middle.

Hint: You're missing one important search algorithm from your list.
Hint: Converting between bases when dealing with logarithms is a simple multiplication by a constant. Thus, binary search is algorithmically equivalent to an N-ary search, which is basically the way we search a phonebook (except we aren't good enough to have indexes to each section, so we instead guess, which is essentially just replacing the first many iterations of a binary search with a quick approximation).
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:
Hint: Converting between bases when dealing with logarithms is a simple multiplication by a constant. Thus, binary search is algorithmically equivalent to an N-ary search, which is basically the way we search a phonebook (except we aren't good enough to have indexes to each section, so we instead guess, which is essentially just replacing the first many iterations of a binary search with a quick approximation).


I think he was referring to hashing where open hashing / separate chaining is usually the best variant for a unknown number if elements where it's in O(n/m) in the average case where n is the effective number of stored elements and m is the number of of slots in your hash table iirc. However I could imagine that finding a good hash-function for vectors could be a mayor pain in the butt.
Sorting vectors probably isn't that easy as well but more feasable - using some self balancing tree structure like AVL trees (moderate implementation effort) one could get pretty good results. Since usually vertices / vectors aren't generated in large numbers during runtime, so building the AVL tree at startup / load time shouldn't be that much of a problem really and you get all the benefits of searching in binary trees (already mentioned: O(log(n)))...
No need to implement it yourself - std::set (and/or std::multiset) are usually very well implemented - better not to reinvent the wheel at all.
Olli Salli

This topic is closed to new replies.

Advertisement