Faster than Binary Search?

Started by silvermace
4 comments, last by GameDev.net 19 years, 7 months ago
Sup. Say I have a list of 10,000,000 sorted integers in a range, say between A and B. Are there any algorithms that are faster than a good old binary search? I know about hash maps, but is there anything else? Cheers, Danu. PS: this is just out of curiosity.
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website
Without any specific information, yes, binary search is the fastest. Variations on binary search, or entirely new searches, can be faster depending on exactly what kind of data will be "normal" in the list, and how often the list will need to change.
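
For reference, a minimal sketch of the baseline being discussed: a plain iterative binary search over a sorted array, in C++ (function and variable names are just illustrative):

    #include <vector>

    // Returns the index of `key` in the sorted vector `a`, or -1 if absent.
    // O(log n): at most about 24 probes for 10,000,000 elements.
    long binarySearch(const std::vector<int>& a, int key)
    {
        long lo = 0, hi = static_cast<long>(a.size()) - 1;
        while (lo <= hi)
        {
            long mid = lo + (hi - lo) / 2;   // written this way to avoid overflow
            if (a[mid] == key)      return mid;
            else if (a[mid] < key)  lo = mid + 1;
            else                    hi = mid - 1;
        }
        return -1;
    }
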
When do you ever have an array (not a list, I assume) of 10 million integers for any useful purpose?
Surely the bulk of the work would be sorting those 10 million integers to begin with?

Anyway...

If A and B are close enough together that there would be many duplicates, then you could create a lookup table where each entry points to the first occurrence of that value (a sketch follows below). Though if integers are the only items in the array, the larger array would be pointless, as you could store the data in a more compressed form.
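
A hedged sketch of that first-occurrence table, assuming the values really do fall in a small range [A, B] and the array is sorted (all names here are just illustrative):

    #include <vector>

    // Builds a table of size (B - A + 1) where firstIndex[v - A] holds the index
    // of the first occurrence of value v in the sorted array, or -1 if v is absent.
    std::vector<long> buildFirstOccurrence(const std::vector<int>& data, int A, int B)
    {
        std::vector<long> firstIndex(B - A + 1, -1);
        for (long i = static_cast<long>(data.size()) - 1; i >= 0; --i)
            firstIndex[data[i] - A] = i;   // walking backwards, so the smallest
                                           // index for each value is what remains
        return firstIndex;
    }

A lookup is then a single array access, firstIndex[key - A], at the cost of (B - A + 1) extra entries of memory.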

Your main problem with a dataset that huge isn't the binary search algorithm, but the fact that there will be so many cache misses. With 10M items a binary search only has to look at a maximum of 24 values (since 2^24 > 10,000,000), but nearly every one of those probes can land in a different cache line.

If the values are fairly evenly distributed then you could do an interpolation search, though that's at best only ever-so-slightly faster than a binary search.
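
A minimal interpolation-search sketch, assuming a sorted array of ints (again, names are just for illustration):

    #include <vector>

    // Instead of always probing the middle, guess the probe position from where
    // `key` sits within the current value range. Fast only when values are
    // fairly evenly distributed; it degrades badly on skewed data.
    long interpolationSearch(const std::vector<int>& a, int key)
    {
        long lo = 0, hi = static_cast<long>(a.size()) - 1;
        while (lo <= hi && key >= a[lo] && key <= a[hi])
        {
            if (a[hi] == a[lo])                 // all values equal: avoid dividing by zero
                return (a[lo] == key) ? lo : -1;
            long pos = lo + static_cast<long>(
                (static_cast<double>(key - a[lo]) / (a[hi] - a[lo])) * (hi - lo));
            if (a[pos] == key)      return pos;
            else if (a[pos] < key)  lo = pos + 1;
            else                    hi = pos - 1;
        }
        return -1;
    }
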

Otherwise a hash table could be faster. However, with 10 million integers you probably don't want a hash table of that size sitting alongside the array; the memory requirements would be ridiculous.
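
For completeness, a minimal modern C++ sketch of the hash-table option using std::unordered_set, which gives expected constant-time lookups at the cost of roughly another copy of the data plus bucket overhead:

    #include <unordered_set>
    #include <vector>

    // Build once: roughly doubles the memory footprint of the data set.
    std::unordered_set<int> buildTable(const std::vector<int>& data)
    {
        return std::unordered_set<int>(data.begin(), data.end());
    }

    // Expected O(1) membership test.
    bool contains(const std::unordered_set<int>& table, int key)
    {
        return table.count(key) != 0;
    }
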

For a dataset this big the optimal solution might shift from the fastest to the least memory-hungry one. If it's for some kind of server application then you would begin distributing the load among multiple servers, though you'd probably find that the network traffic was more of a bottleneck anyway.

In the real world you have to think outside the box, and get creative.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Yeah, it was just a hypothetical question.

I seem to have (re?)discovered a hybrid binary search algorithm that takes constant time no matter the data size. Still working on the kinks and trying to implement a useful version of it, i.e. for strings, objects, etc.

thanks for the replies,

Cheers,
Danu
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website
With 10 million entries, a balanced binary tree would require at most 24 comparisons to find an entry. Really, unless you're using iris scans plus fingerprints and image recognition as a three-part key, the major speed problem with 10 million elements would be getting the data between permanent storage, RAM, and the processor cache.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote: Original post by silvermace
Yeah, it was just a hypothetical question.

I seem to have (re?)discovered a hybrid binary search algorithm that takes constant time no matter the data size. Still working on the kinks and trying to implement a useful version of it, i.e. for strings, objects, etc.

thanks for the replies,

Cheers,
Danu
Constant time and binary search are contradictory. That reminds me of an outrageous compression claim made about a year ago.
Only a hash table or hashing function would give you constant-time access.
My guess is that you're doing an interpolation search. In the best case it may be constant time, but you seldom hit the best case, and it's better overall to optimise for the worst case.

So what exactly are you doing?

This topic is closed to new replies.
