Modified Binary Search

6 comments, last by Zahlman 17 years ago
Say that you have a guess about the area where the item you're looking for is. Is there an existing algorithm that takes this guess into account? (The item could be exactly at the guess, or around there, but occasionally somewhere else completely.)
You could try searching in a fixed-size region around where your guess is, and then falling back to regular binary search if that doesn't work. This might be better than doing the full binary search all the time, but it's only really helpful if your guesses are fairly accurate.
Joshua Barczak, 3D Application Research Group, AMD
There is an algorithm, but the math is a little tricky. Basically, you need to have a guess as to where the item is, and you need a variance (the higher the number, the fuzzier your idea of where exactly the item is). This gives you a Gaussian probability distribution over the entire range. Then at each step you split the search area so that instead of an equal number of items being on each side of the split, an equal probability is on each side of the split.
Here's some pseudocode for the basic binary search, gratefully adapted from that at Wikipedia: Binary Search:
Index BinarySearch(Array array, Value value)
{
  Index low = 0
  Index high = array.Size - 1
  Index current = low + (high - low) / 2
  while (low <= high)
  {
    if (array[current] > value)
    {
      high = current - 1
    }
    else if (array[current] < value)
    {
      low = current + 1
    }
    else
    {
      return current
    }
    current = low + (high - low) / 2
  }
  return not_found
}

Here's a modified version that might do what you're wanting, though I don't know for sure; I just came up with it on the spot.
Index AlteredBinarySearch(Array array, Value value, Index estimate, Fraction tightness)
{
  Index low
  Index high
  Index current
  if (array[estimate] > value)
  {
    low = 0
    high = estimate - 1
    current = Floor(low + (high - low) * tightness)
    while (low <= high)
    {
      if (array[current] > value)
      {
        high = current - 1
      }
      else if (array[current] < value)
      {
        low = current + 1
      }
      else
      {
        return current
      }
      current = Floor(low + (high - low) * tightness)
    }
  }
  else if (array[estimate] < value)
  {
    low = estimate + 1
    high = array.Size - 1
    tightness = 1 - tightness
    current = Ceiling(low + (high - low) * tightness)
    while (low <= high)
    {
      if (array[current] > value)
      {
        high = current - 1
      }
      else if (array[current] < value)
      {
        low = current + 1
      }
      else
      {
        return current
      }
      current = Ceiling(low + (high - low) * tightness)
    }
  }
  else
  {
    return estimate
  }
  return not_found
}

What it does is first check the estimate. If the estimate is the item, then it just returns that. If the estimate is too high, then we need to search the lower portion; if it's too low, then we need to search the upper portion, and also adjust the tightness parameter, which I'll explain next.

After doing the first test on the estimate and setting variables up as necessary, we continue like the normal search, except that when we calculate the new current, we don't multiply by 1/2 (divide by 2), but instead multiply by our tightness value, which should be somewhere between 1/2 and 1. If we're looking in the lower segment, that means we want to start searching near the upper end of it, since that's where the estimate was. So if our tightness was 3/4, for example, instead of moving halfway down the lower section, we wouldn't move as far from the estimate. If we're searching the upper portion, the tightness needs to be adjusted to the range 0 to 1/2, where 0 is the tightest instead of 1. We do that by subtracting the original tightness from 1. I think we also need to do a Ceiling() instead of a Floor() when doing the upper portion, to avoid getting stuck in an infinite loop. I believe the standard binary search avoids this due to the way integer division by 2 rounds down.

The code could probably be reorganized some, but that's not too significant.

So, example. Say we have these items (preceded by element number):
( 0)   4
( 1)   7
( 2)  13
( 3)  42
( 4)  43
( 5)  44
( 6)  45
( 7)  59
( 8)  63
( 9)  64
(10) 105
(11) 113
(12) 132
(13) 135
(14) 139
(15) 141
(16) 155
(17) 168
(18) 190
(19) 225

Let's say we're looking for the value 7, and we're pretty sure it's near the front of the list, so we estimate index 5, and we specify a tightness of 3/4.

First we check index 5 itself, see that it is 44, so we need to search the lower portion, indices 0 through 4. Tightness remains the same. Our next current is Floor(low + (high - low) * tightness), or Floor(0 + (4 - 0) * 3/4), which is 3. So we check index 3, which is 42. That's greater than 7, so we reduce high to 2. Current becomes Floor(0 + (2 - 0) * 3/4), which is 1. Index 1 does contain what we want, so we've found it.

Now let's say that we used the same estimate and tightness, but were searching for the value 168. array[5] is 44, so we're too low. Tightness gets reversed to 1/4 instead of 3/4, low becomes 6, and high is 19. The new current becomes Ceiling(6 + (19 - 6) * 1/4), which is 10. array[10] is 105, still too low. Low becomes 11, and current becomes 13. array[13] is 135, too low. Low becomes 14, current becomes 16. array[16] is 155, too low. Low becomes 17, current becomes 18. array[18] is 190, too high. High becomes 17, current becomes 17. array[17] is what we were looking for, so we return it.

So when our estimate is reasonable, we find the item more quickly; but if the estimate is off, the algorithm doesn't move away from the estimate as quickly, and the search can take longer.
"We should have a great many fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke
Depending on exactly what it is you want, people might be making this a little harder than it needs to be. The classic binary search just takes a range and uses the middle element as a pivot point. Assuming you don't want to do continuous refinement of your guess, you can just modify this by adding one extra step that takes your guess and a variance. This gives you three sub-ranges: the range including the guess, the range below it, and the range above it. Do a quick check to see which sub-range the actual value is in and fall back to the classic algorithm.
-Mike
Man what a coincidence, I just finished uploading part of my (unfinished) website two days ago, that contains this very algorithm. I call it Interpolation Search.

Actually, I'd very much welcome any comments on the article.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Sounds similar to what's used in iterative-deepening MTD(f) and narrow-window Alpha-Beta searches.

Have the core search function return special values for Failed_Low and Failed_High (perhaps simply MinInteger and MaxInteger), and wrap that up in a bit of logic to account for those cases and perhaps provide the initial guess window as well in there.

I could see this sort of thing being a medium-sized advantage when recent queries are likely to be near the current query. I would definitely be liberal with the size of the initial guess window (perhaps a sqrt(n) window size) and would fall back immediately to the largest possible window matching the fail state (the idea being that failures should be rare, and when a failure occurs it's usually because we are doing random queries, where no approximation is going to be beneficial).

Here's my idea, which does not require a "variance". The idea is to look at the "hint" element, then the range of 3 elements on one side, then the 7 after that, etc. until we figure out which range (if any) the element is in, and then binary search the range normally. Since the subrange size grows exponentially, we will cover the search space in O(lg N) steps, and can thus guarantee O(lg N) running time even if our guess is horribly wrong :) (This approach should also be a fairly good heuristic towards cutting things up into "equal probability" regions, I think.)

(Not tested!)

#include <algorithm> // std::lower_bound

template <typename RAI>
RAI move_in_range(RAI r, int amount, RAI begin, RAI end) {
  // being careful not to create invalid iterators: clamp the move to [begin, end]
  if ((amount < 0) && (r - begin) < -amount) { return begin; }
  if ((amount > 0) && (end - r) < amount) { return end; }
  return r + amount;
}

template <typename RAI, typename T>
RAI binary_search_hint(RAI begin, RAI end, RAI hint, const T& element) {
  if (element == *hint) { return hint; }
  int size = 1;
  RAI lo, hi;
  // Only one of these branches runs: grow the window [lo, hi) exponentially
  // away from the hint until it brackets the element (or hits an end).
  if (element > *hint) {
    lo = hint + 1;
    hi = move_in_range(lo, size, begin, end);
    while (hi != end && element > *(hi - 1)) {
      lo = hi;
      size = size * 2 + 1;
      hi = move_in_range(hi, size, begin, end);
    }
  } else {
    hi = hint;
    lo = move_in_range(hi, -size, begin, end);
    while (lo != begin && element < *lo) {
      hi = lo;
      size = size * 2 + 1;
      lo = move_in_range(lo, -size, begin, end);
    }
  }
  // No need to write the last step ourselves ;) -- though note that
  // std::binary_search returns bool, so std::lower_bound is what we want.
  RAI pos = std::lower_bound(lo, hi, element);
  return (pos != hi && *pos == element) ? pos : end; // end means not found
}


Of course, you could also use a variance and specify it as the initial "size".

("RAI" is of course short for "RandomAccessIterator", because I'm lazy like that. :P)

This topic is closed to new replies.
