Binary Search...I'm a little confused (code included)

Started by
4 comments, last by -dcx- 21 years, 6 months ago
#include <string>
#include <iostream>


#define DIM(array)	(sizeof(array) / sizeof(array[0]))


struct Key {
	std::string keyword_;
	unsigned count;
};


// BinarySearch -- returns the index of the keyword if "word" is a keyword found, otherwise -1.  Assumes that
//                 "keys" is sorted in ascending order.
//
int BinarySearch(const Key keys[], const unsigned size, const std::string &word)
{
	int low_index = 0, high_index = size - 1;
	while (low_index <= high_index) {
		int mid_index = (low_index + high_index) / 2;
		if (word < keys[mid_index].keyword_)
			high_index = mid_index - 1;
		else if (word > keys[mid_index].keyword_)
			low_index = mid_index + 1;
		else // We have a match.
			return mid_index;
	}

	// If we make it here, we have no match.
	return -1;
}


int main()
{
	Key keys[] = {
		{ "and",              0 },
		{ "auto",             0 },
		{ "band",             0 },
		{ "bor",              0 },
		{ "break",            0 },
		{ "case",             0 },
		{ "catch",            0 },
		{ "char",             0 },
		{ "class",            0 },
		{ "const",            0 },
		{ "const_cast",       0 },
		{ "continue",         0 },
		{ "default",          0 },
		{ "do",               0 },
		{ "double",           0 },
		{ "dynamic_cast",     0 },
		{ "else",             0 },
		{ "enum",             0 },
		{ "extern",           0 },
		{ "float",            0 },
		{ "for",              0 },
		{ "goto",             0 },
		{ "if",               0 },
		{ "inline",           0 },
		{ "int",              0 },
		{ "long",             0 },
		{ "not",              0 },
		{ "or",               0 },
		{ "private",          0 },
		{ "protected",        0 },
		{ "public",           0 },
		{ "register",         0 },
		{ "reinterpret_cast", 0 },
		{ "return",           0 },
		{ "short",            0 },
		{ "signed",           0 },
		{ "static",           0 },
		{ "static_cast",      0 },
		{ "struct",           0 },
		{ "switch",           0 },
		{ "template",         0 },
		{ "throw",            0 },
		{ "typedef",          0 },
		{ "union",            0 },
		{ "unsigned",         0 },
		{ "virtual",          0 },
		{ "volatile",         0 },
		{ "void",             0 },
		{ "while",            0 },
		{ "xor",              0 },
	};

	for (;; ) {
		std::string word;
		std::cout << "Word? >";
		std::cin >> word;
		if (BinarySearch(keys, DIM(keys), word) == -1)
			std::cout << "\"" << word << "\" is NOT a keyword!\n";
		else
			std::cout << "\"" << word << "\" is a keyword!\n";
	}
}
  
If anyone has the time, can you explain this code to me. One thing I don't understand is how the mid_index works and why the "keywords" are followed by a zero, ie. {"and", 0} But I'll need a good explanation anyways since I'm still relatively new to programming. [edited by - -dcx- on October 18, 2002 8:55:25 PM]
Advertisement
Ok, first I'll try to explain the principal of the Binary Search, then I'll get into the code. So first of all, a Binary search basically uses an ordered set of comparable values. The search then starts in the middle, and determines whether the search item is greater than, equal to, or less than the middle value in the set. If it is equal to, the search is, of course done. If the search item is greater than the middle value, the search is started again, this time starting from the middle of the upper half. The same is true if the search item is less than the middle value: the search would then be repeated from the middle of the lower half. This process is repeated until a value is found, or the algorithm runs out of search space (meaning there are no more middle values.)

So let's look at the code:

First we'll want to look at the data structures, we have a list of Keys, and a Key is just a structure that holds on to a string and a count.
struct Key {	std::string keyword_;	unsigned count;};   

For this algorithm, the count is irrelevant, I imagine that it is needed for a different section of code.
The string keyword is from the standard template library, so it is comparable with other strings with <, >, ==, as opposed to character arrays which depend on things like strcmp(). In this case we are using the keyword string for comparison in the search.

Next bit of code:
	Key keys[] = {		{ "and",              0 },		{ "auto",             0 },		{ "band",             0 },		{ "bor",              0 },etc...   

This is just building the list of Keys into an array called keys[]. the first part is the keyword's value, the second part is the count, which is initialized to zero,though it is not used in this example.

Ok, on to the search algorithm itself:
int BinarySearch(const Key keys[], const unsigned size, const std::string &word){   

the BinarySearch function takes as it's first parameter a set of keys. For the second, it takes the size of the keys array, and for the third the search string.
It returns the index of the search string(if found), or -1 if it wasn't found.
[edit] Note: const means that when the keys[] get passed into the function, they can't be modified by the function, same with the size and word.
	int low_index = 0, high_index = size - 1;   

This sets the lower index to 0, and the higher index to the last element in the array, this guarantees that you will start searching at the middle of the array.

	while (low_index <= high_index) {		int mid_index = (low_index + high_index) / 2;  

A loop is then started, saying that while we haven't exhausted our search space (as long as there is a middle value between the low_index and high_index) we will keep searching.
The mid_index is the middle place of the current search space, to get it, you add the upper and lower index and divide by 2.
Now we just have the basic comparisons to make to determine if we keep searching, or if we've found it:
		if (word < keys[mid_index].keyword_)			high_index = mid_index - 1;   

This statement checks to see if the search item is less than the middle element. If it is, we then know that it's going to be in the bottom half of the current search space, so we can set the upper element to the element right below the middle element.

		else if (word > keys[mid_index].keyword_)			low_index = mid_index + 1;   

If the search string isn't in the middle half, it has to be either at the middle element, or in the upper half, so with this statement, we check to see if it's in the upper half, and adjust the lower index if it is.

		else // We have a match.			return mid_index;   

Finally, if it's equal to the middle element, we have a match, so we can return the middle index as our value, because thats where we found it.

	}	// If we make it here, we have no match.	return -1;}   


This last bit is pretty much self-explanatory, if we get out of the loop without finding a match, we return -1 as the index, meaning there was no match.

Hopefully this is what you're looking for!

[edit - formatting]

[edited by - Tybal on October 18, 2002 10:28:25 PM]

[edited by - Tybal on October 18, 2002 10:30:44 PM]
This binary tree uses arrays instead of linked nodes. The second element of the array is i.e. 1 is left node the next is root node the fourth element is right node. I haven''t done this yet but search for "implementing binary trees with arrays". Or better yet goto oopweb.com and look thru "c++ book for scientists" their tree section explains this algo. A good book on data structures would be nice to have. Can''t recommend any since I don''t have one.
Argg... Ignore my previous post, I looked at binary and immediately I thought of binary trees, I'm just overworked, plus startrek was coming on so I replied in hurry w/o looking at the code in detail. Saw words like binary and index and I thought the topic was binary trees. Anyways, binary search is an algo where the array of items is split in half then you check your item's value against the middle index and if the middle element of the array contains your sought value you're done otherwise you split the bottom half of the array in two and check that half's middle index and continue doing so until you hit only one element then you bottom out of the recursion and start checking the top halfs of the array in similar fashion.

The binary tree algo I was talking about in my previous post uses single array of user objects or built in types instead of linked nodes. The algo uses mathematics indexing into the array i.e. the querying of tree elements is done using math rather than pointers. Imagine a root node that has a left and a right child. Imagine you have an array of integers. Root node is assigned an index of 1 so that the following math works out and you're not getting zero for more than one node. Each node has a unique index into the array.

left = 2i
right = 2i + 1
parent = i/2 rounded down so 1/2 = 0.5 which is 0 or 3/4 = 0.75 or 0 again or 2/2 is 1 not zero, etc.

So, if you want to write root of the tree as 10 for example into the array you write it at index 1 like so:

array[1] = 10;

Then you insert another value say 9 into the tree but before you do you need to figure out whether the 9 should be inserted into left child node or right child node of the root node. If 9 is less than root i.e. 10 then you insert 9 to left child otherwise it goes into right child node. Since you're using one array to store the 9 you need to figure out the index to store 9 at. The values that you're inserting into tree nodes are not placed in any order into the array. You're using the math indexing to get at those values, you can't just read each element of the array in linear fashion in order to get the values in sorted order.

So let's figure out the index the 9 should be inserted at. Let's get root value first. We know root is stored at index 1 so:

if array[1] > 9 get left else if array[1] < 9 get right so: 9 is less than root thus store at left child node of root like so:

left index equals 2i or 2 times root i.e. 1 thus left = 2 so:

array[left index] = 9 or array[2] = 9;

Once the array is filled with your values you can do operations such as: printing values in sorted order or searching the tree for a value. I haven't implemented this yet so I don't know all details however the indexing is the important part of this algo. Replace the 'i' in the math equations with the index of this node so you can figure out the left or right indices. In the array the parent node's index is first followed by left then right childs. Thus in array you have:

array[0] equals 0
array[1] equals root
array[2] equals left child of root
array[3] equals right child of root
array[4] equals left child of root's left child node
array[5] equals right child of root's left child node

You can use hash tables to speed up searching. Hash table is an array of arrays or an array of pointers to lists i.e. an array of lists. The table looks like this:

hashed value->value->value->end
hashed value->end
hashed value->value->end
hashed value->value->value->end
hashed value->value->end
hashed value->end

Say you want to insert 10 into the table. You use hash function to create an index value or hash value for 10 so let's say that index created is 3 the table looks like this now:

3->10->end

Then you insert 11 into list so you hash 11 again to get new index i.e. 3 again in this case so table is:

3->10->11->end

So you do this for all your values you want in the table. Then when you go searching for a value you hash it then using the new index you look into the offset and into the list so your sought value is 10 you hash it and get 3(just like before since hash function doesn't change its output) and then you look thru all the values of list at index 3 of the array for value 10 and sure enough it's the first one in the list. So you're done searching for 10. I haven't implemented hash table nor binary search trees yet since I use STL instead thus there might be some gotchas that I've overlooked but I think this is how they work. Something else though, if you use binary trees then make sure you randomize your values before storing them inside the tree so that the tree gets balanced and your searches don't take long. Otherwise you'll end up with a single list thus searching thru it will take linear time. STL implementation of maps/sets uses red/black trees that use different insertion qualification algos to make sure the tree stays balanced. It uses the nodes colors i.e. black or red to determine if new child nodes need to be built. Despite what the books tell you, you need to know something about the implementation or have a document that tells you the best way the container should be used. This is also because you need to know whether reallocations within the container causes your pointers to become invalid or not. Then there is the issue of having algo complexity values of each containter so you know which one is fastest for your particular case.


[edited by - JD on October 19, 2002 2:58:26 AM]
Thanks for the explanations on both Binary Search and for the extra info on Binary trees. Unfortunately I''m having some difficulties with my current project so I''ll have to concentrate on that first. But I''ll definitely read this over once I finish the project I''m working on now.
I just wanted to show you there are other ways to skin the cat and that some ways are better then others. Good luck in your project.

This topic is closed to new replies.

Advertisement