Archived

This topic is now archived and is closed to further replies.

xe_sphinx

Binary Search


Hi everyone. How can I implement a binary search on a linked list? I have done a binary search with an array, but I did not manage to do it with a linked list. Does anybody know how to do that, or can you direct me to a place on the net where it is explained? Thanks.

Windows (N): A 32-bit patch to a 16-bit graphical interface based on an 8-bit operating system originally encoded for a 4-bit processor, written by a 2-bit company that can't stand 1 bit of competition.

Guest Anonymous Poster
Because the nodes of a linked list can reside at any location in memory (as opposed to an array, which uses a contiguous portion), a traditional binary search will fail. Look into skip lists in order to efficiently search a linked list.
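For a flavor of what the anonymous poster means, here is a minimal two-level skip-list search sketch (the node layout, the `skipSearch` name, and the fixed level count are illustrative assumptions; a real skip list assigns each node a random number of levels):

```cpp
#include <cassert>

// Two-level skip-list sketch: level 1 is an "express lane" that skips
// over nodes, level 0 is the ordinary sorted linked list. A search
// scans the express lane first, then drops down to the base lane, so
// it visits far fewer nodes than a plain linear scan.
const int LEVELS = 2;

struct SkipNode {
    int key;
    SkipNode* next[LEVELS];  // next[0] = base list, next[1] = express lane
};

// head is a sentinel whose key is smaller than every real key.
bool skipSearch(const SkipNode* head, int key) {
    const SkipNode* cur = head;
    for (int lvl = LEVELS - 1; lvl >= 0; --lvl)
        while (cur->next[lvl] && cur->next[lvl]->key <= key)
            cur = cur->next[lvl];  // advance while we do not overshoot
    return cur->key == key;
}
```

With randomized levels, this structure gives expected O(log n) search while still being a linked structure underneath.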

*throws stuff together*

  typedef struct binNode_s {
      dataType data;
      struct binNode_s *parent;
      struct binNode_s *left;
      struct binNode_s *right;
  } binNode_t;


hope it helps ;X
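For what it's worth, a search over that node type might look like this (a sketch only; `dataType` is assumed to be `int` here, and `bstFind` is a made-up name):

```cpp
#include <cassert>
#include <cstddef>

// Same node shape as above, with dataType assumed to be int.
typedef struct binNode_s {
    int data;
    struct binNode_s *parent;
    struct binNode_s *left;
    struct binNode_s *right;
} binNode_t;

// Classic BST search: each comparison discards one whole subtree, so
// a balanced tree is searched in O(log n) steps.
binNode_t* bstFind(binNode_t* root, int key) {
    while (root != NULL) {
        if (key == root->data)
            return root;
        root = (key < root->data) ? root->left : root->right;
    }
    return NULL;  // not present
}
```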

Guest Anonymous Poster
That's a binary search tree, not a traditional linked list. Don't confuse the poor guy!

quote:

Create a sorted array of pointers to your linked list nodes


Considering the sort takes an order of magnitude more time than a linear search does, that's not much of a solution :p but it does answer the question.

struct btreeNode
{
    dataType data[20];
    btreeNode* branches[20];
    int iUsed;
};

...
You were joking, right xe_sphinx?

Magmai Kai Holmlor
- Not For Rent

What I actually mean is: I have a sorted linked list (it is kept sorted all the time), and I want to search for a specific item.
I could use a linear search, but I need performance.

This is the binary search for an array:
=======================================
void BinarySearch(int left, int right, int test)
{
    // listmain is the (sorted) array being searched; it is assumed
    // to be in scope, as in the original post.
    while (left <= right)
    {
        int middle = (left + right) / 2;
        if (test == listmain[middle])
        {
            cout << "match found " << listmain[middle];
            return;
        }
        if (test > listmain[middle])
            left = middle + 1;   // discard the lower half
        else
            right = middle - 1;  // discard the upper half
    }
}



I have tried to implement it into my LinkedList:
================================================
char * EnemyLinkedList::BinarySearch(int iLow, int iHigh, char * szName)
{
    Enemy * pTemp = pHead;
    UINT middle;

    while (SearchFlag == false && iLow <= iHigh)
    {
        middle = (iLow + iHigh) / 2;

        for (UINT i = 0; i < middle; i++)
            pTemp = pTemp->pRight;

        if (szName == pTemp->GetName())
        {
            //cout << "match found " << pTemp->GetName();
            SearchFlag = true;
            return (pTemp->GetName());
        }
        else
        {
            if (szName > pTemp->GetName())
                iLow = middle + 1;
            else
                iHigh = middle - 1;

            BinarySearch(iLow, iHigh, szName);
        }
    }
}
===================================

But this one does not work. Any ideas why?



In general, linked lists are not a good choice of data structure for ANYTHING. That is primarily because of their inherent inefficiency.

If you want fast insert, remove, search, and ordered traversal, which is what it sounds like you want, use some form of balanced binary search tree.

quote:
Original post by c_wraith
In general, linked lists are not a good choice of data structure for ANYTHING. That is primarily because of their inherent inefficiency.


Right, that's why they're in every data structures book ever written; they just like screwing with our minds. Put down the drugs, c_wraith.

Every kind of data structure has different uses and different attributes. If any two had the same, they'd be the same data structure. There is no magic bullet data structure that is better than all others. There might be data structures that are worse than all others, but they wouldn't make publication. Linked lists are not useless.

Linked lists rule at splicing. There's no other data structure that can merge two containers together in constant time as simply as linked lists. They're extremely useful in many situations.

However, linked lists have just about the worst random access of any data structure, which is what you need if you want to do a search through sorted data. In fact, "sorted" and "linked list" are two phrases that should never appear together in the same sentence. To get proper performance, and since you're sorting your data (indicating that the sequence of insertion is not important), you really need to switch to a tree or hash table.
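To make the splicing point concrete, `std::list::splice` relinks nodes in constant time (a minimal sketch; the `spliceAll` wrapper name is mine):

```cpp
#include <cassert>
#include <list>

// Move every node of src onto the end of dst in O(1): splice only
// rewires node pointers, it copies no elements. src is left empty.
void spliceAll(std::list<int>& dst, std::list<int>& src) {
    dst.splice(dst.end(), src);
}
```

No array-backed container can merge two sequences this cheaply; a vector would have to copy every element of the second sequence.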

quote:
Original post by c_wraith
In general, linked lists are not a good choice of data structure for ANYTHING. That is primarily because of their inherent inefficiency.



quote:
Original post by Stoffel (follow up to c_wraith)
Right, that's why they're in every data structures book ever written; they just like screwing with our minds. Put down the drugs, c_wraith.

[snip: good, useful properties of linked lists]



The frustration that c_wraith feels is, I think, the fact that for most people the linked list is used as a general-purpose expanding container. Not only is it in every data structures book, it's the first data structure, and often the most advanced one you learn in an intro programming course. std::vector can handle 90% of the cases for which a lot of people are writing their own linked lists. 90% of the remaining 10% are better served by std::map.

Keep in mind that the STL is slooooow. I did a profile comparison of my own general doubly linked list against both the STL vector and list, and mine kicked some ass. The node insertion time for the vector was on average 0.223 ms, and 0.331 ms for the list. Yeah, that seems pretty quick, but insertion at any location in my list took no longer than 0.003 ms. You can do quite a bit better yourself. I think that's why anyone ever codes anything themselves. I always profile "canned" code and do it myself if I think it can be done better.

"You are too useless. And now I must beat you." - English subtitle in a Hong Kong movie.

Edited by - lunarss on November 2, 2001 8:09:38 PM

Linked lists may not be useless, but they're the container equivalent of a bubble sort (a bubble sort is extremely small, so if you only have about 64 bytes of ROM for your sorting code, a bubble sort could come in handy - so there's one extreme case where it's ideal). If you can use a list, you can use any [ordered] container. If you can use a bubble sort, you can use any sort. The number of nodes you're working with must be so small that it can't really matter.

A linked list is an easy way to keep an ordered list. If it's not ordered (i.e. the order doesn't mean anything, e.g. you can sort it), a linked list serves little purpose.

In case it wasn't clear, it's impossible to use a binary search on a linked list. You have to use a binary search tree (or some other kind of tree) to get fast searches and handle a dynamic number of nodes.


quote:

Keep in mind that stl is slooooow...


223 ms to insert a node? Do you really think the STL could only insert 5 nodes per second on your computer? VB isn't even that slow :p I don't think you could make a class that performed that poorly if you tried! I bet you clocked the construction time as well as an insert... Unless you program your advanced tree container in optimized assembly, the STL will produce code that is faster than your own if you do the same thing with it.

It took my computer 4147 ticks on average to insert a node into a list; I did it a thousand times. That's 9.2 us each (this is a 450 MHz machine). Slightly faster than the 3000 us you say your container took - unless that was for a thousand inserts.

Magmai Kai Holmlor
- Not For Rent


Edited by - Magmai Kai Holmlor on November 3, 2001 3:21:51 AM

Well, I do use the VC++ profiler, if that tells you anything! That also means I'm using Microsoft's STL, which was implemented by a company called "Dinkumware", hehe. Here are some stats I just collected, copied right out of my debug window (with a little formatting so they fit):

Using std::list:

This is a release build with full optimization. Nothing else is going on at all in the code besides 10,000 std::list push_front calls.

Module Statistics for testbedfinal.exe
--------------------------------------
Time in module: 38.926 millisecond
Percent of time in module: 100.0%
Functions in module: 4
Hits in module: 20002
Module function coverage: 75.0%

     Func              Func+Child       Hit
  Time      %        Time      %      Count   Function
---------------------------------------------------------
14.480   37.2      25.000   64.2      10000   std::list<...>::erase
13.926   35.8      38.926  100.0          1   _main (main.obj)
10.521   27.0      10.521   27.0      10001   operator delete


Here is the same operation done with my list. The function hits aren't caught by the profiler because they are all template-based. That figure is the FULL time spent in the program, including garbage collection.

Module Statistics for testbedfinal.exe
--------------------------------------
Time in module: 9.636 millisecond
Percent of time in module: 100.0%
Functions in module: 2
Hits in module: 1
Module function coverage: 50.0%

     Func              Func+Child       Hit
  Time      %        Time      %      Count   Function
---------------------------------------------------------
 9.636  100.0       9.636  100.0          1   _main (main.obj)

I don't know. Seems to me that my implementation is a bit faster. I also tested doing insertions with a really bloated class, and my list still performed a little better.

Is SGI's STL implementation any better than Microsoft's?


Edited by - lunarss on November 3, 2001 3:56:11 AM

Guest Anonymous Poster
if (szName == pTemp->GetName())

In this line of code, don't you want to use strcmp() to get a match?
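For readers following along: with `char*` operands, `==` compares the pointer values, not the characters they point at. A minimal sketch of the fix (the `sameName` helper is hypothetical):

```cpp
#include <cassert>
#include <cstring>

// Two distinct buffers holding the same text compare unequal with ==
// (different addresses) but equal with strcmp (same characters).
bool sameName(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;  // 0 means the strings match
}
```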

quote:
Original post by Anonymous Poster
if (szName == pTemp->GetName())

In this line of code, don't you want to use strcmp() to get a match?



Thanks, I didn't see this mistake.




Can you profile a release build?
The std::list was 40% faster in release (optimized for speed).
Insert ten thousand nodes, and delete ten thousand nodes.
Then let's compare inserts to inserts, deletes to deletes, and then look at the overall picture.

(I only have the standard edition, so I don't have the profiler.)

Magmai Kai Holmlor
- Not For Rent

Edited by - Magmai Kai Holmlor on November 3, 2001 3:31:47 PM

Okay, I'm a wee bit confused on something...
char * EnemyLinkedList::BinarySearch(int iLow, int iHigh, char * szName)


First off... you're using the name as the means of comparison... that much I can understand. But then... you're passing the same name back if you find it? ...Not sure how that would help you in your implementation, but okay, fine. Now, the problem I see is with the lines

Enemy * pTemp = pHead;
...
for (UINT i = 0; i < middle; i++)
pTemp = pTemp->pRight;


Every time you need to find the "middle" of the list, you start at the beginning and follow the "right" pointer. This conflicts with your statement "i need performance". Let's say, for instance, that you have 100 nodes in your linked list, and the "match" is #90. On your first time through, you'll have a "middle" of 50, so 50 steps through the list before you make your comparison. Then you see that the value you were looking for is farther right, so the next time through you set your "middle" to 75 and again zip through the list (50+75). Next time through, you're looking at 87 (or 88, depending on rounding), so you step through again (50+75+87). Well, one final time through, and you're going to step OVER #90 on your way to 94 (50+75+87+90 = 352 steps, 4+ comparisons), and you're still not going to be there, as you only do your comparisons at the "middle" nodes.

Now, as you "need performance", I'm assuming you're working with a wee bit more than 100 nodes. For 1000 nodes, this problem balloons upwards of 4000 steps or much higher, as your chance of hitting "middle" decreases greatly.

As I see it, you have one of three options.

First off, if you get to 50 and you still have to go farther right, then don't run through those first 50 nodes again! This should speed you up a LOT. Instead, try passing a "farthest left" node pointer into the function (on calling the function for the first time, just pass it pHead) and use it like you did pHead. You'll have to play with your highs and lows, and use a temp pointer in the function should you need to backtrack.

If you really want your linked list (and are willing to have "decent" performance), ***AND keep it simple***, do a single loop through the list, starting from pHead, and compare every node along the way.

BUT if you're hellbent on implementing a (really, really fast) binary search, then use a binary search tree, not a linked list. With a tree, you would be looking at the same number of steps as calls to BinarySearch. Whether you have 100 nodes or 100,000, you would only be looking at a handful of steps (on the order of log2 of the node count) to find #90(000).
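If writing the tree by hand is unappealing, the standard library already ships a balanced search tree: `std::set` (typically a red-black tree) gives the log-time lookup described above. A sketch, with a hypothetical `enemyExists` helper:

```cpp
#include <cassert>
#include <set>
#include <string>

// std::set keeps its elements in a balanced binary search tree, so
// find() runs in O(log n) - the performance the linked-list binary
// search was trying (and failing) to achieve.
bool enemyExists(const std::set<std::string>& enemies,
                 const std::string& name) {
    return enemies.find(name) != enemies.end();
}
```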

Should you go the route of the binary tree and have trouble (or just need to get started), feel free to email me HERE and I'll see what I can do.

-Tok


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
*Howdy Kids, Do /YOU/ know what time it is? It's tangent time* -Baldor the Bold

Hey Magmai! To answer your question: you can actually profile a release build with the integrated profiler. It comes in very handy for quickly looking at a piece of code. The comparison I was doing was for the whole ball of wax: inserting and deleting 10,000 nodes. I have found that with larger pieces of data, both lists are closer in speed. I think that's because the STL uses a different memory management scheme, where it takes nodes from a previously allocated heap, so there isn't the overhead of allocating the memory for larger objects. From what I know, memory management can be done better than how the STL does it, but that's another thing for another day. ; ) I'll try doing some finer-grained testing as you suggest, and I'll let you know what I find. Have a good night. Later.

quote:
223ms to insert a node? Do you really think the STL could only insert 5 nodes per second on your computer?


He actually said 0.223 ms to insert a node. So that's about 4484 inserts a second.

Edited by - Zipster on November 4, 2001 2:20:03 AM

quote:
Original post by Grib
The frustration that c_wraith feels is, I think, the fact that for most people the linked list is used as a general-purpose expanding container. Not only is it in every data structures book, it's the first data structure, and often the most advanced one you learn in an intro programming course. std::vector can handle 90% of the cases for which a lot of people are writing their own linked lists. 90% of the remaining 10% are better served by std::map.


Precisely. And I'll stand by my assertion that linked lists are very rarely the best data type to use... When looking at what operations will be used by a given application in one of its data structures, the ones used most often are rarely the ones that are efficient in a linked list.

As Stoffel pointed out, there are a couple of cases where linked lists are more efficient than anything else. But those are very rare applications.

The thing I think you're missing is the extensibility of linked lists. You can use them for stacks, skip lists, queues, deques, heaps, hash tables, and I'm sure many other containers that I didn't mention. All of those can be implemented easily using a linked list. There are ways to make them almost as fast as arrays, too, if you do your research. What other container would you use for dynamic data anyway? You can't make much use of an array because you'd continually need to resize it, which is much less efficient than a simple node insertion. The bottom line is that if you don't know your data management requirements, you need a dynamic data structure; enter the linked list.

BTW, Magmai:
I'm fairly new to using the VC++ profiler and realized that the listed times are in seconds, not milliseconds. The profiler readout is misleading in that it labels the data as milliseconds. So 0.013 is actually 13 milliseconds. I clocked, in a release build, the insertion time for one int. Performance optimization was turned on for the build. The STL list clocked in at 13 milliseconds. My list clocked in at 3. I'm also going to profile it using the performance counter, but I think the results will be similar. That seems obscenely slow, but then again I'm only testing with a P2 450. ; )

Edited by - lunarss on November 5, 2001 1:24:29 AM

quote:
Original post by lunarss
What other containers would you use for dynamic data anyway?? You can't make much use of an array because you'd continually need to resize it which is much less efficient than a simple node insertion. The bottom line is that if you don't know your data management requirements, you need a dynamic data structure; enter the linked list.

Dynamically reallocated arrays are a lot more efficient than you give them credit for... Have you actually timed them?

In my remarkably quick test using on-hand data structures (Java's java.util.LinkedList and java.util.Vector classes), I found that creating million-element lists was about twice as fast with the dynamic array implementation (Vector) as with the linked list (LinkedList) implementation.

You may want to do more thorough testing in C, so that the environment is more deterministic. Due to HotSpot optimizations, I saw some interesting (factor of two) fluctuations in performance at runtime in Java. Still, the value I reported above uses the best time for the LinkedList and the worst time for the Vector. I was just doing some quick and dirty testing to get some preliminary experimental numbers.
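A quick-and-dirty C++ version of that experiment might look like the following (a sketch only; the element count and clock granularity are arbitrary, and a real benchmark needs warm-up runs and repetition):

```cpp
#include <cassert>
#include <ctime>
#include <list>
#include <vector>

// Time n push_back calls on any container. A vector reallocates
// occasionally but makes only O(log n) allocations in total, while a
// list performs one allocation per node - which is usually why the
// dynamic array wins this kind of test.
template <typename Container>
double timeAppends(int n) {
    std::clock_t start = std::clock();
    Container c;
    for (int i = 0; i < n; ++i)
        c.push_back(i);
    return double(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Running `timeAppends<std::vector<int> >(1000000)` against `timeAppends<std::list<int> >(1000000)` reproduces the comparison above in a more deterministic environment than the JVM.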

It all comes down to usage. If you often remove or add elements anywhere but at one end of the container, a vector will not be a good idea. If the order of those elements is important (i.e. they are unsorted and must stay in sequence), linked lists are the way to go. Otherwise you want a map or hash table of some kind.
