#### Archived

This topic is now archived and is closed to further replies.

# How to sort a VERY large array

This topic is 6292 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi, I''m developing an expert system and I''m in trouble accesing data in the knowledge base fast enough. I have an array that grows to hold millions of items. Each item holds responses like yes/no/maybe/na to specific questions. When it comes to find a specific diagnostics to a chain of symptoms, a question is asked and all ''no'' clauses must not be accesed again. The next step is to take the possible clause with the most ''yes'' matches and ask the next question. My problem is finding the item with ''the most yes answers'', especially in early stages of searching. I don''t think I''ve been clear enough, but my English is not helping me right now. Thank you for your responses Regards, Horia Of all the things I''''ve lost I miss my mind the most

##### Share on other sites
well assuming you represent the various responses with numerical values (eg no = 0, yes = 1, maybe = 2, na = 3) the best sort routines would be either quick sort or a histogram sort (you create a bin per possible value, run through the list sticking items in the appropriate bin, then recombine as appropriate - much quicker than quicksort if the range of values is known)

I am not too sure I understand your problem, but I am guessing a nice solution involving trees might be worth looking at...

##### Share on other sites
It also sounds like a Priority Queue would be a good choice for your application... Possibly based on a Binary Heap, possibly based on something more complicated like a Fibonacci Heap.

##### Share on other sites
I was told that quicksort was the fastest sorting algorithm, and that there is a proof that proves that you cannot sort an array/list/vector/whatever faster than O(log 2n). But you''re saying that a histogram sort is faster?

##### Share on other sites
Quicksort is O(n log n), not O(log n). And big-O doesn''t tell you exactly how long it would take. It means that the time it takes is -proportional- to some function of n. There could be another sort that still runs O(n log n), and is faster than a quicksort. If one sort has a lower complexity than another, it means that after a sufficient number of terms, the former will beat the latter. For example, a bidirectional selection sort will completely own quicksort up to quite large values of n, when sorting an integer array.

The histogram/bucket/radix sort runs O(n). For several million terms, it''s the best option.

##### Share on other sites
The easiest (and pretty darn fast) way to sort any data, I''ve found, is just the standard library qsort function.

-------------------------------
NeXe: NeHe DirectX-style.

##### Share on other sites
If I''m not mistaken, the average case time complexity on quicksort is @(n) with @ being theta, with O(nlogn) being worst case. Just something you might want to take into account.

##### Share on other sites
doesn''t quicksort act funny on huge data sets?

"You won''t get wise with the sleep still in your eyes." - Neil Peart

##### Share on other sites
Maybe i''m missing the question, but why not use a red black tree ? That way your data items are always sorted (no inserting and then resorting). And searches are very fast, eliminating half the tree at each node.

ECKILLER

##### Share on other sites
Thank you for your answers. Although I''m not aware of some algorithms you mentioned earlier (like fibbonaci heap, priority queue,..) I will start looking for details.
I don''t think anything like a tree would work because I have 4 possible ways to go down in a tree and I have to cut just a brench (the one with "no"s) and I would have to create the tree at each question.
qsort may work, but I think my array is too long, and I don''t have to have the array sorted (sorry about the title of the post), I just want the take out the item with the most ''yes''s and continue to eliminate the ''no''s.
BTW, this program is going to run on a 4 processor machine and I would like to use the most of each processor.

Regards,
Horia

Of all the things I''''ve lost I miss my mind the most

1. 1
Rutin
25
2. 2
3. 3
4. 4
JoeJ
18
5. 5

• 14
• 14
• 11
• 11
• 9
• ### Forum Statistics

• Total Topics
631757
• Total Posts
3002141
×