I've been searching for a parallel Qsort algorithm.
Best would fit algorithm that can be parallelized for any number of processors/threads because application will be installed on PC's as well as on clusters (is it at all posiible to palallelize it that way?).
However I have none experience in writing parallel programs so I would need quite simple thing.
Thanks in advance,
PS: I did google some things but what I have found were only algorithms paralelized into 2 threads.
I suggest looking at this blog post about sorting algorithms. It discuss the implementation of sorting algorithms on the PS3. It doesn't discuss parallel quicksort, but I think the suggested parallel sorting algorithm (Parallel AA-Sort) described at the end of that post is superior.
EDIT: Some ideas of this algorithm are particularly important on clusters. You probably want to break the data in chunk and then send each part to a computer in the network to sort. Once a part is sorted, you can then merge the various parts trying to minimize network traffic.
1. Divide the data equally between the processors.
2. Sort the data on the individual processors as normal (std::sort() is much quicker than qsort).
3. Once all the sorts are done merge the results. O(n)
You don't just pick an algorithm and then parallelize it. If your goal is speed then you first need to pick the most appropriate algorithm for your circumstances. You need to consider at least the following things, and this is not an exhaustive list.
What are the typical, min and max number of items to sort?
How common are suplicate values?
What type is being sorted, and is it expensive to copy/move?
How quickly can the type of item being sorted be compared?
How much memory can you spare to improve the speed?
Can you hold all the data in RAM whilst sorting it?
Is the data likely to often be in a close-to-sorted order already?
Can you simply keep the data in a sorted container instead of re-sorting it repeatedly?
Having said that, quicksort and mergesort will both parallelise just fine. Quicksort doesn't quite parallelise as well because the workload will more often be distributed unevenly. I actually prefer to use FlashSort in many cases, which has the additional requirement that items be subtractable, or I sometimes pick something else... You might want to take a look at my sig.