
# [C++] - Multithreaded Bubble Sort

###
#1
Members - Reputation: **240**

Posted 04 December 2012 - 08:13 AM

I'm working on a multithreaded bubble sort example in C / C++.

I have an array of N numbers,

then I create M threads,

and divide my array of numbers into M parts - one part per thread.

Then I sort (N / M) numbers in each thread...

but what should I do next?

I don't know how to "merge" the results of the multithreaded sorting.

Right now I end up with an array that has M sorted sections, but I need it to be sorted entirely.

Any ideas?

PS: I need to use THREADS and Bubble sort together. The question is only how to merge the results....

Thanks for any advice and comments.

###
#2
Crossbones+ - Reputation: **10554**

Posted 04 December 2012 - 08:28 AM

Is this for a school assignment, by the way? If not, you might be better served by trying to implement a multithreaded merge sort, which is more interesting and is actually useful.

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

- *Pessimal Algorithms and Simplexity Analysis*

###
#3
Crossbones+ - Reputation: **3234**

Posted 04 December 2012 - 08:45 AM

If this is about trying to implement an actual efficient parallel sorting algorithm, you should look at other approaches, possibly an odd-even mergesort, which can be implemented as a sorting network built from simple compare-and-swap (CAS) elements.

I gets all your texture budgets!

###
#4
Crossbones+ - Reputation: **2462**

Posted 04 December 2012 - 03:46 PM

I second this. Split the array into two, spawn two threads, and have each handle its own segment. Recursively do this until you reach the smallest subset. Look up merge sort - it inherently merges two sorted lists together, you'll find how to do it there. The merge is basically a simplified selection sort that takes advantage of the fact that the two lists are already sorted; it runs in O(n) time and uses a temporary array. You could extend this to an arbitrary number of lists, or you could do the merging in parallel too (which would be more useful, because doing the merge step on a single thread defeats your use of multithreading in the sorting step).


Merging the array shouldn't be too complicated either. Since each thread spawns two subthreads, you just wait until both subthreads have finished executing, then merge, then signal the parent thread.
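A minimal sketch of the split / join / merge idea described above, assuming `std::thread` and a `std::vector<int>`. The names `bubble_sort_range`, `merge_ranges`, and `parallel_sort`, and the `depth` parameter that caps how many levels of threads get spawned, are made up for illustration and do not come from the posts. Bubble sort is kept at the leaves because the original question requires it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Plain bubble sort on the half-open range [first, last) of `data`.
void bubble_sort_range(std::vector<int>& data, std::size_t first, std::size_t last) {
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = first; i + 1 < last; ++i) {
            if (data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
                swapped = true;
            }
        }
    }
}

// Merge the adjacent sorted ranges [first, mid) and [mid, last)
// through a temporary array, in O(n) time.
void merge_ranges(std::vector<int>& data, std::size_t first, std::size_t mid, std::size_t last) {
    std::vector<int> tmp;
    tmp.reserve(last - first);
    std::size_t a = first, b = mid;
    while (a < mid && b < last) {
        // Take from the left range on ties so equal values keep their order.
        tmp.push_back(data[b] < data[a] ? data[b++] : data[a++]);
    }
    while (a < mid) tmp.push_back(data[a++]);
    while (b < last) tmp.push_back(data[b++]);
    std::copy(tmp.begin(), tmp.end(), data.begin() + static_cast<std::ptrdiff_t>(first));
}

// Hypothetical driver: split in two, sort each half in its own thread,
// wait for both, then merge. `depth` levels of splitting are performed.
void parallel_sort(std::vector<int>& data, std::size_t first, std::size_t last, int depth) {
    assert(first <= last && last <= data.size());
    if (depth <= 0 || last - first < 2) {
        bubble_sort_range(data, first, last);  // leaf: ordinary bubble sort
        return;
    }
    const std::size_t mid = first + (last - first) / 2;
    std::thread left(parallel_sort, std::ref(data), first, mid, depth - 1);
    std::thread right(parallel_sort, std::ref(data), mid, last, depth - 1);
    left.join();   // both halves must be finished...
    right.join();
    merge_ranges(data, first, mid, last);  // ...before they are merged
}
```

Calling `parallel_sort(v, 0, v.size(), 2)` would split twice and run four bubble sorts in leaf threads. The two halves touch disjoint elements, so no locking is needed in this variant.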

###
#5
Moderators - Reputation: **1398**

Posted 18 December 2012 - 09:49 PM

Bubble sort iterates over the entire array again and again until nothing changes position.

Merge sort divides and conquers: sort the top half, sort the bottom half (recursively), then unwind and shuffle the elements into place (a linear step, since both sub-arrays are known to be sorted).

To parallelize bubble sort you have to lock each element of the array.

You need a parallel array of locks (spin-locks under POSIX; on Win32 the nearest equivalent is a critical section) and you need to lock the two elements you are about to compare. Compare, swap if needed, then unlock. You have to swap the spin-locks as well to keep the two arrays parallel!

Now you break the array into n pieces, one piece for each thread.

You need to check the element before and after your chunk (don't blow the bounds of the array!) to see if you need to swap them.

Each thread bubble-sorts its chunk of the array starting from the top down.

The thread pauses (use a semaphore) if it makes a pass and nothing is swapped.

If another thread tosses a new element into a chunk, it has to kick that chunk's semaphore to tell the owning thread to start sorting again.

Keep going until all threads are paused and it should be done.

Then you set an exit flag and kick the semaphores to shake-them-loose and terminate.
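A rough sketch of just the lock / compare / swap / unlock step from the description above, with several simplifications that are assumptions of this example and not part of the post: it uses `std::mutex` instead of spin-locks, the locks are tied to array positions and are never moved, every thread sweeps the whole array instead of owning a chunk, and there are no semaphores (a thread simply stops once it completes a pass without swapping). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Lock positions i and i+1, compare, swap if needed, unlock.
// Locks are always taken in ascending index order, which rules out deadlock.
// Returns true when a swap happened.
bool locked_compare_swap(std::vector<int>& data, std::vector<std::mutex>& locks, std::size_t i) {
    std::lock_guard<std::mutex> low(locks[i]);
    std::lock_guard<std::mutex> high(locks[i + 1]);
    if (data[i] > data[i + 1]) {
        std::swap(data[i], data[i + 1]);
        return true;
    }
    return false;
}

// One worker: keep sweeping until a complete pass performs no swap.
void sweep_until_clean(std::vector<int>& data, std::vector<std::mutex>& locks) {
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = 0; i + 1 < data.size(); ++i) {
            if (locked_compare_swap(data, locks, i)) swapped = true;
        }
    }
}

// Hypothetical driver: one mutex per element, num_threads workers.
void locked_bubble_sort(std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<std::mutex> locks(data.size());
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back(sweep_until_clean, std::ref(data), std::ref(locks));
    }
    for (auto& w : workers) w.join();
}
```

The array ends up sorted because whichever thread performs the last swap always starts one more full pass afterwards, and that pass sees an array nobody is changing any more. Taking two locks per comparison is expensive, so this is only meant to show the locking pattern, not to be fast.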

**Edited by Shannon Barber, 18 December 2012 - 09:50 PM.**

###
#6
Crossbones+ - Reputation: **5638**

Posted 19 December 2012 - 07:00 AM

However, after thinking about it for a minute just now, I figured that it is actually quite an interesting exercise. And, in fact, *not so silly at all*.

Bubblesort is, surprisingly, actually quite a good fit for multithreading (not perfect, but quite good!). Bubblesort runs several passes over the complete set of data, only ever examining two adjacent values and swapping them if they're not in order. It's an O(N^2) average-case algorithm. Obviously, smaller pieces of data will therefore be considerably faster: a partition 1/4 the size needs about 1/16 of the operations, so 4 threads each sorting one such partition do about 1/4 of the total work and finish the partition phase in roughly 1/16 of the time. That means you're doing better than Amdahl's law!

You can trivially partition the set into N pieces and run the N pieces in N threads, in parallel (N here being the number of threads, not the number of elements). You can then, after syncing at a barrier, merge the partitions by taking the i-th element of every partition, which gives an "almost sorted" dataset. On a perfectly evenly distributed dataset, it would be sorted, not "almost sorted", but of course you want it to work for any data. A final run of bubblesort over the whole set makes the "almost sorted" set sorted. Sorting "almost sorted" data with bubblesort is efficient when every element is already close to its final position, often needing only a few passes.
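The partition / interleave / final-run scheme described above could look roughly like this. It is a sketch under a few assumptions: joining the threads stands in for the barrier, each partition is copied into its own vector, and the function names are invented for the example. The final bubble sort runs to completion, so the result is sorted regardless of how close to sorted the interleaved data turns out to be.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Plain bubble sort over a whole vector.
void bubble_sort(std::vector<int>& data) {
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = 0; i + 1 < data.size(); ++i) {
            if (data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
                swapped = true;
            }
        }
    }
}

// Hypothetical driver: sort num_threads partitions in parallel,
// interleave them element by element, then finish with one more bubble sort.
std::vector<int> sort_interleave_finish(const std::vector<int>& input, std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t n = input.size();

    // Copy each contiguous partition into its own vector.
    std::vector<std::vector<int>> parts(num_threads);
    for (std::size_t k = 0; k < num_threads; ++k) {
        const auto lo = static_cast<std::ptrdiff_t>(n * k / num_threads);
        const auto hi = static_cast<std::ptrdiff_t>(n * (k + 1) / num_threads);
        parts[k].assign(input.begin() + lo, input.begin() + hi);
    }

    // One thread per partition; join() plays the role of the barrier.
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < num_threads; ++k) {
        workers.emplace_back(bubble_sort, std::ref(parts[k]));
    }
    for (auto& w : workers) w.join();

    // Take the i-th element of every partition in turn.
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; out.size() < n; ++i) {
        for (const auto& p : parts) {
            if (i < p.size()) out.push_back(p[i]);
        }
    }

    bubble_sort(out);  // final run over the (hopefully) almost sorted data
    return out;
}
```

How many passes the final run needs depends on the input; counting the passes of the last `bubble_sort` call on your own data is an easy way to check whether the interleaving pays off.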

Now, if you want to do it more elegantly than the trivial approach of N threads sorting N pieces, you can for example have N threads sort 4*N pieces, using a worker queue. This adds a little complexity, but it accounts for the fact that not all sub-partitions will take the same number of iterations. Thus, you avoid CPU cores going idle while they wait for the slowest one to finish.
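One possible way to get the worker-queue behaviour is a shared atomic counter that hands out piece indices, so a thread that finishes early just grabs the next piece. This is an assumption of the sketch, not something the post specifies; a real queue would work as well. The function names and the `pieces_per_thread` parameter are hypothetical. The function only sorts the pieces in place and returns their boundaries; combining the pieces afterwards is a separate step.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Plain bubble sort on the half-open range [first, last) of `data`.
void bubble_sort_range(std::vector<int>& data, std::size_t first, std::size_t last) {
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = first; i + 1 < last; ++i) {
            if (data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
                swapped = true;
            }
        }
    }
}

// Hypothetical driver: num_threads workers sort num_threads * pieces_per_thread
// pieces, pulling piece indices from a shared counter.
// Returns the piece boundaries: piece k is [bounds[k], bounds[k + 1]).
std::vector<std::size_t> sort_pieces_with_queue(std::vector<int>& data,
                                                std::size_t num_threads,
                                                std::size_t pieces_per_thread = 4) {
    assert(num_threads > 0 && pieces_per_thread > 0);
    const std::size_t num_pieces = num_threads * pieces_per_thread;
    const std::size_t n = data.size();

    std::vector<std::size_t> bounds(num_pieces + 1);
    for (std::size_t k = 0; k <= num_pieces; ++k) {
        bounds[k] = n * k / num_pieces;
    }

    std::atomic<std::size_t> next_piece{0};  // the "queue": next unclaimed piece
    auto worker = [&]() {
        for (;;) {
            const std::size_t k = next_piece.fetch_add(1);
            if (k >= num_pieces) return;  // nothing left to claim
            bubble_sort_range(data, bounds[k], bounds[k + 1]);
        }
    };

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back(worker);
    }
    for (auto& w : workers) w.join();
    return bounds;
}
```

Because each piece index is claimed by exactly one thread and the pieces do not overlap, the workers never touch the same elements and no further locking is needed.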

**Edited by samoth, 19 December 2012 - 07:03 AM.**