I am implementing the parallel sorting by regular sampling (PSRS) algorithm, which is described here. I am stuck at the point where I need to migrate the sorted sublists to their proper places in the sorted array. The problem can be stated as follows: there is one global array, divided into p subarrays, and each of those subarrays has been sorted. p-1 global pivot elements were determined, and each subarray was split by them into p sub-sub-arrays (yellow, red, green). Now I need to move those sub-sub-arrays so that all sub-sub-arrays with local index i end up in thread i. In other words, pieces of the same color must become neighbours, while the original left-to-right order is preserved within each color.
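Because each subarray is already sorted, finding its p sub-sub-arrays takes only p-1 binary searches against the pivots. A minimal Python sketch follows. `split_points` is a hypothetical helper, and it assumes that the subarray occupies `arr[lo:hi]`, that the pivots are sorted ascending, and that an element equal to a pivot goes to the lower-indexed piece.

```python
from bisect import bisect_right

def split_points(arr, lo, hi, pivots):
    """Return p+1 cut indices for the sorted slice arr[lo:hi].

    Piece k (sub-sub-array k) is arr[cuts[k]:cuts[k + 1]].
    Assumption: elements equal to pivots[k] stay in piece k (bisect_right).
    """
    cuts = [lo]
    for pv in pivots:
        # First index in arr[lo:hi] whose element is greater than this pivot
        cuts.append(bisect_right(arr, pv, lo, hi))
    cuts.append(hi)
    return cuts
```

For example, with the subarrays `[1, 2, 3, 10]`, `[4, 9]`, `[6, 7, 11]` stored back to back and pivots `[5, 8]`, the first subarray gets the cuts `[0, 3, 3, 4]`, so its middle piece is empty.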
Actually, a serial algorithm will do; I just have no clever idea how to obtain the proper permutation. The following figure shows the case of p=3 threads. Yellow denotes sub-sub-array 0, red sub-sub-array 1, and green sub-sub-array 2.
[attachment=16730:ProblemImg.png]
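One way to get the permutation serially is a counting-sort-style layout. Record the size of every piece (thread j, index i), then take an exclusive prefix sum over those sizes in the order "i outer, j inner"; the running sum at each piece is exactly its destination offset. The sketch below assumes the cut indices of every thread are already known (for instance from binary searches against the pivots). `migrate` and its `cuts` argument are hypothetical names, and the sketch copies into a new list rather than permuting in place.

```python
def migrate(arr, cuts):
    """cuts[j] holds p+1 cut indices for thread j, so piece (j, i) is
    arr[cuts[j][i]:cuts[j][i + 1]].

    Returns (out, starts): out has all pieces with index i grouped together
    (groups ordered by i, pieces inside a group ordered by thread j), and
    group i occupies out[starts[i]:starts[i + 1]].
    """
    p = len(cuts)
    # sizes[j][i]: length of sub-sub-array i inside thread j's subarray
    sizes = [[c[i + 1] - c[i] for i in range(p)] for c in cuts]
    # Exclusive prefix sum in (i outer, j inner) order = destination of each piece
    dest = [[0] * p for _ in range(p)]
    starts = []
    offset = 0
    for i in range(p):
        starts.append(offset)
        for j in range(p):
            dest[j][i] = offset
            offset += sizes[j][i]
    starts.append(offset)
    # Copy every piece to its destination; pieces never overlap
    out = [None] * len(arr)
    for j in range(p):
        for i in range(p):
            lo, n = cuts[j][i], sizes[j][i]
            out[dest[j][i]:dest[j][i] + n] = arr[lo:lo + n]
    return out, starts
```

With threads holding `[1, 2, 3, 10]`, `[4, 9]`, `[6, 7, 11]` and cuts `[[0, 3, 3, 4], [4, 5, 5, 6], [6, 6, 8, 9]]`, the result is `[1, 2, 3, 4, 6, 7, 10, 9, 11]` with `starts == [0, 4, 6, 9]`. Group 2 (`[10, 9, 11]`) is not yet sorted: the migration only gathers the pieces, and each thread i still has to merge its p sorted runs in `out[starts[i]:starts[i + 1]]`, which is the final PSRS phase.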
The sub-sub-arrays may have different sizes.
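Differing piece sizes (including empty pieces) only change the offsets, not the procedure. If the copy itself should run in parallel, each thread i can compute its own start as the total size of all pieces with index below i, then pull its pieces in thread order. The destination ranges are disjoint, so no locking is needed. Below is a sketch under the assumption that `cuts[j]` lists the p+1 cut indices of thread j's subarray; `gather_group` and `migrate_parallel` are hypothetical names, and Python's `ThreadPoolExecutor` stands in for real worker threads. In CPython the GIL means this illustrates the structure, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def gather_group(arr, cuts, i, out):
    """Copy every sub-sub-array with index i into out (the work of thread i)."""
    p = len(cuts)
    # Group i starts after all pieces with index < i, whatever their sizes
    start = sum(c[k + 1] - c[k] for c in cuts for k in range(i))
    pos = start
    for j in range(p):  # keep the original left-to-right (thread) order
        lo, hi = cuts[j][i], cuts[j][i + 1]
        out[pos:pos + (hi - lo)] = arr[lo:hi]
        pos += hi - lo

def migrate_parallel(arr, cuts):
    out = [None] * len(arr)
    with ThreadPoolExecutor(max_workers=len(cuts)) as pool:
        futures = [pool.submit(gather_group, arr, cuts, i, out)
                   for i in range(len(cuts))]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

Each thread reads only its own column of the size table plus the sizes of the earlier columns, so no global permutation array has to be built first.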