In addition to Hodgman's implementation changes, to answer the question "Is it possible to get better performance? How?":

1) Choose a random pivot element rather than the first. This greatly reduces the probability of hitting O(n^2) behaviour on poorly distributed (semi-sorted to fully sorted) input data. You may also want to compute the median of the first, middle and last elements, or the median of a random subset (a trade-off between better pivots and more computation to find them).

2) Drop to a simpler sort when the data gets small enough, e.g. insertion sort at around 8-16 elements, to reduce per-call overhead. A less conservative switch (which, IIRC, std::sort does) is to also fall back to heap sort once a certain recursion depth (or memory footprint) is exceeded. This bounds the stack depth and the worst case to O(n log n), and heap sort's cache-unfriendly memory access pattern, which hurts it on large arrays, matters far less on the smaller ranges where the fallback kicks in.

3) If you are sorting primitive types, use a SIMD sorting network once the data for a particular recursion is small enough.

4) Separate your sorting keys from your data for cache and swap efficiency.

5) Sort multi-threaded.

6) Sort distributed.