C++ STL Algorithms

Darkbouncer4689 · 2011-12-13T15:35:06

Hey all, I've been brushing up on my algorithms and data structures lately and today I was messing around with everything in the STL. I had always known about qsort, but it seems like there are some easier functions available if you happen to already be using an STL container such as vector. Namely something like: vector<int> myVector; sort(myVector.begin(), myVector.end()); I'm guessing that sort is implemented as a quicksort, but there is one part that I'm not sure about. The template for sort is void sort(RandomAccessIterator first, RandomAccessIterator last); I know that for the basic quicksort (taking first element as pivot) it's worst case is a sorted or reverse sorted list in which it runs at O(n^2). This can be avoided by choosing a random pivot which makes it's expected run time O(n log n). I'm guessing this is what RandomAccessIterator is about, but I'm not familiar with them. If I simply use vector.begin() and vector.end() is it converted to a RandomAccessIterator or do I need to do something special to insure a O(n log n) run time? Thanks in advance!

tbrick

109

December 12, 2011 03:38 PM

Maybe a dumb question, but why do you need to ensure a n log n run time?

Unless you are sorting VERY large lists in a real-time environment, and you have profiled that this is a verified choke-point in your program (and it matters), you are simply doing "optimization for optimization's sake." And that is nearly always a bad thing.

Adam_42

3,664

December 12, 2011 04:30 PM

Actually lists don't need to be that big for an N log N algorithm to be much faster than an N^2 one. For example if you have 10,000 objects the N log N algorithm should be about 750 times quicker (using a base 2 logarithm, and ignoring the constant factors).

Matt-D

1,574

December 12, 2011 04:39 PM

Regarding the iterators -- just read this:
http://www.sgi.com/t.../Iterators.html
and then this:
http://www.cplusplus...e/std/iterator/
This should be all that you need to get started! :-)

Interestingly, you might notice that (other than asymptotic algorithmic complexity) C++ std::sort is faster than C qsort since it (or, strictly speaking, the comparison function) can be inlined and thus more effectively optimized by the compiler:
http://stackoverflow...sort-vs-stdsort

Antheus

2,410

December 12, 2011 05:41 PM

Maybe a dumb question, but why do you need to ensure a n log n run time?

Unless you are sorting VERY large lists in a real-time environment, and you have profiled that this is a verified choke-point in your program (and it matters), you are simply doing "optimization for optimization's sake." And that is nearly always a bad thing.

Because having your application randomly hang for no apparent reason and in an impossible to debug way is a bad practice. Having average time of 0.1 second but spikes of 2.5 minutes is not acceptable.

Ruby on Rails exhibited some such fundamental flaws in parsing which caused no end to problems in production due to naive or incorrect algorithms being used for some fundamental problems. Biggest pain was unexpected delays, since average times were as expected.

For something as fundamental as sorting, the excuse of premature optimization simply doesn't hold.

Another example: I once used a third-party multi-threading task library. I crashed hard and fast. Turns out, it used recursive algorithm for partitioning, resulting in n^2 stack growth. If developer of a library makes such a fundamental mistake, it loses any and all credibility on quality of rest of code. In my case it led to automatic rejection of any third-party threading library not published by one of proven players in the field.

Hodgman

52,718

December 12, 2011 11:40 PM

Actually lists don't need to be that big for an N log N algorithm to be much faster than an N^2 one. For example if you have 10,000 objects the N log N algorithm should be about 750 times quicker (using a base 2 logarithm, and ignoring the constant factors).

In that case, why not go the whole hog and use an O(N) sorting algorithm instead, which would be about 13 times faster than the O(N log N) one, and 10,000 times faster than the O(N[sup]^2[/sup]) one

. 22 Racing Series .

Matt-D

1,574

December 13, 2011 02:53 AM

[quote name='Adam_42' timestamp='1323707411' post='4893163']
Actually lists don't need to be that big for an N log N algorithm to be much faster than an N^2 one. For example if you have 10,000 objects the N log N algorithm should be about 750 times quicker (using a base 2 logarithm, and ignoring the constant factors).

In that case, why not go the whole hog and use an O(N) sorting algorithm instead, which would be about 13 times faster than the O(N log N) one, and 10,000 times faster than the O(N[sup]^2[/sup]) one

[/quote]

AFAIK, there's no such thing as a general sorting algorithm with worst-case time complexity below O(n log n) (linearithmic) that remains practical:
http://en.wikipedia....n_of_algorithms

You might fare better in special (not general) cases where you know something extra about the data to be sorted, such as integer sorting.

In theory, there are some "general" algorithms w/ O(n) complexity, but they "are impractical for real-life use due to extremely poor performance or a requirement for specialized hardware."

EDIT: this abstract explains the situation pretty well:
"For all the sorting algorithms, it is an accepted performance limit that sorting algorithms are linearithmic or O(N lg N). The linearithmic lower bound in performance stems from the fact that the sorting algorithms use the ordering property of the data. The sorting algorithm uses comparison by the ordering property to arrange the data elements from an initial permutation into a sorted permutation.
Linear O(N) sorting algorithms exist, but use a priori knowledge of the data to use a specific property of the data and thus have greater performance. In contrast, the linearithmic sorting algorithms are generalized by using a universal property of data-comparison, but have a linearithmic performance lower bound. The trade-off in sorting algorithms is generality for performance by the chosen property used to sort the data elements. "

From the above-cited paper:
"Linearithmic Sorting

A sort algorithm performs sorting, and most sorting algorithms are comparison based sorting al-
gorithms. The comparison sorting algorithms include such algorithms as the merge sort, quick
sort, and heap sort. These sorting algorithms use comparison to arrange the elements in the
sorted permutation, and are general-purpose in nature.

The comparison sorting algorithms have a well-known theoretical [Johnsonbaugh and Schaefer
2004] performance limit that is the least upper bound for sorting that is linearithmic or O(N lg N)
in complexity. This theoretical lower bound is from the basis of the comparison sorting algo-
rithm-using sorting to arrange the data elements. A decision tree for N elements is of logarithmic
height lg N. Thus the time complexity involves the cost that is the cost of using a decision tree to
compare elements lg N and the number of elements N. Hence the theoretical least upper bound is
the product of the two costs involved, O(N lg N), which is linearithmic complexity."

Hodgman

52,718

December 13, 2011 03:39 AM

AFAIK, there's no such thing as a comparison-based sorting algorithm with worst-case time complexity below O(n log n) (linearithmic)
...

^^ fixed that for you (bold bit).

Any data where you can express ordering as a bitwise greater/less-than can be sorted with an integer sort. This includes integers, floating-point numbers, text, etc... pretty much all the fundamental data-types. Composite sorting rules simply place higher-priority rules in more significant bits, however, the complexity of most of these algorithms is related to the size of your integer 'key', so once it increases past a certain point, you'd be better off with the 'inferior' O(N log N) algorithms.

It's very common to see these kinds of O(K.N) sorting algorithms used inside game engines on decently sized data-sets. On smaller data-sets, introsort is good enough (and has lower 'constant' overhead).

. 22 Racing Series .

Matt-D

1,574

December 13, 2011 03:51 AM

[quote name='Matt-D' timestamp='1323744816' post='4893366']
AFAIK, there's no such thing as a comparison-based sorting algorithm with worst-case time complexity below O(n log n) (linearithmic)
...

^^ fixed that for you (bold bit).

Any data where you can express ordering as a bitwise greater/less-than can be sorted with an integer sort. This includes integers, floating-point numbers, text, etc... pretty much all the fundamental data-types. Composite sorting rules simply place higher-priority rules in more significant bits, however, the complexity of most of these algorithms is related to the size of your integer 'key', so once it increases past a certain point, you'd be better off with the 'inferior' O(N log N) algorithms.

It's very common to see these kinds of O(K.N) sorting algorithms used inside game engines on decently sized data-sets. On smaller data-sets, introsort is good enough (and has lower 'constant' overhead).
[/quote]

Well, if you're concerned about using precise terminology (a worthy concern, so do not take this as a criticism), I believe you wanted to say that you "specified" or "explained", not "fixed" that. ;-)
Correct me if I'm wrong, but you're not disagreeing with me that the general-purpose in this context (sorting algos) is interchangeable with comparison-based, while integer sorting is still special-purpose (even "pretty much all the fundamental data-types" is not "all the data-types", which is important for anyone concerned with precision), right?
The complexity and dependence on key w/ trade-offs this introduces is what I meant by "general" in conjunction w/ the "remains practical" part, not disagreeing with your explanation here.

Hodgman

52,718

December 13, 2011 04:27 AM

Should've put a cheeky

onto FTFY.

I wouldn't necessarily agree that comparison-based is general-purpose whereas bit-based in special-purpose -- I've personally not run into a case where I couldn't convert between one method or the other.

. 22 Racing Series .

iMalc

2,466

December 13, 2011 06:12 AM

My sources tell me that C++0x requires a guaranteed worse case complexity of O(n * log(n)) for std::sort.

Though it was only specified as average case O(n * log(n)) in C++03, most compilers tended to use IntroSort anyway which meets the C++0x requirement.

Bottom line is that you should probably not be concerned about worst case if you use std::sort on any compiler made or updated this millenium.

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

C++ STL Algorithms

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

C++ STL Algorithms

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines