C++ STL Algorithms

Started by
21 comments, last by Antheus 12 years, 4 months ago
Hey all,

I've been brushing up on my algorithms and data structures lately and today I was messing around with everything in the STL. I had always known about qsort, but it seems like there are some easier functions available if you happen to already be using an STL container such as vector.

Namely something like:
vector<int> myVector;

sort(myVector.begin(), myVector.end());

I'm guessing that sort is implemented as a quicksort, but there is one part that I'm not sure about. The template for sort is

void sort(RandomAccessIterator first, RandomAccessIterator last);

I know that for the basic quicksort (taking first element as pivot) it's worst case is a sorted or reverse sorted list in which it runs at O(n^2). This can be avoided by choosing a random pivot which makes it's expected run time O(n log n). I'm guessing this is what RandomAccessIterator is about, but I'm not familiar with them. If I simply use vector.begin() and vector.end() is it converted to a RandomAccessIterator or do I need to do something special to insure a O(n log n) run time?

Thanks in advance!
Advertisement
I'm guessing this is what RandomAccessIterator is about[/quote]

No, it's not. RandomAccessIterator is simply a template parameter. Sort is expecting you to pass it iterators that support random access. This means, for instance, that std::sort cannot be used on a std::list.

http://www.cplusplus...algorithm/sort/

Approximately N*logN comparisons on average (where N is last-first).
In the worst case, up to N[sup]2[/sup], depending on specific sorting algorithm used by library implementation.


As you can see, the details are left to implementation, so there is no right answer. If you are really concerned about the details, you can check your STL headers out for yourself.
Some modern implementations use introsort, which is as fast as quicksort in practice and is O(n*log(n)) even in the worst case.

No, it's not. RandomAccessIterator is simply a template parameter. Sort is expecting you to pass it iterators that support random access. This means, for instance, that std::sort cannot be used on a std::list.


I'm a bit confused here, what exactly is an iterator that supports random access?

Thanks!
Well, a forward iterator is one that can be incremented, meaning you can always iterate forward and get the next object, but you can't iterate backwards and get the previous. A bidirectional iterator is one that can iterate in both directions, meaning you can get the next object or the previous. A random access iterator means you can get any object; the the next, the previous, one 5 places ahead or one 5 places behind...
An iterator is just a "concept", it's kind of like an "interface" or an "abstract base class", but it's invisible. It's a contract that you make, saying "this class satisfies these requirements".

e.g. to satisfy the RandomAccessIterator contract, needs to be able to (among other things)
* be incremented by an integer: e.g. i = i + 7 -- should increment the iterator forwards 7 places
* be incremented/decremented with ++/-- operators
* support the square bracket opeator, e.g. foo = i[7]
* support dereferencing, e.g. foo = *i;

N.B. standard raw pointers satisfy all of these requirements, which means that a raw pointer is a RandomAccessIterator, and thus can be passed to any template-algorithm that requires one:
const char* text = "Hello!";
const char* begin = text;
const char* end = text+6;
std::sort( begin, end );
printf(text);


Basically, if an algorithm says that it operates on RandomAccessIterators, it actually means that it will operate on any type that you pass to it, but it expects this type to satisfy the above list of requirements.
Okay, this is making sense now. I appreciate the help. I'm having trouble finding any examples that use RandomAccessIterator, for regular iterators I could do vector<int>::iterator itr. How can this be done with a random iterator?

Edit: Okay, so the random access is handled for me then? If make a sorted vector and call sort on it, passing in vector.begin and vector.end iterators, the expected run time will still be O(n log n)?
The iterators belong to a class heirarchy. Because a Random Access Iterator satisfies the requirements of a Bidirectional Iterator in that it can definitely go forwards and backwards, RandomAccessIterator is derived from BidirectionalIterator. Also, because a Bidirectional Iterator satisfies the requirements of a plain old Iterator in that it can definitely go forwards, BidirectionalIteratoris derived from an ordinary Iterator.

e.g. If a function takes a BidirectionalIterator then it can be passed a RandomAccessIterator, or a BidirectionalIterator, but not an ordinary forwards only iterator.

Since a lot of algorithms out there don't actually require random access, they don't force you to have a RandomAccessIterator. That doesn't mean that you can't give them one, as it still fits the requirements even if it goes above and beyond them.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Hodgman's example doesn't work because the string is const. This works better:
char text[] = "Hello!";
char* begin = text;
char* end = text+6;
std::sort( begin, end );
printf(text);
An iterator is considered "random access" if you can move an arbitrary number of elements forward or back in constant time. For example, an array pointer is random access because you can look up any element in the array through simple pointer arithmetic. In some containers, such as list, you need to visit each element in order to find the element adjacent to it, so the container only supports bidirectional iteration. Certain structures such as singly-linked lists or connections to external resources are unidirectional, so you can only move forward.

So "random access iterators" have nothing to do with avoiding quick-sort's worst case whatsoever.

The actual implementation of quick-sort can vary considerably, but the objective is to choose pivot points as close to the middle of the range as often as possible. A particular implementation MAY experience it's worst case when the elements are sorted. Or it may check for that, possibly resulting in a sorted range being the best case instead. A naive implementation might overflow the stack in it's worst case, but this is easily countered by only recursively calling the function on the smaller of the two partitions. Finally, pure quick-sort results in too many recursive function calls, so smarter implementations will opt to perform some other sort (e.g. insertion) when the partitions are split small enough.
The main purpose of random pivot selection is actually about preventing deliberate exploitation of quick-sort's weakness. Simple issues such as preventing poor performance when sorting an already sorted range can be solved with other means.


This topic is closed to new replies.

Advertisement