Ranges and the STL

Started by
5 comments, last by gekko 11 years, 3 months ago

So a while back I saw Andrei Alexandrescu's talk on ranges vs iterators and was inspired. So I rewrote a portion of the STL to use ranges to see if what he was saying was correct and how things worked out. I've finished a bunch of range traits and functions, a vector and list container, and have been working through the usual standard algorithms (find, search, unique, ect...) and came across a weird problem. Before I get to the question though a quick review of how I implemented ranges.

Working with ranges is closer to working with the raw containers than iterators interface-wise. You have Empty() to test if there are elements in the range. PopFront() and PopBack() remove the first/last element from the range. Front() and Back() are used to access the first/last element. There's also a Slice() function which allows me to take a subrange of the range. One thing to note is that ranges cannot be 'enlarged'.

In the D programming language there's a function called findSplit() http://dlang.org/phobos/std_algorithm.html#findSplit which I was trying to implement. With iterators this is a simple single pass algorithm regardless of the iterator type (ie. the iterators can be forward, bidirectional, random, any way it works easily). With ranges I can implement the algorithm with random ranges (ie. the type of range that one would get from a vector) as the Slice() function is constant time for random ranges. Slice() is linear for bidirectional ranges (the type of range used to traverse a list), which makes the findSplit() algorithm take essentially multiple passes (which for such an essential algorithm is unacceptable).

I can't think of any way to do this algorithm cleanly and efficiently with ranges, but it seems Andrei succeeded. Am I missing something? Or is this a fundamental problem with ranges? Any thoughts/ideas are welcome.

Advertisement

If I'm reading the D docs correctly, this is just a find algorithm that returns 3 ranges: all elements before the first match was found, the first matching element and all neighboring matches elements, all elements after. If so, then yes, this is a fundamental "problem" with ranges. Generally with find you return the range beginning with the matching element and ending with the end of the range. But that decision is fairly arbitrary, it's really just because it's easier to code: reduce the range with PopFront until you find a match, then return the range.

I've done some work with implementing a pure range interface and you run into this issue quite a bit. By pure range interface I simply mean there are no iterators, so I can't fall back on them in situations like this. I've encountered quite a few design issues, where you need to decide what is the correct range to return in a given situation. For this situation, if you supported finding the difference between two ranges you could pull it off. Given the flexibility of ranges to do chaining, infinite ranges and other crazy things, I don't think that is generically possible.

So my suggestion is to provide a new range type which limits an existing range to N elements. Just needs to be a forward range which contains the original range and a counter. Then report empty when either the actual range is empty (never in this case), or you popped N times. Then just count how many elements came before the first match, limit the original range to that count. Do that again for the range of matching elements, and then what your left with should be the final range.

-- gekko

What's wrong with the naive solution, where you maintain 3 separate ranges, and linearly search for the pivot?


def find_split<T>(r : range<T>, pivot : T):
    left : range<T> = r
    centre : range<T> = r
    right : range<T> = r
   
    while centre.front() != pivot:
        right.popFront()
        centre.popFront()
   
    while len(centre) > 1:
        left.popBack()
        centre.popBack()

    return (left, centre, right)

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

[quote name='swiftcoder' timestamp='1358522062' post='5022910']
What's wrong with the naive solution, where you maintain 3 separate ranges, and linearly search for the pivot?
[/quote]

Similar to iterator categories, that would restrict the algorithm to only accept ranges that are bi-directional and support length. You should be able to do the algorithm using only forward ranges, which limits you interface to Empty, PopFront, and Front.

-- gekko
Similar to iterator categories, that would restrict the algorithm to only accept ranges that are bi-directional and support length.
Length is purely an artefact, no? You can implement length as a linear-time operation on any range, and more realistically you can cache the length up front (a la most implementations of std::list).

The only way I can see that we wouldn't be able to cache the length, is if we can construct ranges from arbitrary iterator pairs - but in that case, we automatically have an O(1) slice, and we don't need the length at all.
You should be able to do the algorithm using only forward ranges, which limits you interface to Empty, PopFront, and Front.
You can't implement the algorithm *at all* under those constraints - forget about efficiency. You have to have a slice operator available, if you want to operate on forward iterators.

And once you have a O(N) slice operator, a linear-time solution is available:
def find_split<T>(r : range<T>, pivot : T):
    centre : range<T> = r
    right : range<T> = r

    while centre.front != pivot:
        centre.popFront()
        right.popFront()

    right.popFront()

    left : range<T> = r.slice(r.start, centre.start)

    return (left, centre, right)

Of course, I have a feeling that the OP was looking for a linear-time solution with a constant factor of one. That is indeed not possible with pure ranges.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

If I'm reading the D docs correctly, this is just a find algorithm that returns 3 ranges: all elements before the first match was found, the first matching element and all neighboring matches elements, all elements after. If so, then yes, this is a fundamental "problem" with ranges. Generally with find you return the range beginning with the matching element and ending with the end of the range. But that decision is fairly arbitrary, it's really just because it's easier to code: reduce the range with PopFront until you find a match, then return the range.

I've done some work with implementing a pure range interface and you run into this issue quite a bit. By pure range interface I simply mean there are no iterators, so I can't fall back on them in situations like this. I've encountered quite a few design issues, where you need to decide what is the correct range to return in a given situation. For this situation, if you supported finding the difference between two ranges you could pull it off. Given the flexibility of ranges to do chaining, infinite ranges and other crazy things, I don't think that is generically possible.

So my suggestion is to provide a new range type which limits an existing range to N elements. Just needs to be a forward range which contains the original range and a counter. Then report empty when either the actual range is empty (never in this case), or you popped N times. Then just count how many elements came before the first match, limit the original range to that count. Do that again for the range of matching elements, and then what your left with should be the final range.


Ya it seems that I'll be doing the same thing. I have the function Take(range,n), which returns a 'limited' forward range (as you described). It seems Andrei came across the same problem. In the docs for findSplit() he writes: "If haystack is a random-access range, all three components of the tuple have the same type as haystack. Otherwise, haystack must be a forward range and the type of result[0] and result[1] is the same as std.range.takeExactly.". It seems that's the best we can do given non-random iterators, though seems a bit messy IMHO and the return type will change depending on the arguments to the function.

Similar to iterator categories, that would restrict the algorithm to only accept ranges that are bi-directional and support length.
Length is purely an artefact, no? You can implement length as a linear-time operation on any range, and more realistically you can cache the length up front (a la most implementations of std::list).

The only way I can see that we wouldn't be able to cache the length, is if we can construct ranges from arbitrary iterator pairs - but in that case, we automatically have an O(1) slice, and we don't need the length at all.
You should be able to do the algorithm using only forward ranges, which limits you interface to Empty, PopFront, and Front.
You can't implement the algorithm *at all* under those constraints - forget about efficiency. You have to have a slice operator available, if you want to operate on forward iterators.

And once you have a O(N) slice operator, a linear-time solution is available:

def find_split<T>(r : range<T>, pivot : T):
    centre : range<T> = r
    right : range<T> = r

    while centre.front != pivot:
        centre.popFront()
        right.popFront()

    right.popFront()

    left : range<T> = r.slice(r.start, centre.start)

    return (left, centre, right)

Of course, I have a feeling that the OP was looking for a linear-time solution with a constant factor of one. That is indeed not possible with pure ranges.

Ranges can come from many more sources than iterators/containers. There are many useful ways to construct ranges, most algorithms have a 'lazy' version, which leads to some very interesting compositions. So I find I come across Forward ranges more often than I do forward iterators. With the function Take() described above you can implement Slice() for any range in linear time.

Length is purely an artefact, no? You can implement length as a linear-time operation on any range, and more realistically you can cache the length up front (a la most implementations of std::list).

The only way I can see that we wouldn't be able to cache the length, is if we can construct ranges from arbitrary iterator pairs - but in that case, we automatically have an O(1) slice, and we don't need the length at all.

I don't think ranges could be considered a successful alternative to iterators if they require that all ranges must know their size. There is no reason at all ranges shouldn't be able to work on list implementations which do not cache a size, and you wouldn't be able to handle many intrusive list implementations. It would also mean infinite ranges couldn't be supported.

More importantly, I don't think there is any need for knowing the length. It can be used to optimize some implementations, but they certainly aren't necessary for most algorithms, much the same as iterators.

You can't implement the algorithm *at all* under those constraints - forget about efficiency.

Yes you can, I described how in my first reply. I just found out it's called take in D. Here's some pseudo-code that should illustrate the idea:


template <typename ForwardRange>
class TakeRange
{
public:
    TakeRange(const ForwardRange& r, size_t n) : m_Range(r), m_RemainingElements(n) {}

    bool Empty() const
    {
        return m_RemainingElements == 0 || m_Range.Empty();
    }

    auto Front() const -> decltype(m_Range.Front())
    {
        assert(!m_Range.Empty());
        return m_Range.Front();
    }

    void PopFront()
    {
        assert(!m_Range.Empty());
        m_Range.PopFront();
        --m_RemainingElements;
    }

private:
    ForwardRange m_Range;
    size_t m_RemainingElements;
};


template <typename ForwardRange, typename T>
tuple<TakeRange<ForwardRange>, TakeRange<ForwardRange>, ForwardRange> FindSplit(const ForwardRange& r, const T& value)
{
    ForwardRange rightRange(r);

    size_t count = 0;
    while (!rightRange.Empty() && rightRange.Front() != value)
    {
        rightRange.PopFront();
        ++count;
    }

    auto leftRange = TakeRange(r, count);
    auto centerRange = TakeRange(rightRange, 1);

    if (!rightRange.Empty())
    {
        rightRange.PopFront();
    }

    return { leftRange, centerRange, rightRange };
}

You can certainly wrap a range in another range which was mention in the original presentation to allow reversal, chaining, lockstep, etc. The idea here is since we can't move the actual end of the range since it's only a forward range, but we can limit the range to only return up to N elements, which is what "take" does. As you can see from the sample implementation, it can be implemented entirely with three operations: Empty, Front, and PopFront.

-- gekko

This topic is closed to new replies.

Advertisement