Splitting Strings (c++)

3 comments, last by Brother Bob 16 years, 10 months ago
I noticed that the Boost libraries have a function for this, but I was disappointed to find that it does not use output iterators. I find output iterators far more flexible than any imaginary container-independent code (according to Scott Meyers's "Effective STL", there is no such thing), so I decided to write my own:

// Splits string s at delimiter, writing the resulting strings to output iterator dest.
// Requires <string>. dest is taken by reference, so pass a named iterator.
template <typename OutputIterator>
void Split(const std::string &s,char delimiter,OutputIterator &dest,bool compress=true)
{
	std::string buffer;
	for (std::string::size_type i=0;i<s.length();i++)
	{
		if (s[i]!=delimiter)
			buffer.push_back(s[i]);
		else if (buffer.length()||!compress)
		{// skip blank entries if compression is on; advance dest after each entry
			// copy buffer
			*dest++=buffer;
			// clear buffer
			buffer.clear();
		}
	}
	// add last entry
	if (buffer.length()||!compress)
		*dest++=buffer;
}
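For anyone who wants to try this out, here is a minimal sketch of the same buffer-based approach. It makes two assumptions that are not in the post: the output iterator is taken by value and returned, so a temporary such as std::back_inserter(v) can be passed directly, and SplitToVector is a made-up helper that exists only for illustration.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Sketch: same algorithm as the post, but dest is passed by value and
// returned (an assumption made here, not the original signature).
template <typename OutputIterator>
OutputIterator Split(const std::string& s, char delimiter,
                     OutputIterator dest, bool compress = true)
{
    std::string buffer;
    for (std::string::size_type i = 0; i < s.length(); ++i)
    {
        if (s[i] != delimiter)
        {
            buffer.push_back(s[i]);
        }
        else if (!buffer.empty() || !compress)
        {
            *dest++ = buffer;   // emit the finished token
            buffer.clear();
        }
    }
    if (!buffer.empty() || !compress)
        *dest++ = buffer;       // emit the trailing token
    return dest;
}

// Hypothetical helper for illustration: collect the tokens in a vector.
std::vector<std::string> SplitToVector(const std::string& s, char delimiter,
                                       bool compress = true)
{
    std::vector<std::string> out;
    Split(s, delimiter, std::back_inserter(out), compress);
    return out;
}
```

With compress left on, empty entries between adjacent delimiters are dropped; with compress off, they are kept as empty strings.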
Programming since 1995.
I don't understand your issue with output iterators. The iterator interface for the tokenizer works with output iterators. For example:
tokenizer<> tok(s);
copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));

The ostream iterator was just an example; pick any output iterator you like.
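To show the "pick any output iterator" idea without assuming Boost is installed, here is a rough standard-library stand-in for the same pattern. It is not the Boost tokenizer: std::istream_iterator splits on whitespace only, and CopyTokens is a made-up name for this sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the tokenizer pattern using only the standard library:
// std::istream_iterator<std::string> yields whitespace-separated tokens,
// and std::copy forwards them to whatever output iterator is supplied.
template <typename OutputIterator>
OutputIterator CopyTokens(const std::string& s, OutputIterator dest)
{
    std::istringstream in(s);
    return std::copy(std::istream_iterator<std::string>(in),
                     std::istream_iterator<std::string>(),
                     dest);
}
```

The same call works with std::back_inserter(v) to fill a vector or with std::ostream_iterator<std::string>(std::cout, "\n") to print one token per line.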
I'm not sure exactly what feedback you want, but since you're a fan of iterators, why not make the input a pair of iterators? That would let you use it with a std::vector<char> instead of requiring a std::string. While we're at it, let's remove the dependence on char so it also works with std::vector<wchar_t> or something similar.

// Split now takes two parameters, begin and end, which delimit the
// range to split, and writes the results to dest.
// Note: out_type comes from the output iterator, so OutIter must expose a
// value_type constructible from an iterator pair; for standard output
// iterators such as std::back_insert_iterator, value_type is void.
template<typename InIter, typename OutIter>
void Split( InIter begin
          , InIter end
          , typename std::iterator_traits<InIter>::value_type delimiter
          , OutIter dest
          , bool compress=true
          )
{
    typedef typename std::iterator_traits<InIter>::value_type value_type;
    typedef typename std::iterator_traits<OutIter>::value_type out_type;

    InIter buffer_start = begin;
    InIter buffer_end = begin;
    for (InIter i = begin; i != end; ++i)
    {
        if (*i != delimiter)
        {
            ++buffer_end;
        }
        else
        {
            if (buffer_start != buffer_end || !compress)
            {
                *dest = out_type(buffer_start, buffer_end);
                ++dest;
            }
            // Restart the buffer just past the delimiter.
            buffer_start = i;
            ++buffer_start;
            buffer_end = buffer_start;
        }
    }
    if (buffer_start != buffer_end || !compress)
    {
        *dest = out_type(buffer_start, buffer_end);
    }
}


Edit: Fixed a couple of errors.
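Here is one way the iterator-pair idea might look if each token is built as a std::basic_string of the input's value_type rather than taken from the output iterator's traits. That choice is an assumption of this sketch, as are the names SplitRange and SplitWide. It also assumes the input iterators are multi-pass (forward iterators or better).

```cpp
#include <iterator>
#include <string>
#include <vector>

// Sketch only: tokens are std::basic_string<value_type> (an assumption),
// so standard output iterators such as std::back_inserter can be used.
template <typename InIter, typename OutIter>
OutIter SplitRange(InIter begin, InIter end,
                   typename std::iterator_traits<InIter>::value_type delimiter,
                   OutIter dest, bool compress = true)
{
    typedef typename std::iterator_traits<InIter>::value_type value_type;
    typedef std::basic_string<value_type> token_type;

    InIter token_start = begin;
    for (InIter i = begin; i != end; ++i)
    {
        if (*i == delimiter)
        {
            if (token_start != i || !compress)
                *dest++ = token_type(token_start, i);
            token_start = i;
            ++token_start;   // next token starts just past the delimiter
        }
    }
    if (token_start != end || !compress)
        *dest++ = token_type(token_start, end);
    return dest;
}

// Hypothetical helper for illustration: split a vector of wide characters.
std::vector<std::wstring> SplitWide(const std::vector<wchar_t>& chars,
                                    wchar_t delimiter, bool compress = true)
{
    std::vector<std::wstring> out;
    SplitRange(chars.begin(), chars.end(), delimiter,
               std::back_inserter(out), compress);
    return out;
}
```

No intermediate buffer is filled character by character here; each token is constructed directly from the iterator range it covers.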
Quote:Original post by Brother Bob
I don't understand your issue with output iterators. The iterator interface for the tokenizer works with output iterators. For example:
tokenizer<> tok(s);
copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));

The ostream iterator was just an example; pick any output iterator you like.


I don't know about the tokenizer, although your solution seems to require O(n^2) time instead of O(n) like my solution. Anyway, Boost has a string split function, and it requires that you pass a reference to the container object, which is what I do not like about it.

Thank you Julian90, I like your solution, although I think it poses some restrictions on the output iterator (out_type(buffer_start, buffer_end) must be valid). I don't currently foresee a need for any output iterators that would be incompatible, and it does have the nice performance benefit of not needing a std::string buffer.
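The restriction can be checked at compile time. The sketch below (C++11 or later for static_assert and <type_traits>; the alias and constant names are made up for this example) shows that std::iterator_traits reports void as the value_type of two common standard output iterators, so an expression like out_type(buffer_start, buffer_end) would not be expected to compile for them.

```cpp
#include <iterator>
#include <ostream>
#include <string>
#include <type_traits>
#include <vector>

// Illustrative aliases (names are made up for this example).
typedef std::back_insert_iterator<std::vector<std::string> > BackInserter;
typedef std::ostream_iterator<std::string> OstreamWriter;

// For both iterators, iterator_traits reports void as the value_type.
const bool kBackInserterValueIsVoid =
    std::is_same<std::iterator_traits<BackInserter>::value_type, void>::value;
const bool kOstreamWriterValueIsVoid =
    std::is_same<std::iterator_traits<OstreamWriter>::value_type, void>::value;

static_assert(kBackInserterValueIsVoid,
              "back_insert_iterator has value_type void");
static_assert(kOstreamWriterValueIsVoid,
              "ostream_iterator has value_type void");
```

An output iterator that defines a real value_type constructible from an iterator pair would satisfy the requirement.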
Quote:Original post by T1Oracle
I don't know about the tokenizer, although your solution seems to require O(n^2) time instead of O(n) like my solution. Anyway, Boost has a string split function, and it requires that you pass a reference to the container object, which is what I do not like about it.

I searched through the Boost documentation (not thoroughly, I just glanced at the places where I figured one would belong) and couldn't find a split function like the one you implemented, so I assumed you were talking about the tokenizer. I apologize for that.

Anyway, I can imagine how it could be implemented in quadratic time, but I can't see why it would be. The tokenizer is linear time. Stepping through the code, I see that yours and the one I posted do pretty much the same thing: copy characters to a temporary token holder until a delimiter is reached, then pass the token to the output iterator.

