T1Oracle

Splitting Strings (C++)


I noticed that Boost has a function for this, but I was disappointed to find that it does not use output iterators. Since I feel that output iterators are far more flexible than any imaginary container-independent code (according to Scott Meyers's "Effective STL", there is no such thing), I decided to write my own:
// Splits string s at delimiter, writing the resulting strings to output iterator dest.
// dest is taken by value so that temporaries such as std::back_inserter(v) can be
// passed; the advanced iterator is returned to the caller.
template <typename OutputIterator>
OutputIterator Split(const std::string &s,char delimiter,OutputIterator dest,bool compress=true)
{
	std::string buffer;
	for (std::string::size_type i=0;i<s.length();i++)
	{
		if (s[i]!=delimiter)
			buffer.push_back(s[i]);
		else if (buffer.length()||!compress)
		{// skip blank entries if compression is on, advance after filling an entry
			// copy buffer
			*dest++=buffer;
			// clear buffer
			buffer.clear();
		}
	}
	// add last entry
	if (buffer.length()||!compress)
		*dest++=buffer;
	return dest;
}
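As a rough illustration of why output iterators are flexible here, the sketch below sends the same split to two different destinations. It uses its own copy of the function, named SplitCopy (a hypothetical name), which takes the output iterator by value so that temporaries such as std::back_inserter(v) can be passed directly. SplitToVector, SplitToLines and SplitCopyDemo are likewise made-up helper names for this example.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Assumed variant of the splitter: output iterator by value, returned when done.
template <typename OutputIterator>
OutputIterator SplitCopy(const std::string& s, char delimiter,
                         OutputIterator dest, bool compress = true)
{
    std::string buffer;
    for (std::string::size_type i = 0; i < s.length(); ++i)
    {
        if (s[i] != delimiter)
            buffer.push_back(s[i]);
        else if (!buffer.empty() || !compress)
        {
            *dest++ = buffer;   // emit the finished token
            buffer.clear();
        }
    }
    if (!buffer.empty() || !compress)
        *dest++ = buffer;       // emit the trailing token
    return dest;
}

// Hypothetical helper: collect tokens into a vector via std::back_inserter.
std::vector<std::string> SplitToVector(const std::string& s, char delimiter,
                                       bool compress = true)
{
    std::vector<std::string> out;
    SplitCopy(s, delimiter, std::back_inserter(out), compress);
    return out;
}

// Hypothetical helper: write one token per line via std::ostream_iterator.
std::string SplitToLines(const std::string& s, char delimiter)
{
    std::ostringstream os;
    SplitCopy(s, delimiter, std::ostream_iterator<std::string>(os, "\n"));
    return os.str();
}

// Small self-check showing the effect of the compress flag.
void SplitCopyDemo()
{
    assert(SplitToVector("a,,b", ',').size() == 2);         // blank entry skipped
    assert(SplitToVector("a,,b", ',', false).size() == 3);  // blank entry kept
    assert(SplitToLines("a b", ' ') == "a\nb\n");
}
```

Calling SplitCopyDemo() from main exercises both destinations; any other output iterator (for example std::front_inserter on a std::list) should slot in the same way.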

I don't understand your issue with output iterators. The iterator interface for the tokenizer works with output iterators. For example:

tokenizer<> tok(s);
copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));

The ostream iterator was just an example; pick any output iterator you like.
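To make the "pick any output iterator" point concrete without depending on Boost, here is a minimal standard-library sketch. It swaps in std::istream_iterator<std::string> as a stand-in for the tokenizer (so it only splits on whitespace), and the names CopyTokens, TokensToVector, TokensToLines and CopyTokensDemo are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copies the whitespace-separated tokens of text to any output iterator.
// std::istream_iterator stands in for boost::tokenizer here (an assumption
// made to keep the sketch standard-library only).
template <typename OutputIterator>
OutputIterator CopyTokens(const std::string& text, OutputIterator dest)
{
    std::istringstream in(text);
    return std::copy(std::istream_iterator<std::string>(in),
                     std::istream_iterator<std::string>(), dest);
}

// Destination 1: append tokens to a container.
std::vector<std::string> TokensToVector(const std::string& text)
{
    std::vector<std::string> out;
    CopyTokens(text, std::back_inserter(out));
    return out;
}

// Destination 2: write tokens to a stream, one per line.
std::string TokensToLines(const std::string& text)
{
    std::ostringstream os;
    CopyTokens(text, std::ostream_iterator<std::string>(os, "\n"));
    return os.str();
}

// Small self-check: the token source is identical, only the sink changes.
void CopyTokensDemo()
{
    assert(TokensToVector("one two  three").size() == 3);
    assert(TokensToLines("a b") == "a\nb\n");
}
```

The token range and the destination are independent of each other, which is the property the std::copy call in the post relies on.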

I'm not sure exactly what feedback you want, but since you're a fan of iterators, why not make the input a pair of iterators? That lets you use it with a std::vector<char> instead of requiring a std::string. While we're at it, let's remove the dependence on char so we can use a std::vector<wchar_t> or something similar.


// Split now takes two parameters, begin and end, which delimit the
// range to split, and writes the results to dest.
// InIter must be at least a forward iterator, since each token is read twice.
// out_type is only usable when iterator_traits<OutIter>::value_type is a real
// type (e.g. a container's own iterator); it is void for std::back_inserter.
template<typename InIter, typename OutIter>
void Split( InIter begin
          , InIter end
          , typename std::iterator_traits<InIter>::value_type delimiter
          , OutIter dest
          , bool compress=true
          )
{
    typedef typename std::iterator_traits<OutIter>::value_type out_type;

    // [buffer_start, buffer_end) is the token scanned so far
    InIter buffer_start = begin;
    InIter buffer_end = begin;
    for (InIter i = begin; i != end; ++i)
    {
        if (*i != delimiter)
        {
            ++buffer_end;
        }
        else
        {
            // skip blank tokens when compressing
            if (buffer_start != buffer_end || !compress)
            {
                *dest = out_type(buffer_start, buffer_end);
                ++dest;
            }
            // restart just past the delimiter
            buffer_end = i;
            ++buffer_end;
            buffer_start = buffer_end;
        }
    }

    // emit the last token
    if (buffer_start != buffer_end || !compress)
    {
        *dest = out_type(buffer_start, buffer_end);
    }
}



Edit: Fixed a couple of errors.

Quote:
Original post by Brother Bob
I don't understand your issue with output iterators. The iterator interface for the tokenizer works with output iterators. For example:

tokenizer<> tok(s);
copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));

The ostream iterator was just an example; pick any output iterator you like.


I don't know about the tokenizer, although your solution seems to require O(n^2) time instead of O(n) like my solution. Anyway, Boost has a string split function, and it requires that you pass a reference to the container object, which is what I do not like about it.

Thank you, Julian90. I like your solution, although I think it poses some restrictions on the output iterator (out_type(buffer_start, buffer_end) must be valid). I don't currently foresee a need for any output iterators that would be incompatible, and it does have the nice performance benefit of not needing a std::string buffer.
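The restriction mentioned above matters in practice: for the standard insert iterators, std::iterator_traits<OutIter>::value_type is void, so a token type cannot be deduced from them. One possible workaround, sketched here under the assumption that the caller is willing to name the token type explicitly, is an extra template parameter. SplitRange, SplitWide and SplitRangeDemo are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// The value_type of a back_insert_iterator is void, so it cannot name a token type.
static_assert(
    std::is_void<std::iterator_traits<
        std::back_insert_iterator<std::vector<std::string> > >::value_type>::value,
    "back_insert_iterator exposes void as its value_type");

// Range-based split where the caller names the token type explicitly.
// Token must be constructible from a pair of InIter (forward iterators).
template <typename Token, typename InIter, typename OutIter>
OutIter SplitRange(InIter begin, InIter end,
                   typename std::iterator_traits<InIter>::value_type delimiter,
                   OutIter dest, bool compress = true)
{
    InIter token_start = begin;
    for (InIter i = begin; i != end; ++i)
    {
        if (*i == delimiter)
        {
            if (token_start != i || !compress)
                *dest++ = Token(token_start, i);
            token_start = i;
            ++token_start;  // resume just past the delimiter
        }
    }
    if (token_start != end || !compress)
        *dest++ = Token(token_start, end);
    return dest;
}

// Hypothetical helper: split a vector<wchar_t> into wide strings.
std::vector<std::wstring> SplitWide(const std::vector<wchar_t>& chars,
                                    wchar_t delimiter)
{
    std::vector<std::wstring> out;
    SplitRange<std::wstring>(chars.begin(), chars.end(), delimiter,
                             std::back_inserter(out));
    return out;
}

// Small self-check with a doubled delimiter and compression on.
void SplitRangeDemo()
{
    const std::wstring text = L"ab,,c";
    const std::vector<wchar_t> chars(text.begin(), text.end());
    const std::vector<std::wstring> tokens = SplitWide(chars, L',');
    assert(tokens.size() == 2);
    assert(tokens[0] == L"ab");
    assert(tokens[1] == L"c");
}
```

The trade-off is a slightly noisier call site (SplitRange<std::wstring>(...)) in exchange for working with std::back_inserter and similar iterators.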

Quote:
Original post by T1Oracle
I don't know about the tokenizer, although your solution seems to require O(n^2) time instead of O(n) like my solution. Anyway, Boost has a string split function, and it requires that you pass a reference to the container object, which is what I do not like about it.

I searched through the Boost documentation (not thoroughly, I just glanced at the places where I figured one would belong) and couldn't find a split function like the one you implemented, so I assumed you were talking about the tokenizer. I apologize for that.

Anyways, I can imagine how it could be implemented in quadratic time, but I can't see why it would be. The tokenizer is linear time. Stepping through the code, I see that they (yours and the one I posted) do pretty much the same thing: copy characters to a temporary token holder until a delimiter is reached, then pass the token to the output iterator.
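The linear-time argument can be checked with a small instrumented sketch. This is not the tokenizer's code, just a buffer-based splitter of the kind described above with two counters added. SplitStats, CountSplitWork and CountSplitWorkDemo are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical counters for the work done by one split.
struct SplitStats
{
    std::size_t chars_examined;  // loop iterations
    std::size_t chars_buffered;  // characters appended to the temporary token
};

// Buffer-based split (blank tokens skipped) that also reports its work.
SplitStats CountSplitWork(const std::string& s, char delimiter,
                          std::vector<std::string>& out)
{
    SplitStats stats = {0, 0};
    std::string buffer;
    for (std::string::size_type i = 0; i < s.length(); ++i)
    {
        ++stats.chars_examined;
        if (s[i] != delimiter)
        {
            buffer.push_back(s[i]);
            ++stats.chars_buffered;
        }
        else if (!buffer.empty())
        {
            out.push_back(buffer);  // copies the token once more
            buffer.clear();
        }
    }
    if (!buffer.empty())
        out.push_back(buffer);
    return stats;
}

// Small self-check: 9 input characters, 6 of them non-delimiters.
void CountSplitWorkDemo()
{
    std::vector<std::string> tokens;
    const SplitStats stats = CountSplitWork("a,bb,,ccc", ',', tokens);
    assert(stats.chars_examined == 9);
    assert(stats.chars_buffered == 6);
    assert(tokens.size() == 3);
}
```

Each input character is examined once and buffered at most once, and the tokens handed to the output together contain no more characters than the input, so the total work grows linearly with the input length.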
