Splitting Strings (c++)

This topic is 3840 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I noticed that the Boost libraries have a function for this, but I was disappointed to find that it does not use output iterators. Since I feel that output iterators are far more flexible than any imaginary container-independent code (according to Scott Meyers' "Effective STL" there is no such thing), I decided to write my own:
#include <string>

// Splits string s at delimiter, writing the resulting strings to output iterator dest.
// If compress is true, blank entries (from leading, trailing, or adjacent delimiters) are skipped.
// dest is taken by value so that temporaries such as std::back_inserter(v) can be passed.
template <typename OutputIterator>
void Split(const std::string &s,char delimiter,OutputIterator dest,bool compress=true)
{
	std::string buffer;
	for (std::string::size_type i=0;i<s.length();i++)
	{
		if (s[i]!=delimiter)
			buffer.push_back(s[i]);
		else if (buffer.length()||!compress)
		{// skip blank entries if compression is on
			// copy buffer to the output, then advance the iterator
			*dest++=buffer;
			// clear buffer for the next entry
			buffer.clear();
		}
	}
	// add last entry
	if (buffer.length()||!compress)
		*dest=buffer;
}
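
As a rough illustration of why output iterators are flexible here, the following is a minimal sketch of the same buffer-based idea driven by two different destinations. `SplitChars`, `SplitToVector`, and `SplitToLines` are hypothetical names introduced only for this example. The iterator is taken by value and returned, as the standard algorithms do, so a temporary such as `std::back_inserter(tokens)` can be passed directly.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-alone splitter: writes each token of s to dest
// and returns the advanced iterator.
template <typename OutputIterator>
OutputIterator SplitChars(const std::string& s, char delimiter,
                          OutputIterator dest, bool compress = true)
{
    std::string buffer;
    for (std::size_t i = 0; i < s.length(); ++i)
    {
        if (s[i] != delimiter)
        {
            buffer.push_back(s[i]);
        }
        else if (!buffer.empty() || !compress)
        {
            *dest++ = buffer;   // emit the finished token
            buffer.clear();
        }
    }
    if (!buffer.empty() || !compress)
        *dest++ = buffer;       // emit the trailing token
    return dest;
}

// Collects tokens in a vector through std::back_inserter.
inline std::vector<std::string> SplitToVector(const std::string& s,
                                              char delimiter,
                                              bool compress = true)
{
    std::vector<std::string> tokens;
    SplitChars(s, delimiter, std::back_inserter(tokens), compress);
    return tokens;
}

// Streams tokens, one per line, through std::ostream_iterator.
inline std::string SplitToLines(const std::string& s, char delimiter)
{
    std::ostringstream out;
    SplitChars(s, delimiter, std::ostream_iterator<std::string>(out, "\n"));
    return out.str();
}
```

With compression on, `SplitToVector("a,,b,", ',')` yields the two tokens `a` and `b`. With `compress` set to false, the empty entries between and after the delimiters are kept as well.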

I don't understand your issue with output iterators. The iterator interface for the tokenizer works with output iterators. For example:

tokenizer<> tok(s);
copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));

The ostream iterator was just an example; pick any output iterator you like.

I'm not sure exactly what feedback you want, but since you're a fan of iterators, why not make the input a pair of iterators? That would let you use it with a std::vector<char> instead of requiring a std::string. While we're at it, let's remove the dependence on char so we can use a std::vector<wchar_t> or something similar.


#include <iterator>

// Split now takes two parameters, begin and end, which represent the
// range to split, and outputs the results to dest.
// Note: out_type is taken from the output iterator's value_type, so dest
// must be something like a std::vector<std::string>::iterator. For
// std::back_insert_iterator and std::ostream_iterator, value_type is void.
template<typename InIter, typename OutIter>
void Split( InIter begin
          , InIter end
          , typename std::iterator_traits<InIter>::value_type delimiter
          , OutIter dest
          , bool compress=true
          )
{
    typedef typename std::iterator_traits<OutIter>::value_type out_type;

    // [buffer_start, buffer_end) is the token scanned so far.
    InIter buffer_start = begin;
    InIter buffer_end = begin;
    for (InIter i = begin; i != end; ++i)
    {
        if (*i != delimiter)
        {
            ++buffer_end;
        }
        else
        {
            if (buffer_start != buffer_end || !compress)
            {
                *dest = out_type(buffer_start, buffer_end);
                ++dest;
            }
            // Restart the token just past the delimiter.
            buffer_end = i;
            ++buffer_end;
            buffer_start = buffer_end;
        }
    }

    // Add the last entry.
    if (buffer_start != buffer_end || !compress)
    {
        *dest = out_type(buffer_start, buffer_end);
    }
}



Edit: Fixed a couple of errors.

Quote:
Original post by Brother Bob
I don't understand your issue with output iterators. The iterator interface for the tokenizer works with output iterators. For example:

tokenizer<> tok(s);
copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));

The ostream iterator was just an example; pick any output iterator you like.


I don't know about the tokenizer, although your solution seems to require O(n^2) time instead of O(n) like my solution. Anyway, the Boost libraries have a string split function, and it requires that you pass a reference to the container object, which is what I do not like about it.

Thank you Julian90, I like your solution, although I think it poses some restrictions on the output iterator (out_type(buffer_start, buffer_end) must be valid). I don't currently foresee a need for any output iterators that would be incompatible, and it does have the nice performance benefit of not needing a std::string buffer.
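
One possible way around that restriction, sketched here under assumptions rather than taken from the posts above, is to let the caller name the token type explicitly instead of deriving it from the output iterator. This matters because `std::iterator_traits<OutIter>::value_type` is `void` for `std::back_insert_iterator` and `std::ostream_iterator`. `SplitRange` and `SplitWide` are hypothetical names, and the input iterators are assumed to be at least forward iterators.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical variant: Token is supplied by the caller and must be
// constructible from a pair of input iterators.
template <typename Token, typename InIter, typename OutIter>
OutIter SplitRange(InIter begin, InIter end,
                   typename std::iterator_traits<InIter>::value_type delimiter,
                   OutIter dest, bool compress = true)
{
    InIter token_start = begin;
    for (InIter i = begin; i != end; ++i)
    {
        if (*i == delimiter)
        {
            // Emit [token_start, i) unless it is empty and compress is on.
            if (token_start != i || !compress)
                *dest++ = Token(token_start, i);
            // The next token starts just past the delimiter.
            token_start = i;
            ++token_start;
        }
    }
    // Emit the trailing token.
    if (token_start != end || !compress)
        *dest++ = Token(token_start, end);
    return dest;
}

// Example: split a std::vector<wchar_t> into std::wstring tokens.
inline std::vector<std::wstring> SplitWide(const std::vector<wchar_t>& text,
                                           wchar_t delimiter)
{
    std::vector<std::wstring> tokens;
    SplitRange<std::wstring>(text.begin(), text.end(), delimiter,
                             std::back_inserter(tokens));
    return tokens;
}
```

Like the iterator-pair version, this copies each token straight out of the input range, so no intermediate string buffer is needed.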

Quote:
Original post by T1Oracle
I don't know about the tokenizer, although your solution seems to require O(n^2) time instead of O(n) like my solution. Anyway, the Boost libraries have a string split function, and it requires that you pass a reference to the container object, which is what I do not like about it.

I searched through the Boost documentation (not too thoroughly, I just glanced at the places where I figured one would belong) and couldn't find anything regarding a split function like the one you implemented first, so I assumed you were talking about the tokenizer. I apologize for that.

Anyways, I can imagine how it could be implemented in quadratic time, but I can't see why it would be. The tokenizer is linear time. Stepping through the code, I see they (yours and the one I posted) do pretty much the same thing: copy characters to a temporary token holder until a delimiter is found, then pass the token to the output iterator.
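
To make the linear-time argument concrete, here is a small instrumentation sketch. It is not the tokenizer's actual code: `SplitCost` and `MeasureSplit` are hypothetical names, and the function only counts the per-character steps of a buffer-based split without keeping the tokens.

```cpp
#include <cstddef>
#include <string>

// Hypothetical counters for one pass of a buffer-based split.
struct SplitCost
{
    std::size_t chars_visited;  // loop iterations over the input
    std::size_t chars_copied;   // characters appended to the buffer
    std::size_t tokens;         // tokens that would be emitted
};

inline SplitCost MeasureSplit(const std::string& s, char delimiter)
{
    SplitCost cost = {0, 0, 0};
    std::string buffer;
    for (std::size_t i = 0; i < s.length(); ++i)
    {
        ++cost.chars_visited;
        if (s[i] != delimiter)
        {
            buffer.push_back(s[i]);   // amortized constant time
            ++cost.chars_copied;
        }
        else if (!buffer.empty())
        {
            ++cost.tokens;            // token would be passed on here
            buffer.clear();
        }
    }
    if (!buffer.empty())
        ++cost.tokens;
    return cost;
}
```

Every input character is visited once and copied into the buffer at most once, and passing a token to the output copies each of its characters at most one more time. Doubling the input therefore roughly doubles all three counters, which is what linear time predicts.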
