How to implement string index bounding?

Started by
18 comments, last by zennehoy 18 years, 5 months ago
Hi all! I'm in the process of implementing my own string class, mostly because I'm not a big fan of the built-in C++ string, but also just for the experience. Dynamic resizing and all the other basics work great, but I'd like some feedback on how to implement indexing of characters within the string. There's no problem when the specified index falls within the current length of the string, but I'm undecided on how to implement an index outside of the valid range. Take the following code for example:

char c;
ztring z = "Hello World!"; // Just create a string with some text.
z[2] = 'a';                // z is now "Healo World!", no problem
c = z[6];                  // c is 'W'
c = z[42];                 // c is '\0', what should happen to z?
z[42] = '?';               // what should this do?

The simplest approach would be to make this illegal, as in C++ or Java. Instead, I'm considering increasing the size of z to make room for the indexed value. The problem is that this would have to occur even when just reading the value, since I can't (as far as I know) differentiate between a read and a write. Thus, any reference to z[534] would make a shorter string 535 bytes long, with a lot of '\0'-space in between. Basically I'm looking for feedback as to what others would consider a comprehensible solution. So, what behavior would you expect from a string? Thanks! Zen
Advertisement
whats wrong with std::string? its great.

okay your problem:

ask yourself, when people indexing a string index it out of bounds, what are they trying to do. if they want to expand the string, add a function that does that.

otherwise throw an exception or something. thats the easiest way to handle it.

my guess is for every programmer using a large value to index the array out of bounds to expand it, theres about 400 with a bug in their code.

or you could do the std::string thing an say that operator[]( int ) is unchecked, it is the clients responsibility to control the index variable.
Quote:Original post by zennehoy
Hi all!
I'm in the process of implementing my own string class, mostly because I'm not a big fan of the built-in C++ string

Why not?

Quote:but also just for the experience.

Fair enough, but realise that std::string will be a better general purpose string than anything you write.

Quote:Instead, I'm considering increasing the size of z to make room for the indexed value. The problem is that this would have to occur even when just reading the value, since I can't (as far as I know) differentiate between a read and a write. Thus, any reference to z[534] would make a shorter string 535 bytes long, with a lot of '\0'-space in between.

Basically I'm looking for feedback as to what others would consider a comprehensible solution. So, what behavior would you expect from a string?
Thanks!
Zen

So you're wanting something that behaves so that reading an character out of bounds is.. undefined? an error? returns a default value? something else? And writing a character out of bounds resizes the string. You could do something like:
// I call it silly string because I consider any attempt to write a// general purpose string class in C++ for any purpose other than learning to be// silly.  Just use std::string#include <iostream>#include <string>class SillyString{	private:		class CharacterProxy;	public:		SillyString(std::string const & s);		unsigned int size() const;		operator std::string() const;		CharacterProxy operator[](unsigned int index);	private:		void resize(int size);		// for this exercise I've just wrapped std::string for ease of		// implementation		std::string data_;		class CharacterProxy		{			public:				CharacterProxy(SillyString & string, unsigned int index);				operator char const() const;				CharacterProxy & operator=(char c);			private:				SillyString & string_;				unsigned int const index_;		};		friend class CharacterProxy;};SillyString::SillyString(std::string const & s)	:	data_(s){}unsigned int SillyString::size() const{	return data_.size();}SillyString::operator std::string() const{	return data_;}SillyString::CharacterProxy SillyString::operator[](unsigned int index){	return SillyString::CharacterProxy(*this, index);}void SillyString::resize(int size){	data_.insert(data_.end(), size + 1 - data_.size(), '\0');}SillyString::CharacterProxy::CharacterProxy(SillyString & string, unsigned int index)	:	string_(string),	index_(index){}SillyString::CharacterProxy::operator char const() const{	if (index_ >= string_.size())	{		return '\0';	}	return string_.data_[index_];}SillyString::CharacterProxy & SillyString::CharacterProxy::operator=(char c){	if (index_ >= string_.size())	{		string_.resize(index_);	}	string_.data_[index_] = c;	return *this;}int main(){	SillyString str("hello");	std::cout << str[3] << '\n';	std::cout << str[72] << '\n';	str[0] = 'H';	std::cout << static_cast< std::string >(str) << '\n';	str[17] = '!';	std::cout << static_cast< std::string >(str) << '\n';}

Enigma
Alright, I honestly hope you didn't just type all that up...
I suppose using a "proxy"-class like that is the only way to differentate a read from a write? I've used the same approach in a project before, but was hoping to avoid it in this case. How much overhead does it introduce? Otherwise that's exactly the sort of functionality I was imagining, resize on write, default value on read.

As to why: it started with the inability to do str += int. I haven't used std::string much, but I always felt I had to be carefull when passing strings as parameters. The key goal for my own ztring class has always been to make the syntax of its usage identical to that of an int (ignoring additional features like []). Duplication of data is also minimized (and I mean minimized), even across assignments and such.
Really though it was an excuse to play around with C++...

Anyways, thanks for your replies!
Zen

ps. Just out of curiosity, what does std::string do with something like this?
std::string s1="hello",s2=" ",s3="world",s4="!",s5;
s5 = s1 + s2 + s3 + s4;
How many data buffers are created? 5 should be enough, 1 for each initialized string and 1 for the entire series of concatenations. A simple solution would create 7, 1 for each string and another for each addition.
Quote:Original post by zennehoy
Alright, I honestly hope you didn't just type all that up...
I suppose using a "proxy"-class like that is the only way to differentate a read from a write? I've used the same approach in a project before, but was hoping to avoid it in this case. How much overhead does it introduce? Otherwise that's exactly the sort of functionality I was imagining, resize on write, default value on read.

In terms of overhead I would guess on a modern optimising compiler you'd be looking at something between "nothing" and "not much".

Quote:... I haven't used std::string much, but I always felt I had to be carefull when passing strings as parameters.

The only thing you need to be careful about is passing by value verses passing by const reference and that's not a string specific issue, it applies to many C++ types. Even if you make the wrong choice a profiler will usually tell you and it's easy to fix.

Quote:The key goal for my own ztring class has always been to make the syntax of its usage identical to that of an int (ignoring additional features like []).

I'm not sure what you mean by this. In what way is std::string unlike an int?

Quote:...Really though it was an excuse to play around with C++...

That's fine, just don't let me catch you crawling back to the forums with an error you can't track down in six months time because you're using your own string class and it has a subtle bug in it [lol].

Quote:ps. Just out of curiosity, what does std::string do with something like this?
std::string s1="hello",s2=" ",s3="world",s4="!",s5;
s5 = s1 + s2 + s3 + s4;
How many data buffers are created? 5 should be enough, 1 for each initialized string and 1 for the entire series of concatenations. A simple solution would create 7, 1 for each string and another for each addition.

If you're talking about dynamic memory allocations (i.e. operator new) then I believe the standard allows implementations to make any number of allocations in such a situation, although "normal" implementations will probably make between none and nine (yes, none is possible using the short string optimisation). In practice it looks like modern compilers make eight dynamic allocations (not sure about VC++ 2005, I haven't gotten round to installing that yet).

The question though is "is this a problem with std::string"? The answer is almost certainly no. Is it possible to write operator+ and operator= for some string class such that you can guarantee that no more than X dynamic allocations for the expression string s1;string s2 = ...;string s3 = ...;string s4 = ...;...;string sX = ...;s1 = s2 + s3 + s4 + ... + sX;? As far as I'm aware it is not (excluding techniques that could just as easily be applied to std::string). Not even using proxy classes and clever techniques.

So lets look at the question slightly differently. Is is possible to write code for std::string such as:
string s1;
string s2 = ...;
string s3 = ...;
string s4 = ...;
...
string sX = ...;
something;
such that s1 equals the concatentation of s2..sX and can guarantee no more than X + 1 memory allocations? (I'll allow plus one for the implementation to dynamically allocate a small buffer for s1 initially, even though I would expect most implementations not to do so). The answer is, unsurprisingly, yes, to an extent. And it's not even particularly hard. Start with the basic case for X = 5:
void concatenate(std::string & target, std::string const & s2, std::string const & s3, std::string const & s4, std::string const & s5){	target.reserve(s2.size() + s3.size() + s4.size() + s5.size());	target += s2;	target += s3;	target += s4;	target += s5;}

Then you can either provide versions of the above for varying numbers of parameters, or use preprocessor metaprogramming and a template (preprocessor template, not a C++ template) to generate them automatically.

Enigma
Quote:Original post by Enigma
The only thing you need to be careful about is passing by value verses passing by const reference and that's not a string specific issue, it applies to many C++ types. Even if you make the wrong choice a profiler will usually tell you and it's easy to fix.
Quote:The key goal for my own ztring class has always been to make the syntax of its usage identical to that of an int (ignoring additional features like []).

I'm not sure what you mean by this. In what way is std::string unlike an int?

If you pass a std::string by value, does its data get recreated? If so, that would mean you can't pass a std::string by value (can't as in, well, you know). That would be one difference to a regular int. My ztring class passes by value as expected; changes made in the function don't effect the value of the ztring in the calling function. Pass it as a reference and changes effect both. sizeof(ztring) is 4, so it's no different than passing a pointer.

Quote:don't let me catch you crawling back to the forums with an error you can't track down in six months time because you're using your own string class and it has a subtle bug in it [lol].

Alright, if I get any string-related errors, I promise to try it with std:string first ;)

Quote:"normal" implementations will probably make between none and nine (yes, none is possible using the short string optimisation).

What is short string optimisation?

As far as memory allocations are concerned, I was talking about the number of buffers created to actually hold the string data. When running a string of additions (result=s1+s2+s3+...+sn) you don't need to store the data for each sub-sum, you can just create a single data buffer containing s1+s2, then simply append all further strings. Its identical to result=s1+s2; result+=s3; result+=s4; etc., but written as a single line with + instead of +=.

I'll give the proxy-class idea a go, unless someone has a cleaner (ie no extra class) idea to accomplish the same. Thanks for your input!
Zen
Quote:Original post by zennehoy
... sizeof(ztring) is 4, so it's no different than passing a pointer.


Not exactly. Make an std::vector with a million elements in it and the sizeof will be the same as an empty vector of the same type, while passing the million element vector by value would be very expensive

sizeof doesn't account for dynamic memory that the class allocates. Passing a string by value is somewhat expensive because it news another dynamic buffer to hold a copy of the string for the life of the function. const reference is preferred in most cases for this reason.
Some std::string implementations use the copy-on-write semantic - that is, copying a string object does not duplicate the buffer holding the characters until you modify the string - then if it's buffer is shared, it will recreate it and do the modification to that.

Why don't you read the standard? The standard for C++ outlines exactly what a std::string implementation needs to guarantee in order to be a conforming std::string implementation. Even though you're 'writing your own', you should take a look at the list of requirements - one of the primary goals of C++ is efficiency - in fact, often in the C++ standard library efficiency wins over safety - that is, range-checking is less efficient than not range checking, so it's not normally done. However, there are ways (like the at() method) to do range checking, and many debug implementations do range checking and such so that you're in the clear, but you don't suffer many performance hits. For some reason people have this idea that "objects are bloated and std::string takes up gigabytes of memory just to store 'hello' and it runs really slow and you can't pass it by value and I can write better code using character arrays and blah blah blah..."
Quote:Original post by zennehoy
If you pass a std::string by value, does its data get recreated? If so, that would mean you can't pass a std::string by value (can't as in, well, you know). That would be one difference to a regular int. My ztring class passes by value as expected; changes made in the function don't effect the value of the ztring in the calling function. Pass it as a reference and changes effect both. sizeof(ztring) is 4, so it's no different than passing a pointer.

Passing a std::string by value means that any change made to the string inside the function does not affect the value of the original string, just like an int. Whether this is achieved by an immediate deep copy or by Copy-On-Write or some other technique is up to the implementation.

Quote:What is short string optimisation?

Some implementations of std::string make the actual string object slightly larger so that they can store short strings within the actual string memory itself instead of having to use dynamically allocated memory. For more details see Scott Meyers' Effective STL. (In fact even if you don't want any more details read the book and the two others in the series - Effective C++ and More Effective C++ - anyway because they're excellent books).

Quote:As far as memory allocations are concerned, I was talking about the number of buffers created to actually hold the string data. When running a string of additions (result=s1+s2+s3+...+sn) you don't need to store the data for each sub-sum, you can just create a single data buffer containing s1+s2, then simply append all further strings. Its identical to result=s1+s2; result+=s3; result+=s4; etc., but written as a single line with + instead of +=.

I know what you meant. It sounds easy and worthwhile on the face of it, but try implementing it. I think you'll find it's not as easy/possible as you thought in the general case.

Enigma

sizeof( std::string ) is 4 on my compiler (some gcc version )

and if you pass a std::string to a function and modify it, it will not modify the original... so other than the expand on write what are you gaining?

string += int can be added as a external operator function

you could probably just inherit the zstring class from std::string to make it act like you want.
void func( std::string str ){   str += "zxw";   std::cout << str << std::endl;}std::string& operator+=( std::string& str, int val ){    str += intToString( val ); // uses stringstreams and such    return str;}int main( int argc, char **argv ){    std::cout << "size of string == " << sizeof( std::string ) << std::endl;    std::string str = "xxx";    func( str );    cout << str << std::endl;    str += 10;    cout << str << std::endl;}


outputs

size of string == 4
xxxzxw
xxx
xxx10

EDIT: im slow :(

This topic is closed to new replies.

Advertisement