Shannon Barber

Is this safe std::string code?

std::string line;
while(!fstr.eof())
	{
	line.resize(256, '\0');
	fstr.getline(&*line.begin(), (std::streamsize)line.size(), '\n');
	}
 
I was skeptical, but it does work on the two implementations I have to try it on. Is the practicality of std::string like that of std::vector?

Hello Magmai,

Yes, std::string is perfectly legal.
All standard library classes can be accessed through std::.
One way to avoid typing std:: everywhere is a using directive (using namespace std;). It tells the compiler to look in that namespace for anything you use after the directive, or within the block where it appears. I would not use this in a header file, though.
Too many using directives are not good, but a few well-placed ones work great.
In a source file that uses a lot of standard library classes, just put using namespace std; after all your includes and before your code. Then you won't have to type std:: for every cout, endl, map, vector, list, string, cerr, setw, etc.

Lord Bart

I'm sorry, I should have been more explicit.
I am curious about the direct use of the underlying buffer,
fstr.getline(&*line.begin() ,

i.e. is the memory provided by std::string iterators guaranteed to be linear, as it is with vectors (or rather, is it a universal implementation as it is with vectors).


[edited by - Magmai Kai Holmlor on September 2, 2003 7:37:30 PM]

Here's how I would write this:

std::ifstream fstr("file.dat");

std::string line;

while(std::getline(fstr, line))
{
// do stuff with line...
}


I don't know if what you wrote is guaranteed to be legal or not. Accessing the raw pointer to the internal storage via an iterator seems a bit fishy. I'd be tempted to avoid it wherever possible, and in this case I would think it is possible.

Bart: Errrrr... I think MKH knows plenty about namespaces.

Magmai: It isn't technically legal; basic_strings are not currently required to have packed-array representation. However, all the implementations I know of do (it's the same deal with vectors). Many people consider this lack of specification to be a mistake, and IIRC, it'll be remedied in the next STL spec. For the time being, tho, don't worry about it. HOWEVER, remember that after your getline, line is a 256-character string which probably has an embedded null in it and garbage after that. Treat it as such.
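
The embedded-null caveat can be sketched as follows. This is only an illustration, assuming a C++11 or later library (where std::string storage is guaranteed contiguous); read_line_trimmed and first_line_of are hypothetical names, not anything from the thread. After the read, the string is shrunk back to the characters that getline actually stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line into a fixed-size string buffer, then
// shrinks the string to the characters that were actually read.
std::string read_line_trimmed(std::istream& in, std::size_t capacity = 256)
{
    assert(capacity > 0);
    std::string line(capacity, '\0');
    // Assumes contiguous storage, which C++11 and later guarantee.
    in.getline(&line[0], static_cast<std::streamsize>(line.size()), '\n');
    // getline stored a terminating '\0'; drop it and everything after it.
    line.resize(std::strlen(line.c_str()));
    return line;
}

// Hypothetical usage: returns the first line of `text`.
std::string first_line_of(const std::string& text)
{
    std::istringstream in(text);
    return read_line_trimmed(in);
}
```

Without the resize, line.size() would still report 256 and comparisons against ordinary strings would fail.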


How appropriate. You fight like a cow.

As far as I know it is not guaranteed by the standard. With a std::vector it is also not guaranteed, but for almost every implementation I have come across, the storage has been contiguous. I have had the same experience with std::string (well, basic_string to be more precise). I wouldn't be surprised if there are some guarantees stated in the next standard, so people can be confident that they are writing portable code. Also, the contiguous storage allows for compatibility with C functions.

Incidentally, may I ask why you are not using std::getline, defined in string?

Edit: OK, I started writing my reply when only one person had replied... I really need to pick up some speed.

[edited by - Lektrix on September 2, 2003 7:52:51 PM]

This can lead to horrible bugs if your implementation happens to be reference counted.

For example:

std::string one = "a string";
std::string two = one; //one and two point to the SAME DATA.

&*(one.begin()) = 'b'; // begin may or may not copy the data. Assigning to the iterator definitely will copy it. But you just bypassed it entirely.



My code might be off; but you get the idea. Now, two might actually equal "b string".

quote:
Original post by Deyja
For example:

std::string one = "a string";
std::string two = one; //one and two point to the SAME DATA.



No, this makes a copy. basic_string has an explicit constructor defined that is implemented so as to allocate separate memory.
quote:
Original post by Deyja

&*(one.begin()) = 'b'; // begin may or may not copy the data. Assigning to the iterator definitely will copy it. But you just bypassed it entirely.



First things first: dereferencing the iterator returned by one.begin() will return a char, which you then get the address of, so you are trying to assign a (const) char to a char*, which is a conversion error. I think that you need to learn about how std::string works. The string associated with two would not be altered, as one and two have separate memory allocated for each other.
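
For reference, a sketch of what the quoted snippet was presumably trying to do: take a char* from the iterator and assign through it. The function names are hypothetical. The comment about the result assumes a C++11-conforming library, where copies never share a buffer; older reference-counted implementations are expected to unshare when non-const begin() is called.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: overwrites the first character of `s` through a raw
// pointer obtained from the iterator.
void overwrite_first_char(std::string& s, char c)
{
    assert(!s.empty());      // *s.begin() is only valid for a non-empty string
    char* p = &*s.begin();   // pointer to the first character
    *p = c;                  // assign through the pointer, not to the pointer
}

// Returns the copy's contents after the original has been modified.
std::string copy_after_modifying_original()
{
    std::string one = "a string";
    std::string two = one;
    overwrite_first_char(one, 'b');
    return two;              // still "a string" on a C++11-conforming library
}
```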

[edited by - Lektrix on September 2, 2003 8:21:08 PM]

Guest Anonymous Poster
Actually, I think a vector might be guaranteed to be contiguous.

But I'm pretty sure a string doesn't have to be.

quote:
Original post by Deyja
My code might be off; but you get the idea. Now, two might actually equal "b string".


CoW was one of the things that came to mind... buuuut calling .begin() is a good sign that the user intends to do something to the string, so CoW ought to be invoked upon calling it.

Elements of a std::vector (possibly except std::vector<bool>, not sure about that) are indeed required by the standard to be stored in contiguous memory, so it's safe to use &v[0] (or &*v.begin()) as a pointer to the vector data. The explicit requirement was added in a technical corrigendum (which means it's now part of the standard).

To allow for standard library implementation trade-offs there's no similar requirement for std::string. You can get a pointer to the contents in contiguous memory through std::string::data() and std::string::c_str(), but that may be a copy and must not be modified.
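
Given that guarantee for vectors, one portable way to do the buffered read is to use a std::vector<char> as the buffer and build the string from it afterwards. This is a minimal sketch, not the only way to do it; read_line_via_vector and first_line_via_vector are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line through a vector<char> buffer, whose
// elements are required to be contiguous, and copies the result into a string.
std::string read_line_via_vector(std::istream& in, std::size_t capacity = 256)
{
    assert(capacity > 0);              // &buffer[0] needs at least one element
    std::vector<char> buffer(capacity, '\0');
    in.getline(&buffer[0], static_cast<std::streamsize>(buffer.size()), '\n');
    return std::string(&buffer[0]);    // copies up to the first '\0'
}

// Hypothetical usage: returns the first line of `text`.
std::string first_line_via_vector(const std::string& text)
{
    std::istringstream in(text);
    return read_line_via_vector(in);
}
```

The cost is one extra copy from the buffer into the string, which is the overhead complained about elsewhere in this thread.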

[edited by - spock on September 2, 2003 11:08:27 PM]

This is one of those things that gets the people on the newsgroups to all chime in "WELL IF YOU LOOK AT SECTION 2891.213.11.AC appendix 3 it doesn't guarantee that it will work!!!!" Except, it does on a lot of compilers. I wouldn't think twice when you had to use it if you marked it as a hack. It is one of the things I complain about when I complain about std::string.

Like I said in another thread, for a *basic* string (which it is, it is even named that) there is NO reason why the storage wouldn't be contiguous. Only something more elaborate like an SGI rope would be spread out.

I'm starting to think that languages developed by committee (cough, C++) are just so darn slow to change that it's detrimental to the programmer. Being able to read directly into the buffer would be a whole lot more useful than making an input buffer and then building a string from it. It's an example of taking encapsulation way too far, where you actually make more work for the programmer. I think the people developing the language need to come down and see what some common problems are with it.

quote:
Original post by antareus
I'm starting to think that languages developed by committee (cough, C++) are just so darn slow to change that it's detrimental to the programmer. Being able to read directly into the buffer would be a whole lot more useful than making an input buffer and then building a string from it. It's an example of taking encapsulation way too far, where you actually make more work for the programmer. I think the people developing the language need to come down and see what some common problems are with it.


Well, I have two comments. First of all, the proper solution (as demonstrated by GrinningGator) is more elegant than any solution using direct access to a buffer. So the abstraction is good here, because there is no possibility of overflow, there aren't special cases if you run out of space, and the programmer isn't burdened with memory management. Second of all, much of the STL (although probably not strings) was not designed by committee, but rather by two individuals.
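
The "special cases if you run out of space" point can be illustrated with a line longer than the buffer. The sketch below uses hypothetical function names. It relies on documented behaviour: the member getline sets failbit when the buffer fills before the delimiter is found, while std::getline grows the string as needed.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical check: does the first line of `text` fit in a buffer of
// `capacity` chars when read with the member getline?
bool fits_in_fixed_buffer(const std::string& text, std::size_t capacity)
{
    assert(capacity > 0);
    std::istringstream in(text);
    std::vector<char> buffer(capacity);
    in.getline(&buffer[0], static_cast<std::streamsize>(capacity));
    return !in.fail();   // failbit is set when the line was too long
}

// Hypothetical check: length of the first line when read with std::getline,
// which resizes the string itself.
std::size_t length_via_std_getline(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);
    return line.size();
}
```

With a fixed buffer, the caller has to notice the failure, clear the stream state, and decide what to do with the rest of the line; std::getline has no such case to handle.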

quote:
Original post by Deyja
std::string one = "a string";
std::string two = one; //one and two point to the SAME DATA.

&*(one.begin()) = 'b'; // begin may or may not copy the data. Assigning to the iterator definitely will copy it. But you just bypassed it entirely.

If neither one.begin() nor *(one.begin()) invokes the copy-on-write, the implementation is horribly broken.

I think most people have answered the question with "yes, it'll work, but it's not standard."

I'll add why it might not actually work in some (not that uncommon) cases:

1. String pooling. Some implementations do string pooling to reduce the overhead of having bunches of small strings with the same contents around. VC .NET has an option to turn this on IIRC.

2. Many implementations handle different-length strings differently. At least one implementation has a 16-byte array in-object to store short strings, but uses dynamically allocated memory to store longer strings. This could (in some situations) lead to a problem if the class makes the wrong decision based on actual string length.
