How reliable are streams?


I'm currently using streams to convert numbers to strings and vice versa. I'm parsing numbers from a file, so for that the istringstream >> integer operation is used. Does the C++ standard define exactly how this operation works? Can I be certain, that if I type 465624 in a file, that this stringstream will always convert it correctly to the integer 465624? I didn't find any description on how this conversion works on cppreference.com, so that's why I'd like to know how well defined this is. If it's not well defined, then maybe some other version from another compiler could break the parsing of my files, because I'm relying on this stringstream to do it correctly, instead of the string-to-integer conversion I wrote myself in the past.

Also, is it possible to overload the >> and << operators of streams for integers and floating point numbers to give more options? Because iomanip is actually pretty limited, you can only choose from 3 numeral systems, no roman and not even binary.

[Edited by - Lode on July 6, 2006 4:34:17 AM]

You can add your own manipulators if you like.
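As a rough illustration of what a custom manipulator looks like: any function that takes and returns a std::ostream& can be inserted into a stream with <<. The names field_sep and join_fields below are made up for this sketch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical manipulator: writes a field separator to the stream.
// Any function with this signature can be used with operator<<.
std::ostream& field_sep(std::ostream& os) {
    return os << " | ";
}

// Hypothetical helper that uses the manipulator between two integers.
std::string join_fields(int a, int b) {
    std::ostringstream out;
    out << a << field_sep << b;
    return out.str();
}
```

With this in place, join_fields(1, 2) produces the string "1 | 2".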

And basic conversions like string -> int are pretty well defined, and should work the same on all (standards-compliant) compilers. :)
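A minimal sketch of such a string -> int conversion with the standard streams, including a check that the whole string was consumed. The helper name parse_int is an assumption for this example, not something from the standard library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: parses the whole string as a decimal int.
// Returns false if extraction fails or if non-whitespace characters
// remain after the number.
bool parse_int(const std::string& text, int& out) {
    std::istringstream in(text);
    in >> out;
    if (in.fail()) return false;
    char extra;
    return !(in >> extra);  // succeeds only if nothing else is left
}
```

For example, parse_int("465624", v) returns true and stores 465624 in v, while parse_int("12abc", v) returns false.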

The class templates num_put and num_get are used for the formatting; their exact behavior is described in 22.2.2 of the standard. Basically they get information from the facet, and the actual input/output is just described in terms of printf and scanf. For instance:
Quote:
The representation at the end of stage 1 consists of the char's that would be printed by a call of printf(s, val) where s is the conversion specifier determined above.
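To make the printf relationship concrete, here is a small sketch that formats the same long once through a stream and once through snprintf with "%ld". Under the default stream flags and the classic locale the two strings should agree. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Formats through num_put via operator<< (default flags: decimal).
std::string format_with_stream(long value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Formats through the printf family with the matching conversion specifier.
std::string format_with_printf(long value) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%ld", value);
    return buffer;
}
```

A check such as format_with_stream(465624) == format_with_printf(465624) is a cheap way to see the equivalence on a given compiler.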

I still didn't get convinced, especially when the behaviour of it can be determined by "locales".

I made something that I can use in my game engine instead of streams. It has fewer features, but it does everything needed for my game engine and I have more control and certainty over it. This way I have full control over every byte read from the script files it parses.


/*
To be exactly certain about how numbers get converted into characters and vice
versa, independent of locales and so on, I write my own StringStream
implementation. It can convert from and to std::strings, which is enough to
suit the needs of everything done by the printing and file parsing done in this
game engine.
It can both input and output, all you have to do is define << and >> operators
the same way as you would for std::streams.
*/

class StringStream
{
std::string s; //the buffer
int floatLength; //length for conversion to string from floating point numbers (not really like "precision" but attempts to be like that)

public:
std::string str() { return s; }
StringStream(std::string v) { s = v; floatLength = 8; }
StringStream() { s = ""; floatLength = 8; }

void setLength(int floatLength) { this->floatLength = floatLength; }
int getLength() { return floatLength; }

StringStream& operator<<(StringStream& v)
{
s += v.s;
return *this;
}

StringStream& operator>>(StringStream& v)
{
v.s = s;
return *this;
}

template<typename T>
void toUint(T& i) //converts the string to the given unsigned integer type
{
i = 0;
for(unsigned p = 0; p < s.length(); p++)
if(s[p] >= '0' && s[p] <= '9') i = 10 * i + (s[p] - '0');
}

template<typename T>
void toSint(T& i) //converts the string to the given signed integer type
{
toUint(i);
if(s.size() > 0 && s[0] == '-') i = -i;
}

template<typename T>
void toFloat(T& f)
{
f = 0;
T div = 1;
bool afterPoint = false;
for(unsigned p = 0; p < s.length(); p++)
if(s[p] >= '0' && s[p] <= '9')
{
if(!afterPoint) { f = 10 * f + (s[p] - '0'); }
else
{
div /= 10;
f += div * (s[p] - '0');
}
}
else if(s[p] == '.') afterPoint = true;
if(s.size() > 0 && s[0] == '-') f = -f;
}

template<typename T>
void fromUint(T i) //set the string to given unsigned integer
{
s = "";
if(i == 0) s = "0";
else while(i > 0) { s = char((i % 10) + '0') + s; i /= 10; }
}

template<typename T>
void fromSint(T i) //set the string to given signed integer
{
if(i >= 0) fromUint(i);
else
{
fromUint(-i);
s = ("-" + s);
}
}

template<typename T>
int floatMod10(T f)
{
if(f < 0) f = -f;
if(f < 1) return 0;

T s = 10; //subtractor
while(f > s) s *= 10;
s /= 10; //the above subtractor was one too large

while(s > 100 / 2) //I use 50 instead of 10 to avoid imprecision
{
while(f > 0) f -= s;
f += s;
s /= 10;
}

return int(f) % 10;
}


template<typename T>
void fromFloat(T f) //set the string to given float
{
s = "";
bool negative = false;
if(f < 0)
{
negative = true;
f = -f;
}
T fcopy = f;
if(fcopy >= 0.0 && fcopy < 1.0)
{
s += "0";
}
else while(fcopy >= 1.0)
{
int digit = floatMod10(fcopy);
s = char(digit + '0') + s;
fcopy /= 10.0;
}

int lengthleft = floatLength - s.size() - 1; //The -1 is because the point itself is also a character.
if(negative) lengthleft--;

fcopy = f;
if(lengthleft > 0)
{
s += ".";
}
while(lengthleft > 0)
{
fcopy *= 10.0;
int digit = floatMod10(fcopy);
s = s + char(digit + '0');
lengthleft--;
}
if(negative) s = "-" + s;
}

StringStream& operator>>(unsigned char& v) { toUint(v); return *this; }
StringStream& operator>>(unsigned short& v) { toUint(v); return *this; }
StringStream& operator>>(unsigned int& v) { toUint(v); return *this; }
StringStream& operator>>(unsigned long& v) { toUint(v); return *this; }
StringStream& operator>>(char& v) { toSint(v); return *this; }
StringStream& operator>>(short& v) { toSint(v); return *this; }
StringStream& operator>>(int& v) { toSint(v); return *this; }
StringStream& operator>>(long& v) { toSint(v); return *this; }
StringStream& operator>>(float& v) { toFloat(v); return *this; }
StringStream& operator>>(double& v) { toFloat(v); return *this; }

StringStream& operator<<(unsigned char& v) { fromUint(v); return *this; }
StringStream& operator<<(unsigned short& v) { fromUint(v); return *this; }
StringStream& operator<<(unsigned int& v) { fromUint(v); return *this; }
StringStream& operator<<(unsigned long& v) { fromUint(v); return *this; }
StringStream& operator<<(char& v) { fromSint(v); return *this; }
StringStream& operator<<(short& v) { fromSint(v); return *this; }
StringStream& operator<<(int& v) { fromSint(v); return *this; }
StringStream& operator<<(long& v) { fromSint(v); return *this; }
StringStream& operator<<(float& v) { fromFloat(v); return *this; }
StringStream& operator<<(double& v) { fromFloat(v); return *this; }

StringStream& operator<<(const unsigned char& v) { fromUint(v); return *this; }
StringStream& operator<<(const unsigned short& v) { fromUint(v); return *this; }
StringStream& operator<<(const unsigned int& v) { fromUint(v); return *this; }
StringStream& operator<<(const unsigned long& v) { fromUint(v); return *this; }
StringStream& operator<<(const char& v) { fromSint(v); return *this; }
StringStream& operator<<(const short& v) { fromSint(v); return *this; }
StringStream& operator<<(const int& v) { fromSint(v); return *this; }
StringStream& operator<<(const long& v) { fromSint(v); return *this; }
StringStream& operator<<(const float& v) { fromFloat(v); return *this; }
StringStream& operator<<(const double& v) { fromFloat(v); return *this; }

StringStream& operator<<(const char* v) { s += v; return *this; }
StringStream& operator<<(std::string& v) { s += v; return *this; }
StringStream& operator<<(const std::string& v) { s += v; return *this; }
};


//usage: std::string str = valtostr(25454.91654654f);
template<typename T>
std::string valtostr(const T& val)
{
StringStream sstream; //also works with std::ostringstream instead
sstream << val;
return sstream.str();
}

//usage: double val = strtoval<double>("465498.654");
template<typename T>
T strtoval(const std::string& s)
{
StringStream sstream(s); //also works with std::istringstream instead
T val;
sstream >> val;
return val;
}

//length is decimal precision of the floating point number
template<typename T>
std::string valtostr(const T& val, int length)
{
StringStream sstream; //also works with std::ostringstream instead
//sstream << std::setprecision(length) << val;
sstream.setLength(length); sstream << val;
return sstream.str();
}





[Edited by - Lode on July 6, 2006 7:30:17 AM]

Quote:
Can I be certain, that if I type 465624 in a file, that this stringstream will always convert it correctly to the integer 465624?


Yes.

Quote:
I didn't find any description on how this conversion works on cppreference.com, so that's why I'd like to know how well defined this is.


Buy yourself a copy of the C++ Standard.

Quote:
If it's not well defined, then maybe some other version from another compiler could break the parsing of my files


That would be moronic.

Quote:
I still didn't get convinced, especially when the behaviour of it can be determined by "locales".


The Standard C locale is active by default. It is the same for everybody.
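If relying on the default feels too implicit, the classic "C" locale can also be pinned on a stream explicitly. The sketch below checks which locale a stream carries and formats an integer with the classic locale imbued; both function names are assumptions made for this example.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Returns true if the stream currently uses the classic "C" locale.
bool uses_classic_locale(const std::ios_base& stream) {
    return stream.getloc() == std::locale::classic();
}

// Formats an integer with the classic locale set explicitly, so a later
// call to std::locale::global elsewhere cannot change the result.
std::string to_string_classic(long value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << value;
    return out.str();
}
```

In a program that never calls std::locale::global, a freshly constructed stream should already report the classic locale, and to_string_classic(1234567) yields "1234567" with no thousands separators.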

Quote:
This way I have full control over every byte read from the script files it parses.


Famous last words.

Quote:
Original post by Fruny
Quote:
I didn't find any description on how this conversion works on cppreference.com, so that's why I'd like to know how well defined this is.


Buy yourself a copy of the C++ Standard.


What does it cost and where to buy it? Google search didn't give me much helpful information.

Quote:
Original post by Lode
What does it cost and where to buy it? Google search didn't give me much helpful information.


Here or here are your best choices. Note that the C locale is defined by the C Standard.

Quote:
Also, is it possible to overload the >> and << operators of streams for integers and floating point numbers to give more options?


No.

Quote:
Because iomanip is actually pretty limited, you can only choose from 3 numeral systems, no roman and not even binary.


Create a Roman or a Binary num_get and num_put facets and imbue your stream with them. They aren't supported by streams because, let's face it, they are of little use. I can see no justification for having roman numbers in the standard library...
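A minimal sketch of the facet approach for binary output, assuming only long values need to be handled and that width, fill and other format flags can be ignored. BinaryNumPut and to_binary are names invented for this example.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: writes long values as base-2 digits.
// Width, fill and format flags are deliberately ignored in this sketch.
class BinaryNumPut : public std::num_put<char> {
protected:
    using std::num_put<char>::do_put;  // keep the other overloads visible

    iter_type do_put(iter_type out, std::ios_base&, char_type,
                     long value) const override {
        unsigned long magnitude = static_cast<unsigned long>(value);
        if (value < 0) {
            *out++ = '-';
            magnitude = 0UL - magnitude;  // absolute value without overflow
        }
        std::string digits;
        do {
            digits.insert(digits.begin(),
                          static_cast<char>('0' + (magnitude & 1UL)));
            magnitude >>= 1;
        } while (magnitude != 0);
        for (char c : digits) *out++ = c;
        return out;
    }
};

// Imbues a stream with the facet and formats one value.
std::string to_binary(long value) {
    std::ostringstream stream;
    stream.imbue(std::locale(stream.getloc(), new BinaryNumPut));
    stream << value;
    return stream.str();
}
```

The locale takes ownership of the facet, so the new expression is not a leak. With this in place, to_binary(5) gives "101".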

You can also read and write binary using a std::bitset or a boost::dynamic_bitset.
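The std::bitset route needs no custom facet at all, since bitset already has stream operators. A short sketch with a fixed width of 8 bits (the width and the function names are choices made for this example):

```cpp
#include <bitset>
#include <cassert>
#include <sstream>
#include <string>

// Writes the low 8 bits of value as a string of '0' and '1' characters.
std::string to_binary8(unsigned value) {
    std::ostringstream out;
    out << std::bitset<8>(value);
    return out.str();
}

// Reads up to 8 binary digits from text and returns their value.
unsigned long from_binary8(const std::string& text) {
    std::istringstream in(text);
    std::bitset<8> bits;
    in >> bits;
    return bits.to_ulong();
}
```

Here to_binary8(5) yields "00000101", and from_binary8("00000101") yields 5.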

Quote:
Original post by Fruny
Quote:
Original post by Lode
What does it cost and where to buy it? Google search didn't give me much helpful information.


Here or here are your best choices. Note that the C locale is defined by the C Standard.


I know it's off topic, but, I'm wondering why someone would have to pay for the C++ standard. I thought C++ was a free and open language that wasn't owned by anyone. But to read the standard of it you have to pay money to ANSI. Does this money go to ANSI, or does it go to the designers of C++ like Stroustrup?

For the C standard, you'll want either this or that, though keep in mind that most C compilers implement C89, rather than C99.

You are quite possibly the single worst case of NIH syndrome I've seen on the boards. If you're worried about the standard library versions not doing what you want, then stick some regression tests for numeric conversions in your build process. It's not as if you're going to be any more certain that your code will work across different compilers. (Here's a hint: there's a few places in your numeric conversions I can guarantee they won't work the same on some compilers.) For that matter, if you're that worried that your scripting system is going to have troubles across multiple compilers, just use an existing scripting language. There are plenty of options such as Lua, Python, Pawn, Squirrel and so on.
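One way such a regression test could look: write a value to a stream, read it back, and compare. This is only a sketch; round_trips and run_conversion_regression_tests are hypothetical names, and the set of values is an arbitrary choice. Note that floating-point values in general do not round-trip at the default precision of 6 digits, so only an exactly representable short value is used here.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Writes value to a string stream, parses it back, and compares.
template <typename T>
bool round_trips(const T& value) {
    std::ostringstream out;
    out << value;
    std::istringstream in(out.str());
    T parsed{};
    return static_cast<bool>(in >> parsed) && parsed == value;
}

// A handful of checks that could run as part of the build.
bool run_conversion_regression_tests() {
    return round_trips(0)
        && round_trips(465624)
        && round_trips(-465624)
        && round_trips(std::numeric_limits<int>::max())
        && round_trips(std::numeric_limits<int>::min())
        && round_trips(4294967295UL)
        && round_trips(0.5);
}
```

If run_conversion_regression_tests() ever returns false after a compiler or library upgrade, the build can fail before any script file is misparsed.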

Quote:
Original post by Lode
I know it's off topic, but, I'm wondering why someone would have to pay for the C++ standard. I thought C++ was a free and open language that wasn't owned by anyone. But to read the standard of it you have to pay money to ANSI. Does this money go to ANSI, or does it go to the designers of C++ like Stroustrup?


When you buy it from ANSI, the money goes to ANSI, which - surprise, surprise - they use to develop more standards. That's their job, they're a standards organization. The money made from selling other standards went towards the development of the C++ standards. If you don't want to give money to ANSI or ISO, maybe your own national standards organization will sell you a copy for less, or even give you one for free.

If you feel cheap, a copy of the current draft is available from the C++ Committee website for free. Note though that it is not the official document. Likewise, draft copies of the C standard are probably still floating around on the net.

Quote:
Original post by SiCrane
You are quite possibly the single worst case of NIH syndrome I've seen on the boards. If you're worried about the standard library versions not doing what you want, then stick some regression tests for numeric conversions in your build process. It's not as if you're going to be any more certain that your code will work across different compilers. (Here's a hint: there's a few places in your numeric conversions I can guarantee they won't work the same on some compilers.) For that matter, if you're that worried that your scripting system is going to have troubles across multiple compilers, just use an existing scripting language. There are plenty of options such as Lua, Python, Pawn, Squirrel and so on.



Well, I think I'll convert back to the std::streams then, luckily it's very easy to change between StringStream and std::stream in my code.

Little question though, what are the things that won't work on some compilers? Maybe the modulo divisions? Those are only used on positive numbers :)

You've assumed that char == ASCII for a start.

Quote:
Original post by Lode
Little question though, what are the things that won't work on some compilers? Maybe the modulo divisions? Those are only used on positive numbers :)


Off-hand and without reading your code, I would say, conversions that rely on characters having specific numeric values (i.e. ASCII), on char being either signed or unsigned, and anything involving floating-point numbers.

Quote:
Original post by Lode
Little question though, what are the things that won't work on some compilers? Maybe the modulo divisions? Those are only used on positive numbers :)

To find out about compiler-specific behaviour, keywords and extensions, you should consult your compiler's manual. Specifics and compliance issues are listed there.


Good luck,
Pat.

I made a little testcase:


#include <sstream>
#include <string>
#include <iostream>

int main()
{
std::stringstream s;
int i = 1;
s << i;
int value = s.str()[0];

//is value now equal to "49" on ANY platform?

std::cout << value;

return 0;
}


So will "value" now be equal to 49 on any platform, no matter what charset it uses, what country it's in, how many bits a char is, and so on?

Because if it isn't, then my "StringStream" wouldn't be a "NIH", since it's then different in that it's independent of locale etc...

Quote:
Original post by Lode
Well, I think I'll convert back to the std::streams then, luckily it's very easy to change between StringStream and std::stream in my code.

Of course, that change will change the semantics of your code, since your custom stream class handles chars in a manner that is, frankly, bizarre. Seriously, use boost::lexical_cast. Your custom conversion functions have a number of problems with them that are resolved by boost::lexical_cast.

Quote:

Little question though, what are the things that won't work on some compilers? Maybe the modulo divisions? Those are only used on positive numbers :)

In addition to what's been mentioned, your code may or may not even compile on types of wchar_t depending on the compiler, or even compiler options. Also, since there are no error signalling routines, you can silently mangle the converted number depending on the size of the relevant numeric type. There are probably other problems since I didn't look too closely at your code, but from a rough look, it seems like not only can your floating point conversions give you different results on different compilers, they can give you different results on different calls in the same run of your program.
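The missing error signalling is something the standard streams already provide: when the text does not fit in the target type, extraction sets failbit. A small sketch in that spirit, using only the standard library (it is not boost::lexical_cast itself, and try_parse is a made-up name):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true only if a value of type T could be extracted from text.
// An out-of-range number makes the extraction fail instead of silently
// wrapping around. Intended for short, int, long and similar types.
template <typename T>
bool try_parse(const std::string& text, T& out) {
    std::istringstream in(text);
    return static_cast<bool>(in >> out);
}
```

For instance, try_parse("99999999", s) with a short s reports failure, while a loop of the form i = 10 * i + digit would quietly produce a mangled value.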

Quote:
Original post by Lode
So will "value" now be equal to 49 on any platform, no matter what charset it uses, what country it's in, how many bits a char is, and so on?


No, only on platforms that use ASCII characters. If you were on, say, an EBCDIC system, the numeric value of the character '1' would be 241. The stream would still produce a '1', but the actual numeric value of the character is platform-specific.

Quote:
Because if it isn't, then my "StringStream" would NOT be a "NIH", since it's then different in that it's independent of charset etc...


No. It is not. Your class would suffer from exactly the same problem in exactly the same way. '1' is not always equal to 49.
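A short sketch separating the two things being discussed. The raw code of a character depends on the execution character set, but the C and C++ standards do require the digits '0' through '9' to have consecutive, increasing values, so the difference c - '0' is portable even where '1' is not 49. The function names are illustrative.

```cpp
#include <cassert>

// Returns the numeric value 0..9 of a decimal digit character, or -1.
// Relies only on the guarantee that '0'..'9' are contiguous.
int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return -1;
}

// Returns the raw character code. This is the encoding-dependent part:
// 49 for '1' on ASCII systems, 241 on EBCDIC systems.
int character_code(char c) {
    return static_cast<int>(static_cast<unsigned char>(c));
}
```

So digit_value('1') is 1 everywhere, whereas the result of character_code('1') should never be hard-coded.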

Quote:
Original post by Fruny
Quote:
Original post by Lode
So will "value" now be equal to 49 on any platform, no matter what charset it uses, what country it's in, how many bits a char is, and so on?


No, only on platforms that use ASCII characters. If you were on, say, an EBCDIC system, the numeric value of the character '1' would be 241. The stream would still produce a '1', but the actual numeric value of the character is platform-specific.

Quote:
Because if it isn't, then my "StringStream" would NOT be a "NIH", since it's then different in that it's independent of charset etc...


No. It is not. Your class would suffer from exactly the same problem in exactly the same way. '1' is not always equal to 49.


If my game would be copied to such an EBCDIC system, then the script file would be copied over too. This script file contains bytes, bytes of ASCII characters. Would this file be converted to EBCDIC too?

I'm looking at files as "a list of 8-bit numbers", and those numbers never change and determine what happens when it's parsed, maybe this is a wrong way to look at files?

Quote:
Original post by Lode
If my game would be copied to such an EBCDIC system, then the script file would be copied over too. This script file contains bytes, bytes of ASCII characters. Would this file be converted to EBCDIC too?


Nope. The bytes would stay the same, but they would be interpreted differently. i.e. your script would be an unreadable mess, which neither your class nor the standard library could make sense of. Unless, that is, your game were copied as a binary (rather than recompiled from source), in which case the interpretation wouldn't change (barring dynamic linking), regardless of whether you used your own class or the standard streams.

Of course, the odds of you running into an EBCDIC system these days are close to nil. [smile]

Hmm, if, (say in the odd case where an EBCDIC system would compile and run the game :p), the C++ source code would be converted to EBCDIC, and the script file would also be converted to EBCDIC, would in that case, everything become readable again by the standard streams, and integers from the script file be interpreted correctly, and so on?

Also, is the "default locale" used by the C++ standard always the same, so would it be US ASCII even on an EBCDIC system, and also on a Chinese computer?

Quote:
Original post by Lode
Hmm, if, (say in the odd case where an EBCDIC system would compile and run the game :p), the C++ source code would be converted to EBCDIC, and the script file would also be converted to EBCDIC, would in that case, everything become readable again by the standard streams, and integers from the script file be interpreted correctly, and so on?


Yes.

Quote:
Also, is the "default locale" used by the C++ standard always the same, so would it be US ASCII even on an EBCDIC system, and also on a Chinese computer?


It would be the same on a US and on a Chinese computer, yes. Things would be different on an EBCDIC machine (since it would need to use different numeric values to encode the characters).

Incidentally, that's why C provides functions like isdigit() and isalpha() to classify characters rather than having you rely on their binary encoding. Aside from convenience, it protects you from differences in character encoding.
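A brief sketch of classifying characters with isdigit() instead of comparing against hard-coded numeric values. The cast to unsigned char matters because passing a negative char value to these functions is undefined behaviour. count_digits is a name made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Counts the decimal digit characters in text using the C classification
// function rather than assumptions about the character encoding.
std::size_t count_digits(const std::string& text) {
    std::size_t count = 0;
    for (char c : text) {
        if (std::isdigit(static_cast<unsigned char>(c))) ++count;
    }
    return count;
}
```

For example, count_digits("a1b22") returns 3.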

The default locale is the "C" locale which means using '.' as a decimal point, no thousands separator and generally acting as if the numbers were interpreted the same as if they showed up inside C source code by the compiler in the same native character set encoding used by the system.
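To illustrate the '.' decimal point rule, here is a sketch that parses a double with the classic locale imbued explicitly, so the result does not depend on whatever global locale the rest of the program may have installed. parse_double_classic is a hypothetical helper name.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Parses a double using the classic "C" locale: '.' is the decimal
// point and there is no thousands separator.
bool parse_double_classic(const std::string& text, double& out) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    return static_cast<bool>(in >> out);
}
```

With this, "465498.5" parses to 465498.5, while for "3,5" the extraction stops at the comma and yields 3.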

However, the notion of treating files as streams of 8 bit numbers is not portable. Some platforms use 16 bit bytes. Some use 32 bits. Your file loading and reading might end up being very different on those platforms. For that matter, system calls (such as opening files with file names) might behave differently depending on whether the operating system interprets char *s as ANSI character strings or UTF-8 Unicode character strings, even if both systems use 8 bit chars.
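If code does depend on 8 bit bytes, the assumption can at least be stated so that a build on an unusual platform fails loudly instead of misbehaving. A minimal sketch, with to_octets as an invented helper name:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <string>
#include <vector>

// Refuse to compile on platforms where a byte is not 8 bits wide.
static_assert(CHAR_BIT == 8, "this parser assumes 8-bit bytes");

// Copies file contents held in a std::string into unsigned 8-bit values,
// so later code does not depend on whether plain char is signed.
std::vector<std::uint8_t> to_octets(const std::string& data) {
    return std::vector<std::uint8_t>(data.begin(), data.end());
}
```

After this conversion a byte such as "\xFF" is always seen as 255, regardless of the signedness of char.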

Porting programs to different platforms is non-trivial; however, you can best reduce your problems by using standard library functions and portable third party libraries that have already done the hard parts of getting the porting right. If you have to verify your own code for every tiny little bit then your platform porting is going to be a nightmare.
