Nuclear868

Dark sides of C++ to/from string conversions


My projects need binary-to-string conversions, and therefore some string manipulation functionality.

Configuration values are stored in binary form and are configured from string input, or sent as an XML document over the network.

With a bunch of variables of various types (non-volatile storage is limited), it makes sense to have a generic function that transforms stored values to strings, for example to display them somewhere.

 

Let's say each config value type instantiates a template that converts it to a string. C++11 added the std::to_string functions. Sounds great!

We create something like this (it is not the only option, but the other options have flaws of their own):

#include <string>
#include <vector>

std::vector<std::string> someStrings;

template<typename T>
void add_something_as_string (const T& value)
{
    std::string str = std::to_string(value);

    //do something with str
    someStrings.push_back(str);
}

And test it with this example structure:

#include <cstdint>

struct test_setup {
    uint32_t     param1;
    uint32_t     param2;
    uint8_t      param3;
    uint8_t      param4;
    char         label[40];
};

  //store values as strings
  test_setup cfg = {
  0, 1, 0, 1,
  "Label"};

  add_something_as_string(cfg.param1);
  add_something_as_string(cfg.param2);
  add_something_as_string(cfg.param3);
  add_something_as_string(cfg.param4);
  add_something_as_string(cfg.label);

Great, the same call, for all types.

Save, make, but...

test.cpp: In instantiation of ‘void add_something_as_string(const T&) [with T = char [40]]’:
test.cpp:163:35:   required from here
test.cpp:145:40: error: call of overloaded ‘to_string(const char [40])’ is ambiguous
  std::string str = std::to_string(value);
                                        ^
.... (skipped 7 line include chain)

/usr/include/c++/4.8/bits/basic_string.h:2864:3: note: std::string std::to_string(int) <near match>
   to_string(int __val)
   ^
/usr/include/c++/4.8/bits/basic_string.h:2864:3: note:   no known conversion for argument 1 from ‘const char [40]’ to ‘int’
/usr/include/c++/4.8/bits/basic_string.h:2869:3: note: std::string std::to_string(unsigned int) <near match>
   to_string(unsigned __val)
   ^
/usr/include/c++/4.8/bits/basic_string.h:2869:3: note:   no known conversion for argument 1 from ‘const char [40]’ to ‘unsigned int’
/usr/include/c++/4.8/bits/basic_string.h:2875:3: note: std::string std::to_string(long int) <near match>
   to_string(long __val)

...

The reason is that std::to_string has no overload that accepts char label[40]. It is overloaded only for the built-in numeric types (int, unsigned, long, ..., float, double, long double), and neither a char array nor the pointer it decays to converts to any of them.

Too bad, this breaks all such generic variable-to-string conversions. String inputs need separate handling.

Also, adding overloads to the std namespace is not allowed, so we need another option.

The solution is simple, but I still wonder why it wasn't added to the C++11 standard.

Why not add "proxy" std::to_string overloads for string types, which just return their parameters?

As it stands, we have to proxy the entire set of std::to_string functions, because, as mentioned, adding stuff to std is undefined behavior:

#include <string>

namespace Util {

template<typename T>
std::string to_string (const T& value)
{
    return std::to_string(value);
}

//inline, so the non-template overloads can live in a header
inline std::string to_string (const char *value)
{
    return value; //was it so hard???
}

inline std::string to_string (const std::string &value)
{
    return value; //was it so hard???
}

}

Creating a namespace doesn't sound bad. Now all calls to std::to_string must be replaced with calls to Util::to_string.

The second and the third functions are the most important. They are called when the argument is a string type.

Everything else calls the first function. It can be omitted if the code calls 'to_string' only after:

using Util::to_string;
using std::to_string;

I prefer calling Util::to_string everywhere (hoping that the compiler inlines the template proxy).
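A possible refinement, shown here only as a sketch: with an unconstrained template, some string-like arguments (for example a non-const char array passed directly) can still be routed to the generic overload and fail inside std::to_string. Restricting the generic overload to arithmetic types avoids that. The namespace UtilSketch and the function to_string_demo are made-up names for illustration, and the code assumes C++11 with <type_traits>.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

namespace UtilSketch {  // hypothetical name, not the Util namespace above

// Generic overload: only takes part in overload resolution for arithmetic types.
template<typename T>
typename std::enable_if<std::is_arithmetic<T>::value, std::string>::type
to_string(const T& value)
{
    return std::to_string(value);
}

// Pass-through overloads for string-like arguments.
inline std::string to_string(const char* value)        { return value; }
inline std::string to_string(const std::string& value) { return value; }

}  // namespace UtilSketch

// Small usage check (hypothetical helper).
inline void to_string_demo()
{
    char label[40] = "Label";  // non-const array, passed directly
    assert(UtilSketch::to_string(label) == "Label");
    assert(UtilSketch::to_string(std::string("abc")) == "abc");
    assert(UtilSketch::to_string(std::uint32_t(1)) == "1");
    // A char-sized integer is promoted to int and printed as a number.
    assert(UtilSketch::to_string(static_cast<unsigned char>(49)) == "49");
}
```

The trade-off is that user-defined types no longer reach the generic overload; they would need their own overloads.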

 

 

==================

 

OK, creating strings works excellently, but what about the opposite, reading from a string into a variable?

There is no std::from_string equivalent of std::to_string, maybe because writing into a variable is trickier and functions cannot be overloaded by return type.

Let's create something similar in our Util namespace:

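#include <sstream>
#include <string>

namespace Util {

template<typename T>
T from_string (const std::string& value)
{
    T ret{}; //value-initialize; "T ret = 0" would build a std::string from a null pointer
    std::stringstream ss(value);
    ss >> ret;
    return ret;
}

}

Great. It even looks more functional than std::to_string, as user-defined types can overload operator>> and create themselves from strings. It also works for std::string, although operator>> then reads only up to the first whitespace.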

Let's test it.

 

We will read from the strings that we created in the previous example (stored in the someStrings vector):

   //read from strings
   test_setup cfg1;

   cfg1.param1 = Util::from_string<uint32_t>(someStrings[0]);
   cfg1.param2 = Util::from_string<uint32_t>(someStrings[1]);
   cfg1.param3 = Util::from_string<uint8_t>(someStrings[2]);
   cfg1.param4 = Util::from_string<uint8_t>(someStrings[3]);

We pass the type as a template argument, but everything looks good.

But... as we pass config values from string to binary, some settings do not behave as expected. It looks like a bug, but what can go wrong in this innocent from_string function? Let's print all members (remember, p1 and p3 should be 0, p2 and p4 should be 1):

printf ("p1: %u, p2: %u, p3: %u, p4: %u\n", (unsigned)cfg1.param1, (unsigned)cfg1.param2, (unsigned)cfg1.param3, (unsigned)cfg1.param4);

result:

p1: 0, p2: 1, p3: 48, p4: 49

Whoa?!?!? What the...

 

p3 and p4 should be 0 and 1, but they are 48 and 49. Something strange happened.

Printing them in the debugger reveals that 48 and 49 are the ASCII codes of '0' and '1'. So the stream operators parse a number for all other integer types, but for every char type they extract a single character and store its code. uint8_t is typically a typedef for unsigned char, so it gets the same treatment. Again, this breaks the generic functions.
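The behavior can be reproduced in isolation. This is a small sketch; extract and extract_demo are hypothetical helper names, not part of the code above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: extract one value of type T from text with operator>>.
template<typename T>
T extract(const std::string& text)
{
    T out{};
    std::istringstream in(text);
    in >> out;
    return out;
}

inline void extract_demo()
{
    // A real integer type parses the whole number.
    assert(extract<std::uint32_t>("12") == 12u);
    // All three char types read just the first character instead.
    assert(extract<char>("12") == '1');
    assert(extract<signed char>("12") == '1');
    assert(extract<unsigned char>("12") == '1');
}
```

On an ASCII system '1' has the value 49, which is exactly what ended up in param4.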

 

OK, char is meant to be used for... characters, but there are 3 (three) distinct char types: char, signed char and unsigned char. They are distinct types, which means that overload resolution can distinguish them.
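A quick way to convince yourself, as a sketch using std::is_same from <type_traits>; the function names which and which_demo are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// The three char types are pairwise distinct, whatever the signedness of plain char.
static_assert(!std::is_same<char, signed char>::value, "char vs signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char vs unsigned char");
static_assert(!std::is_same<signed char, unsigned char>::value, "signed vs unsigned char");

// Because they are distinct, three overloads can coexist.
inline int which(char)          { return 0; }
inline int which(signed char)   { return 1; }
inline int which(unsigned char) { return 2; }

inline void which_demo()
{
    assert(which('a') == 0);
    assert(which(static_cast<signed char>(1)) == 1);
    assert(which(static_cast<unsigned char>(1)) == 2);
}
```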

 

Adding to the struct:

    char            c1;
    signed char     c2;
    unsigned char   c3;

Calling value-to-string function:

    add_something_as_string(cfg.c1);
    add_something_as_string(cfg.c2);
    add_something_as_string(cfg.c3);

Printing called function:

void add_something_as_string (const T& value)
{
    printf ("%s\n",__PRETTY_FUNCTION__);
...

results in:

void add_something_as_string(const T&) [with T = char]
void add_something_as_string(const T&) [with T = signed char]
void add_something_as_string(const T&) [with T = unsigned char]

But with the stream operators, all three types behave like char.

 

It would be wise to let unsigned char (and signed char) behave like int and the other integer types when using stream operators, while char handles characters.
For example, the %c specifier of C's scanf behaves like char does here, but there are also the %hhd and %hhu specifiers, which write into a char-sized variable but parse a number like the specifiers for all other integer types.
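A short sketch of that scanf behavior, using std::sscanf so it can run without input; scanf_demo is a hypothetical name, and the %hh length modifier assumes a C99/C++11 library.

```cpp
#include <cassert>
#include <cstdio>

inline void scanf_demo()
{
    unsigned char as_unsigned = 0;
    signed char   as_signed   = 0;
    char          as_char     = 0;

    // %hhu and %hhd parse a decimal number into a char-sized variable.
    int n1 = std::sscanf("49", "%hhu", &as_unsigned);
    int n2 = std::sscanf("-5", "%hhd", &as_signed);
    // %c copies a single character.
    int n3 = std::sscanf("49", "%c", &as_char);

    assert(n1 == 1 && as_unsigned == 49);
    assert(n2 == 1 && as_signed == -5);
    assert(n3 == 1 && as_char == '4');
}
```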

 

In C++ we have to use a hacky solution like this:

template<typename T>
T from_string (const std::string& value)
{
    T ret = 0;
    std::stringstream ss(value);

    if (sizeof(T) == 1) {
        //char-sized type: parse a number into an int, then narrow
        int ret1 = 0;
        ss >> ret1;
        ret = static_cast<T> (ret1);
        return ret;
    }
    ss >> ret;
    return ret;
}

This assumes that only types with sizeof(T) == 1 have this behavior. (Maybe there is something more elegant.)

The same issue exists when converting values to strings with the stream operators. That is why I chose std::to_string, which, surprisingly, treats even char types as numbers: it has no char overloads, so a char argument is promoted to int.
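A small sketch of that difference between std::to_string and operator<<; to_string_char_demo is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

inline void to_string_char_demo()
{
    unsigned char small = 7;

    // std::to_string: the char-sized value is promoted to int and printed as a number.
    assert(std::to_string(small) == "7");

    // operator<<: the same value is written as a single character.
    std::ostringstream out;
    out << small;
    assert(out.str().size() == 1);
    assert(out.str()[0] == static_cast<char>(7));

    // A plain char also goes through the int overload (48 for '0' on ASCII systems).
    char digit = '0';
    assert(std::to_string(digit) == std::to_string(static_cast<int>('0')));
}
```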

Another solution is writing template specializations, one per char type, for a grand total of 4 almost identical functions just to read from a string.
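A third option, sketched here under the assumption that C++11 <type_traits> is available: keep a single template and pick the intermediate type with std::conditional, so only the three char types are parsed through int. The names is_char_type, from_string_sketch and from_string_sketch_demo are made up for illustration, and the sketch does no range or error checking.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// True only for the three character types (hypothetical trait).
template<typename T>
struct is_char_type : std::integral_constant<bool,
    std::is_same<T, char>::value ||
    std::is_same<T, signed char>::value ||
    std::is_same<T, unsigned char>::value> {};

template<typename T>
T from_string_sketch(const std::string& value)
{
    // Parse char types as int; every other type is parsed as itself.
    using Parsed = typename std::conditional<is_char_type<T>::value, int, T>::type;

    Parsed parsed{};
    std::istringstream ss(value);
    ss >> parsed;
    return static_cast<T>(parsed);
}

inline void from_string_sketch_demo()
{
    assert(from_string_sketch<std::uint32_t>("1") == 1u);
    assert(from_string_sketch<unsigned char>("49") == 49);
    assert(from_string_sketch<signed char>("-5") == -5);
    assert(from_string_sketch<std::string>("Label") == "Label");
}
```

Unlike the sizeof(T) == 1 test, this leaves bool and other one-byte types alone, and it still compiles for std::string.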

 

Good news: Everything works now and all 'dirty hacks' are in one place.
 


The reason is, that it cannot find overload for std::to_string for    char label[40];
Too bad, this breaks all such generic variable-to-string conversions. It needs separate handling, when the input type is string.


The input type is NOT a string. It is a char array, possibly decayed to a pointer to char.

If you want a string, use std::string.
 

Also, it is forbidden to add stuff to the std namespace, so we need another option.
The solution is simple, but I still wonder, why it wasn't added to the C++11 standard.
Why not add "proxy" std::to_string overloads for string types, which just return their parameters.


Not needed. For user-defined types (char[] is not a user-defined type, btw) you can use a combination of a using-declaration and argument-dependent lookup.
 
namespace MyNamespace
{
  class MyClass
  {
    // Stuff...
  };

  std::string to_string(const MyClass& value)
  {
    // do your conversion and return...
    return std::string(); // placeholder so the function returns a value
  }
}

void CallToString()
{
  using std::to_string;
  auto string1 = to_string(1);
  auto string2 = to_string(4.5);
  auto string3 = to_string(MyNamespace::MyClass());
}
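The same pattern also works inside a generic function, which is closer to the add_something_as_string case from the first post. This is only a sketch: the Shapes namespace, the Point type and the stringify function are invented for illustration. Note that it does not help with char arrays, since they have no associated namespace for argument-dependent lookup to search.

```cpp
#include <cassert>
#include <string>

namespace Shapes {  // hypothetical user namespace

struct Point { int x; int y; };

inline std::string to_string(const Point& p)
{
    return "(" + std::to_string(p.x) + ", " + std::to_string(p.y) + ")";
}

}  // namespace Shapes

template<typename T>
std::string stringify(const T& value)
{
    using std::to_string;     // make the std overloads visible
    return to_string(value);  // ADL adds overloads from T's own namespace
}

inline void stringify_demo()
{
    assert(stringify(1) == "1");
    assert(stringify(Shapes::Point{1, 2}) == "(1, 2)");
}
```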

Whoa?!?!? What the...
 
p3 and p4 should be 0 and 1, but they are 48 and 49. Something strange happened.
Printing them in the debugger, reveals that 48 and 49 are the ASCII codes of 0 and 1. So, the stream operators work as expected for all types, but treat string contents as ascii code for all char types. Again, this breaks the generic functions


uint8_t may be a typedef for a character type (usually unsigned char) on your compiler, in which case the template can't tell the difference, and so falls back to reading it as an ASCII character.

So it's doing what it is supposed to, and it's only surprising because uint8_t (and the other sized types) are not distinct types that the standard library has separate overloads for.

Edited by SmkViper

template<class _Traits>
    inline basic_istream<char, _Traits>&
    operator>>(basic_istream<char, _Traits>& __in, unsigned char& __c)
    { return (__in >> reinterpret_cast<char&>(__c)); }

and there's actually no reason not to use from_string<uintmax_t> for each call, since you are trying to get an unsigned integer value and that is the largest unsigned integer type.
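A sketch of that idea, with the narrowing done at the call site. parse_unsigned and parse_unsigned_demo are hypothetical names, and nothing here checks that the parsed value fits the destination type.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: always parse into the widest unsigned integer type.
inline std::uintmax_t parse_unsigned(const std::string& text)
{
    std::uintmax_t out = 0;
    std::istringstream in(text);
    in >> out;
    return out;
}

inline void parse_unsigned_demo()
{
    // Narrow explicitly where the value is stored.
    std::uint32_t p2 = static_cast<std::uint32_t>(parse_unsigned("1"));
    std::uint8_t  p4 = static_cast<std::uint8_t>(parse_unsigned("1"));
    assert(p2 == 1u);
    assert(p4 == 1);  // the number 1, not the character code of '1'
}
```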

also, instead of if (sizeof(T) == 1), you could just add template specializations for char and unsigned char (or int8_t and uint8_t).

template<>
int8_t from_string<int8_t>(const std::string& value)
{
     int ret = 0; // stays 0 if the stream is empty
     std::stringstream ss(value);
     ss >> ret;
     return static_cast<int8_t>(ret);
}

template<>
uint8_t from_string<uint8_t>(const std::string& value)
{
     unsigned ret = 0; // stays 0 if the stream is empty
     std::stringstream ss(value);
     ss >> ret;
     return static_cast<uint8_t>(ret);
}
Edited by nfries88
