I have a custom string class that takes a character type and a traits class as template parameters, and I want to read it from and write it to the standard streams (std::basic_istream / std::basic_ostream). One of the problems is that my traits class is defined differently from the standard traits type, so it can't be 100% compatible.
Since the string may use a custom transformation format in which one code point can comprise multiple code units, the string may be in a variable-width text encoding. For this reason, I presume that I should output code units rather than code points. For example, a terminal expecting UTF-8 output expects a stream of 8-bit code units, not 32-bit code points.
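To make the code unit vs. code point distinction concrete, here is a minimal UTF-8 encoder sketch (encode_utf8 is a hypothetical name, not a library function). It assumes the input is a valid Unicode scalar value; one 32-bit code point such as U+20AC comes out as three 8-bit code units, which is what a UTF-8 terminal actually consumes.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 code units (bytes).
// Sketch only: assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                                          // 1 code unit
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                                  // 2 code units
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                                // 3 code units
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                                  // 4 code units
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A writer that emits the resulting std::string byte for byte is emitting code units, not code points.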
So, what would anybody suggest for accomplishing this?
Outputting A Custom String To A Standard Stream
As ApochPiQ mentioned, the standard templated std::string method is basically what you are after. If that seems like a bit of a hurdle to wade through, a Google search for "overloading << operator c++" turns up a "couple" of results, like:
http://www.learncpp.com/cpp-tutorial/93-overloading-the-io-operators/
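A sketch of the usual shape, loosely modeled on how the standard library overloads operator<< for std::basic_string. Everything here is hypothetical (my_string, my_traits, data(), size()). Because the custom traits class differs from std::char_traits, the stream's traits get their own template parameter instead of being tied to the string's traits.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical custom traits class; NOT std::char_traits.
struct my_traits { /* encoding-specific members would live here */ };

// Hypothetical custom string storing raw code units of type CharT.
template <class CharT, class Traits>
class my_string {
public:
    explicit my_string(std::basic_string<CharT> units) : units_(std::move(units)) {}
    const CharT* data() const { return units_.data(); }
    std::size_t size() const { return units_.size(); }
private:
    std::basic_string<CharT> units_;
};

// Take the stream by reference, write, and return it so calls can chain.
// StreamTraits (the stream's char_traits) is independent of the string's Traits.
template <class CharT, class StreamTraits, class Traits>
std::basic_ostream<CharT, StreamTraits>&
operator<<(std::basic_ostream<CharT, StreamTraits>& os, const my_string<CharT, Traits>& s) {
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
    return os;
}
```

One simplification to be aware of: the standard std::basic_string inserter behaves as a formatted output function and honors os.width() padding, whereas os.write() does not.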
Have you looked at how various standard library implementations do this for std::string? It should be a pretty analogous process.
Looking through GCC's implementation, I run into the problem that it handles everything through locales and such. The issue is that the custom traits class can essentially define its own locale, so the standard library trying to convert code units to code points (or the other way around) before the data ever reaches me is a problem. It subverts what I am doing, so I need to read raw data: the format might be one that the custom string expects, but not one that the standard streams and the standard locales would.
Also, it seems that Windows internally converts wchar_t strings to a char stream using ANSI code pages, so I can imagine it would butcher my data if I wrote data from different locales through the same stream. My point is, I feel like following the standard implementation might be barking up the wrong tree if, for example, I want to output UTF-16 data, assuming the OS doesn't have built-in UTF-16 support.
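One way to probe the locale concern directly is to ask a locale whether its codecvt facet converts anything. A std::basic_filebuf uses the imbued locale's codecvt<charT, char, mbstate_t> facet to translate between charT and bytes. For charT = char the standard facet is specified as a degenerate, no-op conversion, while the wchar_t facet is the one that may apply a code page. The helper names below are illustrative, not from any library.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>

// Does this locale's narrow-char codecvt facet leave bytes untouched?
// std::codecvt<char, char, std::mbstate_t> is specified as a no-op conversion,
// so a char-based basic_filebuf passes code units through unchanged.
bool narrow_stream_is_passthrough(const std::locale& loc) {
    const auto& cvt = std::use_facet<std::codecvt<char, char, std::mbstate_t>>(loc);
    return cvt.always_noconv();
}

// The facet a std::wofstream uses to turn wchar_t into bytes. Whether and how it
// converts depends on the locale, which is where the mangling risk comes from.
bool wide_stream_is_passthrough(const std::locale& loc) {
    const auto& cvt = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
    return cvt.always_noconv();
}
```

The practical takeaway is that narrow char streams are the ones that leave pre-encoded code units alone.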
I do have an overload for the operators; I'm wondering how to implement it properly. You might notice that the linked tutorial does not change how the data is read in; it just batches together a couple of standard operations. What I'm doing does not follow the conventional idea of a character, so I can't just do std::cin >> str.char_array.
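If formatted extraction is the obstacle, the unformatted member functions sidestep it: read() and gcount() don't skip whitespace or parse anything, they just copy code units. A minimal sketch, assuming the code units are chars and that "everything up to end of stream" is the wanted delimiter (read_units is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Unformatted extraction: copies raw code units with read()/gcount(), so there
// is no whitespace skipping or locale-based parsing as with operator>>.
// Reads until end of stream. On return, eofbit and failbit are set because
// the final read() came up short; clear() them if the stream is reused.
std::istream& read_units(std::istream& in, std::string& out) {
    char buf[4096];
    out.clear();
    // A short final read() puts the stream in a failed state, but gcount()
    // still reports how many code units it copied.
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.append(buf, static_cast<std::size_t>(in.gcount()));
    }
    return in;
}
```

An operator>> for the custom string could call something like this and then hand the raw units to the string's own decoding logic.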
I understand; a console interprets whatever it is capable of handling, and my terminal emulator uses UTF-8. In this case, however, UTF-16 would take the place of UTF-8; I picked UTF-16 as the example because none of its code units are a single byte. But let's go with the example of writing one of these encodings to an fstream, since that is closer to the actual purpose.
If you want binary serialization, just dump the bytes to disk in whatever form you can read back later.
If you want something else, you dump something else.
I'm not sure what the question is?
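Taking the "dump the bytes in whatever form you can read back" suggestion literally, one possible layout is a length prefix followed by the raw code units. The 4-byte little-endian length and the function names are arbitrary choices for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Illustrative record layout: 4-byte little-endian length, then raw code units.
void save_record(std::ostream& out, const std::string& units) {
    const std::uint32_t n = static_cast<std::uint32_t>(units.size());
    const char len[4] = {static_cast<char>(n & 0xFF), static_cast<char>((n >> 8) & 0xFF),
                         static_cast<char>((n >> 16) & 0xFF), static_cast<char>((n >> 24) & 0xFF)};
    out.write(len, 4);
    out.write(units.data(), static_cast<std::streamsize>(n));
}

// Returns false if the stream runs out before a full record is read.
// Sketch only: no sanity check on n for corrupt input.
bool load_record(std::istream& in, std::string& units) {
    unsigned char len[4];
    if (!in.read(reinterpret_cast<char*>(len), 4)) return false;
    const std::uint32_t n = std::uint32_t{len[0]} | std::uint32_t{len[1]} << 8 |
                            std::uint32_t{len[2]} << 16 | std::uint32_t{len[3]} << 24;
    units.resize(n);
    return static_cast<bool>(in.read(&units[0], static_cast<std::streamsize>(n)));
}
```

Because the length is stored explicitly, embedded zero bytes and any encoding survive the round trip.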
For example, how do I output the code units through standard streams so that (1) the stream implementation won't mangle them, and (2) other programs can read them in more or less the expected format (no locale transformations applied to UTF-8 code units, and UTF-16 not piped through Windows' underlying implementation that converts it to a char array using your ANSI code page)?
Sort of like how I'd expect to just use fwrite() in C, and call it a day.
The equivalent would be basic_ostream::write().
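A sketch of the fwrite()-style approach with basic_ostream::write(), assuming the code units are already bytes in the target encoding (UTF-8 here). Opening the file with std::ios::binary avoids newline translation on Windows, and imbuing the classic locale only makes the no-conversion behavior of char streams explicit. write_raw and read_raw are made-up names.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// fwrite()-style output: hand the code units to basic_ostream::write()
// unchanged. Assumes they are already bytes in the target encoding.
bool write_raw(const std::string& path, const std::string& units) {
    std::ofstream out;
    out.imbue(std::locale::classic());  // char codecvt is a no-op; this just makes it explicit
    out.open(path, std::ios::binary);   // binary: no "\n" -> "\r\n" translation on Windows
    out.write(units.data(), static_cast<std::streamsize>(units.size()));
    out.close();                        // close() sets failbit if flushing fails
    return !out.fail();
}

// Read the file back byte for byte, to check the round trip.
std::string read_raw(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```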
Hm... it does seem like it would be that easy, doesn't it?
I guess my next question concerns basic_ostream's charT parameter. If a code unit is wider than charT, it seems I'd have to break each code unit up and write it in pieces; likewise, if charT is wider than a code unit, there are extra bits in each written value that are unaccounted for.
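For the case where the code unit is wider than charT, one option is to stop depending on charT's width entirely and write each unit as a fixed number of bytes in a fixed order. This sketch assumes UTF-16 little-endian as the on-disk format (a choice, not a requirement) and a plain char stream; the function names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write each UTF-16 code unit as two bytes, low byte first, so the file layout
// does not depend on charT's width or the writing machine's byte order.
void write_utf16le(std::ostream& out, const std::u16string& units) {
    for (char16_t u : units) {
        const char bytes[2] = {static_cast<char>(u & 0xFF),
                               static_cast<char>((u >> 8) & 0xFF)};
        out.write(bytes, 2);
    }
}

// Reassemble code units from byte pairs. A trailing odd byte is ignored.
std::u16string read_utf16le(std::istream& in) {
    std::u16string units;
    unsigned char bytes[2];
    while (in.read(reinterpret_cast<char*>(bytes), 2)) {
        units.push_back(static_cast<char16_t>(bytes[0] | (bytes[1] << 8)));
    }
    return units;
}
```

A reader in another program then only needs to know "UTF-16LE" (or be told via a BOM), not the width of the charT the writer happened to use.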
It seems like I need to add a partial specialization for char streams only, and make it a compile error to output the string to any wider stream. Otherwise, I'd have to require that the reading and writing applications both use streams of the same width to account for these idiosyncrasies.
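Function templates can't be partially specialized, but a static_assert inside a generic operator<< gets the same effect: the overload matches any stream, and instantiating it with anything other than a char stream fails to compile with a readable message. my_string, my_utf8_traits and their members are hypothetical stand-ins for the custom class.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

struct my_utf8_traits {};  // hypothetical stand-in for the custom traits class

// Hypothetical custom string storing raw code units of type CharT.
template <class CharT, class Traits>
class my_string {
public:
    explicit my_string(std::basic_string<CharT> units) : units_(std::move(units)) {}
    const CharT* data() const { return units_.data(); }
    std::size_t size() const { return units_.size(); }
private:
    std::basic_string<CharT> units_;
};

// Matches every stream type, but rejects anything except narrow char streams
// at compile time, so code units are never widened into charT-sized values.
template <class StreamCharT, class StreamTraits, class CharT, class Traits>
std::basic_ostream<StreamCharT, StreamTraits>&
operator<<(std::basic_ostream<StreamCharT, StreamTraits>& os, const my_string<CharT, Traits>& s) {
    static_assert(std::is_same<StreamCharT, char>::value,
                  "my_string can only be written to char-based streams");
    static_assert(sizeof(CharT) == 1, "code units must be one byte wide");
    os.write(reinterpret_cast<const char*>(s.data()), static_cast<std::streamsize>(s.size()));
    return os;
}
```

Constraining the template with std::enable_if instead, so wider streams have no matching overload at all, also fails to compile, but usually with a less helpful error message.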