Using A File Stream With A Character Type Other Than char Or wchar_t

10 comments, last by Ectara 10 years, 10 months ago

How would one go about using a stream with a parameter of charT corresponding to something other than char or wchar_t? I tried something like this:


std::basic_fstream<short, std::char_traits<short> > fs("test.txt", std::ios_base::out | std::ios_base::binary);

It creates the file, but writing to it does nothing, and flushing the stream sets badbit.

I ask, because I have a template stream class that also accepts a parameter for the type of elements it contains, and I found that I didn't know how to implement things like seeking where the element type wasn't char or wchar_t. The problem is a simple one: if I try to seek 2^29 elements from the start in a stream with an element size of 4 bytes, the byte offset is 2^31, which overflows a signed 32-bit offset on a two's complement system if I naively multiply the element size by the number of elements and pass the result to fseek() as a back-end. The same goes for any operation that gets near the limits of the offset type's range, even though those element counts are themselves well-defined. Additionally, implementing peek() is complicated, since ungetc() is only guaranteed to push back one character, so pushing back an element larger than one byte isn't portable.
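To make the overflow concrete, here is a minimal sketch of guarding the element-to-byte conversion before it reaches fseek(). The helper names element_offset_to_bytes and seek_elements are hypothetical and only for illustration; the sketch assumes the back-end offset type is long, as fseek() uses.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: converts an offset counted in elements into an
// offset counted in bytes. Returns false instead of overflowing when the
// product does not fit in long.
bool element_offset_to_bytes(long elements, std::size_t element_size, long& bytes) {
    if (element_size == 0 || element_size > static_cast<std::size_t>(LONG_MAX)) {
        return false;
    }
    const long size = static_cast<long>(element_size);
    // Compare against the quotient so the multiplication itself never overflows.
    if (elements > LONG_MAX / size || elements < LONG_MIN / size) {
        return false;
    }
    bytes = elements * size;
    return true;
}

// Hypothetical wrapper: seeks to an element index measured from the start
// of the file, failing cleanly when the byte offset is not representable.
bool seek_elements(std::FILE* file, long elements, std::size_t element_size) {
    long bytes = 0;
    if (!element_offset_to_bytes(elements, element_size, bytes)) {
        return false;
    }
    return std::fseek(file, bytes, SEEK_SET) == 0;
}
```

With a 32-bit long, seeking 2^29 elements of 4 bytes each would need a byte offset of 2^31, which is one more than LONG_MAX, so the helper reports failure rather than wrapping around.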

I tried using C++'s basic_streams, but most implementations don't seem to support streams of arbitrary width. If I were trying to support sane template parameters to my stream class, how should I go about implementing the back-end for file I/O? All other back-ends are easy.


Bump. I'm really kind of at a stand-still.

I ask, because I have a template stream class that also accepts a parameter for the type of elements it contains, and I found that I didn't know how to implement things like seeking where the element type wasn't char or wchar_t.

Basically, don't do that.

The stream systems were designed around single-byte characters, with multi-byte characters mostly working okay as long as you stick with pure text.

The stream system is not designed to work with a base type of arbitrary objects.

You need to learn how serialization works, then properly encode and decode your data.

I know how serialization works. How do wchar_t streams work, then, given that wchar_t is often larger than char?

It works through locales: you create a locale, create a stream of the given character type, imbue the stream with the locale, and finally write data that matches the types given.

All wchar_t does is allow you a character size greater than 1 byte. Except when it doesn't, because it isn't guaranteed to.

The built-in wide characters in C++ don't work with variable-size characters, nor do they work very well with actual Unicode, since wchar_t is fixed-width and Unicode encodings such as UTF-8 and UTF-16 are not. The size of wchar_t is compiler-specific and may be as little as 8 bits. Windows uses 2 bytes by default; Linux and OS X use 4 bytes.
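A quick way to see what a given compiler does is to compute the width directly. This is only a small sketch, and the name wchar_bits is hypothetical.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on the current implementation.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

Code that depends on a particular width can check it at compile time, for example with static_assert(wchar_bits() == 16, "expected a 16-bit wchar_t"), instead of assuming it.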

Most good libraries that work with Unicode text actually work on single-byte streams and extract the data they need.

I've never seen a good case for the built-in wide character support. I've worked on quite a few programs that were translated into languages from around the globe, yet I can't recall any of the libraries actually relying on wchar_t.

All wchar_t does is allow you a character size greater than 1 byte.

Perhaps I'm not explaining myself right, but this is what I'm looking for. Fixed width character streams.


C++11 adds the character types char16_t and char32_t, along with std::char_traits specializations for both.

If you need some other width you do all those things mentioned above:

It works through locales: you create a locale, create a stream of the given character type, imbue the stream with the locale, and finally write data that matches the types given.

Create your locale, your ctype information, your char_traits, all customized for the size of your custom character. Create your stream with the proper type and traits, imbue the stream with the locale, and extract the data.

So, what would be the best way to do this for any char type? Would I be able to make locale information and character traits classes that depend on template parameters, and use them?

For a char type you do nothing; it is already done.

For char16_t and char32_t, get a C++11-compliant compiler; the character traits are already provided.

For some other type, you will need to define char_traits<your_type_here> and its twenty or so functions.
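A minimal sketch of such a traits class follows. It is written as a separate template, here given the hypothetical name raw_traits, and passed as the second template argument rather than specializing std::char_traits, since user code may not specialize standard templates for built-in types such as short. The sketch assumes T is an integral type narrower than long long, so that long long can hold every element value plus a distinct end-of-file marker.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <ios>
#include <type_traits>

// Hypothetical pass-through traits: no transformation of the data at all.
template <typename T>
struct raw_traits {
    static_assert(std::is_integral<T>::value && sizeof(T) < sizeof(long long),
                  "sketch assumes an integral type narrower than long long");

    typedef T char_type;
    typedef long long int_type;      // wide enough for every value plus eof()
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    // Counts elements up to, but not including, the first zero element.
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return nullptr;
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* assign(char_type* s, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) s[i] = c;
        return s;
    }

    // Convert through the unsigned type so that no element value maps to eof().
    static int_type to_int_type(char_type c) {
        return static_cast<int_type>(static_cast<typename std::make_unsigned<T>::type>(c));
    }
    static char_type to_char_type(int_type i) {
        return static_cast<char_type>(static_cast<typename std::make_unsigned<T>::type>(i));
    }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type eof() { return -1; }
    static int_type not_eof(int_type i) { return eq_int_type(i, eof()) ? 0 : i; }
};
```

This covers only the traits half of the job. A file stream such as std::basic_fstream<short, raw_traits<short> > would still need a matching conversion facet in its locale before it could write anything.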

Then you will need to create all the locale information with functions like widen(), narrow(), and another fifty or so functions.

Create the stream, create your locale, imbue the stream with the locale, and proceed.

Of course, none of this will help your side problem of large file sizes.

Even with your custom type you won't be able to exceed std::numeric_limits<std::streamsize>::max().

That's likely to be a 31-bit number (2GB) or, if your system supports large files in the standard library, a 63-bit number (8 exabytes). Based on what you've written, it will likely be the 2GB limit.
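Which of the two limits applies can be checked directly on the target system. The sketch below just reports what the implementation provides; the function names are hypothetical.

```cpp
#include <ios>
#include <limits>

// Number of value bits in std::streamsize, not counting the sign bit.
// Typically 31 or 63, depending on the implementation.
constexpr int streamsize_value_bits() {
    return std::numeric_limits<std::streamsize>::digits;
}

// Largest byte count a single stream operation can express.
constexpr std::streamsize max_stream_bytes() {
    return std::numeric_limits<std::streamsize>::max();
}
```

Printing streamsize_value_bits() from a small test program is enough to tell whether the standard library on a given platform is limited to 2GB.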

It isn't so much an issue of large files. If I implemented larger characters on top of a char stream back-end, it would break seeking and other functions that treat all offsets as valid, because converting sizes in characters to sizes in bytes internally can overflow. Hence, I'm looking to side-step it all by not having a char stream back-end.

My goal is to define the most bare-bones traits, locale, and anything else that performs no transformation on the data, does not recognize its significance, and simply returns the data verbatim; my own stream front-end handles all transformations and locale-like things.

Do you have any links at hand for creating such a simple set of templated traits and locale classes? I'm having a hard time finding any sort of article or documentation on creating one's own traits class or locale. I have my own base traits class for my stream class, which is incompatible with the standard char traits, so this isn't an unfamiliar concept, but it's looking more and more like I need to read my compiler's implementation, try to mimic it, and pray to whatever deity they follow.
