Using A File Stream With A Character Type Other Than char Or wchar_t


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

11 replies to this topic

#1 Ectara   Crossbones+   -  Reputation: 2970


Posted 24 May 2013 - 12:35 AM

How would one go about using a stream with a parameter of charT corresponding to something other than char or wchar_t? I tried something like this:

std::basic_fstream<short, std::char_traits<short> > fs("test.txt", std::ios_base::out | std::ios_base::binary);

This creates the file, but writing to it does nothing, and flushing the stream sets badbit.

 

I ask, because I have a template stream class that also accepts a parameter for the type of elements it contains, and I found that I didn't know how to implement things like seeking where the element type wasn't char or wchar_t. The problem is a simple one: if I try to seek 2^29 elements from the start in a stream with an element size of 4 bytes, I get an integer overflow on a 32-bit two's complement system if I naively seek ahead by the element size times the number of elements, using fseek() as a back-end. The same goes for any operation that gets near the limits of the offset type's range, even though using those values is well-defined. Additionally, implementing peek() is complicated, since ungetc() is only guaranteed to push back one character, so pushing back an element larger than one byte isn't portable.
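To make the overflow concrete, here is a minimal sketch of a checked conversion from an element count to a byte offset of type long, which is the offset type fseek() takes. The helper name element_offset_to_bytes is hypothetical and not part of any library; it only illustrates rejecting the multiplication before it overflows.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: converts an element count into a byte offset that
// fits in long. Returns false instead of overflowing.
bool element_offset_to_bytes(long elements, std::size_t element_size, long& bytes_out) {
    if (element_size == 0 || element_size > static_cast<unsigned long>(LONG_MAX))
        return false;
    const long size = static_cast<long>(element_size);
    // Compare against the limits divided by the size, so the product is
    // never formed unless it is known to be representable.
    if (elements > LONG_MAX / size || elements < LONG_MIN / size)
        return false;
    bytes_out = elements * size;
    return true;
}
```

On a system where long is 32 bits, a request for 2^29 elements of 4 bytes is rejected by this check rather than wrapping. Where available, a 64-bit offset call such as POSIX fseeko() or MSVC's _fseeki64() is another way around the limit.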

 

I tried using C++'s basic_streams, but most implementations don't seem to support streams of arbitrary width. If I were trying to support sane template parameters to my stream class, how should I go about implementing the back-end for file I/O? All other back-ends are easy.




#2 Ectara   Crossbones+   -  Reputation: 2970


Posted 28 May 2013 - 03:24 PM

Bump. I'm really kind of at a stand-still.



#3 frob   Moderators   -  Reputation: 21331


Posted 28 May 2013 - 04:07 PM

I ask, because I have a template stream class that also accepts a parameter for the type of elements it contains, and I found that I didn't know how to implement things like seeking where the element type wasn't char or wchar_t.

Basically, don't do that.  

 

The stream systems were designed based around single-byte characters, with multi-byte characters mostly working okay as long as you stick with pure text.

 

The stream system is not designed to work with a base type of arbitrary objects.

 

You need to learn how serialization works, then properly encode and decode your data.
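As a rough illustration of the encode/decode approach, the sketch below writes fixed-width 16-bit elements through an ordinary byte stream with an explicit little-endian layout. The function names write_u16_le and read_u16_le are made up for this example, and the byte order is an assumption; any fixed order works as long as reader and writer agree.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Writes one 16-bit element as two bytes, least significant byte first.
inline void write_u16_le(std::ostream& out, std::uint16_t value) {
    const char bytes[2] = {
        static_cast<char>(value & 0xFF),
        static_cast<char>((value >> 8) & 0xFF)
    };
    out.write(bytes, 2);
}

// Reads one element written by write_u16_le. Returns false on a short read.
inline bool read_u16_le(std::istream& in, std::uint16_t& value) {
    char bytes[2];
    if (!in.read(bytes, 2)) return false;
    value = static_cast<std::uint16_t>(
        static_cast<unsigned char>(bytes[0]) |
        (static_cast<unsigned char>(bytes[1]) << 8));
    return true;
}
```

The same pair works unchanged on a std::fstream opened with std::ios_base::binary, so the char stream stays the only thing the standard library has to understand.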


Check out my personal indie blog at bryanwagstaff.com.

#4 Ectara   Crossbones+   -  Reputation: 2970


Posted 28 May 2013 - 04:15 PM


I know how serialization works. Then how do wchar_t streams work, given that wchar_t is often larger than char?



#5 frob   Moderators   -  Reputation: 21331


Posted 28 May 2013 - 05:02 PM

They use locales: you create a locale, create a stream of the given character type that matches it, imbue the stream with the locale, and finally write data that matches the given types.
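For wchar_t the piece of the locale that does the actual work is the std::codecvt<wchar_t, char, std::mbstate_t> facet, which a wide file buffer consults to turn wide characters into bytes. The sketch below calls that facet directly, in memory, to show the conversion step. The function name narrow_with_codecvt is hypothetical, and returning an empty string on failure is a simplification for the example.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Converts a wide string to bytes using the codecvt facet of the given locale.
std::string narrow_with_codecvt(const std::wstring& in, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    std::mbstate_t state = std::mbstate_t();
    // Reserve the worst-case number of bytes per wide character.
    std::string out(in.size() * cvt.max_length() + 1, '\0');
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const std::codecvt_base::result r = cvt.out(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok) return std::string();  // sketch: empty on failure
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

A stream whose character type has no matching codecvt facet in its locale has nothing to perform this step with, which is one plausible reason a file stream of an unusual character type fails when it tries to write.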

 

All wchar_t does is allow you a character size greater than 1 byte. Except when it doesn't, because it isn't guaranteed to.

 

The built-in C++ wide characters don't work with variable-size characters, nor do they work very well with actual Unicode, since wchar_t is a fixed length and Unicode encodings such as UTF-8 and UTF-16 are not. The size of wchar_t is compiler-specific and may be as little as 8 bits. Windows uses 2 bytes by default; Linux and OS X use 4 bytes.

 

 

Most good libraries that work with Unicode text actually work on single-byte streams and extract the data they need.

 

 

I've never seen a good case for the built-in wide character support. I've worked on quite a few programs that are translated into languages from around the globe, yet I can't recall any of the libraries actually relying on wchar_t.



#6 Ectara   Crossbones+   -  Reputation: 2970


Posted 28 May 2013 - 05:32 PM

All wchar_t does is allow you a character size greater than 1 byte.

Perhaps I'm not explaining myself well, but this is what I'm looking for: fixed-width character streams.



#7 frob   Moderators   -  Reputation: 21331


Posted 28 May 2013 - 08:29 PM


Perhaps I'm not explaining myself right, but this is what I'm looking for. Fixed width character streams.


C++11 has support for char16_t and char32_t.

If you need some other width you do all those things mentioned above:

They use locales: you create a locale, create a stream of the given character type that matches it, imbue the stream with the locale, and finally write data that matches the given types.

Create your locale, your ctype information, your char_traits, all customized for the size of your custom character. Create your stream with the proper type and traits, imbue the stream with the locale, and extract the data.

#8 Ectara   Crossbones+   -  Reputation: 2970


Posted 28 May 2013 - 09:14 PM

So, what would be the best way to do this for any char type? Would I be able to make locale information and character traits classes that depend on template parameters, and use them?



#9 frob   Moderators   -  Reputation: 21331


Posted 28 May 2013 - 11:59 PM

So, what would be the best way to do this for any char type? Would I be able to make locale information and character traits classes that depend on template parameters, and use them?

For a char type you do nothing; it is already done.

 

For char16_t and char32_t get a C++11 compliant compiler, and it is already done.

 

 

 

For some other type, you will need to define char_traits<your_type_here> and its twenty or so functions.

 

Then you will need to create all the locale information with functions like widen(), narrow(), and another fifty or so functions.

 

Create the stream, create your locale, imbue the stream with the locale, and proceed.

 

 

 

Of course, none of this will help your side problem of large file sizes.  

 

Even with your custom type you won't be able to exceed std::numeric_limits<std::streamsize>::max().

 

That's likely to be a 31-bit number (2GB) or, if your system supports large files in the standard library, a 63-bit number (8 exabytes). Based on what you've written, it will likely be the 2GB limit.
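Which of the two limits applies can be checked directly rather than guessed. The sketch below reports how many value bits std::streamsize has on the current platform and how many elements of a given size fit under its maximum. Both function names are invented for this example.

```cpp
#include <ios>
#include <limits>

// Number of value bits in std::streamsize (31 or 63 on common platforms).
inline int streamsize_value_bits() {
    return std::numeric_limits<std::streamsize>::digits;
}

// Largest element count whose byte length still fits in std::streamsize.
// element_size must be greater than zero.
inline std::streamsize max_addressable_elements(std::streamsize element_size) {
    return std::numeric_limits<std::streamsize>::max() / element_size;
}
```

Any element count above max_addressable_elements(sizeof(T)) cannot be expressed as a byte count in std::streamsize, whatever character type the stream uses.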



#10 Ectara   Crossbones+   -  Reputation: 2970


Posted 29 May 2013 - 12:19 AM

It isn't so much the issue of large files, as it is that if I tried to implement larger characters with a char stream backend, it would mess up seeking and other functions that treat all offsets as valid, by causing an overflow when it converts from sizes in characters to sizes in bytes internally. Hence, I'm looking to side-step it all by not having a char stream back-end.

 

My goal is to define the most bare-bones traits, locale, and anything else that performs no transformation on the data, does not recognize its significance, and simply returns the data verbatim; my own stream front-end handles all transformations and locale-like things.
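A bare-bones traits class of that kind might look like the sketch below. The name raw_traits is hypothetical, and the sketch assumes an integral element type narrower than 64 bits so that int_type can hold every element value plus a distinct eof(). It performs no transformation on the data. Note that traits alone are, as far as I can tell, not enough for a file stream: the file buffer also looks up a codecvt<charT, char, state_type> facet in its locale, and this sketch does not provide one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>
#include <type_traits>

// Hypothetical pass-through traits for an integral element type T.
template <typename T>
struct raw_traits {
    static_assert(std::is_integral<T>::value && sizeof(T) < sizeof(std::int64_t),
                  "sketch assumes an integral element narrower than 64 bits");
    typedef T              char_type;
    typedef std::int64_t   int_type;   // holds every T value plus eof()
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n;  // stops at the zero element
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memmove(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) d[i] = c;
        return d;
    }
    static int_type eof() { return -1; }
    // Map through the unsigned type so that no element value equals eof().
    static int_type to_int_type(char_type c) {
        return static_cast<int_type>(static_cast<typename std::make_unsigned<T>::type>(c));
    }
    static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type not_eof(int_type i) { return eq_int_type(i, eof()) ? 0 : i; }
};
```

The traits can be exercised without any locale involvement by plugging them into std::basic_string, for example std::basic_string<short, raw_traits<short> >.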

 

Do you have any links at hand for creating such a simple set of templated traits and locale classes? I'm having a hard time finding any sort of article or documentation on creating one's own traits class or locale. I have my own base traits class for my stream class, which is incompatible with the standard char traits, so this isn't an unfamiliar concept, but it's looking more and more like I need to read my compiler's implementation, try to mimic it, and pray to whatever deity they follow.



#11 Ectara   Crossbones+   -  Reputation: 2970


Posted 30 May 2013 - 06:24 PM

I created a custom templated character traits class, and substituted that in, and it still sets the bad and fail bits when I try to write and flush.



#12 Ectara   Crossbones+   -  Reputation: 2970


Posted 05 June 2013 - 10:35 AM

I can't seem to get it to work; it always sets the bad bit. Which mystifies me, because if I haven't correctly provided all of the required classes, I can't imagine how it even compiles, unless every class in the standard stream's implementation on my machine has a default implementation of just failing.

 

I guess I have to mark in the documentation that unless they're using C++11, they're stuck with char and wchar_t.





