C++ Streams versus C Standard I/O Streams

Started by
5 comments, last by Ectara 10 years, 11 months ago

Are there any concrete benefits to the C++ streams over the C file streams, other than ease of use? It is much easier to parse files and read text with C++ streams, thanks to features such as being able to get the next character without advancing the file pointer. However, I am hesitant to declare C++ streams more beneficial than C streams, because the standard C++ streams seem to require that input be placed into a buffer that the stream interface then reads from; this means that a certain number of elements must be read into the buffer before any reading operation can take place. The side effect is that the offset of the stream's back end won't actually match the logical file position reported by the stream interface. This matters if you need to interact with the stream through its interface, then interact with the back end alone, which breaks because the back end is left in an unexpected state: if the stream must fill a 32-byte buffer, it reads 32 bytes from the back end, and you read from the buffer. If you then get 5 elements from the stream, the logical position is only 5 ahead, but the back end is 32 ahead, the size of the buffer.

Now, this may seem like an unimportant concept, but there is a use case for this, which has many different problems with this method. I have a unique archive format, with a library that allows you to open the files contained within as streams like ordinary files. Additionally, the files may be compressed, and the whole archive can be encrypted, so the archive library manages an encryption stream on top of the stream that contains the archive itself. If the file is compressed, there is a compression stream on top of the encryption stream. On top of everything else, there is an archive stream, which effectively simulates a normal file stream while reading from the stream containing the archive. This may seem overly complex, but it allows one to transparently use a file from an archive, which may be compressed or encrypted, and the modules that read and write data with streams don't know the difference. This makes it possible to read an image from an encrypted archive using the same function that reads an image from a file on disk.

My point isn't to debate how the stream hierarchy works; it is a successfully implemented extensible framework using a stream model like C's streams. The thing is, using an interface like C++ streams that require a stream buffer would prevent one from having any control over where the file pointer of the back end actually is, and would prevent you from having unbuffered streams.

So, put another way, how is the C++ stream model not problematic? Does the issue of unbuffered streams ever come up for anyone? How about non-rewindable streams? I had a stream type at one point that paired a font with a frame buffer: you could write characters to the stream, and they would be printed to the frame buffer. Since the stream was just a static image and a raster position, there was no way to measure how far back to move or to overwrite characters already written, so the stream is unrewindable. There are also examples of unrewindable input streams. If such a stream is used in a buffered stream as described above, the back end will be positioned further ahead than the logical offset in order to fill the buffer, and when all is said and done, the unused characters between the logical offset and the physical offset where the buffer ended are lost, because you can't seek backward.

You seem to know more about this than I do, but can you not get unbuffered C++ streams by calling setbuf(0,0)?


I may not know as much about the streams as I could, but technically, setbuf isn't required to do anything. For file buffers, setbuf(0, 0) must make the stream unbuffered, but elsewhere its behavior is implementation-defined, and the base stream buffer's default setbuf(0, 0) is specified to have no effect at all. So, there's nothing stopping a stream from maintaining a buffer separate from, say, its string stream, and having the same synchronization problems. It is only defined to work on file buffers, one of many possible kinds.

However, the question concerning the standards is more of a hypothetical one; I'm rewriting my old stream modules from C using new paradigms and models, and I'm wondering whether I should keep the old design, which was based on C's streams, or adopt something more like C++'s streams. As far as advantages go, parsing text files character by character with C++ streams is incredibly easy, because you can read the current character without moving the file pointer ahead. In C++, even an unbuffered file buffer still provides the ability to get the current character without advancing the file pointer, which is something I wouldn't be able to implement in many stream types without buffering the input or using unget functionality. Unfortunately, without a standard way to read the current character without moving the file pointer (short of redundantly using C++ streams as the back end), I'd run into the same synchronization problems as before: I'd have to read the current character, advancing the file pointer, and then either pretend I hadn't moved it or try to seek backwards. The first would leave the child streams unsynchronized; they'd be one character ahead of the topmost stream interface, since they don't know why the character was read, only that it happened. The second would break unrewindable streams.

If I could figure out a way to be like C's streams and also read the current character without advancing the file pointer, that would solve this whole problem. The main benefit that I see is being able to easily get the current character without advancing; since at least one of my stream types doesn't support that in any standard fashion I know of, it can't be part of the base stream interface from which all of them inherit.

Given that the C file streams don't have a mechanism to replace the back end at all, I don't see why you think that not being able to predictably interact with the back end in the C++ streams would be a disadvantage for C++ streams. It seems extremely illogical to state that product A is inferior to product B because product A won't let you do something that you can't do with product B.


Right, but I only ever use C streams as a back end. Are you just saying that this is a task that cannot be attempted?

To clarify the clarification: I use my own stream model, a third and completely separate stream model from the two standardized ones mentioned here. I do have a file stream type that uses C standard I/O to manipulate files through my stream interface.

FWIW: in my case I wrap most filesystem stuff in a custom VFS layer.

internally, it is more object-based (and new types of file-like objects can be created), but it uses a C-like front-end API (it is actually very similar to stdio, just with a name prefix added). however, the backend is more extensible: new types of files and filesystems can be added as needed. the VFS backend is actually a little more like the Linux VFS, using a hierarchy and mounting things at various mount-points, with "FS drivers" generally defining both the behavior at the mount-point and internally managing file-types specific to that FS type, such as for mounting an OS directory or a ZIP archive.

like, the FS driver will register itself with the VFS, providing a call to mount a filesystem, and the VMount interface provides things like open/opendir/stat/... open may return a VFile, which contains any methods to read/write/seek/close/... within the file, ...

most of this stuff is kept hidden internally though (where code will just open/read/write/... files, without needing to worry too much about things like VFS mounts).

theoretically, a person could also do it Java-style, but IMHO this adds a lot more code and complexity to the API without really making a huge improvement in usability.

another alternative is, granted, just doing something like having a File class or interface (say, an IFile abstract base class) containing a subset of a stdio-like API. if needed, this could be further wrapped to implement more specialized IO interfaces.

admittedly, I am not personally a huge fan of the design of iostream.

I wasn't a fan of it, either. It seemed to greatly over-complicate putting characters into a stream, but I can see why some decisions were made. After thinking long and hard, I came to realize that my I/O streams work almost exactly like C++ iostreams, just implemented in C. I had a self-allocated structure holding the details of the stream's back-end data storage, referenced through an opaque void pointer in the I/O stream structure, and the stream had an operations structure: a pointer to a table of function pointers that perform the lowest-level functionality unique to that stream type when passed the opaque pointer.

This is just like a stream buffer, just implemented without fancy constructs like inheritance, virtual functions, and other odds and ends of polymorphism.

I also came to see the merit of having two stream positions, after realizing that fstream's seekg() and seekp() are both allowed to move a single internal pointer, which makes the most sense for a file that traditionally has only one pointer. I was sold on allowing two stream positions, but most importantly, on it being allowed and, to a certain extent, expected that some types might have only a single internal pointer.

Right now, with all of these revelations in mind, I'm working toward a rewrite using C++'s new language features, adding support for a peek() function. For standard I/O streams, it seems the only way to provide this is ungetc(), because seeking backwards might not be possible. As a result, I will have to stipulate that a standard file stream is flushed before and after it is attached to one of my stream front ends, which discards any ungotten characters, and that only the one guaranteed pushback character of ungetc() is used for peek() operations. Actual unget() functionality in my streams is handled separately, so using standard I/O's ungetc() would be feasible, provided the user keeps the promise not to use it while the standard stream is the child of one of my streams.

