
ungetc() Behavior




As I'm sure many of you know, the C standard guarantees that at least one character can be pushed back onto a stream with ungetc(). Beyond that, whether successive ungetc() calls succeed, and how they behave, is left to the implementation. I've finally gotten around to adding an unget function to my I/O stream library, and it supports multiple ungets (up to 16 currently, but this can be changed by modifying a constant). I've read documentation for all sorts of C libraries, and they handle multiple calls to ungetc() in wildly different ways; I couldn't even work out which approach is the most common. What follows is a description of my current implementation:

- Maximum of 16 ungotten characters; this can be changed by modifying a constant.
- Characters that are put back are read in LIFO order.
- The put-back buffer is discarded after any seeking operation (fseek(), fsetpos(), rewind()), and the file position is unchanged from where ungetc() left it. I'm considering adding fflush() to this list.
- A write operation starts writing at the modified file position; for streams that must be flushed or seeked between mode changes, this is irrelevant. The write discards as many put-back characters as it writes.
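
The rules above could be sketched roughly as follows. This is only an illustration, not the library's actual code: the names pushback_buf, pb_unget, pb_get, pb_discard, and pb_consume_written are hypothetical, and the write rule reflects one reading of the description (a write overwrites the characters that would have been read next, so those are the ones dropped).

```c
#include <assert.h>
#include <stddef.h>

#define PUSHBACK_MAX 16  /* the limit mentioned in the post; a tunable constant */

/* Hypothetical put-back state; the real library's layout is not shown. */
typedef struct {
    unsigned char data[PUSHBACK_MAX];
    size_t count;  /* number of characters currently put back */
} pushback_buf;

/* Put one character back. Returns the character, or -1 if the buffer is full
 * or c is negative (for example EOF). */
static int pb_unget(pushback_buf *pb, int c)
{
    assert(pb != NULL);
    if (c < 0 || pb->count == PUSHBACK_MAX)
        return -1;
    pb->data[pb->count++] = (unsigned char)c;
    return (unsigned char)c;
}

/* Take back the most recently put-back character (LIFO order).
 * Returns -1 when empty, meaning the caller should read the real stream. */
static int pb_get(pushback_buf *pb)
{
    assert(pb != NULL);
    if (pb->count == 0)
        return -1;
    return pb->data[--pb->count];
}

/* Meant to be called from seeking operations: drop everything. */
static void pb_discard(pushback_buf *pb)
{
    assert(pb != NULL);
    pb->count = 0;
}

/* Meant to be called after writing n bytes: drop up to n characters,
 * starting with the ones that would have been read next. */
static void pb_consume_written(pushback_buf *pb, size_t n)
{
    assert(pb != NULL);
    pb->count = (n >= pb->count) ? 0 : pb->count - n;
}
```

A read function would call pb_get() first and fall through to the underlying stream only when it returns -1.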

If anyone could share their thoughts on what makes the most sense, I'd appreciate it.



Don't allow ungetc. It breaks all the assumptions present in the underlying stream model, and doesn't add any functionality that application-level buffering couldn't implement more reliably.

That's my two cents.


Noted. However, the "underlying stream model" could be any of countless stream types: on top of the many different terminal stream types, there are numerous stream attachments that operate through the same interface, perform operations on the data, and pass it through to the next part of the stream. All of these terminal streams and stream attachments are aware that various things could happen before they give or get information, and I've taken precautions to ensure that ungetc(), among other operations, doesn't break them. IMHO, allowing ungetc() would be a lifesaver at times; code reading a stream that cannot be rewound could look ahead, and step back if it was mistaken. Buffering the stream might not be beneficial: passing the buffer around from whatever had to look ahead to whatever must read it next is cumbersome. And while these streams support a tremendous array of buffer settings, the buffers must be flushed when one attempts to seek through the stream, to ensure that the data read is up to date and that the data written is committed to the stream. Attempting to flush the buffers might entail fetching another buffer from the stream, which could require rewinding an unrewindable stream.

In short, while I can debate many design decisions of the C standard, I believe this one to be beneficial, and well thought-out.

However, if someone can name one type of stream that is readable and writable, but not rewindable, I will take this into account, because ungetc() would have consequences. Edited by Ectara

I'm a little unclear on what you are trying to do here. Are you building your own implementation of the standard library streams, or are you building a distinct streams library, modelled on the standard library?

Regardless, some philosophical ramblings re streams libraries:

- The presence of combined read/write streams always bothers me. There is no use case for these that can't be handled either with separate read and write streams, or by combining a stream with an application-side buffer.

- ungetc(), rewind() and passing a negative argument to seek() all violate the idea that a stream is intended as an in-order iteration. I'd prefer that all of these operations were disallowed, since you can either rearrange your reads to be sequential, or emulate all of the above via an application-side buffer.
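
For illustration, the application-side alternative could look something like the sketch below: a one-character lookahead kept next to a standard FILE*, so the stream itself is only ever read forward. The peek_reader type and the pr_* functions are hypothetical names for this example, and it assumes a single character of lookahead is enough.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: the lookahead lives in the application, not the stream. */
typedef struct {
    FILE *fp;
    int   has_peek;  /* nonzero if 'peeked' holds an unread character */
    int   peeked;    /* the buffered character, or EOF */
} peek_reader;

static void pr_init(peek_reader *r, FILE *fp)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->has_peek = 0;
    r->peeked = EOF;
}

/* Look at the next character without consuming it. */
static int pr_peek(peek_reader *r)
{
    assert(r != NULL);
    if (!r->has_peek) {
        r->peeked = fgetc(r->fp);
        r->has_peek = 1;
    }
    return r->peeked;
}

/* Consume the next character, using the buffered one if present. */
static int pr_get(peek_reader *r)
{
    assert(r != NULL);
    if (r->has_peek) {
        r->has_peek = 0;
        return r->peeked;
    }
    return fgetc(r->fp);
}
```

The cost of this design is that every consumer has to be handed the peek_reader rather than the raw FILE*; anything that reads the FILE* directly will miss the buffered character.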



It's a separate library, with the standard library as inspiration for its interface.

Well, I do have a use for read/write streams; the stream is not always one-directional, which allows for things such as my archive format. It can be set up to create new streams with file handles into the archive, so reading and writing the files in the archive relies on being able to seek back and forth in the archive's main stream. Buffering the entire archive isn't feasible, since the files can be enormous. However, if you wanted to buffer the entire thing, there's a stream attachment for that, which allows you to either use a buffer or memory-map the file; this is transparent to the user, and the result supports seeking, writing, and reading like any other stream.

I like ungetc() over seeking backward. Again, buffering all that you might need is not always feasible. You would have to keep track of the buffer's lifetime, pass it to everything that might need to read the file next (which is a pain), and have enough memory for what could be an 8 GB file. Also:

Buffering the stream might not be beneficial; having to pass the buffer around from whatever had to look ahead to whatever must read it next is cumbersome, and while these streams support a tremendous array of buffer settings, the buffers must be flushed when one attempts to seek through the stream, to ensure that the data read is up to date, and that the data written is committed to the stream. Attempting to flush the buffers might entail fetching another buffer from the stream, which could require rewinding the unrewindable streams.


That's all well and good when you're reading a model file of some sort, but if you're parsing text and the stream then needs to be passed to something else that will also read from it, reading the whole stream into a buffer is hard to manage. The main benefit of ungetc(), in my eyes, is that if you have to read ahead to know whether there's still an action to be performed, you can logically step back a character, and whatever receives the stream next won't know the difference (if done right).
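
As a small illustration of that read-ahead-and-step-back pattern, here is a sketch using the standard C library rather than the library discussed in this thread. read_uint is a hypothetical helper; it ignores overflow and relies only on the single character of pushback that the C standard guarantees.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Reads a run of decimal digits from fp into *out.
 * Returns the number of digits consumed (0 if the next character is not a digit).
 * The first non-digit character is pushed back so the next reader still sees it. */
static int read_uint(FILE *fp, unsigned long *out)
{
    unsigned long value = 0;
    int digits = 0;
    int c;

    assert(fp != NULL && out != NULL);
    while ((c = fgetc(fp)) != EOF && isdigit(c)) {
        value = value * 10 + (unsigned long)(c - '0');
        digits++;
    }
    if (c != EOF)
        ungetc(c, fp);  /* step back: we read one character too far */
    *out = value;
    return digits;
}
```

Given the input "123+4", a first call yields 123, the next fgetc() on the same stream returns '+', and a second call yields 4; the code that handles the operator never needs to know a number parser ran before it.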

Again, there's an attachment that will buffer the whole file for you, and emulate a stream when reading and writing to it. So, if buffering the whole file is your thing, it is possible and easy to do with this stream model, and passing the same stream with a transparent attachment is easier than passing around a buffer and its information. However, often it isn't the best option, so ungetc() and the other operations are implemented. It's easy to tell the stream that it is unrewindable, which would make negative seeking fail, but ungetc() should still succeed.
