Text file I/O - read line functions and buffering

Started by
2 comments, last by Hannesnisula 10 years, 9 months ago

Hi!

I'm looking into file I/O for text files and I've been wondering about functions that read lines from files. Does anyone know whether they normally issue a read on every iteration (for each character) or read into a buffer to speed things up?

I'm developing for Windows and I know the OS buffers reads from files (although one can specify not to, with a flag). Because of this I imagine that reading a single character from a file (the cache, really) in every iteration wouldn't be all too bad compared to buffering the file data yourself, right? Has anyone delved into this or perhaps even benchmarked it?

This depends on what underlying systems you are using. In general it does not matter much for disk I/O since, as you say, even with unbuffered I/O the disk still reads a block at a time, so you end up with a small buffer that prevents repeatedly hitting the disk. That said, the overhead of the function calls adds up quickly, especially when they talk to OS buffers, because the OS has to perform a lot of additional checks that you can avoid if you buffer yourself.
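To make "buffer yourself" concrete, here is a minimal sketch, not a definitive implementation. The function name `read_lines_chunked` and the 64 KiB default chunk size are my own assumptions for illustration. It issues one `fread` per chunk and splits lines in memory, so the number of I/O calls depends on the chunk size rather than on the number of characters.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (name and default chunk size are assumptions):
// reads the file in fixed-size chunks and splits the data into lines.
std::vector<std::string> read_lines_chunked(const char* path,
                                            std::size_t chunk_size = 64 * 1024) {
    std::vector<std::string> lines;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return lines;  // could not open: return an empty result

    // Strips a trailing '\r' so Windows line endings are handled in binary mode.
    auto push_line = [&lines](std::string& s) {
        if (!s.empty() && s.back() == '\r') s.pop_back();
        lines.push_back(s);
        s.clear();
    };

    std::vector<char> buf(chunk_size);
    std::string current;  // holds a line that spans chunk boundaries
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        std::size_t start = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (buf[i] == '\n') {
                current.append(buf.data() + start, i - start);
                push_line(current);
                start = i + 1;
            }
        }
        // Keep the unfinished tail of this chunk for the next iteration.
        current.append(buf.data() + start, n - start);
    }
    std::fclose(f);

    // A final line without a trailing newline still counts as a line.
    if (!current.empty()) push_line(current);
    return lines;
}
```

Calling `read_lines_chunked("some_file.txt")` returns every line without its line terminator. Whether this actually beats the library's own buffering is exactly the kind of thing you would have to measure.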

I did run a test on this at one time. Comparing a simple fread of 1 character at a time against reading the entire file and then parsing it myself, the result was about a 1:5 ratio in favor of the manual parsing. On the other hand, when I used a C++ ifstream and the generic read-line function, it turned out to be as fast as the full-file version, which suggests the C++ streams are pretty smart about behind-the-scenes buffering.
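If you want to repeat that kind of comparison, a rough sketch could look like the following. This is not the original test code. All function names here are hypothetical, and I am assuming the "generic read line function" means `std::getline`. Each function counts lines using one of the three approaches, and `time_ms` wraps a call with a wall-clock timer.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Approach 1: one fread call per character; counts '\n' characters.
std::size_t count_lines_per_char(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::size_t count = 0;
    char c;
    while (std::fread(&c, 1, 1, f) == 1) {
        if (c == '\n') ++count;
    }
    std::fclose(f);
    return count;
}

// Approach 2: read the whole file with a single fread, then scan it in memory.
std::size_t count_lines_whole_file(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (size <= 0) {
        std::fclose(f);
        return 0;
    }
    std::vector<char> data(static_cast<std::size_t>(size));
    std::size_t n = std::fread(data.data(), 1, data.size(), f);
    std::fclose(f);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] == '\n') ++count;
    }
    return count;
}

// Approach 3: std::getline on an ifstream (assumed to be the "read line"
// function mentioned above). Unlike the other two, this also counts a
// final line that has no trailing newline.
std::size_t count_lines_getline(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) ++count;
    return count;
}

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    auto t0 = std::chrono::steady_clock::now();
    fn();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For example, `time_ms([] { count_lines_per_char("big.txt"); })` times the first approach on a file of your choosing. Run each variant a few times, since the first pass over a file may come from disk while later passes come from the OS cache, and expect the ratios to differ between compilers, runtimes and OS versions.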

All said and done, I know that calls to the underlying I/O functions are generally fairly slow, so you want to avoid making more of them than necessary. For other solutions, you just have to test them yourself. It should also be noted that I ran this test several years ago on WinXP, so it could easily be completely out of date by now. So, test, test, test. :)

I have doubts this concern is meaningful. While loading data from disk is necessary and might be performance-critical in some situations, I strongly suggest keeping away from it for all serious use, the only good reason being memory management.

Previously "Krohm"

Thanks for the great answer!

It's more curiosity than concern, really. I don't think either approach would noticeably affect performance in my case.
