
Text file I/O - read line functions and buffering



#1 hannesnisula   Members   -  Reputation: 1038

Posted 05 July 2013 - 12:25 PM

Hi!

 

I'm looking into file I/O for text files, and I've been wondering about functions that read lines from files. Does anyone know whether they typically go to the file once per character, or read into a buffer first to speed things up?

I'm developing for Windows, and I know the OS buffers file reads (although you can turn that off with a flag). Because of this, I imagine that reading a single character from the file (really, from the OS cache) on every iteration wouldn't be too bad compared to buffering the file data yourself, right? Has anyone looked into this, or perhaps even benchmarked it?
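For illustration, a character-at-a-time line reader of the kind asked about might look like this minimal C++ sketch (the function name read_line_fgetc is hypothetical). It relies on std::fgetc from the C standard library; since a FILE* normally keeps its own user-space buffer, most of these calls are served from memory and only occasionally reach the OS.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: reads one line from `f` into `out`, one character
// at a time via std::fgetc. The trailing '\n' is consumed but not stored.
// Returns false only when EOF is hit before any character was read.
bool read_line_fgetc(std::FILE* f, std::string& out) {
    out.clear();
    bool got_any = false;
    int c;
    while ((c = std::fgetc(f)) != EOF) {
        got_any = true;
        if (c == '\n') {
            return true;  // end of this line
        }
        out.push_back(static_cast<char>(c));
    }
    return got_any;  // true for a last line without a trailing '\n'
}
```

If you want to experiment with the size of that stdio buffer, calling std::setvbuf(f, nullptr, _IOFBF, 1 << 16) right after opening the file, before the first read, requests a 64 KiB buffer.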




#2 AllEightUp   Moderators   -  Reputation: 4272

Posted 05 July 2013 - 04:25 PM

This depends on the underlying systems you are using. In general, though, it does not matter much for disk I/O since, as you say, even unbuffered I/O still reads a whole disk block at a time, which acts as a small buffer and prevents repeated trips to the disk. That said, the overhead of the function calls adds up quickly, especially when each call reaches the OS buffers, because the OS has to perform a number of additional checks that you can avoid by buffering yourself.
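Buffering yourself could be sketched like this, assuming plain C stdio: the hypothetical ChunkedLineReader class pulls large chunks with std::fread and scans them in memory with std::memchr, so the library is called once per chunk rather than once per character. The 64 KiB default chunk size is an arbitrary assumption, not a tuned value.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical line reader that does its own buffering on top of a FILE*.
class ChunkedLineReader {
public:
    explicit ChunkedLineReader(std::FILE* f, std::size_t chunk_size = 64 * 1024)
        : file_(f), buf_(chunk_size) {}

    // Reads the next line (without '\n') into `out`.
    // Returns false when no more data is available.
    bool next(std::string& out) {
        out.clear();
        bool got_any = false;
        for (;;) {
            if (pos_ == len_) {  // buffer exhausted: refill with one fread
                len_ = std::fread(buf_.data(), 1, buf_.size(), file_);
                pos_ = 0;
                if (len_ == 0) return got_any;  // EOF or read error
            }
            got_any = true;
            // Scan the buffered bytes for the end of the current line.
            const char* start = buf_.data() + pos_;
            std::size_t avail = len_ - pos_;
            const char* nl =
                static_cast<const char*>(std::memchr(start, '\n', avail));
            if (nl != nullptr) {
                out.append(start, static_cast<std::size_t>(nl - start));
                pos_ = static_cast<std::size_t>(nl - buf_.data()) + 1;
                return true;
            }
            out.append(start, avail);  // line continues in the next chunk
            pos_ = len_;
        }
    }

private:
    std::FILE* file_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;  // index of the next unread byte in buf_
    std::size_t len_ = 0;  // number of valid bytes in buf_
};
```

Lines longer than one chunk are simply stitched together across refills, so the chunk size only affects how often fread is called, not correctness.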

 

I did test this at one time. Comparing a simple fread of one character at a time against reading the entire file and then parsing it myself, the manual parsing came out roughly five times faster. On the other hand, C++ ifstream with the generic read-line function turned out to be as fast as the whole-file version, which suggests that C++ streams are pretty smart about buffering behind the scenes.
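The two approaches that came out about equally fast could be sketched as follows. This is not the code from that test, and both helper names are hypothetical: one reads the whole file into a std::string and splits it on '\n', the other relies on std::ifstream's internal buffer and calls std::getline once per line.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Approach 1: read the whole file in one go, then split it ourselves.
// Opened in binary mode, so on Windows a '\r' before each '\n' is kept.
std::vector<std::string> lines_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t nl = data.find('\n', start);
        if (nl == std::string::npos) nl = data.size();  // last line, no '\n'
        lines.push_back(data.substr(start, nl - start));
        start = nl + 1;
    }
    return lines;
}

// Approach 2: let the stream buffer do the work, one getline per line.
std::vector<std::string> lines_getline(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

The whole-file version holds the entire file in memory at once, while the getline version only ever holds one line plus the stream's buffer.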

 

All said and done, I know that calls into the underlying I/O functions are generally fairly slow, so you want to minimize them. For other approaches, you will just have to test them yourself. I should also note that I ran this test several years ago on WinXP, so the results could easily be out of date by now. So: test, test, test. :)
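To follow the "test, test, test" advice, a minimal timing harness might look like the following, assuming C++11 std::chrono; best_time_ms is a hypothetical helper. It reports the fastest of several runs to reduce noise from other processes.

```cpp
#include <chrono>

// Hypothetical helper: runs `fn` `repeats` times and returns the best
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double best_time_ms(Fn&& fn, int repeats = 5) {
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) best = ms;  // keep the fastest run
    }
    return best;
}
```

Wrap each candidate reader in a lambda and compare the numbers on your own files and OS. Keep in mind that after the first run the file is likely to sit in the OS file cache, so this measures cached reads rather than cold disk reads.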



#3 Krohm   Crossbones+   -  Reputation: 3261

Posted 06 July 2013 - 01:58 AM

I doubt this concern is meaningful. Loading data from disk is necessary and can be performance-critical in some situations, but I strongly suggest staying away from it for any serious use, the only good reason being memory management.



#4 hannesnisula   Members   -  Reputation: 1038

Posted 06 July 2013 - 02:22 AM

Quote from AllEightUp (#2): This depends on what underlying systems you are using. [...] So, test test test.. :)

 

Thanks for the great answer!

 

Quote from Krohm (#3): I have doubts this concern is meaningful. [...]

 

It's more curiosity than concern, really. I don't think either approach would noticeably affect performance in my case.








