STL file reading performance

8 comments, last by xtrmntr 18 years, 11 months ago
Hopefully I am doing something drastically wrong here, or perhaps this is normal. I am trying to read from a file using a wifstream (it's a UTF-16 encoded file). The file contains roughly 24,000 lines of text, each about 6 wchar_t's long. Reading the whole file takes about 2-3 seconds (by my observation) with either of the approaches below. I have tried:

wifstream file("text.txt");
wstringstream buffer;
buffer << file.rdbuf();

And:

wifstream file("text.txt");
wchar_t * buf = new wchar_t[fileSize];
file.read(buf, fileSize);

At first I thought it was due to the fileSize but thought "what the heck" and tried fopen:

FILE* f;
wchar_t * buf = new wchar_t[fileSize];
f = fopen("text.txt", "rb");
fread(buf, fileSize, 2, f);

I found that fopen reads in the entire file without any delays whatsoever(?!?!). Anybody have a clue as to why I may be seeing such a huge degradation in performance using fstream?
Just a guess but...

Have you activated compiler optimizations? Because, as far as I know, C++'s heavily nested iostream code is more sensitive to optimizations than C's simple functions.

Perhaps that's just it, perhaps I'm off track ;)
The streambuf might be reading the file one character at a time, or at least not very many at once, depending on how your implementation has its defaults.

Perhaps giving the streambuf a larger buffer would help?
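For what it's worth, here is a minimal sketch of what "giving the streambuf a larger buffer" could look like. The helper name read_with_big_buffer and the buffer size are made up for illustration. Whether the buffer is honoured at all is implementation-defined, and on common implementations pubsetbuf only has an effect when it is called before the file is opened, so treat this as something to measure rather than a guaranteed fix.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): read a whole file through a
// wifstream whose filebuf was offered a caller-supplied buffer.
std::wstring read_with_big_buffer(const char* path, std::size_t buffer_chars)
{
    // Declared before the stream so it outlives the filebuf that uses it.
    std::vector<wchar_t> storage(buffer_chars);

    std::wifstream file;
    // Offer the buffer before open(); the effect is implementation-defined.
    file.rdbuf()->pubsetbuf(storage.data(),
                            static_cast<std::streamsize>(storage.size()));
    file.open(path, std::ios::in | std::ios::binary);

    std::wstringstream contents;
    if (file)
        contents << file.rdbuf();   // copy everything the filebuf delivers
    return contents.str();
}
```

Something like read_with_big_buffer("text.txt", 1 << 16) would be a starting point; time it against the default to see whether your library actually benefits.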
a) Add std::ios_base::sync_with_stdio(false); prior to your first use of the IOStream library.
b) Have you actually imbued your stream with a UTF-16 locale?
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
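To illustrate point (b): a rough sketch of imbuing a wifstream with a UTF-16 conversion facet. It assumes a C++11 or later compiler and uses std::codecvt_utf16 from <codecvt>, which did not exist when this thread was written and has been deprecated since C++17, so take it as one possible approach only. The helper name read_utf16_file is hypothetical, and the sketch assumes the file is little-endian UTF-16, optionally starting with a byte order mark.

```cpp
#include <codecvt>   // std::codecvt_utf16 (C++11; deprecated since C++17)
#include <fstream>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not from the thread): read a UTF-16 file into a wstring.
std::wstring read_utf16_file(const char* path)
{
    // Assumption: little-endian data; a byte order mark, if present, is
    // consumed and takes precedence over the little_endian default.
    typedef std::codecvt_utf16<
        wchar_t, 0x10ffff,
        static_cast<std::codecvt_mode>(std::consume_header | std::little_endian)>
        utf16_facet;

    std::wifstream file;
    // The locale takes ownership of the facet; imbue before opening the file.
    file.imbue(std::locale(file.getloc(), new utf16_facet));
    // Binary mode so that no newline translation corrupts the 2-byte units.
    file.open(path, std::ios::in | std::ios::binary);

    std::wstringstream contents;
    if (file)
        contents << file.rdbuf();
    return contents.str();
}
```

Without a facet like this, a wifstream in the default locale widens each byte separately, which is the "read 1 byte and pad it with 0" behaviour.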
Fruny: I'm using the codecvt<> defined by P. J. Plauger (http://groups-beta.google.com/group/comp.std.c++/msg/960feb01524a8f2d?hl=en&lr=&ie=UTF-8&oe=UTF-8), which lets you actually read 2-byte characters instead of reading 1 byte and padding it with 0. If there is a built-in one I would love to know about it.

Turning off stdio syncing made:

wifstream file("text.txt");
wchar_t * buf = new wchar_t[fileSize];
file.read(buf, fileSize);


work much faster. My assumption is that when I do something like:

wifstream file("text.txt");
wstringstream s;
s << file.rdbuf();


the speed hit _may_ be due to buffer reallocation? (a bit of a stretch though)

Thanks for the help guys!
Quote:Original post by xtrmntr
Fruny: I'm using the codecvt<> defined by P. J. Plauger


Fair enough. [smile]

Quote:the speed hit _may_ be due to buffer reallocation? (a bit of a stretch though)


Sounds plausible. Is there a particular reason why you want to use a wstringstream as opposed to directly writing to a vector or a wstring; using stream iterators, for example?
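As a sketch of the "write directly to a wstring" idea, something along these lines would skip the intermediate wstringstream. The function name slurp is hypothetical, and whether it is actually faster than the wstringstream version is something to measure rather than assume.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not from the thread): pull every character the
// stream buffer delivers straight into a wstring.
std::wstring slurp(const char* path)
{
    std::wifstream file(path);
    // A default-constructed istreambuf_iterator is the end-of-stream marker.
    // If the file failed to open, the range is empty and so is the result.
    return std::wstring(std::istreambuf_iterator<wchar_t>(file),
                        std::istreambuf_iterator<wchar_t>());
}
```

The stream still needs whatever locale/codecvt setup the file requires; this only changes where the characters end up.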
The file consists of lines of the format: number tab string. wstringstream was the closest thing to a tokenizer that I could find in the STL. Instead of doing:

getline(file, myString, L'\u0009');
int number = _wtoi(myString.c_str());
getline(file, myString, L'\u2028');


I can just do:

stringStream >> number >> std::ws >> myString;


At the very least, it looks nicer. Plus there will be other files formatted differently than this one.
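For reference, a self-contained sketch of that kind of parsing on "number tab string" lines. The helper name parse_entries is made up, the lines are assumed to end in '\n' rather than the L'\u2028' separator used above, and getline is used for the string part so that strings containing spaces survive (plain >> would stop at the first space).

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): split text of the form
// "number<TAB>string<NEWLINE>" into (number, string) pairs.
std::vector<std::pair<int, std::wstring> > parse_entries(const std::wstring& text)
{
    std::vector<std::pair<int, std::wstring> > entries;
    std::wistringstream in(text);
    int number;
    std::wstring label;
    // >> reads the number, std::ws skips the tab, getline takes the rest
    // of the line. The loop stops at the first line that does not parse.
    while (in >> number >> std::ws && std::getline(in, label))
        entries.push_back(std::make_pair(number, label));
    return entries;
}
```

One caveat: std::ws also skips newlines, so a line with an empty string part would swallow the following line.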



On a side note:

- Is there some good reason why file.close() does NOT clear the file stream's state? What else would someone do with a closed file stream except open up another file or wait for the variable to be destroyed?
How about

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

struct Foo
{
   int num;
   std::wstring str;
};

// Reads one "number, whitespace, rest of line" record.
std::wistream& operator>>(std::wistream& is, Foo& f)
{
   is >> f.num >> std::ws;
   std::getline(is, f.str);
   return is;
}

int main()
{
   std::ios_base::sync_with_stdio(false);
   std::wifstream file("text.txt");
   std::istream_iterator<Foo, wchar_t> begin(file);
   std::istream_iterator<Foo, wchar_t> end;
   std::vector<Foo> vec(begin, end);
}


If the reallocation in the std::vector slows you down, try a std::deque.
Quote:Is there some good reason why file.close() does NOT clear the file stream's state? What else would someone do with a closed file stream except open up another file or wait for the variable to be destroyed?


No. It's an acknowledged flaw of the library that needs to be fixed.
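In the meantime the workaround is to call clear() yourself before reusing the stream. Below is a small sketch with a hypothetical helper (first_line_of_second) and a narrow ifstream for brevity; the same applies to wifstream. As far as I know, from C++11 onward a successful open() calls clear() itself, but close() still leaves the flags alone, and the explicit clear() is harmless either way.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the thread): drain one file to EOF, then
// reuse the same stream object to read the first line of another file.
std::string first_line_of_second(const char* first, const char* second,
                                 bool& stale_after_close)
{
    std::ifstream file(first);
    std::string line;
    while (std::getline(file, line)) {}   // loop ends with eofbit/failbit set

    file.close();
    stale_after_close = !file.good();     // close() did not reset the flags

    file.clear();                         // reset the state explicitly
    file.open(second);
    line.clear();
    std::getline(file, line);
    return line;
}
```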
That's even cooler! :)

rating++;
