xtrmntr

STL file reading performance

xtrmntr    144
Hopefully I am doing something drastically wrong here, or perhaps this is normal. I am trying to read from a file using a wifstream (it's a UTF-16 encoded file). The file contains roughly 24000 lines of text, each about 6 wchar_t's long. The following statements take about 2-3 seconds (by my observation) to complete. I have tried:
wifstream file("text.txt");
wstringstream buffer;
buffer << file.rdbuf();

And:
wifstream file("text.txt");
wchar_t * buf = new wchar_t[fileSize];
file.read(buf, fileSize);

At first I thought it was due to the fileSize but thought "what the heck" and tried fopen:
FILE* f;
wchar_t * buf = new wchar_t[fileSize];
f = fopen("text.txt", "rb");
fread(buf, fileSize, 2, f);

I found that fopen reads in the entire file without any delay whatsoever(?!?!). Anybody have a clue as to why I may be seeing such a huge degradation in performance using fstream?
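
For comparison, here is a minimal sketch of the raw-read route. It is probably fast because fread copies bytes straight into memory and performs no character conversion, whereas a wifstream pushes every character through a codecvt facet. Note that fread takes the element size first and the element count second, so the call in the post asks for up to 2 * fileSize bytes. The function name read_utf16le_raw is made up for illustration, and the sketch assumes the file is little-endian UTF-16 with an optional byte order mark.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Reads a UTF-16LE file as raw bytes with a single fread call and assembles
// the 16-bit code units by hand. Returns an empty string if the file cannot
// be opened or is empty. (Hypothetical helper, not part of any library.)
std::u16string read_utf16le_raw(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return std::u16string();

    // Determine the file size in bytes.
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (size <= 0) { std::fclose(f); return std::u16string(); }

    std::vector<unsigned char> bytes(static_cast<std::size_t>(size));
    // fread(destination, element size, element count, stream)
    std::size_t got = std::fread(bytes.data(), 1, bytes.size(), f);
    std::fclose(f);

    // Combine each pair of bytes (low byte first) into one code unit.
    std::u16string text;
    text.reserve(got / 2);
    for (std::size_t i = 0; i + 1 < got; i += 2) {
        text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }

    // Drop a leading byte order mark if present.
    if (!text.empty() && text[0] == u'\uFEFF') text.erase(0, 1);
    return text;
}
```

Calling read_utf16le_raw("text.txt") would return the whole file as one string, which can then be split into lines in memory.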

persil    199
Just a guess but...

Have you activated compiler optimizations? Because, as far as I know, C++'s heavily nested iostream code is more sensitive to optimizations than C's simple functions.

Perhaps that's just it, perhaps I'm off track ;)

me22    212
The streambuf might be reading the file one character at a time, or at least not very many at once, depending on how your implementation has its defaults.

Perhaps giving the streambuf a larger buffer would help?
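
A rough sketch of that idea, assuming the implementation honours the request: pubsetbuf hands the file buffer a caller-owned array, and it should be called before the file is opened. What the call actually does with the array is implementation-defined, so treat this as something to measure, not a guaranteed fix. The function name read_with_big_buffer and the buffer size are illustrative choices.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file through a wifstream that has been offered a larger
// buffer. (Hypothetical helper; the default locale's conversion is used.)
std::wstring read_with_big_buffer(const std::string& path, std::size_t buffer_chars)
{
    // Declared before the stream so that it outlives the stream.
    std::vector<wchar_t> buffer(buffer_chars);

    std::wifstream file;
    // Offer the buffer before open(); the effect is implementation-defined.
    file.rdbuf()->pubsetbuf(buffer.data(),
                            static_cast<std::streamsize>(buffer.size()));
    file.open(path.c_str(), std::ios::binary);

    // Copy everything the stream buffer delivers into one string.
    return std::wstring(std::istreambuf_iterator<wchar_t>(file),
                        std::istreambuf_iterator<wchar_t>());
}
```

For example, read_with_big_buffer("text.txt", 64 * 1024) asks for a 64K-character buffer.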

Fruny    1658
a) Add std::ios_base::sync_with_stdio(false); prior to your first use of the IOStream library.
b) Have you actually imbued your stream with an UTF-16 locale?

xtrmntr    144
Fruny: I'm using the codecvt<> defined by P. J. Plauger (http://groups-beta.google.com/group/comp.std.c++/msg/960feb01524a8f2d?hl=en&lr=&ie=UTF-8&oe=UTF-8), which lets you actually read 2-byte characters instead of reading 1 byte at a time and padding it with 0. If there is a built-in one I would love to know about it.
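
On the question of a built-in facet: C++11 later added std::codecvt_utf16 in <codecvt>, which may serve the same purpose. It was deprecated in C++17, so take the following as a sketch for compilers that still ship it, not as a recommendation. The function name read_utf16_file is made up, and the little-endian default is an assumption about the file (a byte order mark, if present, is consumed and decides the byte order).

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Reads a UTF-16 file through a wifstream imbued with std::codecvt_utf16.
// (Hypothetical helper; assumes little-endian data unless a BOM says otherwise.)
std::wstring read_utf16_file(const std::string& path)
{
    typedef std::codecvt_utf16<
        wchar_t, 0x10ffff,
        std::codecvt_mode(std::little_endian | std::consume_header)> Utf16Facet;

    std::wifstream file;
    // The locale takes ownership of the facet allocated here.
    file.imbue(std::locale(file.getloc(), new Utf16Facet));
    // Binary mode keeps the runtime from translating any bytes.
    file.open(path.c_str(), std::ios::binary);

    return std::wstring(std::istreambuf_iterator<wchar_t>(file),
                        std::istreambuf_iterator<wchar_t>());
}
```

The stream is imbued before it is opened so that no input has been converted with the previous locale.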

Turning off stdio syncing made:


wifstream file("text.txt");
wchar_t * buf = new wchar_t[fileSize];
file.read(buf, fileSize);



work much faster. My assumption is that when I do something like:


wifstream file("text.txt");
wstringstream s;
s << file.rdbuf();



the speed hit _may_ be due to buffer reallocation? (a bit of a stretch though)

Thanks for the help guys!

Fruny    1658
Quote:
Original post by xtrmntr
Fruny: I'm using the codecvt<> defined by P. J. Plauger


Fair enough. :)

Quote:
the speed hit _may_ be due to buffer reallocation? (a bit of a stretch though)


Sounds plausible. Is there a particular reason why you want to use a wstringstream as opposed to directly writing to a vector or a wstring; using stream iterators, for example?
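
To make the comparison concrete, here is a small sketch of both routes side by side. The first is the wstringstream copy from the post; the second appends straight into a wstring whose capacity was reserved once, which avoids repeated growth if the estimate is close. The function names and the expected_chars parameter are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// The approach from the post: copy the stream into a wstringstream,
// then take a copy of its string.
std::wstring via_stringstream(std::wistream& in)
{
    std::wstringstream s;
    s << in.rdbuf();
    return s.str();
}

// Direct approach: append into a wstring that was reserved up front.
// expected_chars is only an estimate; the string still grows if needed.
std::wstring via_reserved_string(std::wistream& in, std::size_t expected_chars)
{
    std::wstring text;
    text.reserve(expected_chars);
    std::copy(std::istreambuf_iterator<wchar_t>(in),
              std::istreambuf_iterator<wchar_t>(),
              std::back_inserter(text));
    return text;
}
```

Timing the two on the real file would show whether reallocation is actually the cost here.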

xtrmntr    144
The file consists of lines of the format: number tab string. wstringstream was the closest thing to a tokenizer that I could find in the STL. Instead of doing:


getline(file, myString, L'\u0009');
int number = _wtoi(myString.c_str());
getline(file, myString, L'\u2028');



I can just do:


stringStream >> number >> std::ws >> myString;



At the very least, it looks nicer. Plus there will be other files formatted differently than this one.
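
One caveat with the shorter form: operator>> into a wstring stops at the first whitespace, so a string containing spaces would be cut off. A sketch that keeps the stream-style parsing but reads the rest of the line with getline might look like this. Entry and parse_entry are made-up names, and the function assumes it is handed a single line that has already been split off.

```cpp
#include <sstream>
#include <string>

// One parsed record of the form: number, tab, string.
struct Entry
{
    int number;
    std::wstring text;
};

// Parses "number<whitespace>text" from a single line.
// Returns false if the number is missing or nothing follows it.
bool parse_entry(const std::wstring& line, Entry& out)
{
    std::wistringstream in(line);
    // Read the number, then skip the tab (or any other whitespace).
    if (!(in >> out.number >> std::ws)) return false;
    // Take everything that remains, spaces included.
    std::getline(in, out.text);
    return true;
}
```

Given the line L"42\thello world", this fills in number 42 and text "hello world".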



On a side note:

- Is there some good reason why file.close() does NOT clear the file stream's state? What else would someone do with a closed file stream except open up another file or wait for the variable to be destroyed?

Fruny    1658
How about


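#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// One record of the file: a number followed by a string.
struct Foo
{
    int num;
    std::wstring str;
};

// Reads the number, skips the separating whitespace,
// then takes the rest of the line as the string.
std::wistream& operator>>(std::wistream& is, Foo& f)
{
    is >> f.num >> std::ws;
    std::getline(is, f.str);
    return is;
}

int main()
{
    std::ios_base::sync_with_stdio(false);

    std::wifstream file("text.txt");
    // The iterators call operator>> above once per element.
    std::istream_iterator<Foo, wchar_t> begin(file);
    std::istream_iterator<Foo, wchar_t> end;
    std::vector<Foo> vec(begin, end);
}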


If the reallocation in the std::vector slows you down, try a std::deque.

Fruny    1658
Quote:
Is there some good reason why file.close() does NOT clear the file stream's state? What else would someone do with a closed file stream except open up another file or wait for the variable to be destroyed?


No. It's an acknowledged flaw of the library that needs to be fixed.
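
In the meantime, a small workaround sketch is to clear the flags by hand before reusing the stream. The helper name reopen is made up for illustration. Since C++11 a successful open() clears the state on its own, so the explicit clear() mainly matters for older libraries; calling it anyway is harmless.

```cpp
#include <fstream>
#include <string>

// Closes the stream if needed, resets its error flags, and opens another
// file. Returns true if the new file was opened. (Hypothetical helper.)
bool reopen(std::wifstream& file, const std::string& path)
{
    // close() on a stream that is not open would set failbit, so check first.
    if (file.is_open()) file.close();
    // Drop eofbit/failbit left over from reading the previous file.
    file.clear();
    file.open(path.c_str());
    return file.is_open();
}
```

After reading one file to the end, reopen(file, "other.txt") leaves the same stream object ready to read the next file.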
