Why is C++ ifstream so slow in VS.NET 2008

Hi there, I'm trying to read a large log file in C++ using ifstream. However, when I run the code below, my CPU usage jumps significantly, e.g. by 60%. Below is my full program. When I rewrite it using C-style I/O, my CPU usage drops significantly, so much so that it is lightning fast. I'm using Visual Studio 2008 Team Suite. Now, I've read on this forum that C++ is NOT slower than C, yet I'm finding it extremely challenging to find a simple way (for the beginner/average C++ programmer) to read a file line by line in C++ that is fast and efficient. Can someone please explain to me why the code below is so slow? Or is it just that C++ is slower than C anyway, and only an advanced C++ programmer with in-depth knowledge of the language will be able to produce code that is fast?
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
	string b;
	cout<<"Hello World\n";

	ifstream inputFile("c:/messages.log", ios::in);
	std::string s;
	s.clear();

	while(!inputFile.eof())
	{
		getline(inputFile, s);
		cout<<s<<endl;
	}

	getline(cin, b);
	return 0;
}

Kind regards, Jr

You don't have to check for eof separately; use getline itself as the loop condition:

#include <iostream>
#include <fstream>
#include <string>

using std::cout;
using std::cin;
using std::string;

int main(int argc, char *argv[])
{
    string b;
    cout << "Hello World\n";

    std::ifstream inputFile("c:/messages.txt", std::ios::in);
    string s;
    s.clear();

    while (getline(inputFile, s))
    {
        cout << s << "\n";
    }

    getline(cin, b);
    return 0;
}




I don't know how much that will speed it up, though. Seeing the C code would help more.

Quote:
Original post by gp343
Now, can someone please explain to me why the code below is so slow? Or is it just that C++ is slower than C anyway, and only an advanced C++ programmer with in-depth knowledge of the language will be able to produce code that is fast?


It's the way the code is written that makes it slow; the C++ functions themselves are plenty fast for such a task. However, you have to decide which of the three metrics of speed, complexity, and storage you are willing to sacrifice to get the results you want.

The absolute fastest way to get all the data from a file is to use read and load chunks of the file into memory, or even the entire file. If you do it in chunks, you pay in complexity: since lines are not all the same size, you will have to realign the text on newline boundaries to process it correctly. If you read the entire file at once, you pay in memory, since you hold the whole file in storage at once.
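For illustration only, a minimal sketch of the read-the-whole-file approach might look something like this (it reuses the hypothetical c:/messages.log path from the original post and is not anyone's measured code from this thread):

#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    // Open in binary mode so the size reported by tellg() matches what read() will see.
    std::ifstream file("c:/messages.log", std::ios::in | std::ios::binary);
    if (!file)
        return 1;

    // Find the file size, then pull the whole thing in with a single read() call.
    file.seekg(0, std::ios::end);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(size > 0 ? size : 1);
    file.read(&buffer[0], size);

    // The entire log is now in memory; one write() puts it on the console.
    std::cout.write(&buffer[0], file.gcount());
    return 0;
}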

The method you are currently using sacrifices speed: it has almost no complexity (a fairly trivial line-by-line loop) and a low storage cost, since you only hold one line's worth of data in memory at a time.

What you want to do will dictate which performance metric you sacrifice. If you can afford to load the entire file into memory at once, you will soon see that looping through each line and displaying it on the console is the real slowdown. Outputting to the console takes time, and in most cases you won't ever notice it. However, if your data is large enough, you will certainly see the slowdown.

However, always consider this when you run into these issues: does the current speed of the program make it unable to do its job? If you were trying to load a 500 MB file that took hours to display, the program would be unusable and you would need to design something better. If you were only loading a few hundred KB and it took a minute or so longer than your other method, is that speed difference really worth trying to fix? (That is not to say you should never investigate such things, for that is how you learn [wink])

I myself just ran into quite an interesting problem while sending a lot of data over the network. I was using a vector to buffer the data, and it was causing all clients to crawl to a stop and wait for the current client to finish the transfer. I switched to a queue, and the problem literally disappeared; the data blazed through all clients. The problem was not really my code, but the poor choice of a container that was resizing too often and being overburdened.
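As a rough illustration of that container pitfall (this is not the actual networking code), repeatedly consuming data from the front of a std::vector shifts every remaining element, while a std::deque can pop from the front in constant time:

#include <deque>
#include <iostream>
#include <vector>

int main()
{
    std::vector<char> vectorBuffer(100000, 'x');
    std::deque<char>  dequeBuffer(100000, 'x');

    // Draining a vector from the front: every erase shifts the whole tail, O(n) per element.
    while (!vectorBuffer.empty())
        vectorBuffer.erase(vectorBuffer.begin());

    // Draining a deque from the front: each pop_front is constant time.
    while (!dequeBuffer.empty())
        dequeBuffer.pop_front();

    std::cout << "done\n";
    return 0;
}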

So, moral of the story: if you notice weird slowdowns, it's usually because you are asking a tool to do something it simply isn't made for. At that point, your best bet is to try some other methods, or to reconsider why you are doing things that way in the first place, in an attempt to find something faster. Hope that clears up a few things.

Hi Guys,

Below is the equivalent C-style code, and it does run faster than its C++ counterpart.

I'm no advanced user of C++; however, I would surely love to see a simple example, relevant to this topic, where Mr. Stroustrup's allegedly fast C++ can be as fast as traditional C. If one cannot leverage the speed of C++ without having advanced knowledge of its internal workings, then one might as well use plain old C for performance-intensive applications and Java if they need their application to be designed in an OO manner?




#include <stdlib.h>
#include <stdio.h>
#include <conio.h>

#define MAX_ARRAY 4096

int main(int argc, char *argv[])
{
    FILE *inputFile;
    char tmpLine[MAX_ARRAY] = {0};

    // Get input file
    if ((inputFile = fopen("c:/messages.log", "r")) == NULL)
    {
        printf("Can't open %s\n", "c:/messages.log");
        exit(1);
    }

    // Loop over the file line by line.
    while ((fgets(tmpLine, MAX_ARRAY, inputFile)) != NULL)
    {
        printf("%s \n", tmpLine);
    }

    fclose(inputFile);
    getch();

    return 0;
}





- Are you compiling as a Release build?
- What is the time needed to run the C++ and C versions (after removing the std::cout/printf)?

Quote:
If one cannot leverage the speed of C++ without having advanced knowledge of its internal workings


In order to leverage the "speed", internal knowledge of C++ is the least of your problems. In-depth familiarity with memory caches, concurrency, and algorithms will be mandatory.

Quote:
Java if they need their application to be designed in an OO manner?


Yes, that's the correct approach. If you do not have proficient software engineers, using one of the more modern languages is preferred.

Not because of performance, but because if they have difficulty analyzing the performance properties of a trivial ifstream, they are not even remotely capable of developing in C++ with all its pitfalls and gotchas, which will result not in slow code, but in painfully buggy applications.

Quote:
with in-depth knowledge of the C++ language will be able to produce code that is fast?


The bigger issue is classification of the problem. Is it even slow? No numbers, no real data, just claims that it's "slow". CPU usage is not a measure of slowness.
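For reference, a minimal way to put a number on it is to time just the read loop with clock() from <ctime>, which a later post in this thread also does (the path is the hypothetical one from the original post, and the numbers are not measured here):

#include <ctime>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream inputFile("c:/messages.log");
    std::string s;

    std::clock_t start = std::clock();
    while (std::getline(inputFile, s))
    {
        // the work being measured, e.g. std::cout << s << '\n';
    }
    std::clock_t elapsed = std::clock() - start;

    std::cout << "read loop took "
              << static_cast<double>(elapsed) / CLOCKS_PER_SEC
              << " seconds\n";
    return 0;
}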

As said above, we don't really know that it *is* slow yet. However, if it is, I can think of a couple of improvements:
1: #define _SECURE_SCL 0 // disables Microsoft's "safe" extensions. I don't know how much of an impact this has on streams, though, or if it's only containers/iterators
2: std::ios::sync_with_stdio(false); // Does what it says on the box. By default, iostreams sync everything with the C I/O. If you don't need that, this may speed things up (see the sketch after this list)
3: Use the I/O provided by the platform. Windows has its own functions for accessing files. Use them if you want things to be as efficient as possible.
4: Just read everything into one big buffer in memory, rather than reading a line at a time.
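A minimal sketch of suggestion 2 applied to the original program might look like this (hypothetical c:/messages.log path, not measured here); '\n' is used instead of std::endl so the stream is not flushed on every line:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // Detach iostreams from the C stdio layer before doing any I/O.
    std::ios::sync_with_stdio(false);

    std::ifstream inputFile("c:/messages.log");
    std::string s;

    while (std::getline(inputFile, s))
    {
        std::cout << s << '\n'; // no per-line flush, unlike std::endl
    }
    return 0;
}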

Quote:
Original post by gp343
If one cannot leverage the speed of C++ without having advanced knowledge of its internal workings, then one might as well use plain old C for performance-intensive applications and Java if they need their application to be designed in an OO manner?

There's a common misconception that C++ exists to magically give you extra speed when compared to other OO languages. This is not the case. Rather, C++ exists to magically give you some OO functionality on top of C/Assembly. It's there to make life easier for the high-performance programmer, not to make programs faster for the typical application programmer.

The two programs are not the same. The C++ program gracefully handles lines of an arbitrary length (limited by memory), whereas the C program handles lines of an extremely large - but fixed - length. You must re-write the programs to do the same thing before meaningfully comparing the performance of the two.

OK Antheus, I see you're unable to respond to a question without projecting your arrogance and introducing an undertone of insult.

Nevertheless, for those who are willing to help without showing off how smart/brilliant they are, I've run some tests and the results are outlined below:


The file size of messages.log is 1.38 MB, and it contains 12732 lines.

C version, Debug mode: 26.0 seconds
C version, Release mode: 26.0 seconds

C++ version, Debug mode (1.38 MB log file): 65.0 seconds
C++ version, Release mode (1.38 MB log file): 61.0 seconds


For such a small application, I cannot imagine that one could spot significant differences between Debug and Release mode.

I just thought that for such a trivial program there should not be any significant time difference between C and C++ I/O?

Thanks for the help,
Jr.

Did you make the changes Black Knight suggested?

First off, your code is not equivalent. When you insert std::endl into a stream, you are requesting a certain behavior (a flush) that has serious performance implications. At the very least, make your output statements comparable: std::cout << s << "\n"; versus printf("%s \n", tmpLine);.

Ok guys, I removed the "cout<<s" statements from within the body of the while loop and the overall time dropped to 17.0 seconds. I guess the problem was with cout<< after all. I need to find a C++ workaround for cout, but I'm happy with the 17.0 second time for now. Thank you all for the help you've rendered me.

Regards,
Jr.

Here are some sample tests with a few different styles. I know, clock() is generally too crude for profiling, but given enough data it can give us an idea:

#include <ctime>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>

const char *const filename = "test.txt";
const int BUF_SIZE = 4096;

void cio_fixed()
{
FILE *file = fopen(filename,"r");
if(!file)
{
printf("Couldn't open: %s",filename);
return;
}

char buffer[BUF_SIZE];
while(fgets(buffer,BUF_SIZE,file))
{
printf("%s\n",buffer);
}
fclose(file);
}

void cio_memory()
{
FILE *file = fopen(filename,"r");
if(!file)
{
printf("Couldn't open: %s",filename);
return;
}

fseek(file,0,SEEK_END);
unsigned len = ftell(file);
fseek(file,0,SEEK_SET);

char *buffer = static_cast<char *>(malloc(len + 1));
fread(buffer,1,len,file);

buffer[len] = '\0';
printf("%s\n",buffer);

free(buffer);
fclose(file);
}

void cplusplusio_naive()
{
std::ifstream file(filename);
if(!file)
{
std::cout << "Couldn't open " << filename << '\n';
}

std::string line;
line.reserve(BUF_SIZE);
while(std::getline(file,line))
{
std::cout << line << '\n';
}
}

void cplusplusio_fixed()
{
std::ifstream file(filename);
if(!file)
{
std::cout << "Couldn't open " << filename << '\n';
}

char line[BUF_SIZE + 1];
while(file.read(line,BUF_SIZE))
{
line[BUF_SIZE] = '\0';
std::cout << line << '\n';
}
}

void cplusplusio_with_printf()
{
std::ifstream file(filename);
if(!file)
{
std::cout << "Couldn't open " << filename << '\n';
}

std::string line;
while(std::getline(file,line))
{
printf("%s\n",line.c_str());
}
}

void cplusplusio_memory()
{
std::ifstream file(filename);
if(!file)
{
std::cout << "Couldn't open " << filename << '\n';
}

file.seekg(0,std::ios_base::end);
size_t size = file.tellg();
file.seekg(0,std::ios_base::beg);

std::vector<char> buffer(size + 1);

file.read(&buffer.front(),size);
buffer.back() = '\0';
std::cout << &buffer.front() << '\n';
}

typedef void (*test_func_ptr)();

std::vector<std::string> results;

void do_test(const char *name, test_func_ptr test)
{
clock_t time = clock();
test();
fflush(stdout);
std::cout << std::flush;
time = clock() - time;

std::stringstream result;
result << "test: " << name << ' ' << time;
results.push_back(result.str());
}

int main()
{
std::ios::sync_with_stdio(false);
{
std::ofstream out(filename);
for(unsigned i = 0 ; i < 10000 ; ++i )
{
std::string random;
const unsigned size = 50 + rand() % 50;//BUF_SIZE;
random.resize(size);
for(unsigned j = 0 ; j < size ; ++j)
{
random[j] = 'a' + rand() % ('z' - 'a');
}
out << random << '\n';
}
}

#define do_test(X) do_test(#X,&X)
do_test(cio_fixed);
do_test(cio_memory);
do_test(cplusplusio_with_printf);
do_test(cplusplusio_naive);
do_test(cplusplusio_fixed);
do_test(cplusplusio_memory);
#undef do_test

std::copy(results.begin(),results.end(),std::ostream_iterator<std::string>(std::cout,"\n"));
}





Results (MSVC 2008, Release mode):
Quote:

test: cio_fixed 547
test: cio_memory 0
test: cplusplusio_with_printf 484
test: cplusplusio_naive 25719
test: cplusplusio_fixed 25141
test: cplusplusio_memory 0

I have run it a few times while adding different tests; the results always come out in the same order relative to one another.

So, if you figure your logs will reliably fit in memory, I would go with that one [grin]

Quote:
Original post by gp343
Below is the equivalent C-style code, and it does run faster than its C++ counterpart.

Apples and oranges. Below is the equivalent C++ counterpart to your C program.

#include <conio.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

const size_t MAX_ARRAY = 4096;
const string filename = "c:/messages.log";

int main(int argc, char *argv[])
{
    ios::sync_with_stdio(false);

    //Get input file
    ifstream inputFile(filename.c_str());
    if (!inputFile)
    {
        cerr << "Can't open " << filename << "\n";
        exit(1);
    }

    // Loop over the file line by line.
    string tmpLine(MAX_ARRAY, 0);
    while (getline(inputFile, tmpLine))
    {
        cout << tmpLine << "\n";
    }

    getch();
}


(with respect to Black Knight)

The main difference is that your C code does not reallocate the temporary string each time and your C code does not flush the output stream each time.

The above code allocates the storage for the temporary string up front instead of doing potentially several reallocate/copy operations in your loop, and it does not flush the output stream on each loop iteration. It also detaches the C++ I/O from the C I/O, something the C version does not have to deal with (C does not play well with others: others have to play well with C).

For your C version to really be comparable, you would have to add an fflush(stdout) call after your printf, and you would have to add code to check how much input is available and reallocate your temporary buffer if necessary on each loop iteration; a rough sketch follows.
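Something along these lines (a sketch only, not a drop-in replacement for the original C program; the doubling growth strategy is just one way to mimic what getline(istream, string) does behind the scenes):

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main()
{
    FILE *inputFile = fopen("c:/messages.log", "r"); // hypothetical path from the original post
    if (!inputFile)
    {
        printf("Can't open the log file\n");
        return 1;
    }

    size_t capacity = 128;
    char *line = static_cast<char *>(malloc(capacity));

    while (line && fgets(line, static_cast<int>(capacity), inputFile))
    {
        // Grow the buffer until the chunk ends in '\n' or the file ends.
        while (strchr(line, '\n') == NULL && !feof(inputFile))
        {
            size_t used = strlen(line);
            capacity *= 2;
            char *bigger = static_cast<char *>(realloc(line, capacity));
            if (!bigger)
                break;
            line = bigger;
            if (!fgets(line + used, static_cast<int>(capacity - used), inputFile))
                break;
        }

        printf("%s", line);
        fflush(stdout); // mirrors the per-line flush that std::endl forces on the C++ side
    }

    free(line);
    fclose(inputFile);
    return 0;
}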

Yes, you have to understand some fundamentals about how to use your tools in order to get the best use out of them. This applies to all tool use. You would not expect to be able to sculpt Michelangelo's "David" with a chainsaw after only one demo on how to pull-start it, no matter how many axes you've swung. Er, or sumpthing.

Quote:
Original post by rip-off
Here are some sample tests with a few different styles. I know, clock() is generally too crude for profiling, but given enough data it can give us an idea:
*** Source Snippet Removed ***

In theory you might be able to make additional gains in performance with std::cout by formatting it first with std::stringstream and then dumping the data to the console. Even just combining single lines with \n before dumping the line should help a little, but streaming the whole thing to a stringstream and then dumping the stringstream's rdbuf() should help a lot.

Quote:
Original post by gp343
Ok guys, I removed the "cout<<s" statements from within the body of the while loop and the overall time dropped to 17.0 seconds. I guess the problem was with cout<< after all. I need to find a C++ workaround for cout, but I'm happy with the 17.0 second time for now. Thank you all for the help you've rendered me.

Regards,
Jr.


Quote:
Original post by Bregma
The main difference is that your C code does not reallocate the temporary string each time and your C code does not flush the output stream each time.


(I don't recommend "reserving" a std::string like that, BTW, because a string knows its own length, and will happily treat any '\0' characters as part of the actual string data. Thus the sample code would potentially output a few thousand '\0' characters with each line. That happens not to have any effect because of how '\0' is rendered, but it still strikes me as messy. Also, I'm unaware of anything in the standard that would require that the string reuse its internal buffer if used this way.)

Quote:
Original post by Spoonbender
2: std::ios::sync_with_stdio(false); // Does what it says on the box. By default, iostreams sync everything with the C I/O. If you don't need that, this may speed things up


Seriously. These things make a huge difference. It's set up to sync like that by default, BTW, because (a) if you happened to need it, you would get very strange results without it, and (b) normally people are not concerned with the speed of writing things to the console, because the point of writing things to the console is to have a human read them, and humans read approximately a zillion times slower still.

(I'm guessing that you only pay a heavy price when you stay synced with stdio and flush the buffer repeatedly, since it's at that point that it would have to actually do something to sync, if I'm thinking straight. But I haven't tested it.)

Quote:
Original post by gp343
OK Antheus, I see you're unable to respond to a question without projecting your arrogance and introducing an undertone of insult.


If you detect arrogance and an undertone of insult in that, you're going to find it very difficult to get good answers from competent programmers about anything. Sorry. Asking basic questions about what information you're working from isn't insulting; it's troubleshooting. Asserting that C++ is complex isn't arrogant; it's realistic.

You might want to check your own tone from the OP, for that matter. It could easily be interpreted as trolling. Fortunately, we tend not to assume that kind of malice around here unless forced.

BTW, you actually did one thing especially well that is missed by a fair number of people. :)

Quote:
Original post by gp343
OK Antheus, I see you're unable to respond to a question without projecting your arrogance and introducing an undertone of insult.


I asked two very direct questions. The second one was:
Quote:
- What is the time needed to run the C++ and C versions (after removing the std::cout/printf)?


Which led to:
Quote:
Ok guys, I removed the "cout<<s" statements from within the body of the while loop and the overall time dropped to 17.0 seconds. I guess the problem was with cout<< after all.


The rest of the discussion on C++ performance was an attempt to dispel some of the baseless finger-pointing at .NET, Mr. Stroustrup, C++ and the like.

And yes, I can be arrogant. But I wasn't in this post, and I even provided a solution to your very problem, along with a methodology to verify the result.

Quote:
Original post by gp343
OK Antheus, I see you're unable to respond to a question without projecting your arrogance and introducing an undertone of insult.


No, I think the person who needs to tone it down is you; there was nothing arrogant or insulting about his post in the slightest. As I see it, you are projecting your frustrations onto that reply; please refrain from this in the future.

Quote:
Original post by gp343
Ok guys, I removed the "cout<<s" statements from within the body of the while loop and the overall time dropped to 17.0 seconds. I guess the problem was with cout<< after all. I need to find a C++ workaround for cout, but I'm happy with the 17.0 second time for now. Thank you all for the help you've rendered me.
Since nobody has pointed this out to you yet: you are operating under another misconception. std::cout is not causing your performance problems; rather, it is the fact that you are explicitly flushing the output on every line with std::endl. If you replace std::endl with a literal '\n', it should perform about the same as the printf version (since it will now be equivalent with respect to output handling).
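Concretely, the only change needed in the original loop is the last token of the output statement (a sketch using the same hypothetical file path, not code from the original post):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream inputFile("c:/messages.log");
    std::string s;

    while (std::getline(inputFile, s))
    {
        // std::cout << s << std::endl;   // '\n' plus an explicit flush on every line
        std::cout << s << '\n';           // same newline, but output stays buffered
    }
    return 0;
}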

Quote:
Original post by swiftcoder
If you replace std::endl with a literal '\n', it should perform about the same as the printf version (since it will now be equivalent with respect to output handling).


One would think so, but my tests don't agree with this. I would like to know why; I can't see what else std::cout could be doing that would take so much time.
Quote:

In theory you might be able to make additional gains in performance with std::cout by formatting it first with std::stringstream and then dumping the data to the console. Even just combining single lines with \n before dumping the line should help a little, but streaming the whole thing to a stringstream and then dumping the stringstream's rdbuf() should help a lot.

Adding this test:

void cplusplusio_sstream()
{
    std::ifstream file(filename);
    if(!file)
    {
        std::cout << "Couldn't open " << filename << '\n';
    }

    std::string line;
    std::stringstream stream;
    while(std::getline(file,line))
    {
        stream << line << '\n';
    }

    std::cout << stream.rdbuf();
}


Not much of a change in results:
Quote:

test: cio_fixed 3797
test: cio_memory 0
test: cplusplusio_with_printf 500
test: cplusplusio_naive 26297
test: cplusplusio_fixed 25375
test: cplusplusio_memory 0
test: cplusplusio_sstream 25844

