Working with large files in ifstream and string

6 comments, last by Drew_Benton 18 years, 1 month ago
I recently tried to work with an HTML file using C++. The HTML file, as can be expected in a lot of cases, is pretty long: in this case, 600 lines (and many lines run for over 1000 characters). Since I'm doing a lot of searching in the file, it would be very convenient to treat the file as one big string, so I tried to read the entire file (opened using std::ifstream) into a single std::string. When the program is run, it simply stalls.

C++ streams and the workings thereof are still a mysterious topic to me, and I'm clueless about how to deal with errors when they arise in such cases. I did a little digging and found a special flag in std::ifstream called failbit. Apparently, this bit can be tested using std::ifstream::fail() (is this correct?). I tested this function in the loop used to read the file and found that it returns true when the loop reaches line 512 of the file.

Could this be because the file is too big for std::ifstream? Or is it my attempt to save so many characters to the std::string that is causing the problem? Or is there a weird character at line 512 that may cause std::ifstream to fail? The relevant portion of my source code:

//...
std::ifstream in( "..." );
std::string file_contents;
//...
unsigned long i = 0;
	while( !in.eof() )
	{
		++i;

		char szOneline[ 4096 ];
		in.getline( szOneline, 4095 );

		if( in.fail() )
		{
			std::cout << "Failure at line " << i+1 << std::endl;
			i=0;
			std::cin.get();
			return 0;
		}

		file_contents += szOneline;
		file_contents += '\n';
	}


Is there a more convenient way to work with HTML in C++?
.:<<-v0d[KA]->>:.
#include <fstream>
#include <string>

// Reads the whole file at pFilePath into pData.
void ReadFile(std::string const & pFilePath, std::string & pData)
{
	if(pFilePath.empty())
		return;
	else if(!pData.empty())
		pData.clear();

	std::ifstream pInputStream(pFilePath.c_str());
	if(!pInputStream.good())
		return;

	// Reads up to the first NUL byte (or end of file), so this suits text files.
	std::getline(pInputStream, pData, '\x00');
	pInputStream.close();
}


Just tried this with a 7 MB file, worked fine...
According to Josuttis: The C++ Standard Library

Failbit: is set if an operation was not processed correctly but the stream is generally OK. Normally this flag is set as a result of a format error during reading. For example, this flag is set if an integer is to be read but the next character is a letter.
Badbit: is set if the stream is somehow corrupted or if data is lost. For example, this flag is set when positioning a stream that refers to a file before the beginning of the file.

fail() returns true if an error has occurred (failbit or badbit is set).

Hope this helps,
EmmetjeGee
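#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

int main()
{
  ...
  std::string text;
  // A named stream is needed: a temporary ifstream cannot bind to the
  // non-const reference that istream_iterator's constructor takes.
  std::ifstream file("filename");
  file >> std::noskipws;  // istream_iterator<char> skips whitespace otherwise
  std::copy(std::istream_iterator<char>(file), std::istream_iterator<char>(), std::back_insert_iterator<std::string>(text));
  ...
}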


While the above works, never write code like this. It's too dense, it cascades too many operations without robustness, and it's fairly opaque to most people. I only wrote it to illustrate a number of constructs available for operations like reading an entire file into a string (std::back_insert_iterator, std::istream_iterator, std::copy).

I believe there's also a solution involving filebufs, but I can't quite figure out the semantics just yet.
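One approach along those lines, possibly what is meant here, is to insert the file's stream buffer into a std::ostringstream, which copies everything in a single operation. This is only a sketch: the function name slurpViaStreamBuffer is invented for illustration, and opening in binary mode is an assumption made so that no newline translation takes place.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the full contents of the file at path,
// or an empty string if the file could not be opened.
std::string slurpViaStreamBuffer(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();

    std::ostringstream out;
    out << in.rdbuf();  // copies the whole underlying buffer into out
    return out.str();
}
```

A caller that needs to tell an empty file apart from a missing one would have to check the stream state separately; this sketch returns an empty string in both cases.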
Got this from Drew Benton:

std::string fileToText( const std::string &filename )
{
   std::ifstream fin(filename.c_str());
   return std::string((std::istreambuf_iterator<char>(fin)),
                      std::istreambuf_iterator<char>());
}


It has worked on any file I've tried it with...
That's what I was looking for. Good job, rip-off!
Quote:While the above works, never write code like this. It's too dense, it cascades too many operations without robustness, and it's fairly opaque to most people.


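std::ifstream file("filename");
file >> std::noskipws;  // keep spaces and newlines; operator>> skips them by default
std::istream_iterator<char> file_begin(file);
std::istream_iterator<char> file_end;
std::string text(file_begin, file_end);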
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
Quote:Original post by rip-off
Got this from Drew Benton:

*** Source Snippet Removed ***

It has worked on any file I've tried it with...


For the record, I take no credit for that code; it's all Enigma's (as well as a great thread to bookmark) [wink]

This topic is closed to new replies.
