std::ifstream::read() failing to read all the bytes.


2 replies to this topic

#1 Servant of the Lord   Crossbones+   -  Reputation: 18483


Posted 31 March 2013 - 09:46 PM

I want to load a file and put its entire contents into a std::string.

This method works perfectly fine:

//Read the file into the stringStream.
std::ostringstream stringStream;
stringStream << file.rdbuf();
	
//Convert the stringStream into a regular string.
return stringStream.str();
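
For anyone who wants to try the snippet above on its own, here is a minimal self-contained sketch of the same rdbuf() approach. The function name ReadWholeFile and the bool parameter are placeholders invented for this example, not part of the original code.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (name and signature are assumptions for this sketch).
// Streams the whole file through its stream buffer, so the file size never
// has to be known up front. Returns an empty string if the file can't be opened.
std::string ReadWholeFile(const std::string &filename, bool binary)
{
	std::ios_base::openmode mode = std::ios_base::in;
	if(binary)
		mode |= std::ios_base::binary;

	std::ifstream file(filename.c_str(), mode);
	if(!file)
		return "";

	//Read the file into the stringStream.
	std::ostringstream stringStream;
	stringStream << file.rdbuf();

	//Convert the stringStream into a regular string.
	return stringStream.str();
}
```

Because nothing here depends on a size computed by seeking, this version should behave the same in text mode and binary mode, apart from whatever newline translation the platform applies.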

However, I tried the following method (which I've never used before) and hit a weird issue: it doesn't read the entire file.

//Loads the entire file, newlines and all, and returns its contents as a string.
//Returns an empty string if the file doesn't exist.
std::string LoadFileAsString(const std::string &filename, ReadMode readMode)
{
	//Open the file for reading.
	std::ifstream file;
	if(readMode == ReadMode::Binary)
		file.open(filename.c_str(), std::ios_base::in | std::ios_base::binary);
	else
		file.open(filename.c_str(), std::ios_base::in);

	if(!file)
	{
		Log::Message(MSG_SOURCE("FileFunctions", Log::Severity::Error)) << "Failed to load '" << Log_HighlightCyan(GetFilenameFromPath(filename)) << "' at " << Log_DisplayPath(filename)
																		<< "\nDo I have sufficient privileges? Does the file even exist?" << Log::FlushStream;
		return "";
	}

	std::string fileContents;
	
	//Find the size of the file, by seeking to the end.
	file.seekg(0, std::ios::end);
	
	//Resize our buffer to match the size of the file.
	std::streamoff endPos = file.tellg();
	fileContents.resize(static_cast<std::size_t>(endPos));
	
	//Seek back to the beginning of the file.
	file.seekg(0, std::ios::beg);
	
	//Read the file into the string.
	std::cout << "Buffer size: " << fileContents.size() << std::endl;
	std::cout << "Seek size: " << endPos << std::endl;
	file.read(&fileContents[0], fileContents.size());
	
	//Temporary stuff while trying to figure out what happened:
	if(file)
	{
		std::cout << "All characters read successfully." << std::endl;
	}
	else
	{
		std::cout << "Error: only " << file.gcount() << " could be read (of " << endPos << ")" << std::endl;
		
		//Note: goodbit is zero, so it can't be tested with '&'; it is never set on this branch anyway.
		std::cout << "Error flags include:\n"
				  << ((file.rdstate() & std::ifstream::eofbit)?  "\tstd::ifstream::eofbit,\n" : "")
				  << ((file.rdstate() & std::ifstream::failbit)? "\tstd::ifstream::failbit,\n" : "")
				  << ((file.rdstate() & std::ifstream::badbit)?  "\tstd::ifstream::badbit,\n" : "")
				  << std::endl;
	}
	
	return fileContents;
}

The file certainly exists, and it's opening fine, but I'm getting:

Buffer size: 859
Seek size: 859
Error: only 784 could be read (of 859)
Error flags include:
	std::ifstream::eofbit,
	std::ifstream::failbit,

A hex editor shows 859 bytes, and Windows Explorer also reports 859 bytes.

Why is it stopping at only 784?

I checked the character it is stopping at, and it's only a space. I did a Ctrl+F (using the hex editor) for 0x1A (EndOfFile or Substitute char in the ASCII table) and it never occurs in the file.

What is going on?

 

------------------------------------------------------

Ah, I just found the problem through more googling. Since this problem was confusing me, and since I had already typed out the post, I figure I'll just post the thread anyway for anyone else that encounters the problem.

The problem occurs because Windows marks the end of a line with two characters: first a carriage return (\r), and then a line feed (\n). Two characters, two bytes.

When a file is read through a std::ifstream opened in text mode, each \r\n pair is translated into a single newline (\n).
When I seeked to the end of the file, tellg() told me the number of bytes on disk.
But when I read the file into my buffer, each pair of line-ending characters was merged into one, which counted as a single character.
So the byte length of the file (859) was greater than the number of characters read() could deliver (784), because I was reading in text mode. The read ran into the end of the file before the buffer was full, which is why both eofbit and failbit were set.
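
The arithmetic can be checked with a small sketch. Assuming the only translation is "\r\n" becoming "\n", the number of characters a text-mode read delivers is the raw byte count minus the number of "\r\n" pairs. With the numbers above, 859 - 784 = 75, so the file would presumably contain 75 such line endings. The helper name TextModeLength is made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the raw bytes of a file, returns how many
// characters a Windows text-mode read would deliver, assuming every
// "\r\n" pair is merged into a single '\n' and nothing else is translated.
std::size_t TextModeLength(const std::string &rawBytes)
{
	std::size_t pairs = 0;
	for(std::size_t i = 0; i + 1 < rawBytes.size(); ++i)
	{
		if(rawBytes[i] == '\r' && rawBytes[i + 1] == '\n')
		{
			++pairs;
			++i; //Skip the '\n' that belongs to this pair.
		}
	}
	return rawBytes.size() - pairs;
}
```

Feeding it the contents of the file as read in binary mode should give the same figure that gcount() reports after the short text-mode read.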

This means my solution is one of two things:
A) Read the file in binary mode. The string will then contain both \r and \n in the text.
B) Read the file in text mode, and understand and be prepared for the merging of the two characters.

I'm just going to use option 'B' and cheat by letting the read 'fail': if I detect that failbit is set, I'll get the real number of characters read from gcount() and use that to truncate the buffer after the fact.
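
A minimal sketch of that option 'B' idea follows. It is not the final code from this project; the function name LoadTextFileTruncated is hypothetical, and the logging from the original function is left out to keep it short.

```cpp
#include <fstream>
#include <string>

// Hypothetical sketch of option 'B': size the buffer from the byte count,
// read in text mode, then shrink the buffer to what was actually read.
// Returns an empty string if the file can't be opened or is empty.
std::string LoadTextFileTruncated(const std::string &filename)
{
	std::ifstream file(filename.c_str(), std::ios_base::in); //Text mode.
	if(!file)
		return "";

	//Find the size of the file, by seeking to the end.
	file.seekg(0, std::ios_base::end);
	const std::streamoff endPos = file.tellg();
	if(endPos <= 0)
		return "";
	file.seekg(0, std::ios_base::beg);

	//The buffer is as large as the file on disk, which may be too large.
	std::string fileContents(static_cast<std::string::size_type>(endPos), '\0');
	file.read(&fileContents[0], static_cast<std::streamsize>(fileContents.size()));

	//In text mode the read may come up short; gcount() is the number of
	//characters actually stored, so truncate the buffer to that length.
	fileContents.resize(static_cast<std::string::size_type>(file.gcount()));
	return fileContents;
}
```

After a short read the stream still has eofbit and failbit set, so anything that keeps using the same stream would need to call clear() first.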


Edited by Servant of the Lord, 31 March 2013 - 11:04 PM.

It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.
All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.
Of Stranger Flames - [indie turn-based rpg set in a para-historical French colony] | Indie RPG development journal


#2 swiftcoder   Senior Moderators   -  Reputation: 9846


Posted 01 April 2013 - 04:11 PM

> When I seeked to the end of the file, it told me the number of bytes.
> But when I read the file into my buffer, it combined the two line-ending characters into one, which counted as a single byte.

This problem becomes infinitely worse when you try to read a variable-length character encoding, like, say, UTF-8 :)
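
To illustrate the UTF-8 point: the byte count of a UTF-8 string and the number of code points in it only match for pure ASCII. The sketch below assumes the input is already valid UTF-8 and does no validation; the helper name CountUtf8CodePoints is invented for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// valid UTF-8. Continuation bytes have the bit pattern 10xxxxxx, so every
// byte that does NOT match that pattern starts a new code point.
std::size_t CountUtf8CodePoints(const std::string &utf8)
{
	std::size_t count = 0;
	for(unsigned char byte : utf8)
	{
		if((byte & 0xC0) != 0x80)
			++count;
	}
	return count;
}
```

For a string that contains any non-ASCII character, size() will be larger than the value this returns, which is the same kind of mismatch as the \r\n case, only harder to predict from the file size.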

 

For the simple case you describe, I usually default to calling std::getline() in a loop...
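
A minimal sketch of the std::getline() loop, for reference. The function name ReadLines is a placeholder, and stripping a trailing '\r' is an extra assumption added here so that Windows-style files also come out clean on platforms that do no newline translation.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a text file line by line with std::getline().
// Returns an empty vector if the file can't be opened.
std::vector<std::string> ReadLines(const std::string &filename)
{
	std::vector<std::string> lines;
	std::ifstream file(filename.c_str());
	std::string line;
	while(std::getline(file, line))
	{
		//Drop a stray '\r' left over from a "\r\n" line ending, if any.
		if(!line.empty() && line.back() == '\r')
			line.pop_back();
		lines.push_back(line);
	}
	return lines;
}
```

Since std::getline() grows the string as needed, the file size never has to be known in advance, which sidesteps the seek/read mismatch entirely.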


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#3 Servant of the Lord   Crossbones+   -  Reputation: 18483


Posted 01 April 2013 - 05:01 PM

That's what I do in my (unmentioned) LoadFileAsStringList() function because it returns the contents as a std::vector<std::string>, but for most of my usage I actually want the newlines intact (though I could do without the carriage returns!).
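
One way to keep the newlines but lose the carriage returns is to load the file in binary mode and then strip every '\r' from the result. The sketch below shows only the stripping step; StripCarriageReturns is a made-up name, and it assumes no lone '\r' in the text needs to be preserved.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes every '\r' so that "\r\n" becomes "\n".
// Note that this also removes any '\r' that is not followed by '\n'.
std::string StripCarriageReturns(std::string text)
{
	//Erase-remove idiom: shift the kept characters forward, then cut the tail.
	text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
	return text;
}
```

Applied to a string loaded in binary mode, this should give the same text a Windows text-mode read would, without depending on the platform's translation.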

 

I've never dealt with Unicode before, but from what I've heard, I can look forward to not enjoying it.






