I want to load a file and read its entire contents into a std::string.
This method works perfectly fine:
//Read the file into the stringStream.
std::ostringstream stringStream;
stringStream << file.rdbuf();
//Convert the stringStream into a regular string.
return stringStream.str();
However, I was trying this method (which I've never used before), and encountered a weird issue where it's not reading the entire file.
//Loads the entire file, newlines and all, and returns its contents as a string.
//Returns an empty string if the file doesn't exist.
std::string LoadFileAsString(const std::string &filename, ReadMode readMode)
{
//Open the file for reading.
std::ifstream file;
if(readMode == ReadMode::Binary)
file.open(filename.c_str(), std::ios_base::in | std::ios_base::binary);
else
file.open(filename.c_str(), std::ios_base::in);
if(!file)
{
Log::Message(MSG_SOURCE("FileFunctions", Log::Severity::Error)) << "Failed to load '" << Log_HighlightCyan(GetFilenameFromPath(filename)) << "' at " << Log_DisplayPath(filename)
<< "\nDo I have sufficient privileges? Does the file even exist?" << Log::FlushStream;
return "";
}
std::string fileContents;
//Find the size of the file, by seeking to the end.
file.seekg(0, std::ios::end);
//Resize our buffer to match the size of the file.
unsigned endPos = file.tellg();
fileContents.resize(endPos);
//Seek back to the beginning of the file.
file.seekg(0, std::ios::beg);
//Read the file into the string.
std::cout << "Buffer size: " << fileContents.size() << std::endl;
std::cout << "Seek size: " << endPos << std::endl;
file.read(&fileContents[0], fileContents.size());
//Temporary stuff while trying to figure out what happened:
if(file)
{
std::cout << "All characters read successfully." << std::endl;
}
else
{
std::cout << "Error: only " << file.gcount() << " could be read (of " << endPos << ")" << std::endl;
std::cout << "Error flags include:\n"
//goodbit is zero, so it has to be compared against rather than masked.
<< ((file.rdstate() == std::ifstream::goodbit)? "\tstd::ifstream::goodbit,\n" : "")
<< ((file.rdstate() & std::ifstream::eofbit)? "\tstd::ifstream::eofbit,\n" : "")
<< ((file.rdstate() & std::ifstream::failbit)? "\tstd::ifstream::failbit,\n" : "")
<< ((file.rdstate() & std::ifstream::badbit)? "\tstd::ifstream::badbit,\n" : "")
<< std::endl;
}
return fileContents;
}
The file certainly exists, and it's opening fine, but I'm getting:
Buffer size: 859
Seek size: 859
Error: only 784 could be read (of 859)
Error flags include:
std::ifstream::eofbit,
std::ifstream::failbit,
Using a hex editor, I see 859 bytes.
Windows Explorer also reports 859 bytes.
Why is it stopping at only 784?
I checked the character it is stopping at, and it's only a space. I did a Ctrl+F (using the hex editor) for 0x1A (EndOfFile or Substitute char in the ASCII table) and it never occurs in the file.
What is going on?
------------------------------------------------------
Ah, I just found the problem through more googling. Since this problem was confusing me, and since I had already typed out the post, I figure I'll just post the thread anyway for anyone else that encounters the problem.
The problem occurs because Windows marks the end of a line with two characters: a carriage return (\r) followed by a line feed (\n). Two characters, two bytes.
When a std::ifstream is opened in text mode on Windows, each \r\n pair is translated into a single \n as it is read.
When I seeked to the end of the file, tellg() gave me the size of the file in bytes.
But when I read the file into my buffer, every two-byte line ending was merged into one newline character, which only takes up a single byte of the buffer.
So the byte length of the file (859) was greater than the number of characters actually read (784), because I was reading in text mode. read() ran out of file before it could fill the buffer, which is why both eofbit and failbit were set.
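To make the size mismatch concrete, here is a minimal sketch that imitates the line-ending translation on an in-memory string. CollapseCrLf is a hypothetical helper I made up for illustration, not anything from the standard library, and it is only a simplified model of what the Windows runtime does in text mode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a simplified model of a Windows text-mode read.
// Every "\r\n" pair in the raw bytes becomes a single '\n'.
std::string CollapseCrLf(const std::string &raw)
{
    std::string text;
    text.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i)
    {
        // Drop a '\r' only when it is immediately followed by '\n'.
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n')
            continue;
        text.push_back(raw[i]);
    }
    return text;
}
```

For example, "one\r\ntwo\r\n" is 10 bytes on disk but only 8 characters after collapsing, one fewer per line ending. In my case 859 - 784 = 75, which would be consistent with the file containing 75 line endings.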
This means my solution is one of two things:
A) Read the file in binary mode. The string will then contain both \r and \n in the text.
B) Read the file in text mode, and understand and be prepared for the merging of the two characters.
I'm just going to use option 'B', and cheat by letting the read 'fail'. If I detect that the failbit is set, I'll get the real number of characters read from gcount() and use that to truncate the buffer after the fact.
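Here is a rough sketch of how both options could look in one function. ReadWholeFile and its bool parameter are hypothetical stand-ins for my LoadFileAsString() and ReadMode, with the logging left out. It treats the seek position as an upper bound on the character count, which is what I observed on Windows; I'm assuming that holds, and I haven't checked it on other platforms.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical stand-in for LoadFileAsString(), without the logging.
// Returns an empty string if the file can't be opened or is empty.
std::string ReadWholeFile(const std::string &filename, bool binary)
{
    const std::ios_base::openmode mode =
        binary ? (std::ios_base::in | std::ios_base::binary) : std::ios_base::in;

    std::ifstream file(filename.c_str(), mode);
    if (!file)
        return "";

    // Seek to the end to get an upper bound on the number of characters.
    file.seekg(0, std::ios::end);
    const std::streamoff endPos = file.tellg();
    if (endPos <= 0)
        return "";
    file.seekg(0, std::ios::beg);

    std::string contents(static_cast<std::size_t>(endPos), '\0');
    file.read(&contents[0], static_cast<std::streamsize>(contents.size()));

    // In text mode fewer characters may arrive than the seek position suggested.
    // gcount() says how many were actually stored, so trim the buffer to that.
    contents.resize(static_cast<std::size_t>(file.gcount()));
    return contents;
}
```

With binary set to true this is option A, and the string keeps every \r and \n exactly as stored. With binary set to false it is option B, and the resize() at the end throws away the unused tail of the buffer. Trimming to gcount() unconditionally also means I don't have to check the failbit first.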