File Loading Adds Weird Characters to the end

5 comments, last by SeanMiddleditch 9 years, 11 months ago

Hello everyone, I got some code that loads a text file into a char* buffer and for the most part, it works! The issue is that it adds some weird extra characters to the end. Here is an example:

[Image: 5n6sWXT.png]

In the example above, the text file contains the word "WORKS!" and the char buffer is allocated from the size of the file, which in this case is new char[6].

Here is the code:


bool DefaultResourceLoader::VLoadResource(std::shared_ptr<ResourceHandle> pResourceHandle)
{
	std::ifstream loadedFile;
	loadedFile.open(pResourceHandle->GetResourceName());
	loadedFile.read(pResourceHandle->GetWritableResourceBuffer(), pResourceHandle->GetFileSize());

	if (loadedFile.fail())
	{
		printf("ERROR");
	}
	loadedFile.close();
	return true;
}

GetResourceName returns the file name as a string. GetWritableResourceBuffer returns the char* buffer, which is the size of the file, in this case new char[6]. GetFileSize returns the file size, so 6 for this example.

Is there any reason why those characters get printed out? I'm using Unicode in my VS2010 settings, and I tried casting the buffer to char* afterwards, or creating a string from it and printing that. I also tried printf and cout, but both give the same result.


You need to add a '\0' C-string terminator. Make the char buffer one byte bigger than the file, then after you read the data, set the last char in the array to 0.

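int filesize = whatever;              // size of the file in bytes
char *buf = new char[filesize + 1];   // one extra byte for the terminator
read_the_data();                      // read filesize bytes into buf
buf[filesize] = 0;                    // null-terminate the C string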

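This is needed because a C string uses a null terminator to mark its end. Without it, printf and cout keep reading past the end of your buffer until they happen to hit a zero byte, which is where the extra characters come from.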

Oh awesome, that works! Thanks, but shouldn't a text file which is basically just a really long char have the 0 at the end as well?


shouldn't a text file which is basically just a really long char have the 0 at the end

A file on most modern operating systems is just a stream of bytes. Technically there is not really such a thing as a 'text file', just a stream of printable characters mixed with line-end markers (like LF on Unix, or CR/LF on DOS). Storing null-terminated C strings in a file would make it unreadable by things that expect traditional text, like text editors.

So, no, storing a null byte at the end of a string in a 'text file' would be unexpected. The conversion to C-style strings needs to be done when the file is read by a C program (or to Pascal-style strings when the file is read by a Pascal program, or....).

Stephen M. Webb
Professional Free Software Developer

As far as I know, a text file is treated much like a binary file, except that some OSes translate line feeds and carriage returns. Text files are often parsed line by line, which is straightforward in C++ with std::getline(). You have the file size, you know when you've reached the end of the file, and you can read line by line if you want. Thinking about it from a text file point of view, I don't think '\0' is even valid in one (does getline() stop at the '\0'?). I guess what I'm trying to say is that a text file is basically a binary file that generally contains printable text rather than some sort of binary data, and a '\0' would be considered binary data.

(did i get that right, lol)

EDIT: ninja'd

Oh okay, makes sense. It seems weird that chars need a 0 at the end, but I guess that's just the implementation. Well, now my Resource Cache works, so thanks a lot for the help, guys!

Somewhat more generally:

There are multiple ways to represent a string. This includes both how the bytes are encoded (ASCII, UCS-2, etc.) and how the length of a string is determined.

For the latter, C chose to use a sentinel value (the '\0' NUL character) and C++ sort of follows suit most of the time. Other languages (and some libraries built on top of C/C++) choose to use an explicit size field in addition to the contents of the string, and hence have no need for a sentinel. OS files fall in the second category; each file has an explicit length and so there's no need for a sentinel byte stored at the end of every file.
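The difference between the two conventions shows up as soon as the data contains a zero byte. A small sketch, with function names invented for this illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length the C way: scan forward until the first '\0' sentinel.
std::size_t SentinelLength(const char* text)
{
    return std::strlen(text);
}

// Length the size-field way: std::string stores its size explicitly, so the
// answer does not depend on what bytes the string contains.
std::size_t StoredLength(const std::string& text)
{
    return text.size();
}
```

For a std::string built from the four bytes 'a', 'b', '\0', 'c', StoredLength reports 4, while SentinelLength on its c_str() reports 2 because the scan stops at the embedded zero.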

This means that text files in the OS are represented slightly differently than C's utilities expect strings to be formatted. That extra '\0' byte you have to append converts from the OS format to C's antiquated format.

Avoid any such assumptions that strings are sentinel-terminated in your own code if you possibly can and prefer using interfaces/objects that have an explicit size or range.
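As one possible way to follow that advice when loading a file, the contents can go straight into a std::string, which carries its own size. This is only a sketch; ReadFileToString is a hypothetical helper, not something from the original poster's resource system.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: reads an entire file into a std::string.
// The string tracks its own length, so the caller never has to reserve or
// write a terminator by hand. Returns false if the file cannot be opened.
bool ReadFileToString(const std::string& path, std::string& out)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return false;

    // Copy every byte from the stream buffer until end of file.
    out.assign(std::istreambuf_iterator<char>(file),
               std::istreambuf_iterator<char>());
    return true;
}
```

Printing with std::cout << contents, or passing contents.data() together with contents.size() to an API that takes a pointer and a length, then uses the stored size rather than searching for a sentinel.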

Sean Middleditch – Game Systems Engineer – Join my team!
