D.V.D

File Loading Adds Weird Characters to the end

D.V.D    1029

Hello everyone, I have some code that loads a text file into a char* buffer, and for the most part it works! The issue is that some weird extra characters show up at the end. Here is an example:

 

[Image: 5n6sWXT.png]

 

In the example above, the text file contains the word "WORKS!", and the char buffer is allocated from the size of the file, which in this case gives new char[6].

 

Here is the code:

bool DefaultResourceLoader::VLoadResource (std::shared_ptr <ResourceHandle> pResourceHandle)
{
	std::ifstream loadedFile;
	loadedFile.open(pResourceHandle->GetResourceName());
	loadedFile.read(pResourceHandle->GetWritableResourceBuffer(), pResourceHandle->GetFileSize());

	if ( loadedFile.fail() )
	{
		printf("ERROR");
	}
	loadedFile.close();
	return true;
}

GetResourceName returns the file name as a string. GetWritableResourceBuffer returns the char* buffer, which is the size of the file (in this case, new char[6]). GetFileSize returns the file size, so 6 for this example.

 

Is there any reason why those characters get printed? I'm using Unicode in my VS2010 settings, and I tried casting the buffer to char* afterwards, as well as creating a string from it and printing that. I also tried printf and cout, but both give the same result.

bradbobak    1825

 You need to add a '\0' C-string terminator. Make the char buffer one element bigger, then after you read the data, set the last char in the array to 0.

 

int filesize = whatever;
char *buf = new char[filesize + 1];
read_the_data();
buf[filesize] = 0;

 

 This is needed because a C string uses a null terminator to mark the end of the string.
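To make the idea above concrete, here is a minimal sketch of a standalone loader that allocates one extra byte and writes the terminator itself. The name ReadFileAsCString and its outSize parameter are made up for illustration and are not part of the ResourceHandle code in the question. It opens the file in binary mode so the byte count matches the size on disk, and it terminates at the number of bytes actually read rather than at the expected size.

```cpp
#include <cstddef>
#include <fstream>
#include <memory>
#include <string>

// Hypothetical helper (not from the original post): reads a whole file into a
// heap buffer that is one byte larger than the file and NUL-terminated.
// Returns nullptr and sets outSize to 0 if the file cannot be opened.
std::unique_ptr<char[]> ReadFileAsCString(const std::string& path, std::size_t& outSize)
{
    outSize = 0;

    // Open at the end so tellg() reports the file size in bytes.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file)
        return nullptr;

    const std::streamoff end = file.tellg();
    if (end < 0)
        return nullptr;
    const std::size_t size = static_cast<std::size_t>(end);
    file.seekg(0, std::ios::beg);

    // One extra char for the '\0' terminator.
    std::unique_ptr<char[]> buffer(new char[size + 1]);
    file.read(buffer.get(), static_cast<std::streamsize>(size));

    // Terminate after the bytes that were actually read.
    const std::size_t bytesRead = static_cast<std::size_t>(file.gcount());
    buffer[bytesRead] = '\0';

    outSize = bytesRead;
    return buffer;
}
```

With a file containing WORKS!, the returned buffer holds seven chars: the six file bytes followed by '\0', so printf("%s", ...) or cout stops where the file content ends.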

Bregma    9199


"Shouldn't a text file, which is basically just a really long char, have the 0 at the end?"

A file on most modern operating systems is just a stream of bytes.  Technically there is not really such a thing as a 'text file', just a stream of printable characters mixed with line-end markers (like LF on Unix, or CR/LF on DOS). Storing null-terminated C strings in a file would make it unreadable by things that expect traditional text, like text editors.

 

So, no, storing a null byte at the end of a string in a 'text file' would be unexpected.  The conversion to C-style strings needs to be done when the file is read by a C program (or to Pascal-style strings when the file is read by a Pascal program, or....).

bradbobak    1825

 As far as I know, a text file is treated much like a binary file, except that some OSes do linefeed/carriage-return translations. Text files are sometimes parsed line by line, which in C++ is straightforward to do (std::getline()). You have the file size, you know when you've reached the end of the file, and you can read line by line if you want. Thinking from a text-file point of view, I don't even think '\0' is valid in it (does getline() stop at the '\0'?). I guess what I'm trying to say is that a text file is basically a binary file, except that it generally contains printable text rather than some sort of binary data, and a '\0' would be considered binary data.

 

 (did i get that right, lol) 

 

EDIT: ninja'd

Edited by bradbobak

D.V.D    1029

Oh okay, makes sense. It seems weird that char arrays need a 0 at the end, but I guess that's just how it's implemented. Well, now my Resource Cache works, so thanks a lot for the help guys!

SeanMiddleditch    17565
Somewhat more generally:

There are multiple ways to represent a string. They differ both in how the bytes are encoded (ASCII, UCS-2, etc.) and in how the length of a string is determined.

For the latter, C chose to use a sentinel value (the '\0' NUL character) and C++ sort of follows suit most of the time. Other languages (and some libraries built on top of C/C++) choose to use an explicit size field in addition to the contents of the string, and hence have no need for a sentinel. OS files fall in the second category; each file has an explicit length and so there's no need for a sentinel byte stored at the end of every file.

This means that text files in the OS are represented slightly differently than C's utilities expect strings to be formatted. That extra '\0' byte you have to append converts from the OS format to C's antiquated format.

Avoid any such assumptions that strings are sentinel-terminated in your own code if you possibly can and prefer using interfaces/objects that have an explicit size or range.
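One way to follow that advice when loading files, sketched under the assumption that the caller is happy to receive a std::string instead of a raw char* (ReadFileToString is a made-up name, not part of the code in the question):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the whole file into a std::string.
// The string knows its own size, so no sentinel bookkeeping is required.
// Returns an empty string if the file cannot be opened.
std::string ReadFileToString(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

If a C API still needs a terminated pointer, c_str() on the result provides one, while size() gives the real length even when the data contains embedded zero bytes.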
