File Loading Adds Weird Characters to the end


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
6 replies to this topic

#1 D.V.D   Members   -  Reputation: 517


Posted 19 May 2014 - 04:13 PM

Hello everyone, I have some code that loads a text file into a char* buffer, and for the most part it works! The issue is that some weird extra characters show up at the end. Here is an example:

 

[Image: 5n6sWXT.png]

 

In the example above, the text file contains the word "WORKS!", and the char buffer is allocated from the size of the file, which in this case makes it new char[6].

 

Here is the code:

bool DefaultResourceLoader::VLoadResource (std::shared_ptr <ResourceHandle> pResourceHandle)
	{
		std::ifstream loadedFile;
		loadedFile.open(pResourceHandle->GetResourceName());
		loadedFile.read(pResourceHandle->GetWritableResourceBuffer(), pResourceHandle->GetFileSize());
		
		if ( loadedFile.fail() )
		{
			printf("ERROR");
		}
		loadedFile.close();
		return true;
	}

GetResourceName returns the file name as a string. GetWritableResourceBuffer returns the char* buffer, which is the size of the file (new char[6] in this case). GetFileSize returns the file size, so 6 for this example.

 

Is there any reason why those characters get printed out? I'm using Unicode in my VS2010 settings, and I tried casting the buffer to char* afterwards, and also creating a string from it and printing that. I tried both printf and cout, but the result is the same.




#2 bradbobak   Members   -  Reputation: 1154


Posted 19 May 2014 - 04:17 PM

You need to add a '\0' C-string terminator. Make the char buffer one element bigger, then after you read the data, set the last char in the array to 0.

 

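int filesize = whatever;              // size of the file in bytes
char *buf = new char[filesize + 1];   // one extra element for the terminator
read_the_data();                      // placeholder: read filesize bytes into buf
buf[filesize] = 0;                    // terminate the C string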

 

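This is needed because a C string uses a null terminator to mark its end. printf and cout keep reading a char* until they find a 0 byte, so without one they run past the end of your buffer and print whatever happens to be in memory there.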
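
To make the snippet above concrete, here is a minimal sketch of a loader that reserves the extra byte and terminates the buffer. The function name LoadAsCString and its parameters are made up for illustration and are not part of the original poster's resource classes; it also assumes the caller already knows the file size.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <memory>
#include <string>

// Hypothetical helper: reads up to `size` bytes of `path` into a new buffer
// of size + 1 chars and null-terminates it.
// Returns an empty pointer if the file cannot be opened.
std::unique_ptr<char[]> LoadAsCString(const std::string& path, std::size_t size)
{
    // Binary mode: the bytes arrive exactly as stored, with no CR/LF translation.
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file)
        return nullptr;

    std::unique_ptr<char[]> buf(new char[size + 1]);   // +1 for the terminator
    file.read(buf.get(), static_cast<std::streamsize>(size));

    // gcount() reports how many bytes were really read, which can be fewer
    // than requested if the file is shorter than expected.
    const std::size_t got = static_cast<std::size_t>(file.gcount());
    assert(got <= size);

    buf[got] = '\0';   // terminate right after the last byte read
    return buf;
}
```

Terminating at gcount() rather than at size is a design choice: if the read comes up short, the string still ends right after the last valid byte instead of exposing uninitialized memory.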



#3 D.V.D   Members   -  Reputation: 517


Posted 19 May 2014 - 04:35 PM

Oh awesome, that works! Thanks, but shouldn't a text file which is basically just a really long char have the 0 at the end as well?



#4 Bregma   Crossbones+   -  Reputation: 5498


Posted 19 May 2014 - 04:59 PM


shouldn't a text file which is basically just a really long char have the 0 at the end

A file on most modern operating systems is just a stream of bytes. Technically there is not really such a thing as a 'text file', just a stream of printable characters mixed with line-end markers (LF on Unix, CR/LF on DOS and Windows). Storing null-terminated C strings in a file would make it unreadable by things that expect traditional text, like text editors.

 

So, no, storing a null byte at the end of a string in a 'text file' would be unexpected. The conversion to C-style strings needs to be done when the file is read by a C program (or to Pascal-style strings when the file is read by a Pascal program, and so on).
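
As a rough check of the point above, the following sketch writes "WORKS!" to a scratch file and looks at the bytes that actually end up on disk. ReadAllBytes and CheckNoTerminatorOnDisk are hypothetical names chosen for this example, and the scratch path is whatever the caller passes in.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads every byte of a file exactly as stored (binary mode, so no
// line-ending translation takes place).
std::vector<char> ReadAllBytes(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}

// Writes "WORKS!" to a scratch file and checks what was stored.
void CheckNoTerminatorOnDisk(const std::string& path)
{
    {
        std::ofstream out(path.c_str(), std::ios::binary);
        out << "WORKS!";   // operator<< writes the 6 characters, not the '\0'
    }                      // the stream is flushed and closed here

    const std::vector<char> bytes = ReadAllBytes(path);
    assert(bytes.size() == 6);                                           // no 7th byte
    assert(std::find(bytes.begin(), bytes.end(), '\0') == bytes.end());  // no null byte
}
```

The file's length lives in the file system's bookkeeping, not in the data, which is why nothing in the six stored bytes marks the end.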


Stephen M. Webb
Professional Free Software Developer

#5 bradbobak   Members   -  Reputation: 1154


Posted 19 May 2014 - 05:00 PM

As far as I know, a text file is treated much like a binary file, except that some OSes translate line feeds and carriage returns. Text files are often parsed line by line, which is straightforward in C++ with std::getline(). You have the file size, you know when you've reached the end of the file, and you can read line by line if you want. From a text-file point of view, I don't even think '\0' is valid in one (does getline() stop at the '\0'?). I guess what I'm trying to say is that a text file is basically a binary file that generally contains printable text rather than some other sort of binary data, and a '\0' would be considered binary data.
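
On the getline() question: std::getline stops at its delimiter (newline by default) or at end of input, and treats '\0' as an ordinary character. A small sketch that shows this, using an in-memory stream instead of a real file (FirstLine and DemoGetlineWithNul are names invented for the example):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the first line of `data` as std::getline reads it.
std::string FirstLine(const std::string& data)
{
    std::istringstream in(data);
    std::string line;
    std::getline(in, line);   // stops at '\n' or end of input, not at '\0'
    return line;
}

void DemoGetlineWithNul()
{
    // The explicit length keeps the embedded '\0' in the std::string;
    // building it from a bare string literal would cut it off at the '\0'.
    const std::string data("a\0b\nc", 5);

    const std::string line = FirstLine(data);
    assert(line.size() == 3);    // 'a', '\0', 'b'
    assert(line[1] == '\0');     // the null byte came through as data
}
```

So a '\0' inside a file would not confuse getline() itself, but it would truncate the text as soon as the line is handed to anything that expects a C string.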

 

 (did i get that right, lol) 

 

EDIT: ninja'd


Edited by bradbobak, 19 May 2014 - 05:02 PM.


#6 D.V.D   Members   -  Reputation: 517


Posted 19 May 2014 - 07:39 PM

Oh okay, makes sense. It seems weird that char arrays need a 0 at the end, but I guess that's just implementation. Well, now my Resource Cache works, so thanks a lot for the help, guys!



#7 SeanMiddleditch   Members   -  Reputation: 7261


Posted 19 May 2014 - 07:51 PM

Somewhat more generally:

There are multiple ways to represent a string. They differ both in how the bytes are encoded (ASCII, UCS-2, etc.) and in how the length of the string is determined.

For the latter, C chose to use a sentinel value (the '\0' NUL character) and C++ sort of follows suit most of the time. Other languages (and some libraries built on top of C/C++) choose to use an explicit size field in addition to the contents of the string, and hence have no need for a sentinel. OS files fall in the second category; each file has an explicit length and so there's no need for a sentinel byte stored at the end of every file.

This means that text files in the OS are represented slightly differently than C's utilities expect strings to be formatted. That extra '\0' byte you have to append converts from the OS format to C's antiquated format.

Avoid assuming that strings are sentinel-terminated in your own code if you possibly can, and prefer interfaces/objects that carry an explicit size or range.
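
As one possible illustration of that advice, the sketch below prints a six-char buffer with no terminator at all by passing its size explicitly. PrintSized and DemoPrintSized are hypothetical names for this example; std::ostream::write and the pointer-plus-length std::string constructor are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes exactly `size` bytes starting at `data`; no sentinel is looked for.
void PrintSized(std::ostream& os, const char* data, std::size_t size)
{
    os.write(data, static_cast<std::streamsize>(size));
}

void DemoPrintSized()
{
    // Same situation as the original question: 6 chars, no '\0' anywhere.
    const char raw[6] = {'W', 'O', 'R', 'K', 'S', '!'};

    std::ostringstream out;
    PrintSized(out, raw, sizeof raw);
    assert(out.str() == "WORKS!");   // nothing extra after the '!'

    // std::string can also be built from a pointer plus a length,
    // after which it carries its own size around.
    const std::string s(raw, sizeof raw);
    assert(s.size() == 6);
}
```

Passing PrintSized(std::cout, buffer, fileSize) with the original six-byte buffer would print just the file contents, with no need for the extra byte.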



