Line Endings Problem in Script File

8 comments, last by WitchLord 15 years, 3 months ago
OK, I'm still starting out so maybe there's a setting in the Angelscript configuration I haven't found, but Windows-style line breaks don't seem to be compatible with the Angelscript parser. I can't possibly be the only one to have encountered this problem can I? And yes, I AM using the latest version 2.15! This script file builds without complaint:
void main(){float a = 1.0f;};
This one fails (I put the ";" at the end so I could have a break point when the parser got to an empty statement):
void main()
{
 float a = 1.0f;
};
In the first one, asCParser::GetToken has sourcePos == sourceLength == 29 on the next call after the empty statement is consumed. In the second one, sourcePos == 33 and sourceLength == 36 on that call, so it keeps reading and gets unrecognized tokens, which cause the build to fail.

I don't see any functions in Angelscript to change how it handles line endings. I do know there are utilities to convert between Unix-style and Windows-style line endings, but that's not the point: I want to allow users to write their own scripts, and this extra step is unacceptable for that purpose.

edit: accidental edit (but see next post, the issue is resolved)
[Edited by - polaris2013 on January 3, 2009 12:19:37 PM]
Ok, I feel dumb.

While reviewing my code, I noticed that I'm the one telling Angelscript the script length. In my defense, I copied the code that calculates the script length from the Angelscript documentation, and that code doesn't work. So it's not Angelscript at fault, it's the Angelscript documentation. Either way, it cost me a few hours of needless debugging just to figure out that the problem was with the line endings in the first place.

	char* script;
	FILE *f = fopen(szFile, "r");

	// Determine the size of the file
	fseek(f, 0, SEEK_END);
	int len = ftell(f);
	fseek(f, 0, SEEK_SET);

	// Load the entire file in one call
	script = new char[len];
	fread(&script[0], len, 1, f);

	fclose(f);


That adds 2 characters for each line ending, and the parser only reads back 1 character per line ending. Still would be simpler to have a setting in the parser, because I can't think of a better way to fix this than to read through the whole file and count the number of newlines. Or I guess I could just google something.

edit:
Ok simpler than I thought. Just had to change it to read binary instead of text, now both read 2 chars per line ending. This is the change:
	FILE *f = fopen(szFile, "rb");

Well, I just wasted a whole day changing one character in my game. What will tomorrow bring? =P

[Edited by - polaris2013 on January 3, 2009 11:32:38 AM]
Rather than figuring out the size and loading the file as two separate steps, you can combine those steps together.
  std::ifstream script_file("main.cpp");
  std::vector<char> script_contents((std::istreambuf_iterator<char>(script_file)),
                                     std::istreambuf_iterator<char>());
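
For anyone copying the snippet above, here is a slightly fuller sketch of the same idea wrapped in a function. The name load_script is hypothetical, and opening with std::ios::binary is an assumption added here (it is not in the snippet above) so that the bytes come through exactly as they are on disk. Either way, the vector's size() is the script length, so there is no separate length to get wrong.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of AngelScript): reads a whole file into
// memory. Returns an empty vector if the file cannot be opened.
std::vector<char> load_script(const std::string& path)
{
    // std::ios::binary is an assumption of this sketch: it disables any
    // line-ending translation, so CR LF pairs arrive as two characters.
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return {};

    // Construct the vector from the stream's first character to end-of-file.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

The buffer and its size() can then be passed together wherever the script text and its length are needed.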


Thanks SiCrane, but since I just got it working I'll stick with what I've got. However, now that I look at it, it makes no sense that my change fixed it. The length variable evaluates to 36 whether the file is opened as text or as binary. What changes is how the parser counts its position (ends up at 36 when I open as binary, ends up at 33 when I open as text). That makes no sense to me at all!!

Angelscript shouldn't have any understanding of how I calculated the length, or how I opened the file. But somehow it does! This is very fishy. Something else is probably at work, but I'm tired of thinking about CR LFs for now so I'm going to leave it at that.

edit:
Hoping Witchlord reads this:
One thing about the Angelscript documentation is that it's got a great "magnifying glass" explaining how each function is used, but it lacks a good "overview" explaining how everything fits together, and the starting documentation is a bit spartan. If I may be so bold, I would be willing to write down my "newbie" FAQ and even go so far as to try to answer all of my questions (make a starter guide) to the best of my ability if it could be used to improve the documentation of Angelscript. Lord knows I've had enough problems getting started, I just hope everyone else doesn't have as much trouble as I've had! :-)

[Edited by - polaris2013 on January 3, 2009 12:02:53 PM]
AngelScript should just ignore the carriage return characters as whitespace. Thus if the line endings are represented by \r\n or just \n it should result in the same thing. Only if the line endings are represented by just \r will AngelScript fail to properly count the lines.
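
To illustrate the rule described above, here is a minimal sketch of line counting that treats \r as ordinary whitespace and advances only on \n. The function count_lines is hypothetical and written for this illustration; it is not AngelScript's actual tokenizer code.

```cpp
#include <cstddef>

// Hypothetical helper, NOT taken from AngelScript's source.
// Returns the number of lines in the buffer, where a new line starts
// only after a '\n'. A '\r' is skipped like a space or a tab.
int count_lines(const char* text, std::size_t length)
{
    if (length == 0)
        return 0;

    int lines = 1;
    for (std::size_t i = 0; i < length; ++i)
    {
        if (text[i] == '\n')
            ++lines;
        // '\r' needs no special case here: it simply does not start a line.
    }
    return lines;
}
```

With this rule, "a\r\nb\r\nc" and "a\nb\nc" both count as three lines, while "a\rb\rc" counts as one, which is the miscounting case for files that use \r alone.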

I'll have a closer look at what might be going on behind this problem.

The documentation in AngelScript is still very much in development. I try to write a couple of new pages for it with every release. Though I find it very difficult to foresee what information needs to be explained further.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

Quote:Original post by polaris2013
Thanks SiCrane, but since I just got it working I'll stick with what I've got. However, now that I look at it, it makes no sense that my change fixed it. The length variable evaluates to 36 whether the file is opened as text or as binary. What changes is how the parser counts its position (ends up at 36 when I open as binary, ends up at 33 when I open as text). That makes no sense to me at all!!

Angelscript shouldn't have any understanding of how I calculated the length, or how I opened the file. But somehow it does! This is very fishy. Something else is probably at work, but I'm tired of thinking about CR LFs for now so I'm going to leave it at that.



ftell gives the byte position in the file, but if the file is opened in text mode, the CR LF characters are translated to LF when read, which gives a shorter script. So, while the script loaded from the file is only 33 characters long (after line ending translations), you told AngelScript it was 36 characters long, thus causing AngelScript to try reading beyond the end of the script.
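
The mismatch can be checked directly with a small sketch like the one below. The struct Sizes and the function measure are hypothetical names made up for this illustration; the code only uses standard C I/O calls.

```cpp
#include <cstddef>
#include <cstdio>

struct Sizes
{
    long        reported;  // what ftell says after seeking to the end
    std::size_t read;      // how many characters fread actually delivered
};

// Hypothetical helper: measures a file both ways using the given fopen mode.
// Returns {-1, 0} if the file cannot be opened.
Sizes measure(const char* path, const char* mode)
{
    Sizes s = { -1, 0 };
    std::FILE* f = std::fopen(path, mode);
    if (!f)
        return s;

    // Same size calculation as the manual's example.
    std::fseek(f, 0, SEEK_END);
    s.reported = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    // Count what is really handed back, chunk by chunk.
    char buf[256];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        s.read += n;

    std::fclose(f);
    return s;
}
```

For the four-line script from the first post saved with CR LF endings, measure(path, "r") on Windows should give 36 reported and 33 read, matching the numbers in this thread, while measure(path, "rb") should give 36 for both. On platforms where text and binary mode behave the same, the two numbers should agree in either mode.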

Because of problems like this, I always prefer to open files in binary mode. Unfortunately I forgot about this when writing the example in the manual.
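
A minimal sketch of a loader along those lines follows. It is not the code from the manual, and load_script_file is a hypothetical name. It opens the file with "rb" and, as an extra precaution, takes the length from what fread actually returned rather than from ftell.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not the example from the AngelScript manual.
// Loads a whole file in binary mode. Returns false if the file cannot be
// opened or its size cannot be determined.
bool load_script_file(const char* path, std::string& out)
{
    std::FILE* f = std::fopen(path, "rb");  // "b": no line-ending translation
    if (!f)
        return false;

    std::fseek(f, 0, SEEK_END);
    long len = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (len < 0)
    {
        std::fclose(f);
        return false;
    }

    out.resize(static_cast<std::size_t>(len));
    std::size_t got = len > 0 ? std::fread(&out[0], 1, out.size(), f) : 0;
    std::fclose(f);

    // Trust what fread delivered, not what ftell predicted.
    out.resize(got);
    return true;
}
```

After a successful call, out.size() is the number of characters actually in the buffer, so that is the value to pass along as the script length.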

Of course, if it wasn't for Microsoft, this wouldn't have been an issue, because it's only Microsoft's version of fopen() that has a text mode for reading files. At least they could have made the binary mode the default mode, but unfortunately they chose to use the text mode as the default mode.


Quote:Original post by WitchLord
Of course, if it wasn't for Microsoft, this wouldn't have been an issue, because it's only Microsoft's version of fopen() that has a text mode for reading files. At least they could have made the binary mode the default mode, but unfortunately they chose to use the text mode as the default mode.

It's not actually Microsoft's fault. DOS' CR/LF behavior was inherited from CP/M, making that Digital's fault, and you'll have to blame the C standard for text mode being the default.
Guess I should have checked the history first. :)

Though I thought the text mode was exclusive to DOS/Windows. If I'm not mistaken the t/b modifiers aren't even recognized on some platforms.


It depends on what you mean by that. On almost every platform that isn't Windows/DOS based, text mode and binary mode result in the exact same behavior. However, the C standard requires that fopen() accept "b" as an open mode, so a standard compliant C implementation has to recognize the "b" in the sense that it won't return an error or crash if it sees the "b". (Unlike, for example, passing "redrover" as the open mode to fopen(), which will crash a program compiled with MSVC.)
Ah, so then it's safe to always specify "b" to get uniform behaviour on all platforms. That's useful to know. Thanks a lot.

