Line numbers for non-terminated string literal?

Started by
5 comments, last by BlackMoons 9 years, 9 months ago

With the following code:


string foo = "
string bar = "Hello";
string bar2 = "World";

AngelScript reports the compile error "non-terminated string literal" at line 3, char 21, when it really should report the error at line 1, char 14.

In much larger, more complex files it seems to always report the error on the very last string literal, regardless of where the mistake actually is, and it does not complain about anything else. That makes it hard to find out which string literal you messed up.

Lead Coder/Game Designer for Brutal Nature: http://BrutalNature.com

The AngelScript parser works much the same way as the syntax highlighter on this forum. It identifies two valid multi-line strings, "\nstring bar = " and ";\nstring bar2 = ", and then only sees the last "; as non-terminated.

I understand the location of the error message isn't ideal, and I'll think about a way to improve this without breaking the support for multi-line strings.
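The quote pairing described in this post can be reproduced with a small stand-alone scanner. This is only a sketch and not AngelScript's actual tokenizer: the names `Pos` and `findUnterminatedLenient` are made up for illustration, and heredoc strings are ignored to keep it short.

```cpp
#include <cstddef>
#include <string>

// 1-based source position; line 0 means "every string was closed".
struct Pos { int line; int col; };

// Hypothetical helper: pairs up double quotes while allowing newlines inside
// a string, and returns the position of the quote that opens a string which
// is never closed.
Pos findUnterminatedLenient(const std::string& src) {
    int line = 1, col = 1;
    bool inString = false;
    Pos open{0, 0};
    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        if (inString && c == '\\' && i + 1 < src.size()) {
            ++col;         // account for the backslash itself
            c = src[++i];  // the escaped character can never close the string
        } else if (c == '"') {
            if (!inString) open = Pos{line, col};  // remember the opening quote
            inString = !inString;
        }
        if (c == '\n') { ++line; col = 1; } else { ++col; }
    }
    return inString ? open : Pos{0, 0};
}
```

Fed the three-line example from the first post, this scanner treats the quote on line 1 and the first quote on line 2 as one string, the next two quotes as a second string, and is left with only the final quote on line 3 (char 21) unmatched, which is the position in the reported error.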

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

This is definitely a valid string:


string foo = "
string bar = "

But it's missing a semicolon to end the statement. Could you detect string literals that are not followed by an appropriate closing symbol (comma, semicolon, closing parenthesis, etc.) and flag the line that the string literal was opened on? For example: Unexpected tokens (on line 37) after string literal originating on line 23. Expected ';' after string, got "Hello".
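A rough sketch of that kind of check, assuming a scanner that allows multi-line strings. It is not how AngelScript works internally; `Suspect` and `findSuspectString` are hypothetical names, and the rule is narrower than the one suggested above: it only flags a string literal that is directly followed by something that looks like an identifier or a number.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// First suspicious string literal; openLine == 0 means nothing was flagged.
struct Suspect {
    int openLine;       // line on which the string literal was opened
    int tokenLine;      // line of the unexpected token that follows it
    std::string token;  // the unexpected identifier or number
};

// Hypothetical check: after each closed string, look at the next token and
// report the string's opening line if that token is an identifier or number.
Suspect findSuspectString(const std::string& src) {
    int line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\n') ++line;
        if (src[i] != '"') { ++i; continue; }
        int openLine = line;
        ++i;  // skip the opening quote
        // Lenient scan: newlines are allowed inside the string.
        while (i < src.size() && src[i] != '"') {
            if (src[i] == '\\' && i + 1 < src.size()) ++i;  // skip the backslash
            if (src[i] == '\n') ++line;
            ++i;
        }
        if (i >= src.size()) break;  // non-terminated string, not handled here
        ++i;  // skip the closing quote
        // Skip whitespace between the string and the next token.
        while (i < src.size() && std::isspace(static_cast<unsigned char>(src[i]))) {
            if (src[i] == '\n') ++line;
            ++i;
        }
        // Collect an identifier or number, if one follows directly.
        std::size_t start = i;
        while (i < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
            ++i;
        }
        if (i > start) return Suspect{openLine, line, src.substr(start, i - start)};
    }
    return Suspect{0, 0, ""};
}
```

On the example from the first post this flags the string opened on line 1, with the unexpected token Hello found on line 2, which points at the line where the mistake was actually made.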

And how does this parse?


Hello";
string bar2 = "

Shouldn't it warn that 'Hello' hasn't yet been declared, and that there is a missing assignment operator?

Some languages require strings to be on a single line, and use special syntax for multi-line strings (usually two or more quotation marks in a row, with an equal number of quotation marks to close the string). This even allows non-escaped quotation marks within it.

Example:


string foo = """
This is
a "multi-line"
string literal"""

This syntax is especially nice for embedding stuff like HTML directly in code.

I'm sure the parser gave other errors than just the 'non-terminated string literal'. If it didn't, that is definitely something that needs to be improved. I'll investigate what can be done to improve the error reporting.

AngelScript supports heredoc strings too, initiated with triple ", like your example. The difference between an ordinary multi-line string and a heredoc string is that the latter doesn't translate escape sequences.

No, it does not give any other errors (unless my code is somehow obscuring them). Even if it did pair up the quotes wrongly, it does not appear to detect that a bare Hello after a string is not valid syntax.

Even in a *very* large file, it gave no errors whatsoever except for 'non-terminated string literal' on the very last string literal of the file. That is why I thought it was so odd. I would understand if the parser puked a few lines after the non-terminated string literal, but complaining at the very last valid literal suggests that something is going wrong in the detection.

And yeah, I didn't think that AngelScript supported multi-line string literals using "; I thought that only worked with """.

Hence detecting a broken " string literal should be easy: just look for a newline in the middle of a " string and you know it's wrong. I thought that was what the existing code did, but for some reason it kept going after the error and reported it on the very last string literal to be parsed.
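To make the idea concrete, here is a minimal sketch of such a newline check. It is an illustration only, not the existing AngelScript code: `Pos` and `findUnterminatedStrict` are hypothetical names, a string opened with a single " has to close on the same line, and a """ string is allowed to span lines.

```cpp
#include <cstddef>
#include <string>

// 1-based source position; line 0 means "no broken string found".
struct Pos { int line; int col; };

// Hypothetical strict scanner: returns the position of the opening quote of
// the first string literal that is not closed properly.
Pos findUnterminatedStrict(const std::string& src) {
    int line = 1, col = 1;
    std::size_t i = 0;
    // Consume n characters while keeping line and col in sync.
    auto advance = [&](std::size_t n) {
        for (; n > 0 && i < src.size(); --n, ++i) {
            if (src[i] == '\n') { ++line; col = 1; } else { ++col; }
        }
    };
    while (i < src.size()) {
        if (src[i] != '"') { advance(1); continue; }
        Pos open{line, col};
        if (src.compare(i, 3, "\"\"\"") == 0) {
            // Triple-quoted string: may span lines, ends at the next """.
            std::size_t close = src.find("\"\"\"", i + 3);
            if (close == std::string::npos) return open;
            advance(close + 3 - i);
            continue;
        }
        advance(1);  // opening quote of an ordinary string
        bool closed = false;
        while (i < src.size() && src[i] != '\n') {
            if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] != '\n') {
                advance(2);  // skip an escape sequence such as \"
                continue;
            }
            if (src[i] == '"') { advance(1); closed = true; break; }
            advance(1);
        }
        if (!closed) return open;  // hit a newline or end of file first
    }
    return Pos{0, 0};
}
```

For the example at the top of the thread this stops at the first newline and returns line 1, char 14, the position where the error was expected.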

Ah, this must be because the non-terminated string is detected during the superficial parsing of a statement block of a function. In this first pass the parser is not evaluating the sequence of tokens, it is simply looking for the end of the statement block.

I can probably make the parser do a second pass when it finds a non-terminated string in order to get more information about the error. Since it is an error condition it is not something that will hurt the performance.

Multi-line strings with a single " are supported through an optional engine property. By default it is turned off, but those who prefer this syntax can turn it on with SetEngineProperty(asEP_ALLOW_MULTILINE_STRINGS, true);

Awesome. Yeah, it's very likely that none of my string literals contained a ; to end a statement block. Might that be why it never flagged any more errors?

This topic is closed to new replies.