Jump to content

  • Log In with Google      Sign In   
  • Create Account

Lexical Parsing With Meaningful Errors


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
4 replies to this topic

#1 Ectara   Crossbones+   -  Reputation: 3058

Like
0Likes
Like

Posted 25 July 2012 - 01:55 PM

I have a lexical parser that I'm writing that parses as it reads the input with a shift and reduce parser that accumulates lexemes until it can determine that there is syntactic meaning to the topmost tokens on the stack, then it groups them as children to a node of the non-terminal's type, then pushes the new node. At the end of parsing, it results in an abstract syntax tree; however, if there is an error in parsing and it is malformed, I have the feeling that the tree will be incorrectly generated, with error information being difficult to recover. The tree generation uses the lexer's underlying internal functions for handling the input text, and returning the next lexeme. If the input is perfectly formed, it would result in a correct tree, without any line numbers or information associated with the input text. Is there a better way to parse the files that would preserve the information? With slight modification, I suppose I could track column and line numbers.

Sponsor:

#2 Telastyn   Crossbones+   -  Reputation: 3730

Like
0Likes
Like

Posted 25 July 2012 - 02:02 PM

At the end of parsing, it results in an abstract syntax tree; however, if there is an error in parsing and it is malformed, I have the feeling that the tree will be incorrectly generated, with error information being difficult to recover.


If there is an error in parsing, you should never reach the end. You should error with 'expected X saw ___' if possible. Otherwise, keeping track of the 'deepest' attempted parse is usually sufficient to identify where things went south. If your parsing has a bug on the otherhand... shrug.

From experience, it is important to carry enough information to provide a filename (or other source identification string) and a line/column number - all the way through to code generation. Not only does it make things easier to debug, but you need to eventually output debugging symbols, which use this info so your output itself is debuggable.

#3 Ectara   Crossbones+   -  Reputation: 3058

Like
0Likes
Like

Posted 25 July 2012 - 07:59 PM

Right, I understand this. Is there a better way to parse the files that would preserve the information? I could get the lexemes one by one, and keep track of it easily. However, parsing from top to bottom seems silly, when parsing from bottom to top makes the most sense. However, parsing from the bottom up cannot see the big picture, and thus when an error occurs, you wouldn't quite know it for a while, and even then, it is hard to know what was really expected. The method I have would work, but it is the unexpected error that would break its behavior.

#4 Necrolis   Members   -  Reputation: 1369

Like
0Likes
Like

Posted 26 July 2012 - 01:17 AM

I would suggest having a look at how Clang does its error handling, cause not only does it have the most informative errors and warnings I've ever seen, its doing it for some of the most complex programming languages available (in terms of processing), plus it runs very efficiently (both speed and memory wise).

#5 Ectara   Crossbones+   -  Reputation: 3058

Like
0Likes
Like

Posted 26 July 2012 - 01:22 PM

I'm reading through all of its source, but it is extremely difficult to digest, especially with just about all of the code written in its headers. Does this mean I'm on the right track with a LALR parser using a shift-and-reduce method?




Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS