Lexical Parser

Started by
16 comments, last by Ectara 12 years ago
I see. If I use the tokens I already have, as I had though I would when I started, that would restrict me to using only only the container type that the preprocessor emits, and I'd have to write a whole new lexer or preprocessor for the next language. I'll likely use the C preprocessor heavily, but I wish there was an easier way that was language agnostic.

So, past that, I'd be reading in tokens, and building a tree token by token in the second stage?


Also, how would I determine which non-terminal is of the highest order, without scanning all of the tokens, if one terminal could be part of a non-terminal, and that non-terminal could belong to another, how would I best ascertain the highest non-terminal?
Advertisement

If I use the tokens I already have, as I had though I would when I started, that would restrict me to using only only the container type that the preprocessor emits, and I'd have to write a whole new lexer or preprocessor for the next language


Yeah. That's why I said parser generators were a bit more difficult to make. They're also tons more difficult to debug.


So, past that, I'd be reading in tokens, and building a tree token by token in the second stage?


Pretty much.


Also, how would I determine which non-terminal is of the highest order, without scanning all of the tokens, if one terminal could be part of a non-terminal, and that non-terminal could belong to another, how would I best ascertain the highest non-terminal?


If I understand you correctly (I'm horrible with formal terminology), the language specification should provide disambiguation rules for those scenarios. They might require some look-ahead to determine the correct interpretation.
Since you don't seem to have a whole lot of experience in writing these things, I think trying to write a general parser that can be used for multiple languages is too ambitious. It only makes sense to generalize a problem when you have solved individual instances several times.

It's a bit like the "write games, not engines" advice: Once you have a few games under your belt, you can consider what an engine might do that will help you with future games, but before you have that experience you are extremely unlikely to write a useful engine.
I recently set out to code my own Earley parser, and found the following helpful:
http://www.diku.dk/h...torbenm/Basics/
I think I understand. A lexer creates tokens from an input. If I can find a way to describe what constitutes a token as an input, I could construct a lexer. I'll look into that. And next in the parser, tokens should likely be put into a tree based on their syntactic significance of terminals and non-terminals, probably from a BNF input. Finally, I'd then iterate through the tree, and read in the data and act on it, or write it to a stream.

Lookahead seems like a viable solution to determine the highest ordinal non-terminal. That, I can do.
A bonus question, would regular expressions help me in tokenization? I can think of a way it might, but it might be too costly; I already have regular expressions working, so if I can use it to expedite the process, I'll probably favor it.
I strongly suggest you pick up a book or four on parsing theory. There's a lot of stuff that has been well-understood about parsing for decades, and you'd be doing yourself a huge favor to read up on it instead of trying to rediscover it all the hard way.

Just as an example, to your last question: regular expressions are well known to have the same expressive power as regular grammars. Since most lexing is done based on a character-level regular grammar, interchanging a lexer with a regexp engine is basically an isomorphism. And in fact, many lexer generators are already designed to use a regular expression language to generate the lexical analysis code.

At the very least, go read up on the Wikipedia articles on regular expressions, lexical analysis, parsing, and so on. You'll save yourself a tremendous amount of pain by learning from the experts instead of banging your head against decades of solved mysteries.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I'll look into finding a book on it soon. In the meantime, I'll work hard on figuring this out.

This topic is closed to new replies.

Advertisement