Building Syntax Trees

Engines and Middleware Programming

Started by evolutional May 20, 2004 06:20 AM

11 comments, last by evolutional 19 years, 10 months ago

1,397

Author

May 20, 2004 06:20 AM

After several hours of studying compilers and their theory I have built myself a fairly successful lexer using regular expressions. Now, I''ve hit a stumbling block - all the tutorials and articles I have read seem to use Bison/YACC to build a syntax tree of valid grammar and parse it. I understand how to create a YACC file can describe the context-free grammar (from the tutorial at Flipcode and Peroxide''s series) - I want to understand how to present valid grammar rules and build the tree myself. I''ve studied a variety of sources (Gamedev Fora/Flipcode/Peroxide/Crenshaw/Google, et al) for hours and have not come up with much. The things I''ve been reading from the search on GDNet seems to be pointing at Finite State Automata, but I''m not too sure. Has anyone else built their own syntax parser (using a stream of lexxed tokens as an input) without using YACC/Bison or similar that may be able to offer me a few pointers to resources? Thanks in advance

Arild Fines

968

May 20, 2004 06:25 AM

The term you might want to google for is "recursive descent parser". This is pretty much the only type of parser it makes sense to write by hand. A recursive does put certain restrictions on your grammar - it has to be LL(1), which, among other things, means you have to eliminate all left-recursive rules from your grammar.

The principle behind a recursive descent parser is rather simple: You have a function/method per non-terminal, in which you read the next token and use it to determine which function to call next.

--
AnkhSVN - A Visual Studio .NET Addin for the Subversion version control system.
[Project site] [Blog] [RSS] [Browse the source] [IRC channel]

--AnkhSVN - A Visual Studio .NET Addin for the Subversion version control system.[Project site] [IRC channel] [Blog]

evolutional

1,397

Author

May 20, 2004 06:47 AM

Hey thanks for that, it seems as if I understood the term RDP without knowing it by name. A google has turned up some Comp. Science lecture notes on the subject. I''ll look into those for the time being...

PeterTarkus

122

May 21, 2004 01:35 AM

You can build an AST or parse tree with Yacc as you go along, that is how I think most people use it anyways. At each terminal you push, at each expression you restructure the tree.

If you do it by hand I can explain how to easily build the tree yourself, it is not hard.

Create a binary tree structure. Have a stack that you can push the nodes onto it.

parse_term();
push_top_node();
parse_term();
push_top_node();
new_top->left = pop();
new_top->right = pop();
push(new_top)

I really don't know the *right* way to do it, or how they teach you in school, I taught myself and came up with my own ideas on how to do it. For some types of expressions, like a C for-loop, it has 3 terms, so I create something called a node and split them on it. The leafs contain the expressions.

[edit - diagram is fubared ]

For something like a function, I have a vector of parse nodes in the node, and just store *the statements* in there. It is easy and it works.

Peace

[edited by - PeterTarkus on May 21, 2004 2:42:15 AM]

[edited by - PeterTarkus on May 21, 2004 2:42:47 AM]

[edited by - PeterTarkus on May 21, 2004 2:43:29 AM]

[edited by - PeterTarkus on May 21, 2004 2:44:02 AM]

evolutional

1,397

Author

May 21, 2004 05:06 AM

quote:Original post by PeterTarkus
... interesting stuff here ...

Interesting - I''ve been looking at compiler theory books a lot over the past few days (and considering buying Game Scripting Mastery) and they never seem to describe it as simply as you do.

So let me get your right - I should be taking my token stream, looking for a terminal (eg: for/while) and creating tree nodes from there, linking the expressions as leafs.

I think I''m getting you - are you hard-coding these rules or do you have a grammar input definition like YACC seems to have?

evolutional

1,397

Author

June 06, 2004 04:51 PM

The best explanation I have EVER read about this is in the book "Game Scripting Mastery" by Alex Varnese (sp?).

It''s a must-buy for anyone wanting to roll their own

PeterTarkus

122

June 06, 2004 06:11 PM

No, it''s a recursive descent parser. I have to warn you that recdescent parsers become very hard to manage as your language grows. That is why most people use Yacc/Lex or DFA parsers or whatever.

It is very simple I will show you an example

void parse(){  while(1)  {    token = get_token();    switch(token)    {    case token_for: parse_for()    case token_while: parse_while()    .    .    .    }    while(token == tok_semicolon) token = get_token()   }}

A good book for seeing how to build a rec descent parser is "Writing Compilers and Interpreters" by Ronald Mak.

I have a project with a simple language I made you can examine it to see how I did things if you want : http://xano.sourceforge.net/cull.html

Recursive descent is only good for small languages, you could not hope to build a C++ compiler in it you would go mad.

clearsnake

122

June 06, 2004 06:42 PM

quote:Original post by PeterTarkus

snip

Recursive descent is only good for small languages, you could not hope to build a C++ compiler in it you would go mad.

I don''t think anyone here is aiming for the next C++... most people''s grammars could probably be left factored and have their left recursion eliminated in a reasonable amount of time.

PeterTarkus

122

June 06, 2004 07:29 PM

Yes you are correct :-)

Yeah, I was a bit **tired** before, building the syntax trees is pretty much like you got going. The syntax trees are used mostly for parsing expressions like 1+2*3+6*4 etc. The other structures
like for, while, etc are just conveniently stored in the structure.

struct ParseTreeNode{   Token token;   ParseTreeNode * left, * right;   vector<ParseTreeNode*> statements;}

For something like a while, I put the while into a node,
and then all the statements between BEGIN .. END go into the
statements list. This just makes it easy to manage.

The stack for expression parsing is as old as time you can find info on that in any good C/Pascal programming book.

Peace

[edited by - PeterTarkus on June 7, 2004 1:21:13 AM]

evolutional

1,397

Author

June 07, 2004 08:28 AM

Out of interest, which features of C++/C# would make it harder to parse? Are we talking class structures or pointer indirection?

Building Syntax Trees

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Building Syntax Trees

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines