Jump to content
  • Advertisement
Sign in to follow this  
Tjaalie

Parser help

This topic is 4081 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Dear Thread Reader, I finaly finished my lexer (totaly homemade) and now Im going to work on my parser. But I'm not 100% sure my idea of how to do it is right, so here is what I think the best way to do this. Acording to the first token in the line expect what the other will be, is the other different then reexpect till there is one that matches or when there is nothing left to check against (sintax error). Say we have this line:
//Shared means that other scripts will be allowed to acces the var.
shared int SomeInt = 5*4;

When the parser sees 'shared' it expects a var or function declaration or a function implention. When it checks the next one it still expects the same, after the name he still expects both but when he hits '=' it knows its a var difinition sends the rest of the tokens '4*5;' to the solver wich will solve the math and send the awnser to init the var to. This similar pattern will be followed for function calling or other 'lines'. Is this the right way to it? Are there better ways? I don't want to use a premade parser so there is no point in trying to convince me to use a premade one.

Share this post


Link to post
Share on other sites
Advertisement
What I'd suggest:

Read this .

Write down a BNF grammar for your language: now you have a formal definition instead of a fuzzy idea of what your language is.

Only then should you try to code anything.

Share this post


Link to post
Share on other sites
I'm sorry to say that that approach is not going to scale well at all. Unless what you are trying to do is pretty trivial, what I would suggest is formalizing your language as a grammar (in Backus-Naur form), and writing a recursive descent parser.

This is a somewhat complex process, but once you get your head around it, it is not too bad to actually implement. (I just finished a recursive descent parser for an independent study).

HTH

Share this post


Link to post
Share on other sites
that's right, BNF is the right way to go. Also, while it will be harder to get it going at first, if it's designed right, it's extremely easy to extend and make adjustment to your grammar.

...all that being said, I'm sure there's a tool out there that generates code for a parser, given a grammar. Though if you want the whole learning experience, it's worth trying it out for yourself first.

Share this post


Link to post
Share on other sites
Flex/Bison.
Lex/Yacc.

Note that a generic BNF form grammer is still too hard to parse, if I remember my parsing headaches right.

In effect, you want to form your grammer so that which branching of your grammer you are going down can be predicted based on the next token.

This can restrict your language in interesting ways.

Ie:
Chunk => scope type identifier ChunkTail;
ChunkTail => VarTail | FuncTail
VarTail => empty | = Expression
FuncTail => ( Arguements ) { FunctionBody }
...

Notice that from the very next token, you can tell which of the two possible grammer steps you follow. This is known as a LL(1) grammer, IIRC -- left lookahead 1 grammer.

Share this post


Link to post
Share on other sites
if you make your language syntactically simple it is easy to write a parser. I started with a fuzzy idea of how my language was going to work and ended up with a parser that is extremely simple and less than 650 lines. Here is some code from my language:


(int x) -> (int y) f
{
y = x*x
},

char c,

class C = (int a, int b, int c),

()->() main
{
print(f 5);
(int)->(int)@ fp;
fp=f@

}



The algorithm is simple:

first I find all of the top level grouping symbol pairs ( ) { } [ ] and parse them (recursively) Then I do the -> operator, then the "adjacency operator" (the lack of an operator between two expressions is considered to be an implied adjacency operator). Then I do the rest of the operators, including the , and ; which are just plain operators. I have no prefix operators to make things much easier, just infix and postfix. After these steps I post-process the parse tree to recognize certain language constructs. I guess this is the parse tree to syntax tree step. That's it.

This technique probably is more effective for highly orthagonal languages like mine. It was much easier to hand write the parser than to use a tool like yacc or bison. I had to refactor constantly. At one point I had a large section of code where essentially the same code was replicated three times and held together with gotos. It was awesome. I eventually refactored it into a few simple loops but I don't think that I could have written it that way initially. I regret taking out the gotos but they had to go when the triple code went.

Share this post


Link to post
Share on other sites
since NotAYakk mentioned Flex/Bison etc.. i might as well throw boost.spirit out there.

It generates a recursive decent parser at compile time from the bnf.

Example:

subrule<0> identifier;
subrule<1> modifier;
subrule<2> type;
subrule<3> expression;

parse(start_of_script,
end_of_script,
(
identifier = alpha_p >> *(alnum_p | '_'),
modifier = "shared" | none_p,
type = str_p("int") | "unsigned int" | "float",
expression = ......,
modifier >> type >> identifier >> !('=' >> expression)
));



Share this post


Link to post
Share on other sites
Note that recursive-decent is not the only way to parse things. Take a look at LR Parsers, which allow you to parse a larger class of grammers.

Also you appear to be mixing up interpretation and parsing. You don't want to do this. Form your entire syntax tree before interpreting things otherwise you'll make your parsing far too complex and hard to maintain.

Share this post


Link to post
Share on other sites
I forgot to mention the key part of how my parser works. After I load all of the Tokens into a container I turn them into Expression and throw pointers to them into a list. All of my parsing functions take a range (pair of iterators) in addition to their other arguments. Each parsing function gradually removes Expression*s from the list and in the end there is only one Expression* left in the list. Expression has a bunch of members but the key ones are:

Token token
Expression* left
Expression* right
Expression* contents

Initially each Expression is just a wrapper for the token that it was created from. Later an Expression representing the + operator will store pointers to its left and right operands in its left and right members. The left and right operands are removed from the list. If you have any questions let me know.

Also there is this program called Graphviz. I found it after I had finished the parser but it is so awesome. Give your parse tree the ability to write out a graphviz file, then use graphviz to turn it into an image showing you the nodes and pointers and contents.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!