
Trouble in building parser for compiler.


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

3 replies to this topic

#1 assainator   Members   -  Reputation: 636


Posted 12 July 2013 - 05:12 PM

Hello all,
 
I have a question related to parsing in a compiler. I'm having trouble coming up with a proper approach to parsing certain statements.
For example, take this grammar:
 
A:= literal | variable
B:= '+' | '-'
C_1:= A
C_2:= C, B, C
C_3:= '(', C, ')'
C:= C_1 | C_2 | C_3
D:= 'return', C, ';'
 
example of parsable code:
return (5+a) - 3;
 
My question is, what approach can I best use to convert these rules into functions?
My main concern is, how do I make sure that for "return 3 + 5;" the proper rules (D and C_2) are used?
After starting D and encountering the token '3', C_1 is also satisfied. But then the parser encounters '+' instead of the expected ';'.
 
I would like to write this myself for learning purposes so parser generators like yacc/bison are a no go.
 
Every time I think I've come up with something, I end up with a function so long and ugly (and non-operational) that I can't help but think I'm missing something. Does anyone have some pointers?
 
Thanks a lot in advance.

"What? It disintegrated. By definition, it cannot be fixed." - Gru - Despicable Me

"Dude, the world is only limited by your imagination" - Me



#2 Nypyren   Crossbones+   -  Reputation: 3688


Posted 12 July 2013 - 05:18 PM

Here are some options:

 

http://en.wikipedia.org/wiki/LR_parser

 

http://en.wikipedia.org/wiki/Recursive_descent_parser

 

 

I personally prefer the LR parser (because it can be extended to parallel and unrestricted parsing much more easily), but with it, it's harder to work out how to produce helpful syntax error messages.



#3 Krohm   Crossbones+   -  Reputation: 2961


Posted 13 July 2013 - 12:08 AM

Personally - I know this is going to be unpopular - I suggest not thinking about the grammar at all.

There was a time when I was all about BNF and the like. My latest parser has been in use for a while now and I still have no explicitly written grammar.

 

 


how do I make sure that for "return 3 + 5;" the proper rules (D and C_2) are used?

I'd first do a keyword match - mainly for constructs such as for, which has a pretty odd syntax. In this case we match return.

Then we find a literal. Looking ahead 1 token (the "1" in LL(1) and LR(1)) we find a + token, which is recognized as an operator. It's a binary op because we found it after a literal, so we fetch another operand. The point is: don't be greedy! Don't switch just because something matches right now.

At this point we have the expression parsed: the compiler will have to work out what 3 and 5 are so it can emit the proper ADD instruction.

I actually do expression assembly in the compiler; some might say that's because of poor design.
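A minimal sketch of the one-token lookahead idea as a hand-written recursive descent parser, in Python. This is not anyone's actual parser from this thread: the names (Parser, peek, parse_operand, ...) and the tuple-shaped tree are assumptions made for illustration. One deliberate change from the grammar as posted: C_2 (C, B, C) is left-recursive, which a naive recursive function would loop on forever, so the sketch parses one operand and then keeps looping while the next token is '+' or '-'.

```python
import re


def tokenize(src):
    # Crude tokenizer for this sketch: integers, identifiers and the symbols
    # + - ( ) ;  Anything else is silently skipped, which a real lexer
    # should report as an error instead.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+();]", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token WITHOUT consuming it (the 1-token lookahead).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, text):
        tok = self.advance()
        if tok != text:
            raise SyntaxError(f"expected {text!r}, got {tok!r}")

    def parse_statement(self):
        # D := 'return', C, ';'
        self.expect("return")
        expr = self.parse_expression()
        self.expect(";")
        return ("return", expr)

    def parse_expression(self):
        # C := operand { B operand }   (C_2 rewritten as a loop)
        left = self.parse_operand()
        while self.peek() in ("+", "-"):   # decide by peeking, not by guessing
            op = self.advance()
            right = self.parse_operand()
            left = (op, left, right)       # builds a left-associative tree
        return left

    def parse_operand(self):
        # C_1 (literal | variable) or C_3 ('(' C ')')
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            inner = self.parse_expression()
            self.expect(")")
            return inner
        if tok.isdigit():
            return ("literal", int(tok))
        if tok.isidentifier() and tok != "return":
            return ("variable", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.parse_statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

With this sketch, parse("return 3 + 5;") gives ("return", ("+", ("literal", 3), ("literal", 5))). After reading 3, parse_expression does not return straight away; it peeks, sees '+', and only then commits to the binary form.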


Every time I think I came up with something I end up with a function so long and ugly (and non operational) that I can't help but think I'm missing something. Does anyone have some pointers?

Are you trying to parse and compile at the same time? This will end in tears in my experience. Pre-tokenization is a must in my opinion (and no, I don't care about what GCC/CompilerX does).
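To illustrate what pre-tokenization could look like, here is a small Python sketch that turns the source text into a flat list of tokens before any parsing happens. The token kinds, the Token type and the regular expressions are assumptions chosen to fit the grammar in the first post; they are not taken from any poster's code.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str   # 'keyword', 'variable', 'literal', 'op', 'lparen', 'rparen', 'semi'
    text: str


# Order matters: the alternatives are tried left to right.
TOKEN_SPEC = [
    ("literal", r"\d+"),
    ("name",    r"[A-Za-z_]\w*"),
    ("op",      r"[+-]"),
    ("lparen",  r"\("),
    ("rparen",  r"\)"),
    ("semi",    r";"),
    ("skip",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"return"}


def tokenize(source: str) -> List[Token]:
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup          # name of the alternative that matched
        if kind == "skip":
            continue                    # whitespace produces no token
        text = match.group()
        if kind == "name":
            # Keywords and variables look the same lexically; split them here.
            kind = "keyword" if text in KEYWORDS else "variable"
        tokens.append(Token(kind, text))
    return tokens
```

The parser then only ever deals with Token objects, so looking ahead is just an index into a list rather than more character-level scanning.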

Object orientation might help you; in my experience it comes at a negligible cost. For example, you can have a loop "by keyword match" which is very compact and yet dispatches to the correct syntax without visible ifs. You'll need to provide a set of basic compiler features such as type lookup. I've done this successfully with a base interface.
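One possible reading of the "loop by keyword match" idea, sketched in Python. Everything here is hypothetical: the StatementHandler interface, the handler classes and the 'break' statement (which is not in the grammar from the first post) exist only to show the dispatch. The handlers also leave expression tokens unparsed to keep the sketch short.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class StatementHandler(ABC):
    """Base interface: one handler per statement keyword."""
    keyword: str

    @abstractmethod
    def parse(self, tokens: List[str], pos: int) -> Tuple[tuple, int]:
        """Parse the statement whose keyword is at tokens[pos].

        Returns (node, position of the first token after the statement).
        """


class ReturnHandler(StatementHandler):
    keyword = "return"

    def parse(self, tokens, pos):
        try:
            end = tokens.index(";", pos)
        except ValueError:
            raise SyntaxError("missing ';' after 'return'") from None
        # The expression tokens are kept raw; a real handler would hand them
        # to an expression parser.
        return ("return", tokens[pos + 1:end]), end + 1


class BreakHandler(StatementHandler):
    keyword = "break"   # hypothetical second statement, for illustration only

    def parse(self, tokens, pos):
        if pos + 1 >= len(tokens) or tokens[pos + 1] != ";":
            raise SyntaxError("expected ';' after 'break'")
        return ("break",), pos + 2


# The dispatch table replaces a chain of "if keyword == ..." tests.
HANDLERS: Dict[str, StatementHandler] = {
    handler.keyword: handler for handler in (ReturnHandler(), BreakHandler())
}


def parse_program(tokens: List[str]) -> List[tuple]:
    nodes = []
    pos = 0
    while pos < len(tokens):
        handler = HANDLERS.get(tokens[pos])
        if handler is None:
            raise SyntaxError(f"no statement starts with {tokens[pos]!r}")
        node, pos = handler.parse(tokens, pos)
        nodes.append(node)
    return nodes
```

Adding a new statement type then means writing one more handler class and registering it; the main loop stays untouched.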

Performance-wise, I once had... a problem with my data import routines. So I made a Notepad++ script which would encode all that data in a program... which turned out to be about 3000 lines long. It took a while to process (around 10 seconds) but it was acceptable as a band-aid solution.

 

edit: two small clarifications.


Edited by Krohm, 13 July 2013 - 12:09 AM.


#4 assainator   Members   -  Reputation: 636


Posted 13 July 2013 - 06:04 AM


how do I make sure that for "return 3 + 5;" the proper rules (D and C_2) are used?

Looking ahead 1 token (the "1" in LL(1) and LR(1)) we find a + token, which is recognized as an operator. It's a binary op because we found it after a literal, so we fetch another operand. The point is: don't be greedy! Don't switch just because something matches right now.

Thanks, that was my main sticking point. For some reason, the idea of simply looking ahead didn't come to mind.

As for my 'long and ugly code', looking ahead makes the code a lot simpler.

 

Nypyren: Thanks for the links, I had read them before, but only in a theoretical context. Now I've read them in a practical context, thanks.









