Sign in to follow this  

Advice on (learning) parsing

This topic is 4481 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I'm interested in learning how to make and / or use a parser. What would be a good way to go? So far, I have written (from a book) only a very simple recursive descent parser (or so I am told), for a stack based calculator (in C++). Basically I'm looking for the big picture here: - Any useful tools. I've heard about yacc-lex, is it hard? - Any references, links, books. - What is involved in writing / using a parser that is extensible, what are the components? - How difficult is it, how much experience and time needed? What are the required programming skills? (I might drop the whole idea for now and learn those skills first) These are some project ideas that get me excited. Any comments are welcome: - Adventure games like the old Larry (eh excited, not intended). - Simple First Order Logic parser. I had an educational tool called Tarski's world where you could write FOL sentences about some geometric objects, whose truth-value would then be evaluated. In the long run, something like this in a game would be very nice. - Parsing and translating C++ in plain English. I suppose this is too much, but I like the idea.

Share this post


Link to post
Share on other sites
I recently went through and learned at least the basics of boost::spirit, and it was suprisingly easy. If you know regular expressions, at least a little, it's fairly easy to make something quite powerful with only a day or two of learning/experimentation.

Share this post


Link to post
Share on other sites
Quote:
Any useful tools. I've heard about yacc-lex, is it hard?


yacc stands for "yet another compiler compiler", it's used to compile a programming language grammar into a compiler. I don't think that will help.

My recommendation would be to read up on lexical analysis, which helps break a string into a sequence of tokens (or keywords), the tokens are then passed to a parser, that figures out what to do with the tokens.

Share this post


Link to post
Share on other sites
Thanks you both. I recently got boost installed and glanced over the regular expressions. Good Things Happen. Praise boost.
And eh, I really don't want to write yet another compiler indeed...

Share this post


Link to post
Share on other sites
Quote:
Original post by DeadXorAlive
Thanks you both. I recently got boost installed and glanced over the regular expressions. Good Things Happen. Praise boost.
And eh, I really don't want to write yet another compiler indeed...


Eh, just to be clear, I was talking about spirit, not boost's regular expressions. Spirit is something that takes a language definition [which is quite similar to, if not mathematically equivalent to regular expression 'code'] and generates a recursive descent parser from it.

Note that I'm no CS major, so am unsure if I'm using some of these terms right...

Share this post


Link to post
Share on other sites
Quote:
Original post by Telastyn
Note that I'm no CS major, so am unsure if I'm using some of these terms right...


Pretty much, boost spirit is a domain-specific embedded language (DSEL) in library form that allows near Extended Backus-Naur Form (EBNF) context free grammars directly in and with C++ constructs this gives the flexiablity to use the same enivoriment and mix it with other C++ libraries. BNF/EBNF is similar to regular expressions. Boost.Spirit achieves this using techniques such as expression templates, template metaprogramming, etc. Regular expressions can be used to specify a lexer for a language.

[Edited by - snk_kid on October 10, 2005 2:22:06 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Telastyn
Eh, just to be clear, I was talking about spirit, not boost's regular expressions. Spirit is something that takes a language definition [which is quite similar to, if not mathematically equivalent to regular expression 'code'] and generates a recursive descent parser from it.


Even better. I understand that it's not neccesary to write the 'low-level' data structures like linked lists yourself, as I was doing with the calculator. I assume regular expressions are near universal, is that correct?
I think Java comes with regular expressions? They sure seem very useful to me.

Thanks for the links smr and snk_kid.

Share this post


Link to post
Share on other sites
Quote:
Original post by DeadXorAlive
Even better. I understand that it's not neccesary to write the 'low-level' data structures like linked lists yourself, as I was doing with the calculator.


Even if you needed a linked-list generally you don't need to write your own as languages typically provide them with there standard library, like C++ there is std::list, back to the point yes nowadays it is generally not necessary to write the structures/internals of a lexer/paser as it is such a mundane, repetitive task usually resulting in much boilerplate code. We have domain specific languages that are declarative and from this we can specify whats to be done as opposed how it should be done.

So we have libraries/tools that can take these specs and the code is automatically generated that does the work. Generally lexers will be specificed with regular expressions and parsers for programming languages will be specificed with context free grammers like BNF/EBNF.

Quote:
Original post by DeadXorAlive
I assume regular expressions are near universal, is that correct?
I think Java comes with regular expressions? They sure seem very useful to me.


I'm pretty sure in the lastest versions they do provide a regular expressions library.

Share this post


Link to post
Share on other sites
Regular expressions are... ubiquitous, though I might not go so far as call them universal. Not a few unix tools deal with them, some windows text editors, a few languages deal with them natively [perl for example], and most languages now can use them to some degree via library.

The actual syntaxes allowed are a little quirky between versions, though that is mostly historical as different implimentations developed independantly.

And yes, they're exceptionally useful; especially in little sys-admin tasks and command line voodoo.

Share this post


Link to post
Share on other sites
Quote:
Original post by Bordogg
yacc stands for "yet another compiler compiler", it's used to compile a programming language grammar into a compiler. I don't think that will help.

Have you actually ever used YACC or a YACC like program? YACC is used to build parsers, and is certainly easier than writing your own parser.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Quote:
Original post by Bordogg
yacc stands for "yet another compiler compiler", it's used to compile a programming language grammar into a compiler. I don't think that will help.

Have you actually ever used YACC or a YACC like program? YACC is used to build parsers, and is certainly easier than writing your own parser.


But boost::spirit is easier than YACC?
I gathered from spirit's documentation that in order of the complexity of the task one could even just use scanners or regular expressions for simple tasks and YACC for the more complex ones, like building a computer language. Spirit is useful for large projects but still relatively easy. Is that a correct overview?

Share this post


Link to post
Share on other sites
I find boost::spirit useful for small parsers, but lex and yacc together a better tool for more complex things. In my experience, spirit is just a pain to debug.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Have you actually ever used YACC or a YACC like program? YACC is used to build parsers, and is certainly easier than writing your own parser.


I won't use it till next my next school semester when I take compiler construction. After that it'll probably be my best friend.

EDIT: Thanks for the info too, I was unaware that yacc built parsers in general, my understanding was its use was limited to compiler construction.

Share this post


Link to post
Share on other sites
Sign in to follow this