Advice on (learning) parsing

Started by
12 comments, last by Bordogg 18 years, 6 months ago
I'm interested in learning how to make and / or use a parser. What would be a good way to go? So far, I have written (from a book) only a very simple recursive descent parser (or so I am told), for a stack based calculator (in C++). Basically I'm looking for the big picture here: - Any useful tools. I've heard about yacc-lex, is it hard? - Any references, links, books. - What is involved in writing / using a parser that is extensible, what are the components? - How difficult is it, how much experience and time needed? What are the required programming skills? (I might drop the whole idea for now and learn those skills first) These are some project ideas that get me excited. Any comments are welcome: - Adventure games like the old Larry (eh excited, not intended). - Simple First Order Logic parser. I had an educational tool called Tarski's world where you could write FOL sentences about some geometric objects, whose truth-value would then be evaluated. In the long run, something like this in a game would be very nice. - Parsing and translating C++ in plain English. I suppose this is too much, but I like the idea.
Advertisement
I recently went through and learned at least the basics of boost::spirit, and it was suprisingly easy. If you know regular expressions, at least a little, it's fairly easy to make something quite powerful with only a day or two of learning/experimentation.
Quote:Any useful tools. I've heard about yacc-lex, is it hard?


yacc stands for "yet another compiler compiler", it's used to compile a programming language grammar into a compiler. I don't think that will help.

My recommendation would be to read up on lexical analysis, which helps break a string into a sequence of tokens (or keywords), the tokens are then passed to a parser, that figures out what to do with the tokens.

Imagination was given to man to compensate him for what he is not.A sense of humor was provided to console him for what he is.-Horace Walpole
Thanks you both. I recently got boost installed and glanced over the regular expressions. Good Things Happen. Praise boost.
And eh, I really don't want to write yet another compiler indeed...
Quote:Original post by DeadXorAlive
Thanks you both. I recently got boost installed and glanced over the regular expressions. Good Things Happen. Praise boost.
And eh, I really don't want to write yet another compiler indeed...


Eh, just to be clear, I was talking about spirit, not boost's regular expressions. Spirit is something that takes a language definition [which is quite similar to, if not mathematically equivalent to regular expression 'code'] and generates a recursive descent parser from it.

Note that I'm no CS major, so am unsure if I'm using some of these terms right...

Quote:Original post by Telastyn
Note that I'm no CS major, so am unsure if I'm using some of these terms right...


Pretty much, boost spirit is a domain-specific embedded language (DSEL) in library form that allows near Extended Backus-Naur Form (EBNF) context free grammars directly in and with C++ constructs this gives the flexiablity to use the same enivoriment and mix it with other C++ libraries. BNF/EBNF is similar to regular expressions. Boost.Spirit achieves this using techniques such as expression templates, template metaprogramming, etc. Regular expressions can be used to specify a lexer for a language.

[Edited by - snk_kid on October 10, 2005 2:22:06 PM]
Let's Build a Compiler is probably my favorite source for information about writing parsers, interpretters, and compilers.
Quote:Original post by Telastyn
Eh, just to be clear, I was talking about spirit, not boost's regular expressions. Spirit is something that takes a language definition [which is quite similar to, if not mathematically equivalent to regular expression 'code'] and generates a recursive descent parser from it.


Even better. I understand that it's not neccesary to write the 'low-level' data structures like linked lists yourself, as I was doing with the calculator. I assume regular expressions are near universal, is that correct?
I think Java comes with regular expressions? They sure seem very useful to me.

Thanks for the links smr and snk_kid.
Quote:Original post by DeadXorAlive
Even better. I understand that it's not neccesary to write the 'low-level' data structures like linked lists yourself, as I was doing with the calculator.


Even if you needed a linked-list generally you don't need to write your own as languages typically provide them with there standard library, like C++ there is std::list, back to the point yes nowadays it is generally not necessary to write the structures/internals of a lexer/paser as it is such a mundane, repetitive task usually resulting in much boilerplate code. We have domain specific languages that are declarative and from this we can specify whats to be done as opposed how it should be done.

So we have libraries/tools that can take these specs and the code is automatically generated that does the work. Generally lexers will be specificed with regular expressions and parsers for programming languages will be specificed with context free grammers like BNF/EBNF.

Quote:Original post by DeadXorAlive
I assume regular expressions are near universal, is that correct?
I think Java comes with regular expressions? They sure seem very useful to me.


I'm pretty sure in the lastest versions they do provide a regular expressions library.
Regular expressions are... ubiquitous, though I might not go so far as call them universal. Not a few unix tools deal with them, some windows text editors, a few languages deal with them natively [perl for example], and most languages now can use them to some degree via library.

The actual syntaxes allowed are a little quirky between versions, though that is mostly historical as different implimentations developed independantly.

And yes, they're exceptionally useful; especially in little sys-admin tasks and command line voodoo.

This topic is closed to new replies.

Advertisement