writing a scripting language parser in C++

22 comments, last by jwezorek 14 years, 2 months ago
I've decided to fool around with boost::spirit a little bit to get a feel for it. Does anyone know of some sample code that parses a toy, but not trivial, language? I found this project which implements a JSON parser but it's a little too much to get my head around.
Boost.Spirit has an example that parses XML into an AST. Read ALL the tutorials; then that example will be your bible.
I should warn you, however: having done what you're attempting, I can say it's not that easy. If you just want to "get it done", then just code it up yourself.
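For comparison with the "just code it up yourself" route, here is a minimal hand-written recursive-descent sketch for the toy grammar that appears later in this thread (`[` expression `]`, with `+ - * /` and parentheses). It is plain standard C++ with no Boost. The class name `Parser` is hypothetical, it evaluates while parsing rather than building a tree, and it only accepts unsigned integer literals.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Recursive-descent evaluator for:
//   start  = '[' expr ']'
//   expr   = term   (('+' | '-') term)*
//   term   = factor (('*' | '/') factor)*
//   factor = unsigned-int | '(' expr ')'
// Each grammar rule becomes one member function; whitespace is skipped.
class Parser {
public:
    explicit Parser(const std::string& text) : s_(text) {}

    // Returns the value if the whole input is one bracketed expression.
    std::optional<int> parse() {
        if (!match('[')) return std::nullopt;
        std::optional<int> v = expr();
        if (!v || !match(']')) return std::nullopt;
        skip_space();
        if (pos_ != s_.size()) return std::nullopt;  // reject trailing junk
        return v;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    void skip_space() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    // Consumes c if it is the next non-space character.
    bool match(char c) {
        skip_space();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // The loops make + - * / left-associative: 10 - 4 - 3 == 3.
    std::optional<int> expr() {
        std::optional<int> lhs = term();
        while (lhs) {
            if (match('+'))      { auto rhs = term(); if (!rhs) return std::nullopt; *lhs += *rhs; }
            else if (match('-')) { auto rhs = term(); if (!rhs) return std::nullopt; *lhs -= *rhs; }
            else break;
        }
        return lhs;
    }

    std::optional<int> term() {
        std::optional<int> lhs = factor();
        while (lhs) {
            if (match('*'))      { auto rhs = factor(); if (!rhs) return std::nullopt; *lhs *= *rhs; }
            else if (match('/')) { auto rhs = factor(); if (!rhs || *rhs == 0) return std::nullopt; *lhs /= *rhs; }
            else break;
        }
        return lhs;
    }

    std::optional<int> factor() {
        if (match('(')) {
            std::optional<int> v = expr();
            if (!v || !match(')')) return std::nullopt;
            return v;
        }
        skip_space();
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (start == pos_) return std::nullopt;
        // std::stoi throws std::out_of_range for literals too large for int.
        return std::stoi(s_.substr(start, pos_ - start));
    }
};
```

For example, `Parser("[1 + 2 * 3]").parse()` yields 7, while `Parser("[1 +]").parse()` yields an empty optional. Building a tree instead of evaluating is a matter of returning node pointers where this returns ints.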
Quote:Original post by Steve132
I should warn you, however, that having done what you just did it's not that easy. If you just want to "get it done" then just code it up yourself.


Yeah, I know, I'm going to play with it today and over the weekend and then probably decide not to use it. :)
Quote:Original post by jwezorek
I've decided to fool around with boost::spirit a little bit to get a feel for it. Does anyone know of some sample code that parses a toy, but not trivial, language? I found this project which implements a JSON parser but it's a little too much to get my head around.


Epoch language parser - written using boost::spirit and decidedly non-trivial :)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

So I've been playing around with boost::spirit all day today, and, I don't know, it seems like doing anything slightly non-trivial is really a pain. I want to like it, but even their miniXml example returns a recursive std::vector of boost::variants, which isn't *that* complicated but would still be a pain to deal with.

Anything more complicated than a contrived example like that is naturally going to return a more complex composition of templated gobbledygook, and the first thing you'll want to do with it is introspect it, generate something useful from it, and throw it away.
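To make the "pain to deal with" concrete, here is a sketch of a mini_xml-style tree using `std::variant` in place of Boost's types (the Boost example, as far as I remember, uses `boost::variant` together with `boost::recursive_wrapper`). The names `Element`, `Node`, and `collect_text` are hypothetical. Every consumer has to dispatch on the variant at each level, which is the introspect-then-generate step described above.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct Node;  // forward declaration so Element can hold a vector of Nodes

// An element has a name and children; each child is either a nested
// element or a run of text.
struct Element {
    std::string name;
    std::vector<Node> children;  // vector of an incomplete type is allowed since C++17
};

struct Node {
    std::variant<Element, std::string> value;
};

// Appends all text in document order. Any real consumer (code generator,
// pretty printer, converter to your own node classes) repeats this kind of
// type dispatch on every level of the tree.
void collect_text(const Node& node, std::string& out) {
    if (const std::string* text = std::get_if<std::string>(&node.value)) {
        out += *text;
    } else {
        const Element& el = std::get<Element>(node.value);
        for (const Node& child : el.children) collect_text(child, out);
    }
}
```

With `std::visit` the dispatch can be written as an overloaded visitor instead of `get_if`, but the shape of the work is the same.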

Take expressions: if you extend the model that the documentation endorses, your parser will end up returning vectors of variants that contain vectors of variants, which seems pretty cumbersome to me. I don't see an easy way to get output that is a dynamically allocated expression node containing a vector of pointers to expression nodes of various classes derived from an expression-node base class, which is what I would want. I think it can be done with the [bind(f, _val, _1)] thingamajig; I'll try that tomorrow.
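The tree described here, a dynamically allocated node holding a vector of pointers to nodes of any derived class, might look like the following sketch. It uses `std::unique_ptr` for ownership; `ExprNode`, `Num`, `Sum`, and `Product` are hypothetical names, and n-ary sum/product nodes are just one possible design.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Base class for every node; derived classes decide how to evaluate.
struct ExprNode {
    virtual ~ExprNode() = default;
    virtual int eval() const = 0;
};

struct Num : ExprNode {
    explicit Num(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};

// Adds up all children, which may be any mix of derived node types.
struct Sum : ExprNode {
    std::vector<std::unique_ptr<ExprNode>> children;
    int eval() const override {
        int total = 0;
        for (const auto& c : children) total += c->eval();
        return total;
    }
};

// Multiplies all children together.
struct Product : ExprNode {
    std::vector<std::unique_ptr<ExprNode>> children;
    int eval() const override {
        int result = 1;
        for (const auto& c : children) result *= c->eval();
        return result;
    }
};
```

Owning children through `std::unique_ptr` means destroying the root frees the whole tree, with no separate cleanup pass.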

Also, compile times are crazy. I'm just compiling the little toy grammars I'm learning with, and each build takes 5 minutes. If the time explodes when I scale up to a real grammar, I don't think this thing is practical for me to use in production code.
I think most of the compilation time is just in pulling in all those huge headers. It won't take much longer as your grammars expand.

You might use Boost.Tokenizer to simply get string tokens, then have a simple post-processing step turn it into your own token type.
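A sketch of the tokenize-then-convert idea. To stay self-contained it swaps Boost.Tokenizer for a small hand-written scanner that produces the custom token type directly, so both steps are folded into one. `TokKind`, `Token`, and `tokenize` are hypothetical names, and none of this is Boost.Tokenizer's API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Custom token type for the toy expression language.
enum class TokKind { Number, Op, LParen, RParen, LBracket, RBracket };

struct Token {
    TokKind kind;
    std::string text;  // the matched characters, e.g. "42" or "+"
};

// Splits the input into tokens, skipping whitespace; throws on any
// character the language does not use.
std::vector<Token> tokenize(const std::string& s) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
            out.push_back({TokKind::Number, s.substr(start, i - start)});
            continue;
        }
        TokKind kind = TokKind::Op;
        switch (s[i]) {
            case '+': case '-': case '*': case '/': kind = TokKind::Op; break;
            case '(': kind = TokKind::LParen; break;
            case ')': kind = TokKind::RParen; break;
            case '[': kind = TokKind::LBracket; break;
            case ']': kind = TokKind::RBracket; break;
            default: throw std::runtime_error("unexpected character: " + std::string(1, s[i]));
        }
        out.push_back({kind, std::string(1, s[i])});
        ++i;
    }
    return out;
}
```

A parser can then walk a `std::vector<Token>` instead of raw characters, which keeps whitespace handling out of the grammar entirely.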
So I set out today to write an expression parser with Spirit that outputs a plain-vanilla, dynamically allocated expression tree rather than a templated vector/variant thing. I got it working (it produces the correct output), but I know it's not quite right: I couldn't get some of the code to look the way you're supposed to use Spirit, and I was wondering if someone with Spirit experience could help me out.

Here's my code (with the implementation of the expression hierarchy omitted for brevity):
#include "stdafx.h"  // MSVC precompiled header; drop this line on other compilers
#include <iostream>
#include <string>
#include <vector>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/spirit/home/phoenix/bind/bind_function.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/fusion/include/std_pair.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
using namespace std;

/*-------------------------------- Expression class --------------------------------*/

enum OpType {
    Add, Subtract, Multiply, Divide
};

class Expression {
public:
    virtual int eval() = 0;
    virtual void output() = 0;
    virtual ~Expression(){}
};

struct OpExprPair {
    OpExprPair(OpType op = Add, Expression* expr = 0) :
        _op(op), _expr(expr) {}
    OpType _op;
    Expression* _expr;
};

class CompoundExpression : public Expression {
protected:
    Expression* _left_side;
    std::vector<OpExprPair> _right_side;
    ...
public:
    CompoundExpression(Expression* l_s, const std::vector<OpExprPair>& r_s) :
        _left_side(l_s), _right_side(r_s) {}
    ...
};

class NumExpression : public Expression {
protected:
    int _value;
public:
    NumExpression(int n = 0) : _value(n) {}
    ...
};

/*-------------------------------Helper functions-----------------------------------*/

void CreateNumExpression(Expression*& value, int n) {
    value = new NumExpression(n);
}

void MakeOpExprPair(OpExprPair& value, OpType op, Expression* expr) {
    value = OpExprPair(op, expr);
}

// A term or expression with no trailing operators is just its left side,
// so no CompoundExpression wrapper is created for it.
void MakeCompoundExpr(Expression*& value,
        Expression* left_side,
        const std::vector<OpExprPair>& right_side) {
    if (right_side.empty()) {
        value = left_side;
        return;
    }
    value = new CompoundExpression(left_side, right_side);
}

/*-----------------------------------Grammar----------------------------------------*/

struct additive_op_ : qi::symbols<char, OpType> {
    additive_op_() {
        add
            ("+", Add)
            ("-", Subtract)
        ;
    }
} additive_op;

struct multiplicative_op_ : qi::symbols<char, OpType> {
    multiplicative_op_() {
        add
            ("*" , Multiply)
            ("/" , Divide)
        ;
    }
} multiplicative_op;

template <typename Iterator>
struct ExpressionParser : qi::grammar<Iterator, Expression*(), ascii::space_type> {
    ExpressionParser() : ExpressionParser::base_type(start) {
        using boost::spirit::arg_names::_val;
        using boost::spirit::arg_names::_1;
        using boost::spirit::arg_names::_2;
        using boost::spirit::int_;
        using boost::phoenix::bind;
        using boost::phoenix::push_back;

        start
            = '['
                >> expression [_val = _1]
                >> ']'
            ;

        expression
            = ( term
                >> term_list
              ) [bind(MakeCompoundExpr, _val, _1, _2)]
            ;

        term
            = ( factor
                >> factor_list
              ) [bind(MakeCompoundExpr, _val, _1, _2)]
            ;

        factor_list
            = *(factor_pair [push_back(_val, _1)])
            ;

        term_list
            = *(term_pair [push_back(_val, _1)])
            ;

        factor_pair
            = (multiplicative_op >> factor) [bind(MakeOpExprPair, _val, _1, _2)]
            ;

        term_pair
            = (additive_op >> term) [bind(MakeOpExprPair, _val, _1, _2)]
            ;

        factor
            =   int_ [bind(CreateNumExpression, _val, _1)]
            |   '(' >> expression [_val = _1] >> ')'
            ;
    }

    qi::rule<Iterator, std::vector<OpExprPair>(), ascii::space_type>
        factor_list,
        term_list;
    qi::rule<Iterator, OpExprPair(), ascii::space_type> term_pair, factor_pair;
    qi::rule<Iterator, Expression*(), ascii::space_type> expression, factor, term,
        start;
};

/*----------------------------------------------------------------------------------*/

int main() {
    ExpressionParser<std::string::const_iterator> expression_parser;

    string str;
    while (getline(cin, str)) {
        if (str.empty() || str[0] == 'q')
            break;

        using boost::spirit::ascii::space;
        Expression* expr = 0;  // initialized so it is never read uninitialized
        // Named const_iterators match the grammar's Iterator type
        // (str.begin() on a non-const string yields a plain iterator).
        std::string::const_iterator first = str.begin(), last = str.end();
        // first == last additionally rejects trailing input after the closing ']'.
        if (phrase_parse(first, last, expression_parser, expr, space) && first == last) {
            expr->output();
            cout << "\nParse successful\n";
        } else {
            cout << "Nope\n";
        }
        // Note: the nodes allocated with new are never deleted in this example.
    }
    return 0;
}


The biggest problem is that the semantic action [bind(MakeOpExprPair,_val,_1,_2)] shouldn't be necessary. I should be able to tell boost::fusion about my struct, OpExprPair, using BOOST_FUSION_ADAPT_STRUCT, and then the output attribute of that rule should just be an OpExprPair automatically. I couldn't get this to work. I also couldn't get it to work using a std::pair as OpExprPair, even though according to everyone that's supposed to just work if you #include <boost/fusion/include/std_pair.hpp>.

Further, I have a feeling that there is a way to get a rule like
factor_list = *(factor_pair)
to output a possibly empty vector of OpExprPair structs without using a semantic action but I have no idea how to do it.
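The post omits `CompoundExpression::eval()`, so the following is only a sketch of one plausible version: a left operand plus a vector of (operator, operand) pairs is naturally evaluated as a left fold, which gives the usual left associativity (10 - 4 - 3 is 3, not 9). The types below are self-contained stand-ins for the post's classes, not the author's code; they use `std::unique_ptr` instead of raw pointers and drop `output()`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum OpType { Add, Subtract, Multiply, Divide };

struct Expression {
    virtual ~Expression() = default;
    virtual int eval() const = 0;
};

struct NumExpression : Expression {
    explicit NumExpression(int n) : value(n) {}
    int eval() const override { return value; }
    int value;
};

// One trailing "op operand" step, as produced by (additive_op >> term).
struct OpExprPair {
    OpType op;
    std::unique_ptr<Expression> expr;
};

struct CompoundExpression : Expression {
    std::unique_ptr<Expression> left;
    std::vector<OpExprPair> right;

    // Left fold: start from the left operand, apply each pair in order.
    int eval() const override {
        int acc = left->eval();
        for (const OpExprPair& p : right) {
            int rhs = p.expr->eval();
            switch (p.op) {
                case Add:      acc += rhs; break;
                case Subtract: acc -= rhs; break;
                case Multiply: acc *= rhs; break;
                case Divide:   acc /= rhs; break;  // no division-by-zero check in this sketch
            }
        }
        return acc;
    }
};
```

Because the grammar nests terms inside expressions, precedence comes from the tree shape and the fold only has to handle one precedence level at a time.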

Anyway, if anyone has been down this road before, I'd appreciate some input.

[Edited by - jwezorek on February 13, 2010 9:49:52 PM]
I've always just used straight functors for the semantic actions; those functors then access a wrapper class for the parser state, which decouples the generation of the AST from the spirit code. I don't really see the benefit of doing all the bind junk when you can just specify a trivial functor and get the same results more efficiently.

My code is linked above.
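A rough sketch of the functor-plus-state approach, in plain C++ with no Spirit: the functors hold a reference to a state wrapper and forward the matched value to it, so all node construction lives in one class. `ParserState`, `PushNumber`, and `Reduce` are hypothetical names, and in the usage the calls are made by hand in the order a parser would fire them for `1 + 2 * 3`. The exact arguments Spirit passes to a function-object action depend on the version, so the `operator()` signatures here are an assumption, not Spirit's required form.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

struct Node {
    virtual ~Node() = default;
    virtual int eval() const = 0;
};

struct Num : Node {
    explicit Num(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};

struct BinOp : Node {
    BinOp(char o, std::unique_ptr<Node> lhs, std::unique_ptr<Node> rhs)
        : op(o), l(std::move(lhs)), r(std::move(rhs)) {}
    int eval() const override {
        int a = l->eval(), b = r->eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
        }
        throw std::logic_error("unknown operator");
    }
    char op;
    std::unique_ptr<Node> l, r;
};

// Wrapper for the parser state: the only place that knows about node classes.
class ParserState {
public:
    void push_number(int n) { stack_.push_back(std::make_unique<Num>(n)); }

    // Pops two operands and pushes the combined node.
    void reduce(char op) {
        if (stack_.size() < 2) throw std::logic_error("reduce needs two operands");
        std::unique_ptr<Node> r = std::move(stack_.back()); stack_.pop_back();
        std::unique_ptr<Node> l = std::move(stack_.back()); stack_.pop_back();
        stack_.push_back(std::make_unique<BinOp>(op, std::move(l), std::move(r)));
    }

    std::unique_ptr<Node> result() {
        if (stack_.size() != 1) throw std::logic_error("expected exactly one finished tree");
        std::unique_ptr<Node> root = std::move(stack_.back());
        stack_.pop_back();
        return root;
    }

private:
    std::vector<std::unique_ptr<Node>> stack_;
};

// Trivial functors of the kind that would be attached as semantic actions.
struct PushNumber {
    ParserState& state;
    void operator()(int n) const { state.push_number(n); }
};

struct Reduce {
    ParserState& state;
    void operator()(char op) const { state.reduce(op); }
};
```

Swapping the AST representation then only touches `ParserState`; the grammar keeps calling the same two functors.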


Quote:Original post by ApochPiQ
I've always just used straight functors for the semantic actions; those functors then access a wrapper class for the parser state, which decouples the generation of the AST from the spirit code.


I was trying to use regular functors to begin with but I couldn't get it to work. I'll have to look at your code again now that I understand spirit better.

I had started to think that you have to use the phoenix::bind stuff to call a function from a semantic action unless you are using boost::spirit::classic (or whatever it is called); is that not the case? Anyway, the _1 and _2 argument placeholders don't work with regular functors because they're Phoenix entities (which makes sense to me now), so I didn't know how to get input to a naked functor from a semantic action. If the functor is always called on the output attribute of the parser that the semantic action is attached to, does that mean that to call a functor on the result of a sequence parse, your functor has to take a boost::fusion::vector as its argument?
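If the attribute of a sequence such as `(op >> number)` is one compound value (a fusion sequence, in Spirit's case), a plain functor attached to it would receive that single value and unpack it itself. A sketch of that shape, using `std::tuple` as a stand-in for `boost::fusion::vector` so it needs no Boost; `OpNumAttr` and `RecordOpNum` are hypothetical names, and the real Spirit attribute type may differ by version.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Stand-in for the compound attribute of a two-element sequence.
using OpNumAttr = std::tuple<char, int>;

// A plain functor that takes the whole sequence attribute as one argument
// and pulls the elements out by position.
struct RecordOpNum {
    std::string& out;
    void operator()(const OpNumAttr& attr) const {
        out += std::get<0>(attr);                  // the operator character
        out += std::to_string(std::get<1>(attr));  // the number
        out += ' ';
    }
};
```

Positional access with `std::get<N>` plays the role that `_1`, `_2` play for Phoenix expressions.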

[Edited by - jwezorek on February 13, 2010 11:38:29 PM]

This topic is closed to new replies.
