
# Untitled


At last! Expressions are parsed! Currently it doesn't do much with them, but it parses brackets and handles operator precedence correctly.

Here's what I ended up doing:

First, read tokens from the stream until we reach an unmatched bracket or a semicolon. That gives us a stream of tokens like so:
2 + 3 * (5 + 4) / (7 + 6)

Next, search for brackets. If an open bracket is found, recurse to this step again, starting from the token after it.
When we find a close bracket, evaluate the expression between the brackets (see below), generate I-code, and return from this function.

When we evaluate expressions, we just go through each operator precedence level and evaluate them.

So, sample:
2 + 3 * (5 + 4) / (7 + 6)

Seek for brackets, find first one so recurse:
5 + 4) / (7 + 6)

Find close bracket, so evaluate and return:
CODE(5, 4, +) / (7 + 6)

Seek for brackets, find first one so recurse:
7 + 6)

Find close bracket, so evaluate and return:
CODE(5, 4, +) / CODE(7, 6, +)

That gives us:
2 + 3 * CODE(5, 4, +) / CODE(7, 6, +)

So, evaluate in precedence order, left to right:
-> 2 + 3 * CODE(5, 4, +) / CODE(7, 6, +)
-> 2 + CODE(3, 5, 4, +, *) / CODE(7, 6, +)
-> 2 + CODE(3, 5, 4, +, *, 7, 6, +, /)
-> CODE(2, 3, 5, 4, +, *, 7, 6, +, /, +)
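The precedence passes above can be sketched in code. This is a minimal, hypothetical illustration (not the journal's actual implementation): it takes an already bracket-free token run and makes one left-to-right pass per precedence level, merging each operand/operator/operand triple into a single RPN element, much like the CODE(...) steps in the walkthrough.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One "element" is the RPN emitted so far for a sub-expression;
// a plain number starts out as a single-token element.
using Elem = std::vector<std::string>;

// Reduce a bracket-free token run: one pass per precedence level,
// left to right, merging operand-operator-operand triples into a
// single RPN element (lhs-tokens, rhs-tokens, operator).
Elem reduceRun(const std::vector<std::string> &toks)
{
    std::vector<Elem> elems;
    std::vector<std::string> ops;
    for (std::size_t i = 0; i < toks.size(); ++i)
    {
        if (i % 2 == 0) elems.push_back({toks[i]});
        else            ops.push_back(toks[i]);
    }

    const std::vector<std::string> levels[] = {{"*", "/"}, {"+", "-"}};
    for (const auto &level : levels)
    {
        for (std::size_t i = 0; i < ops.size(); )
        {
            bool hit = false;
            for (const auto &op : level)
                if (ops[i] == op) hit = true;
            if (!hit) { ++i; continue; }

            // Merge elems[i] and elems[i + 1] with the operator.
            Elem merged = elems[i];
            merged.insert(merged.end(),
                          elems[i + 1].begin(), elems[i + 1].end());
            merged.push_back(ops[i]);
            elems[i] = merged;
            elems.erase(elems.begin() + i + 1);
            ops.erase(ops.begin() + i);
            // Don't advance i: the next operator has shifted into slot i.
        }
    }
    return elems[0]; // everything has been merged into one element
}
```

For example, the bracket-free run 2 + 3 * 9 / 13 reduces to the RPN sequence 2 3 9 * 13 / +.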

I think that's right, anyway...
That leaves a stack of operands and operators in reverse Polish notation, which evaluates as:

2, 3, [5, 4, +], *, 7, 6, +, /, +
= 2, 3, 9, *, 7, 6, +, /, +
2, [3, 9, *], 7, 6, +, /, +
= 2, 27, 7, 6, +, /, +
2, 27, [7, 6, +], /, +
= 2, 27, 13, /, +
2, [27, 13, /], +
= 2, 2.08, +
[2, 2.08, +]
= 4.08
Which is the correct answer. Yay!
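That stack evaluation can be sketched as a small, self-contained example (hypothetical names, not the compiler's real code): push operands, and on each operator pop two values, apply it, and push the result. Note that 27 / 13 is exactly 2.0769..., so the precise answer is about 4.077; the 2.08 and 4.08 in the trace are rounded.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <stack>
#include <string>
#include <vector>

// Evaluate an RPN token stream with a stack: push numbers, and on an
// operator pop the two topmost operands, apply it, push the result.
double evalRPN(const std::vector<std::string> &toks)
{
    std::stack<double> st;
    for (const std::string &t : toks)
    {
        if (t == "+" || t == "-" || t == "*" || t == "/")
        {
            double rhs = st.top(); st.pop();
            double lhs = st.top(); st.pop();
            if      (t == "+") st.push(lhs + rhs);
            else if (t == "-") st.push(lhs - rhs);
            else if (t == "*") st.push(lhs * rhs);
            else               st.push(lhs / rhs);
        }
        else
        {
            st.push(std::strtod(t.c_str(), nullptr));
        }
    }
    return st.top();
}
```

Running it on the sequence above, evalRPN({"2","3","5","4","+","*","7","6","+","/","+"}) gives 2 + 27/13 ≈ 4.077.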

I need to add support for functions, conditionals, etc., and test it more, but it's looking promising. Wheeeeee...

Cool stuff. Expression parsing is a pain. [smile]

Do you intend to support string literals/macros? I'm attempting to support macros at various levels (eg before or after splitting to tokens) which is causing me grief (having to reparse the string at various points).

Quote:
> Original post by benryves
> Do you intend to support string literals/macros? I'm attempting to support macros at various levels (eg before or after splitting to tokens) which is causing me grief (having to reparse the string at various points).
My lexer already parses "foo" into TOK_String / foo (the lexer does the escaping and removal of quotes), and I'm considering putting macro support into it too (although that might end up as a preprocessor).

Another approach would be to build a tree structure as you recursive-descent parse the expression, then use a virtual recursive emit() or eval() to either spit out DruinkAssembley or evaluate an expression at compile time.

There's a recent entry in my journal for doing the latter as an experiment in C# if that is of any interest.

Actually, the main thing I have difficulty with is working out how to extend it past two precedence levels. I originally tried to modify the calculator example to have nine versions of term(), one for each operator precedence level (actually, the function took an extra parameter saying which level to search for, but never mind).

I'd prefer a tree structure; I already have a nice fake allocator that works a little like a pool allocator, but without a free() function, so allocating lots of nodes isn't a problem. Also, a tree structure would let me do cool things like check whether every primary in the tree is a constant, and if so evaluate it at compile time (to optimise a = 1 + 2 / 3 down to a = 1, for example).
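That constant-folding idea can be sketched like so. This is a toy node layout (hypothetical; the journal's real nodes would come out of the pool allocator) with a check for all-constant subtrees and a compile-time fold. With integer arithmetic, 1 + 2 / 3 folds to 1, because 2 / 3 truncates to 0.

```cpp
#include <cassert>

// A toy expression tree: leaves are int constants, inner nodes are
// binary operators. (Hypothetical layout for illustration only.)
struct Node
{
    char op;          // '+', '-', '*', '/' for inner nodes; 0 for a leaf
    int  value;       // meaningful only for leaves
    Node *lhs, *rhs;  // children; null for leaves
};

// True if every primary in this subtree is a constant.
bool allConstant(const Node *n)
{
    if (n->op == 0) return true;
    return allConstant(n->lhs) && allConstant(n->rhs);
}

// Fold a provably-constant subtree into a single value at compile time.
int foldConstant(const Node *n)
{
    if (n->op == 0) return n->value;
    int l = foldConstant(n->lhs);
    int r = foldConstant(n->rhs);
    switch (n->op)
    {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;  // integer division, as in 1 + 2 / 3
    }
}
```

A compiler pass would call allConstant() on each subtree and, when it returns true, replace the subtree with a single constant leaf holding foldConstant()'s result.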

Also, my code doesn't work too nicely with nested brackets (I think it's a simple typo), and I haven't even tried function calls and conditional operators yet - I might just disallow them in expressions or something...

Multiple versions of the term() function is how my language compiler works. It started just compiling math, so you had:

expr: called norm
norm: handled add/sub and called term
term: handled mul/div and called prim
prim: handled constants, variables, functions, casts etc

This gave mul/div a higher precedence than add/sub by nature of the order the text is scanned in.
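That call chain can be sketched as a minimal evaluator (hypothetical code, collapsed to two levels over single-digit constants; the real version builds nodes instead of evaluating): term() handles mul/div, expr() handles add/sub, and precedence falls out of which function calls which.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal two-level recursive descent over a string of single-digit
// numbers and + - * / operators. expr() calls term(), which calls
// prim(), so mul/div bind tighter than add/sub automatically.
struct Parser
{
    std::string src;
    std::size_t pos = 0;

    int prim()  // constants only, in this sketch
    {
        return src[pos++] - '0';
    }

    int term()  // mul/div level
    {
        int v = prim();
        while (pos < src.size() && (src[pos] == '*' || src[pos] == '/'))
        {
            char op = src[pos++];
            int rhs = prim();
            v = (op == '*') ? v * rhs : v / rhs;
        }
        return v;
    }

    int expr()  // add/sub level
    {
        int v = term();
        while (pos < src.size() && (src[pos] == '+' || src[pos] == '-'))
        {
            char op = src[pos++];
            int rhs = term();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }
};
```

For example, Parser{"2+3*4"}.expr() evaluates the multiplication first and returns 14, with no explicit precedence table anywhere.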

When I came to add conditionals, I was just able to add in a cond() between expr and norm, and have expr call cond() instead of norm. cond() looked a bit like:

```cpp
// rop_n - relational operator node
// sc   - the global text scanner class

node *cond(bool get)
{
    node *n = norm(get);

    while (true)
    {
        switch (sc.type)
        {
            case sc.eq:   n = new rop_n(rop_n::eq,   n, norm(true)); break;
            case sc.neq:  n = new rop_n(rop_n::neq,  n, norm(true)); break;
            case sc.gt:   n = new rop_n(rop_n::gt,   n, norm(true)); break;
            case sc.gteq: n = new rop_n(rop_n::gteq, n, norm(true)); break;
            case sc.lt:   n = new rop_n(rop_n::lt,   n, norm(true)); break;
            case sc.lteq: n = new rop_n(rop_n::lteq, n, norm(true)); break;

            default: return n;
        }
    }
}
```



Similarly, to add in logical operators like AND and NOT, it was just a case of adding in a logi() function to be called between expr() and cond(). I found this quite extensible.

Equally, when I decided to add in a C++ style ?: ternary operator, it was just a case of adding a bit of nonsense to the end of the expr() function:

```cpp
node *expr(bool get)
{
    node *n = logi(get);
    n->check();

    if (sc.type == sc.qust)
    {
        n = new tern_n(n);
        n->ln = expr(true);
        sc(sc.colon, false);
        n->rn = expr(true);
    }

    return n;
}
```



That's how I got arbitrarily complex expression compiling working anyway. Blathering on a bit in someone else's journal. Sorry.
