Sign in to follow this  

Compiler design: assignment and lvalues

This topic is 4807 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi, I'm writing a bytecode-interpreted scripting language for fun and education and have trouble implementing the parsing/compiling of assigments cleanly. Currently the language only supports assignment of form
 <identifier> = <expression> 
and this is clearly rather limiting, for example, you can't assign into an array element. The problem in allowing more complex expressions as the left-hand side is that reading from and writing to an lvalue must be compiled differently, and the parser can't know whether an expression is an assignment until it has already parsed the lhs and hits a '=' token, and then it is too late. Or is it? I can think of a couple of ways to solve the problem, but they are more or less hacky and involve dynamic_casts which I'd like to avoid as much as possible. Thanks, -Johannes

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Why is it 'too late' once the parser hits the =? Generally, the parser will split your code up into tokens, then you'll make an sytnax tree from that (AST).

Share this post


Link to post
Share on other sites
One token look-ahead is typical in any parser.

Note that token != character. Thus, most parsers will first scan the text for tokens, which are returned to the semantic parser, that builds an expression tree based on the tokens that come in. You can make it using recursive descent, or using a stack based state machine.

Try using a compiler construction tool set, like bison/flex. They're designed to make these things easier. There are even C++-specific tools that are more modern than b/f.

Share this post


Link to post
Share on other sites
Yes, my parser reads a stream of tokens produced by the lexer, and builds a syntax tree using recursive descent. The syntax tree consists of polymorphic nodes that the compiler traverses using the visitor pattern. This is quite nice and clean, this far.

Now, when the parser encounters the assignment operator, it has already created a syntax tree representation of the left hand side, and that subtree doesn't "know" (contain the information) that it is the lhs of an assignment, and thus the compiler thinks it is a load, not a store operation (by default, an identifier gets compiled to a load, as does an array element access).

When the parser hits the '=', it could set a flag in the subtree that the compiler could check, and compile the tree differently depending on the flag's value. Implementing this cleanly is what I have problems with.

As the project is largely to gain insight into the world of compilers, I'd rather not use any automatic lexer/parser generators.

The source (admittedly messy at places) can be found here (tarball here). The relevant files are parser.h/cpp and compiler.h/cpp.

Share this post


Link to post
Share on other sites
I'd say you basically need to unify the mechanisms. The subtree shouldn't just yield a value, but should also yield a storage location if it's possible to use it as an l-value. Maybe it'll help if you think of such an expression as being one that takes a given memory location and, after applying modifiers to it, returns a new memory location.

Share this post


Link to post
Share on other sites
I could be misunderstanding the problem, but it sounds to me like all you need to do is write the grammar as
<expression> = <expression>
and make sure that the left hand side is an lvalue in the semantic phase. (this is basically a matter of recursively asserting that the expression only contains dots and array subscripts)

From there, you just need to emit a load and a store.

Share this post


Link to post
Share on other sites
SpeedBump: The problem is lack of context. The function to compile an <expression> doesn't know (and, IMHO, shouldn't need to know) if the expression happens to be an lhs of an assignment or not. The compiler could set a flag when compiling <assignment> and compile <expression> differently depending on its value, but IMHO it is a somewhat kludgy solution, particulary considering that everything else in the compiler is completely context-independent.

Kylotan: Yep, I have been thinking about something similar. What about this: when loading a variable to a register, instead of copying its value, load a pointer to it. AFAICS this would work correctly in all situations. Gotta try it out.



Share this post


Link to post
Share on other sites
Yeah, basically you need to allow some operations to yield the actual variable rather than the variable's value. This lets you apply offsets (eg. for arrays) or membership operators (eg. for objects/classes). Then you read the value only at the very point that an operation needs it. A pointer is the low-level way of doing this but as long as you're preserving the variable semantics throughout then you'll be ok.

Share this post


Link to post
Share on other sites

This topic is 4807 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this