Trouble interpreting a custom language.

Started by
8 comments, last by deus.ex.nova 11 years, 4 months ago
This problem has been driving me insane. I am trying to figure out how to interpret a simple, custom language. Basically I need to feed a text file that has the code, and then tokenize and execute the code accordingly. Here are the rules of the language:

  • START and FINISH blocks define different levels of scope.
  • COM represents single line comments.
  • VAR indicates a variable declaration (i.e. VAR myVariable). Variables and their values are stored in a hash table.
  • PRINT Prints to the console either the value of a variable (e.g. PRINT myVariable) or the result of an expression (e.g. PRINT myVariable + 25).
  • Expressions are limited unary and binary arithmetic, and values are integers.
  • All tokens are separated by a space or newline.
  • Operators include:

    • Assignment " = " (e.g. myVariable = 15)
    • Increment " ++ " (e.g. myVariable ++)
    • Decrement " -- " (e.g. myVariable --)
    • Addition " + "
    • Subtraction " - "
    • Multiplication " *"
    • Division " / "
    • Modulo " % "
    • Power/Exponent " ^ "


I'm having trouble figuring out how I should handle tokens that operate in different ordering, like PRINT myVar vs. thirdVar = firstVar * secondVar (where the operators lie between the operands). Lately I've been trying to recursively execute a function (returns an int) that does a different operation depending on the token that is extracted, but I run into some problems of tracking what the operand was preceding an multiplication operator.

I also had the idea of pushing tokens that I extract onto 2 separate stacks - one for keywords/operators and the other for variable and literal values. Then after tokenizing the line of code, the program would continuously pop a keyword/operator off the stack and execute an "if" block that matched that token. I can't exactly remember what problems I encountered with this approach, because I obviously didn't use it.

Is there a better way to structure the interpreter, assuming that I cannot use 3rd-party libraries like boost? This problem has been plaguing me for the past 2 days... sad.png
Advertisement
The hardest part of this is evaluating expressions. The usual solutions are:
A) Convert to reverse polish notation (the algorithm is not hard to find) and execute using a stack-based machine (if you have used HP calculators or programmed the 8087 FPU you'll get a good idea of how this works).
B) Write a recursive-descent parser.
C) Use a parser generator, like bison or yacc. I am sure there are newer ones, but I am a dinosaur. ;)
D) Use Boost.Spirit.

You already said you can't use D, and it sounds like C might not be acceptable either. B is a great exercise and I strongly recommend you go that route. If you try it and have trouble with it, post your attempt and I can help you fix it.

By the way, why the restriction of not using third-party libraries? Homework?

A) Convert to reverse polish notation (the algorithm is not hard to find) and execute using a stack-based machine (if you have used HP calculators or programmed the 8087 FPU you'll get a good idea of how this works).

I never considered converting expressions to reverse polish form. I have created a reverse polish calculator before. This might be an option.


B) Write a recursive-descent parser.

I took a quick look at this and the implementation seemed more complicated than I felt I had time for, but at this point I might just have to do it the right way since I'm not going to make the deadline anyways.


By the way, why the restriction of not using third-party libraries? Homework?

Yup, this is my last assignment for this semester. I didn't realize that parsing this custom language would be this difficult to figure out on my own. It wouldn't be so frustrating if I didn't have finals this Tuesday...


If you try it and have trouble with it, post your attempt and I can help you fix it.

Thanks for your help.
I'm trying to understand the example Reverse Descent Parser from that wiki.

  1. Since they did not provide an implementation of getsym() function, I can only assume that it takes in a string extracted from the source code by some external means (i.e. a tokenizer), matches the string to a Symbol enum, and then simply updates the sym variable with the corresponding enum? If the string that was fed in does not pass the chain of if-else statements, then sym gets some miscellaneous value? The thing I don't understand is how it's getting token unless there's some other global variable holding the code.
  2. accept(Symbol s) simply checks to see if the symbol is one defined in the enumeration, and returns an int that acts like a boolean value representing success/failure.
  3. expect(Symbol s) is simply in charge of reporting an unexpected symbol.
  4. factor(), term(), and expression() are much less clear to me. Much of my assumptions are based on the names. I'm assuming factor() does a lot of the low level work, as in doing the calculations from the expressions? term() and expression()'s names do not make sense to me, since it seems like one is checking for addition and subtraction, while the other is checking for multiplication and division. Why not bundle them together - unless this is supposed to represent order of operations?
  5. condition doesn't seem relevant in my situation, since I won't be doing any boolean expressions.
  6. statement() manages syntax for regular statements and control logic.
  7. block() isn't clear to me either.

I've never tried to follow a recursive algorithm with more than one function before, so this is a bit difficult for me.
I haven't looked at the code in the Wikipedia page. When I have implemented this myself, I also had functions to parse factors, terms and expressions separately. The idea is that the grammar works as follows:
* An expression is formed by adding and subtracting terms
* A term is formed by multiplying and dividing factors
* A factor is either a number, or a variable, or an expression in parenthesis.

I would start by implementing just a calculator that can handle that kind of thing. You can build up to a full interpreter without too much trouble from there.
I see. Thanks for clearing up the relationships between expressions, terms, and factors. I've never heard of those terms, aside from 'expression', used in this kind of context. I was sort of implementing something like a reverse descent parser, but everything was in one big, ugly function with a bunch of if statements. I'll see what I can do from here, thanks.
Quick thread hijack here. Any recommendations on free online resources on the subject of writing compilers and interpreters while we're at it? I can write CPU/bytecode interpreters (although I have never looked at stack based, only register based) but nothing more.

B) Write a recursive-descent parser.

This was our final project in my Principles of Programming Languages course during my undergrad. Though, we were using LISP, which was made for stuff like this. Are you language-bound? Actually, if introducing a new language is required, forget that, don't want to make the task that much harder.

Anyhow, this is spot on:

A factor is either a number, or a variable, or an expression in parenthesis.

So at its core, your recursive parser just needs to be able to evaluate expressions made up of numbers, variables, and operands, which means parsing out a string based on expression levels, and an expression with no operands (special literals you designate, like + and %) is a return case at the bottom of a recursion stack: it's either a literal, or a lookup in a variable table, and is likely needed to evaluate the next expression up the chain.

Good luck!

Hazard Pay :: FPS/RTS in SharpDX (gathering dust, retained for... historical purposes)
DeviantArt :: Because right-brain needs love too (also pretty neglected these days)


Quick thread hijack here. Any recommendations on free online resources on the subject of writing compilers and interpreters while we're at it? I can write CPU/bytecode interpreters (although I have never looked at stack based, only register based) but nothing more.


The best I ever came across (and I did manager to write a compiler based on this) is the Let's Build a Compiler series. It's complicated a bit by it being written in Pascal and generates executable code for 68000 processers, but it is relatively easy to follow and convert into C code and make it generate byte code.

Are you language-bound? Actually, if introducing a new language is required, forget that, don't want to make the task that much harder.

Yeah this language is just something that my professor came up with for the assignment. I think I've come up with an alternative solution, but I want to try an implement the RDP after I turn the assignment in for the sake of learning, and doing something cool and proper. :) I'll revisit this thread once I have something to show, and maybe I can get some feedback on how to design it better.

This topic is closed to new replies.

Advertisement