Sign in to follow this  
Followers 0
deus.ex.nova

Trouble interpreting a custom language.

9 posts in this topic

This problem has been driving me insane. I am trying to figure out how to interpret a simple, custom language. Basically I need to feed a text file that has the code, and then tokenize and execute the code accordingly. Here are the rules of the language:
[list]
[*][b]START [/b]and [b]FINISH [/b]blocks define different levels of scope.
[*][b]COM [/b]represents single line comments.
[*][b]VAR [/b]indicates a variable declaration (i.e. VAR myVariable). Variables and their values are stored in a hash table.
[*][b]PRINT [/b]Prints to the console either the value of a variable (e.g. PRINT myVariable) or the result of an expression (e.g. PRINT myVariable + 25).
[*]Expressions are limited unary and binary arithmetic, and values are integers.
[*]All tokens are separated by a space or newline.
[*]Operators include:
[list]
[*]Assignment " [b]= [/b]" (e.g. myVariable = 15)
[*]Increment " [b]++[/b] " (e.g. myVariable ++)
[*]Decrement " [b]--[/b] " (e.g. myVariable --)
[*]Addition " [b]+ [/b]"
[*]Subtraction " [b]- [/b]"
[*]Multiplication " [b]*[/b]"
[*]Division " [b]/ [/b]"
[*]Modulo " [b]% [/b]"
[*]Power/Exponent " [b]^ [/b]"
[/list]
[/list]
I'm having trouble figuring out how I should handle tokens that operate in different ordering, like PRINT myVar vs. thirdVar = firstVar * secondVar (where the operators lie between the operands). Lately I've been trying to recursively execute a function (returns an int) that does a different operation depending on the token that is extracted, but I run into some problems of tracking what the operand was preceding an multiplication operator.

I also had the idea of pushing tokens that I extract onto 2 separate stacks - one for keywords/operators and the other for variable and literal values. Then after tokenizing the line of code, the program would continuously pop a keyword/operator off the stack and execute an "if" block that matched that token. I can't exactly remember what problems I encountered with this approach, because I obviously didn't use it.

Is there a better way to structure the interpreter, assuming that I cannot use 3rd-party libraries like boost? This problem has been plaguing me for the past 2 days... [img]http://public.gamedev.net//public/style_emoticons/default/sad.png[/img]
0

Share this post


Link to post
Share on other sites
The hardest part of this is evaluating expressions. The usual solutions are:
A) Convert to [url="http://en.wikipedia.org/wiki/Reverse_Polish_notation"]reverse polish notation[/url] ([url="http://en.wikipedia.org/wiki/Shunting-yard_algorithm"]the algorithm[/url] is not hard to find) and execute using a stack-based machine (if you have used HP calculators or programmed the 8087 FPU you'll get a good idea of how this works).
B) Write a [url="http://en.wikipedia.org/wiki/Recursive_descent_parser"]recursive-descent parser[/url].
C) Use a parser generator, like bison or yacc. I am sure there are newer ones, but I am a dinosaur. ;)
D) Use Boost.Spirit.

You already said you can't use D, and it sounds like C might not be acceptable either. B is a great exercise and I strongly recommend you go that route. If you try it and have trouble with it, post your attempt and I can help you fix it.

By the way, why the restriction of not using third-party libraries? Homework? Edited by Álvaro
3

Share this post


Link to post
Share on other sites
[quote name='Álvaro' timestamp='1355695749' post='5011369']
A) Convert to reverse polish notation (the algorithm is not hard to find) and execute using a stack-based machine (if you have used HP calculators or programmed the 8087 FPU you'll get a good idea of how this works).
[/quote]
I never considered converting expressions to reverse polish form. I have created a reverse polish calculator before. This might be an option.

[quote name='Álvaro' timestamp='1355695749' post='5011369']
B) Write a recursive-descent parser.
[/quote]
I took a quick look at this and the implementation seemed more complicated than I felt I had time for, but at this point I might just have to do it the right way since I'm not going to make the deadline anyways.

[quote name='Álvaro' timestamp='1355695749' post='5011369']
By the way, why the restriction of not using third-party libraries? Homework?
[/quote]
Yup, this is my last assignment for this semester. I didn't realize that parsing this custom language would be this difficult to figure out on my own. It wouldn't be so frustrating if I didn't have finals this Tuesday...

[quote name='Álvaro' timestamp='1355695749' post='5011369']
If you try it and have trouble with it, post your attempt and I can help you fix it.
[/quote]
Thanks for your help.
0

Share this post


Link to post
Share on other sites
I'm trying to understand the example [url="http://en.wikipedia.org/wiki/Recursive_descent_parser"]Reverse Descent Parser[/url] from that wiki.[list=1]
[*]Since they did not provide an implementation of [b]getsym()[/b] function, I can only assume that it takes in a string extracted from the source code by some external means (i.e. a tokenizer), matches the string to a [b]Symbol [/b]enum, and then simply updates the [b]sym [/b]variable with the corresponding enum? If the string that was fed in does not pass the chain of if-else statements, then [b]sym[/b] gets some miscellaneous value? The thing I don't understand is how it's getting token unless there's some other global variable holding the code.
[*][b]accept(Symbol s)[/b] simply checks to see if the symbol is one defined in the enumeration, and returns an int that acts like a boolean value representing success/failure.
[*][b]expect(Symbol s)[/b] is simply in charge of reporting an unexpected symbol.
[*][b]factor()[/b], [b]term()[/b], and [b]expression()[/b] are much less clear to me. Much of my assumptions are based on the names. I'm assuming [b]factor()[/b] does a lot of the low level work, as in doing the calculations from the expressions? [b]term()[/b] and [b]expression()[/b]'s names do not make sense to me, since it seems like one is checking for addition and subtraction, while the other is checking for multiplication and division. Why not bundle them together - unless this is supposed to represent order of operations?
[*][b]condition[/b] doesn't seem relevant in my situation, since I won't be doing any boolean expressions.
[*][b]statement()[/b] manages syntax for regular statements and control logic.
[*][b]block()[/b] isn't clear to me either.
[/list]
I've never tried to follow a recursive algorithm with more than one function before, so this is a bit difficult for me.
0

Share this post


Link to post
Share on other sites
I haven't looked at the code in the Wikipedia page. When I have implemented this myself, I also had functions to parse factors, terms and expressions separately. The idea is that the grammar works as follows:
* An expression is formed by adding and subtracting terms
* A term is formed by multiplying and dividing factors
* A factor is either a number, or a variable, or an expression in parenthesis.

I would start by implementing just a calculator that can handle that kind of thing. You can build up to a full interpreter without too much trouble from there.
1

Share this post


Link to post
Share on other sites
I see. Thanks for clearing up the relationships between expressions, terms, and factors. I've never heard of those terms, aside from 'expression', used in this kind of context. I was sort of implementing something like a reverse descent parser, but everything was in one big, ugly function with a bunch of if statements. I'll see what I can do from here, thanks.
0

Share this post


Link to post
Share on other sites
Quick thread hijack here. Any recommendations on free online resources on the subject of writing compilers and interpreters while we're at it? I can write CPU/bytecode interpreters (although I have never looked at stack based, only register based) but nothing more.
0

Share this post


Link to post
Share on other sites
[quote name='Álvaro' timestamp='1355695749' post='5011369']
B) Write a recursive-descent parser.
[/quote]
This was our final project in my Principles of Programming Languages course during my undergrad. Though, we were using LISP, which was made for stuff like this. Are you language-bound? Actually, if introducing a new language is required, forget that, don't want to make the task that much harder.

Anyhow, this is spot on:
[quote name='Álvaro' timestamp='1355715137' post='5011493']
A factor is either a number, or a variable, or an expression in parenthesis.
[/quote]
So at its core, your recursive parser just needs to be able to evaluate expressions made up of numbers, variables, and operands, which means parsing out a string based on expression levels, and an expression with no operands (special literals you designate, like + and %) is a return case at the bottom of a recursion stack: it's either a literal, or a lookup in a variable table, and is likely needed to evaluate the next expression up the chain.

Good luck!
0

Share this post


Link to post
Share on other sites
[quote name='6677' timestamp='1355859098' post='5012172']
Quick thread hijack here. Any recommendations on free online resources on the subject of writing compilers and interpreters while we're at it? I can write CPU/bytecode interpreters (although I have never looked at stack based, only register based) but nothing more.
[/quote]

The best I ever came across (and I did manager to write a compiler based on this) is the [url="http://compilers.iecc.com/crenshaw/"]Let's Build a Compiler[/url] series. It's complicated a bit by it being written in Pascal and generates executable code for 68000 processers, but it is relatively easy to follow and convert into C code and make it generate byte code.
1

Share this post


Link to post
Share on other sites
[quote name='BCullis' timestamp='1355861769' post='5012189']
Are you language-bound? Actually, if introducing a new language is required, forget that, don't want to make the task that much harder.
[/quote]
Yeah this language is just something that my professor came up with for the assignment. I think I've come up with an alternative solution, but I want to try an implement the RDP after I turn the assignment in for the sake of learning, and doing something cool and proper. :) I'll revisit this thread once I have something to show, and maybe I can get some feedback on how to design it better.
0

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0