• Advertisement
Sign in to follow this  

Compiler: Code generation

This topic is 1236 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi. I'm writing a little compiler for my custom language and arquitecture.

Until now i finally got the lexic and syntactic parser to work perfectly.

 

Im' using the LR(1) bottom-up parsing algorithm.

 

Anyway. I'm planning to do the semantics and the code generation together in one step.

The semantics will take care of the types system, the variables and functions names and addresses, parameters, etc.

At the same time i want to make the code generation. Up to now, i haven't found a clear way to this.

 

I'm writing the compiler on python.

 

The idea i'm using now is the following:

action = self.actionTable[(state, token)]
        # We have a reduce
        if action[0] == 'r':
            tempList = []
            symbolsToPop = len(self.grammar[action[1]]) - 1

            # Pop 2B symbols from stack
            for i in range(symbolsToPop):
                self.stack.pop()
                tempList.append(self.stack.pop())

            # Get top of the stack state
            state = self.stack[-1]
            leftHandText = self.grammar[action[1]][0].text
            leftHand = self.semanticAnalizer.getReduction(leftHandText, tempList)

            # Push grammar production left hand
            # Then push the new state from the gotoTable
            self.stack.append(leftHand)
            if (state, leftHand.text) not in self.gotoTable:
                print("Something wrong with goto Table", state, leftHand.text)
                raise LGESyntaxError()

            self.stack.append(self.gotoTable[(state, leftHand.text)])

This is a simplified version of my code. The point is that when i do a reduction i call the semantic analizer and give it the grammar rule and the tokens that got popped from the stack.

 

The problem is that i still don't know how to implement correctly the semantic analisis using this approach.

 

The obvious approach was to actually build the syntax tree and then traverse it in preorder to generate the asm code. But many texts mention that it can be done without building the tree. After reading Cooper & Torczon they mention the approach i am trying, but they don't mention anything about how to actually implement the semantic analysis.

 

My first attempt was somethin like this:

def getReduction(self, leftHand, rightHand):
        value = None
        if leftHand == "grammarRule1":
            # Check which rule it applies
            # Maybe add some value to the token??
            # Maybe add some type??
        elif leftHand == "grammarRule2:
        .
        .
        .
        return Token(leftHand, value, type)

With this, i have to create an if for each rule in my grammar. And still worse, there are different rules with the same leftHand. So the code will end with hundreds of if-elif statements.

In C++ i would use a switch case, but still think this is not the best solution.

 

Any ideas how to "elegantly" implement this???

Share this post


Link to post
Share on other sites
Advertisement

Just use a parser generator. I don't know what tools have Python support, but Bison, Antlr, and Marpa are examples of tools that makes what you're trying to do easy.

Share this post


Link to post
Share on other sites

I forgot to mention that i am not willing to use a parser generator. Although, the parser already works. I am looking for an elegant way to implement code generation, not syntactic analyis. By the way, i have read about attribute grammars, and i don't think that's the best solution for this case.

 

I'll keep reading. Meanwhile, any help is very welcome.

Share this post


Link to post
Share on other sites
I recommend against trying to do semantic analysis/type checking/etc. in the same pass as code generation. It may sound like the simpler approach but in reality it will work against you as the language gains complexity and power.

Consider the case where you have code like this:

void FunctionA () {
    FunctionB(42);
}

void FunctionB (int param) {
    // do something
}
Your semantic analysis process is not a simple linear pass over the code; instead, you need to resolve things in unpredictable orderings based on the source code. In the above case, if you do a linear pass and analyze FunctionA before FunctionB, you will not be able to validate that FunctionB even exists. How will you check its parameter types, for example?

Unless you want to really cripple the language, semantic analysis usually is best implemented as depth-first traversal of the AST. During semantic analysis, in addition to finding errors, you should be marking up the AST (or generating a new one in a different IR) to contain things necessary for code generation.

Once you have all of your semantic data accumulated and stored in some suitable IR, code generation is pretty much a matter of walking the IR and converting nodes to instructions or instruction sequences. Depending on the architecture target of your compiler, this can range from trivial to messy. Code gen is also often implemented as a series of many passes, such as in LLVM.

Share this post


Link to post
Share on other sites

That was really helpful.

Fortunatly my language has a define-before-use restriction. But you made me think of othe type of problems that my arise if i try this approach. At the end i will have to create some kind of tree(as an IR) which, as you say, contains the necessary semantic information. After the tree or whatever the IR is, use it to generate the code.

 

Thanks a lot for the help.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement