• Advertisement
Sign in to follow this  

Compiler Back-End

This topic is 3764 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Alright. I've got a hand-rolled lexer and parser happening (because, well, they're as easy as pie), giving me a beautiful parse-tree. That's all dandy and wonderful. I've also got my target (NASM), assembling and creating wonderful PEs (for the windows implementation, of course). But I'm rather stumped on going from the parse tree to the ASM code. Don't get me wrong, I understand ASM fine. That's not the problem. The problem is the code generation. The part that takes up 90% of compilers, really. I'm looking for, maybe, someone telling me the list of steps (in order!) to take the parse-tree and make it ASM code. As it stands I have a (very) minute subset of this toy language I want to compile, but I want the practices to scale when I start extending the back-end to deal with subsequently larger portions of the parse-tree. Any help? P.S: No, I don't have the Dragon Book, and I won't be getting it anytime soon (no money!), so please, please, don't take the easy way out and offer a list of books. I know they'd be a good investment, but it's an investment I can't afford to take right now.

Share this post


Link to post
Share on other sites
Advertisement
The rough idea is you take your AST (Abstract syntax tree, distinct from your normal parse tree in that in an AST you'd have say an add node, with two numbers as children, in the parse tree you might have an expression node, with an add node as a child, which has two children, a number node, and an expression node, which in turn have children which are the two numbers), and for each node in it generate some code for it recursively. E.g. if you had an expression e1 + e2, to generate the code for the + node, you'd generate code for the e1 and e2 nodes (this is the recursion), getting them to store their data somewhere and the add node would then emit code which adds the two results. Generally you do not immediately generate code for the target platform, instead you use some form of intermediate language that you make up yourself. You could use a stack based language as the intermediate language where everything is done using the stack, e.g. say you were generating code for (1 + 2) + 3, you have something like:

push 1
push 2
add
push 3
add

Here push pushes things to the stack, add pops off the two things on top of the stack, adds them and pushes the result back on to the stack. So this little fragment of code would leave (1 + 2) + 3 = 6 on top of the stack (which could then be used by some other code).

You could also use a language where you have an infinite number of registers.

Once you've generate the intermediate code, you need to transform into real code. Here you'll need to map stuff on the stack (in the stack based language) to registers, and in the infinite registers language map them to a finite set of registers. This can be done using Register Allocation.

The idea of the intermediate code is it's simple and easy to generate from your AST. You may have several versions of intermediate code, each 'lower level' than the last, e.g. from AST->1st IL you may have constructs in the intermediate code that directly support various language features, later on you'd expand that into more primitive code that would eventually be transformed into x86.

That's a very rough overview but hopefully it should give you a vague idea of where to start from.

Share this post


Link to post
Share on other sites
It would probably be a good idea to consider using LLVM (llvm.org). You would still have to learn how to create code for LLVM's intermediate format from your AST, but you could forget about most of the grunt work, such as register allocation, and still get efficient code.

First you create code for an abstract machine that has an infinite number of registers and uses LLVM's instruction set. LLVM will then map these phony registers and instructions into real hardware registers and instructions for you.

You could do this by for instance letting each AST node contain an optional variable resReg in which it stores the value it computes, if any. When you want to generate code for "a + 1" you would ask the plus node to generate code for itself, which would in turn ask the variable reference node and the constant to generate code for themselves.

Also, I would recommend against buying the dragon book since there are much better books available nowadays. It is mostly recommended for historical reasons. I've read or at least leafed through almost every book on the market, and the best ones I've come across are "Engineering a Compiler" and "Modern Compiler Implementation". After that you can proceed with Muchnick's book.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement