Sign in to follow this  
Khaos

Compiler design

Recommended Posts

Khaos    170
Let's say I wanted to create a compiler for a little language of my design. Could someone explain the general process? Should the compiler take my code and turn it into assembly code, and then use an assembler, such as NASM to assemble it, or write my own assembler and assemble it? Either way, that doesn't seem like the right thing to do. Also, I don't want to use those compiler design libraries. But anyway, how do the current compilers do it? All I see, even with home-made compilers, is a source code to executable conversion.. and no middle step (assembly). I would like to do this, but I am not sure what they are doing. So, in general, what is the compiler converting my code into. Say for example: x=10; x=x+1 -> What might this be converted into, in general? I don't know how to ask this clearly. If you know what I mean, please help. I am not asking for specifics, just the general idea so I can get started. As simple as possible for now, nothing big, nothing fancy. Thank you.

Share this post


Link to post
Share on other sites
Roboguy    794
Probably the easiest thing to do would to either use a VM or compile into assembly (IIRC, GCC always compiles to assembly, then assembles it with the assembler specified in it's configuration file).

Share this post


Link to post
Share on other sites
nprz    692
In my compiler class, we didn't have enough time to get from interpretor code to assembly, but there are so steps that make the process a bit easier.

Scanner: Determine symbols or combinations or symbols that aren't allowed, like <> wouldn't be seen in C, but is used in VB. Change the code into tokens, so that your parser can read it easier.

Parser: Determines that your grammar is correct. If you have a const put into a byref function, the parser should reject this.

Symbol Generator: Symbols need to be generated for everything. (x + 1) will be converted to a temporary symbol.

Intermediate Code or Assembly Code: You can turn all of your source code into assembly code in this step.

You should also make sure that your grammar is in a determinate form, such that looking at one token at a time, you know which path to take. This part is hard for me to explain though.

Share this post


Link to post
Share on other sites
timw    598
in general a compiler will take code and convert it into intermediate representation like an AST or some sort of tree structure. some compilers will optomize this, it's like turning your code into a tree graph, really handy when you want to do optomization. that (optomized tree) is then traversed and assembly is generally produced. then it is usually fed into an assembler, writing your own is a huge waste of time, considering debuging a compiler is no easy task, youll have pleanty of work to do anyway. read up on using comiler compilers like bison or jcup. these tools are invaluable to learn if you want to write a compiler, as they will save you much time in the long run of the project.

Share this post


Link to post
Share on other sites
Helter Skelter    332
Quote:
Original post by Khaos
Let's say I wanted to create a compiler for a little language of my design. Could someone explain the general process?

Traditional compilers are broken down into several parts starting with lexical analysis. This takes an input stream (memory buffer, file, etc) and tokenizes the input. In the case of operators that are represented by more than one symbols (say '==' or '!=') a single token is used to represent them. For languages such as C, C++, and Java the lexical parser usually skips over whitespaces, new line characters, and comments rather than returning them as tokens (which makes parsing easier).

The next component is the parser which takes the tokenized input and matches it against the grammar of the language. In many cases the parser and lexical tokenizer work in parallel with the lexer returning one token at a time and the parser checking, accepting, rejecting, or skipping tokens in the input stream based on the langauges grammar. A good number of compilers use this step to create an abstract syntax tree. This is a tree of data structures representing the tokenized input in such a way that represents their position in the input stream.

Next you might run the abstract syntax tree through an optimizer that would either modify or remove certain nodes in the tree. This might be done to accomplish common subexpression elimination, constant folding, or loop unrolling depending on the type of target (VM or CPU).

From here you have several options to decide upon. If you are using an abstract syntax tree you might convert it into intermediate code that represents the operations in a sequential fashion rather than that of the AST. You also might generate assembly code or directly emit a binary object or java like class file.
Quote:
Should the compiler take my code and turn it into assembly code....Either way, that doesn't seem like the right thing to do.

This is a common approach and is by far one of the easier methods and quite acceptable. Compilers that don't use an external assembler to generate .obj or java like .class files generally generally have an internal component that does virtually the same thing.
Quote:
Also, I don't want to use those compiler design libraries. But anyway, how do the current compilers do it?

While you can certainly create all the lexical and grammar parsing parts of the compiler by hand I don't recommend it. Tools such as flex, bison, antlr, etc. are extrmely helpful and make langauge development much easier - especially since you will likely end up duplicating code in hand made parsers.
Quote:
So, in general, what is the compiler converting my code into. Say for example: x=10; x=x+1 -> What might this be converted into, in general?

Depending on the operations performed on x before and after the code given you might end up with something like this:

; unoptimized
mov dword ptr [ebp - 4], 10 ; set x = 10
mov eax, dword ptr [ebp - 4] ; get x
add eax, 1 ; add 1
mov dword ptr [ebp - 4], eax ; set x

; partial optimization
mov dword ptr [ebp - 4], 10 ; set x = 10
inc dword ptr [ebp - 4] ; do x = x + 1

For code generation look into using a burg implementation which, like bison and flex, takes an input file describing the operations to generate code for and automatically creates a code generator.

For an example of a compiler containing hand coded lexer and parser along with a burg code generator take a look at LCC.

Share this post


Link to post
Share on other sites
SiCrane    11839
Well, that's slightly debatable since the wikipedia link also has links to the various parts of compiler design inside wikipedia. :)

Share this post


Link to post
Share on other sites
Extrarius    1412
Well, better for GameDev (if you use the buy links) at least =-)

While on the topic of compiler books, are there any more modern books that are good? I recently got an email from amazon about a new book on virtual machines, and that caused me to look around a little. I found quite a few newer books on compiler creation (many of which seemed to be college textbooks books based on prices).
For example:

Virtual Machines : Versatile Platforms for Systems and Processes (The Morgan Kaufmann Series in Computer Architecture and Design) - Jim Smith; Hardcover
Constructing Language Processors for Little Languages - Randy M. Kaplan; Paperback
Compiler Construction: Principles and Practice - Kenneth C. Louden; Hardcover
Engineering a Compiler - Keith Cooper; Hardcover
Modern Compiler Design - D. Grune; Paperback
Optimizing Compilers for Modern Architectures : A Dependence-based Approach - Ken Kennedy; Hardcover
Advanced Compiler Design and Implementation - Steven Muchnick; Hardcover

None of the above books have enough reviews on Amazon to make an informed decision about. Have you read any of them, or do you know of a site that either has a large number of reviews or reviews from authorities? I find most review sites have reviews from people that are forced to use a book for a college course they aren't interested in and reviews from people that are greatly impressed by the book solely because they haven't bothered reading any other books on the subject.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this