Writing a compiler?

For my university project next year, I am planning to write a compiler for a language I created. At this point I guess I am a bit ahead of myself, but I have over a year to research the topic and understand it. I have a few years of C/C++ experience, half a year of C#, and I know a bit of assembly. I guess I will be writing the compiler itself in C#. My question is, does anyone know any resources (websites/books) that would help me? I guess I would need information about the Windows exe protocol (so I know exactly what makes up an executable), and how exactly I could convert my high-level statements into assembly. I have already done some research on tokenizers and how my code should be parsed; the area I think I am lost in is the step after that: actually converting to assembly and building an executable (although I have done exercises that involved converting loops and conditions into assembly etc.). Any help appreciated. [edited by - FearedPixel on March 4, 2003 7:46:06 AM]

Look here for some additional info:

http://www.scifac.ru.ac.za/compilers/conts.htm

You should probably check out the Lex & Yacc link in my sig too.

If it's an interpreted language, you should check out the scripting language tutorials here at GameDev, and I think they have one at www.flipcode.com too.

God speed.

#define Email Lex & Yacc Function Pointers Virtual Terrain Knowledge Base Real Programmers
"Evolution is NOT a mistake"

I agree that the Dragon Book (Aho et al.) is excellent.

I used it to write my first interpreter and it gave me (what I feel is) a very strong understanding of the underlying concepts behind compilers, languages, and language design.

I absolutely recommend checking it out of the library (80 dollars is steep for a book!).

Thanks everyone for the help.

I intend to make it compiled; however, if that turns out to be too difficult I will look into making it interpreted, and produce exes which are simply the interpreter with the program attached to the end of it.

Either way, I definitely want to be able to produce Windows executables.
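
The "interpreter with the program attached to the end" idea can be sketched as follows. This is only a minimal illustration under stated assumptions: the trailer layout (an 8-byte length followed by a marker) and the names `MAGIC`, `attach_program`, and `extract_program` are invented for this sketch, and the interpreter image is just a byte string here rather than a real executable. Self-extracting archives commonly rely on the same trick of appending data after the end of the executable image.

```python
import struct

# Hypothetical 8-byte marker chosen for this sketch; any fixed value would do.
MAGIC = b"SCRIPT01"
TRAILER_SIZE = 8 + len(MAGIC)  # 8-byte little-endian length + marker


def attach_program(interpreter_image: bytes, program: bytes) -> bytes:
    """Return the interpreter bytes followed by the program and a trailer."""
    trailer = struct.pack("<Q", len(program)) + MAGIC
    return interpreter_image + program + trailer


def extract_program(bundle: bytes) -> bytes:
    """Recover the attached program by reading the trailer at the end."""
    if len(bundle) < TRAILER_SIZE or bundle[-len(MAGIC):] != MAGIC:
        raise ValueError("no attached program found")
    (length,) = struct.unpack("<Q", bundle[-TRAILER_SIZE:-len(MAGIC)])
    start = len(bundle) - TRAILER_SIZE - length
    if start < 0:
        raise ValueError("corrupt trailer")
    return bundle[start:len(bundle) - TRAILER_SIZE]
```

At startup the interpreter would open its own executable file, call something like `extract_program` on its contents, and run the result. Putting the length at the end means the interpreter never needs to know its own size.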

quote:
Original post by FearedPixel
I guess I will be writing the compiler itself in C#.

Why? C# isn't noted for its text processing capabilities, nor for low-level operations (though you really don't need low-level operations to write a simple compiler; you do to write a competitive compiler, though). Believe it or not, Perl would be a better option to write a compiler in than C# (syntax validation, tokenization, pattern recognition, assembly source generation... all text operations, all easier/better done in Perl).

Not that you can't do it in C# or that C# is a poor choice. I'm just wondering if you're constrained by, say, the languages you know as opposed to the best tool for the job...

quote:
My question is, does anyone know any resources (websites/books) that would help me? I guess I would need information about the windows exe protocol (so I know exactly what makes up an executable), and how exactly I could convert my high-level statements into assembly.

I don't think you need to know anything about Windows' executable "protocol". All you need to know is how to generate correct (and preferably fast) assembly and then invoke the assembler.

quote:
I have already done some research in tokenizers and how my code should be parsed, the area which I think I am lost in is the step after that, actually converting to assembly and building an executable.

If you've parsed the code, then you've gathered the information necessary to determine the unambiguous intent of the code (if there's ambiguity not covered by the language definition, spit the code in the user's face) and can emit the appropriate assembly language constructs, taking care of issues like register allocation, the frame pointer and so forth. Personally, I dislike writing assembly (except on a SPARC!) so I would have written my compiler to emit C and then invoked GCC. That's how cfront (the original C++ frontend) worked.

Another thing you might want to consider is to define your language such that it can both be runtime interpreted (very useful during development/debugging) and compiled (usually smaller distributable, [marginally] faster execution), maybe even JIT compiled! Supporting reflection and introspection would also be cool, and very instructive.

Are you required to have designed the language, or can you extend/implement an existing language (e.g. adding multipass reference resolution to C++, maybe packages too, to eliminate #includes or at least eliminate the need for forward referencing, or write a native compiler for Python)?
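
The "emit C and invoke GCC" route can be sketched roughly like this. It is a minimal illustration, not how cfront was implemented: the node classes `Num` and `BinOp` and the functions `emit_expr` and `emit_program` are made up for this sketch, and the "language" is just integer arithmetic.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str  # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def emit_expr(node: Expr) -> str:
    """Translate an expression tree into a fully parenthesised C expression."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, BinOp):
        return f"({emit_expr(node.left)} {node.op} {emit_expr(node.right)})"
    raise TypeError(f"unknown node: {node!r}")


def emit_program(expr: Expr) -> str:
    """Wrap the expression in a C main() that prints its value."""
    return (
        "#include <stdio.h>\n"
        "\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {emit_expr(expr)});\n'
        "    return 0;\n"
        "}\n"
    )
```

Writing the returned string to a file such as `out.c` and running `gcc out.c -o out` would finish the job. Parenthesising every subexpression sidesteps operator precedence in the generated code, and the C compiler takes over register allocation and the executable format.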

Check out flex++/bison++; they are good tools for creating scanners and parsers (things that all compilers need).

I'm in a compiler course right now, and we use these tools extensively. They wouldn't be useful for a C# project, however, but maybe C# has something similar.

Doesn't your uni have a compilers class/set of classes? These were pretty much the most difficult offered at mine, but were also some of my favorites (definitely in my top 5 classes).

Can you work with a partner? Just talking about compiler issues helps tremendously; I can't imagine doing it all on your own the first time around without having someone to bounce ideas and designs off of.

DESIGN DESIGN DESIGN. If you fail with your design, your compiler will fail. Period. You will not be able to do anything "tricky" in the later stages of compiling if your design sucks.

Get the Dragon book, as mentioned.

Check out Lex and Yacc, as mentioned.

You wrote the language? Do you have a grammar for it? You will need it. Do that.
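
To show why the grammar comes first, here is a small sketch of a recursive-descent parser written directly from a grammar. The three-rule arithmetic grammar in the comment is an assumed toy example, not the original poster's language, and `Parser` and `parse` are names invented for this sketch.

```python
import re

# Toy grammar assumed for this sketch:
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Parse text into nested (op, left, right) tuples with int leaves."""
    # Numbers, or any single non-blank character; bad characters are
    # rejected later by factor().
    parser = Parser(re.findall(r"\d+|\S", text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Each grammar rule becomes one method, so a change to the grammar maps onto a local change in the parser. Tools like Lex and Yacc generate the equivalent code from the grammar file instead.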

I would think that you would be much, much better off taking compiler courses if offered; without some lecture notes to get me started designing in the right direction, I don't think I would have ever finished.

Let me get this straight... the rest of this year and all of next year? If you are by yourself, never having done this before, and do not have an instructor teaching the material and forcing deadlines on you... you have just about the right amount of time. Don't wait. It will kill you. I found slacking off and getting by was possible in most CS classes. Not so with this one. I am so glad I took it seriously.

Maybe some of the aforementioned scripting books would be a good investment. I am actually interested in picking up Game Scripting Mastery, more so now.

Out of curiosity, why the fascination with the Windows exec? (I did all of my dev under Solaris, but then I actually wrote the compiler in Java)

Good luck. Compilers rule. It is just daunting your first time around.

The Tyr project is here.

Sorry, sadly I lost my favorite book; all I can tell you is that it goes from parsing to machine code on a pretend language "c-".

I'll find it later, maybe.

The Dragon Book is very high level and takes a top-down approach, but if you learn better from code, this book takes a bottom-up approach.

Here are two other, absolutely fantastic books on compilers:

Programming Language Processors in Java: Compilers and Interpreters
ISBN#: 0130257869

Modern Compiler Design
ISBN#: 0471976970


The first book listed offers a fantastic introduction to writing compilers and interpreters; easily the best compiler book I've ever come across. The second book is more advanced and a bit harder to read but very informative as well.




[edited by - Digitalfiend on March 4, 2003 11:48:56 AM]

Also, C# is a perfectly fine language for writing compilers. It might not be the best choice (I'd opt for C++) but contrary to what some believe, C# has very strong text manipulation functionality (a lot of which comes from the .NET Framework).

I've already written two compilers (Pascal and C), both based on Aho's Dragon Book. It's probably one of the best, if not the best, around. You will need to know something about the executable format of the operating system you will use. I recommend targeting ELF (BSD and Linux), since it is the most well documented.

Also, I don't think C# is a good choice. A compiler is not the kind of thing where pure OOP shines most (not that it doesn't work; it works, but it doesn't improve things much), and memory management is sometimes critical (too critical to leave to a garbage collector). C++ is a good choice for data structures and fast memory control. Lisp would also be a good choice, since it works pretty much like the syntactic analyzer does, so a syntax-directed compiler would be very fast to build.

To those who don't believe my statements about speed: my classmates wrote the same compiler in Java, Smalltalk, and C#. Some colleagues and I wrote ours in C and C++. The C and C++ versions were 7 to 12 times faster than the fastest of the others.

quote:
Original post by dede
Sorry, sadly I lost my favorite book; all I can tell you is that it goes from parsing to machine code on a pretend language "c-".

If the book you're talking about used Tiny as well as C-- as the example languages, then I think you mean "Compiler Construction: Principles and Practice", by Kenneth C. Louden. It was the course book for my compiler course, and is very good.

HTH,
Andrew

Thanks for the replies. I managed to get hold of the 'Dragon' book from my library, and started reading it.

My reasons for using C#:
1. Experience: I want to improve in it. I feel I know the language quite well, however I have not used it much.
2. Fast development time, although this might not apply as much to low-level projects such as a compiler.
3. I don't know Perl at all, though I'm sure it would be great for my needs.
4. OO seems to be quite useful in compiler writing from some research I've done; Portable.NET is written in C# afaik (maybe it's some other C# port, not Portable.NET, can't remember clearly).

In my library, there are about 20 books on compiler design/development, so I will look through them. I have plenty of time (I think/hope) so I might just be able to get something done. It is for a language I designed.

quote:
Original post by Oluseyi
Not that you can't do it in C# or that C# is a poor choice. I'm just wondering if you're constrained by, say, the languages you know as opposed to the best tool for the job...

The only thing Perl is the best choice for is obfuscation.
.NET has a set of very good regular expression classes, offering the syntax of Perl 5 regexes without the atrocious ugliness of Perl.
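
As an illustration of regex-driven tokenizing, here is a minimal sketch. It uses Python's `re` module as a stand-in for the .NET regular expression classes (both support named groups); the token categories in `TOKEN_SPEC` and the names `Token` and `tokenize` are assumptions made for this sketch.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# Order matters: earlier alternatives win, and ERROR catches whatever is left.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens in source order, skipping whitespace."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        yield Token(kind, match.group(), match.start())
```

Combining all token patterns into one alternation with named groups lets the regex engine do the dispatch. Keeping the offset on each token makes later error messages much easier to produce.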



"If there is a God, he is a malign thug."
-- Mark Twain


I'm currently following a compiler course:
http://www.it-c.dk/courses/PFOO/F2003/index.html

The teacher makes some very good slides (under Lecture plan) and has lots of example code in SML.
Btw, using a functional language (like SML (which I personally like), Lisp or Perl etc.) is a really good idea when writing a compiler. The development time will be shortened A LOT compared to using an imperative language! (Just thought I'd mention it, because you have development time as one of your reasons to use C#.)
(If you doubt me, take a look at some of those slides and see how small the abstract syntax datatype and the interpreter ("fun eval") are.)
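
For a rough idea of how small an abstract syntax datatype and its evaluator can be, here is a sketch. It is not taken from the course slides and is written in Python rather than SML; the node types (`Num`, `Var`, `Add`, `Mul`, `Let`) and the `evaluate` function are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


@dataclass
class Let:
    name: str     # variable being bound
    bound: object  # expression giving its value
    body: object   # expression in which the variable is visible


def evaluate(expr, env=None):
    """Evaluate expr in environment env (a dict from names to ints)."""
    env = {} if env is None else env
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        if expr.name not in env:
            raise NameError(f"unbound variable {expr.name!r}")
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    if isinstance(expr, Let):
        inner = dict(env)  # copy so the binding stays local to the body
        inner[expr.name] = evaluate(expr.bound, env)
        return evaluate(expr.body, inner)
    raise TypeError(f"unknown expression: {expr!r}")
```

In SML the same thing is a single `datatype` declaration plus one function with a pattern-matching clause per constructor, which is where much of the brevity comes from.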

Wow, thanks everyone.

The way I am planning to do it now is that my compiler will compile to assembly. I guess that initially (and possibly permanently) I will use a third-party assembler to then go from assembly to machine code (.exe). To implement the various built-in language functions like printing to screen, reading input etc., I will create DLL files in C that contain the functions for my standard commands. So basically, when such a command is encountered by my compiler, the generated ASM will call the associated function from the libraries (DLLs) that I wrote in C.

Can anyone recommend an assembler that would be appropriate for me? I know MASM has a lot of macros and other functions; however, because the aim of this project is to learn, and it is a piece of coursework, I would ideally aim to compile to low-level ASM, rather than the macro-enhanced high-level one.

At the moment I just need to assemble console exes that work under Windows, and then eventually actual Windows exes.
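
A rough sketch of that plan, generating assembly text that calls into a C runtime library: it assumes 32-bit x86, NASM-style syntax, and the cdecl calling convention (arguments pushed on the stack, caller cleans up, symbols prefixed with an underscore). The runtime function `_rt_print_int` is hypothetical, standing in for a C function such as `void rt_print_int(int)`, and the tuple-based expression format and both generator functions are invented for this sketch. Linking against the DLL is not shown.

```python
def gen_expr(node, out):
    """Append stack-machine code that leaves the value of node on the stack.

    node is either an int or a tuple (op, left, right) with op in "+-*".
    """
    if isinstance(node, int):
        out.append(f"    push dword {node}")
        return
    op, left, right = node
    gen_expr(left, out)
    gen_expr(right, out)
    out.append("    pop ecx")  # right operand (ecx is caller-saved in cdecl)
    out.append("    pop eax")  # left operand
    if op == "+":
        out.append("    add eax, ecx")
    elif op == "-":
        out.append("    sub eax, ecx")
    elif op == "*":
        out.append("    imul eax, ecx")
    else:
        raise ValueError(f"unsupported operator {op!r}")
    out.append("    push eax")


def gen_print_program(expr):
    """Return assembly source for a main() that prints the value of expr."""
    out = [
        "global _main",
        "extern _rt_print_int",  # hypothetical runtime function written in C
        "section .text",
        "_main:",
    ]
    gen_expr(expr, out)
    out += [
        "    call _rt_print_int",  # the argument is already on the stack
        "    add esp, 4",          # cdecl: the caller removes the argument
        "    xor eax, eax",        # return 0 from main
        "    ret",
    ]
    return "\n".join(out) + "\n"
```

Pushing every intermediate result is slow but needs no register allocation at all, which makes it a reasonable first code generator. Smarter register use can replace it later without touching the front end.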

Again, help is much appreciated.

quote:
Original post by FearedPixel
Can anyone recommend an assembler I should use that would be appropriate for me?


Take a look at NASM.

quote:
At the moment I just need to assemble console exes that work under Windows, and then eventually actual Windows exes.


A "console exe" IS an actual Windows executable.


