ApochPiQ

Behold! The compiler which compiles itself(*)



Please, please, please tell me you have a script that combines all that code into one file and that you don't edit the 10k monstrosity as a whole!

 

But very cool. What kind of grammar is Epoch? Context-free? I ask because I'm studying compilers right now and am curious about these things. I tried to tell from your parsing code, but it's... too verbose for me :) Actually, shoot, I'm going to ask a bunch of compiler-related questions:

 

Do you do much static analysis or optimizing? How many passes over the program tree do you do, and how many different intermediate representations do you go through? What kind of code do you generate? x86? LLVM? And are there any other questions I should have asked but didn't?



Yes, compiling itself is a fine stunt (and shows a modicum of language versatility), but if your compiler turns out inefficient code, and its self-compiled runtime is used normally and is intended for projects with lots of code (to me, 100K actual code lines is not overly large), then that's a negative.

"Compiles to native code for maximum performance" does NOT mean it's anywhere near as efficient as what a well-developed optimizing compiler will produce (for the compiled runtime).

Your library generation likewise.

I seriously thought about writing a combiner script, but it honestly turns out that having everything in a monolithic file isn't all that bad. 10KLOC is large but not unmanageable, and it's actually easier to remember how to find things in a single file than you might expect. Of course, it helps that the Epoch implementation of the compiler is about 1/4 the size of the C++ implementation, so remembering things is 4 times easier to begin with ;-)


The grammar should be deterministic-context-free. I think. I'm honestly not strong enough in parser theory to prove it for sure, but that feels about right. The parser itself is just a DFA written in classic recursive-descent style, so based on my understanding of the parser/grammar categories, the grammar is DCF. I could be utterly wrong, though.
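The parser style described above can be pictured with a short sketch. This is a toy recursive-descent parser for arithmetic (not Epoch's actual grammar, purely an illustration): every branch is decided by a single token of lookahead, which is roughly the property that keeps a grammar in the deterministic (LL(1)) family.

```python
# Toy recursive-descent parser for: expr := term (('+'|'-') term)* ;
# term := NUMBER | '(' expr ')'. Illustrative only -- not Epoch's grammar.
import re

def tokenize(src):
    return re.findall(r"\d+|[()+\-]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        # One token of lookahead decides every branch below -- the
        # hallmark of a deterministic grammar.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return int(self.eat())

print(Parser(tokenize("1 + (2 + 3) - 4")).expr())  # prints 2
```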

This is actually only a chunk of the compilation process - a significant chunk, given that it produces bytecode for an abstract machine, but not everything. The remaining code (which is all C++) translates the bytecode emitted from this layer into LLVM bitcode which is then turned into native machine code more or less on the fly. I decided to do it this way because interfacing with LLVM from non-C++ languages is slightly less than a total nightmare, and this is already a big enough project.
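The "bytecode for an abstract machine" stage can be pictured as a simple stack machine. This is a hypothetical sketch (not Epoch's real instruction set) of the kind of intermediate form a front end might emit before a later layer mechanically lowers it to LLVM bitcode:

```python
# Hypothetical stack-machine bytecode, illustrating the sort of abstract
# machine a compiler front end might target. Not Epoch's instruction set.

def run(bytecode):
    stack = []
    for op, *args in bytecode:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

# (2 + 3) * 4 compiled to postfix-style stack code
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(run(program))  # prints 20
```

Lowering such code to LLVM is largely mechanical because each stack operation maps directly onto an SSA instruction.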

Static analysis is a big part of the language. The code above implements a (mostly) full type checker as well as limited type inference support. It also permits function overloading and pattern matching. (In fact, using pattern matching is why the code is so compact compared to the C++ version; certain forms of code become a lot more succinct when you can express them as pattern matches. As the pattern matching support gets richer it'll get even more compact.)

Optimization is basically left entirely to LLVM at this point, mainly because once you hit LLVM bitcode there's very few categories of optimizations that LLVM can't already do. A few high-level things are done by the compiler based on type information but that's about it. There will probably be a lot more type-informed optimization done later as things get richer in the language.

Right now the parser outputs everything in a single IR in the Epoch side of things. This IR is traversed and decorated by the compiler itself, using a rather ad hoc traversal pattern, but in essence every node of the IR is visited and processed exactly once, even though it may strictly be "visited" many times (and then the decorations used as a kind of cache to avoid recomputing type information, say). From there a final traversal of the IR outputs the abstract machine bytecode, which goes through a simple mechanical conversion to LLVM bitcode before being handed off to a suite of dozens of LLVM passes for optimization.
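The decorate-once idea can be sketched as a memoizing traversal: the first visit computes and stores a decoration on the node, and every later visit reuses it instead of recomputing. The node shapes here are hypothetical, not Epoch's real IR.

```python
# Sketch of decorating an IR: each node's type is computed once and the
# stored decoration serves as a cache on later visits.
# Hypothetical node shapes -- not Epoch's actual IR.

class Node:
    def __init__(self, kind, children=(), literal_type=None):
        self.kind = kind
        self.children = list(children)
        self.literal_type = literal_type
        self.inferred_type = None  # the "decoration"

def infer_type(node):
    if node.inferred_type is not None:  # decoration doubles as a cache
        return node.inferred_type
    if node.kind == "literal":
        node.inferred_type = node.literal_type
    elif node.kind == "add":
        types = {infer_type(c) for c in node.children}
        node.inferred_type = types.pop() if len(types) == 1 else "type-error"
    return node.inferred_type

tree = Node("add", [Node("literal", literal_type="integer"),
                    Node("literal", literal_type="integer")])
print(infer_type(tree))  # prints integer
print(infer_type(tree))  # second visit hits the cached decoration
```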


There are probably many other questions worth asking, although I'd be happy to ramble about my baby for a million years, so you're probably better off not asking them all unless you want an earful :-P


Yes, compiling itself is a fine stunt (and shows a modicum of language versatility), but if your compiler turns out inefficient code, and its self-compiled runtime is used normally and is intended for projects with lots of code (to me, 100K actual code lines is not overly large), then that's a negative.

"Compiles to native code for maximum performance" does NOT mean it's anywhere near as efficient as what a well-developed optimizing compiler will produce (for the compiled runtime).

Your library generation likewise.



I'm not entirely sure what most of this means, but it should be pointed out that I'm not writing my own native code generation or optimizations. I'm using LLVM for all that business.

For what it's worth, I wrote a raytracer in Epoch that marginally edges out a comparable C++ implementation for raw performance on my laptop.


I seriously thought about writing a combiner script, but it honestly turns out that having everything in a monolithic file isn't all that bad. 10KLOC is large but not unmanageable, and it's actually easier to remember how to find things in a single file than you might expect. Of course, it helps that the Epoch implementation of the compiler is about 1/4 the size of the C++ implementation, so remembering things is 4 times easier to begin with ;-)

How were you doing it in C++? Were you using Flex/Lex and Yacc/Bison at all?

The grammar should be deterministic-context-free. I think. I'm honestly not strong enough in parser theory to prove it for sure, but that feels about right. The parser itself is just a DFA written in classic recursive-descent style, so based on my understanding of the parser/grammar categories, the grammar is DCF. I could be utterly wrong, though.

Cool. Sounds context-free to me. I'm working on an implementation of parsing with derivatives in C++ for my undergraduate thesis, and assuming I actually finish it maybe I can use Epoch's grammar (or a subset of it) as a demo and case study.
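Parsing with derivatives generalizes Brzozowski's derivatives of regular expressions to context-free grammars. A minimal sketch of the regular-language case shows the core idea (the context-free version adds laziness and memoization on top of this):

```python
# Minimal Brzozowski-derivative matcher for regular languages -- the idea
# that parsing-with-derivatives extends to context-free grammars.
# Language encoding: None = empty language, "" = epsilon, a 1-char string
# = that character, ("alt", a, b), ("cat", a, b), ("star", a).

def nullable(r):
    if r is None or isinstance(r, str):
        return r == ""
    tag = r[0]
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return True  # star always accepts epsilon

def derive(r, c):
    if r is None or r == "":
        return None
    if isinstance(r, str):
        return "" if r == c else None
    tag = r[0]
    if tag == "alt":
        return ("alt", derive(r[1], c), derive(r[2], c))
    if tag == "cat":
        left = ("cat", derive(r[1], c), r[2])
        if nullable(r[1]):
            return ("alt", left, derive(r[2], c))
        return left
    return ("cat", derive(r[1], c), r)  # star unrolls one step

def matches(r, s):
    for ch in s:
        r = derive(r, ch)
    return nullable(r)

regex = ("cat", "a", ("star", ("alt", "b", "c")))  # a(b|c)*
print(matches(regex, "abcb"))  # prints True
print(matches(regex, "ba"))    # prints False
```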

All very cool. I'll see if I have more questions that come up during the week.

I used boost::spirit, which was a great rapid prototyping tool, but has long since become a liability. A while back I did some performance tuning to get it to suck less (about a 1000x speedup in the parser alone) but it's still pretty slow, clunky, and takes forever to compile.

For comparison, it takes about 20 minutes to do a full rebuild of the project in C++. It takes less than 200 milliseconds to build the entire Epoch implementation of the compiler.


It weighs in at just under 10,000 lines right now or I'd post it inline. You can see the evil monstrosity here and gawk in awe at its massive hideousness.

That's pretty damn sweet.

I really need to find the time to complete my little language project. You would not believe how much I regret writing the compiler in a dynamically-typed language...


 

I'm not entirely sure what most of this means, but it should be pointed out that I'm not writing my own native code generation or optimizations. I'm using LLVM for all that business.
For what it's worth, I wrote a raytracer in Epoch that marginally edges out a comparable C++ implementation for raw performance on my laptop.

 

OK, so your language is a front end using a relatively mature backend to do the code generation...


You are a hero. I've long given up self-hosting.

And to be honest, I'm considering giving up on context-free grammars as well. I'm not sure they buy me anything in the real world.
