Epoch Self-Hosting Progress
I've been working off and on for a few months now on creating a self-hosting compiler for the Epoch programming language.
In brief, this means that the language itself is robust enough to implement its own compiler: an Epoch compiler written in Epoch. (Presently, Epoch is implemented by a compiler written in C++.)
Progress is steady, but the workload is immense. Instead of starting with a parser and gradually adding language features, I decided to take a somewhat reverse approach to the problem. The first phase was to remove the bytecode emitter from the C++ compiler and replace it with one implemented in Epoch; that was relatively easy and completed a couple of months ago.
The current phase involves replacing the entire code generation infrastructure with Epoch versions of the logic that currently exists in C++. In more classical terms, this is the entire back-end of the compiler: an Abstract Syntax Tree goes in, and binary executable code comes out.
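To make the "AST in, code out" shape concrete, here is a minimal sketch in Python rather than Epoch. Everything in it is an assumption for illustration: the node types (Literal, BinaryOp), the opcode names (PUSH, ADD, MUL), and the tiny stack machine are invented and do not reflect Epoch's real AST or bytecode. It only shows the general technique of walking a tree in post-order and emitting instructions.

```python
from dataclasses import dataclass


# Hypothetical AST node types, invented for this sketch.
@dataclass
class Literal:
    value: int


@dataclass
class BinaryOp:
    op: str        # "+" or "*"
    left: object
    right: object


# Invented opcode names; not Epoch's actual instruction set.
OPCODES = {"+": "ADD", "*": "MUL"}


def generate(node, out=None):
    """Post-order walk: emit code for both operands, then the operator."""
    if out is None:
        out = []
    if isinstance(node, Literal):
        out.append(("PUSH", node.value))
    elif isinstance(node, BinaryOp):
        generate(node.left, out)
        generate(node.right, out)
        out.append((OPCODES[node.op], None))
    else:
        raise TypeError(f"unsupported AST node: {node!r}")
    return out


def run(code):
    """Toy stack machine, used only to check the emitted instructions."""
    stack = []
    for opcode, operand in code:
        if opcode == "PUSH":
            stack.append(operand)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return stack.pop()


# 2 + 3 * 4, already parsed into a tree
tree = BinaryOp("+", Literal(2), BinaryOp("*", Literal(3), Literal(4)))
code = generate(tree)
```

Running the emitted list through run() is a cheap way to check that the generator preserves evaluation order. A real back-end additionally has to deal with types, control flow, and an actual binary encoding, none of which this sketch attempts.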
As of tonight, all but two of the tests in the compiler test suite pass using the new back-end. Two minor features remain to be implemented, each covered by one of the failing tests: storing function pointers in structures, and value-based pattern matching. Once these are completed, the compiler will effectively be half Epoch and half C++.
After the back-end is replaced, there are two major phases remaining before the Epoch compiler is fully self-hosted. First and most importantly, I need to replace the type inference logic and the type system validation code. These are easily the most complex pieces of infrastructure in the existing compiler, so I fully anticipate that it will take a hefty amount of time to complete this phase.
Last but not least, once the type system is reimplemented in Epoch, it'll be time to redo the parser and AST generation code. This is kind of unfortunate given the amount of effort I've sunk into making the C++ parser fast, but the flip side is that I've got a lot of optimization tricks up my sleeve and a lot of solid code to iterate on.
The biggest change will be departing from the use of boost::spirit for parsing, which has been a mainstay of the Epoch parsing system since the early days. I'm not entirely sure I'll miss it, given the hideous compilation times and terrible error diagnostics. Hand-rolling a recursive-descent parser will give me much finer control over syntax error reporting, which is nonexistent in the current compiler.
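As a rough illustration of why a hand-rolled recursive-descent parser makes error reporting easier, here is a minimal sketch in Python. The grammar (integers, +, *, parentheses) is a toy and is not Epoch's syntax. The names ParseError, Token, tokenize, and Parser are all invented for this example. The point is that each parsing function knows exactly what it expected and where it was looking when things went wrong.

```python
from dataclasses import dataclass


class ParseError(Exception):
    """Syntax error that remembers where parsing failed."""

    def __init__(self, message, line, column):
        super().__init__(f"{line}:{column}: {message}")
        self.line = line
        self.column = column


@dataclass
class Token:
    kind: str      # "num", "op", or "eof"
    text: str
    line: int
    column: int


def tokenize(source):
    """Split source into tokens, tracking 1-based line and column."""
    tokens, line, column, i = [], 1, 1, 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line, column, i = line + 1, 1, i + 1
        elif ch.isspace():
            column, i = column + 1, i + 1
        elif ch.isdecimal():
            start = i
            while i < len(source) and source[i].isdecimal():
                i += 1
            tokens.append(Token("num", source[start:i], line, column))
            column += i - start
        elif ch in "+*()":
            tokens.append(Token("op", ch, line, column))
            column, i = column + 1, i + 1
        else:
            raise ParseError(f"unexpected character {ch!r}", line, column)
    tokens.append(Token("eof", "", line, column))
    return tokens


class Parser:
    def __init__(self, source):
        self.tokens = tokenize(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def fail(self, expected):
        """Report what was expected and what was actually found, with position."""
        token = self.peek()
        found = "end of input" if token.kind == "eof" else repr(token.text)
        raise ParseError(f"expected {expected}, found {found}", token.line, token.column)

    def parse(self):
        value = self.expression()
        if self.peek().kind != "eof":
            self.fail("operator or end of input")
        return value

    def expression(self):
        # expression := term ('+' term)*
        value = self.term()
        while self.peek().text == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):
        # term := factor ('*' factor)*
        value = self.factor()
        while self.peek().text == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expression ')'
        token = self.peek()
        if token.kind == "num":
            self.advance()
            return int(token.text)
        if token.text == "(":
            self.advance()
            value = self.expression()
            if self.peek().text != ")":
                self.fail("')'")
            self.advance()
            return value
        self.fail("number or '('")
```

For instance, Parser("2 + 3 * (4 + 1)").parse() evaluates the expression directly, while Parser("1 +\n  * 2").parse() raises a ParseError pointing at line 2, column 3, where the stray * sits. A production parser would build AST nodes instead of evaluating on the fly, and would probably try to recover and report more than one error per run.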
Right now the C++ compiler and JIT code weigh in at about 37,000 lines of code. The Epoch compiler back-end, by way of contrast, is merely 3,800 lines: about a tenth of the code for roughly a third of the work. Adding some crucial language features and fixing some niggling bugs will probably serve to slice a decent chunk of that code away as well, meaning the final compiler will likely still be an order of magnitude more compact than the C++ implementation. Considering that the Epoch compiler source includes all the data structures and "standard" operations, that's pretty cool; if I threw the C++ standard library size into the mix, the actual code size would be far larger than the 37 KLOC I've written myself.
Overall, the project is taking time, but shaping up nicely. In a few more days I should have all of the tests passing and the compiler generating itself from the back-end, at which point I'll move on to porting over the type system.
I'm mentally targeting the end of the year for reaching the fully self-hosted milestone. The amount of work involved in creating a new implementation of the type system is monumental, especially in a language with no standard library and plenty of weird runtime bugs left to squash. Rebuilding the parser is also a nontrivial amount of work, but shouldn't take more than a few weeks of careful effort to get at least most of the language supported.
And with that, it is time for some much-needed sleep. Stay tuned for more adventures!