Jump to content

  • Log In with Google      Sign In   
  • Create Account

The Bag of Holding

Self-hosting: major milestone on the path has been reached!

Posted by , 14 July 2013 - - - - - - · 747 views
I exulted over this rather vocally on Twitter, but in case you don't follow me (@ApochPiQ), I'll repeat it here:

At 12:15AM this morning, Epoch's compiler successfully generated itself, when provided with a decorated AST from the C++ compiler front-end.

What this means is that I have a fully working layer that translates the decorated Epoch AST into machine code (via transitional forms and LLVM, but that's just technical detail). The compiler was loaded as a plugin to the C++ compiler, compiled once using C++, and then the compiled version was executed on the compiler's source code.

The output is not bit-identical to the generated results of the C++ compiler, but it is functionally identical, which is more important.

The byte-level differences originate from things being reordered a bit during the handoff from the C++ side to the Epoch side (I blame unordered_map) and, curiously enough, from the C++ compiler emitting some vestigial bytecode that does nothing useful.

So the Epoch compiler is already better than the C++ one, in at least one trivial and irrelevant detail :-D

I've already started on porting the type system logic to the Epoch core compiler. After that milestone is reached, more compilation will be done in Epoch than in C++. Right now all the type system analyses and inference is being done by C++ code, but that is quickly changing.

One of the things that worried me the most was implementing a viable version of Shunting Yard for operator precedence reordering. Turns out that only took about an hour to port over and perfect, which is pretty cool.

The next big worrisome item is the process of overload resolution and expression type deduction, which are intimately tied together due to operators having overloads at the language level. (I still haven't decided if I want to expose operator overloading to the end programmer.)

That's a huge chunk of the C++ compiler by itself, and moving all the intricate details to Epoch will be tedious and finicky to say the least. I'm hesitant to make predictions on a timeframe because I have a feeling I'll need to step away from it more than once for extended periods to regain my sanity.

That said, with a couple of hacks and shortcuts in place, I did get two of the most trivial compiler tests to pass this afternoon using the new type system implementation (such as it is currently). So my end-of-year target for finishing all of self-hosting seems reachable.

As a minor recap, to get to full self-hosting, I need to port the type system and then also rewrite the entire parser. Redoing the parser is probably not going to be terribly hard, but it may take some time for the same reasons as above (i.e. I may need a lot of breaks). What'll be the most interesting aspect of it is getting rid of having any kind of codified grammar for the language; the implementation will be 100% ad hoc hand-rolled recursive descent, at least to start out. Writing an Epoch-compatible LR parser generator does not appeal to me right now.

So those are the happenings... I'm also racking up a huge mental checklist of things I want to improve on for the language as a whole, so if nothing else, this is a tremendous opportunity to dogfood my own work and figure out how to make it appealing as a larger ecosystem.

Epoch Self-Hosting Progress

Posted by , 06 July 2013 - - - - - - · 604 views
I've been working off and on for a few months now on creating a self-hosting compiler for the Epoch programming language.

In brief, this means that the language itself is robust enough to implement a complete compiler, which in turn is an implementation of the Epoch language. (Presently, Epoch is implemented by a compiler written in C++.)

Progress is steady, but the workload is immense. Instead of starting with a parser and gradually adding language features, I decided to take a somewhat reverse approach to the problem. The first phase was to remove the bytecode emitter from the C++ compiler and replace it with one implemented in Epoch; that was relatively easy and completed a couple of months ago.

The current phase involves replacing the entire code generation infrastructure with Epoch versions of the logic that currently exists in C++. In more classical terms, this is the entire back-end of the compiler: an Abstract Syntax Tree goes in, and binary executable code comes out.

As of tonight, all but 2 of the tests in the compiler test suite pass using the new back-end. There are two minor features left to implement (one test apiece): storing function pointers in structures, and value-based pattern matching. Once these are completed, the compiler will effectively be half Epoch and half C++.

After the back-end is replaced, there are two major phases remaining before the Epoch compiler is fully self-hosted. First and most importantly, I need to replace the type inference logic and the type system validation code. These are easily the most complex pieces of infrastructure in the existing compiler, so I fully anticipate that it will take a hefty amount of time to complete this phase.

Last but not least, once the type system is reimplemented in Epoch, it'll be time to redo the parser and AST generation code. This is kind of unfortunate given the amount of effort I've sunk into making the C++ parser fast, but the flip side is that I've got a lot of optimization tricks up my sleeve and a lot of solid code to iterate on.

The biggest change will be departing from the use of boost::spirit for parsing, which has been a mainstay of the Epoch parsing system since the early days. I'm not entirely sure I'll miss it, given the hideous compilation times and terrible error diagnostics. Hand-rolling a recursive-descent parser will give me much finer control over syntax error reporting, which is literally non-existent in the current compiler.

Right now the compiler and JIT code weigh in at about 37,000 lines of code. The Epoch compiler back-end, by way of contrast, is merely 3800 lines - a tenth of the code for roughly a third of the work. Adding some crucial language features and fixing some niggling bugs will probably serve to slice a decent chunk of that code away as well, meaning the final compiler will likely still be an order of magnitude more compact than the C++ implementation. Considering that the Epoch compiler source includes all the data structures and "standard" operations, that's pretty cool; if I threw the C++ standard library size into the mix, the actual code size would be far larger than the 37KLOC I've written myself.

Overall, the project is taking time, but shaping up nicely. In a few more days I should have all of the tests passing and the compiler generating itself from the back-end, at which point I'll move on to porting over the type system.

I'm mentally targeting the end of the year for reaching the fully self-hosted milestone. The amount of work involved in creating a new implementation of the type system is monumental, especially in a language with no standard library and plenty of weird runtime bugs left to squash. Rebuilding the parser is also a nontrivial amount of work, but shouldn't take more than a few weeks of careful effort to get at least most of the language supported.

And with that, it is time for some much-needed sleep. Stay tuned for more adventures!

ArenaNet Internships

Posted by , 13 June 2013 - - - - - - · 1,358 views
ArenaNet, internship and 1 more...
Wanting to break into the game industry but lacking vital work experience? Come check out ArenaNet's 2013 internship program.

  • Paid internship program
  • Do real jobs that are vital to company success, not busy-work
  • On-site in Bellevue, WA
  • One year duration

Hit the link to find out more.

Full speed ahead!

Posted by , 05 May 2013 - - - - - - · 370 views
My big project at the moment involves rewriting the Epoch language compiler for what feels like the millionth time.

The good news is, instead of yet another C++ incarnation of the compiler, this time around I'm moving towards a self-hosted model, where the compiler for Epoch is itself an Epoch program.

As I've detailed elsewhere, I decided to take a somewhat unconventional approach to this project. Instead of starting with a parser and building the compiler from the ground up, I'm creating a series of plugins that replace sections of the C++ compiler implementation, starting from the back-end and moving towards the lexer/parser as a final step.

The creation of bytecode streams has been ported for a while; currently, the big bulk of the work centers around turning the compiler's internal representation of a program into a sequence of calls into the bytecode emitter. There are two aspects to this that merit a little more detail. First, in order to do this, it's necessary to actually have an internal representation of programs within the Epoch compiler. This is actually the bulk of the work. Second, once the representation is close enough in features and robustness to the parallel C++ version, it should be possible to turn IR into bytecode using either Epoch or C++.

Actually creating the bytecode sequences is pretty trivial; it's just a matter of traversing the IR tree structures and popping out a few blurbs of bytecode for each node. What really gets complex is porting the complete tree structures to Epoch in the first place. Part of what makes it tricky is that the only data structure I'm actually using in the Epoch compiler port is a singly linked list. This isn't so bad once you get used to it, but it's still a mental shift and it's not the most efficient code in the world.

It may seem a bit weird to have such a constraint on the implementation; fact of the matter is, it would have taken considerable work to implement other data structures, because the way references work in Epoch doesn't easily allow for controlling the lifetime of objects, nor for re-seating references (i.e. making it impossible to do things like actual red/black trees, hashmaps, etc.). I wanted to move to self-hosting first for two reasons. First, this limits the amount of C++ code maintenance I have to do; and second, it makes the self-hosting process a little easier because there's less functionality to reimplement.

Right now, code generation in Epoch passes 33 of 62 compiler tests. Most of the remaining stuff has to do with automatically generated functions such as aggregate constructors, with a smattering of little bits and pieces here and there. I think, unfortunately, I've hit a point where it takes a lot of effort to make each additional test pass - the low-hanging fruit has been thoroughly scavenged.

On the plus side, I basically doubled the number of passing tests over a weekend, so there's that.

Once code generation is done, it'll be time to tackle semantic validation and type inference. Those two processes are deeply tied together in Epoch, and both operate in-place on the compiler's IR. Since code generation basically requires about 85% of the IR to be implemented, extending the data structures should take minimal effort. The bulk of the job will comprise algorithmic work to actually reimplement the validation and inference logic.

As with any software task, estimating the amount of effort it will take to finish the self-hosting port is... difficult, to say the least. The cool thing is that I already have a complete battery of compiler tests to throw at this, so it's pretty straightforward to pinpoint bugs in the new compiler implementation. The much less cool thing is that I don't have any real debugging tools for Epoch programs yet, so the stickier issues generally tend to require a lot of manual print() calls and sifting through logs.

That said, given the pace of development thus far and the mountain of work left in front of me (keep in mind that after semantic validation/type inference I still have to write a complete lexer and parser for the language) I think it's fair to say that I should hit self-hosting completion by the end of the year. I'm trying to pad that time generously so that when I inevitably get tired (or busy), I can leave off for a while without feeling like I'm going to slip a personal deadline.

If you're morbidly curious about what a self-hosted compiler in progress looks like, check out the source (just under 2000 lines as of now).

More self-hosting goodness

Posted by , 22 April 2013 - - - - - - · 607 views
Bytecode generation is done... more or less. I have a feeling there's a few instructions that are still missing but the compiler test suite passes so obviously there's plenty of stuff that does work...

Instead of spamming my journal here with noise about this every couple of days, I'm going to keep a running thought log over on the Epoch wiki.

Code generation from the IR is next, and it's scary.

Self-Hosting Progress

Posted by , 20 April 2013 - - - - - - · 587 views
As I've discussed previously, my goal for Epoch Release 15 is to get the compiler self-hosting. In a nutshell, that means that an Epoch program will be used to compile all other Epoch programs, including itself.

To do this, I'm working backwards from the compiler back-end first to the lexer/parser last. This allows me to retain all of the language's features at all times, without worrying about anything falling through the cracks along the way - which would be a serious concern if I rewrote everything from the front-end out.

Tonight I finished a nice milestone in the process: bytecode generation is now handled (mostly) by an Epoch program instead of a C++ program. This is probably the simplest piece of code to move over to Epoch because all it really amounts to is a bunch of byte sequences streamed into a buffer and then spat back out.

A handful of instructions aren't supported yet, but I'll be working on that over the weekend, and ideally sometime soon I'll have the entire compiler test suite passing with the new bytecode emitter.

Once that's in place, it's time to work on the code generator itself, which traverses the marked-up semantic IR of the language and generates bytecode streams from that. That's a much bigger piece of work and will take a lot longer, but it should be entertaining either way.

Epoch Plans for the Future

Posted by , 17 April 2013 - - - - - - · 1,210 views
Release 14 of the Epoch programming language is now live!

That brings us to the pertinent and slightly bothersome question: what will be worked on for Release 15?

There are a number of features I'm interested in improving and/or implementing, ranging from object lifetime semantics to parallelism functionality. Strictly speaking, any and all of these features could be done in the existing compiler/runtime framework in a relatively straightforward manner.

However, given the recent success of my aggressive push to destroy the Epoch VM, I'm rather in the mood to continue decreasing the amount of C++ code I have to support in the project going forward. The next obvious candidate for removal from the C++ pool is the compiler itself. I want to rewrite the compiler in Epoch and use that as the foundation for future feature work.

In other words, it's finally time to start working on self-hosting Epoch.

I'm pondering which direction to attack this from. On the one hand, I could start by writing a basic parser, parsing simple Epoch files, and then build out the back end and slowly re-add functionality. However, I've done multiple compiler rewrites in this fashion, and every time I do one, I lose fragments of functionality for a long time.

In the interests of avoiding this sort of regression, I'm seriously considering a new tactic. Instead of starting with a new compiler front-end, I'm thinking about starting with the back-end instead. The parser already interops with Epoch nicely (I use it for syntax highlighting in the Era IDE prototype) so I could potentially leave the parsing in C++ for a while and build a new code generation layer in Epoch.

Here's roughly how that would work:
  • Replace the bytecode emitter with an Epoch program
  • Replace the code generation layer with an IR traversal that shells out to an Epoch program which in turn uses the new bytecode emitter
  • Replace the AST->IR conversion with Epoch code
  • Replace the parser->AST generation layer with Epoch code
  • Finally, replace the parser itself
The advantages to this are twofold. First and most importantly, I never lose any existing Epoch features. Second and almost as interesting, I can release at any time once one of these steps is finished. This gives me a perfect opportunity to ensure the language doesn't move backwards in terms of features, while simultaneously moving it forwards toward the eventual goal of self-hosting.

Another perk of this approach is that if I do decide to add language features to help make self-hosting more practical, I can do so during any phase with minimal disruption to whatever code currently exists. I can also progressively delete code from the C++ codebase as I finish various steps instead of having to retain two independent "forks" of development.

All in all, I think this approach makes the most sense, and I'm already excited about the possibilities.

Once Epoch is self-hosting, the next step is to start on development tools. Era will be an interesting project for sure, given that my goal is to more or less invent a new IDE paradigm in the process of creating it. It's tempting to work on a debugger and other facilities prior to self-hosting, but the advantage of waiting is that I can prove that the entire tool-chain is self-contained end to end, rather than a hideous Frankenstein blend of Epoch and C++ components.

In some ways, I feel like self-hosting and creating a decent IDE for Epoch are the last two steps to having a project that is ready for serious prime-time usage. There's an almost infinite amount of refinement and feature work that can go into any language, obviously, but a language that can compile itself and offers robust tools for creating new software is hard to argue with.

It'll be fun to see if the rest of the programming world decides this language is as fun as I think it is :-)

Release 14 imminent...

Posted by , 14 April 2013 - - - - - - · 582 views
So I've been looking over my past notes and decided that I'm pretty happy with the state of affairs over in Epoch land. The garbage collector is running, tuned for reasonable performance, and successfully keeps the native-code realtime raytracer clamped at a decently small degree of RAM usage.

The only thing that really bugs me is that the GC has imposed a serious performance hit on the raytracer - to the tune of halving its old performance. I still need to do a full release build of the runtime to make sure I can't improve on that any, but I suspect the GC root spilling in LLVM is taking a toll on optimization passes. I might need to work on improving the front-end of the compiler to generate more optimal code in the first place so that I don't have to spill so many registers onto the stack all the time.

It'll take some analysis and study of the emitted machine code, but I'm optimistic about being able to reclaim at least some of the performance.

If nothing else, this is a good motivation for introducing a "manually reclaimed" semantic to object allocations in Epoch, so programs that don't want to incur GC overhead can bypass it entirely.

[As I wrote this, I kept alt-tabbing over to Visual Studio and mucking around. I managed to suppress garbage collection invocation for a large percentage of functions, which got me back to 21FPS - compared to 24FPS previously. I'm happy.]

Victory is mine!

Posted by , 14 April 2013 - - - - - - · 624 views
It was an epic fight, but I finally managed to subdue the last few LLVM garbage collection bugs and get a full test suite pass.

I'm kind of tired (it's 5:30AM and I've been running all night) so I'll try to reword that a little more clearly:

Epoch is, as of right now, passing all compilation and runtime tests (all 62 of them) with full, aggressive, and accurate garbage collection, running as 100% native code.

This support includes correct handling of circular references, correct handling of algebraic sum types (aka discriminated unions), correct reclamation of resources besides pointer types (strings and buffers, in Epoch parlance), and even works across multiple-dispatch calls.

I have a feeling there's some edge cases missing from the test suite, but for now, I'm damn happy with the results of the last several days of work. I figured on GC being a nightmare (and don't get me wrong - it was) but overall the experience has been completely worth it.

I still want to ponder submitting an LLVM patch to make all this less hackish, but I also sort of don't really care because my changes are really minimal and doing them "right" would involve a lot of heavy lifting. Maybe I'll tackle it some other time.

For now, I'm going to just sit here and bask in the glow of all 62 tests passing, and maybe drool on myself a little bit as I fall asleep.

January 2017 »

22 232425262728