The Making of Epoch, Part 3 - Prototyping
The Making of a Pragmatic Programming Language for the Future
Part 3: Prototyping
The first part of creating any major project is prototyping, or at least, that's how it should be. When first embarking on an undertaking like building a major piece of software, it is essential to scope out the size and shape of the problem, gain an intuitive understanding of what the hard parts will be, and get a basic feel for how things need to come together and what kind of organization "feels right" for the code.
Apprentice carpenters may make dozens of small things - jewelry boxes, stools, simple tables, and so on - before getting into fancy cabinetry or large structural frameworks. In the same way, Epoch needed some practice work before the big problems could be tackled. I needed to do some rough sketches before painting the masterpiece, as it were.
In large part, Epoch is still in the prototype phase. The compiler is not self-hosting, the VM is primitive at best, and the tools are meager, to put it mildly. There remains a vast amount of work to be done before the language is ready for prime time.
However, this is also something of an advantage. Epoch as a language is still very much just a concept floating in my brain. Much of the syntax is totally undesigned, and many critical features, like the contract system, simply don't exist at all beyond a vague notion of how I'd like them to work. As such, flexibility is key. The syntax of the language is in flux, and the semantics of many of its features are far from etched in stone. Keeping things at the prototype level matters because it lets me change the things that really ought to be changed.
For example, many of the releases of the language have actually regressed in terms of feature support. This is because the core implementation is still undergoing a lot of refinement and improvement. Things like automatic cross-compilation to CUDA have been pulled back out of the language because, beyond the initial proof of concept, they're not ready to be permanent features yet. There are better ways to implement them, and taking advantage of those opportunities takes time.
To illustrate, let's take a look at the (rough) history of how Epoch's implementation has evolved.
Stage One: Test-Driven Hackery
The first thing I did in working on Epoch was to build a basic execution model. This consisted of a highly idealized environment where every program operation could be represented by a class. Each operation derived from an abstract base class which offered a simple interface consisting of a single method: Execute().
Need some arithmetic operators? Derive a class for each one, and override Execute() on it. Implement the guts, and away you go. Want functions? Bind a group of operation objects into a container, then implement Execute() so that it iterates over the container and invokes Execute() in turn on each operation. How about flow control? The process is similar.
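To make the operation-object model concrete, here is a minimal C++ sketch of it. It is not Epoch's actual code: the class names (`Operation`, `PushIntegerLiteral`, `AddIntegers`, `MultiplyIntegers`, `Block`) are hypothetical, and the `ExecutionContext` is simplified to a single stack of ints.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Simplified, hypothetical execution context: a single stack of ints.
struct ExecutionContext {
    std::vector<int> Stack;

    void Push(int value) { Stack.push_back(value); }

    int Pop() {
        assert(!Stack.empty() && "stack underflow");
        int value = Stack.back();
        Stack.pop_back();
        return value;
    }
};

// Abstract base class: every program operation exposes one method, Execute().
class Operation {
public:
    virtual ~Operation() = default;
    virtual void Execute(ExecutionContext& context) = 0;
};

// Pushes a constant value onto the stack.
class PushIntegerLiteral : public Operation {
public:
    explicit PushIntegerLiteral(int value) : Value(value) {}
    void Execute(ExecutionContext& context) override { context.Push(Value); }
private:
    int Value;
};

// One derived class per arithmetic operator: pop two operands, push the result.
class AddIntegers : public Operation {
public:
    void Execute(ExecutionContext& context) override {
        int rhs = context.Pop();
        int lhs = context.Pop();
        context.Push(lhs + rhs);
    }
};

class MultiplyIntegers : public Operation {
public:
    void Execute(ExecutionContext& context) override {
        int rhs = context.Pop();
        int lhs = context.Pop();
        context.Push(lhs * rhs);
    }
};

// A function body is just a container of operations, executed in order.
class Block : public Operation {
public:
    void Add(std::unique_ptr<Operation> op) { Operations.push_back(std::move(op)); }

    void Execute(ExecutionContext& context) override {
        for (auto& op : Operations)
            op->Execute(context);
    }
private:
    std::vector<std::unique_ptr<Operation>> Operations;
};
```

To compute `(2 + 3) * 4`, you add two literal pushes, an `AddIntegers`, another literal push and a `MultiplyIntegers` to a `Block`, then call `Execute()` on the block. The single value left on the stack is 20. Because `Block` is itself an `Operation`, function bodies and flow-control constructs can nest arbitrarily.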
Program state was kept in two parts, much like in any other language: a stack, on which functions placed their internal variables and ultimately their return values, and a free store, which was basically just a giant pool of memory with rudimentary handles used to access its various blocks. This context was wrapped in two classes, one for each storage type, and passed, along with some other basic data, to every Execute() call.
Epoch began as a set of unit tests for executing some of these simple operations, predominantly arithmetic, and ensuring that the stack or free store contained the right data afterwards. The test harness was crude at best, but effective; within a few days I had a working core of a language.
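The two-part program state and the kind of unit test described above might look roughly like the following. This is a hedged sketch, not the original code. `StackSpace`, `FreeStore`, `Handle` and `TestSubtraction` are illustrative names. The stack holds only ints, and the free store simply maps integer handles to byte buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stack storage: function locals and return values (ints only, for brevity).
class StackSpace {
public:
    void Push(int value) { Values.push_back(value); }
    int Pop() {
        assert(!Values.empty() && "stack underflow");
        int value = Values.back();
        Values.pop_back();
        return value;
    }
    int Top() const { assert(!Values.empty()); return Values.back(); }
    std::size_t Size() const { return Values.size(); }
private:
    std::vector<int> Values;
};

// Free store: a pool of memory blocks reached only through opaque handles.
using Handle = std::uint32_t;

class FreeStore {
public:
    Handle Allocate(std::size_t bytes) {
        Handle handle = NextHandle++;
        Blocks[handle] = std::vector<unsigned char>(bytes, 0);
        return handle;
    }
    std::vector<unsigned char>& Access(Handle handle) {
        auto it = Blocks.find(handle);
        assert(it != Blocks.end() && "invalid handle");
        return it->second;
    }
    void Free(Handle handle) { Blocks.erase(handle); }
    bool IsValid(Handle handle) const { return Blocks.count(handle) != 0; }
private:
    Handle NextHandle = 1;
    std::unordered_map<Handle, std::vector<unsigned char>> Blocks;
};

// Everything an Execute() call needs, bundled and passed by reference.
struct ExecutionContext {
    StackSpace Stack;
    FreeStore Heap;
};

class Operation {
public:
    virtual ~Operation() = default;
    virtual void Execute(ExecutionContext& context) = 0;
};

class PushInteger : public Operation {
public:
    explicit PushInteger(int value) : Value(value) {}
    void Execute(ExecutionContext& context) override { context.Stack.Push(Value); }
private:
    int Value;
};

class SubtractIntegers : public Operation {
public:
    void Execute(ExecutionContext& context) override {
        int rhs = context.Stack.Pop();
        int lhs = context.Stack.Pop();
        context.Stack.Push(lhs - rhs);
    }
};

// A unit test in the spirit of the early harness: run a few operations,
// then check what was left on the stack.
bool TestSubtraction() {
    ExecutionContext context;
    PushInteger(10).Execute(context);
    PushInteger(4).Execute(context);
    SubtractIntegers().Execute(context);
    return context.Stack.Size() == 1 && context.Stack.Top() == 6;
}
```

Tests like `TestSubtraction` need no parser at all. They exercise the execution model directly, and that is what makes it possible to build the core before any syntax exists.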
Stage Two: Parsing
At this point, it was time to add some syntax: actual code you could write that would map to these operation objects. It was possible to build simple programs by manually creating the tree of operations involved, but doing so was awkward and clumsy, and it still meant writing everything in C++. I began examining my options for parsers.
To make a long and mostly dull story short, I decided against ANTLR at the last possible moment and chose boost::spirit instead. This was inspired entirely by laziness; I didn't want to have to run an external tool every time I tweaked my language grammar, because I planned on doing a lot of tweaking. Turnaround time was key, and that meant keeping everything in one place as much as possible. Spirit offered exactly that: specify your grammar using C++ syntax (heavily abusing operator overloading), and it magically turns into a working parser.
As it turned out, this was a very fortunate choice, but we'll get back to that later.
In the original Spirit-based implementation, everything was done by tagging bits of the grammar with "semantic actions." These were basically functions or function objects that allowed the parser to invoke code as it recognized and consumed various tokens. It's a simple but effective model for small grammars; for a complete language like Epoch, it was a formidable mess.
Epoch actually had a brief stint as an interpreted language, believe it or not. I rigged up a basic grammar that would, as it was parsed, semi-execute fragments of code. It was a terrible hack and led down a road that wound up being a bit of a mistake in the long term, but it inspired me to continue, and in that regard served a very important purpose.
Soon, though, interpretation was replaced by proper conversion of the syntax into operation objects. There was no abstract syntax tree; code simply went into the parser, and via semantic actions, the parser constructed operation objects out of thin air as it went along.
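The following sketch shows the general shape of a parser that builds operation objects directly from semantic actions, with no AST in between. It deliberately does not use boost::spirit. A small hand-written recursive-descent parser stands in for it, every class name is hypothetical, and the grammar covers only integer literals, `+` and `*`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Minimal stack-based operation objects (hypothetical, simplified).
struct Operation {
    virtual ~Operation() = default;
    virtual void Execute(std::vector<int>& stack) = 0;
};

struct PushLiteral : Operation {
    explicit PushLiteral(int v) : Value(v) {}
    void Execute(std::vector<int>& stack) override { stack.push_back(Value); }
    int Value;
};

struct BinaryOp : Operation {
    explicit BinaryOp(char op) : Op(op) {}
    void Execute(std::vector<int>& stack) override {
        int rhs = stack.back(); stack.pop_back();
        int lhs = stack.back(); stack.pop_back();
        stack.push_back(Op == '+' ? lhs + rhs : lhs * rhs);
    }
    char Op;
};

using OperationList = std::vector<std::unique_ptr<Operation>>;

// Hand-rolled parser (NOT boost::spirit). Each rule fires a "semantic action"
// the moment it recognizes something, appending an operation object directly.
class ActionParser {
public:
    ActionParser(const std::string& text, OperationList& out) : Text(text), Out(out) {}

    bool ParseProgram() {
        if (!ParseSum()) return false;
        SkipSpaces();
        return Pos == Text.size();
    }

private:
    // sum := product ('+' product)*
    bool ParseSum() {
        if (!ParseProduct()) return false;
        while (Match('+')) {
            if (!ParseProduct()) return false;
            Out.push_back(std::make_unique<BinaryOp>('+'));   // semantic action
        }
        return true;
    }
    // product := number ('*' number)*
    bool ParseProduct() {
        if (!ParseNumber()) return false;
        while (Match('*')) {
            if (!ParseNumber()) return false;
            Out.push_back(std::make_unique<BinaryOp>('*'));   // semantic action
        }
        return true;
    }
    bool ParseNumber() {
        SkipSpaces();
        std::size_t start = Pos;
        while (Pos < Text.size() && std::isdigit(static_cast<unsigned char>(Text[Pos]))) ++Pos;
        if (Pos == start) return false;
        int value = std::stoi(Text.substr(start, Pos - start));
        Out.push_back(std::make_unique<PushLiteral>(value));   // semantic action
        return true;
    }
    bool Match(char c) {
        SkipSpaces();
        if (Pos < Text.size() && Text[Pos] == c) { ++Pos; return true; }
        return false;
    }
    void SkipSpaces() {
        while (Pos < Text.size() && std::isspace(static_cast<unsigned char>(Text[Pos]))) ++Pos;
    }

    std::string Text;
    OperationList& Out;
    std::size_t Pos = 0;
};
```

Parsing `1 + 2 * 3` emits operations in postfix order (push 1, push 2, push 3, multiply, add), and these evaluate to 7. A failed parse such as `1 + ` still leaves the already-built push for `1` in the output list. That hints at why backtracking and error recovery get painful when construction happens inside the actions themselves.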
This approach was very problematic. It was slow, error prone, involved a lot of boilerplate code, and most of all, it didn't scale well. As the language features got more complex, it became increasingly difficult to extend the semantic action based model. Backtracking and handling syntax errors were a nightmare, riddled with terrible exception handlers and a thoroughly inscrutable execution flow.
Sadly, it worked - which meant that I didn't feel the urge to fix it and do it "right" for quite a while. So the ugly parser structure remained up until Release 11.
Stage Three: Virtual Machinery
At this point, I grew tired of reparsing the programs every time I wanted to test them. It was also painfully difficult to diagnose problems in the compiler, because it only generated a bunch of operation objects in memory; there was no way to look at the objects easily outside of the debugger.
This motivated the next major phase of Epoch's growth, which was the introduction of serialized code. Essentially, raw operation objects could be serialized into string form, which vaguely resembled an assembly language. From there, I built a tool to turn this set of strings into a binary stream; the first draft of Epoch bytecode was born.
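The two-step pipeline, from operations to assembly-like text and then from text to a binary stream, can be sketched as below. The three-instruction set, the mnemonics and the encoding (one opcode byte, plus a four-byte little-endian operand for `PUSH`) are assumptions made for illustration. They do not describe Epoch's actual bytecode format.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical instruction set; not Epoch's real opcodes.
enum class Opcode : std::uint8_t { Push = 0x01, Add = 0x02, Mul = 0x03 };

struct Instruction {
    Opcode Op;
    std::int32_t Operand = 0;   // only meaningful for Push
};

// Step 1: serialize instructions into a readable, assembly-like text form.
std::string Serialize(const std::vector<Instruction>& program) {
    std::ostringstream out;
    for (const auto& inst : program) {
        switch (inst.Op) {
        case Opcode::Push: out << "PUSH " << inst.Operand << '\n'; break;
        case Opcode::Add:  out << "ADD\n"; break;
        case Opcode::Mul:  out << "MUL\n"; break;
        }
    }
    return out.str();
}

// Step 2: "assemble" the text into a compact binary stream:
// one opcode byte, followed by a 4-byte little-endian operand for PUSH.
std::vector<std::uint8_t> Assemble(const std::string& text) {
    std::vector<std::uint8_t> bytes;
    std::istringstream in(text);
    std::string mnemonic;
    while (in >> mnemonic) {
        if (mnemonic == "PUSH") {
            std::int32_t value = 0;
            if (!(in >> value)) throw std::runtime_error("PUSH needs an operand");
            bytes.push_back(static_cast<std::uint8_t>(Opcode::Push));
            auto raw = static_cast<std::uint32_t>(value);
            for (int i = 0; i < 4; ++i)
                bytes.push_back(static_cast<std::uint8_t>((raw >> (8 * i)) & 0xFF));
        } else if (mnemonic == "ADD") {
            bytes.push_back(static_cast<std::uint8_t>(Opcode::Add));
        } else if (mnemonic == "MUL") {
            bytes.push_back(static_cast<std::uint8_t>(Opcode::Mul));
        } else {
            throw std::runtime_error("unknown mnemonic: " + mnemonic);
        }
    }
    return bytes;
}
```

For example, serializing push 2, push 3, add produces three lines of text, `PUSH 2`, `PUSH 3` and `ADD`, which assemble to 11 bytes. The intermediate text form is what makes compiler output inspectable without a debugger.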
It took a while to get the VM to load operation objects from the bytecode, but it eventually happened. That was three years ago this month, as it turns out.
From there it was a short step to embedding the bytecode into an .EXE file and adding a simple bootstrapping stub that would load the VM, parse the bytecode, and then begin executing the program. Release 5 of Epoch was the first prototype suitable for creating and distributing Windows programs - although, with little in the way of interoperability with the operating system, it was pretty crippled.
Stage Four: Parallelism
The next couple of Epoch releases focused strongly on language features. The compiler and VM really only changed insofar as it was necessary to support new, cool stuff. Release 7 introduced the first rudimentary threading support, and Release 8 extended that with GPGPU capabilities. Release 9 took things a bit further but was mainly an excuse to put together a package for that year's GDC so I'd have something to show off. There's really nothing fascinating about this process, aside from watching the language grow - which, in retrospect, is a bit like watching paint dry.
Stage Five: Ripping it all apart
At this point, I became thoroughly dissatisfied with the VM. It was slow, cumbersome to extend, and basically impractical in almost every possible way. Release 10 focused on reimplementing most of the language more or less from scratch.
Part of the new hotness in Release 10 was the entity system, made possible in part by the new dynamic parser I had built. The new parser leveraged a boost::spirit feature that allows you to manipulate the grammar at runtime. In essence, Epoch libraries could register new keywords and thereby extend the functionality of the language.
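The idea of libraries registering keywords at runtime can be illustrated without boost::spirit at all. In this sketch, the parser looks up the first word of each statement in a keyword table. A hypothetical `RegisterFlowControlLibrary` function then adds `while` and `if` after the parser already exists. This is a simplified stand-in, not the Spirit feature Epoch actually used: `KeywordRegistry` and its methods are invented names, and the emitted strings stand in for bytecode.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A handler receives the rest of the statement and appends pseudo-bytecode.
using KeywordHandler =
    std::function<void(std::istringstream& args, std::vector<std::string>& emitted)>;

// Hypothetical stand-in for a runtime-extensible grammar.
class KeywordRegistry {
public:
    // Returns false if the keyword is already taken.
    bool RegisterKeyword(const std::string& keyword, KeywordHandler handler) {
        return Handlers.emplace(keyword, std::move(handler)).second;
    }

    // Parses one statement: the leading word selects the handler.
    bool ParseStatement(const std::string& line, std::vector<std::string>& emitted) const {
        std::istringstream in(line);
        std::string keyword;
        if (!(in >> keyword)) return false;
        auto it = Handlers.find(keyword);
        if (it == Handlers.end()) return false;   // unknown keyword: syntax error
        it->second(in, emitted);
        return true;
    }

private:
    std::map<std::string, KeywordHandler> Handlers;
};

// A "library" that extends the language with flow-control keywords at runtime.
void RegisterFlowControlLibrary(KeywordRegistry& registry) {
    registry.RegisterKeyword("while", [](std::istringstream& args, std::vector<std::string>& emitted) {
        std::string condition;
        args >> condition;
        emitted.push_back("LOOP_WHILE " + condition);
    });
    registry.RegisterKeyword("if", [](std::istringstream& args, std::vector<std::string>& emitted) {
        std::string condition;
        args >> condition;
        emitted.push_back("BRANCH_IF " + condition);
    });
}
```

Before the library is registered, `while running` is a syntax error. Afterward, the same statement parses, even though the parser itself was never rebuilt.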
All basic flow control was implemented this way, and operation objects went away forever. Epoch was now fully bytecode-driven, using a true artificial instruction set running on a true, more classical virtual machine. Release 11 focused on reimplementing, in this new paradigm, many of the language features that had worked under the old model; it was largely successful, if you exclude all the interesting bits like parallel processing.
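A bytecode-driven VM of the classical kind is, at its core, a fetch-decode-execute loop over an instruction stream. The following is a minimal sketch built on invented assumptions. The opcodes, their one-byte operands and the `Run` function are hypothetical and say nothing about Epoch's real instruction set. Note that flow control is expressed as jump instructions rather than operation objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set for a tiny stack machine.
enum Opcode : std::uint8_t {
    OP_PUSH = 0,        // operand: 1 signed byte, pushed as an int
    OP_ADD,             // pops rhs, lhs; pushes lhs + rhs
    OP_SUB,             // pops rhs, lhs; pushes lhs - rhs
    OP_JUMP_IF_ZERO,    // operand: absolute byte offset; pops the condition
    OP_JUMP,            // operand: absolute byte offset
    OP_HALT
};

// Runs bytecode until HALT and returns the final operand stack.
std::vector<std::int32_t> Run(const std::vector<std::uint8_t>& code) {
    std::vector<std::int32_t> stack;
    std::size_t ip = 0;   // instruction pointer (byte offset)

    auto fetch = [&code, &ip]() {
        if (ip >= code.size()) throw std::runtime_error("ran off the end of the code");
        return code[ip++];
    };
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        std::int32_t value = stack.back();
        stack.pop_back();
        return value;
    };

    for (;;) {
        switch (fetch()) {   // fetch and decode
        case OP_PUSH:
            stack.push_back(static_cast<std::int8_t>(fetch()));
            break;
        case OP_ADD: { std::int32_t rhs = pop(); std::int32_t lhs = pop(); stack.push_back(lhs + rhs); break; }
        case OP_SUB: { std::int32_t rhs = pop(); std::int32_t lhs = pop(); stack.push_back(lhs - rhs); break; }
        case OP_JUMP_IF_ZERO: {
            std::uint8_t target = fetch();
            if (pop() == 0) ip = target;
            break;
        }
        case OP_JUMP: {
            std::uint8_t target = fetch();
            ip = target;
            break;
        }
        case OP_HALT:
            return stack;
        default:
            throw std::runtime_error("invalid opcode");
        }
    }
}
```

For example, the sequence `PUSH 5, PUSH 3, SUB, PUSH 0, JUMP_IF_ZERO 11, PUSH 100, PUSH 10, ADD, HALT` jumps to byte offset 11, skipping the `PUSH 100`, and finishes with 12 on the stack. A single dispatch loop over flat bytes replaces one virtual `Execute()` call per operation object.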
The dynamic parser was an interesting diversion, and laid much of the groundwork for the structure of the parser in the new Release 12 rewrite. The R12 compiler is far more interesting, though, so I'll give it its own proper treatment in a separate installment of this series.
Stage Six: Ripping it all apart. Again.
After Release 11 shipped, I began to feel acutely the pain of the terrible parser model that had somehow taken root in the Epoch project. The process of rewriting the compiler for the upcoming Release 12 has been extensively documented here and on the Epoch wiki, so I'll leave out the gory details for now.
I plan on continuing this series for a while. I'd like to delve into three areas in particular:
- How the new compiler works, in detail
- How the VM works, in detail
- What I plan on doing with the language in the future