Epoch, capsules, and separate compilation

The recent project I've undertaken to optimize the Epoch compiler has left me with plenty of open-ended questions. On the one hand, it's important to restore functionality as soon as possible and get the (hopefully faster!) compiler design in place. The flip side, though, is that rewriting the compiler gives me a golden opportunity to make some forward-looking decisions about how to make it easier to implement certain features later on.

One of the big ones is separate compilation. Once a program grows past a certain size, the portion of it that changes between any two recompiles is going to be relatively small. Therefore, the theory goes, it is wasteful to compile everything from scratch, because most of the results will, hopefully, come out identical.

Separate compilation addresses this by dividing a program into units, which are compiled more or less independently and then linked into the final output program. Some languages are fairly free-form, like C++ with its header/implementation idiom; others are stricter, such as Java's one-class-per-file philosophy. Many languages connect files using simple textual inclusion: if you want to reference something in file A from file B, you declare the common bits in a shared file C, and the contents of C are then automatically pasted into both A and B. This is the canonical way of spreading out an implementation in C and C++, for instance.

Other languages, such as C#, take a more robust approach. You can refer to anything anywhere in the program (subject to access and namespace rules, of course), but there is no need for "header files" or any other shared cruft. This makes life a lot easier and eliminates the duplicated shared declarations that C and C++ require. D's modules are a similar mechanism.

Epoch will take this one step further. Instead of "modules" or "classes" or any other coarse-grained unit of code, the separate compilation mechanism will work at the level of individual functions. If you change one line of code in one function of a 2-million-line project, precisely one function will get recompiled. (Of course, compiler optimization may still examine a large part of the program beyond that, but in principle you don't re-parse, re-run semantic validation, or otherwise reprocess the rest of the code. Only if the function's signature changes do you need to examine anything outside the function itself.)
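As a rough illustration of the rule in the parenthetical above, the check below decides how far a single-function edit can ripple. It is a minimal sketch in Python, assuming each function's source can be split into a signature part and a body part; `FunctionSource`, `ChangeKind`, and `classify_change` are hypothetical names, not part of the Epoch compiler.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    NONE = "none"            # nothing changed: skip the function entirely
    BODY = "body"            # only the body changed: recompile this function alone
    SIGNATURE = "signature"  # signature changed: callers must be revisited too


@dataclass(frozen=True)
class FunctionSource:
    # Hypothetical split of one function's text into its two parts.
    signature: str
    body: str


def classify_change(old: FunctionSource, new: FunctionSource) -> ChangeKind:
    """Decide how far an edit to a single function can ripple."""
    if old.signature != new.signature:
        return ChangeKind.SIGNATURE
    if old.body != new.body:
        return ChangeKind.BODY
    return ChangeKind.NONE
```

A real compiler would probably compare normalized or tokenized text so that whitespace-only edits do not trigger a rebuild; plain string comparison keeps the sketch short.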

Making this work will rely heavily on the Era IDE. You will still be able to write .epoch files in any text editor and get file-level separate compilation, but the real power comes when you embrace the IDE's own separate compilation mechanism. Internally, each function is written to a separate file identified by a GUID, so that the file name remains stable even if the function is renamed. When compiling the program, only the GUID-tagged files that have changed are actually examined. A dependency graph tracks which functions call which, so that if a function's signature changes, recompilation stays minimal: only the functions that call the changed function are recompiled.
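The GUID-keyed rebuild described above can be sketched roughly as follows. This is an assumption-laden Python sketch, not Era's actual implementation: each function is keyed by an ID (plain strings here, standing in for GUIDs), the last build is assumed to have recorded a SHA-256 hash of each function's signature and body, and the dependency graph maps each caller to the set of functions it calls. `content_hash` and `plan_rebuild` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def content_hash(text: str) -> str:
    """Stable fingerprint of a piece of source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_rebuild(calls, last_build, current):
    """
    calls:      {id: set of ids it calls}         (the dependency graph)
    last_build: {id: (signature_hash, body_hash)} recorded at the previous build
    current:    {id: (signature_text, body_text)} as saved by the editor now
    Returns the set of function ids that need recompiling.
    """
    # Invert the graph so we can find who calls a given function.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    to_compile = set()
    for fid, (sig, body) in current.items():
        sig_h, body_h = content_hash(sig), content_hash(body)
        old = last_build.get(fid)
        if old is None or old != (sig_h, body_h):
            to_compile.add(fid)            # new or edited function
        if old is not None and old[0] != sig_h:
            to_compile |= callers[fid]     # signature changed: direct callers too
    return to_compile
```

Following the post, only direct callers of a function whose signature changed are queued; deleted functions are not handled in this sketch.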

A project is then simply a metadata file which lists all of the GUIDs included in the program, optionally organized into namespaces for convenience. The IDE itself can group functions together into "capsules" as the programmer wishes, so there is no need to treat each function as an isolated unit when writing and reading code. The separation of functions into atomic units happens entirely transparently, purely for the sake of compilation speed.
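To make the project metadata file more concrete, here is one possible shape for it, along with a loader that returns every function GUID and its namespace. The JSON layout, its keys (`namespaces`, `unscoped`), the GUID values, and `load_function_guids` are all invented for illustration; the post does not specify Era's actual format. Capsules are not modeled, since the post describes them as an IDE-side grouping.

```python
import json
import uuid

# Hypothetical manifest layout; the real Era format is not specified in the post.
MANIFEST = """
{
  "project": "Example",
  "namespaces": {
    "math": ["3f2b8c1e-0d4a-4e7b-9a61-2c5d8e9f1a03"],
    "io":   ["a7e4d2c9-5b1f-4c8a-8e3d-6f0b9a2c4d15",
             "0c9d8e7f-6a5b-4c3d-9e2f-1a0b2c3d4e5f"]
  },
  "unscoped": []
}
"""


def load_function_guids(text):
    """Return {guid: namespace} for every function the project includes."""
    data = json.loads(text)
    result = {}
    for namespace, guids in data.get("namespaces", {}).items():
        for g in guids:
            # uuid.UUID rejects malformed GUIDs; str() gives the canonical form.
            result[str(uuid.UUID(g))] = namespace
    for g in data.get("unscoped", []):
        result[str(uuid.UUID(g))] = ""  # namespaces are optional
    return result
```

Because the manifest holds only GUIDs, renaming a function or moving it to another namespace changes a list entry, never the file the function lives in.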

There are other benefits to this approach. Consider version control, for example. Changes no longer have to be tracked by file; you can, for instance, view the history of a single function. Since functions are not physically tied to a larger file, moving a function between capsules, namespaces, or other organizational units will not obliterate its change history, as moving code between files often does in contemporary revision control systems. This also makes higher-level views of a changeset possible: instead of only seeing which files or modules were affected, it is trivial to list exactly which functions were modified.

Moreover, because the IDE maintains a dependency graph, the full impact of a change can be seen trivially: just select how many degrees of separation you want, and you can see every function affected by a given change, out to any number of levels of callers. The dependency graph also provides code coverage information and dead code detection essentially for free.
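Both queries in the paragraph above reduce to simple traversals of the call graph. In this hypothetical Python sketch (`impact` and `dead_code` are illustrative names, and the graph maps each caller to the set of functions it calls), `impact` walks caller edges breadth-first up to a chosen number of degrees of separation, and `dead_code` reports functions unreachable from a set of entry points. This is static reachability only: any call the graph does not record would make a live function look dead.

```python
from collections import defaultdict, deque


def invert(calls):
    """Turn {caller: callees} into {callee: callers}."""
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def impact(calls, changed, max_depth):
    """Functions calling `changed` directly or indirectly, mapped to their distance."""
    callers = invert(calls)
    seen = {changed: 0}
    queue = deque([changed])
    while queue:
        fn = queue.popleft()
        if seen[fn] == max_depth:
            continue  # do not look further up than requested
        for caller in callers[fn]:
            if caller not in seen:  # also guards against recursive cycles
                seen[caller] = seen[fn] + 1
                queue.append(caller)
    del seen[changed]
    return seen


def dead_code(calls, entry_points):
    """Functions that no entry point can ever reach."""
    reachable = set()
    stack = list(entry_points)
    while stack:
        fn = stack.pop()
        if fn in reachable:
            continue
        reachable.add(fn)
        stack.extend(calls.get(fn, ()))
    return set(calls) - reachable
```

Breadth-first order guarantees each caller is recorded at its shortest distance from the change, which is what a "degrees of separation" slider would display.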

The bottom line is that by making a tiny change in the physical representation of code, Epoch's tool environment will be able to provide a rich set of analysis features for larger projects. Such tools are, strictly speaking, possible for existing languages, but deploying them for Epoch will be far easier and the results far more maintainable.

This is precisely the sort of pragmatic revolutionary attitude that Epoch was founded with. We are not content to sit in the well-worn grooves of other languages and their design and implementation decisions; wherever possible, we will willingly depart from tradition if it can offer us substantial gains to productivity and clarity.

Perhaps the best part is that this whole feature set can be entirely opt-in. Don't want to use it? Just configure Era to save your code into flat files, and work just like before. This allows maximum integration with existing workflows, and ensures that adopting Epoch is never more disruptive than you want it to be.