Self-hosting the Epoch Compiler: Day One

Published December 11, 2013
As I've written about here previously, I have a personal goal of self-hosting the Epoch language compiler by the end of 2013. The other night I actually ran the first attempt at passing the compiler source code through itself; the results were underwhelming, to say the least.

My main enemy out of the gate was the garbage collector. I've had a very naive heap traversal algorithm in place for ages, but it turns out that O(n!) algorithms fall down hard when presented with values of n in the millions. Who knew?
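The post doesn't describe either the naive traversal or the fix. For illustration only, here is a minimal Python sketch of a standard mark-and-sweep pass, assuming a hypothetical heap represented as a dict that maps each object id to the ids it references. Marking each object at most once keeps the traversal linear in the number of objects plus references, and the explicit worklist avoids deep recursion.

```python
from collections import deque


def mark_reachable(roots, heap):
    """Return the set of object ids reachable from roots.

    heap maps an object id to the list of ids it references (a hypothetical
    representation). Each object is marked at most once, so the traversal
    runs in O(objects + references) time.
    """
    marked = set()
    worklist = deque(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue  # already visited; don't walk its references again
        marked.add(obj)
        worklist.extend(heap.get(obj, ()))
    return marked


def sweep(heap, marked):
    """Delete every unmarked object from heap and return the freed ids."""
    garbage = [obj for obj in heap if obj not in marked]
    for obj in garbage:
        del heap[obj]
    return garbage
```

For example, with `heap = {1: [2], 2: [1, 3], 3: [], 4: [4]}` and root `1`, objects 1, 2 and 3 are marked, and object 4, which only references itself, is swept.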

Once I fixed up the garbage collector implementation, parsing went from "interminable" to about 40 seconds per invocation of the compiler. This is depressing because the C++ implementation can do a complete parse in about 40 milliseconds. But optimization can come later; for now I want to focus on getting the compiler to just build cleanly.

The next problem I ran into was stack space. Epoch programs use recursion heavily, and because of some awkward implementation artifacts, tail call elimination doesn't always work (in fact, it almost never does). So compiling a 310+ KB source file when recursion is your only tool for traversing data structures... yeah, that was painful.
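The Epoch code isn't shown here, so as a hypothetical Python illustration (CPython also performs no tail call elimination), the sketch below contrasts a recursive node count, which needs one stack frame per level of nesting, with the same traversal driven by an explicit stack kept on the heap. The dict-based node layout and the `make_chain` helper are assumptions made for the example.

```python
def count_nodes_recursive(node):
    # One call frame per level of nesting; with no tail call elimination,
    # a deep enough input exhausts the stack.
    total = 1
    for child in node["children"]:
        total += count_nodes_recursive(child)
    return total


def count_nodes_iterative(node):
    # Same traversal, but pending nodes live in a list on the heap,
    # so depth is limited only by available memory.
    count = 0
    stack = [node]
    while stack:
        current = stack.pop()
        count += 1
        stack.extend(current["children"])
    return count


def make_chain(depth):
    # Hypothetical helper: build a degenerate tree where every node has
    # exactly one child, i.e. the worst case for recursive traversal.
    root = {"children": []}
    current = root
    for _ in range(depth - 1):
        child = {"children": []}
        current["children"].append(child)
        current = child
    return root
```

On a chain 50,000 nodes deep, `count_nodes_recursive` raises `RecursionError` under CPython's default recursion limit, while `count_nodes_iterative` returns 50000.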

Thankfully it's easy enough to increase the stack space for a process, so I did that, and fired up the compiler yet again. This time it barfed someplace deep in the code, because I forgot to implement support for some operator or other; after a few rounds of this kind of nonsense, I managed to get it to actually start doing heavy-weight semantic analysis of the code. Progress!

Semantic analysis has always been one of the slowest parts of compilation in Epoch, so it wasn't surprising to see the compiler chug for a couple of solid minutes trying to analyze itself. Unfortunately, after a little while, it ran out of stack space and crapped out again.

So I bumped the stack space to 8MB, in hopes that such a gratuitous amount of space would be enough to get things to complete cleanly.
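The post doesn't say how the stack limit was raised. As a rough Python analogue (an illustration, not what the Epoch toolchain does), the main thread's stack size is fixed when the process starts, so deep-recursion work can be moved onto a new thread created after calling `threading.stack_size()`. `run_with_big_stack` and its default parameters are hypothetical; the 8 MB figure mirrors the post.

```python
import sys
import threading


def run_with_big_stack(func, *args, stack_bytes=8 * 1024 * 1024,
                       recursion_limit=50_000):
    """Run func(*args) on a worker thread with a stack of stack_bytes.

    Hypothetical helper. recursion_limit raises Python's own frame limit;
    it should stay low enough that that many frames fit in stack_bytes.
    """
    result = {}

    def worker():
        try:
            result["value"] = func(*args)
        except BaseException as exc:  # re-raised on the calling thread below
            result["error"] = exc

    old_limit = sys.getrecursionlimit()
    old_size = threading.stack_size(stack_bytes)  # returns the previous size
    sys.setrecursionlimit(recursion_limit)
    try:
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
    finally:
        # Restore the settings so later threads are unaffected.
        threading.stack_size(old_size)
        sys.setrecursionlimit(old_limit)

    if "error" in result:
        raise result["error"]
    return result["value"]
```

Both the thread stack size and the recursion limit are restored afterward, and any exception raised by `func` is re-raised on the caller's thread.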

Meanwhile, the compiler is actually emitting a ton of errors, mostly type checking failures. It seems that some rules of the Epoch type system aren't fully implemented; I'll have to go back and write more test cases to pin them down.

And of course there's a handful of barfs caused by things like hex literals, which for some reason I never finished implementing... or built-in/standard-library functions that aren't wired into the compiler yet... and miscellaneous junk like that.
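Hex literal support is one of the gaps mentioned above. Epoch's literal syntax isn't given here, so this sketch assumes a C-style `0x`/`0X` prefix; `lex_hex_literal` is a hypothetical lexer helper written in Python, not Epoch's implementation.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def lex_hex_literal(source, pos):
    """Try to lex a hex literal starting at source[pos].

    Assumes a C-style '0x'/'0X' prefix (hypothetical; Epoch's actual syntax
    may differ). Returns (value, next_pos) on success, or None if no hex
    literal starts at pos.
    """
    if not source.startswith(("0x", "0X"), pos):
        return None
    end = pos + 2
    while end < len(source) and source[end] in HEX_DIGITS:
        end += 1
    if end == pos + 2:
        raise SyntaxError(f"hex literal at offset {pos} has no digits")
    return int(source[pos + 2:end], 16), end
```

For example, `lex_hex_literal("x = 0xFF;", 4)` returns `(255, 8)`: the literal's value and the offset just past it.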


The compiler is steadily getting further along in its attempts to analyze itself, but once again crashed out due to a missing operator implementation. I'm trying to fix the little stuff as I go, not just because I'm impatient to see a complete compile, but also because it saves me having to write a massive to-do list of junk to clean up later.

One thing is for sure: this compiler is devilishly slow. It'll take some major work to get it fast, I suspect, but I've been down that road before with the C++ compiler and the results were pretty encouraging, so I think I can do it again.


Most of the errors coming out now are related to type checking failures in template instantiations. I'm willing to bet there's a bug or three in the template implementation, since that was one of the last things to be added and has had the least time to bake. It's also easily the most intricate part of the compiler.


So plenty of work left to do, but I'm confident that a few solid hours of plugging away at each category of bugs will lead to a successful self-host within the next few weeks.

Comments

NetGnome

It may not be 100% appropriate for what you're personally trying to accomplish, but did you consider implementing Epoch on top of LLVM instead of pure C++? It always sounded like it made language creation a bit simpler (or at least not as painful).

December 11, 2013 10:46 PM
ApochPiQ

Epoch does all of its native code generation through LLVM. The compiler I'm referring to is really just a convoluted front-end for the LLVM back-end. It emits an intermediate bytecode that is used only by the JIT system, which translates it into LLVM bitcode and then machine code.

December 11, 2013 11:00 PM
NetGnome

ahhhhhh, gotcha!

December 11, 2013 11:09 PM