
The Bag of Holding

All this pain over something so small...

Posted 08 January 2014
Epoch, language design
My mid-term goal for the Epoch language project remains the same: I want to get Era back up to speed and start developing a better IDE now that the compiler is more or less solid.

However, the short-term implication of this is that I need support for GUI apps in the compiler, and that means resources. The last couple of days have been eaten up by trying to get basic resource compilation support working so that I can do things like embed dialogs, menus, and icons in .EXE files.

This is more of a pain in the ass than it sounds.

Windows (and, apparently, most third-party resource editors) is extremely picky about the format of resources in an .EXE. Misalign one byte, or shuffle a pair of offsets, or write things in the wrong order and it will refuse to acknowledge that your resources exist.

(Funny story: I spent about an hour and a half just now trying to figure out why I could see embedded icons in my resource editing apps, but Explorer insisted that the .EXE contained no icons. Turns out I had two things in the "wrong" order in the resource table. Swapping them around caused everything to start working as expected, even though the data was totally sensible with the first ordering.)
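
For anyone else fighting this: the ordering rule I tripped over comes straight from the PE/COFF spec. Within each resource directory, named entries must precede ID entries, and each group must be sorted in ascending order. A comparator along these lines captures the idea, using the winnt.h structures; note that for named entries real code must compare the actual UTF-16 name strings, so the raw offset comparison below is only a placeholder:

    #include <windows.h>

    // Sketch of the PE/COFF resource directory ordering rule: named
    // entries sort before ID entries, ascending within each group.
    // NOTE: named entries must really be compared by their UTF-16 name
    // strings; comparing the raw Name field is a simplification here.
    bool ResourceEntryLess(const IMAGE_RESOURCE_DIRECTORY_ENTRY& a,
                           const IMAGE_RESOURCE_DIRECTORY_ENTRY& b)
    {
        if (a.NameIsString != b.NameIsString)
            return a.NameIsString;        // named entries come first
        return a.Name < b.Name;           // ascending within each group
    }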

I think I've downloaded every PE dissection utility ever written; some are useful, some are not, and many are useful for one specific thing and useless for everything else. At one point I almost broke down and just wrote my own so I could get all the features I wanted under the same roof, but that would have served only as an even bigger distraction, so I resisted the urge.

To give you an idea of how frustratingly nitpicky this stuff is, consider the fact that I already have a working set of code for doing resource compilation and embedding, written in C++. I wasn't hacking this out from scratch - this was just attempting to port existing code into Epoch. Despite my best attempts at faithfully reproducing the C++ compiler's behavior in the Epoch compiler, it took quite a few false starts to reach a point where the damned thing works.

At long last, I have basic resource compiling done... for icons. The next step is to add all the other resources that Era needs: accelerators, menus, manifests, and possibly also dialogs. Thankfully most of the finicky junk is taken care of, and adding new resource types is primarily a matter of making sure the individual resources' data gets written correctly.

On a totally unrelated note, I've done some thinking, and I'm pretty sure I know how I want to approach cleaning up the Epoch type system. Rather than making everything value-semantics by default, with opt-in for reference semantics, I'm going to go the opposite direction: everything has reference semantics by default unless you specifically opt out.

Opting out has two basic use cases: flattening structures to remove indirections, and allocating data on the stack instead of the free-store. Structure flattening is all about locality of reference, and stack allocation is all about speed. I'm pondering how to have these two situations interact with garbage collection, but I need to figure out how to opt out of garbage collection in a general way anyways, so they might end up getting intertwined a bit.

For now, though, I think it is time to relax.

Garbage! Garbage Everywhere!

Posted 04 January 2014
Epoch, language design
I've been slowly narrowing down my garbage collection issue, and I think I've landed on a reasonable hypothesis.

Background: I'm working on a mark/sweep-based garbage collector for the Epoch language. The full details are far too intricate for a single journal entry, but the basic theory is this: start with all the variables that are on the stack (aka "roots"), follow all the references they have, and keep following references until you stop discovering new objects. Anything you've found is still potentially in-use by the program. Anything not found can be deleted, because there's no way the program can get access to it.
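
For the curious, here's that theory in miniature, as a C++ sketch rather than the actual runtime code; the object layout and names here are hypothetical:

    #include <list>
    #include <vector>

    struct Object
    {
        bool marked = false;
        std::vector<Object*> references;    // outgoing references to other objects
    };

    void Mark(Object* obj)
    {
        if (!obj || obj->marked)
            return;                          // null, or already visited
        obj->marked = true;
        for (Object* ref : obj->references)
            Mark(ref);                       // keep following until nothing new turns up
    }

    void Collect(const std::vector<Object*>& roots, std::list<Object*>& heap)
    {
        for (Object* root : roots)
            Mark(root);                      // everything reachable from the stack

        for (auto it = heap.begin(); it != heap.end(); )
        {
            if ((*it)->marked) { (*it)->marked = false; ++it; }   // keep; reset for next pass
            else { delete *it; it = heap.erase(it); }             // unreachable: free it
        }
    }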

First, I turned off all flagging of roots except for local variables, to eliminate complications from function parameters and calling convention differences. This still crashed. Next, I turned off actually deleting free objects, to ensure that the heap traversal wasn't interfering with the process. This also crashed. Finally, I took a close look at how roots are checked by the GC implementation itself, and noticed an interesting pattern.

In simple test cases, the root checker always works. It requires some particular degree of complexity in the code to actually trigger the crash. So I set out to look for how the generated machine code was interacting with the stack, in the hopes of finding some quirk that explained the mistakes in the root checker.

Eventually, after probing around for a while, I notice a telltale call in one of the crashing functions: an invocation of the runtime function __chkstk. This function is used to verify the integrity of the stack after a dynamic allocation occurs in stack space - i.e., it's a safety trap run after an alloca call.

alloca should never be called by Epoch programs; for one thing, it ruins the ability of the stack crawler to correctly locate stack roots. But it's also a performance liability. It's much better to allocate all required stack space for a function once, up-front. The fact that this is happening means that my generated LLVM bitcode is asking for extra stack space after a function has entered its main body!

Sure enough, a quick search of the offending function's bitcode shows the allocation plain as day. I'm somewhat relieved that it's something this simple, but also a little miffed that it took so darn long to figure out.

I audit the JIT code for places where this might be tripping, and find two candidates. Now I just need to figure out how to eliminate those extra stack allocations... hmm...
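
If memory serves, the canonical approach (this mirrors the helper from LLVM's Kaleidoscope tutorial, not my actual JIT code) is to route every alloca through a temporary IRBuilder aimed at the top of the entry block, so the whole frame is reserved in one static adjustment:

    #include <llvm/IR/Function.h>
    #include <llvm/IR/IRBuilder.h>

    // Build allocas in the entry block only; LLVM folds these into the
    // static stack frame instead of emitting dynamic stack arithmetic
    // (and, on Windows, __chkstk probes).
    static llvm::AllocaInst* CreateEntryBlockAlloca(llvm::Function* fn,
                                                    llvm::Type* ty,
                                                    const llvm::Twine& name)
    {
        llvm::IRBuilder<> tmp(&fn->getEntryBlock(), fn->getEntryBlock().begin());
        return tmp.CreateAlloca(ty, nullptr, name);
    }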

Sure enough, after a little rearrangement, eliminating the extra allocation causes the garbage collector to start behaving again.

Now to slowly reintroduce features until it's back up to 100% operation... First up are the LLVM optimization passes, which I turned off to help pinpoint the faulty code in the first place. These come back online with no problems.

Next it's time to actually free memory marked as unreachable by the garbage collector. This passes with flying colors!

At this point the GC is fully back in action.

It is, sadly, slower than dog snot, so I'll have to find a way to improve the string garbage collection mechanism in particular to get it to stop being so painfully bad. It also appears to be too aggressive in its collection, as the compiler, when attempting to self-host, eventually reports a string has gone missing unexpectedly.

I expect that this is due to disabling the marking of certain stack roots earlier, so I turn that back on (oops). Instead of waiting for another really slow compilation pass, I decide to go ahead and optimize the string collector a bit more.

Along the way I find another oversight that will help immensely with performance: the runtime was triggering a garbage collection pass after every single allocation of any kind! Dialing this back to its old value (every 1000th allocation) should improve the throughput a bit.
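
The shape of that fix, in hypothetical form (the real runtime's bookkeeping is more involved, and these names are made up):

    #include <cstddef>
    #include <new>

    void RunGarbageCollection();             // the actual mark/sweep pass

    // Collect on every Nth allocation instead of every single one.
    static const unsigned CollectionInterval = 1000;
    static unsigned AllocationCount = 0;

    void* Allocate(std::size_t bytes)
    {
        if (++AllocationCount % CollectionInterval == 0)
            RunGarbageCollection();
        return ::operator new(bytes);        // GC is currently backed by operator new
    }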

After all that, the compiler successfully self-hosts again, with the garbage collector fully running! Woohoo!

It takes about 10 seconds to fully build the compiler now. I don't have the heart to actually time it, since I was sub-second without garbage collection so very recently. However, that's still a reasonable accomplishment, and performance is something I can always come back to and work on any old time.

With everything on the compiler and runtime framework side operating again, it's time to get Era back in shape. This will require modifying the compiler to be able to emit Windows resources such as dialogs, menus, and icons; not terribly difficult, but time consuming, and somewhat boring. I think I'll leave that for some other time.

It'll be really nice to quit using Notepad for code editing. I'm already looking forward to making Era more powerful and feature-rich so I can spend more time in that environment.

After all these years, Epoch is finally a language I feel like I can write production-quality software in, and I definitely enjoy writing Epoch code a lot more than, say, C++. A little polish here and there to work out some of the less expressive kinks in the language, some more features, and a good set of tools... and I could really, really love this thing.

Which, hopefully, means someone else will, too!

Cleaning Up Garbage

Posted 01 January 2014
Earlier today I decided to go ahead and turn the garbage collector back on, and see just how bad things are.

On the plus side, the compiler still self-hosts in only a few seconds, so it's not nearly as horrid as it could be.

On the down side, there's a persistent crash deep in the garbage collector that seems to be related to getting invalid stack information from LLVM. This sort of thing has been a plague in the past, and I've lost a lot of hair trying to figure out similar bugs before.

So today's adventure is going to be debugging-heavy!

The first thing I need is a repro. Thankfully, self-hosting the compiler faithfully crashes in the same spot every time. Initially, I run the compiler under the debugger with a Release runtime, hoping it will be fast enough to not drive me insane.

Unfortunately, there's a nasty gotcha to running even Release-built programs under the Visual Studio debugger: all memory allocations go through a slow "debug" allocator. This slows the debugged runtime down to a crawl, and after ten minutes of waiting for the compiler to parse a single file, I said screw it and looked up a way to kill that behavior.

(For the record, set the environment variable _NO_DEBUG_HEAP to 1, reboot, and VS will quit doing this.)

Sadly, even with a fast runtime to test against, it's a Release build, which means that there's far too much optimization going on to easily unravel why this crash is occurring. So back to Debug builds we go!

I get a debug build running soon enough, and after waiting the requisite ten years for LLVM to start up in Debug mode, I quickly discover a new problem: stack overflow!

This is an interesting side effect of debug builds. They don't inline functions, so common Standard Library containers end up using an order of magnitude more stack space for simple operations than they do in optimized builds. Epoch, being very happy to implement lots of things using recursion, finds stack space to be at a premium. Part of the complicating factor here is that the garbage collector itself is recursive, so it too wants a lot of stack space.

So it looks like my next task is to rewrite the recursive garbage collector to be iterative instead. Hurray. Thankfully, it's not terribly hard, and only takes a few minutes. Now to run this debug build again...
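
The rewrite is the textbook transformation: replace the call stack with an explicit worklist. Roughly, with the same hypothetical object layout as before:

    #include <vector>

    // Same Mark() as earlier, but with an explicit stack instead of
    // recursion, so deep object graphs can't blow the (suddenly much
    // smaller) C++ stack in debug builds.
    void MarkIterative(Object* root)
    {
        std::vector<Object*> worklist;
        worklist.push_back(root);
        while (!worklist.empty())
        {
            Object* obj = worklist.back();
            worklist.pop_back();
            if (!obj || obj->marked)
                continue;
            obj->marked = true;
            for (Object* ref : obj->references)
                worklist.push_back(ref);
        }
    }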

After what feels like an eternity or three, the debugger halts - access violation! We're making progress!

A careful examination of the crashed process reveals that the cause of the bug is probably a recent change to the way pattern matching and type decomposition are implemented; there are now locations in the code where stack roots are not initialized to zero, meaning that the garbage collector will happily walk into random garbage memory addresses while trying to discover reachable objects.

In order to confirm this theory, I need to dump out the LLVM IR for the compiled program - which weighs in at around 6MB.

Analyzing the LLVM IR seems to corroborate my suspicions. There are indeed edge cases where pattern-matched functions could wind up passing bogus stack roots to the garbage collector. There are two options for fixing this: either figure out a way to eliminate the edge case at code-generation time, or just remove the hack that caused them to be created in the first place.

After waffling for a few minutes, I choose to remove the hack. It'll cost a bit of performance, but stability is more important.

Of course, nothing can ever be that easy. Even with the optimization hack removed, the crash occurs. There must be something deeper going on here.

It takes some deep diving, but I eventually notice something that might be related to the problem. In LLVM, the meta-instruction for flagging a variable as being a stack root is required to appear at the beginning of the function's code (first basic block, to be strictly accurate). In the dumped IR, there are many locations where this is being violated. I wonder if that's stomping on the ability of the JIT to accurately locate stack roots... in any case, it merits investigation.
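
For reference, this is roughly what emitting a root marker looks like against the LLVM C++ API of this era (simplified; the function also needs a GC strategy attached for llvm.gcroot to be legal). Note the builder is pinned to the start of the entry block, and the slot is null-initialized so the collector never sees garbage:

    #include <llvm/IR/IRBuilder.h>
    #include <llvm/IR/Intrinsics.h>
    #include <llvm/IR/Module.h>

    llvm::Value* EmitGCRoot(llvm::Function* function, llvm::Module* module)
    {
        llvm::BasicBlock& entryBlock = function->getEntryBlock();
        llvm::IRBuilder<> entry(&entryBlock, entryBlock.begin());

        llvm::PointerType* i8ptr = entry.getInt8PtrTy();
        llvm::Value* slot = entry.CreateAlloca(i8ptr, nullptr, "root");
        entry.CreateStore(llvm::ConstantPointerNull::get(i8ptr), slot);

        llvm::Function* gcroot =
            llvm::Intrinsic::getDeclaration(module, llvm::Intrinsic::gcroot);
        llvm::Value* args[] = { slot, llvm::ConstantPointerNull::get(i8ptr) };
        entry.CreateCall(gcroot, args);      // must live in the first basic block
        return slot;
    }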

Examining my code reveals that the fault doesn't lie in the emission of stack root markers; there must be something in LLVM that rearranges them somehow, although disabling all of the LLVM optimization passes fails to correct the crash, so maybe it was a red herring all along.

I need a simpler repro case, or I'm going to go insane waiting for this slow code to run. Thankfully, getting a working compiler back is just a matter of turning the GC back off for the time being, so I can build a simple test-case program.

Annoyingly, the first test case I build works fine, garbage collection and all.

Eventually, the head-scratching gets old and boring, and I think it's time to take a break from this sucker for a while. I'm not making any progress and I've quit thinking clearly; more just staring blankly at the screen hoping for inspiration.

So yeah... we'll try this again later!

Death to 2013

Posted 31 December 2013
This time last year, I was obsessing over putting the finishing touches on the Epoch realtime raytracing demo. The emphasis then was on runtime speed, much like the recent focus has been on compilation speed. I think it's fitting that the work on runtime speed directly contributed to the ability to make the compiler as fast as it's gotten.

Along the way, I've had a lot of fun on this project. I removed the virtual machine from the Epoch runtime, building a full garbage collection engine on top of LLVM in the process. I started - and, at long last, successfully finished - a self-hosted compiler.

There's a lot to look back on this year. I'm not feeling excessively nostalgic, though, so screw all that. Let's talk about where this sucker is heading.

I've been resisting the itch for a long time now, but it's become apparent to me that Epoch has some problems with large-scale code. Particularly, it's really hard to keep track of how state mutates as it propagates through dozens of functions with reference-typed parameters. I'm thinking this is mainly an artifact of the language lacking more powerful composition features, but it seems like a bad thing either way.

So I'm going to sit down and think carefully about the type system and how reference-vs-value semantics will be implemented and how they will affect program design. This may turn into a huge rabbit hole, but I think ultimately the language will be a lot better off for it, and I'd rather take the time to fix fundamental design issues before there's a ton of Epoch code in the wild.

Aside from core type system changes, I really want to get back to working on the green thread/task model I proposed for the language a while ago. I suspect that the combination of fixing the type system and empowering better composition will make the language a lot more powerful and clean. This is also the perfect time to start working on parallelism support again, which is something I've wanted to hack on for years now, but keep getting side-tracked by other, more primary tasks.

I have a compiler written now, which is pretty cool by itself, but that's just the beginning. I want to build a really rich toolset for working on Epoch programs, wrapped in a nice and powerful IDE as well as command line modes. Some more details about the kinds of tools I'm contemplating will probably emerge in the next few months.

One more thing that I really ought to tackle is moving away from 32-bit support. I'd really like Epoch to be a native 64-bit language, but that's going to be a monumental amount of conversion work, so I keep putting it off. Maybe this year... we shall see.

Anyways, that's all later. For now, I just finished a rough implementation of support for separate compilation, so I can finally split the compiler out of its single 380KB file and into properly organized modules. So, huzzah.

Next up, it's time to fix some latent bugs in Era and get the garbage collector back online (finally) so I can start refining the IDE and stop using Notepad to edit my code.

Drink Your Ovaltine, Kiddies

Posted 29 December 2013
There is a special trick to optimizing code, one that I usually drag out as a weapon of last resort when all algorithmic stuff is taken care of and it's time to give up or start trying to micro-optimize.

The secret, of course, is to eliminate dynamic memory allocations. This isn't a huge revelation to most programmers, especially not games programmers, but it's such a royal pain that it often gets left on the table unless and until things get truly desperate.

Well, after my 740ms run earlier today, I was well and truly ready to do anything it took to speed things up more.

So I tried something devious: I hooked into the function which allocates memory in the Epoch runtime, and wired it up to print out the name of the function that called it. (If you're curious how this works, it's just assembly hacks - the return address is already on the stack, so we just read it off and pass it to a helper function which knows how to translate that back to a code function name based on debug information.)
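
In sketch form, minus the raw assembly (MSVC's _ReturnAddress intrinsic does the same job of grabbing the return address; this assumes SymInitialize has already been called at startup, and all the names here are hypothetical):

    #include <windows.h>
    #include <dbghelp.h>
    #include <intrin.h>
    #include <cstdio>
    #include <new>
    #pragma comment(lib, "dbghelp.lib")

    void* LoggingAllocate(std::size_t bytes)
    {
        void* caller = _ReturnAddress();     // who called into the allocator?

        char buffer[sizeof(SYMBOL_INFO) + 256] = { };
        SYMBOL_INFO* sym = reinterpret_cast<SYMBOL_INFO*>(buffer);
        sym->SizeOfStruct = sizeof(SYMBOL_INFO);
        sym->MaxNameLen = 255;

        DWORD64 displacement = 0;
        if (SymFromAddr(GetCurrentProcess(), reinterpret_cast<DWORD64>(caller),
                        &displacement, sym))
            std::printf("%s allocated %zu bytes\n", sym->Name, bytes);

        return ::operator new(bytes);
    }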

With a little command line magic, I piped all of the output of this process into a text file, and got a timeline of every single memory allocation performed by the compiler.

Well, that file is 27MB of text, with over 672,000 allocations. That might explain some of the slowness...

The dominating offender appears to be linked lists of integers. Not terribly surprising, considering how commonly I use integers for things (string handles, type IDs, etc.). Unfortunately, I can't readily see a good way to get rid of them all, so this might be a major sticking point until I can implement better containers.

Another annoyance is that much of the compiler is written to do destructive mutation, i.e. update variables and data structures in-place during execution. This immediately puts a stop to my next idea, which is to cache off pre-allocated objects of commonly used types, and recycle them from a pool.

Ironically, I spend several hours trying to eliminate allocations with various trickery, and consistently fail to gain any significant speed. At this point it looks like I might need to try to attack this again with a fresher mind.

Still Addicted to Speed (no, not that kind)

Posted 27 December 2013
Epoch, language design
Over the past few days I've managed to hit what seems to be a local minimum in the compiler's build time: 1.1 seconds. Of course, this is slightly inflated, because the compiler is now substantially larger than it was before. If I do a pure apples-to-apples comparison, the compiler hits about 850 milliseconds, which isn't terrible, but isn't what I want either.

The C++ implementation of Epoch, by way of comparison, used to be able to build the Epoch compiler in about 250 milliseconds. Now, that took a huge amount of careful tuning and optimization, and I haven't invested nearly as much work into the new compiler yet, so I'm not totally abandoning hope. But it's definitely going to be a tough fight to get this thing another 3-4x faster.

Optimizing code at this level is an exercise in extreme patience, coupled with the discipline to be exactingly scientific about every single change, and profile the hell out of everything. It's also tricky because, like quantum mechanics, it is possible to disturb the speed of the program by observing it. Specifically, adding profiling code can turn ordinarily fast functions into apparent bottlenecks if they are called frequently and return quickly.

The profiling mechanism I'm using is crude and may not last much longer. I basically inject a timer into every function's entry and exit points in the JIT engine, and keep running track of how many milliseconds are spent in each function. When the program exits, this is dumped out in sorted order to the console, so I can see where things spend their time.
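
The moral equivalent in plain C++ looks something like this (the real thing is injected by the JIT at codegen time, but the shape is the same; names are hypothetical):

    #include <chrono>
    #include <map>
    #include <string>

    std::map<std::string, double> InclusiveMilliseconds;   // per-function totals

    struct ScopeTimer
    {
        std::string name;
        std::chrono::steady_clock::time_point start;

        explicit ScopeTimer(std::string n)
            : name(std::move(n)), start(std::chrono::steady_clock::now()) { }

        ~ScopeTimer()      // fires at every exit point of the instrumented function
        {
            auto elapsed = std::chrono::steady_clock::now() - start;
            InclusiveMilliseconds[name] +=
                std::chrono::duration<double, std::milli>(elapsed).count();
        }
    };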

Analyzing these numbers is tricky because unlike many profilers this tactic gives me inclusive running times. Say we have a function Foo, which does nothing but call Bar, which takes a long time to run. You might expect Foo to show up with a low number and Bar with a high number - but in inclusive profiling, both Foo and Bar will be high-valued. So in order to find real hotspots, it's important to take into account all of the child and grandchild calls (and so on) a function might make, because all of those will contribute to the function's running time.

To supplement the timing data, I've started emitting a call count as well. This is a simple number that tells me how many times a function was entered over the lifetime of the program. Some are totally sane, like ParseCodeBlock with 2118 calls.

Others are ridiculous, like EnumerateOverloadsAndAddParameterTypes with over 3.5 million calls. There aren't 3.5 million statements in the compiler, so why is this getting called so much?!

Turns out I had a nasty problem where built-in functions would first search the entire list of programmer-defined functions looking for overloads. Every single time any overloaded function was called. Oops. Fixing the brain-dead bug dropped the call count to around 750,000. Still not great, but far better than 3.5 million.

The next major offender was the algorithm for trie searching. I use a trie data structure to hold the list of strings pooled by the program being compiled; this includes every identifier and string literal in the program, so naturally it gets rather large.
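
A prefix trie lookup in miniature, for anyone unfamiliar (the node layout here is hypothetical; the actual implementation differs):

    #include <array>
    #include <memory>
    #include <string>

    struct TrieNode
    {
        std::array<std::unique_ptr<TrieNode>, 256> children;  // one slot per byte value
        int handle = -1;          // pooled-string handle, or -1 if no string ends here
    };

    int FindPooledString(const TrieNode* root, const std::string& s)
    {
        const TrieNode* node = root;
        for (unsigned char c : s)
        {
            node = node->children[c].get();
            if (!node)
                return -1;        // walked off the trie: string not pooled
        }
        return node->handle;      // O(length of s), regardless of pool size
    }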

An iterative attempt at traversal actually turned out to be substantially slower, because of the way type decomposition is implemented. (It's complicated and I don't feel like explaining it right now, but if anyone cares, I'll be happy to dig into the details.)

My next improvement was to remove the string-length computation from the inner loop and call it once during the search setup, and cache it thereafter; this netted a tiny speedup, despite the 2-million-odd calls still occurring.

Inspired by my discovery that iterative traversal of some data structures is actually a performance liability, I tried A/B comparisons on the AVL-tree implementation next. Shockingly, a recursive implementation of that tree traversal wound up almost twice as fast as the iterative version.

Not one to question good fortune, I replaced several of the AVL-tree algorithms I'd implemented with recursive editions.

I take a moment to turn off profiling and get a raw speed reading. I'm pleasantly shocked to see the compiler set a new record of 963 milliseconds.

Of course, it only takes a few more minutes of probing through profiler results to drop that to 942.

At this point, the scheduler of the OS starts interfering with accurate measurement. If the compiler is context-switched out once too often, the compile time can jump by almost 20 milliseconds. It becomes painful to detect real speedups, since it takes dozens of runs to build an average that is trustworthy.

912 milliseconds stands as the current record.

I found a couple more places where I could take advantage of the binary search tree structure; this dropped things down to a blazing 860 milliseconds. Even at this speed, I find myself craving more. Until I'm in the ballpark of the C++ compiler, I will not consider my job complete.

Momentary panic... the compiler has stopped passing validation tests and is miscompiling code. I have to take a few minutes to revert some changes and see what caused the breakage. Eventually I pinpoint the bug and clean it up, but at the cost of regressing to 883 milliseconds.

At least, I thought I'd killed the bug... it seems to have returned with a vengeance! This turns out to be a nasty bit of fiddly semantics in the language itself. When passing a parameter of the "nothing" type, the compiler formerly assumed that the parameter would otherwise be passed by value (if it wasn't nothing). However, this is usually false, since most parameters are passed by reference. It's a dumb assumption and I'll have to revisit it and clean up the specification someday. On the plus side, I'm back to 870 milliseconds, which is nice.

It's time to go back to profiling. My first couple of attempts at locating and fixing hotspots go foul; I'm regressing instead of getting faster, which is demoralizing, and distressingly easy to do at this stage. When things are this close to the wire, the cost of setting up complex data structures often dwarfs the benefits they have over simpler constructs. Even though I'm using a lot of O(n) logic via linked lists, it's faster to traverse a set of simple nodes than to, say, maintain balance in an AVL tree.

Slowly it becomes clear that profiling is not going to glean much more speed from this thing, at least not without finding a different method of profiling. The instrumentation dramatically raises the cost of function calls and screws with inlining, meaning that the running code is not only much slower, it's slower in very different places.

None of my last several attempts at optimizing have actually netted any speed gain, primarily due to this skewing of the hotspot information. I'm going to need to devise a better way to time this code.

Out of curiosity, I do a quick comparison. I reach back through time, via the magic of version control, and retrieve the first copy of the compiler that successfully self-hosted. I then feed that through the current compiler - and it runs in 689 milliseconds.

Between the first self-hosting and now, the compiler source code has grown by 40KB. Obviously there's a lot of complexity represented there, because that 40KB translates into an extra 200 milliseconds of compile time. (For comparison, the entire compiler is about 375KB.)

Interestingly, the biggest difference in speed between compiling the old compiler and the new one is in semantic analysis - the phase of the compiler that makes sure all the data types are correct, expands templates, and so on. Parsing is barely a blip of difference, and code generation accounts for a small but appreciable chunk of the remainder. So semantic analysis seems to carry some bad behavior that slows it down disproportionately. I shall have to dig into this deeper when my brain is less fried.

Part of my obsession with speed is that I still haven't turned the garbage collector back on, and frankly I'm worried about what will happen when I do. I want the baseline to be as fast as possible so that GC doesn't make things painfully slow.

In any case, I'm too mentally tired to keep hacking on this for today.

Return to Fast-Land

Posted 22 December 2013
Epoch, language design
A few days ago, the Epoch compiler could self-host in about 60 seconds.

My last run of the self-hosting process clocked in at 6.59 seconds - nearly ten times faster than when I started out. That's not bad for a couple afternoons worth of work.

As I suspected, there was a lot of lazy nonsense in the compiler that led to the slowness. The only data structure I implemented was a singly linked list, so lots of things were O(n) lookups or O(n^2) or O(n!) depending on how stupid the surrounding algorithm was.

I've cleaned up the most egregious time-wasters, which accounted for a major portion of the speedup. In addition to just making things less dumb, I've been adding hints and other shortcuts to minimize the number of list traversals being done across the board. There's a lot of code that does bad things, like iterate across the list of all functions looking for the function's return type, and then immediately forgetting where that function lived in the list and looking up something else about it, traversing the whole list from the beginning.

A little more effort shaves off some time and sets a new record of 6.52 seconds. When 0.07 seconds is enough to get me excited, it's probably time to start looking for other ways to optimize. I have a long way to go to reach my target of sub-second compiles.

One trick I've pulled out a couple of times is rigging up the JIT system to do a kind of crude instrumentation profiling; with a simple #define I can turn on a mode where all JIT code tracks how long it runs for. This greatly inflates execution times because the overhead is nontrivial, but the data is still mostly useful - and probably will continue to be, up until the point where I'm pushing under a second.

There's a serious danger to trying to make a compiler faster, and that is the dreaded miscompile. Whenever a compiler emits something that is just plain wrong, it's a scary thing; and when relying on a compiler to compile itself, it's easy to accidentally wind up with a chain of bad compilers that have slowly worsening bugs embedded in them.

I actually wasted a couple of hours earlier chasing miscompiles that I introduced during optimization attempts. So there's a fair bit of lost time that could have been spent making things faster, but oh well.

6.24 seconds. It's now 2:45AM and I'm starting to get fuzzy. The type checker is now down to less than 1.7 seconds of runtime, which is well past my goal of 2 seconds for the day.

The clock strikes 3:20AM and I've gotten to 6.18 seconds. A few more tweaks and I hit 6.14.

Twenty more minutes of hackery only nets me 6.13 seconds. One one-hundredth of a second is nice, but not nearly enough. It's time to break out the profiler and study the results closely.

Optimization is often an exercise in judicious caching; the compiler is no exception. Carefully storing off commonly-recomputed data gets me all the way down to 5.9 seconds. A little more and I'm clocking 5.68 seconds.

Another important optimization trick is knowing how to balance long chains of conditionals. If the most common case is buried in an "else" all the way at the end, the code has to run through all the less common cases to get to it. Rearranging a few conditionals so that the most common options are first gets me down to 5.59 seconds.
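
In toy form, with hypothetical token kinds (the real chains live in the lexer and parser):

    enum class Token { Identifier, Literal, Operator, Comment };

    int Classify(Token t)
    {
        // Most common case first: identifiers dominate real source code,
        // so the hot path pays for exactly one comparison.
        if (t == Token::Identifier) return 0;
        if (t == Token::Literal)    return 1;
        if (t == Token::Operator)   return 2;
        return 3;                   // Token::Comment: rarest, tested last
    }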

The lexer is particularly painful at the moment, so I take some time to clean up some of its performance sins. 5.27 seconds. Cleaning up similar sins in the parser gets things down to 5.12 seconds.

It takes until 5:30AM to reach 5.02 seconds. Progress is painfully slow, and I'm starting to get a little discouraged. I had hoped that there was enough low-hanging fruit to get further than 5 seconds, but it's starting to look like I'm going to have to dig real deep to get to my desired target time.

One thing that consistently shows up on the profiler as being nasty is the string table system. The Epoch runtime uses a pool of statically known strings for various purposes; during compilation, we need to do many lookups into this pool. Since my only data structure is a linked list... things are painful.

It turns out that converting the search function from recursive walking of the linked list to iterative traversal gains a decent chunk of speed; 4.89 seconds just as the 6:00AM hour rolls around.

I spend the next half hour or so implementing a basic prefix trie data structure. This will be used for lookups into the string table. It's a solid win over even the iterative search function - landing at 3.52 seconds. Throw in another half hour worth of minor fiddling, and we're at 3.46 seconds.

Another cursory skim-through of the profiler results shows that we're wasting a lot of time looking up metadata for variables; this can be trimmed back a lot to search a smaller subset of the data, netting a speedup that brings us to 3.42 seconds.

I found some easy improvements to my trie implementation; minimizing the amount of string processing going on helps attain a compile time of 2.95 seconds. The clock reads 7:15AM and I'm well past the point of being overly tired.

I will threaten to hang up my hat for the night (morning?) and post this entry. Don't be shocked if I wander back in soon and post a comment with another speed record.

What's Next for Epoch?

Posted 19 December 2013
Epoch, language design
As I wrote up in the previous several entries, the Epoch programming language recently achieved self-hosting. I've had a few people ask what happens next, so I figured I'd write up my plans for Epoch's immediate future.

First and foremost, I need to fix the garbage collector. It's currently stupidly slow, so much so that I turned it off entirely in order to run the self-hosting tests. Fixing this won't be terribly hard, but it may be difficult to hit the kind of speeds that I really want out of it without doing a lot of heavy lifting. In particular, the GC is currently backed by operator new() in the C++ implementation of the runtime, meaning that I don't get fast allocation, and I can't do pointer relocation. Pointer relocation is a major challenge and might be out of scope of what I want to do with the GC in the short term, so I'll probably settle for just optimizing things as much as I can.
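
For context, the kind of "fast allocation" a dedicated GC heap would buy is bump-pointer allocation, where the entire fast path is one pointer increment. Something like this entirely hypothetical sketch:

    #include <cstddef>
    #include <cstdint>

    // Bump-pointer allocation in miniature: no free lists, no locks on
    // the fast path, and compaction can reclaim space by resetting the
    // cursor after live objects are relocated.
    struct BumpHeap
    {
        std::uint8_t* cursor;
        std::uint8_t* end;

        void* Allocate(std::size_t bytes)
        {
            if (cursor + bytes > end)
                return nullptr;          // out of space: time to collect
            void* result = cursor;
            cursor += bytes;
            return result;
        }
    };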

Once the GC is in better shape, the compiler itself needs a lot of love. The code is disgustingly unorganized, and the basic algorithms and data structures are utter rubbish (although it is kind of fun to say I wrote an entire compiler using nothing but linked lists). It currently takes about 60 seconds to compile itself, which is just... no. The C++ implementation of the compiler, for comparison, can compile the Epoch compiler in about 600 milliseconds. So I need a hundredfold speedup in the compiler, which should be doable considering I've done 1000x in the past.

A tangential compiler feature that I desperately need is separate compilation, i.e. code in multiple files. The compiler will be a lot cleaner with that support in place, and it'll enable a lot of other stuff I want to do.

After the compiler is at a point where it hauls ass, it'll be time to start working on the IDE. I have some concepts for Era that I'd really like to prototype, but they require a ton of iteration so having a fast compiler will be important. I also want to start working on a generic UI framework so that I'm not writing a bunch of Win32 message processing garbage all over the place.

And all that should keep me busy for quite a while. Somewhere in there I'll package up Release 15, but I'm not sold on shipping a release just yet. I'd really rather improve some of the core language features before shipping another release, and that will probably wait until during the IDE work if not after.

So there you have it. As always, I will probably change my mind and wander around idea-land after about a week, so don't take this as a hard commitment :-P

Self-hosting the Epoch Compiler: Day Six

Posted 15 December 2013
Epoch, language design
Tackling problems with a fresh mind makes a world of difference.

The first thing I needed to solve this morning was a miscompile involving constructors. Deeper investigation showed that sum-typed members were to blame. Thankfully, this bug had an easy repro, so I built out a test case and set out to fix it.

A couple of false starts later, things were looking solid.

At 11:51AM PST, the self-hosted compiler passed its first test in the compiler test suite.

It's now time to run the full battery of tests on the shiny new compiler, and see if it passes enough of them to warrant attempting a second-order self-host.

At 12:02PM PST, the self-hosted compiler finished its first complete run of the test suite. There are a few failures, mostly dealing with pattern matching, which has always been the most fragile bit of the language implementation. Some of the failures I'm going to ignore because none of the compiler implementations can pass those tests yet, so it isn't really concerning that the self-hosted one can't either.

I'm not quite ready to proclaim self-hosting a success. The test failures are worrisome, and I haven't done a second-order self-host test yet (where I feed the compiler into the self-hosted version of itself... is this getting confusing yet?).

However, this is a huge milestone. To make sure the implications are clear, this is what happened:
  • I wrote a compiler, in C++, which compiles the Epoch language
  • I wrote a compiler, in Epoch, which does the same thing, using the C++ version to boot-strap it
  • The Epoch implementation of the compiler can compile itself
  • This "self-hosted" version of the compiler is almost completely accurate and stable
I still have just over two weeks of padding time before my self-imposed deadline for finishing this project. That should hopefully be plenty of room to solve the last few bugs and get this thing completely done.

For now, though, it is time to celebrate!

Self-hosting the Epoch Compiler: Day Five

Posted 14 December 2013
Epoch, language design
After a nice night of rest, it's time to come back in force and really hammer on this compiler.

The first bug I decided to tackle involved incorrect type annotations when passing sum-typed variables to a pattern matched function (say that ten times fast). The fix was to look more carefully at the original compiler's logic and try to replicate it more exactly in the Epoch implementation.

That turned out to be trickier than I first thought; as has often happened with this phase of work, I fixed some cases but made others worse. Thankfully, I did manage to get a compact repro case for this particular bug, so it shouldn't be as agonizing to fix as some of the others.

One frustrating discovery is that some of my earliest "fixes" were actually masking other problems, and complicating still more bugs. So a lot of the tail-chasing I've done in the past several hours of work has been wasted time. Of course, after finding that out, I'm back to trying to fix the original bugs correctly... which is proving challenging.

I decided to punt on this one by just rewriting the code to avoid tripping the miscompile. It's kind of disappointing to do that, but at least this way I can make progress on other bugs.

And then, it happened.

At 1:04PM PST, the Epoch compiler emitted a binary of itself that successfully executed, and promptly complained that I hadn't passed it any command line arguments (which is true). The next step is to run the compiler test suite on it...

Sadly, something is still terribly wrong. The self-hosted binary is not parsing any of the test suite programs correctly.

My first hunch is that it's related to a difference in the way the two compilers handle structure value semantics. The C++ compiler emits code that never copies objects by value, which is technically broken. The Epoch compiler, on the other hand, emits code that copies objects when passed by value, which is how it should be - but I suspect that one of my optimization hacks relies on the broken behavior!

That indeed appears to be the case: making the Epoch compiler mimic the broken copy behavior causes the self-hosted compiler to start parsing correctly. However, now there seems to be an issue with pattern matching, because a few panic-asserts are going off when trying to parse even trivial programs. Sad.

At this point, it's time for a break.

After coming back, I smacked my head against the wall for several hours, chasing a couple of the same bugs over and over again with minimal luck. Eventually I realized that one of the test suite programs I was using to verify the compiler had a bug of its own! It wasn't failing due to miscompiles, but rather because the test itself was faulty. This had cost me a lot of running around in circles over the course of the afternoon.

My head is starting to hurt and I need food. I've already burned almost an entire day on this, so I'm not sure if I'll be back later or not.

... but of course I'm an obsessive bastard, and couldn't leave it alone, so I'm back again hacking away on the problems. It looks like one of the last remaining bugs involves pattern matching; it seems that the compiler is only emitting some branches of pattern matchers and not all of them, resulting in failed pattern match assertions when the compiler runs on any nontrivial code.

That turned out to be fairly straightforward to solve; the next problem to crop up is a bogus pointer dereference somewhere deep in one of the automatically generated constructor functions. Joy.

After poking at it for a little while, I decided that my brain is too mushy and I'm liable to make the problem worse instead of better.

So it's time for some rest.
