The Bag of Holding

Still rambling about self-hosting

Posted 18 September 2013 · 718 views
Down to 9 tests in the compiler suite that still need some love before they will pass.

For those not keeping score at home, this means that 58 out of 67 compiler tests are passing in the Epoch implementation of the Epoch compiler. In other words, Epoch is getting very, very close to being able to compile itself.

Fun facts:
  • The size of the compiler's C++ implementation, not including standard library headers, operating system headers, and Boost dependencies: about 39,000 lines of code
  • The size of the compiler's Epoch implementation, in its entirety - i.e. every data structure, library function, dependency, and so forth: about 9,000 lines of code
  • Passing compiler tests: 58
  • Failing compiler tests: 9
  • Months since self-hosting began: 5
  • Number of major features remaining: 3
  • Number of programmers working on this project: 1
  • Number of vulgarities in the compiler source code: 0
  • Number of vulgarities uttered while writing compiler source code:
Exception: 64-bit integer overflow. Check that the value you specified is not too large to be stored in 64-bits (!)

Speculation and such

Posted 09 September 2013 · 725 views
Just fooling around.

// Define an Epoch task that does some work
task Worker :
    // This specifies a "message signature" which can be
    // used by an instance of this task to receive data
    // and commands from other code.
    Work : integer a, integer b
        // Do some computation
        integer result = a + b

        // Reply to the sender of the message with another
        // message, carrying the fact that we're done, and
        // the result of our computation.
        sender ~> Completed(result)

// Define another task, which is responsible
// for generating work. It also must contain
// the message response signature for taking
// the results back from the worker.
task Requester :
    // This message tells the requester to make up some work
    Request : Worker ref worker
        // Generate some data
        integer a = random()
        integer b = random()

        // Work on it!
        // This syntax directs the Work message to reply to this
        // task when it is done. The responses will be delivered
        // via Completed messages.
        worker ~> Work(a, b) ~> self

    // This message is received when the worker
    // is done doing its thing. It prints stuff
    // so work has some visible effect.
    Completed : integer result
        print(cast(string, result) ; " came from task!")

// Now we write a standard entrypoint procedure.
// We're going to create a worker pool, and then
// fire work at each worker a whole bunch.
entrypoint :
    // First our pool of workers...
    // This syntax allocates an object (task) which is garbage collected as necessary
    new Worker w1
    new Worker w2
    new Worker w3
    new Worker w4

    // Now a wrapper to handle requests...
    new Requester r

    // And now we get busy!
    integer workcount = 0
    while((workcount++) < 1000)
        r ~> Request(w1)    // Asynchronously pass Request message with no reply expected
        r ~> Request(w2)
        r ~> Request(w3)
        r ~> Request(w4)
Voila - a 4-core random number summation engine!
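The same request/reply pattern can be sketched in Python with threads and queues; this is purely an illustration of the message flow, not how Epoch's runtime actually works, and all names here are made up:

```python
import queue
import random
import threading

def worker(inbox: queue.Queue) -> None:
    """Receive Work(a, b, reply_to) messages; reply with the result."""
    while True:
        msg = inbox.get()
        if msg is None:              # sentinel: shut down
            return
        a, b, reply_to = msg
        reply_to.put(a + b)          # like: sender ~> Completed(result)

def run(jobs: int) -> list:
    inbox, replies = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox,))
    t.start()
    for _ in range(jobs):
        a, b = random.randrange(100), random.randrange(100)
        inbox.put((a, b, replies))   # like: worker ~> Work(a, b) ~> self
    results = [replies.get() for _ in range(jobs)]
    inbox.put(None)
    t.join()
    return results

print(run(4))
```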

Epoch self-hosting progress

Posted 25 August 2013 · 474 views
So the Epoch compiler is now written end-to-end in Epoch. There is no C++ left in the bootstrapping process aside from the runtime environment, which does garbage collection and such.

Sadly this doesn't mean that we're quite to self-hosting just yet. Only about a third of the compiler test suite is passing, and a lot of the remaining work centers around getting the parser finished and rigging up the code generation to the AST traversal logic.

As you may or may not recall from previous updates, the code generator is already done and capable of generating the full Epoch compiler given a properly formatted AST; so what's remaining here is just glue between parsing and the code generation system to get it to where all the tests pass again and we can try the evil experiment of feeding the complete compiler into itself.

There are a number of parser features left to complete before this can happen:
  • Parenthetical expression support
  • Literal function parameters (used for pattern matching)
  • Higher order functions
  • References
  • Entities (flow control, etc.)
  • Function tagging
  • Sum type definitions
  • Templates
  • Some literal cleanup (floating point support being the biggest one for now)
After that, some general features still need to be wired into the compiler. These are things that need nontrivial logic to exist between the parser and the code generation layer:
  • Overload resolution
  • Type inference for assignments and parenthetical expressions
  • Chained assignments
  • Function pointer members in structures
  • Pattern matching on values
  • Pattern matching on types
  • Template support
  • Some built-in operators and functions
  • Type demotion (e.g. integer -> 16-bit integer where appropriate)
  • Standalone code blocks
  • Batch test framework
  • Executable binary generation
I also have a couple of known bugs to clean up before it's all said and done, but they're rather boring for the most part.

On the one hand, this amounts to a whole lot of work; on the other hand, it's a measurable checklist and I still have until the end of the year to hit my personal deadline for the project.

It's actually kind of a relief to see a finite list of things left to knock out. "Write a self hosting compiler" is kind of a tall order, and to finally be in a place where the end is clearly in sight is a great feeling.

There are also some other reasons for me to be excited about the future of Epoch, but I can't really go into detail just yet. Stay tuned, life's about to get interesting.

Epoch Optimizations and Garbage Collection

Posted 10 August 2013 · 447 views
Epoch, language design
Following the Great Garbage Collection Debug Spree of the past few weeks, I've noticed a general trend towards bad performance in the realtime raytracer benchmark. At one point it was down to ~6.5FPS peak performance, which is just unacceptably bad.

Profiling revealed two main causes of this slowdown. One is the garbage collector invocations themselves; Epoch is designed to have a "greedy" collector which trips frequently and collects aggressively. Unfortunately, the cost of walking the stack and heap for live objects is nontrivial, so tripping the GC all the time leads to lower performance throughput.

It's worth noting that in standard apps like Era, the difference isn't even visible; I've yet to see Era hitch noticeably with the fixed garbage collector. But for something that wants to peg a CPU core and go flat-out as fast as possible, triggering collection cycles all the time is a bad idea.

The other major cause of performance woes is register spilling. Without going into painful amounts of detail, one of the requirements of LLVM's garbage collection support is that you can no longer store pointers to GC-able objects in CPU registers. Instead, all objects that might be collected must be stored in memory on the machine stack. Moreover, any time you want to update those values, it has to be done on the stack, so there is overhead in moving from register values back onto the stack every time you touch a variable that might be garbage collected.

This may not seem like much, but it turns out to be pretty painful in a benchmark that relies heavily on computations done on objects. Since every object can potentially be collected by the GC, every object's pointer must live on the stack at all times. This winds up being a substantial perf hit.

On the upside, both problems can be fixed with a single tactical strike. The key observation is that many Epoch functions never do anything that generates garbage. A typical Epoch program is comprised of many small, nearly-trivial functions; and if those functions can't possibly create any garbage, there's no purpose checking the GC when they finish.

As a bonus, if a function never needs to call the GC, and never calls any other functions which need the GC, we can turn off register spilling for that function!
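The analysis implied here boils down to a transitive property over the call graph: a function qualifies only if it allocates nothing and everything it calls also qualifies. A rough sketch, nothing to do with the actual JITter code:

```python
def nogc_functions(allocates, calls):
    """allocates: set of functions that create garbage directly.
    calls: dict mapping each function to the functions it calls.
    Returns the set of functions that can safely skip GC checks."""
    # Start by assuming everything is nogc, then iteratively knock out
    # any function that allocates or calls a non-nogc function.
    nogc = set(calls) - set(allocates)
    changed = True
    while changed:
        changed = False
        for f in list(nogc):
            if any(callee not in nogc for callee in calls[f]):
                nogc.discard(f)
                changed = True
    return nogc

# Hypothetical call graph from a raytracer-like program:
calls = {
    "dot": [], "normalize": ["dot"], "alloc_vec": [],
    "trace_ray": ["normalize", "alloc_vec"],
}
print(sorted(nogc_functions({"alloc_vec"}, calls)))   # ['dot', 'normalize']
```

trace_ray is disqualified transitively because it calls alloc_vec, even though it allocates nothing itself.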

The Shootout: Epoch vs. C++
Tonight I implemented a quick little function tag called [nogc]. This tag tells the JITter two things: first, the function cannot generate garbage, and therefore should not invoke the garbage collector; and secondly, because the GC is never invoked from a [nogc] function, register spilling can be disabled in the body of that function.

The results are duly impressive: the raytracer is back to running at ~26.5FPS on my dev machine - a full 4x speedup over the worst-case performance it was exhibiting earlier today.

But it gets even cooler.

Out of sheer curiosity, I wrote a port of the Epoch raytracer in C++. It's not optimal C++, to be fair, and cuts a few corners to make it function more similarly to the Epoch program. But, line for line, it does basically exactly what the Epoch version does.

The C++ port of the raytracer runs at only 23FPS.


Posted 02 August 2013 · 263 views
Turns out my garbage collection woes are over.

I was strongly suspicious of the inliner in my last post, and it turns out that this hunch was (albeit indirectly) completely correct.

The obvious thing to do when facing a bug like this is to compare the code generated; dump out a listing of the version that works, and a listing of the version that doesn't, and run a diff tool to see what's mismatched between the two.

Thankfully, LLVM comes with a handy dump() function that lets us easily see what the IR bitcode looks like for a given program, so it's trivial to arrange this.

Combing through the diff, I noticed that there was indeed some inlining going on - but not of the functions that I suspected were causing problems. Moreover, suppressing inlining on the functions I did think were problematic made no difference!

As I looked closer, I noticed another interesting distinction between the working and broken versions of code: the @llvm.lifetime.start and @llvm.lifetime.end intrinsics.

It took some digging on the googles to figure out what exactly these mean. Semantically, they just define when two variables (supposedly) do not overlap in lifetime, and can theoretically be given the same stack slot. Except if we marked one of those stack slots as containing a GC root... well, I'm sure you can figure out the rest.

The intrinsics are inserted by the optimizers at the IR level but not paid attention to until native code emission phases, so all I needed to do was strip them out of the IR. Thankfully I already have a late-running pass over the optimized code for GC setup purposes, which is actually part of what generates the stack maps in the first place. I modified this pass to obliterate calls to those intrinsics, re-enabled function inlining everywhere I had tried to club it to death previously, and...

Everything works!
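For illustration only, the stripping step amounts to something like this over a textual IR dump; the real pass works on LLVM's in-memory IR, not text:

```python
LIFETIME_INTRINSICS = ("@llvm.lifetime.start", "@llvm.lifetime.end")

def strip_lifetime_markers(ir_text: str) -> str:
    """Drop any IR line that calls a lifetime intrinsic, keep the rest."""
    kept = [line for line in ir_text.splitlines()
            if not any(name in line for name in LIFETIME_INTRINSICS)]
    return "\n".join(kept)

# A made-up fragment of IR around a GC-rooted stack slot:
ir = """\
  %obj = alloca i8*
  call void @llvm.lifetime.start(i64 8, i8* %p)
  store i8* %root, i8** %obj
  call void @llvm.lifetime.end(i64 8, i8* %p)
  ret void"""
print(strip_lifetime_markers(ir))
```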

Incidentally, there are already bugs filed against the so-called "Stack Coloring" pass which uses these intrinsics. One that actually helped me track this issue down is filed at http://llvm.org/bugs/show_bug.cgi?id=16095

I'm pondering filing a second bug just to note that stack coloring/slot lifetime in the presence of GC roots is a bad idea.

The ongoing mission

Posted 02 August 2013 · 296 views
Nailed down some more quirks in the GC over the past day; unfortunately the last one is a real bugger.

It appears that LLVM's function inlining optimizations cause the GC to go slightly insane. I have yet to ascertain the exact interaction between the inliner and the GC, but disabling all inline functions makes the GC run beautifully.

I also remembered a nasty lesson that I should have had in mind a long time ago: deferred operations and garbage collection are a nasty mix. My particular problem was using PostMessage() to send a string to a widget. In between the call to PostMessage and the message pump actually dispatching the message, the GC would run. It would detect that nothing referenced the string anymore, and correctly free it.

Of course, when the message pump dispatched the notification in question, the control would try and read from the now-freed string, and bogosity would result.

Fixing that was just a matter of a find/replace and swapping in SendMessage, since I didn't really need the deferred behavior of PostMessage in the first place.
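Python's weakref module can demonstrate the same hazard: with a deferred "post," nothing keeps the object alive across the gap, so it dies before delivery; a synchronous "send" dispatches while the sender still holds a reference. Toy model only, not Win32 code:

```python
import gc
import weakref

class Message:
    def __init__(self, text):
        self.text = text

def post_then_collect():
    msg = Message("update caption")
    probe = weakref.ref(msg)   # observer only; keeps nothing alive
    pending = probe            # like PostMessage: no strong reference kept
    del msg                    # caller's reference goes away
    gc.collect()               # GC runs before the "pump" dispatches
    return pending()           # the message has been collected

def send_synchronously():
    msg = Message("update caption")
    probe = weakref.ref(msg)
    # Like SendMessage: dispatch happens while msg is still referenced.
    return probe().text

print(post_then_collect())     # None
print(send_synchronously())    # update caption
```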

So my GC bug list is shrinking once again. I hope that I'm actually fairly close this time and not just shrugging off a "small" issue that turns out to require a major rewrite of something. I really don't want to ship without function inlining support, since that's a huge performance win in a language like Epoch where everything is a tiny function. However, I desperately need to find out why inlined functions cause the GC to vomit, and that's going to take a lot of time and patience I suspect. It may be worth taking the perf hit temporarily to get working software out the door. We shall see.

Now that things seem to be behaving themselves, and soak tests of memory-intensive programs reveal no issues, I have a few things I'd like to get back to working on besides garbage collection.

One is shoring up the Era IDE so that it's actually functional enough to use on a consistent basis. That'll likely involve making random additions and refinements over time, but there's some low-hanging fruit I'd like to hit sooner rather than later (mostly surrounding tabbed editing for now).

After Era reaches a point where I'm comfortable spending a few hours in it at a time, I want to go back to working on the self-hosting compiler and getting the parser and semantic analysis code up to snuff. The major semantic features left center around overload resolution and pattern matching; the parser needs to support basically the entire full language grammar instead of the trivialized subset it recognizes now.

Despite the lengthy diversion, I still feel like I'm on track to finish self-hosting by the end of the year. It may be a tight squeeze, and there may well be a number of bugs to sort out before it's 100%, but I think I can get the majority of the port done by then.

Once self-hosting is reached, there's a lot of simple but tedious optimization to do in the runtime itself. One major thing I've mentioned in the past is moving away from storing Epoch "VM bytecode" in the .EXE file and directly storing LLVM bitcode instead. That'll cut back on the dependencies needed to run Epoch programs, and reduce launch times substantially as well.

From there, it's mostly a matter of introducing language and runtime features. I suspect that once the compiler is done there will be a lot of crufty code that can be dramatically improved with some minor language tweaks. There's also the obvious stuff like separate compilation support, better data structures, parallelism...

Obviously still quite a lot to be done, but things are getting pretty exciting. I'm starting to have semi-realistic daydreams about people joining on with the project as it gains momentum and helping to smooth out some of the many remaining rough edges... hint hint :-P

Mostly though I hope that reaching self-hosting will be enough to gain some serious attention. The language is still very, very far from ready for prime-time, but the more people start playing with it the sooner it'll get there.

More adventures in garbage collection

Posted 31 July 2013 · 353 views
One of my least favorite types of bug to unravel is the Ball of Yarn.

A Ball of Yarn is not just one bug. It's a large number of bugs, all in close proximity, which influence and interact with each other. If you pull on one of the threads, one of two things will happen: either you will slowly gain insight into the mess, or (more likely), you'll inadvertently tighten a knot.

Garbage collectors are a prime source of Ball of Yarn bugfests. So many subtle factors can send them off the rails that when a GC blows up, the cause is probably highly nontrivial. Here's a summary of the threads I've yanked on this past week.

The Heisenbug
These are always scary. You have a bug; that's bad enough. But then you go to investigate it, and the closer you peer, the more the behavior changes. This is a classic recipe for disaster and requires immense patience and focus to solve.

I had one earlier that was classic bizarre-o-world: if I ran my entire unit test suite against the compiler/runtime (and, by extension, the garbage collection system), certain GC-related tests would fail. Predictably. Every time.

If I ran the failing tests individually, they would pass. Predictably. Every time.


After scratching my head for a little while it occurred to me that a previous test in the suite must be setting up some kind of condition that affected the GC behavior. This was supposed to be impossible since I've gone to great lengths to isolate each unit test from the others.

Or so I thought.

Long story short: turns out global variables are bad, mmkay? No, but seriously. It wound up being a single innocuous global that did, in fact, alter the behavior of the GC indirectly. Resetting its value between unit tests fixed the Heisenbug.
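The general remedy is the same in any test framework: reset global state before every test. A minimal illustration in Python's unittest (the global here is a made-up stand-in, not the actual culprit):

```python
import unittest

# Stand-in for the innocuous global that leaked state between tests.
gc_pressure_threshold = 100

def allocation_heavy_operation():
    global gc_pressure_threshold
    gc_pressure_threshold = 5      # mutates the global as a side effect
    return True

class IsolatedTests(unittest.TestCase):
    def setUp(self):
        # Reset the global so no test inherits another test's state.
        global gc_pressure_threshold
        gc_pressure_threshold = 100

    def test_allocation_heavy(self):
        self.assertTrue(allocation_heavy_operation())

    def test_sees_clean_state(self):
        # Would fail without setUp: the previous test set it to 5.
        self.assertEqual(gc_pressure_threshold, 100)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsolatedTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```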

Which way is up?
Epoch is compiled to an abstract machine language by the Ahead-of-Time compiler tools. This machine language is no longer spoken by anything, since I killed the VM that ran it a few months ago. However, it does serve as a first-pass form that is mechanically translated into LLVM bitcode and then JIT-compiled to native machine code when the program is loaded.

(Eventually I want to just serialize LLVM bitcode into the program image and load/JIT it directly instead of recreating it at runtime, but that's just an optimization for later.)

LLVM has a nice feature which is integral to doing precise garbage collection, known as stack maps. A stack map is simply a data structure that describes where certain variables live on the machine stack at a given moment in time.

This is vital because stack mapping allows us to know what roots exist in the program. Roots are the variables that, via traversing pointers/references, lead us eventually to all the valid objects that are allocated and need to be kept alive instead of recycled by the garbage collector. Without roots, our only option is to do conservative garbage collection (i.e. if it looks like it might be a pointer to a legitimate object, treat it as one, even if it's just a lucky-shaped int or something). Conservative collection is gross and I refuse to do it for Epoch, so stack mapping it is!
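Once the roots are known, the mark phase is conceptually simple: flood-fill the object graph from the roots, and everything unvisited is garbage. A toy version over plain dicts (the real collector walks raw memory, of course):

```python
def mark_live(roots, references):
    """roots: object ids named by the stack map.
    references: dict mapping each object id to the ids it points to.
    Returns the set of ids that must be kept alive."""
    live, worklist = set(), list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue
        live.add(obj)
        worklist.extend(references.get(obj, ()))
    return live

# "a" is rooted and reaches "b" and "c"; "orphan" is unreachable.
heap = {"a": ["b"], "b": ["c"], "c": [], "orphan": ["c"]}
live = mark_live(roots={"a"}, references=heap)
garbage = set(heap) - live
print(sorted(live), sorted(garbage))   # ['a', 'b', 'c'] ['orphan']
```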

Unfortunately, LLVM's documentation around how stack maps work is... lacking, to be extremely gracious.

I spent a lot of time this week waffling on how to actually use the data in a stack map from LLVM. Turns out that if a certain number is positive, you interpret it one way; and if it's negative, you interpret it an entirely different way.

Of course none of this is written down anywhere and trying to unravel the LLVM source to understand it was less than helpful. (Bonus points: another addition to the "less than helpful" camp is the LLVM community itself, which seems to treat any work on garbage collection as some kind of plague.)

Bottom line, it was hard enough to figure out my first semi-working attempt at understanding stack map data. Second-guessing myself all week did not help, and it turns out I was probably right the first time.

Validation is important, kids
A couple other fun bugs arose from not validating data correctly. The GC formerly had a few places where, if it couldn't recognize the type of an object, it would just punt and assume it was legit and had some arbitrary type. This is a terrible idea.

What would happen is the GC would start traversing memory, and come across an object it didn't recognize. It would then dutifully add that to a list of things to go back and check up on later. In some cases, the bogus data would just be ignored, and everything would continue on as if nothing had happened. In other cases, the result was silent failures to correctly mark live objects, which would then be swept by the GC and destroyed while the program was still using them. Oops.

This kind of nonsense accounted for not one but three semi-independent bugs in the GC that I fixed over the past week.
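The fix is boring but important: treat an unrecognized type tag as a hard error instead of guessing. Something like this, with entirely made-up tag values:

```python
KNOWN_TYPE_TAGS = {0x01: "string", 0x02: "structure", 0x03: "buffer"}

def classify(tag: int) -> str:
    """Refuse to guess: an unknown tag is a GC bug, not 'some arbitrary type'."""
    if tag not in KNOWN_TYPE_TAGS:
        raise ValueError(f"GC encountered unknown type tag 0x{tag:02x}")
    return KNOWN_TYPE_TAGS[tag]

print(classify(0x02))    # structure
try:
    classify(0x7f)       # bogus data: fail loudly instead of mis-marking
except ValueError as e:
    print(e)
```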

FPO is evil
I discussed this in a previous entry, but the bottom line is that some compiler optimizations aren't worth the trouble. FPO is one of them.

I managed to work around the problem by introducing the concept of a "bookmark." A bookmark is a simple list of all the times we exit an Epoch stack frame and enter a foreign stack frame, such as when calling Win32 API functions. During garbage collection, we traverse stack frames that have been bookmarked in addition to any frames we can find using standard frame-pointer walking. This covers the gaps very effectively, at a small performance penalty. I'm planning on revisiting this someday and cleaning up the cases where bookmarks are stored so I can reclaim some of that speed.
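Conceptually, the bookmarked walk just seeds the ordinary frame-pointer traversal with extra starting points past each foreign gap. A toy model (addresses and names are made up):

```python
def walk_frames(stack, start_fp, bookmarks=()):
    """stack: maps frame-pointer address -> (return_address, next_fp).
    Walk the ordinary frame-pointer chain, then resume from every
    bookmarked frame recorded when control left Epoch code."""
    seen = []
    for fp in (start_fp, *bookmarks):
        while fp and fp in stack:
            ret, next_fp = stack[fp]
            if ret in seen:
                break
            seen.append(ret)
            fp = next_fp
    return seen

# Frames 0x300/0x400 sit past a foreign (FPO) gap; 0x300 was bookmarked.
stack = {0x100: ("f_main", 0x200), 0x200: ("f_helper", 0),
         0x300: ("f_callback", 0x400), 0x400: ("f_deep", 0)}
print(walk_frames(stack, 0x100, bookmarks=[0x300]))
# ['f_main', 'f_helper', 'f_callback', 'f_deep']
```

Without the bookmark, the walk would stop at f_helper and never see the two frames beyond the gap.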

But wait, there's more!
As of now, there's still at least one bug left in the GC. It either relates to algebraic sum types, variadic functions, or both. I'm not entirely sure how to resolve this last one, aside from just hammering away at it for a few days and hoping for the best.

Despite all this, there's definite and clear forward progress being made. I hope to wrap this up soon-ish and return to the self-hosting compiler project, although I have to admit it's been an entertaining diversion.

Hard lessons

Posted 25 July 2013 · 1,600 views
I've learned three nasty lessons this week.

First, exception handling in C++ is a knotted mess of undefined behavior, especially when you start blending JIT compiled native code with C++ code that can throw exceptions.

Second, as a result of this, when testing code you should always run it under a debugger so you can tell if an exception is getting thrown and then eaten silently by the undefined behavior demons.

Third, after learning the first two lessons and ripping out a lot of hair, I came across a third: frame pointer omission is the ultimate enemy of accurate garbage collection.

The third one may bear some explanation.

Let's start at the beginning. There is an informal convention on X86 systems that involves creating what are known as stack frames. When you call a function using this convention, it stashes two important pieces of information on the machine stack: a return address (i.e. where the function goes back to when it's done) and a frame pointer. This pointer has the location of the next pair of return address/frame pointer values on the stack, forming a singly linked list of stack frames. When you find a frame with a NULL next pointer or return address, you know you've reached the end of the list.
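The walk itself is just following that linked list. A toy model in Python, with the machine stack represented as a dictionary of frame-pointer address to (return address, next frame pointer); all addresses are invented:

```python
def walk_call_stack(stack, fp):
    """stack: maps frame-pointer address -> (return_address, next_fp).
    Follow the singly linked list until a NULL next pointer or return
    address ends it, yielding the breadcrumb trail of return addresses."""
    trail = []
    while fp in stack:
        ret, next_fp = stack[fp]
        if not ret:
            break
        trail.append(ret)
        if not next_fp:
            break
        fp = next_fp
    return trail

stack = {0x1000: (0x401200, 0x1040),   # leaf frame
         0x1040: (0x401080, 0x1080),   # caller
         0x1080: (0x401000, 0x0)}      # entrypoint: NULL next pointer
print([hex(r) for r in walk_call_stack(stack, 0x1000)])
# ['0x401200', '0x401080', '0x401000']
```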

This is a powerful tool because it lets us construct a call stack. The call stack is like a breadcrumb trail of how your code gets to where it's at in any given moment. While that's a nice debugging aid, it also serves a broader purpose in my schemes: it gives us a way to track where we are in the program and what variables are active, which is central to doing accurate garbage collection.

For review, there are two basic modes of garbage collection: accurate and conservative. In the former, we know where all the variables live and how to peek into their values, so we can always (more or less) construct an accurate picture of what memory the program is using and what memory can safely be reclaimed. Conservative garbage collectors, on the other hand, take the view that we should look at every possible variable on the machine stack. If it smells like a pointer, we assume it is one, and don't free whatever memory it might be pointing to.

Conservative collectors are kind of gross, but ultimately necessary for languages that do not generate enough metadata to support accurate collection. There are also minor performance tradeoffs involved, such that you may not want to pay for accurate collection if your stack is sufficiently small and scraping it is not too expensive. Anyways, the theoretical benefits and cons are unimportant; Epoch uses an accurate garbage collector, which is what matters for this story.
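The conservative "smells like a pointer" test can be sketched as scanning every word on the stack and keeping anything that coincides with a live allocation's address, which is exactly how a lucky-shaped int pins memory:

```python
def conservative_roots(stack_words, allocations):
    """Treat any stack word that equals a known allocation address as a root."""
    return {w for w in stack_words if w in allocations}

# Made-up addresses: 0x2040 appears on the stack both as a real pointer
# and (potentially) as a plain integer; the scanner can't tell the difference.
allocations = {0x2000, 0x2040, 0x2080}
stack_words = [7, 0x2040, 0x1234, 0x2080, 0x2040]
print(sorted(hex(r) for r in conservative_roots(stack_words, allocations)))
# ['0x2040', '0x2080']
```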

Now, back to frame pointers. There's obviously a tiny bit of overhead involved in setting up this linked list structure, but it has another cost that isn't as apparent: it consumes a register, typically EBP, on the processor at all times. Since EBP is being used to help maintain the linked list, it can't be used for more general-purpose stuff. On a CPU architecture like X86 that is hideously short on registers, this can be a problem.

So way back in the day someone figured out that they could simply not set up stack frames, because compilers are smart and can juggle all the stack shuffling a program does without getting confused. This buys an extra register and shaves off three or four instructions per function call. The name of this trick is Frame Pointer Omission, or FPO.

Pretty nifty, eh?

Except it's actually evil. I'll pause for a moment and let you figure out why.

That's right: with no frame pointer list, we can't tell where our program is or how it got there. This in turn compromises our ability to tell what memory is in use, because we can't really see what the program is doing.

Simple solution: disable FPO in garbage-collected programs!

And that is indeed what I did, many moons ago, in working on the Epoch GC initially.

That served me well until I realized the first two lessons of the week. Once I started running Epoch programs under a debugger, I noticed intermittent crashes that were actually being eaten by the undefined behavior monster (!!) and silently allowing the program to continue running while secretly mangling more and more memory over time.

See, 99% of the time, we'd mangle something that was already garbage and about to be freed anyways, so there's no code to find out that its data is borked. However, that nice little 1% of the time, we'd mangle something that was still in use, and all hell would break loose.

The root cause of all this was that something was being prematurely garbage collected. Ugh.

So I set out to trace exactly why things were getting collected while still alive in the program. Weirdly enough, it became more and more apparent that a specific pattern was occurring:

1. This manifested only in the Era IDE
2. The crash always included references to the Scintilla widget in the call stack
3. The exact same type of object was being prematurely collected every time
4. This object type was instantiated only when passing certain messages to and from Scintilla's widgets

Something about Scintilla was buggering up garbage collection... but what, and how?

After a lot of poking around in raw memory dumps and spending an inordinate amount of time combing through the GC code for bugs, I made the key discovery that unraveled the whole problem.

Sitting in the stack data, obscured by layers of gibberish from Scintilla's internal operations, were two stack frames that got "lost." Essentially, the linked list of frames skipped over both of these functions, and made a hole in the call stack.

Closer examination revealed that the missing stack frames both pointed to my code - i.e. there was some kind of interaction with Epoch code during the lost time period. Turns out that in that gap, an object was instantiated - and, you guessed it, it was exactly the same object that later got prematurely freed and caused one of the silent "crashes."

And what caused those missing frames to vanish in the first place? Why, FPO, of course. Scintilla is compiled with FPO enabled, which in turn means that calls interacting with it will mangle the linked list of stack frames that forms the call stack. Why they get mangled in this precise way is still a mystery to me, but it suffices to know that interacting with an FPO-enabled module causes the garbage collector to fail.

In hindsight, the whole thing is painfully obvious, but when chasing a bug like this the obvious can become suspect. I actually theorized that this was the problem early on, but ruled it out for reasons that turned out to be false. Oops.

Solving this will, sadly enough, probably not be easy. The only solutions I can think of border on falling back to conservative collection, which I really don't want to do. The standard fix to FPO in garbage collectors is to store extra debug metadata that allows you to reconstruct the call stack without the frame pointers, which is akin to what debuggers do anyways when generating call stacks from symbol data. Unfortunately, that doesn't at all solve the case of external libraries for which debug information is not available, and it's a slow thing to do at runtime anyways.

So that's been my week in a nutshell.

Developing a language can be a real bitch sometimes :-P

IDE hackery

Posted 20 July 2013 · 600 views
As I continue to mess with self-hosting the Epoch compiler, I'm spending a lot more time actually editing Epoch code than C++ code. This is a wonderful thing overall, since I basically created the language precisely so I could quit writing apps in C++.

Writing a lot of Epoch code in Notepad is pretty brutal, and I was kind of tired of just working on the compiler anyways, so I decided to take this weekend and play with the Era IDE a little bit.

I mocked up a status pane and a project navigation tree, and added a real status bar so I could keep track of what line of code I'm editing (which is useful when the compiler spits out line numbers as error locations).

That was fairly easy going, but it's also just messing around with the Win32 API from Epoch, which isn't terribly adventurous. So I set out to make Era a tabbed editor.

The results:

[Attached image]

Shown is a fragment of the self-hosted compiler that implements the Shunting Yard algorithm for operator precedence handling.
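Shunting Yard itself is a standard algorithm; for reference, a compact Python sketch (left-associative binary operators plus parentheses, nothing Epoch-specific):

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    """Dijkstra's shunting yard: infix token list -> postfix (RPN) list."""
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher-or-equal precedence (left-associative).
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()              # discard the "("
        else:
            output.append(tok)     # operand
    while ops:
        output.append(ops.pop())
    return output

print(to_rpn(["3", "+", "4", "*", "2"]))            # ['3', '4', '2', '*', '+']
print(to_rpn(["(", "3", "+", "4", ")", "*", "2"]))  # ['3', '4', '+', '2', '*']
```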

Still a long ways to go, both on the compiler itself and on Era, but it's fun stuff.

I'll just leave this here...

Posted 16 July 2013 · 582 views
[Attached image]

Here's the kicker: the compiler that produced this working program is written entirely in Epoch.

Yes, the test program is extremely minimalistic, but it passed.

Self-hosting completion is now just a matter of getting the parser to understand the entire language grammar, and finishing out the last bits of semantic analysis/type system enforcement that still need to be done.

Which is not to say it's a small workload; to be sure, I'm still optimistically gunning for end-of-year to get it done. But it's now a visible target instead of a nebulous future possibility.
