Background: I'm working on a mark/sweep-based garbage collector for the Epoch language. The full details are far too intricate for a single journal entry, but the basic theory is this: start with all the variables that are on the stack (aka "roots"), follow all the references they have, and keep following references until you stop discovering new objects. Anything you've found is still potentially in-use by the program. Anything not found can be deleted, because there's no way the program can get access to it.
First, I turned off all flagging of roots except for local variables, to eliminate complications from function parameters and calling convention differences. This still crashed. Next, I turned off actually deleting free objects, to ensure that the heap traversal wasn't interfering with the process. This also crashed. Finally, I took a close look at how roots are checked by the GC implementation itself, and noticed an interesting pattern.
In simple test cases, the root checker always works. It requires some particular degree of complexity in the code to actually trigger the crash. So I set out to look for how the generated machine code was interacting with the stack, in the hopes of finding some quirk that explained the mistakes in the root checker.
Eventually, after probing around for a while, I notice a telltale call in one of the crashing functions: an invocation of the runtime function __chkstk. This function is used to verify the integrity of the stack after a dynamic allocation occurs in stack space - i.e., it's a safety trap run after an alloca call.
alloca should never be called by Epoch programs; for one thing, it ruins the ability of the stack crawler to correctly locate stack roots. But it's also a performance liability. It's much better to allocate all required stack space for a function once, up-front. The fact that this is happening means that my generated LLVM bitcode is asking for extra stack space after a function has entered its main body!
Sure enough, a quick search of the offending function's bitcode shows the allocation plain as day. I'm somewhat relieved that it's something this simple, but also a little miffed that it took so darn long to figure out.
I audit the JIT code for places where this might be tripping, and find two candidates. Now I just need to figure out how to eliminate those extra stack allocations... hmm...
Sure enough, after a little rearrangement, eliminating the extra allocation causes the garbage collector to start behaving again.
Now to slowly reintroduce features until it's back up to 100% operation... First is LLVM optimization passes, which I turned off to help pinpoint the faulty code in the first place. These come back online with no problems.
Next it's time to actually free memory marked as unreachable by the garbage collector. This passes with flying colors!
At this point the GC is fully back in action.
It is, sadly, slower than dog snot, so I'll have to find a way to improve the string garbage collection mechanism in particular to get it to stop being so painfully bad. It also appears to be too aggressive in its collection, as the compiler, when attempting to self-host, eventually reports a string has gone missing unexpectedly.
I expect that this is due to disabling the marking of certain stack roots earlier, so I turn that back on (oops). Instead of waiting for another really slow compilation pass, I decide to go ahead and optimize the string collector a bit more.
Along the way I find another oversight that will help immensely with performance: the runtime was triggering a garbage collection pass after every single allocation of any kind! Dialing this back to its old value (every 1000th allocation) should improve the throughput a bit.
After all that, the compiler successfully self-hosts again, with the garbage collector fully running! Woohoo!
It takes about 10 seconds to fully build the compiler now. I don't have the heart to actually time it, since I was sub-second without garbage collection so very recently. However, that's still a reasonable accomplishment, and performance is something I can always come back to and work on any old time.
With everything on the compiler and runtime framework side operating again, it's time to get Era back in shape. This will require modifying the compiler to be able to emit Windows resources such as dialogs, menus, and icons; not terribly difficult, but time consuming, and somewhat boring. I think I'll leave that for some other time.
It'll be really nice to quit using Notepad for code editing. I'm already looking forward to making Era more powerful and feature-rich so I can spend more time in that environment.
After all these years, Epoch is finally a language I feel like I can write production-quality software in, and I definitely enjoy writing Epoch code a lot more than, say, C++. A little polish here and there to work out some of the less expressive kinks in the language, some more features, and a good set of tools... and I could really, really love this thing.
Which, hopefully, means someone else will, too!