The Bag of Holding

More hacking on native executables

Posted 31 October 2014 · 922 views
Epoch
I've started the long, tedious process of hooking up every single Epoch language feature to the new LLVM bindings, so that the compiler can emit 100% native binaries instead of JIT compiling the code each time the program starts.

So far the only thing that works is printing out strings, and only statically defined strings at that. But that's something, because it means that the import thunk table works, function invocation works, and the embedded .data segment works. In less obscure terms, the Epoch runtime can be called from a native program and use constant data that's stored directly inside the executable file.
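To make the import thunk machinery a little more concrete, here is a minimal Python sketch (not the Epoch implementation) of how an import directory for a single DLL can be laid out in a 32-bit PE image. The helper name `build_import_section`, the section RVA, and the imported function name `PrintString` are assumptions made up for illustration; only the structure layout follows the PE format.

```python
import struct


def build_import_section(section_rva, dll_name, function_names):
    """Lay out a PE32 import directory for one DLL.

    Returns (section_bytes, iat_rva). Hypothetical helper for illustration.
    """
    count = len(function_names)
    descriptors_size = 2 * 20        # one descriptor plus the all-zero terminator
    thunks_size = (count + 1) * 4    # 32-bit thunks plus a null terminator

    ilt_offset = descriptors_size             # import lookup table
    iat_offset = ilt_offset + thunks_size     # import address table (loader patches it)
    names_offset = iat_offset + thunks_size   # hint/name entries, then the DLL name

    # Hint/name entry: 2-byte hint, NUL-terminated name, padded to an even length.
    names_blob = b""
    name_rvas = []
    for name in function_names:
        name_rvas.append(section_rva + names_offset + len(names_blob))
        entry = struct.pack("<H", 0) + name.encode("ascii") + b"\0"
        if len(entry) % 2:
            entry += b"\0"
        names_blob += entry

    dll_name_rva = section_rva + names_offset + len(names_blob)
    names_blob += dll_name.encode("ascii") + b"\0"

    # On disk the lookup table and the address table start out identical.
    thunks = b"".join(struct.pack("<I", rva) for rva in name_rvas)
    thunks += struct.pack("<I", 0)

    descriptor = struct.pack(
        "<IIIII",
        section_rva + ilt_offset,  # OriginalFirstThunk
        0,                         # TimeDateStamp
        0,                         # ForwarderChain
        dll_name_rva,              # Name
        section_rva + iat_offset,  # FirstThunk
    )
    blob = descriptor + bytes(20) + thunks + thunks + names_blob
    return blob, section_rva + iat_offset
```

Generated code then calls an imported function indirectly through its slot in the address table, which the Windows loader fills in when it binds the DLL at load time. A real image also needs the data directory entry that points at this table, which the sketch leaves out.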

The infrastructure for doing all this took a bit of work to set up. The module responsible for generating EXE files from compiled IR is, give or take, about 1000 lines of Epoch code. The C++ shim that talks to LLVM is another 600 or so lines of code. The runtime itself is dead trivial but that's only because it doesn't have 99% of the language functionality in it, nor a garbage collector, nor a threading/task switching model.

It may not sound like a particularly large volume of code, but every line represents a significant battle to figure out all the intricacies of the Windows Portable Executable program format, the way early-bound DLLs work, how to map Epoch constructs into LLVM constructs, and so on. The amount of effort put into every ounce of this code is tremendous, given that I typically have only a few hours a week to hack on this project. The biggest hurdle is losing my mental context every time I have to call it quits for the night; if I could put a solid five or six hours of focused work into Epoch at a stretch, I could probably triple my productivity. Sadly, I just don't have that luxury right now.

Given the constraints I'm under, I'm pretty happy with progress thus far. It may take a while to get all of the various language features back to a fully functional state, but the project as a whole is already benefiting immensely from the reduced complexity of the pipeline as well as the general flexibility and power of the new architecture.


For now, though, I desperately need some sleep.


Native Binary Project: Day Whatever

Posted 27 October 2014 · 716 views
Epoch
I continue to hack on the Epoch compiler, slowly shaping it into a powerhouse of executable binary generation. So far I've gotten the program string table and DLL import table built in a flexible and extensible manner. This is the first step towards getting actual code generation back online at full capacity.
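As a rough illustration of what a program string table involves, here is a small Python sketch (not the Epoch code). It interns each distinct string once and hands back its byte offset into a blob that could be written to a data section. The class name `StringTable` and the choice of NUL-terminated UTF-16 are assumptions for the example.

```python
class StringTable:
    """Interns strings into one blob destined for a data section (illustrative only)."""

    def __init__(self):
        self._offsets = {}        # string -> byte offset inside the blob
        self._blob = bytearray()

    def intern(self, text):
        """Return the offset of text, appending it only the first time it is seen."""
        if text not in self._offsets:
            self._offsets[text] = len(self._blob)
            # Assumption: strings are stored as NUL-terminated UTF-16LE.
            self._blob += text.encode("utf-16-le") + b"\0\0"
        return self._offsets[text]

    def data(self):
        """Return the bytes to emit into the executable."""
        return bytes(self._blob)
```

The code generator would then refer to a string by the section's base address plus the returned offset, so nothing about any individual string needs to be hardcoded.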

Now that I can link to DLLs from native Epoch programs, I can start replacing the old runtime library with a newer, cleaner, and slimmer version. As part of the process, I'm rewriting the LLVM bindings, which means that code generation is going to be broken for a while, at least in part.

A few minutes ago, the compiler generated a rudimentary .EXE that imports the new Epoch runtime DLL and calls a function in it. This is all completely flexible - none of the data is hardcoded or hackish to make it work. This is in direct contrast to the old method of building Epoch .EXEs, which basically amounted to hardcoding a tiny program that launched the main program after JIT compiling it with LLVM.


My goal here is to do absolutely everything I can in Epoch, and slice out as much C++ code as possible. I'm still playing around with ideas in my head for how to make the LLVM bindings as slim as possible, but it shouldn't be hard to seriously cut down on the amount of fragile C++ hackery being done there.


It's kind of annoying how much sheer effort I can dump into a project like this, and then be able to summarize the entire affair with a single sentence. I'd rail on about all the detailed hacking and prodding that went into making this happen, but it's actually pretty boring and involves a lot of profanity. So instead, I'll just leave you with a short journal entry.


And now to bed, and now to bed.


Rewrite ALL the things!

Posted 23 October 2014 · 563 views
Epoch
OK so I'm not really rewriting everything... just... a lot of it.

The current runtime infrastructure for Epoch is precarious at best, and desperately needs some improvements. There are a number of things I want to do that all happen to coincide nicely with a desire to rewrite the runtime system:
  • Destroy the EpochLibrary DLL forever. This is an ancient crufty artifact that deserves death, and getting rid of it should simplify the entire build pipeline noticeably.
  • Improve program start times. I've posted about this before; basically the Epoch runtime JIT compiles your program every time it starts up, which in the case of the compiler leads to about a 6 second stall. Gross.
  • Emit true native binaries - tied in with the above point, I'd like to emit genuine native .EXE and .DLL files instead of mutant VM hybrid files. This will enable the use of things like dynamic linking which will be a huge win.
  • Separate the compilation pipeline into modular pieces. Right now the parser is "reused" by brute-force including the code in both Era and the compiler itself; instead, I'd like to make the parser and AST stuff into DLLs that feed data around to each other.
I have a loose mental strategy for doing all this, which looks vaguely like the following:
  • Build a new DLL (tentatively EpochLLVM) which wraps LLVM and allows the compiler to make simple C-API calls to set up LLVM IR and generate executable machine code from it.
  • Retrofit the existing Epoch compiler to use this new DLL when generating binaries.
  • Rebuild garbage collection and other runtime infrastructure in a separate DLL (tentatively EpochRuntime) or maybe a few DLLs.
  • Self-host the compiler through this new chain.
  • Convert Era to use the new chain.
  • Build support for emitting Epoch DLLs.
  • Proof-of-concept the DLL system by replacing the C++ lexer used for Scintilla syntax highlighting with the actual lexer.
  • Split the remaining parser, AST, and codegen logic into separate DLLs.
  • Self-host again using this new infrastructure.
  • Ship Release 16.
This ought to keep me busy well through the end of the year...

The benefits will be huge though. In addition to being able to write Epoch DLLs, getting faster start times, and cleaning up the modularity of the code a lot, this will pave the way towards integrating higher-level source processing tools with Era, as well as giving me an opportunity to revisit some historical but gross decisions, like the use of UTF-16. Last but not least, it gets even more of the implementation of the language and tools moved into Epoch instead of C++.

Overall it's a daunting amount of work, but I think it can be managed. The real trick will be staying interested in the process during the long dark period where nothing quite works. I hope my skeletal plan will give me plenty of moments to sit back and enjoy visible progress, but we shall see.


So, here goes nothing!


Slimming down the Epoch runtime and improving program start times

Posted 18 October 2014 · 544 views
Epoch
Right now a big limitation of the Epoch program execution model is the fact that .EXEs generated by the compiler are not in fact native binaries. They are compact bytecode wrapped in a thin shim that churns said bytecode through an LLVM wrapper layer each time the program is launched. This means that every single time you start an Epoch program, it is basically getting re-built and re-optimized from bytecode and then JITted to native machine code.

As a result, the compiler, for example, takes about 6 seconds to start on my machine. Pretty sad.

The way around this is to build native binaries using LLVM and just stuff them on disk. The runtime still needs to exist in order to provide garbage collection support, marshaling into other native processes, and so on. However, it will be much skinnier and more efficient without carrying the weight of the LLVM wrapper all over the place. Net result should be faster startup times for Epoch programs and less memory consumption overall.


Easier said than done.

LLVM doesn't directly generate machine code that's really all that suitable for emitting to a binary on disk. It leaves out a lot of platform-dependent details like how to link to DLLs and how to handle global data. Much of the infrastructure is designed to assume you're emitting machine code for JIT runtime purposes, not necessarily for serialization.

Now obviously this isn't insurmountable; plenty of compilers depend on LLVM (notably Clang) and generate binaries just fine. So the real magic is going to lie in finding all the places where JIT code doesn't serialize nicely and working around them.


Off we go!


Epoch Mailing List

Posted 13 October 2014 · 495 views
Epoch
I've spun up a mailing list for the Epoch project, mostly because I'm tired of having conversations about it span half a dozen websites and PMs and blah blah.

Here's the link: clicky. Or you can email epoch-language@googlegroups.com to accomplish the same thing.

It should be open to anyone to come and talk about the language or just ask questions. I'll be seeding the discussion with a few subjects that are still open questions in my mind and hopefully people will jump in and kick some thoughts around.


Three toads and the spit of a badger

Posted 13 October 2014 · 423 views
Epoch
So Epoch has a couple of warts that I'm looking at smoothing out. One of them has to do with overloads that do type decomposition of algebraic sum types:

//
// How things look today
//
type Optional<type T> : T | nothing

Test : integer x
{
    print(cast(string, x))
}

Test : nothing
{
    print("Nothing")
}

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}
I'm thinking about modifying the parser so that overloads of a function can be chained together without repeating the function name, by just specifying the signature of an overload immediately after the body of the function, and following with the body of the overload:

//
// How things look with implicit overloads
//
type Optional<type T> : T | nothing

Test : integer x { print(cast(string, x)) }
     : nothing   { print("Nothing") }

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}
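For readers who don't know Epoch, the overload-per-alternative idea above has a loose analogue in Python's `functools.singledispatch`, which also selects a function body from the runtime type of the argument. This is only an analogy, not Epoch semantics; the name `describe` is made up, and the bodies return strings instead of printing so the result is easy to check.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback when no registered overload matches the argument's type.
    raise TypeError(f"no overload of describe for {type(value).__name__}")


@describe.register(int)
def _(value):
    # Plays the role of "Test : integer x".
    return str(value)


@describe.register(type(None))
def _(value):
    # Plays the role of "Test : nothing".
    return "Nothing"
```

Calling `describe(42)` picks the integer body and `describe(None)` picks the other one, much like passing the two Optional values to Test.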
The last question I want to ask is what lambdas look like, and whether or not they can help compact this code even further. Here's one thought experiment:

//
// Experiment with lambda syntax
//

type Optional<type T> : T | nothing

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    var f = lambda
     :(integer x) { print(cast(string, x)) }
     :(nothing)   { print("Nothing") }

    f(exists)
    f(doesntexist)
}
I'm not totally sold on the lambda syntax yet. I do like the idea of allowing overloads of lambdas, specifically for the type decomposition thing. Or maybe type decomposition just really needs a first-class structure, like case:

//
// Experimental case statement
//

type Optional<type T> : T | nothing

Test : Optional<integer> in
{
    case(in)
        : integer x { print(cast(string, x)) }
        : nothing   { print("Nothing") }
}

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}



Watching syntax evolve

Posted 08 October 2014 · 688 views
Epoch
I'm still thinking a lot about the question of task syntax in Epoch. The more I contemplate the matter, the less I like the idea of having tasks be overloaded as the "only" way to do namespacing. I also don't like the idea of static methods/data very much.

I still really like keeping tasks as the single mechanism for packaging functionality with data. I also like the idea of tasks existing as siloed off entities that can't interact except via message passing; this gives tremendous power to the model and makes threading and distributed processing a lot easier to realize.

Really the only big thing I'm recanting on is the idea of tasks being the way to group named stuff. I think a package notion is more sensible, for a number of reasons. First, packages can contain related logic/data without demanding that they all share an instantiation (or are static). Second, packages handle the idea of related logic that doesn't share any state, such as a library of free functions. Third, packages can then in turn contain tasks, which I think is an appealing way to go.


So here's my latest concept of how all this might look:

//
// Define a namespace of related "stuff"
// Could also have structures and tasks inside it, or maybe even nested namespaces?
//
package Outputters
{
	console : string text { print(text) }

	logfile : string text
	{
		// Imagine an IO implementation here.
	}
}


//
// A task is a compartmentalized entity. Once it is
// created, it has no interaction with the rest of the
// world except via message passing.
//
// While task functions can return values, only those
// functions without a return value can be invoked
// via plain messages. However, I'm pondering some
// syntax for capturing task return values as futures.
//
task GenerateStuff :
{
	//
	// This is a valueless function (method) so it
	// can be called as a plain message from outside.
	//
	// Messages are not dispatched while the task
	// is already handling a message, so a caller
	// can pass a bunch of messages and they will
	// safely queue up and be executed in order.
	//
	go : integer limit, (output : string)
	{
		while(limit > 0)
		{
			output(cast(string, limit))
			limit = limit - 1
		}
	}


	//
	// This is a valued function. It cannot be
	// invoked as a standard function because it
	// lives inside the task, and must be talked
	// to via message. In order to get the return
	// value "out", we use a future; see below.
	//
	compute : integer in -> integer out = in * 2

}


//
// Standard Epoch entry function
//
entrypoint :
{
	//
	// Create an instance of the task
	//
	// Note the new syntax for creating something
	// with no data fields to initialize
	//
	GenerateStuff generator = ()

	//
	// Pass some messages, using stuff
	// from the package as an example.
	//
	generator => go(42, Outputters.console)
	generator => go(42, Outputters.logfile)

	//
	// Pass a message to the generator,
	// get a future back out, and wait on
	// the future to get its value.
	//
	future<integer> result = generator => compute(21)


	// This blocks until the message comes back with the return value of compute()
	integer foo = result.value
}
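The message queue and future behavior described in the comments above can be approximated in Python with a worker thread, a queue, and `concurrent.futures.Future`. This is a sketch of the general pattern under stated assumptions, not how Epoch tasks are or will be implemented: the `Task` base class, `send`, and `stop` are hypothetical names, and Python threads share memory, so the isolation here is by convention only.

```python
import queue
import threading
from concurrent.futures import Future


class Task:
    """A worker that handles one message at a time, in arrival order."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            message = self._inbox.get()
            if message is None:        # sentinel posted by stop()
                break
            method, args, future = message
            try:
                future.set_result(method(*args))
            except Exception as error:
                future.set_exception(error)

    def send(self, name, *args):
        """Queue a message and return a future for the handler's return value."""
        future = Future()
        self._inbox.put((getattr(self, name), args, future))
        return future

    def stop(self):
        self._inbox.put(None)
        self._thread.join()


class GenerateStuff(Task):
    def go(self, limit, output):
        while limit > 0:
            output(str(limit))
            limit -= 1

    def compute(self, value):
        return value * 2


def demo():
    lines = []                                   # stands in for Outputters.console
    generator = GenerateStuff()
    generator.send("go", 3, lines.append)        # plain message, result ignored
    result = generator.send("compute", 21)       # message with a future
    value = result.result()                      # blocks until compute() has run
    generator.stop()
    return lines, value


if __name__ == "__main__":
    print(demo())
```

Because the inbox is first in, first out, the `compute` message is not handled until `go` has finished, which mirrors the "messages safely queue up and are executed in order" rule in the task comments.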



Epoch Release 15 is now available

Posted 06 October 2014 · 450 views
Epoch
Release notes.


I didn't get a ton of votes, but all the votes I got were for doing a release sooner rather than later. So here you go, with all the warts and incomplete stuff.

This is mostly a milestone preview for those (few) of you who are really watching the project; think of it as a sneak peek more than a stable development platform. Even so, you can do quite a hell of a lot with it.

I look forward to hearing all the things that suck about it.


Plodding ever onward

Posted 05 October 2014 · 414 views
Epoch
I've begun the mammoth task of refitting the Epoch compiler to believe in namespaces.

Up until now, all names have been global, and all named entities have existed as peers in the global space. (The exception of course being local variables and function parameters/return slots.)

So far, I've moved algebraic sum types and scope metadata to be tracked in a namespace. This doesn't yet entail moving away from global naming, because I need to have all aspects of the compiler "understand" namespacing before I can do that. But it's a major step forward, and I've already caught a couple of rogue compiler bugs and corner cases in the language's implementation.

As rewarding as it is to make progress, there's still a very long way to go:
  • Type aliases
  • Weak type aliases
  • Structures
  • Function signatures
  • Function tags
  • Functions
  • Overload resolution hints
  • Type matchers
  • Function templates
  • Structure templates
  • Sum type templates
  • Instantiated function templates
  • Instantiated structure templates
  • Instantiated sum type templates
Wheeze.

Some of these are much more pervasive in the code base than others, such as functions and overload resolution hints. Some are already partially ready to go thanks to being implemented later in the compiler's authoring process, and having benefited from me learning how best to use the language along the way.

Once this list is all done, I'll need to go back in another pass and remove the direct usage of the global namespace from all the places where it's currently hardcoded. After that, I'll need to build logic for creating new namespaces and passing them around as appropriate, e.g. for tasks.
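To picture what namespace-aware tracking might look like, here is a minimal Python sketch of a namespace that stores each kind of named entity in its own table and falls back to a parent namespace on lookup. It is a guess at the general shape, not the Epoch compiler's actual data structures; the `Namespace` class and the parent fallback rule are assumptions.

```python
class Namespace:
    """Tracks named entities per kind, with lookup falling back to a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._entities = {}   # kind (e.g. "structure") -> {identifier: definition}

    def define(self, kind, identifier, definition):
        table = self._entities.setdefault(kind, {})
        if identifier in table:
            raise KeyError(f"{kind} '{identifier}' already defined in {self.name}")
        table[identifier] = definition

    def lookup(self, kind, identifier):
        # Walk outward from the innermost namespace to the global one.
        scope = self
        while scope is not None:
            table = scope._entities.get(kind, {})
            if identifier in table:
                return table[identifier]
            scope = scope.parent
        raise KeyError(f"{kind} '{identifier}' not found from {self.name}")
```

With something like this in place, the current global behavior is just the case where every definition lands in a single root namespace, and a task or package would get a child namespace of its own.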

Hopefully all this stuff will take long enough that I'll get around to finalizing the actual task syntax before it's time to start implementing tasks in earnest. We shall see.


I haven't totally forgotten about the notion of doing a full language release, either. The plus side of doing a release now is that it would provide a much more realistic representation of the Epoch development experience than, say, Release 14 was. On the down side, it would still mean shipping a lot of subpar code and tools.


Maybe if there's actually some interest in me doing a release, I'll package one up. I'd just hate to go to the trouble of building a package and having a whopping two people download it and never comment on their experiences/opinions. It's been hard enough in the past to get serious traction on releases, so I kind of feel like holding off until I have a really awesome release to show.

Buuuuuuuut I also kind of want people to see all the progress that's been made without having to sync the code repository and go through the muddle of building the thing from sources.



Agh! Indecision.




