Jump to content

  • Log In with Google      Sign In   
  • Create Account

The Bag of Holding

On Embedding General-Purpose Scripting Languages

Posted by , 22 July 2015 - - - - - - · 887 views

I spend a lot of time thinking about programming languages. Most often it's from the perspective of a consumer, i.e. a programmer using Language X. Sometimes it's from the viewpoint of someone who wants to explore Language Y, either for fun or for practical usage. Occasionally I see languages from the vantage of a language creator.

Recently, I've sunk a lot of brain cycles into thinking about languages as a non-programmer. More specifically, I've been thinking about the tendency for large (and sometimes small) games to use "scripting" languages to do various things.

I think there are basically two (often overlapping) desires in play when using an embedded scripting language. One is to make programmers more productive. Often this is believed to be possible by using "higher level" languages to write certain types of logic. The other common desire is to make non-programmers more productive, by enabling them to directly modify logic in the host program/game.


The more I ponder this, the more I'm convinced that in most cases, neither desire actually justifies the use of a general-purpose scripting language.


Lies About Language Productivity
It is fairly common in programming circles to hear assertions like "Language X is more productive than Language Y." A popular tag for this used to be "rapid development" although I hear that term less nowadays. I will not attempt to tackle the debate about whether or not X can be more productive than Y. However, what I would like to point out is that this is actually irrelevant!

If we're talking about embedding a scripting language into a host program, like a game, we aren't comparing X and Y. We're comparing "X+interop layers+Y" to Y by itself. This distinction may seem pedantic, but it's crucial. If we wanted to add X to our Y-based program in order to boost programmer productivity, we have to be careful here.

Depending on how the job is done, adding a language to a program can actually be a massive net loss of productivity. Suppose we have a C++ program, written by C++ programmers, who implement a Lua interpreter in their program. Now the average programmer working on the program must know not only two different languages, but must also be intimately familiar with how the two interoperate.

If we designed APIs like this, we'd revile the result as a leaky abstraction, and rightly so. I content that embedding a general-purpose language is almost always a net loss, for the following reasons:
  • Programmer knowledge is now required to span multiple languages
  • Programmers MUST be fluent in the boundary layer between languages
  • Interop is likely less performant than staying in a single language, especially if experts in the interop are unavailable
  • Impedance mismatches are guaranteed to crop up, i.e. languages think of things differently and have different idioms
  • The mental overhead of managing multiple languages is taxing and leads to higher bug rates and nastier bugs
So if you want to embed a general-purpose scripting language in your game to make development more productive, don't. Focus on making better tools and making better architectures. A good architecture allows you to build high-level logic almost as if it were scripting anyways. There is no need to do it in a different language.

Note that this is in no way an argument against using, say, C# to write your level editor and C++ to write your game.


What About Designers?
The other motivation I mentioned above for doing scripting systems is to allow non-programmers to do powerful things.

I'm going to point out something that feels painfully obvious to me, but in harsh experience, far too many people just cannot wrap their head around:

Shoving a general-purpose programming language in a non-programmer's face is a great way to lose team members and make people very frustrated.

I can safely assume that most of my readers are not astronauts. So if I were to cram you into an Apollo capsule and yell GOOD GODDAMN LUCK while slamming it shut, I can safely assume that you'd react negatively. You're not trained to think about the tools in front of you. You're not informed as to what they're for. You don't know what to expect after the door closes and the last echoes of my jerk voice fade away. You're in an alien environment and can't even tell what will happen next.

That is what putting a designer in front of a general-purpose scripting engine feels like.


Why do I keep mentioning languages as "general purpose"? Because there is an alternative, which works, and works well.



Domain-Specific Languages
Many people conflate DSLs with "very limited languages." This is unfair. A DSL can be immensely powerful and still be confined to a particular domain. It can even be Turing complete if you really want.

I half-jokingly posted on Twitter the other day that Excel combined with a verb system would make a great DSL for game designers.


And the more I think about it, the less I think that's a terrible joke of an idea.

I might just give it a shot someday.


MMOs! Y u no EZ?

Posted by , 05 February 2015 - - - - - - · 689 views
MMO
A common aspiration among nascent game developers is to make an MMO. It's actually a pretty common aspiration in many perfectly functional, well-established studios, as well; but at least on GDNet, we mainly see newbies who want to make the next World of Warcraft, and so it's easy to be misled into believing that the MMO fascination is somehow peculiar to those who "don't know any better."

The reality, though, is that plenty of non-newbies have tried and failed to make successful MMOs. Many projects begin and never see the light of day; still more launch and are ultimately failures as business ventures. There's a common cause here, I think: making an MMO is hard, and for reasons which are not altogether obvious. The newbie wants to make an MMO because he doesn't understand how hard it is; the veteran industry studio fails to make an MMO because they don't understand how hard it is, either.

To do something hard, and do it well, requires preparation and a lot of awareness. Even understanding why it is hard can be tough in and of itself. It doesn't help that basically every successful MMO developer ever is highly protective of their secrets - and rightly so.

As a successful MMO developer, I'm going to uphold the tradition, and not talk about the secret sauce that makes it possible to solve the hard problems of MMOs. But I will take some time to shed some light on what those problems really are, and why they are hard.


Content Volume
This is the one most people get hung up on, so I'll dispense with it right away. Producing the sheer amount of content that goes into a genuine MMO is hard, yes, but only because it's expensive. If you have a boatload of money and a lot of talented content creators, this problem isn't really that big of a deal. After all, huge single-player games get made all the time.

If you're not driving home in a dump truck full of hundred dollar bills, then this problem is obviously going to be pretty hard.


The Speed of Light
This is a genuinely hard problem because we can't change the speed of light. However, the speed of light (or more accurately, the speed of signal propagation in physical network equipment) represents a very real limit to how responsive a multiplayer game can be. The farther away your computer is from the "other" computer, the longer it takes to send a signal to that computer. For the sake of simplicity in this article, just assume that the back-of-the-napkin math limits us to some given latency in the tens of milliseconds for an average trip time. (I'm being very fuzzy deliberately.)

The bottom line here is that if I want to have realtime interactions with someone on the opposite side of the country, I'm going to have to wait an almost perceptible amount of time for my signals to reach them, and then an almost perceptible amount of time for their response to come back to me. Put together, this means that even in the purely optimal case where I can talk directly to other players, I might notice latency, or "lag", if they are far enough away.

So what about if I can't talk directly to my peers, but need a server?


Network Architecture
This is where things get truly fun. MMOs are traditionally client-server based, meaning that for me to talk to player B, I talk to my server first, who then relays my interactions to player B on my behalf. This is why server performance matters so much; if the server is busy and takes 60 milliseconds to forward my message to player B, I just made the speed-of-light problem that much worse.

But suppose I need to not just poke player B, but also demonstrate to the other 60 people in the immediate area that I did in fact poke player B. Now my server has to send 60 more messages. Each of those messages takes time to send - we don't have the luxury of sending multiple messages in parallel.

This means that the network that handles the game (i.e. typically a datacenter setup) has to be able to successfully blast out 60 of those messages to a potentially geographically diverse set of people, while maintaining minimal latency. Remember, again, every millisecond we pile on to the message trip time is a millisecond the player might end up feeling as lag.

Now, routing internet traffic is a pretty well-handled problem these days (in most of the world). Figuring out how to get our messages to each of our 62 players in a timely fashion is thankfully not that hard. What is hard is the fact that access to that level of internet beefiness is not cheap and not easy to attain. Internet Service Providers (ISPs) have peer agreements with datacenter installations, which govern how much traffic can come and go from the datacenter, and what quality of service it receives - i.e. how much latency and how much bandwidth will be involved.

A lot of people like to suggest "the cloud" as a solution here. The problem is that cloud providers generally can't compete with the quality of service that a good peering contract will give you. Yes, cloud services are great and still improving fast, but as of today, a dedicated datacenter is a better way to host a latency-sensitive game.

Datacenters, and peering agreements, are expensive propositions. This is back to the content volume problem, though - money can solve it. So let's look at other network related headaches.


The N^2 problem
Suppose we have a client-server model where 10 players are gathered to do whatever. Player 0 suddenly decides to jump up into the air. Now, players 1-9 need to see that, and player 0 needs confirmation that his character is jumping. So that's a total of 10 messages to relay.

Now suppose all the players jump simultaneously. Each player generates 10 messages, as we saw a second ago. Total up the messages for all the players: we must now relay 100 messages - 10 squared. This is generally referred to as "an N-squared problem."

N^2 problems occur all over the place in networked game construction. They have implications on many aspects of implementing an MMO: physics, game rule simulation, network throughput, security (think DDoS problems) and so on. There are ways to solve and/or mitigate the pain, but generally, every system that is affected by N^2 needs to solve the problem a little bit differently.

This means an interesting thing for software architecture: what solves your graphics N^2 may not solve your physics N^2, and both of those might fail to work with combat rules, and even then you need to figure out the network throughput issues, and so on and so forth. Suddenly you have a dozen systems that behave all slightly differently that have to be stitched together into a harmonious whole. That's not always easy, and in fact, it's one of the things that makes implementing a good MMO very difficult.


Nonlinear scaling
The final general principle I'd like to cover is that scaling is rarely linear. Let's say you make a multiplayer game. With 5 players, your server is doing great - 10% CPU usage. With 10 players, 18% CPU usage. With 20 players, 32% CPU usage. If you're following the trend, this sounds awesome! We can probably hit 60 players before running out of CPU! Except you add the 40th player and the CPU pegs out at 100%. What went wrong?

Scaling is rarely linear! We extrapolated that we could triple our 20-player numbers and get 60 players comfortably, but this extrapolation turns out to be flawed. What most people don't realize is that nonlinear scaling is everywhere in MMOs.

Take our N^2 scaling problem from earlier, even. It looks kinda linear if you don't look hard enough, but once you throw on the numbers, it bites hard. Believe it or not, there are worse ways to scale than N^2.

Network traffic, memory usage, CPU usage, database transaction volume, communication between back-end server systems... all these can scale in nonlinear ways. And when they do, you want to know about it long before it turns into a player-facing outage. This means testing, and load testing in particular. Good load tests are hard to do in general, but stress-testing an entire MMO infrastructure and proving it robust is particularly difficult. It makes the difference between a good launch and a bad launch, between profitable operation and burning money.


Parting Thoughts
There are many more challenges to making an MMO a successful business: security, legal issues, attracting a big enough player base, maintaining the player base (aka retention), monetization models, community management, and so on. Even for the issues I've touched on above, I've barely scratched the surface.

I hope this underscores the fact that MMOs are hard. Not impossible, certainly not impossible. Just very, very hard. It takes a diverse collection of skills and expertise to do them right, and sadly that mix just doesn't happen that often. I hope this writeup isn't depressing to aspiring MMO-makers; instead, I hope it gives you a start at understanding what exactly you will need in order to be successful.


Fun with templates. "Fun." Ha, ha.

Posted by , 10 December 2014 - - - - - - · 790 views

So I have an interesting situation where I wrote a math-intensive API to allow easy composition of various functions. The details are pretty easy so I'll just show an example of what the API usage looks like:

formula.Create()
       .Set(0.0f)
       .AddWithWeight(some_data_source, 0.5f)
       .Multiply(other_data_source)
       .Display();
The API is meant to operate on non-primitive objects, such as 2D arrays. This gives the end user a really powerful way to stitch together different operations without making the code unreadable or long.

Internally it does what you would guess: Create() allocates some working space to do the math in, and the operations pretty much do what it says on the tin, using that working space. Each function simply returns a reference to the "formula" object (*this) so that the operations can be chained arbitrarily.


Now, here comes the interesting bit: I want to change the API to support not just 2D arrays, but more sophisticated data structures, like, say, a quadtree. Operations should be able to mix and match types so that you can do things like prune the quadtree to only reference certain cells, grab all the objects in those cells, and do some calculus on some property of those objects.

Also, dynamic allocation is forbidden (inside the API implementation itself) and the syntax needs to retain the readability and succinctness of the original.


My first thought (being obsessed with compilers) was to build a sort of syntax tree that represents the operations and then traverse that to do the math. This is a failure on the dynamic allocation front, though, and also introduces a lot of overhead in terms of understanding how the code works. In other words, it involves a lot of machinery that the end user shouldn't have to care about, and maintainers shouldn't be burdened with.

My second thought (being obsessed with compilers) was to throw template metaprogramming at it. This has only one weakness, namely that templates do not work with floating-point numbers - making it impossible to do things like the AddWithWeight() call in the first example. Well, impossible without some fudging, anyways.

To fix the limitation on floating point literals, I decided to use a simple layer of indirection. Instead of putting "1.0" or "7.5" in my code, I use an enumeration value like CONSTANT_ONE or CONSTANT_FOO, since those can be used for template specializations. Then I do just that: specialize some templates to yield the correct floating point numbers on demand.

The next step is to describe the API from a user's perspective. I fiddled around a bit and came up with this:

CreateFormula<
		int,
		Add<int, DATA_SOURCE_CONSTANT_1>,
		Add<int, DATA_SOURCE_CONSTANT_1>,
		Multiply<int, DATA_SOURCE_CONSTANT_4>,
		DumpResult
	>();
This does add some ugly to things, namely in that the "int" type is now littered all over the code. Frankly, though, I think this is more of a problem in the silly examples than in real life, because in real life the API is used with many different types, so being able to specify a type at each step of the process is pretty handy. It also opens up the door for template-specialization-based optimizations.

Now that I know what I want the API to look like, it's time to build the implementation:

#include <iostream>


enum EDataSource {
	DATA_SOURCE_CONSTANT_0,
	DATA_SOURCE_CONSTANT_1,
	DATA_SOURCE_CONSTANT_4,
};


template <typename T, EDataSource source> struct Add { };
template <typename T> struct Add<T, DATA_SOURCE_CONSTANT_1> {
	static void Invoke (T * state) {
		*state += T(1);
	}
};


template <typename T, EDataSource source> struct Multiply { };
template <typename T> struct Multiply<T, DATA_SOURCE_CONSTANT_0> {
	static void Invoke (T * state) {
		*state *= T(0);
	}
};
template <typename T> struct Multiply<T, DATA_SOURCE_CONSTANT_4> {
	static void Invoke (T * state) {
		*state *= T(4);
	}
};


struct DumpResult {
	static void Invoke (int * state) {
		std::cout << *state << std::endl;
	}
};

struct Null {
	static void Invoke (int *) { }
};


template <typename T, typename O1, typename O2 = Null, typename O3 = Null, typename O4 = Null>
struct CreateFormula {
	CreateFormula () {
		T state = T();

		O1::Invoke(&state);
		O2::Invoke(&state);
		O3::Invoke(&state);
		O4::Invoke(&state);
	}
};



int _tmain (int argc, _TCHAR* argv[]) {
	CreateFormula<
		int,
		Add<int, DATA_SOURCE_CONSTANT_1>,
		Multiply<int, DATA_SOURCE_CONSTANT_0>,
		DumpResult
	>();


	CreateFormula<
		int,
		Add<int, DATA_SOURCE_CONSTANT_1>,
		Add<int, DATA_SOURCE_CONSTANT_1>,
		Multiply<int, DATA_SOURCE_CONSTANT_4>,
		DumpResult
	>();

	return 0;
}
What we have here is pretty straightforward. Each operation is represented by a struct template. In this case, operations always take exactly two parameters: one to tell the code what type the operation works on, and one to specify the data for the operation. We then use template specialization, as described above, to yield specific data values based on the enumeration. In more realistic code, instead of just having constants, we could have things like "list containing the locations of all nearby sandwiches" or whatever else.

Note that the operation structs have a single static function, Invoke. Invoke takes some "state" and goofs with it; nothing more, nothing less. This is the guts of each operation.

There are two operations worth calling out: DumpResult and Null. DumpResult is simply a way to take the state from the previous operation and put it on the screen. Null is a way to say "don't do anything" - this will come in handy momentarily.

The ugly beast is the CreateFormula template. This guy needs lots of optional template parameters so we can have formulas of varying length. Each optional parameter defaults to Null so that operations which are omitted from the template's type definition can be elided entirely.

Internally, CreateFormula has a single constructor, which does something that should sound really familiar: creates some state, passes it off to a series of operations, and then returns. Note that no dynamic allocation is necessary here; more sophisticated data structure usage might make it tempting to add more dynamic allocations, but that's not really the API's problem - it's down to the user to do the right thing.

Last but not least, in the main() function we see two examples of the template in action. It looks pretty darn close to the original, exactly like the syntax we wanted for the new model, and has the added benefit (over the original) of being more explicit about what's going on at each step.


As you might guess, the real application for this is a lot more complex. It doesn't limit itself to simple operations like add and multiply - there's analytical differentiation, integration by sampling, optimization (i.e. "highest" and "lowest") queries, and so on. So the actual operations are much richer. This is the main reason why operator overloading is not useful; we simply run out of squiggly symbols for all the different operations.

Another thing worth mentioning here is that modern compilers are exceptionally good at figuring out what's really going on in this code. When using int as the state type, for example, Visual C++ compiles the entire thing out and just passes a literal result to std::cout. Pretty cool.


I have no idea if this is remotely useful, mostly because I'm not convinced it's useful to me even, but it was a nifty trick and I felt like sharing. So there.


Stupid bit rot

Posted by , 05 November 2014 - - - - - - · 785 views
Epoch
After a couple of days of preoccupation with other things, I sat down tonight to start hacking on Epoch. I was unprepared for the bizarre puzzle that unfolded immediately after.


I had been mucking around for a little while when it came time to compile and test my changes. Everything seemed to build cleanly, and then I started the program, only to be greeted by bright red letters stating "Unsupported type for native code generation."

Mind you, this is my own error message, in my own code, which I wrote, and should remember. Needless to say, there was a rather loud WTF emitted regardless. Time to start poking around.

My usual routine when things go wrong is to binary search my changes until I pinpoint what's causing the error. So I dutifully removed a bunch of new code, rearranged some other code back to its old state, and tried again.

Unsupported type for native code generation.

Rinse, repeat, eventually get to a point where none of my changes are left. Uh... wat? So I did the only thing that makes sense: shelved my entire set of changes (thankfully Mercurial has a great shelf function) and try the code that was demonstrably working a couple of days ago.

I checked all the timestamps of the relevant files, looking for DLLs that might have accidentally been recompiled, looking for source code that might have snuck in a bogus edit or three, probing as carefully as I could for anything and everything that might be different between the known good build and today's.

After concluding that everything was in order, I confidently compiled the working program and launched it, fully expecting to see the successful execution that for fuck's sake was still showing in my terminal window from days prior.

Unsupported type for native code generation.

God. Damn. It.

Same compiler, same runtime, same libraries. Different binary output. Unfortunately I didn't think to keep a copy of the old working .EXE to compare with the new busted one, because hey, I expected the universe to play fair and obey the typical laws of physics whereby when you don't change anything it doesn't mysteriously break three fucking days later for inexplicable reasons.

Compiling a different project yields the expected results: no changes = binary, bit-for-bit identical output to what it used to produce.

Something is different between now and last week, and I have no idea what.

Out of desperation, I decide to look at the revision history on the Mercurial repository, and start rolling back changes until I find something that does work.

It takes exactly ten seconds to realize what went wrong.


I had a chunk of changes that I made late one night in between the last successful test build and now. I never did run them to make sure they worked. Oops.

Turns out, obviously, they didn't work. There's a bug in the compiler that got triggered by the changes I made that night, and that was cascading into the eventual barf that shocked me so much earlier this evening.


So that got squared away, at the expense of most of my brain power and energy for the night. I did manage to hack in support for assertions and reeaaaaallly rudimentary unit test "success" in the new runtime, but that's about all I can stomach of this mess for one night.


My general plan of attack is to slowly get the compiler test suite passing again, one test at a time. So at least now one test passes... one of nearly 100. It'll take time, but I'm still enthusiastic about the direction the project is taking, and I'm sure the heavy lifting will be over sooner than I expect.

Time to become unconscious.


More hacking on native executables

Posted by , 31 October 2014 - - - - - - · 915 views
Epoch
I've started the long and tedious process of slowly but surely hooking up every single Epoch language feature to the new LLVM bindings, so that the compiler can emit 100% native binaries instead of JIT compiling the native code when the program is started.

So far the only thing that works is printing out strings, and only statically defined strings at that. But that's something, because it means that the import thunk table works, function invocation works, and the embedded .data segment works. In less obscure terms, the Epoch runtime can be called from a native program and use constant data that's stored directly inside the executable file.

The infrastructure for doing all this took a bit of work to set up. The module responsible for generating EXE files from compiled IR is, give or take, about 1000 lines of Epoch code. The C++ shim that talks to LLVM is another 600 or so lines of code. The runtime itself is dead trivial but that's only because it doesn't have 99% of the language functionality in it, nor a garbage collector, nor a threading/task switching model.

It may not sound like a particularly large volume of code, but every line represents a significant battle to figure out all the intricacies of the Windows Portable Executable program format, the way early-bound DLLs work, how to map Epoch constructs into LLVM constructs, and so on. The amount of effort put into every ounce of this code is tremendous, given that I only have a few hours a week to hack on this project typically. The biggest hurdle is losing my mental context every time I have to call it quits for the night; if I could concentrate a solid five or six hours of focused work on Epoch, I could probably triple my productivity. Sadly, I just don't have that luxury right now.

Given the constraints I'm under, I'm pretty happy with progress thus far. It may take a while to get all of the various language features back to a fully functional state, but the project as a whole is already benefiting immensely from the reduced complexity of the pipeline as well as the general flexibility and power of the new architecture.


For now, though, I desperately need some sleep.


Native Binary Project: Day Whatever

Posted by , 27 October 2014 - - - - - - · 707 views
Epoch
I continue to hack on the Epoch compiler, slowly shaping it into a powerhouse of executable binary generation. So far I've gotten the program string table and DLL import table built in a flexible and extensible manner. This is the first step towards getting actual code generation back online at full capacity.

Now that I can link to DLLs from native Epoch programs, I can start replacing the old runtime library with a newer, cleaner, and slimmer version. As part of the process, I'm rewriting the LLVM bindings, which means that code generation is going to be broken for a while, at least in part.

A few minutes ago, the compiler generated a rudimentary .EXE that imports the new Epoch runtime DLL and calls a function in it. This is all completely flexible - none of the data is hardcoded or hackish to make it work. This is in direct contrast to the old method of building Epoch .EXEs, which basically amounted to hardcoding a tiny program that launched the main program after JIT compiling it with LLVM.


My goal here is to do absolutely everything I can in Epoch, and slice out as much C++ code as possible. I'm still playing around with ideas in my head for how to make the LLVM bindings as slim as possible, but it shouldn't be hard to seriously cut down on the amount of fragile C++ hackery being done there.


It's kind of annoying how much sheer effort I can dump into a project like this, and then be able to summarize the entire affair with a single sentence. I'd rail on about all the detailed hacking and prodding that went into making this happen, but it's actually pretty boring and involves a lot of profanity. So instead, I'll just leave you with a short journal entry.


And now to bed, and now to bed.


Rewrite ALL the things!

Posted by , 23 October 2014 - - - - - - · 558 views
Epoch
OK so I'm not really rewriting everything... just... a lot of it.

The current runtime infrastructure for Epoch is precarious at best, and desperately needs some improvements. There are a number of things I want to do that all happen to coincide nicely with a desire to rewrite the runtime system:
  • Destroy the EpochLibrary DLL forever. This is an ancient crufty artifact that deserves death, and getting rid of it should simplify the entire build pipeline noticeably.
  • Improve program start times. I've posted about this before; basically the Epoch runtime JIT compiles your program every time it starts up, which in the case of the compiler leads to about a 6 second stall. Gross.
  • Emit true native binaries - tied in with the above point, I'd like to emit genuine native .EXE and .DLL files instead of mutant VM hybrid files. This will enable the use of things like dynamic linking which will be a huge win.
  • Separate the compilation pipeline into modular pieces. Right now the parser is "reused" by brute-force including the code in both Era and the compiler itself; instead, I'd like to make the parser and AST stuff into DLLs that feed data around to each other.
I have a loose mental strategy for doing all this, which looks vaguely like the following:
  • Build a new DLL (tentatively EpochLLVM) which wraps LLVM and allows the compiler to make simple C-API calls to set up LLVM IR and generate executable machine code from it.
  • Retrofit the existing Epoch compiler to use this new DLL when generating binaries.
  • Rebuild garbage collection and other runtime infrastructure in a separate DLL (tenatively EpochRuntime) or maybe a few DLLs.
  • Self-host the compiler through this new chain.
  • Convert Era to use the new chain.
  • Build support for emitting Epoch DLLs.
  • Proof-of-concept the DLL system by replacing the C++ lexer used for Scintilla syntax highlighting with the actual lexer.
  • Split the remaining parser, AST, and codegen logic into separate DLLs.
  • Self-host again using this new infrastructure.
  • Ship Release 16.
This ought to keep me busy well through the end of the year...

The benefits will be huge though. In addition to being able to write Epoch DLLs, getting faster start times, and cleaning up the modularity of the code a lot, this will pave the way towards integrating higher-level source processing tools with Era, as well as giving me an opportunity to revisit some historical but gross decisions, like the use of UTF-16. Last but not least, it gets even more of the implementation of the language and tools moved into Epoch instead of C++.

Overall it's a daunting amount of work, but I think it can be managed. The real trick will be staying interested in the process during the long dark period where nothing quite works. I hope my skeletal plan will give me plenty of moments to sit back and enjoy visible progress, but we shall see.


So, here goes nothing!


Slimming down the Epoch runtime and improving program start times

Posted by , 18 October 2014 - - - - - - · 541 views
Epoch
Right now a big limitation of the Epoch program execution model is the fact that .EXEs generated by the compiler are not in fact native binaries. They are compact bytecode wrapped in a thin shim that churns said bytecode through an LLVM wrapper layer each time the program is launched. This means that every single time you start an Epoch program, it is basically getting re-built and re-optimized from bytecode and then JITted to native machine code.

This means that the compiler, for example, takes about 6 seconds to start on my machine. Pretty sad.

The way around this is to build native binaries using LLVM and just stuff them on disk. The runtime still needs to exist in order to provide garbage collection support, marshaling into other native processes, and so on. However, it will be much skinnier and more efficient without carrying the weight of the LLVM wrapper all over the place. Net result should be faster startup times for Epoch programs and less memory consumption overall.


Easier said than done.

LLVM doesn't directly generate machine code that's really all that suitable for emitting to a binary on disk. It leaves out a lot of platform-dependent details like how to link to DLLs and how to handle global data. Much of the infrastructure is designed to assume you're emitting machine code for JIT runtime purposes, not necessarily for serialization.

Now obviously this isn't insurmountable; plenty of compilers depend on LLVM (notably Clang) and generate binaries just fine. So the real magic is going to lie in finding all the places where JIT code doesn't serialize nicely and working around them.


Off we go!


Epoch Mailing List

Posted by , 13 October 2014 - - - - - - · 493 views
Epoch
I've spun up a mailing list for the Epoch project, mostly because I'm tired of having conversations about it span half a dozen websites and PMs and blah blah.

Here's the link: clicky. Or you can email epoch-language@googlegroups.com to accomplish the same thing.

It should be open to anyone to come and talk about the language or just ask questions. I'll be seeding the discussion with a few subjects that are still open questions in my mind and hopefully people will jump in and kick some thoughts around.


Three toads and the spit of a badger

Posted by , 13 October 2014 - - - - - - · 421 views
Epoch
So Epoch has a couple of warts that I'm looking at smoothing out. One of them has to do with overloads that do type decomposition of algebraic sum types:

//
// How things look today
//
type Optional<type T> : T | nothing

Test : integer x
{
    print(cast(string, x))
}

Test : nothing
{
    print("Nothing")
}

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}
I'm thinking about modifying the parser so that overloads of a function can be chained together without repeating the function name, by just specifying the signature of an overload immediately after the body of the function, and following with the body of the overload:

//
// How things look with implicit overloads
//
type Optional<type T> : T | nothing

Test : integer x { print(cast(string, x)) }
     : nothing   { print("Nothing") }

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}
The last question I want to ask is what lambdas look like, and whether or not they can help compact this code even further. Here's one thought experiment:

//
// Experiment with lambda syntax
//

type Optional<type T> : T | nothing

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    var f = lambda
     :(integer x) { print(cast(string, x)) }
     :(nothing)   { print("Nothing") }

    f(exists)
    f(doesntexist)
}
I'm not totally sold on the lambda syntax yet. I do like the idea of allowing overloads of lambdas, specifically for the type decomposition thing. Or maybe type decomposition just really needs a first-class structure, like case:

//
// Experimental case statement
//

type Optional<type T> : T | nothing

Test : Optional<integer> in
{
    case(in)
        : integer x { print(cast(string, x)) }
        : nothing   { print("Nothing") }
}

entrypoint :
{
    Optional<integer> exists = 42
    Optional<integer> doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}







December 2016 »

S M T W T F S
    12 3
45678910
11121314151617
18192021222324
25262728293031