
About this blog

A bipolar guy in a pressure-cooker industry



Recently I've been working on a rather difficult task, namely creating PDB debug database files from the Epoch Language compiler toolchain (https://github.com/apoch/epoch-language).

This is difficult in part because the format of PDB files is generally not well-understood, and is certainly poorly documented. I can't go much further without a hearty thanks to the LLVM project and particularly their tool llvm-pdbdump, which makes it much easier to test whether or not a generated PDB is sane. When llvm-pdbdump has good information about the state of a given PDB, it is invaluable; and when it falls short, as is inevitably the case with a format like PDB, it at least gives me a starting point for understanding why things have gone wrong.

However, there is another tool, from Microsoft themselves, called Dia2Dump.exe, which uses an authoritative implementation of the PDB format via the file MsDia140.dll shipped with Visual Studio 2015. This library is (as near as I can tell) close to or identical to the code used by Visual Studio itself for debugging programs. It also seems to parallel the implementations in WinDbg and DbgHelp.dll, both of which I use extensively in my research.

Last but not least, I must mention the Microsoft-PDB repo on GitHub, which is partial source for the implementation of the PDB format. It does not actually compile right now, so it's hard to use, but it has a significant purpose for me: I can cross-reference functions in MsDia140.dll with this code, and use that for some serious reverse-engineering.


Sometimes when feeding data into a black box like MsDia140.dll it can be hard to know what code paths are taken and why. For example, let's look at the function GSI1::readHash (see here to follow along in the source).

This function does some stuff I still don't fully understand, so let's walk through the process of gaining more understanding.

First we need a partially malformed PDB. This is easy to do since PDB files are sensitive to tiny changes, often in non-obvious ways. In particular, I'm going to work on the Publics stream. This is a fragment of a Multi-Stream File (aka MSF) which contains, among other things, publicly visible debug symbols for some program.

At the beginning of the stream, there is a structure which llvm-pdbdump is sadly cryptic about. Thankfully, llvm-pdbdump contains some sanity checks which seem to align well with the checks made by Microsoft's code, so it's at least easy to use the tool to verify what we're spitting out.

readHash is responsible for decoding part of this data structure, which appears to be some kind of hash table for accelerating symbol lookups. Inside the code for readHash (see link above) there is a call to a pesky function called fixHashIn. By attaching WinDbg to a running copy of Dia2Dump.exe and setting liberal numbers of breakpoints, I traced a failure in my PDB generation code to this single function. fixHashIn is vomiting because I'm feeding it data it doesn't like.

The first thing to note is that fixHashIn begins with a decrement instruction to decrease the value of one of its parameters. This parameter is supposedly the number of buckets in the hash table, or so I extrapolate from the source.

In my case, the parameter has a value of zero! Clearly I don't want my hash table to have zero buckets, so it becomes apparent why fixHashIn is choking. What I don't immediately understand is why it thinks zero is the number of buckets… I had thought that I was passing in a value (8 bytes per entry * 16 entries) that would work. Clearly I was wrong, but where was the zero coming from?
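If the bucket-count parameter is an unsigned integer (an assumption on my part, but consistent with the public source), that leading decrement is especially nasty when zero comes in: it wraps around instead of going negative. A quick sketch of that failure mode, simulating 32-bit unsigned arithmetic:

```python
# Simulate the decrement fixHashIn performs on its bucket-count
# parameter, assuming 32-bit unsigned wraparound semantics.
def decrement_u32(value: int) -> int:
    return (value - 1) & 0xFFFFFFFF  # wrap like a 32-bit unsigned int

# A sane bucket count decrements normally...
assert decrement_u32(16) == 15
# ...but a zero bucket count wraps to a huge bogus value,
# which would trip any downstream sanity check.
assert decrement_u32(0) == 0xFFFFFFFF
```

The exact types involved are my guess from reading the source; the observable behavior in the debugger, though, is exactly a zero arriving where a bucket count belongs.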

MSF Files

A little more background is in order. In an MSF file (the MSF format being a superset of the PDB format), data is divided into streams, each of which is built up of one or more blocks. The block size can vary between files, but I'm using 1KB (1024 bytes) for convenience. Unused space is filled with junk bytes.

Crucially, I pad my blocks with zeroes. If somehow the PDB interpreter is reading one of my padding bytes, it might be incorrectly assuming I want to feed it a zero-size hash table… obviously a problem. So what to do?
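To make the layout concrete, here's a minimal sketch of padding a stream's final block out to the 1KB boundary. This is my own illustration, not the Epoch toolchain's actual code:

```python
BLOCK_SIZE = 1024  # 1KB blocks, as described above

def pad_to_block(data: bytes, fill: int = 0x00) -> bytes:
    """Pad a stream's final block out to BLOCK_SIZE with a fill byte."""
    remainder = len(data) % BLOCK_SIZE
    if remainder == 0:
        return data
    return data + bytes([fill]) * (BLOCK_SIZE - remainder)

padded = pad_to_block(b"\x01" * 100)
assert len(padded) == 1024
assert padded[100:] == b"\x00" * 924  # zero padding: the suspect bytes
```

Those trailing zero bytes are exactly the ones a confused reader could mistake for real data.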


And now the meat of everything!

Instead of padding my file with zeroes, I use carefully crafted poison values. For my purposes I'm working with 32-bit data, so a poison value is usually 4 bytes long. A good example is 0xfeedface, which is a funny but valid hex number that happens to be the right size.

The important thing is that we can't just pad every 32-bit slot with the same 0xfeedface. Instead, we want to make permutations of the poison value, one unique permutation per slot. Every 4-byte slot of my PDB's "padding" is now a unique string of digits.
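A sketch of the idea (the exact permutation scheme here is my own invention; any mapping that makes each slot unique will do): derive a distinct 32-bit value per 4-byte slot from a base poison constant.

```python
import struct

BASE_POISON = 0xFEEDFACE

def poison_padding(num_slots: int) -> bytes:
    """Fill padding with a unique 32-bit value per 4-byte slot."""
    out = bytearray()
    for slot in range(num_slots):
        # XOR the slot index into the base constant, so every slot's
        # bytes are distinguishable when they show up in the debugger.
        out += struct.pack("<I", BASE_POISON ^ slot)
    return bytes(out)

pad = poison_padding(4)
assert len(pad) == 16
# Every 4-byte slot is unique, so a value seen in a register
# points back to exactly one location in the file.
slots = [pad[i:i + 4] for i in range(0, 16, 4)]
assert len(set(slots)) == 4
```

The payoff is that when a poisoned value surfaces in a register or parameter, it identifies exactly which padding slot was read.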

Here's the magic part: when I run this in the debugger, I can walk into the fixHashIn function, and look at its parameters.

My first run of this process is surprising: despite poisoning a bunch of data around where I thought this zero was coming from, the value is still zero when we reach the fixHashIn function! This indicates one of two things:

  1. The value is read from a place I didn't poison

  2. The value might be computed somehow

To rule out the possibility that I'm not poisoning enough, I expand the poison to the entire file instead of just one block's worth of padding bytes. The debugger still stubbornly shows the parameter as zero, meaning that the zero is being computed from some other data being fed in, not read directly from the file on disk.

This line of the Microsoft PDB source is illuminating… but this line (https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/dbi/gsi.cpp#L65) is even more so. At line 65 is a comment stating that fixHashIn is called from two places… one of them is the loader for the Publics stream, but the other is for a totally unrelated stream called Globals!


It turns out I've been hitting breakpoints all evening in fixHashIn, but the call stack is wrong. The calls I've been seeing are from a totally different stream of data.

This post may not have a cheerful ending, but I hope the value of poisoning data is clear: without it, I might have taken days to realize my mistake, since I would have had no proof that the evil zero was not coming from my Publics stream.

In any event, I use the poison technique a lot, and this is just one sampling of my adventures with the PDB format. Maybe I'll have a better story of success tomorrow!



Early this morning, Epoch achieved a minor success in the debugging department. I was able to generate a working PDB file with symbols for a small test program, run the program in Visual Studio and WinDbg, and step through the disassembly.

Along the way, the current Epoch function was correctly tracked, indicating two things: stack unwinding information is working correctly, and function-address-to-symbol mapping is working as well.

The next step is to get line number information piped through to the PDB from the parser. This will be a major haul, but well worth it since it will allow source-level debugging of Epoch programs.

Once that is complete, I plan to tackle type metadata and variable tracking, so that the values of variables can be visualized in the debugger. That's an even bigger lift, so I don't expect it any time soon.

That said, I plan on waiting until debugging is in a good state before resuming work on self-hosting, because having debug information available makes that process vastly more convenient and approachable.

All in all it's a good day for Epoch!



My recent adventures in self-hosting the 64-bit Epoch compiler have led me to a significant conclusion: it isn't worth trying to self-host a compiler when you can't debug the target language.

A much better use of my time would be to improve the languishing PDB generation experiment and get the code set up to actually emit usable debug symbols for Visual Studio and WinDbg.

It presently takes several minutes to build a candidate compiler; given that fact, it makes little sense to try to brute-force my way to correctness. Debuggers are valuable tools and shouldn't be left as afterthoughts in the development of what aims to be a production language.

So I'm dusting off the code for PDB emission and working on a tiny shim DLL that will provide some hard-coded one-off features that might be needed in the course of getting the legacy 32-bit compiler to generate debug information about 64-bit executables.

One such thing that has come up is that, since vanilla Epoch lacks pointer arithmetic, it is hard to do serialization well. The shim DLL currently contains a single function, GetBufferPtr, which takes an input pointer and offset and returns the pointer adjusted by that offset. In other words, it's a glorified pointer-add.
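In spirit, GetBufferPtr is nothing more than base + offset. A Python rendering of the same idea using a memoryview stands in for the raw pointer math the real shim performs:

```python
def get_buffer_ptr(buffer: memoryview, offset: int) -> memoryview:
    """Return a view of the buffer advanced by offset bytes -
    the moral equivalent of the GetBufferPtr pointer-add shim."""
    return buffer[offset:]

data = memoryview(bytes(range(16)))
view = get_buffer_ptr(data, 4)
assert view[0] == 4  # the view now starts 4 bytes into the buffer
```

The real DLL export, of course, just adds an integer to a native pointer and returns it; this sketch only illustrates the contract.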

This isn't really satisfying to me as a long-term way to write Epoch code, but I've decided that debug information is more important than implementing 64-bit features, including self-hosting. As such, it'll have to suffice for a while.



Just a short while ago, the first working 64-bit compiler for Epoch was produced!

Well, "working" might be a minor stretch; it launches, prints a simple text banner message, and then exits cleanly. But that represents a lot of operational code by itself.

  • The 32-bit compiler is able to lex, parse, type-check, and code-gen the entirety of the 64-bit compiler's source code.

  • The 32-bit linker can emit 64-bit binaries, assisted by LLVM's machine code generation facilities.

  • The 64-bit compiler binary is a completely functional Windows executable image.

  • This executable can run to completion on 64-bit Windows versions.

  • Inside the compiled binary is a table of string constants.

  • 64-bit Epoch code can load those strings and route them out to the command-line console.

  • A number of support DLL calls are involved in this process, including loading garbage collection metadata and stack tracing for identifying GC roots.

All told there are hundreds of thousands of lines of code involved. Building the 64-bit compiler takes about 164 seconds (just over two and a half minutes) when using debug versions of LLVM. (For comparison, the 32-bit compiler can self-host in under 20 seconds, but that's an unfair comparison because that build process uses optimized Release versions of LLVM.)

I'm pretty pleased with this progress. There are still many things left to get working, though.

  • 64-bit globals do not work correctly; all of them are currently stuffed into a single random address which may or may not crash when dereferenced.

  • More support DLL calls need to be implemented or eliminated.

  • Certain code constructs do not work correctly yet; this is worked around for the time being by not using them in the compiler, but they will be good to get working as soon as is practical.

  • A large number of hacks and temporary shims exist in the linker. This will need to be cleaned up substantially before self-hosting is really practical.

  • Debug metadata and symbols are not generated correctly yet.

  • Visual Studio integration has a number of bugs, ranging from the pesky to the outright unusable.

  • It is exceedingly likely that there will be bugs in the compiler, meaning that 64-bit self-hosting is still a ways out even if the basics are operational.

Making the highly optimistic presumption that this will all happen soon, I think it's fair to say that once all of the above is addressed (and 64-bit self-hosting is complete) it will be time to cut another release of Epoch.

In all probability, though, I'll course-correct sometime between now and then, but it never hurts to have objectives!



For a decent while now, I've been working on self-hosting the Epoch 64-bit compiler. This involves getting the compiler to a point where it is robust enough to actually compile itself. In order to do this, I'm using a modified 32-bit compiler which generates 64-bit binaries. Once a working 64-bit compiler is emitted, I can feed that compiler back into itself, thus completing the head-trip ouroboros that is self-hosting or "bootstrapping" a compiler.

At the moment, the compiler can successfully lex, parse, type-check, and partially code-gen itself. In practical terms, this means that the front-end of the compiler is working fine, but the back-end - the set of systems responsible for turning code into machine language and emitting a working executable - remains incomplete. For a slightly different perspective, I'm generating LLVM IR for most of the compiler at this point.

The bits that are left are corner cases in the code generation engine. There are things like intrinsic functions that need to be wired up, special semantics to implement, and so on. In particular, right now I'm working on solving a corner case with the nothing concept. nothing is an Epoch idiom for expressing the idea that there is no data; except, unlike traditional null, nothing is its own type. If something has a type it cannot be nothing - again, unlike null. The usefulness of this may seem questionable, but the distinction makes it possible to avoid entire classes of runtime bugs, because you can never "forget" to write code that handles nothing - the compiler enforces this for you!

Anyways, the trick with nothing is that you can pass a literal nothing to a function as an argument, to signify that you have no semantically valid data to pass in. This is handled correctly by the parser and type checker, but falls down in code generation because we can't actually omit the parameter from the function call.

What happens is the code generator creates a function with, say, 3 parameters. If the second parameter is nothing at a call site, we have to still pass something over to the function, from LLVM's perspective. So we generate a dummy parameter that essentially translates the nothing semantics into null semantics - something LLVM can recognize.
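A rough model of that lowering step (illustrative only; the real code generator works at the LLVM IR level, and the names here are mine): replace each nothing argument with a null placeholder so the emitted call still matches the callee's declared arity.

```python
NOTHING = object()  # stand-in for Epoch's literal `nothing`

def lower_call_args(args):
    """Lower a call's argument list: `nothing` arguments become a
    null placeholder so the emitted call keeps the declared arity."""
    return [None if a is NOTHING else a for a in args]

# Calling f(x, nothing, z) must still pass three values through.
lowered = lower_call_args([1, NOTHING, 3])
assert lowered == [1, None, 3]
assert len(lowered) == 3  # arity preserved despite the `nothing`
```

The point is simply that the parameter slot survives; only its contents degrade from "typed absence" to a null the backend understands.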

Now things get complicated.

If we have an algebraic sum type that includes the type nothing, and we pass a sum-typed variable into a function which expects concrete types, the code goes through a process called type dispatching. This process basically matches an overload of a function with the runtime types of the arguments passed in. Think of it like virtual dispatch with no objects involved. (Strictly speaking, type dispatch in Epoch is multiple dispatch rather than the single dispatch seen in more popular languages.)

To facilitate all this, the compiler inserts annotations into the code, so that it can deduce what set of overloads to choose from when the runtime dispatcher is invoked. Some of these annotations survive at runtime - analogs of virtual-table pointers in C++.
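The dispatch mechanism can be modeled as a lookup from a tuple of runtime type tags to a concrete overload. This is a toy model of multiple dispatch, not Epoch's actual implementation:

```python
# Map runtime type annotations (tags) to concrete overloads -
# dispatching on every argument's tag, not just the first one.
def handle_int_int(a, b):
    return "int,int"

def handle_int_nothing(a, b):
    return "int,nothing"

DISPATCH_TABLE = {
    ("int", "int"): handle_int_int,
    ("int", "nothing"): handle_int_nothing,
}

def dispatch(tags, *args):
    """Select an overload from the runtime tags of all arguments."""
    return DISPATCH_TABLE[tuple(tags)](*args)

assert dispatch(["int", "int"], 1, 2) == "int,int"
assert dispatch(["int", "nothing"], 1, None) == "int,nothing"
```

Because the key is the full tuple of tags, this is multiple dispatch: changing any one argument's runtime type can select a different overload.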

Annotations are passed as hidden parameters on the stack when invoking a function. And at last we reach the real wrinkle: a nothing annotation can come from two distinct places: either the construction of a sum-typed variable which allows nothing as a base type, or a literal nothing passed to a function call.

The headache is that, to LLVM, both uses look like a function call. There is special case logic that exists to fix up the annotations for sum-typed constructors. Unfortunately, that logic collides with the logic needed to fix up annotations for general function call usage because LLVM doesn't know the difference.

It's an eminently solvable problem, but it's a headache. Hopefully once this bug is gone there won't be too many more to swat before I can start code-generating working 64-bit compilers.

(Spoiler: I'm not optimistic.)



A few minutes ago, the first 64-bit self-hosted compiler for Epoch finished the code-generation process… unsuccessfully.

For context, this means that the 64-bit compiler (as built by the existing 32-bit compiler) was lexed, parsed, type-checked, and turned into LLVM IR. What didn't happen is a successful code emission, i.e. the compiler is not yet producing a working executable image.

What it does produce is about 6.8 MB of errors, or just over 121,000 lines of output. This indicates that something in the code-gen process is off. We're generating LLVM IR but it can't be turned into machine code because it is malformed in some way.

Inspection of the error output shows that one of the biggest offenders is bad linkage on a global variable. Epoch aspires to minimize the use of global state, but it's a useful construct while bootstrapping a compiler. Fixing this mistake is trivial and reduces the error volume considerably.

In fact, the vast bulk of the output is actually the text of the LLVM IR in pretty-printed form. This "dump" is generated to help diagnose code-gen bugs, but it's meant for much smaller programs than the entire compiler! Culling the dumped IR shows that there are in fact only 208 errors left (after the global linkage fiasco was addressed). And all of them are the same "sort" of error…

Terminator found in the middle of a basic block!

LLVM divides code into sections called basic blocks. A single basic block represents a linear sequence of instructions, i.e. every instruction in it is executed exactly once. The way to accomplish branching flow control is to end a basic block with a branch instruction, likely a conditional branch of some kind. Branches target other basic blocks to allow different code paths to execute based on the branch.

The dreaded error "Terminator found in the middle of a basic block!" means that the constraints have been violated. Someone tried to issue a branch instruction in the middle of a block, which ruins the idea that every instruction in the block executes exactly once.

In concrete terms, this error signals a bug in the code generation process. It means that somewhere along the line, the Epoch compiler lost track of a basic block being terminated, and continued shoving instructions into it after a branch.

Thankfully, LLVM barfs a "label" when it emits this error, and that label is sufficient to locate the offending basic block:

1>  ; <label>:41                                      ; preds = %11
1> br label %9, !dbg !3338
1> br label %42

Sure enough, there are two branches being attempted here. The larger context is uninteresting (itaEUR(TM)s a nested if-statement inside a binary tree insertion routine) but the specific failure appears many times, meaning that itaEUR(TM)s probably a small number of actual code-generation bugs to solve.

Testing, 1 2 3

As with any good software, robustness in a compiler happens only when enough bugs have been fixed while simultaneously ensuring that no new ones are introduced. The best tool I know of for doing this is automated testing. Now that a compiler bug has been identified, the objective is to replicate it in as tiny a program as possible.

This "test case" provides two things: a way to reproduce the bug on-demand so a fix can be tested, and a way to detect if the bug ever reappears. The Epoch compiler test suite is still small, but invaluable for addressing this sort of problem. I will add this particular code to the test suite and hopefully have a fix in short order.



This is a quick test of HubPress.io to see how I like it. Assuming all goes well, I will probably resume posting Bag of Holding entries here soon. Ancient archives from the Bag of Holding are on GameDev.net if you happen to like my writing. They are very, very old though.

For now, here's a sneak preview of where Epoch is headed:

simplelist types = new 0, nothing


I spend a lot of time thinking about programming languages. Most often it's from the perspective of a consumer, i.e. a programmer using Language X. Sometimes it's from the viewpoint of someone who wants to explore Language Y, either for fun or for practical usage. Occasionally I see languages from the vantage of a language creator.

Recently, I've sunk a lot of brain cycles into thinking about languages as a non-programmer. More specifically, I've been thinking about the tendency for large (and sometimes small) games to use "scripting" languages to do various things.

I think there are basically two (often overlapping) desires in play when using an embedded scripting language. One is to make programmers more productive. Often this is believed to be possible by using "higher level" languages to write certain types of logic. The other common desire is to make non-programmers more productive, by enabling them to directly modify logic in the host program/game.

The more I ponder this, the more I'm convinced that in most cases, neither desire actually justifies the use of a general-purpose scripting language.

Lies About Language Productivity
It is fairly common in programming circles to hear assertions like "Language X is more productive than Language Y." A popular tag for this used to be "rapid development" although I hear that term less nowadays. I will not attempt to tackle the debate about whether or not X can be more productive than Y. However, what I would like to point out is that this is actually irrelevant!

If we're talking about embedding a scripting language into a host program, like a game, we aren't comparing X and Y. We're comparing "X+interop layers+Y" to Y by itself. This distinction may seem pedantic, but it's crucial. If we wanted to add X to our Y-based program in order to boost programmer productivity, we have to be careful here.

Depending on how the job is done, adding a language to a program can actually be a massive net loss of productivity. Suppose we have a C++ program, written by C++ programmers, who embed a Lua interpreter in their program. Now the average programmer working on the program must not only know two different languages, but must also be intimately familiar with how the two interoperate.

If we designed APIs like this, we'd revile the result as a leaky abstraction, and rightly so. I contend that embedding a general-purpose language is almost always a net loss, for the following reasons:

  • Programmer knowledge is now required to span multiple languages
  • Programmers MUST be fluent in the boundary layer between languages
  • Interop is likely less performant than staying in a single language, especially if experts in the interop are unavailable
  • Impedance mismatches are guaranteed to crop up, i.e. languages think of things differently and have different idioms
  • The mental overhead of managing multiple languages is taxing and leads to higher bug rates and nastier bugs
So if you want to embed a general-purpose scripting language in your game to make development more productive, don't. Focus on making better tools and making better architectures. A good architecture allows you to build high-level logic almost as if it were scripting anyways. There is no need to do it in a different language.

Note that this is in no way an argument against using, say, C# to write your level editor and C++ to write your game.

What About Designers?
The other motivation I mentioned above for doing scripting systems is to allow non-programmers to do powerful things.

I'm going to point out something that feels painfully obvious to me, but that, in harsh experience, far too many people just cannot wrap their heads around:

Shoving a general-purpose programming language in a non-programmer's face is a great way to lose team members and make people very frustrated.

I can safely assume that most of my readers are not astronauts. So if I were to cram you into an Apollo capsule and yell GOOD GODDAMN LUCK while slamming it shut, I can safely assume that you'd react negatively. You're not trained to think about the tools in front of you. You're not informed as to what they're for. You don't know what to expect after the door closes and the last echoes of my jerk voice fade away. You're in an alien environment and can't even tell what will happen next.

That is what putting a designer in front of a general-purpose scripting engine feels like.

Why do I keep mentioning languages as "general purpose"? Because there is an alternative, which works, and works well.

Domain-Specific Languages
Many people conflate DSLs with "very limited languages." This is unfair. A DSL can be immensely powerful and still be confined to a particular domain. It can even be Turing complete if you really want.

I half-jokingly posted on Twitter the other day that Excel combined with a verb system would make a great DSL for game designers.

And the more I think about it, the less I think that's a terrible joke of an idea.

I might just give it a shot someday.

MMOs! Y u no EZ?

A common aspiration among nascent game developers is to make an MMO. It's actually a pretty common aspiration in many perfectly functional, well-established studios, as well; but at least on GDNet, we mainly see newbies who want to make the next World of Warcraft, and so it's easy to be misled into believing that the MMO fascination is somehow peculiar to those who "don't know any better."

The reality, though, is that plenty of non-newbies have tried and failed to make successful MMOs. Many projects begin and never see the light of day; still more launch and are ultimately failures as business ventures. There's a common cause here, I think: making an MMO is hard, and for reasons which are not altogether obvious. The newbie wants to make an MMO because he doesn't understand how hard it is; the veteran industry studio fails to make an MMO because they don't understand how hard it is, either.

To do something hard, and do it well, requires preparation and a lot of awareness. Even understanding why it is hard can be tough in and of itself. It doesn't help that basically every successful MMO developer ever is highly protective of their secrets - and rightly so.

As a successful MMO developer, I'm going to uphold the tradition, and not talk about the secret sauce that makes it possible to solve the hard problems of MMOs. But I will take some time to shed some light on what those problems really are, and why they are hard.

Content Volume
This is the one most people get hung up on, so I'll dispense with it right away. Producing the sheer amount of content that goes into a genuine MMO is hard, yes, but only because it's expensive. If you have a boatload of money and a lot of talented content creators, this problem isn't really that big of a deal. After all, huge single-player games get made all the time.

If you're not driving home in a dump truck full of hundred dollar bills, then this problem is obviously going to be pretty hard.

The Speed of Light
This is a genuinely hard problem because we can't change the speed of light. However, the speed of light (or more accurately, the speed of signal propagation in physical network equipment) represents a very real limit to how responsive a multiplayer game can be. The farther away your computer is from the "other" computer, the longer it takes to send a signal to that computer. For the sake of simplicity in this article, just assume that the back-of-the-napkin math limits us to some given latency in the tens of milliseconds for an average trip time. (I'm being very fuzzy deliberately.)

The bottom line here is that if I want to have realtime interactions with someone on the opposite side of the country, I'm going to have to wait an almost perceptible amount of time for my signals to reach them, and then an almost perceptible amount of time for their response to come back to me. Put together, this means that even in the purely optimal case where I can talk directly to other players, I might notice latency, or "lag", if they are far enough away.
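To put rough numbers on the back-of-the-napkin math (using the common approximation that signals in fiber travel at about two-thirds of c, roughly 200,000 km/s; the 4,000 km distance is an illustrative coast-to-coast figure):

```python
# Back-of-the-napkin one-way and round-trip signal times,
# assuming ~200,000 km/s propagation speed in fiber.
SPEED_KM_PER_MS = 200.0  # 200,000 km/s == 200 km per millisecond

def one_way_ms(distance_km: float) -> float:
    return distance_km / SPEED_KM_PER_MS

# ~4,000 km across a large country:
assert one_way_ms(4000) == 20.0          # 20 ms each way...
assert 2 * one_way_ms(4000) == 40.0      # ...40 ms round trip, minimum
```

And that 40 ms floor assumes a perfectly straight cable and zero processing time at either end; real routes and real servers only add to it.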

So what about if I can't talk directly to my peers, but need a server?

Network Architecture
This is where things get truly fun. MMOs are traditionally client-server based, meaning that for me to talk to player B, I talk to my server first, who then relays my interactions to player B on my behalf. This is why server performance matters so much; if the server is busy and takes 60 milliseconds to forward my message to player B, I just made the speed-of-light problem that much worse.

But suppose I need to not just poke player B, but also demonstrate to the other 60 people in the immediate area that I did in fact poke player B. Now my server has to send 60 more messages. Each of those messages takes time to send - we don't have the luxury of sending multiple messages in parallel.

This means that the network that handles the game (i.e. typically a datacenter setup) has to be able to successfully blast out 60 of those messages to a potentially geographically diverse set of people, while maintaining minimal latency. Remember, again, every millisecond we pile on to the message trip time is a millisecond the player might end up feeling as lag.

Now, routing internet traffic is a pretty well-handled problem these days (in most of the world). Figuring out how to get our messages to each of our 62 players in a timely fashion is thankfully not that hard. What is hard is the fact that access to that level of internet beefiness is not cheap and not easy to attain. Internet Service Providers (ISPs) have peer agreements with datacenter installations, which govern how much traffic can come and go from the datacenter, and what quality of service it receives - i.e. how much latency and how much bandwidth will be involved.

A lot of people like to suggest "the cloud" as a solution here. The problem is that cloud providers generally can't compete with the quality of service that a good peering contract will give you. Yes, cloud services are great and still improving fast, but as of today, a dedicated datacenter is a better way to host a latency-sensitive game.

Datacenters, and peering agreements, are expensive propositions. This is back to the content volume problem, though - money can solve it. So let's look at other network related headaches.

The N^2 problem
Suppose we have a client-server model where 10 players are gathered to do whatever. Player 0 suddenly decides to jump up into the air. Now, players 1-9 need to see that, and player 0 needs confirmation that his character is jumping. So that's a total of 10 messages to relay.

Now suppose all the players jump simultaneously. Each player generates 10 messages, as we saw a second ago. Total up the messages for all the players: we must now relay 100 messages - 10 squared. This is generally referred to as "an N-squared problem."
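The arithmetic above, generalized: one event per acting player, each relayed to all n players (including the actor, who needs confirmation).

```python
def messages_relayed(num_players: int, num_events: int) -> int:
    """Each event must be relayed to every player, including the
    actor who needs confirmation: events * players messages total."""
    return num_events * num_players

n = 10
# One player jumps: n messages go out.
assert messages_relayed(n, 1) == 10
# All n players jump at once: n * n == 100 messages.
assert messages_relayed(n, n) == 100
```

With n events and n recipients per event, the total grows as n squared, which is exactly why the relay load explodes as crowds form.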

N^2 problems occur all over the place in networked game construction. They have implications on many aspects of implementing an MMO: physics, game rule simulation, network throughput, security (think DDoS problems) and so on. There are ways to solve and/or mitigate the pain, but generally, every system that is affected by N^2 needs to solve the problem a little bit differently.

This means an interesting thing for software architecture: what solves your graphics N^2 may not solve your physics N^2, and both of those might fail to work with combat rules, and even then you need to figure out the network throughput issues, and so on and so forth. Suddenly you have a dozen systems that behave all slightly differently that have to be stitched together into a harmonious whole. That's not always easy, and in fact, it's one of the things that makes implementing a good MMO very difficult.

Nonlinear scaling
The final general principle I'd like to cover is that scaling is rarely linear. Let's say you make a multiplayer game. With 5 players, your server is doing great - 10% CPU usage. With 10 players, 18% CPU usage. With 20 players, 32% CPU usage. If you're following the trend, this sounds awesome! We can probably hit 60 players before running out of CPU! Except you add the 40th player and the CPU pegs out at 100%. What went wrong?

Scaling is rarely linear! We extrapolated that we could triple our 20-player numbers and get 60 players comfortably, but this extrapolation turns out to be flawed. What most people don't realize is that nonlinear scaling is everywhere in MMOs.

Take our N^2 scaling problem from earlier. It looks almost linear if you don't look hard enough, but once you plug in real numbers, it bites hard. And believe it or not, there are worse ways to scale than N^2.

Network traffic, memory usage, CPU usage, database transaction volume, communication between back-end server systems... all these can scale in nonlinear ways. And when they do, you want to know about it long before it turns into a player-facing outage. This means testing, and load testing in particular. Good load tests are hard to do in general, but stress-testing an entire MMO infrastructure and proving it robust is particularly difficult. It makes the difference between a good launch and a bad launch, between profitable operation and burning money.

Parting Thoughts
There are many more challenges to making an MMO a successful business: security, legal issues, attracting a big enough player base, maintaining the player base (aka retention), monetization models, community management, and so on. Even for the issues I've touched on above, I've barely scratched the surface.

I hope this underscores the fact that MMOs are hard. Not impossible, certainly, but very, very hard. It takes a diverse collection of skills and expertise to do them right, and sadly that mix just doesn't come together very often. I hope this writeup isn't depressing to aspiring MMO-makers; instead, I hope it gives you a start at understanding what exactly you will need in order to be successful.
So I have an interesting situation where I wrote a math-intensive API to allow easy composition of various functions. The details are pretty easy so I'll just show an example of what the API usage looks like:

formula.Create()
    .Set(0.0f)
    .AddWithWeight(some_data_source, 0.5f)
    .Multiply(other_data_source)
    .Display();

The API is meant to operate on non-primitive objects, such as 2D arrays. This gives the end user a really powerful way to stitch together different operations without making the code unreadable or long.

Internally it does what you would guess: Create() allocates some working space to do the math in, and the operations pretty much do what it says on the tin, using that working space. Each function simply returns a reference to the "formula" object (*this) so that the operations can be chained arbitrarily.

Now, here comes the interesting bit: I want to change the API to support not just 2D arrays, but more sophisticated data structures, like, say, a quadtree. Operations should be able to mix and match types so that you can do things like prune the quadtree to only reference certain cells, grab all the objects in those cells, and do some calculus on some property of those objects.

Also, dynamic allocation is forbidden (inside the API implementation itself) and the syntax needs to retain the readability and succinctness of the original.

My first thought (being obsessed with compilers) was to build a sort of syntax tree that represents the operations and then traverse that to do the math. This is a failure on the dynamic allocation front, though, and also introduces a lot of overhead in terms of understanding how the code works. In other words, it involves a lot of machinery that the end user shouldn't have to care about, and maintainers shouldn't be burdened with.

My second thought (being obsessed with compilers) was to throw template metaprogramming at it. This has only one weakness, namely that C++ templates do not accept floating-point values as template parameters - making it impossible to express things like the AddWithWeight() call in the first example. Well, impossible without some fudging, anyways.

To fix the limitation on floating point literals, I decided to use a simple layer of indirection. Instead of putting "1.0" or "7.5" in my code, I use an enumeration value like CONSTANT_ONE or CONSTANT_FOO, since those can be used for template specializations. Then I do just that: specialize some templates to yield the correct floating point numbers on demand.

The next step is to describe the API from a user's perspective. I fiddled around a bit and came up with this:

CreateFormula< int, Add<int, DATA_SOURCE_CONSTANT_1>, Add<int, DATA_SOURCE_CONSTANT_1>, Multiply<int, DATA_SOURCE_CONSTANT_4>, DumpResult >();

This does add some ugly to things, namely in that the "int" type is now littered all over the code. Frankly, though, I think this is more of a problem in the silly examples than in real life, because in real life the API is used with many different types, so being able to specify a type at each step of the process is pretty handy. It also opens up the door for template-specialization-based optimizations.

Now that I know what I want the API to look like, it's time to build the implementation:

#include <iostream>
#include <tchar.h>

enum EDataSource
{
    DATA_SOURCE_CONSTANT_0,
    DATA_SOURCE_CONSTANT_1,
    DATA_SOURCE_CONSTANT_4,
};

template <typename T, EDataSource data>
struct Add
{ };

template <typename T>
struct Add<T, DATA_SOURCE_CONSTANT_1>
{
    static void Invoke (T * state)
    {
        *state += T(1);
    }
};

template <typename T, EDataSource data>
struct Multiply
{ };

template <typename T>
struct Multiply<T, DATA_SOURCE_CONSTANT_0>
{
    static void Invoke (T * state)
    {
        *state *= T(0);
    }
};

template <typename T>
struct Multiply<T, DATA_SOURCE_CONSTANT_4>
{
    static void Invoke (T * state)
    {
        *state *= T(4);
    }
};

struct DumpResult
{
    static void Invoke (int * state)
    {
        std::cout << *state << std::endl;
    }
};

struct Null
{
    static void Invoke (int *)
    { }
};

template <typename T, typename O1, typename O2 = Null, typename O3 = Null, typename O4 = Null>
struct CreateFormula
{
    CreateFormula ()
    {
        T state = T();
        O1::Invoke(&state);
        O2::Invoke(&state);
        O3::Invoke(&state);
        O4::Invoke(&state);
    }
};

int _tmain (int argc, _TCHAR* argv[])
{
    CreateFormula< int, Add<int, DATA_SOURCE_CONSTANT_1>, Multiply<int, DATA_SOURCE_CONSTANT_4>, DumpResult >();
    CreateFormula< int, Add<int, DATA_SOURCE_CONSTANT_1>, Add<int, DATA_SOURCE_CONSTANT_1>, Multiply<int, DATA_SOURCE_CONSTANT_4>, DumpResult >();
    return 0;
}

What we have here is pretty straightforward. Each operation is represented by a struct template. In this case, operations always take exactly two parameters: one to tell the code what type the operation works on, and one to specify the data for the operation. We then use template specialization, as described above, to yield specific data values based on the enumeration. In more realistic code, instead of just having constants, we could have things like "list containing the locations of all nearby sandwiches" or whatever else.

Note that the operation structs have a single static function, Invoke. Invoke takes some "state" and goofs with it; nothing more, nothing less. This is the guts of each operation.

There are two operations worth calling out: DumpResult and Null. DumpResult is simply a way to take the state from the previous operation and put it on the screen. Null is a way to say "don't do anything" - this will come in handy momentarily.

The ugly beast is the CreateFormula template. This guy needs lots of optional template parameters so we can have formulas of varying length. Each optional parameter defaults to Null so that operations which are omitted from the template's type definition can be elided entirely.

Internally, CreateFormula has a single constructor, which does something that should sound really familiar: creates some state, passes it off to a series of operations, and then returns. Note that no dynamic allocation is necessary here; more sophisticated data structure usage might make it tempting to add more dynamic allocations, but that's not really the API's problem - it's down to the user to do the right thing.

Last but not least, in the main() function we see two examples of the template in action. It looks pretty darn close to the original, exactly like the syntax we wanted for the new model, and has the added benefit (over the original) of being more explicit about what's going on at each step.

As you might guess, the real application for this is a lot more complex. It doesn't limit itself to simple operations like add and multiply - there's analytical differentiation, integration by sampling, optimization (i.e. "highest" and "lowest") queries, and so on. So the actual operations are much richer. This is the main reason why operator overloading is not useful; we simply run out of squiggly symbols for all the different operations.

Another thing worth mentioning here is that modern compilers are exceptionally good at figuring out what's really going on in this code. When using int as the state type, for example, Visual C++ compiles the entire thing out and just passes a literal result to std::cout. Pretty cool.

I have no idea if this is remotely useful, mostly because I'm not convinced it's useful to me even, but it was a nifty trick and I felt like sharing. So there.

Stupid bit rot

After a couple of days of preoccupation with other things, I sat down tonight to start hacking on Epoch. I was unprepared for the bizarre puzzle that unfolded immediately after.

I had been mucking around for a little while when it came time to compile and test my changes. Everything seemed to build cleanly, and then I started the program, only to be greeted by bright red letters stating "Unsupported type for native code generation."

Mind you, this is my own error message, in my own code, which I wrote, and should remember. Needless to say, there was a rather loud WTF emitted regardless. Time to start poking around.

My usual routine when things go wrong is to binary search my changes until I pinpoint what's causing the error. So I dutifully removed a bunch of new code, rearranged some other code back to its old state, and tried again.

Unsupported type for native code generation.

Rinse, repeat, eventually get to a point where none of my changes are left. Uh... wat? So I did the only thing that made sense: shelved my entire set of changes (thankfully Mercurial has a great shelve feature) and tried the code that was demonstrably working a couple of days ago.

I checked all the timestamps of the relevant files, looking for DLLs that might have accidentally been recompiled, looking for source code that might have snuck in a bogus edit or three, probing as carefully as I could for anything and everything that might be different between the known good build and today's.

After concluding that everything was in order, I confidently compiled the working program and launched it, fully expecting to see the successful execution that for fuck's sake was still showing in my terminal window from days prior.

Unsupported type for native code generation.

God. Damn. It.

Same compiler, same runtime, same libraries. Different binary output. Unfortunately I didn't think to keep a copy of the old working .EXE to compare with the new busted one, because hey, I expected the universe to play fair and obey the typical laws of physics whereby when you don't change anything it doesn't mysteriously break three fucking days later for inexplicable reasons.

Compiling a different project yields the expected results: no changes means bit-for-bit identical binary output to what it used to produce.

Something is different between now and last week, and I have no idea what.

Out of desperation, I decide to look at the revision history on the Mercurial repository, and start rolling back changes until I find something that does work.

It takes exactly ten seconds to realize what went wrong.

I had a chunk of changes that I made late one night in between the last successful test build and now. I never did run them to make sure they worked. Oops.

Turns out, obviously, they didn't work. There's a bug in the compiler that got triggered by the changes I made that night, and that was cascading into the eventual barf that shocked me so much earlier this evening.

So that got squared away, at the expense of most of my brain power and energy for the night. I did manage to hack in support for assertions and reeaaaaallly rudimentary unit test "success" in the new runtime, but that's about all I can stomach of this mess for one night.

My general plan of attack is to slowly get the compiler test suite passing again, one test at a time. So at least now one test passes... one of nearly 100. It'll take time, but I'm still enthusiastic about the direction the project is taking, and I'm sure the heavy lifting will be over sooner than I expect.

Time to become unconscious.
I've started the long and tedious process of slowly but surely hooking up every single Epoch language feature to the new LLVM bindings, so that the compiler can emit 100% native binaries instead of JIT compiling the native code when the program is started.

So far the only thing that works is printing out strings, and only statically defined strings at that. But that's something, because it means that the import thunk table works, function invocation works, and the embedded .data segment works. In less obscure terms, the Epoch runtime can be called from a native program and use constant data that's stored directly inside the executable file.

The infrastructure for doing all this took a bit of work to set up. The module responsible for generating EXE files from compiled IR is about 1000 lines of Epoch code, give or take. The C++ shim that talks to LLVM is another 600 or so lines. The runtime itself is dead trivial, but only because it doesn't yet have 99% of the language functionality in it, nor a garbage collector, nor a threading/task-switching model.

It may not sound like a particularly large volume of code, but every line represents a significant battle to figure out the intricacies of the Windows Portable Executable format, the way early-bound DLLs work, how to map Epoch constructs onto LLVM constructs, and so on. The amount of effort behind every ounce of this code is tremendous, given that I typically only have a few hours a week to hack on this project. The biggest hurdle is losing my mental context every time I have to call it quits for the night; if I could concentrate a solid five or six hours of focused work on Epoch, I could probably triple my productivity. Sadly, I just don't have that luxury right now.

Given the constraints I'm under, I'm pretty happy with progress thus far. It may take a while to get all of the various language features back to a fully functional state, but the project as a whole is already benefiting immensely from the reduced complexity of the pipeline as well as the general flexibility and power of the new architecture.

For now, though, I desperately need some sleep.
I continue to hack on the Epoch compiler, slowly shaping it into a powerhouse of executable binary generation. So far I've gotten the program string table and DLL import table built in a flexible and extensible manner. This is the first step towards getting actual code generation back online at full capacity.

Now that I can link to DLLs from native Epoch programs, I can start replacing the old runtime library with a newer, cleaner, and slimmer version. As part of the process, I'm rewriting the LLVM bindings, which means that code generation is going to be broken for a while, at least in part.

A few minutes ago, the compiler generated a rudimentary .EXE that imports the new Epoch runtime DLL and calls a function in it. This is all completely flexible - none of the data is hardcoded or hackish to make it work. This is in direct contrast to the old method of building Epoch .EXEs, which basically amounted to hardcoding a tiny program that launched the main program after JIT compiling it with LLVM.

My goal here is to do absolutely everything I can in Epoch, and slice out as much C++ code as possible. I'm still playing around with ideas in my head for how to make the LLVM bindings as slim as possible, but it shouldn't be hard to seriously cut down on the amount of fragile C++ hackery being done there.

It's kind of annoying how much sheer effort I can dump into a project like this, and then be able to summarize the entire affair with a single sentence. I'd rail on about all the detailed hacking and prodding that went into making this happen, but it's actually pretty boring and involves a lot of profanity. So instead, I'll just leave you with a short journal entry.

And now to bed, and now to bed.

Rewrite ALL the things!

OK so I'm not really rewriting everything... just... a lot of it.

The current runtime infrastructure for Epoch is precarious at best, and desperately needs some improvements. There are a number of things I want to do that all happen to coincide nicely with a desire to rewrite the runtime system:

  • Destroy the EpochLibrary DLL forever. This is an ancient crufty artifact that deserves death, and getting rid of it should simplify the entire build pipeline noticeably.
  • Improve program start times. I've posted about this before; basically the Epoch runtime JIT compiles your program every time it starts up, which in the case of the compiler leads to about a 6 second stall. Gross.
  • Emit true native binaries - tied in with the above point, I'd like to emit genuine native .EXE and .DLL files instead of mutant VM hybrid files. This will enable the use of things like dynamic linking which will be a huge win.
  • Separate the compilation pipeline into modular pieces. Right now the parser is "reused" by brute-force including the code in both Era and the compiler itself; instead, I'd like to make the parser and AST stuff into DLLs that feed data around to each other.
I have a loose mental strategy for doing all this, which looks vaguely like the following:

    • Build a new DLL (tentatively EpochLLVM) which wraps LLVM and allows the compiler to make simple C-API calls to set up LLVM IR and generate executable machine code from it.
    • Retrofit the existing Epoch compiler to use this new DLL when generating binaries.
    • Rebuild garbage collection and other runtime infrastructure in a separate DLL (tentatively EpochRuntime) or maybe a few DLLs.
    • Self-host the compiler through this new chain.
    • Convert Era to use the new chain.
    • Build support for emitting Epoch DLLs.
    • Proof-of-concept the DLL system by replacing the C++ lexer used for Scintilla syntax highlighting with the actual lexer.
    • Split the remaining parser, AST, and codegen logic into separate DLLs.
    • Self-host again using this new infrastructure.
    • Ship Release 16.
This ought to keep me busy well through the end of the year...

The benefits will be huge though. In addition to being able to write Epoch DLLs, getting faster start times, and cleaning up the modularity of the code a lot, this will pave the way towards integrating higher-level source processing tools with Era, as well as giving me an opportunity to revisit some historical but gross decisions, like the use of UTF-16. Last but not least, it gets even more of the implementation of the language and tools moved into Epoch instead of C++.

Overall it's a daunting amount of work, but I think it can be managed. The real trick will be staying interested in the process during the long dark period where nothing quite works. I hope my skeletal plan will give me plenty of moments to sit back and enjoy visible progress, but we shall see.

So, here goes nothing!
Right now a big limitation of the Epoch program execution model is the fact that .EXEs generated by the compiler are not in fact native binaries. They are compact bytecode wrapped in a thin shim that churns said bytecode through an LLVM wrapper layer each time the program is launched. This means that every single time you start an Epoch program, it is basically getting re-built and re-optimized from bytecode and then JITted to native machine code.

This means that the compiler, for example, takes about 6 seconds to start on my machine. Pretty sad.

The way around this is to build native binaries using LLVM and just stuff them on disk. The runtime still needs to exist in order to provide garbage collection support, marshaling into other native processes, and so on. However, it will be much skinnier and more efficient without carrying the weight of the LLVM wrapper all over the place. Net result should be faster startup times for Epoch programs and less memory consumption overall.

Easier said than done.

LLVM doesn't directly generate machine code that's really all that suitable for emitting to a binary on disk. It leaves out a lot of platform-dependent details like how to link to DLLs and how to handle global data. Much of the infrastructure is designed to assume you're emitting machine code for JIT runtime purposes, not necessarily for serialization.

Now obviously this isn't insurmountable; plenty of compilers depend on LLVM (notably Clang) and generate binaries just fine. So the real magic is going to lie in finding all the places where JIT code doesn't serialize nicely and working around them.

Off we go!

Epoch Mailing List

I've spun up a mailing list for the Epoch project, mostly because I'm tired of having conversations about it span half a dozen websites and PMs and blah blah.

Here's the link: clicky. Or you can email epoch-language@googlegroups.com to accomplish the same thing.

It should be open to anyone to come and talk about the language or just ask questions. I'll be seeding the discussion with a few subjects that are still open questions in my mind and hopefully people will jump in and kick some thoughts around.
So Epoch has a couple of warts that I'm looking at smoothing out. One of them has to do with overloads that do type decomposition of algebraic sum types:

//
// How things look today
//

type Optional : T | nothing

Test : integer x
{
    print(cast(string, x))
}

Test : nothing
{
    print("Nothing")
}

entrypoint :
{
    Optional exists = 42
    Optional doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}

I'm thinking about modifying the parser so that overloads of a function can be chained together without repeating the function name, by just specifying the signature of an overload immediately after the body of the function, and following with the body of the overload:

//
// How things look with implicit overloads
//

type Optional : T | nothing

Test : integer x
    { print(cast(string, x)) }
: nothing
    { print("Nothing") }

entrypoint :
{
    Optional exists = 42
    Optional doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}

The last question I want to ask is what lambdas look like, and whether or not they can help compact this code even further. Here's one thought experiment:

//
// Experiment with lambda syntax
//

type Optional : T | nothing

entrypoint :
{
    Optional exists = 42
    Optional doesntexist = nothing

    var f = lambda :(integer x) { print(cast(string, x)) }
                   :(nothing) { print("Nothing") }

    f(exists)
    f(doesntexist)
}

I'm not totally sold on the lambda syntax yet. I do like the idea of allowing overloads of lambdas, specifically for the type decomposition thing. Or maybe type decomposition just really needs a first-class structure, like case:

//
// Experimental case statement
//

type Optional : T | nothing

Test : Optional in
{
    case(in)
        : integer x { print(cast(string, x)) }
        : nothing { print("Nothing") }
}

entrypoint :
{
    Optional exists = 42
    Optional doesntexist = nothing

    Test(exists)
    Test(doesntexist)
}
I'm still thinking a lot about the question of task syntax in Epoch. The more I contemplate the matter, the less I like the idea of having tasks be overloaded as the "only" way to do namespacing. I also don't like the idea of static methods/data very much.

I still really like keeping tasks as the single mechanism for packaging functionality with data. I also like the idea of tasks existing as siloed off entities that can't interact except via message passing; this gives tremendous power to the model and makes threading and distributed processing a lot easier to realize.

Really the only big thing I'm recanting on is the idea of tasks being the way to group named stuff. I think a package notion is more sensible, for a number of reasons. First, packages can contain related logic/data without demanding that they all share an instantiation (or are static). Second, packages handle the idea of related logic that doesn't share any state, such as a library of free functions. Third, packages can then in turn contain tasks, which I think is an appealing way to go.

So here's my latest concept of how all this might look:

//
// Define a namespace of related "stuff"
// Could also have structures and tasks inside it, or maybe even nested namespaces?
//
package Outputters
{
    console : string text
    {
        print(text)
    }

    logfile : string text
    {
        // Imagine an IO implementation here.
    }
}

//
// A task is a compartmentalized entity. Once it is
// created, it has no interaction with the rest of the
// world except via message passing.
//
// While task functions can return values, only those
// functions without a return value can be invoked
// via plain messages. However, I'm pondering some
// syntax for capturing task return values as futures.
//
task GenerateStuff :
{
    //
    // This is a valueless function (method) so it
    // can be called as a plain message from outside.
    //
    // Messages are not dispatched while the task
    // is already handling a message, so a caller
    // can pass a bunch of messages and they will
    // safely queue up and be executed in order.
    //
    go : integer limit, (output : string)
    {
        while(limit > 0)
        {
            output(cast(string, limit))
        }
    }

    //
    // This is a valued function. It cannot be
    // invoked as a standard function because it
    // lives inside the task, and must be talked
    // to via message. In order to get the return
    // value "out", we use a future; see below.
    //
    compute : integer in -> integer out = in * 2
}

//
// Standard Epoch entry function
//
entrypoint :
{
    //
    // Create an instance of the task
    //
    // Note the new syntax for creating something
    // with no data fields to initialize
    //
    GenerateStuff generator = ()

    //
    // Pass some messages, using stuff
    // from the package as an example.
    //
    generator => go(42, Outputters.console)
    generator => go(42, Outputters.logfile)

    //
    // Pass a message to the generator,
    // get a future back out, and wait on
    // the future to get its value.
    //

    // This blocks until the message comes back with the return value of compute()
    future result = generator => compute(21)
    integer foo = result.value
}
Release notes.

I didn't get a ton of votes, but all the votes I got were for doing a release sooner rather than later. So here you go, with all the warts and incomplete stuff.

This is mostly a milestone preview for those (few) of you who are really watching the project; think of it as a sneak peek more than a stable development platform. Even still, you can do quite a hell of a lot with it.

I look forward to hearing all the things that suck about it.

Plodding ever onward

I've begun the mammoth task of refitting the Epoch compiler to believe in namespaces.

Up until now, all names have been global, and all named entities have existed as peers in the global space. (The exception of course being local variables and function parameters/return slots.)

So far, I've moved algebraic sum types and scope metadata to be tracked in a namespace. This doesn't yet entail moving away from global naming, because I need to have all aspects of the compiler "understand" namespacing before I can do that. But it's a major step forward, and I've already caught a couple of rogue compiler bugs and corner cases in the language's implementation.

As rewarding as it is to make progress, there's still a very long way to go:

  • Type aliases
  • Weak type aliases
  • Structures
  • Function signatures
  • Function tags
  • Functions
  • Overload resolution hints
  • Type matchers
  • Function templates
  • Structure templates
  • Sum type templates
  • Instantiated function templates
  • Instantiated structure templates
  • Instantiated sum type templates

Some of these are much more pervasive in the code base than others, such as functions and overload resolution hints. Some are already partially ready to go thanks to being implemented later in the compiler's authoring process, and having benefited from me learning how best to use the language along the way.

Once this list is all done, I'll need to go back in another pass and remove the direct usage of the global namespace from all the places where it's currently hardcoded. After that, I'll need to build logic for creating new namespaces and passing them around as appropriate, e.g. for tasks.

Hopefully all this stuff will take long enough that I'll get around to finalizing the actual task syntax before it's time to start implementing tasks in earnest. We shall see.

I haven't totally forgotten about the notion of doing a full language release, either. The plus side of doing a release now is that it would provide a much more realistic representation of the Epoch development experience than, say, Release 14 was. On the down side, it would still mean shipping a lot of subpar code and tools.

Maybe if there's actually some interest in me doing a release, I'll package one up. I'd just hate to go to the trouble of building a package and having a whopping two people download it and never comment on their experiences/opinions. It's been hard enough in the past to get serious traction on releases, so I kind of feel like holding off until I have a really awesome release to show.

Buuuuuuuut I also kind of want people to see all the progress that's been made without having to sync the code repository and go through the muddle of building the thing from sources.

Agh! Indecision.
So Epoch has managed to self-host (as of an embarrassingly long time ago) and the Era IDE is slowly turning into something actually worth using. Over the past few days I got a rudimentary symbol lookup feature done, where you can press F11 to process a project and F12 with the text caret over a symbol to jump immediately to the definition of the symbol. This makes navigating the compiler source a lot easier, but still far from ideal.

As often happens with this sort of project, needs in Era fuel features in the compiler, and vice versa. At the moment my biggest annoyance with Epoch is the lack of good encapsulation support - there is no module system, no namespace system, no object system, nothing.

I've written here before about the ideas I have for solving this, but it's worth reiterating since the notion has evolved a bit in my head since the last time I was really prolifically talking about it.

Essentially, the unit of organization in an Epoch program right now is a function. The difficulty is that there is no mechanism for grouping functions into higher-level constructs.

My idea for solving this is the task. A task is an abstract collection of functions, data structures, and/or variables. It acts like a namespace, but also acts like a class in that you can instantiate a task and send messages to the instance. Functions in a task can be static, meaning that (as with most languages using the term) you can invoke them as messages without an instance of the task.

For example, here's the unit test for basic tasks in the compiler suite right now:

//
// BASICTASK.EPOCH
//
// Compiler test for a very, very simple task
//

task SuccessTask
{
    succeed : [static]
    {
        passtest()
    }
}

entrypoint :
{
    SuccessTask => succeed()
}
The entrypoint function, as always, is where the program begins. All this program does is send the "succeed" message to the SuccessTask task. Since "succeed" is static, it can be handled without a task instance, which simplifies the code a bit. Once the succeed message is received, it calls the unit test harness function and notes that the test has passed, and that's the end of that.

So why is this a big deal? Because tasks don't have to be confined to a single execution unit. A task, as an abstract notion, might actually represent a coroutine, or a thread, or another process, or even an RPC interface to a machine across the world. All of them get uniform syntax and semantics from the language, and all execution infrastructure is transparent. A task can be turned into a thread or whatnot just by applying the suitable tag to its definition. The rest Just Works.

The idea is nothing new; it's heavily based on CSP, the actor model, Alan Kay's "objects", Erlang's concurrency model, and so on. What's important is that all grouping of code and data in Epoch programs follows a single pattern.

This works for everything from small globs that would typically be classes in most languages, up to modules, and through all the way to entire systems interacting; because it's all uniform message passing, none of the guts matter to the programmer unless he wants to dig into them and configure the way tasks are executed.

Tasks, of course, share no state with one another, and one of the things I look forward to doing with this feature is destroying the global{} block once and for all. This makes it trivially easy to spin up tasks on multiple threads (or machines, or GPUs, or whatever) and know for certain that they can't corrupt each other's state - the boundaries are always clearly defined by the message APIs exposed by each task.

A lot of the details remain to be worked out, but this is the general direction I'm heading, and so far I really like it. I'm kind of firing first and aiming later, so expect some course correction and refinement as I home in on the final setup. Most of this is being informed by actually writing the compiler (and Era) in the task model, so the decisions I make will be based on real programs and real concerns instead of abstract theoretical ones.

Anyways. To get all this to work, I first need to build the concept of a self-contained namespace into the compiler. So that will be my concrete project for probably the next few weeks, if the experience retrofitting the C++ implementation to use namespacing is any indicator. I hope Epoch proves to be more productive than C++ for this sort of change, but we shall see.

Once namespaces exist, I just need to build the idea of a task which owns a namespace, and the idea of sending messages into tasks. After that I'll build the notion of instantiating tasks, and then see what strikes my fancy from there.

Oh... and all of this is subject to take a long-ass time, because I'm currently prone to being hounded for belly-rubs by this giant fuzzball:

[Image: Thor 1.jpg]

Playing with colors!

Been playing around with a dark theme for Era:

[Image: Era dark theme Sep 2014.png]

Syntax highlighting still needs some love, but it's getting there.

Note that the IDE now highlights structure type names and function names. This is currently activated by pressing F12, which triggers a parse of the entire project. The parse stores off relevant identifiers for use in syntax highlighting. Obviously that's a painfully manual process, but I'll get it more seamless someday.

I'm not sure I care enough about local variables to bother figuring out a way to highlight them.

Sooner or later I'll actually have enough of this dumb IDE working that I'll feel comfortable doing heavy-lifting code work in it all the time. I think the main thing I want is symbol lookup, and with the parser now integrated and doing its thing, that shouldn't be too hard.

Argh, bitrot.

Turns out that leaving a project alone for six months is a great way to discover that it's full of mysterious bugs you don't remember having.

I noticed some weird behavior with Era, the Epoch IDE, earlier this evening, and started poking around looking for explanations. It turns out that in some weird combination of circumstances, the Epoch program (Era in this case) can get partway through executing a Windows message handler, crash internally, and then continue on merrily as if nothing had ever happened (except without executing the remainder of the WndProc or whatever else the proc had called out to).

My best guess is that there is some Structured Exception Handling magic going on in the LLVM library that causes the crash to partially unwind the call stack and then continue executing from some weird spot. I don't get reliable enough callstacks to prove this just yet, because the JITted Epoch code is pretty hard to untangle in a debugger.

So the goal of the day is to find out what the actual crash is, hopefully by capturing it under a debugger. But this apparently happens a lot more frequently than I'd realized, because attaching a debugger to the running Era process turns up all kinds of crashes and weird behavior. Something in the Epoch runtime is seriously borked.

At this point, it's 2332 hours (yayy palindromic times) and I'm not liable to get much sleep tonight. This is going to bug the shit out of me. So I grab a fresh beverage and settle in for a heavy debug session.

Initially, I turn on all of the CRT memory debugging features I know of, and fire up Era. Well, I try. Turns out that _CRTDBG_CHECK_ALWAYS_DF really does murder performance: Era, which normally starts up with my code visible in under a second, has been trying to load for the better part of 15 minutes now.

LLVM's optimizer apparently allocates memory like a freaking madman. Obviously not written by game developers.

Meanwhile, Era has pegged a core of my laptop's already-warm CPU and is showing no signs of being ready any time soon. Maybe that beverage needs a little ethanol in it...

At 2352, the IDE still shows no signs of finishing the loading process. The LLVM optimizer is hard at work burning through trillions of allocations and also murdering my poor CPU. If it were anywhere near this painful in Release, I'd seriously consider offering to write them some better allocation strategies so I can stop wasting my youth waiting for the dumb thing to finish calling operator new().

Periodic checking of the progress in the debugger indicates that, yes, we are making progress - it seems that the optimizer is finally working through the last few passes. This is at 0002 hours, so basically 40 minutes have elapsed just waiting for the IDE to load.

I might not whine about Visual Studio's startup time for a little while.

Nah, I'll still whine.

Anyways... of course once optimizations are done, we still have to generate machine code. Turns out this is even worse in terms of millions of tiny allocations. Quick back-of-the-cocktail-napkin estimates show the IDE loading at sometime around 8 AM. Screw this.

Sure enough, a couple of minutes of poking with the per-allocation checking turned off yields pay dirt. Looks like I had an off-by-one error in the library routine for converting strings between UTF-8 and UTF-16. DERP.

Fixing that leads to more interesting crashes, this time somewhere in the garbage collector. It's verging on 0045 and I'm wondering how much more I've got in the tank... but this is too compelling to pass up.

The faulting code looks innocent enough: loop through a giant std::map of allocated string handles, and prune out all the ones that don't have any outstanding references. For some reason, though, std::wstring is barfing deep in a destructor call, apparently because the "this" pointer is something wonky.

My first guess, of course, is that I have mismatched code - something compiled in one way while linking to (or sharing objects with) something from another compilation setup. Time to go spelunking in the Visual Studio project settings...

Sadly, probing into the compiler/linker settings yields no obvious discrepancies. Time for the good ol' magical Clean Rebuild.

No joy. Next attempt is to disable the deletion of garbage strings... it'll murder my memory footprint, but it might also reveal what else is interfering with the string table. This causes the crashes to stop for the most part, even with Application Verifier enabled, which is pesky. I do, however, get a crash when exiting the program - ie. when garbage collection is not destroying strings, but rather the actual teardown process.

This indicates a memory stomp to me... which is slightly terrifying. Something, somewhere, seems to be clobbering entries on the string table. I haven't yet discerned a pattern to the data that gets written on top of the table entries, so it isn't entirely clear what's doing the stomping.

It's 0111 and I'm seriously tired. My best guess is that the stomp originates from the string marshaling code that interacts with external C APIs, specifically in this case the Win32 API. I suspect that I'm doing some evil cast someplace that confuses pointer types and causes chaos, but I'm far too hazy to pinpoint that as the cause for certain tonight.

0116 - time to cash in. We'll see how this goes next time!

Sampler-Based Profiling: The Quick Version

So you're happily working on some code, and suddenly it happens: everything is just too damn slow! Something is eating up all your performance, but it's not immediately obvious what to do about it.

One of the first things any experienced programmer will tell you is to profile. In a nutshell, this is a grizzled beard-wearing programmer shorthand for measure things and be scientific about your approach to performance. Even some of the most brilliant programmers in the world find it hard to intuitively find the real performance bottlenecks in a complex piece of code. So don't rely on voodoo and ritualism to find your slow points - measure your code and act accordingly.

Picking a Profiler
There are fundamentally two approaches to profiling. One is instrumentation, and the other is sampling. Instrumentation means adding some magic to your code that times how long the program spends executing various functions. It can be as simple as subtracting a couple of timer values, or as complex as invasive changes to the entire program that do things like automatically record call stacks and such. The "industrial strength" profilers generally support this mode, although in practice I find it very hard to use for things like games, for one simple reason: it dramatically changes the behavior of your program when you want to do something in near real-time.
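At its simplest, instrumentation really is just "subtracting a couple of timer values." A minimal sketch in Python (the decorator, the `timings` table, and `slow_sum` are all made-up names for illustration):

```python
import time
from collections import defaultdict
from functools import wraps

# Accumulated time per function, in seconds.
timings = defaultdict(float)

def instrumented(func):
    """Wrap a function so every call is timed: the crudest form of
    instrumentation, literally subtracting two timer readings."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__] += time.perf_counter() - start
    return wrapper

@instrumented
def slow_sum(n):
    return sum(i * i for i in range(n))

slow_sum(100_000)
print(sorted(timings))  # ['slow_sum']
```

Note the catch: every instrumented call now pays the cost of two timer reads plus the bookkeeping, which is exactly why this approach distorts near-real-time programs like games.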

So I will suggest, if you're new to profiling, that you start with a sampling-based profiler. By now you have all the keywords you need to find one on the Internet, so I won't recommend anything in particular (people can be very dogmatic about what their favorite tools are).

Sampling and How it Works
The basic idea behind sampling profilers is simple statistics. First, start by taking a program that's already running. Freeze it in place, like hitting the Pause button, and note what code was running when you stopped the world. Optionally, you can record a callstack as well to get contextual information. Once the snapshot is taken, un-pause the program and let it continue running. Do this thousands of times, and you will - statistically speaking - slowly build up a picture of where and why your program is slow.
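The freeze-and-peek loop can be sketched in a few lines of Python. This is a toy, not a real profiler (real ones use OS-level interrupts and record full stacks), and it leans on the CPython-specific `sys._current_frames`; the `sampler` and `busy_work` names are invented for the example:

```python
import collections
import sys
import threading
import time

samples = collections.Counter()

def sampler(target_thread_id, interval, stop_event):
    """Periodically freeze-frame the target thread and record which
    function was executing. Purely statistical: more samples, better
    picture."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)

def busy_work():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

stop = threading.Event()
watcher = threading.Thread(
    target=sampler, args=(threading.main_thread().ident, 0.001, stop)
)
watcher.start()
busy_work()
stop.set()
watcher.join()
print(samples.most_common(3))  # the hot loop dominates the sample counts
```

The target program never cooperates with the sampler; it's paused from outside, which is why sampling barely perturbs the thing being measured.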

It is important to remember two things about sample-based profilers. First, they are statistical estimates only. They will give you slightly different results each time you run the program. Therefore, running as much code as possible is in your best interests - this gives you the opportunity to get lots of data and make a more accurate statistical picture. I usually run for something like 10,000 to 20,000 samples, to give you a ballpark idea.

Second, sampling will tend to downplay the importance of very tiny pieces of code. Statistically, it is easy to see why this has to be true: since the piece of code is tiny, the odds of freezing the program exactly in that spot are correspondingly slim. You might land just before it, or just after it, for example. If your program is largely built up of equally slow tiny bits of code, sampling might make it impossible to find a bottleneck.
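Both points are easy to see with a simulation. Assume a made-up program that spends 90% of its time in one function, 9.9% in another, and 0.1% in a tiny helper; each "sample" freezes the program at a uniformly random instant:

```python
import random

random.seed(42)

# Hypothetical time split: the fractions are invented for illustration.
profile = {"render": 0.900, "update": 0.099, "tiny_helper": 0.001}

def take_samples(n):
    """Simulate n freeze-frame samples of the hypothetical program."""
    counts = dict.fromkeys(profile, 0)
    for _ in range(n):
        r = random.random()
        cumulative = 0.0
        for name, share in profile.items():
            cumulative += share
            if r < cumulative:
                counts[name] += 1
                break
    return counts

few = take_samples(100)
many = take_samples(20_000)
# With 100 samples, tiny_helper expects only 0.1 hits and will often
# show zero; with 20,000 samples the fractions converge on the truth.
print(few, {k: v / 20_000 for k, v in many.items()})
```

This is the whole argument for large sample counts in one picture: the estimate for the big functions stabilizes quickly, while the tiny one needs thousands of samples before it even registers.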

That said, sampling is still a great tool, and an easy way to get started with profiling. So let's talk about how to use it.

Using Sampling
Typically, profilers will show you two crucial stats about a given piece of code: how often it was seen in the freeze-frame snapshots (sample count), and a rough guess at how much time was spent in that chunk of code across the entire profiling session (aggregate time). Some profilers distinguish between time spent in just the code (exclusive time) versus time spent in the code and everything that code calls (inclusive time). These are useful measurements in various situations so don't get too used to focusing on a single statistic.

For introductory purposes, often the easiest way to fix performance issues is to look at the combination of inclusive time and sample count. This tells you (roughly) what fraction of a program's life is spent chugging through a given piece of code. Any decent profiler will have a sorted breakdown that lets you view your worst offenders in some easy format.
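The inclusive/exclusive bookkeeping falls straight out of the recorded callstacks. A sketch with hand-made sample stacks (the function names are hypothetical): a function gets an inclusive hit whenever it appears anywhere on a sampled stack, and an exclusive hit only when it was on top, i.e. actually executing.

```python
from collections import Counter

# Pretend sampled call stacks, outermost caller first. In a real
# profiler these come from the freeze-frame snapshots.
stacks = [
    ["main", "update", "physics"],
    ["main", "update", "physics"],
    ["main", "render"],
    ["main", "update"],
    ["main", "render", "draw_sprites"],
]

inclusive = Counter()  # function appears anywhere on the stack
exclusive = Counter()  # function was on top of the stack

for stack in stacks:
    for func in set(stack):  # count each function once per sample
        inclusive[func] += 1
    exclusive[stack[-1]] += 1

print(inclusive["update"], exclusive["update"])  # 3 1
print(inclusive["main"], exclusive["main"])      # 5 0
```

Notice how main has a huge inclusive count and zero exclusive count: everything happens "inside" it, but it does no work itself. That's why sorting by a single statistic can mislead you.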

Once you have identified a slow piece of code, it takes some analysis of the program itself to understand why it is slow. Sometimes, you might find a piece of code can't be optimized, but chews up a lot of profiling samples - this is common with high-throughput inner loops, for instance. So don't be afraid to use different stats (especially exclusive time) and look further down the list for potential wins.

Pitfalls and Gotchas
There are a few more things worth mentioning; I'll only hit them at a high level, though, because the details will vary a lot based on OS, compiler, and so on.

  • Memory allocations can hide in weird places. Look for mysterious samples in system code, for example. Learn to recognize the signs of memory allocation and deallocation - even if you're in a garbage-collected language, these can be important.
  • Multithreading is the enemy of sample-based profiling, because blocking a thread looks an awful lot like chewing up CPU on the thread. Note that some good profilers can tell the difference, which is nice.
  • Statistics are lies. If your profiler is telling you something profoundly confusing, seek advice on whether you're missing something in the code, or if the profiler is just giving you an incomplete statistical picture of reality.
  • Practice is important, but so is realistic work. Why didn't I include an example program and profiling screenshots? Because real programs are much harder to profile than simple examples. If you want practice, work on real programs whenever possible, because you'll learn a lot more.

No structure or real nice formatting will be found in this post. This is a stream-of-consciousness blathering process wherein I contemplate how to organize code in a way that escapes the limitations of the file paradigm.

Considerations for organizing code
Main goal: aid in discoverability and navigation of complex code bases. Secondary benefit could be that complicated architectures will be correspondingly "messy" to look at, encouraging cleaner design.

Files suck because:

- They force revision history to occur at the file level instead of the unit-of-code (function/etc.) level. Move a function into a new file and POOF goes its revision history. This is lame.

- They imply a single view of code which may not be the most expedient way to look at code "right now". Working on functionality that spans multiple files is a classic example; I have to change files all the time when the chunk of code I'm conceptually editing has nothing to do with these dumbass file divisions.

- Files as modules or other semantic implications of file separation are FUCKING EVIL (Java I'm looking at you). There should be no forced correlation between the code storage mechanism on disk and the conceptual organizationS (emphatically plural) of the code at a high level.

- Where the fuck did that function go? Sigh, time for grep. This is lame if you misspell the function's name or can't remember exactly what it's called. grep cannot search for IDEAS. Files are not directly to blame for this, but the forced implied taxonomy of code based on files exacerbates the problem significantly.

- They unnecessarily complicate code reuse. I should be able to reuse just that function, or that entire group of related code, or whatever - independent of shuffling files around. This should tie in to revision history too so I can see the history of a function when I steal it for another project.

Other assorted thoughts:

Avoid the sin of only being able to see tiny fragments of code at a time, à la early Visual Basic; ensure that a comfortable volume of code is always at hand to be edited and browsed.

Would be nice to have some kind of graphical visualization of the code taxonomies created by the programmer.

I want a way to select groups of functions to view together; checkboxes are lame; lasso?

"Search for code unit by name" as first-class IDE feature, including full language parse?

How do we minimize reparsing without having to write a standalone incremental parser?

Parsing != semantic checks; maybe the former can be done fast enough for realtime work, and maybe that justifies keeping the semantic work as semi-modal operations?