/* attempt to deal with prototype, bootstrap, jquery conflicts */ /* for dropdown menus */

Jump to content

February 2017 »

19 202122232425

etodd makes games

4: Adsense

Enlisting IBM Watson as a voice actor

Allow me to regale you with an exciting tale: the birth of a janky dialogue and voice system.


I have a JSON file with all the localized strings in my game, like this:

	"danger": "Danger",
	"level": "Level %d",
A preprocessor takes this and generates a header file with integer constants for each string, like this:
namespace strings
	const int danger = 0;
	const int level = 1;
	// ...
At runtime, it loads the JSON file and hooks up the integer IDs to localized strings. A function called "_" takes an integer ID and returns the corresponding localized string. I use it like this:
draw_string(_(strings::danger), position);
This all worked (and still works) pretty well for UI strings. Not so much for dialogue.


To write dialogue, I had to come up with a unique ID for each line, then add it to the strings file, like this:

	"hello_penelope": "Hello! I am Penelope.",
	"nice_meet_you": "Nice to meet you.",
Yes, the preprocessor generated a new integer ID in the header file every time I added a line of dialogue. Gross.


I construct dialogue trees in Dialogger. With this setup, I had to use IDs like "hello_penelope" rather than actual English strings. Also gross.


A better way
I keep the string system, but extend it to support "dynamic" strings loaded at runtime that do not have integer IDs in the header file.


Now I can write plain English in the dialogue trees. The preprocessor goes through all of them and extracts the strings into a separate JSON file, using the SHA-1 hash of each string for its ID. Once everything is loaded, I discard all string IDs in favor of integer IDs.


I couldn't find a simple straightforward SHA-1 implementation that worked on plain C strings, so here's one for you.


The point of all this is: I now have a single JSON file containing all the dialogue in the game. Ripe for automation...


Speak and spell


Penelope is an AI character. I'm using text-to-speech for her voice, at least for now. I don't want to integrate a text-to-speech engine in the game; that's way too much work. And I don't want to manually export WAVs from a text-to-speech program. Also too much work.


I create a free IBM Bluemix account. They have a dead simple text-to-speech API: make an HTTP request with basic HTTP authentication, get a WAV file back.


I write an 82-line Python script that goes through all the dialogue strings and makes an HTTP request for each one. It keeps track of which strings have previously been voiced, to facilitate incremental updates.


Now I have a folder of WAV files, each one named after a SHA-1 hash. I'm using Wwise for audio, so the next step requires a bit of manual involvement. I drag all the WAVs into the project and batch create events for them.


Posted Image


Now when I display a dialogue string, I just have to look up the SHA-1 hash and play the audio event. Easy.


Disaster strikes
I don't hear anything. All signs indicate the audio is playing correctly, but nothing comes out of my speakers.


I look at one of the audio files in Wwise.


Posted Image


Looks like the file is corrupted. I play the WAV in a number of different programs. Some play it fine, others don't play it at all.


I edit my text-to-speech script to use Python's wave library to load the WAV file after downloading it from IBM. Sure enough, the library doesn't know what to make of it.


Too lazy to care, I edit the wave library in-place in my Python distribution. YOLO.


After a bit of printf debugging, I pinpoint the issue. The WAV format is based on RIFF, a binary format which breaks the file into "chunks". According to Wikipedia, the format of each chunk is as follows:

  • 4 bytes: an ASCII identifier for this chunk (examples are "fmt " and "data"; note the space in "fmt ").
  • 4 bytes: an unsigned, little-endian 32-bit integer with the length of this chunk (except this field itself and the chunk identifier).
  • variable-sized field: the chunk data itself, of the size given in the previous field.
  • a pad byte, if the chunk's length is not even.
Turns out, IBM's text-to-speech API generates streaming WAV files, which means it sets the "length" field to 0. Some WAV players can handle it, while others choke. Wwise falls in the latter category.


Fortunately, I can easily figure out the chunk length based on the file size, modify it using the wave library, and write it back out to the WAV file. Like so.


Problem solved. Wwise is happy. Next I set up some Wwise callbacks to detect the current volume of Penelope's voice, and when she's done speaking.


Here's the result, along with some rope physics in the background being destroyed by the wonky framerate caused by my GIF recorder:


Posted Image

If you want to hear it, check out the IBM text-to-speech demo here.


Thanks for reading!


Mirrored on my blog

The Poor Man's Threading Architecture

The game industry hit Peak Advice Blog a while ago. Every day I read skim ten articles telling me how to live.


Fear not! I would never give you useful advice. This series is about me writing bad code and you laughing at my pain.


First Contact
Say you have some voxels which occasionally get modified. You regenerate their geometry like so:

Because you are a masochist, you want to do this on a separate thread.


Rather than redesign your engine, you simply spawn a worker thread and give it a list of voxels to process. Easy in C#:

Queue<Voxel> workQueue;

static void worker()
    while (true)
        Voxel v = workQueue.Dequeue();
        lock (v)

new Thread(new ThreadStart(worker)).Start();

// And awaaaaaaay we go!
The lock signals to other threads "hey, I'm using this". If the other threads also acquire locks before using the object, then only one thread will access it at a time.


Kaboom! It crashes when thread A dequeues an item at the exact moment when thread B is enqueueing one. You need a lock on the queue as well.


This is Hard and Boring and Slow
Turns out, without a lock you can't trust even a single boolean variable to act sane between threads. (Not entirely true. For wizards, there are atomic operations and something about fences. Out of scope here!)


Sprinkling locks everywhere is tedious, error-prone, and terrible for performance. Sadly, you need to rethink your engine from the ground up with threads in mind.


The Right Way
Why am I even including this? Ugh.


You read a few articles on modern AAA engines, where you find a diagram like this:


Posted Image


This is the job graph from Destiny. AAA engines split their workload into "jobs" or "fibers". Some jobs depend on others. The graph has bottlenecks that split the work into phases. Within phase 1, tons of jobs execute in parallel, but all of them must finish before phase 2 starts.


With jobs, you wouldn't have to lock individual pieces of data. The dependency graph ensures that jobs run in the right order, and that nothing runs in parallel unless you're okay with it. You also don't have to think about individual threads — a scheduler delegates jobs to a pool of threads.


Here's another diagram you stumble across:


Posted Image


This shows the workload of the GPU and each CPU over time as the game renders a single frame. The goal is to fill all those holes so you use every bit of available compute power at maximum efficiency.


The Quick and Dirty Way
In a brief flash of clarity, you realize that you are not Bungie. You check your bank account, which sadly reports a number slightly lower than $500 million.


You recall the Pareto Principle, also known as the "80/20 rule". You decide to write 80% of a decent architecture for only 20% of the work.


You start with a typical game loop:

while (true)
    // Process window and input events
    SDL_Event sdl_event;
    while (SDL_PollEvent(&sdl_event))
        // ...




    // Present!

Side note: a while back you also switched to C++. Masochism level up.


What can you move off the main thread? If you touch OpenGL or anything within a mile of the windowing system from another thread, the universe explodes. For now, you keep graphics and input (SDL_PollEvent) on the main thread.


That leaves physics and game logic. You give each its own thread. Since you need to spawn/modify/query physics entities in game logic, no other physics can happen while you're in a game logic update. The rest of the time, the physics thread can work in the background.


Sounds like a perfect case for a lock:

std::mutex physics_mutex;

void physics_loop()
    while (true)
        std::lock_guard<std::mutex> lock(physics_mutex);

void game_logic_loop()
    while (true)
            std::lock_guard<std::mutex> lock(physics_mutex);

In this setup, your render() function can't read or write any physics data for fear of explosions. No problem! In fact, that limitation might be considered a feature. However, the render() function also can't make any OpenGL calls since it's not on the main thread.


You re-watch the Destiny GDC presentation and notice a lot of talk about "data extraction". In a nutshell, Destiny executes game logic, then extracts data from the game state and lines it up for a huge fan-out array of render threads to process efficiently.


That's essentially what your render() function will do: go through the game state, generate graphics commands, and queue them up. In your case, you only have one render thread to execute those commands. It might look like this:

enum class RenderOp
    // etc.

BlockingQueue render_queue;

void main_loop()
    while (true)
        RenderOp op = render_queue.read<RenderOp>();
        switch (op)
            case LoadMesh:
                int count = render_queue.read<int>();
                Vec3* vertices = render_queue.read<Vec3>(count);
                // etc...
            case FreeMesh:
                // etc...
It's a graphics virtual machine. You could give it a pretentious name like GLVM.


Now in your render() function, you just write commands to the queue:

void render()

    for (MeshRenderer& mesh : mesh_renderers)
        // etc.

This will work, but it's not the best. You have to lock the queue every time you read from or write to it. That's slow. Also, you need to somehow get input data from the main thread to the game logic thread. It doesn't make sense to have queues going both directions.


Instead, you allocate two copies of everything. Now, the game logic thread can work on one copy, while the main thread works on the other. When both threads are done, they swap.


Posted Image


Now you only have to use a lock once per frame, during the swap. Once the threads are synced, the swap operation is just two pointer reassignments.


Furthermore, you can keep the render command lists allocated between frames, which is great for performance. To clear one, just reset the pointer to the start of the list. Now you don't have to worry about implementing a queue with a ring buffer. It's just an array.


Our Bright and Glorious Future
For wizards, this is all wrong, because graphics drivers do command queueing anyway, and jobs and fibers are the right way. This is like Baby's First Threaded Renderer. But it's simple, it gets you thinking in terms of data flow between threads, and if you eventually end up needing a job system, you're already halfway there.


This setup might also make the switch to Vulkan easier. If you keep all your data extraction in one place and make it read-only, it should be trivial to split into multiple threads, each with their own render queue.


You can see a poorly-commented cross-platform implementation of this idea here. Potentially useful parts include the SDL loop, GLVM, and the swapper.


If you enjoyed this article, try these:

But here's a better idea: watch the Destiny GDC talk.


Thanks for reading!


Mirrored on my blog

Ludum Dare 34 Postmortem

Friday 21:15
Fifteen minutes after the theme announcement, my friend Ben Homan walks through my front door. Not really my front door, I'm just a subletter. But this is a first. Normally he ignores our instructions to walk in without knocking. The first time, he texted me from the driveway.


Jesse Kooner walks in, also unannounced, bearing frozen pizza. Before he can even kick his shoes off, I loudly explain the theme: a never-before-seen tie between "growing" and "two-button controls".


Jesse has no laptop. I dig out an old one from my closet. I plug it in and start working on a few Windows updates. 72 to be exact.
Meanwhile, we decide which technology to use. Jesse's less code-focused skillset leads him to prefer Unity, while Ben wants to use the weekend as an opportunity to become more familiar with Node.js. We decide on Node.js. Jesse will provide creative input and artwork.


My roommates, a brother and sister, arrive home from an apparently underwhelming Christmas light show. The concept of a game jam is foreign to them, but they're good sports. We spend a half hour playing QWOP with them.


Our design parameters:

  • Multiplayer. Otherwise, what's the point of Node.js?
  • Probably 2D due to the limited timeframe. Although Jesse is more comfortable working in 3D.
Jesse originally suggested doing this competition a few weeks ago, when he wanted to create a mash-up of "Cookie Clicker" and a tower defense game. He resurrects the idea now, only halfway joking.


Ben likes the idea of a multiplayer vine growing game. I'm partial to a text-based social game about growing a social media brand. No one commits to anything yet.
Ben is not a big gamer, so I pull up a few famous HTML5 games on his laptop. Cursors.io. Agar.io. 2048. This last one interests me in particular, as it involves growing numbers.


I'm pitching Ben and Jesse on a multiplayer version of 2048. I envision a free-roaming world filled with numbered tiles to collect. Instead of collapsing numbers together against the edges of the board, players would find walls and structures within the world to collapse their numbers against.



  • How can two or more games of 2048 occur simultaneously on the same board? In normal 2048, the player controls all tiles on the board. We decide to give each player a unique color, and allow them to assimilate unclaimed, grayed-out tiles.
  • What happens when tiles from two players meet? We decide that whoever moves first gets to own the resulting collapsed tile from a move.
  • How do players traverse through the world? We toss around the idea of procedural generation, but eventually decide on hand-crafted, linear levels linked together via portals.
  • This raises the question: how do the portals work? And what happens to a player's tiles once they exit a level?

Posted Image

Top right corner: our confidence that we'll actually finish the thing.


Saturday 00:30
We've eaten two pizzas. We have a Git repository and a Slack instance which is ultimately only used to test out amusing Slackbot responses.
Although the game is 2D, we decide on a 3D art style with an orthographic projection, to allow Jesse to use his skills in Maya LT, which he promptly installs on my old laptop.
Ben works on the server with Node.js, while I start on the client with Three.js.


Ben heads home first, then Jesse. I pass out on my bedroom floor.


Posted Image

You can see we're already struggling with the name.


I wake to find Ben working in the kitchen. He spends the morning building boiler plate for the server, while I work out some Three.js details.


I return from a run just in time to catch Jesse pulling up with donuts. After lunch, he and I spend the next few hours working out a pipeline between Maya and Three.js.


I tweet our first screenshot, featuring colorized instances of Jesse's tree model displayed in a horribly distorted orthographic projection.
Posted Image


Without the server API to code against, I run out of things to do on the client. Ben finishes the data model, but he has trouble conceptualizing the rules for player movement. I haul my laptop over to indulge in some good old-fashioned pair programming.


Break for dinner. Burritos. Jokes.


Jesse continues modelling. With the basic API done, I start working to make the client consume it. Ben works on an image loader. We want to design the levels in GIMP.


Ben takes off. I've got player movement, animations, and tile numbers done.


Sunday 00:30
Multiplayer works. Jesse and I play a few games against each other. It's fun! It's a game! We work out some issues with the level loading code and try to get an interesting level loaded.


Problem: it's pretty easy for one player to gain the upper hand and quickly assimilate all the tiles on the board, making it impossible for other players to move and grow.


Solution! Players should only control tiles within a certain radius of their "center". Outlier tiles are grayed out, free to be picked up by other players.


Fix implemented. Jesse heads home and I turn in.


Tweet another screenshot before heading to church.
Posted Image


At this point, I'm having an existential crisis about the name. I fire off a few panicked texts about it.


Ben and I are back to the grindstone. Jesse arrives with sandwiches and more donuts! Ben adds a cool username feature, but we eventually axe it to keep things simple.


Tweaks and bug fixes all day. Ben works on the level format, while Jesse lays out some levels in GIMP.


More polish. I put in Jesse's cloud models and a "connecting" spinner.


We finally brainstorm a name: Tile Risers. Jesse whips up a logo in Maya. We go through seven iterations before everything lines up in our janky export pipeline.


I spin up a Digital Ocean droplet, allocate an S3 bucket, commit the production URLs, and start filling out the submission form.


Time's up! Fortunately, we have another hour to submit. I later found out I misread the rules, and we actually had a whole extra 24 hours. At any rate, we were done. I commit two small bug fixes after submission, which is within guidelines.


Posted ImagePosted Image

Mirrored on my blog

One Weird Trick to Write Better Code

Developers hate him!

We'll cover some standard tips and tricks here, but we're not really interested in those. We're looking for the One Weird Trick to rule them all. Hopefully each trick we encounter brings us closer to coding Mecca.

In the beginning
The first video game I ever wrote was called Ninja Wars.
Posted Image
Yes, that is an HTML table of images. I changed the src attribute to move stuff around.

The top of the Javascript file looked like this:
var x = 314;
var y = 8;
var prevy= 1;
var prevx= 1;
var prevsw= 0;
var row= 304;
var endrow= 142;
var sword= 296;
var yrow = 0;
var yendrow = 186;
var I = 0;
var e = 0;
var counter = 0;
var found = 0;
var esword = 26;
var eprevsw = 8;
var bluehealth = 40;
var redhealth = 40;
var n = 0;
var you = 'ninja';
var bullet = 'sword';
var enemy = 'enemy';
var ebullet = 'enemysword';
var besieged = 0;
var siegecount = 0;
var esiegecount = 0;
var ebesieged = 0;
var healthcount = 0;
var player = 0;
var starcount = 0;
var infortress = false;
var prevyou = you;
var einfortress = false;
var prevenemy = enemy;
var previmg = "";
var prevbullet= "";
var eprevbullet= "";
var randnum = 0;
var randnum2 = 0;
var randnum3 = 0;
var randnum4 = 0;
var buildcount = 0;
var characters = new Array(4);
characters = ['ninja','tank','heli','builder'];
var bullets = new Array(3);
bullets = ['sword','bullet','5cal','sword'];
var echaracters = new Array(3);
echaracters = ['enemy','tank2','eheli','ebuilder'];
var ebullets = new Array(3);
ebullets = ['enemysword','bullet2','e5cal','enemysword'];
var health = new Array(4);
health = [40,30,20,10];
var prevorb = 0;
var prevnum = 0;
Hopefully this looks somewhat familiar, and I'm not the only one to start out writing code like this. Regardless, this debacle demonstrates our first trick:

Trick #1: globals are evil

We don't even know why they're evil, we just intuitively know.

edit: I should clarify that I'm no longer against globals. But that's getting ahead of ourselves.

Pretty soon we learn about objects. We can group our variables:
class Ninja
    int x, y;
    int previousX, previousY;
    int health = 100;

class Sword
    int x, y;
    int previousX, previousY;
    int sharpness = 9000;
We can even use inheritance to avoid copy-pasting:
class Movable
    int x, y;
    int previousX, previousY;

class Ninja : public Movable
    int health = 100;

class Sword : public Movable
    int sharpness = 9000;
Inheritance is nice! Nice enough to serve as our next trick:

Trick #2: object-oriented programming

Object-oriented is so great that it forms the core of many classic games, including Doom 3. Which, coincidentally, is open source.
Doom 3 loves inheritance. Don't believe me? Here's a small subset of its class hierarchy:
Imagine you're an employee at id Software. This inheritance hierarchy works great for a few months. Then, one fateful Monday, disaster strikes. The boss comes in and says "Hey, change of plans. The player is now a car."

Look at idPlayer and idAFEntity_VehicleFourWheels in the hierarchy. Big Problem #1: we need to move a lot of code.

Big Problem #2: the boss comes to his senses and calls off the "player is a car" idea. Instead, we're adding turrets to everything. The car is becoming a Halo Warthog, and the player is getting a giant turret mounted on his back.

As lazy programmers, we decide to use inheritance again to avoid copy-pasting. But look at the hierarchy. Where can we put the turret code? The only ancestor shared by idPlayer and idAFEntity_VehicleFourWheels is idAFEntity_Base.

We'll probably put the code in idAFEntity_Base and add a boolean flag calledturret_is_active. We'll only set it true for the car and the player. This works, but the terrible, logical conclusion is that our base classes end up loaded with tons of cruft. Here's the source code for idEntity.

Go ahead, scroll through it. You don't have to read it all.


The point is, that's a lot of code. Notice how every single entity — down to the last piece of physics debris — has a concept of a team, and of getting killed. Clearly not ideal.

If you're a Unity developer, you already know the solution: components! Here's what they look like:

Posted Image

Rather than inheriting functionality, Unity entities are just bags of components. This solves our earlier turret problem easily: just add a turret component to the player and car entities.

Here's what Doom 3 might look like if it used components:
What have we learned?

Trick #3: in general, favor composition over inheritance

Take a moment to review the tricks we've covered so far: global variables bad, objects good, components better.

You won't believe what happens next!

Let's take a small detour into the world of low-level performance with a very simple question: which function is faster?
double a(double x)
    return Math.sqrt(x);

static double[] data;
double b(int x)
    return data[x];
We'll hand-wave a lot of complexity away and just assume that these two functions eventually compile down to one x86 instruction each. Function a will probably compile to sqrtps, and function b might compile to something like lea("load effective address").
sqrtps takes about 14 CPU cycles on a modern Intel processor, according to Intel's manual. What about lea?

The answer is "it's complicated". It depends on where we load data from.
Registers      ~40 per core, sort of              0 cycles
L1             32KB per core          64B line    4 cycles
L2             256KB per core         64B line    11 cycles
L3             6MB                    64B line    40-75 cycles
Main memory    8GB                    4KB page    100-300 cycles
That last number is important. 100-300 cycles to hit main memory! This means in any given situation, our biggest bottleneck is probably memory access. And from the looks of it, we can improve this by using L1, L2, and L3 cache more often. How do we do that?

Let's return to Doom 3 for a real-life example. Here's the Doom 3 update loop:
for ( idEntity* ent = activeEntities.Next();
    ent != NULL;
    ent = ent->activeNode.Next() )
    if ( g_cinematic.GetBool() && inCinematic && !ent->cinematic )
        ent->GetPhysics()->UpdateTime( time );
    RunEntityThink( *ent, cmdMgr );
    ms = timer_singlethink.Milliseconds();
    if ( ms >= g_timeentities.GetFloat() )
        Printf( "%d: entity '%s': %.1f ms\n", time, ent->name.c_str(), ms );
From an object-oriented perspective, this code is pretty clean and generic. I'm assuming RunEntityThink calls some virtual Think() method where we could do just about anything. Very extensible.

Uh oh, here comes the boss again. He has some questions.
  • What's executing? Er... sorry boss, it depends on what entities are active at the time. We don't really know.
  • In what order is it executing? No idea. We add and remove entities to the list at random times during gameplay.
  • How can we parallelize this? Gee boss, that's a tough one. Since the entities execute randomly, they may be accessing each other's state. If we split them up between threads, who knows what might happen.
In short:

Posted Image
But wait, there's more! If we look closely, we see the entities are stored in a linked list. Here's how that might look in memory:

Posted Image

This makes our L1, L2, and L3 cache very sad. When we access the first item in the list, the cache thinks, "hey, I bet the next thing they'll want is nearby", and it pulls in the next 64 bytes after our initial memory access. But then we immediately jump to a completely different location in memory, and the cache has to wipe out those 64 bytes and pull in new data from RAM.

Trick #4: line up data in memory for huge performance gains
Like this:
for (int i = 0; i < rigid_bodies.length; i++)

for (int i = 0; i < ai_controllers.length; i++)

for (int i = 0; i < animated_models.length; i++)

// ...
An object-oriented programmer might be infuriated at this. It's not generic enough! But look, now we're iterating over a set of contiguous arrays (not arrays of pointers, mind you). Here's what it looks like in memory:

Posted Image

Everything is all lined up in order. The cache is happy. As a side bonus, this version allows us to answer all those pesky questions the boss had. We know what's executing and in what order, and it looks much easier to parallelize.

edit: Don't get too hung up on cache optimization. There's a lot more to it than just throwing everything in an array. I only bring it up to prove a point, and I'm only qualified to give the basic introduction. Check out the links at the bottom for more info.

Time to wrap this up and get to the point. What's the One Weird Trick? What do tricks 1-4 (and many more) all have in common?

The One Weird Trick: data first, not code first
Why were globals so evil (trick #1)? Because they allowed us to get away with lazy data design.

Why did objects help us (trick #2)? Because they helped us organize our data better.

Why did components help us even more (trick #3)? Because they modeled our data better by matching the structure of reality better.

Even the CPU likes it when we organize our data correctly (trick #4). No really, what is the trick actually
Let's break it down in practical terms. Here's a representation of a typical Unity-like component-based game design.

Posted Image

Each component is an object. It has some state variables listed at the top, and then some methods to do stuff with those variables. This is a well-designed object-oriented system, so the variables are private. The only code that can access them is the object's own methods. This is called "encapsulation".

Posted Image

Each object has a certain amount of complexity. But fear not! OOP promises that as long as we keep the state private, that complexity will stay encapsulated within the object, and won't spread to the other objects.

Unfortunately, this is a lie.

Posted Image

Sometimes, we need a function to access two or three objects. We end up either splitting the function between those objects, or writing a bunch of getters and setters so our function can access what it needs. Neither solution is very satisfying.

Here is the truth. Some things cannot be represented as objects very well. I propose an alternate paradigm, one which represents every program perfectly:

Posted Image

Once we separate process from data, things start to make more sense.

Object-oriented helps us write good code because it encourages us to encapsulate complexity (i.e. state). But it forces us to do so in a certain way. Why not instead encapsulate like this, if it makes sense within our problem space?

Posted Image

In summary, design data structures to match your specific problem. Don't shoehorn a single concept into a bunch of separate, encapsulated objects.

Next, write functions that leave the smallest possible footprint on that data. If possible, write pure stateless functions.
That's the trick.

If this struck a chord with you, it's because I stole most of it. Get it straight from the source:Thanks for reading!

Mirrored on my blog

The Poor Man's Postmortem - Lemma

The big secret of our industry is, we don't actually enjoy making games. We slave away in obscurity for years in anticipation of one glorious day.

Not release day, no. The day we can finally write a postmortem full of pretentious anecdotes, bad jokes, and unsolicited advice.

Well I just finished a game, and doggone it, I am going to exercise my inalienable rights as a developer.

The formatting on this article came out a little wonky. Read here for a better formatted version.

Things to do when making a game

Ancient gamedev postmortem traditions mandate that this section be titled "what went right". Unfortunately, the game was so shockingly good and so many things went right that a full overview would stretch on endlessly.

Instead you'll have to settle for this. These are some things I did that I recommend you do as well.

Come up with a good concept

I didn't do this one, actually. The original concept was a cartoony third-person game called Parkour Ninja. It changed every other week or so for the remainder of development. For a while the player had a pistol:
Posted Image
And for a while, you could rip voxels apart and re-attach them:
Posted ImagePosted Image

Almost everything got cut.

The final concept is not particularly unique. First-person parkour with a female protagonist has definitely been done before, and every third game on Steam uses voxels.

It worked in the end though, by combining familiar elements in a unique way, and by throwing in a weird, trippy, puzzle-y aesthetic. Every new idea steals from existing ideas.

In 2012 I released a short, strange, ugly, buggy alpha demo which nevertheless communicated the core ideas of parkour and mysterious voxels. Incredibly, Rock Paper Shotgun covered it. I doubt I would have stuck with the project without that affirmation.

Make it through Greenlight

I honestly have no idea how to replicate this feat. I don't know how it happened. Fortunately, Greenlight is much less daunting today than it was in April 2014.

I did almost nothing to promote the Greenlight page. I ran the campaign in tandem with a Kickstarter (more on that later), but almost all traffic came from Steam itself. Here's the embarassing trailer I used for both Greenlight and Kickstarter:

The game was greenlit in 16 days as part of a bundle of 75 other games, even though it hadn't reached the top 100 yet.

Posted Image

As you'll see later, if you want to make a living developing PC games, you have to get through Greenlight.

Iterate the controls

My character controller article goes into much more detail on this, but basically, take however much time you plan to spend on the controls, and then double it.

Use MIDI knobs to control game variables. Explore the game space. Use offline processing to bullet-proof your character controller.

This point applies to all games, even those that don't have a character. The player's action and the game's reaction are arguably the most important aspects of a game, because they are unique to the medium.

Design your graphics carefully

Lemma has been pretty ugly for most of its life. I got lucky with a few textures in the 2012 alpha, particularly the stone texture which features heavily in the final game. But mostly I just slapped textures on haphazardly. Here are some perfectly good textures applied in the worst possible way:

Posted Image

At the time, I knew something was wrong with this scene, but I couldn't put my finger on it. Allow me now, with the benefit of hindsight, to put my finger all over it.
  • Nowhere to climb. This is a claustrophobic indoor scene set inside some sort of derelict vessel. It belongs in Bioshock, not a parkour game.
  • Too busy. The textures are incredibly loud and detailed while the voxels are huge, flat, and boring.
  • No composition. Nothing draws my attention or invites me to explore. The shapes are all uninspired boxes.
  • Abysmal lighting and colors. If I recall correctly, I randomly placed the red point light on the left on a whim.
Compare to this shot from the final game:
Posted Image
Still some rough edges, but not overly painful.

Here's what I learned to get from point A to point B:
  • If you're like me, make up for your lacking art skills with code. God rays, SSAO, and particle effects worked wonders for me. And turn on mip-mapping for gosh sake.
  • Form and composition are more important than detailed textures. You can make a beautiful scene with just a few carefully placed shapes.
  • If you're trying to convey a massive sense of scale, your forms should have interesting features at every scale. A single, giant, featureless cube won't inspire awe. Neither will a giant cube with a detail pattern.
  • Colors and lighting make or break scenes. I probably spent as much time picking (and re-picking) colors as I did building voxels.
Support all the things

At the very least, add proper gamepad support. For me, Oculus Rift support was a huge selling point and a ton of fun for YouTubers.

Things like sparse options menus, missing gamepad support, and shoddy VR implementations enrage gamers, especially PC gamers. There's a reason TotalBiscuit starts every video with a look at the options menu.

I threw in every option I could think of, and almost every option requested by players. Y axis inversion, gamma, FOV, gamepad bindings, a framerate limiter, you name it.

Get involved with the community

I almost lost it In January 2014. Shut in my apartment for days on end, stuck in a difficult rut in production, I was going insane.

Thankfully, Columbus has a budding game development scene. I rented a desk from a local gaming incubator. The mere act of driving to work and existing around other humans got me through the winter. As an added bonus, I gained a ton of playtesters!

Posted Image

Every month I attend a local game development meetup. As a solo developer, it's the only time I get to talk openly about the topic that consumes 90% of my life. Seeing the same people every month and catching up on their progress is incredibly rewarding.
For the rest of the month, there's Twitter!

Do your own marketing

If you're like me, your marketing budget is $0.

On launch day, I spammed announcements to all of Lemma's accumulated fan base via Twitter, Facebook, IndieDB, GameJolt, Kickstarter, Steam Greenlight, YouTube, and an email list.

I spent several days collecting contact info by hand for various press and YouTubers. Whenever possible, I automated the process with Python and Javascript. Some resources I used:I pulled everything into a Google Docs spreadsheet and ran a mail merge on it two weeks before the launch. Somewhere around 400 Steam keys ended up being activated.

YouTube ended up bringing the most traffic. Over 400 videos have been uploaded to date, totalling over 5 million views, mostly thanks to three huge videos posted by jacksepticeye.

Ship your localization strings in plain text

Lemma uses Excel files for localization. I use a third-party library to read them, which makes the code pretty simple.
This ended up being a great decision, because foreign players step up with their own volunteer translations. They can edit the Excel files in place and see the results immediately in-game.

Things to never do, ever

Run a Kickstarter

As Greenlight becomes easier to conquer, Kickstarter becomes exponentially more difficult. Backers have been burned too many times by now, and everyone sets their goals much lower than the amount they need to deliver on their promises.

I ran a failed Kickstarter for Lemma in March 2014. Originally, I planned to abandon the game if the Kickstarter failed. Then the Greenlight went through and I decided to cut back the budget, take some contract work, and do some budgeted art items (namely the character model) myself.

Running a Kickstarter takes too much time away from development. My advice is to find another way to fund your game if at all possible.

Write a pretentious story

If you're making Deus Ex, feel free to go wild with gritty lore and philosophical questions about trans-humanism. But if you're making Flappy Bird, you can get away with maybe a 10 second cut scene, tops. Know how much story your game can "afford".

The story of Lemma features quantum mechanics, the Philadelphia Experiment, life and death choices, infidelity, betrayal, and jealousy. All this crammed into 50 optionally collectible notes in a game about parkour.

The story tries to do too much. When all these conflicting ideas combine, they blur together into a jumbled mess that neutralizes the impact of each individual idea.

When it comes to story, do one thing, and do it well.

Suddenly switch from linear to non-linear design half-way through
I planned this from the beginning, actually. The first half of the game is linear so I can introduce mechanics one at a time. The player knows everything by the second half, so the game opens up into a non-linear cornucopia of levels that review the things you've learned so far.

This is a pretty good pattern as far as pacing, but the linear to non-linear transition confuses players. The whole first half teaches you that there's one way to "win", then suddenly, you're dropped out in the cold and left to your own devices with a completely incomprehensible world map.

Posted Image
Wait, what?

I did this because I thought, "this game is about exploration, it needs to be more non-linear". But all of the alpha releases were completely linear and not a single player complained about it. In fact, many of them commented that they enjoyed how each individual level could be cleared in many different ways.

The takeaway is, there are tons of ways to make your game feel more non-linear than it is, without building a confusing tangle of interconnected levels.

Design bad puzzles

My worst puzzles break the game rules. If you have to write a custom script that manually pokes the game state when the player solves the puzzle, stop and re-think your life decisions.

I'm always worried that my puzzles are too easy and that players will breeze through them too quickly, but in reality it doesn't take much to slow players down.

Often, the simple act of exploring a 3D space is enough of a puzzle. Games like this are a continuous conversation between level designer and player. It's enough of a challenge for the player to parse what the level designer is saying.

Throw in unnecessary enemies

Enemies have been a part of Lemma since day one. I love watching players encounter them for the first time, because they're truly terrifying. That small taste of horror shakes things up and fits perfectly into the pacing.

But after the novelty wears off, enemies become annoying and redundant. There's no combat; your interaction with them is always the same: run away.

My goal was always to integrate enemies seamlessly with the environment. In parkour, the environment is already your biggest enemy and your most powerful tool, so it makes sense. Unfortunately, I only came up with one enemy that came anywhere close to achieving this goal: a sort of tower that detaches from the environment and falls on you.

In hindsight, I should have been more confident in the core gameplay and remove the enemies to focus on better level design.

Spend time on an unnecessary level editor

Some games benefit hugely from a level editor. Heck, Garry's Mod is a level editor. If you're making that type of game, more power to you.

If you're making a mostly linear, story-driven singleplayer game, a level editor doesn't make much sense. Everyone says "oh cool, there's a level editor", creates a few cubes, and then completely forgets about it.

Again: do one thing, and do it well.

Start a hobby project and transition it to professional

Hobby game development is like building a tower of bricks. You don't know how tall it's going to be, you just keep stacking bricks. Each brick represents something that happens to interest you at the time. Branching dialogue? Sure, stack it on there. A pistol? Why not. Pretentious story? Check-a-mundo.

Professional game development is like sculpting. You start with a certain amount of raw materials: time, money, motivation, player attention, etc. You plan out a rough idea of your sculpture, then you start chiselling. The size of the sculpture is irrelevant if you put your chisel in exactly the right spot.

The two paradigms are incompatible. If you're a hobbyist looking to make the switch, consider starting fresh with a new project. Results Steam
The second spike in these graphs is mostly due to jacksepticeye's Let's Play videos.


Posted Image
Sales: 3,171

Posted Image

Gross revenue before 30% cut: $43,554

Max simultaneous players: 63

Posted Image

Demo downloads: 10,126

Posted Image

Demo conversions: 277 (2.7% conversion rate)

Posted Image

Max simultaneous demo players: 53

Posted Image

Key activations: 483

Positive reviews: 77 (91%)

Negative reviews: 7

Refunds: 68

Posted Image


Posted Image

Demo downloads: 1,896

Sales: 46

Gross revenue before 5% cut: $701

Humble widget (direct website sales)

Posted Image

Sales: 37

Posted Image

Gross revenue before 5% cut: $557


Sales: 4

Gross revenue before 30% cut: $57


Demo downloads: 1,388
Lemma offers the option to anonymously upload analytics. 5,732 out of 13,410 demo downloaders (43%) actually opened the game and opted in to the analytics program.

A total of 7,310 pirated copies of the game have submitted analytics data to my server. Assuming 43% of pirates opt in to the analytics, I estimate about 17,000 people have pirated the game, for a piracy rate of 82%.

The worst part about piracy is that torrents cannot be updated, which means YouTube is full of footage of old, outdated builds.

Posted Image
  • Schedule: 3 years part-time, 1.5 years full-time
  • Core team members: 1
  • Contractors: 6
  • Budget: $30,000
  • Sunk opportunity cost: $80,000
  • Lines of code: 55,000
  • Audio assets: 200
  • Git revisions: 1,200
Lemma released May 12. The entire game engine is on GitHub. If you enjoyed this article, try these:Thanks for reading!

It's done!

It took nearly 5 years, $30,000, and most of my sanity, but the game is finally done! Check it out: lemmagame.com

Look for a more in-depth postmortem... soon. It's 6:30am and I haven't slept yet.


Lemma is finally coming to Steam on May 12. Check out the new trailer:

For the first time ever, I shelled out for Adobe Premiere rather than hacking something together in Movie Maker and OpenShot. I didn't use any of the fancy features, but it was worth it just to avoid dealing with crashes all the time. Although it still did crash (it's Adobe after all).

Ashton Morris did a number on the trailer audio. I had something entirely different all ready to go before he came in with something way better.

I've been collecting contact info for some time. I mostly just went through the following lists, pulling out info for people who might be interested in my game:Video Game Caster lists emails right there on the page. I pulled them out using a jQuery one-liner in the Chrome web console. I noticed a lot of them seemed to be inactive, so I used BeautifulSoup to query each YouTube page and determine when their last two videos were uploaded.

I spent today blasting out emails. I forked over $25 to increase the daily mail quota for a Google Sheets mail merge add-on, only to find out that a) the quota did not actually increase, and b) you can write your own mail merge in about 20 lines of Javascript. What a scam.

In Sheets, just hit Tools -> Script editor. Here's my script, I was blown away by this:
var EMAIL_SENT = "Yes";

function onOpen()
  var spreadsheet = SpreadsheetApp.getActive();
  var menuItems = [
    {name: 'Go', functionName: 'mail_merge'}
  spreadsheet.addMenu('Mail merge', menuItems);

function mail_merge()
  var draft = GmailApp.getDraftMessages()[0];
  var subject = draft.getSubject();
  var template_html = draft.getBody();
  var template = draft.getPlainBody();
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var column_map = {};
  for (var i = 0; i < data[0].length; i++)
    column_map[data[0][i]] = i;
  for (var i = 1; i < data.length; i++)
    var row = data[i];
    var email = row[column_map.Email];
    if (email)
      var sent = row[column_map.Sent];
      if (sent != EMAIL_SENT)
        var message = template;
        var message_html = template_html;
        for (var column in column_map)
          message = message.replace('{{' + column + '}}', row[column_map[column]]);
          message_html = message_html.replace('{{' + column + '}}', row[column_map[column]]);
        GmailApp.sendEmail(email, subject, message, { htmlBody: message_html });
        sheet.getRange(i + 1, column_map.Sent + 1).setValue(EMAIL_SENT);
        // Make sure the cell is updated right away in case the script is interrupted
It marks rows that have been sent to by putting "Yes" in the "Sent" column. It pulls the email address from the "Email" column. Those are the only two hard-coded values. I wrote the email template as a draft in Gmail. The outgoing messages all appear in your Gmail "sent" folder.

The only downside is that Gmail also has a 100 email per day quota. In my scramble to get these emails out, I had to pay for and set up Google Apps for Work on my domain to increase the quota to 1500 per day. Turns out, the quota still didn't increase, but between my personal Gmail and the Google Apps account, I was able to hit most of my contact list. I'll get the rest tomorrow.

I had no idea PR would be so much fun! It's just like programming!

Just kidding, it's still horrible.

Thanks for reading!

The Poor Man's Character Controller

Let's say that, like so many of us, you want to make a surreal voxel-based first-person parkour game. You're trying to figure out a production schedule. What will take the longest? Graphics? Sound? Level design? I bet it will be the character controller. And I bet it will take 4½ years. Why?
  • In running/jumping games, player movement is paramount. It takes forever to nail the right feeling.
  • Each game is a unique snowflake. You will not find an article explaining how to design the controls for your specific game. You're flying blind.
That said, each game offers a few transferrable bits of wisdom. Here's my story.

Make a character

You're a programmer, but one time you were able to suppress the gag reflex while using GIMP, so you're pretty much an artist too. You can draw a player character.
Posted Image
That's certainly... a drawing. So the player is an anthropomorphized cylinder? Well, we've seen worse.
If this character has any flaw, it's that he's too exciting and interesting. Can you make him a little more boring and generic? What if you use MakeHuman? It literally generates human characters from a template.
Posted Image

Much better. But there's just one problem: this is a first-person game, so when players look down, they can see their own nose:
Posted Image

Also, the "pectoral musculature" slider is a tad high, and players are getting confused about their gender.
You end up switching to a female character. Because why not?
Now for the nose problem. You can't remove the entire head, because a headless shadow might be somewhat disconcerting. What if you just remove the face?

Posted Image


(Eventually you revamp the model, hire an animator, and use separate models, one sans head, for the first-person view and shadow renderer. But none of that is entertaining.)

Make it move

You're using a great physics engine (seriously, it's quite good) that comes with a simple character controller. It looks like this:
Posted Image
The character is a cylinder floating above the ground, supported by a single raycast. This way, the cylinder can clear a small obstacle, and once the raycast hits it, the whole apparatus jumps on top.
Since the game world is made of voxels, you quickly run into this problem:

Posted Image

Tons of players get stuck this way in your first alpha release. Rather than spend time on an elegant solution, you brute-force it:
Posted Image

Despite this, people still get stuck. You resort to a collision handler that actually pushes the character away from anything that could cause problems. You also interpolate the vertical position to smooth out the camera when traversing uneven voxels:
Posted Image

Make it unrealistic

In an attempt to model reality accurately, the game has no air control at this point. When you originally made this decision, you somehow forgot that the game is about an imaginary cube world.
Thankfully, after listening to player feedback, you have a change of heart. In the real world, traceurs have many control dimensions (namely, their muscles) that enable precise jumps. Video games have exactly one button. Air control is only fair.

Make it fun

Since parkour is about momentum, you want the character to take several seconds to reach max speed. Which is fine, except that low acceleration makes small adjustments difficult. The first step takes forever, and the character feels like a semi truck.
Your solution uses different accelerations depending on the current speed. The final speed curve looks like this:
Posted Image

This solves half the problem, but players can still use the mouse to quickly whip the camera around 90+ degrees, which resets their speed back to zero.
You experiment with a few hacks, but eventually settle on a solution using the dot product. It's basically a measure of the angle between two vectors multiplied by their magnitude. (Here's a quick interactive demo.)
You use a dot product to find out how much side-to-side momentum the character has. If they're facing perpendicular to the direction of their momentum, the dot product will be large. You use that to increase the acceleration. Long story short, turning no longer burns momentum.

Make it slippery

There are other ways to lose momentum, like running into a brick wall. You try to mitigate this with low friction physics materials, but angling yourself into a wall will always slow you down:

Posted Image

You are inspired by a blog post by Mike Bithell on this topic. You use three raycasts and some cross product magic to figure out a velocity that will slide along the wall.

Posted Image

Later on, you discover another annoyance. Your wonderful voxel engine sometimes helpfully constructs voxels like this:
Posted Image

There's a seam between the two adjacent blocks due to floating point error. When the character moves flush with the wall and tries to jump upward, it hits the seam and immediately stops.
The solution is brain-dead simple: change the cylinder to a capsule. Yes, it really does take you 4 years to figure this out.

Make it forgiving

At first, players just don't understand the movement mechanics. They think they can't get from point A to point B, until you tap them on the shoulder and explain they have to do XYZ. You suspect this is because your tutorial is actually a placebo at this point.
Eventually, the tutorial gets pretty good. Everyone understands the movement capabilities, and they can figure out which moves to use. But now they have a new problem: they fail in the twitchy execution and timing details of their plans.
The worst culprit is a single infamous jump in the tutorial. It tries to teach players how to grab ledges because it's too long to cross with a normal jump.
Posted Image

Players fail two or three times before you tell them to "button-mash", which helps them nail the timing through sheer brute-force. Interestingly, as soon as they make this one jump, they have no trouble completing future jumps without button-mashing. For a while, you arrogantly conclude that people are just stupid.
Part of the problem is still the tutorial: you ask players to make a leap of faith and perform a move they've never seen before. They have no idea what the character will do or how long it will take. So you add another, earlier tutorial that lets players try out the ledge grab in a safe space.
But the frustration of perfect timing remains. The solution is two-fold:
  • Let players jump for a split second after they walk off an edge.
  • Let them hold buttons instead of tapping at the right moment.

Posted Image

To the surprise of no one but you, this makes the game a lot less frustrating and a lot more fun.

Make it look good

Over the course of development, you stumble on a few animation tricks. With enough nifty procedural animation, maybe people won't notice your shoddy weight painting and texture work!
  • Attach the camera position to the character's head bone, but use a separate root bone to control camera rotation. This eliminates weird rotations when blending between animations.
  • Speaking of which, use a quadratic curve to blend between animations rather than straight linear.
  • Also, don't use linear matrix interpolation. Instead use quaternion interpolation.
  • Remember the dot product from earlier, for calculating side-to-side momentum? Use that to make the character and camera lean when turning at speed.
  • Run the character bone transforms through filters for nice effects like tilting the character's head when looking up and down.
  • Plant the character's feet and play a little foot-shuffling animation when turning in place.
(For a much more eloquent and in-depth look at procedural animation, check out David Rosen's GDC talk.)


Posted Image

Budget an extraordinary amount of time for your character controller. Make it special and unique. And if you're me, prepare to be wrong most of the time.
Lemma is set to release in May. The entire game engine is on GitHub. If you enjoyed this article, try these:Thanks for reading!

Mirrored on my blog

Screenshot Saturday 213

It's the end of February and this game is supposed to be content-complete. In a sense, it actually is. All the levels are done. Twenty in all. I thought this month would never end!

Posted Image

Just so you know, there are sixty of those lights and I had to hook up each one individually. It fell just barely beneath the "worth it to automate" threshold.

Don't look too closely at this next one, it's a bit spoilery.

Posted Image

All that remains is to fill out a few story elements and wrap this thing up with some semblance of a satisfying ending.
I actually made some very interesting improvements to the parkour code in the past few weeks, but I'll save it for the upcoming character controller article.

For now, that's all I got! Thanks for reading.

Mirrored on my blog

The Poor Man's Voxel Engine

This is not a tutorial. It's a story. A Voxel Odyssey.
The story starts with 19 year old me in a dorm room next to the Ohio State stadium. I don't have the repo from this stage of development (SVN at the time), but I remember the process clearly.

Posted Image

Photo by Kristen Sutton
XNA 4 comes out in September 2010. I immediately dive in. This turns out to be a poor life decision.
For some reason, one of the very first things I implement is motion blur. I think this is Lemma's first screenshot, although at this point, it's a cartoony third-person game called "Parkour Ninja":

Posted Image

Such motion blur
I skip past the initial naive implementation of spawning a cube for each voxel cell. My first move is to iterate over these individual cells and combine them into larger boxes using run-length encoding.
Posted ImagePosted ImagePosted Image
Performance is already a problem even at small scales. I'm re-optimizing the entire scene every time I make an edit. Obviously, my next move is to only optimize the parts I'm editing.
This turns out to be difficult. Take this example:
Posted Image
I add a cube at the top of this stack. To optimize this into a single box, I have to search all the way to the bottom of the stack to find the beginning of the large box, then add 1 to its height and delete my little cube addition.
To speed this process up, I allocate a 3D array representing the entire scene. Each cell stores a pointer to the box it's a part of. Now I can query the cells immediately adjacent to my new addition and quickly get a pointer to the large box next to it.
Removals are the next challenge. My first idea is to split the box back into individual cells, then run the optimizer on them again.
Posted ImagePosted ImagePosted ImagePosted Image
This turns out to be horribly slow. I soon realize that rather than splitting the box into individual cells, I can skip a few steps and split it into "sub-boxes". I still run the optimization algorithm afterward, but I can make its life easier.
I am thrilled to get this demo running on my roommate's Xbox 360. It fails to impress the girls in the neighboring suite.

Goodbye Xbox
I quickly run into more issues. The CLR's floating point performance is absolutely abysmal on Xbox 360. The physics engine breaks down and cries after spawning 10 boxes or so. I decide to target PCs instead.


I render scenes by copying, stretching, and scaling a single cube model. Slapping a texture on this cube turns out to be a horrible idea that looks something like this:
Posted Image
To avoid texture stretchiness, I eventually write a shader to generate UVs based on the position and normal of each vertex. Here's the final version for reference:
float2x2 UVScaleRotation;
float2 UVOffset;
float2 CalculateUVs(float3 pos, float3 normal)
    float diff = length(pos * normal) * 2;
    float2 uv = float2(diff + pos.x + (pos.z * normal.x), diff - pos.y + (pos.z * normal.y));
    return mul(uv, UVScaleRotation) + UVOffset;
Next, another performance crisis. Somehow I realize that doing a whole draw call for each and every box in a scene is a Bad Idea™. So I take the obvious step and... use hardware instancing. Yes.
Incredibly, I'm actually proud of my work so far. Somewhere around this time I also discover my lifelong love of terrible music.

Local multiplayer? What is this game?

Improved level format
At this point, I'm saving and loading levels via .NET's XML serialization. Apparently XML is still a good idea in 2010. The voxel format is simply a 3D array represented as an XML string of ASCII 1s and 0s. Every time I load a level, I have to re-optimize the entire scene. I solve this by storing the boxes themselves in the level data as a base64 encoded int array. Much better.

Per-surface rendering
I start building larger levels and run into another graphics performance problem. The engine is simply pushing too many triangles. In a complex scene, a significant chunk of boxes are surrounded on all sides by other boxes, completely hidden. But I'm still rendering them.
I solve this problem by breaking each box into its individual faces. I actually iterate across the whole surface to determine what parts (if any) are visible. Shockingly, this turns out to be terrifically slow. I eventually mitigate the issue by caching surface data in the level file.
I render all these surfaces via... drum roll... instancing. Yes, really. I open Blender, create a 1x1 quad, export it, and instance the heck out of that thing. These are dark times. But I'm finally able to render some larger landscapes:

Posted Image

Time to see some cool physics. I now have two kinds of voxel entities: the static voxel, represented in the physics engine as a series of static boxes, and the dynamic voxel, represented as a single physics entity with a compound collision volume constructed of multiple boxes (I should plug the incredible BEPUPhysics for making this possible). It works surprisingly well:

Around this time I also switch to a deferred renderer, which is why I spawn an unreasonable number of lights at the end of that video.

Chopping down trees
Now it's time to take physics to the next level. My goal is simple: I want the player to be able to cut down a tree and actually see it fall over, unlike Minecraft.
This lofty dream is basically a graph problem, where each box is a node connected to its adjacent neighbors. When I empty out a voxel cell, I need a fast way to determine whether I just partitioned the graph or not.
So I add an adjacency list to the box class. Again, shockingly, calculating adjacency turns out to be a huge bottleneck. I later cache the adjacency data in the level file, which eventually balloons to several megabytes.
Now every time the player empties out a voxel cell, I do a breadth-first search through the entire scene, marking boxes as I go. This allows me to identify "islands" created by the removal of the voxel cell. I can then spawn a new physics entity for that island and break it off from the main scene.
I eventually come up with the idea of "permanent" boxes, which allows me to stop the search whenever I encounter a box that cannot be deleted.
I design a new enemy to showcase the new physics capabilities. I also test the limits of awkwardness and social norms by narrating gameplay footage in a dorm room full of people studying.

Around this time I learn about broadphase collision detection. My engine is scattering thousands of static boxes around the world, which makes it difficult for BEPUPhysics' broadphase to eliminate collision candidates. At the same time, it's becoming obvious that rendering the entire world in a single draw call is a bad idea.
I fix both of these issues by splitting the world into chunks. Each chunk has a static triangle mesh for physics, and a vertex buffer with basically the same data for rendering. Since I have to regenerate both of these meshes every time a chunk is modified, the chunk size can't be too large. Also, smaller chunks allow for more accurate frustum culling.
At the same time, the chunks can't be too small or the draw call count will explode. After some careful tuning I eventually settle on 80x80x80 chunks.
Posted Image

Partial vertex buffer updates
This is probably the low point.
By now, I am incredibly proud of my "loosely coupled" architecture. I have a Voxel class and a DynamicModel class which know nothing about each other, and a ListBinding between the two which magically transforms a list of Boxes into a vertex buffer.
Somehow, probably through questionable use of the .NET Timer class, I locate a bottleneck: re-sending an entire vertex buffer to the GPU for every voxel mutation is a bad idea. Fortunately, XNA lets me update parts of the vertex buffer individually.
Unfortunately, with all the surface culling I do, I can't tell where to write in the vertex buffer when updating a random box. Also, how to shoe-horn this solution into my gorgeous cathedral architecture.
This conundrum occurs during the "dictionary-happy" phase of my career. Yes. The ListBinding now maintains a mapping that indicates the vertex indices allocated for each box. Now I can reach into the vertex buffer and change things without re-sending the whole buffer! And the voxel engine proper still knows nothing about it.
This turned out to never really work reliably.
I lied earlier, this is probably the low point.
Voxel mutations cause noticeable stutters by now. With no performance data to speak of, I decide that multithreading is the answer. Specifically, the worst kind of multithreading.
I spawn a worker thread, sprinkle some locks all over the place, et voilà! It's multithreaded. It gains perhaps a few milliseconds before the main thread hits an unforeseen mystical code path and the menu somehow manages to acquire a lock on the physics data.
I am ashamed to admit that I never got around to correcting this colossal architectural faux pas.
Posted Image

Large Object Heap
I'm now building large enough levels to run into memory issues. Turns out, the .NET runtime allocates monstrous 80x80x80 arrays differently than your average object. I write more about this here.
Long story short, the garbage collector doesn't like to clean up large objects. I end up writing a custom "allocator" that hands out 3D arrays from a common pool. Later, I realize most of the arrays are 90% empty, so I break each chunk into 10x10x10 "sub-chunks" to further reduce memory pressure.
This episode is one of many which explain my present-day distaste for memory-managed languages in game development.

I graduate and work at a mobile game studio for the next year. The engine doesn't improve much during this time, but I start to learn that almost everything I know about programming is wrong and incomplete.

Posted Image
One of the many "offices" I've worked in over the years
I leave my job in February 2014 and continue hacking the engine. By now it's over 30k LOC and I am morally and spiritually unable to start over on it.

Goodbye allocations
With my newfound awareness of the .NET heap, I realize that my vertex arrays for physics and rendering are also probably landing in the Large Object Heap. Worse, I am reallocating arrays every time they change size, even if only to add a single vertex.
I genericize my Large Object Heap allocator and shove the vertex data in there. Then, rather than allocating arrays at exactly the size I need, I round up to the next power of 2. This cuts the number of allocations and makes it possible for my allocator to reuse arrays more often.

Goodbye cathedral
I finally throw out the "loosely coupled" ListBinding system and pull the vertex generation code into the voxel engine itself. The resulting speed boost is enough for me to go back to re-sending entire vertex buffers rather than faffing about with partial updates.

Goodbye index buffer
Up to this point, I've been maintaining an index buffer alongside the vertex buffer. In a much overdue stroke of "genius", I realize that since none of the vertices are welded, the index buffer is just a constantly repeating pattern, which is in fact the same for every voxel.
I replace the individual index buffers with a single, global buffer which gets allocated to the nearest power of 2 whenever more indices are needed.

Bit packing and compression
Many numbers in the level data format are guaranteed to fall in the 0-255 range. My friend decides to pack these numbers more efficiently into the integer array. He writes about it here.
I also pull in third party library #27 (SharpZipLib) and start zipping the level files. These changes cut the file size to under 30% of the original.

Goodbye UV optimization
I've been storing a huge amount of surface data like this:
class Box
    public struct Surface
        public int MinU, MaxU;
        public int MinV, MaxV;
    public Surface[] Surfaces = new Surface[]
        new Surface(), // PositiveX
        new Surface(), // NegativeX
        new Surface(), // PositiveY
        new Surface(), // NegativeY
        new Surface(), // PositiveZ
        new Surface(), // NegativeZ
I do this so that I can resize surfaces that are partially hidden, like this:
Posted ImagePosted Image
At some point in the vertex buffer overhaul, I realize that performance-wise, the physics engine doesn't care what size the surface is.
I use this fact to speed up mesh generation. I generate 8 vertices for the corners of each box, then copy them where they need to go in the vertex buffers.
Really, the graphics engine doesn't care much about the size of the surface either, aside from fill rate. What matters is whether the surface is there or not.
With this in mind, I kill the UV optimization code and store the surfaces in memory and in the level file like this:
class Box
    public int Surfaces;
The bits of the int are boolean flags for each surface. Yes, I could do it in a byte. Actually, maybe I should do that. Anyway, this simplifies my level loading and saving code, cuts my file sizes down to about 512kb on average, and drastically reduces memory usage. Axing the UV optimization routine also speeds up mutations.

Posted Image

Clearly, this article is mostly useless if you're interested in writing your own voxel engine. The final result is far from perfect. I just want to share the petty drama of my past four and a half years. I for one thoroughly enjoy reading about other people's struggles. Maybe that's weird.
Lemma is set to release May 2015. The entire game engine is on GitHub. If you enjoyed this article, try these:Thanks for reading!

Mirrored on my blog