• entries
65
133
• views
77998

Chairs, spaceships, and mostly-coherent technical rambling. What could possibly go wrong?

## Level 2 Boss-In-Progress

As usual, mirrored from the original at drilian.com, which has nice embedded videos instead of boring links!

Just another quick update to show off a pair of videos of the in-progress level 2 boss.

First, note that the textures that I have on there currently are horrible, and I know this. It's okay.

The first video was simply to test the independent motion of the various moving parts of the boss:

Procyon: Level 2 Boss Motion Test at Vimeo

aaaaand the second shows off the missile-launching capabilities of the rear hatches (complete with brand-new missile effects and embarassing flights into the direct path of the missiles!):

Procyon: Boss2 Missile Launcher Test at Vimeo

That's all for now!

## Update For The Past N Months

(As usual, this is mirrored from the original at drilian.com)

Yeesh.
It's been a while.
What have I been up to since early June? Quite a bit. On the non-game-development side of things: work's been rather busy (still). Also, I now own (and have made numerous improvements on) a house! So that's been eating up time. But, that's not (really) what this site is about.

#### Invisible Changes

Much of my time working on Procyon recently has been spent doing changes deep in the codebase: things that, unfortunately, have absolutely no reflection in the user's view of the product, but that make the code easier to work with or, more importantly, more capable of handling new things. For instance, enemies are now based on components as per an article on Cowboy Programming, making it super easy to create new enemy behaviors (and combinations of existing behaviors). It took quite a few days of work to do this, and when I was finished, the entire game looked and played exactly like it had before I started. However, the upside is that the average enemy is easier to create (or modify).

#### Graphical Enhancements

I've also made a few enhancements to the graphics. The big one is that I have a new particle system for fire-and-forget particles (i.e. particles that are not affected by game logic past their spawn time). It's allowed me to add some some nice new explosion and smoke effects (among other things):

Also, I used them to add some particles at the origins of enemies firing beams:

Additionally, particles now render solely to an off-screen, lower-resolution buffer, which has allowed the game to run (thus far) at true 1080p on the Xbox 360.
Also, I decided that the Level 1 background (in the first two images in this post) was hideously bland (if such a thing is possible), so I decided to redo it as flying over a red desert canyon (incidentally, the walls of the canyon are generated using the same basic divide-and-offset algorithm as my lightning bolt generator):

#### Texture Plus Generator Equals Texture Generator

The big thing I've done, though, was put together a tool that will generate the HLSL required for my GPU-generated textures, so that I didn't have to constantly tweak HLSL, rebuild my project, reload, etc. Now I can see them straight in an editor (though not, yet, on the meshes themselves - that is on my list of things to do still). It looks basically like this:

The basic idea is, I have a snippet of HLSL, something that looks roughly like:
Name: BrickFunc: BrickCategory: Basis FunctionsInput: Position, var="position"Output: Result, var="brickOut"Output: Brick ID, var="brickId"Property: Brick Size, var="brickSize", type="float2", description="The width and height of an individual brick", default="3,1"Property: Brick Percent, var="brickPct", type="float", description="The percentage of the range that is brick (vs. mortar)", default="0.9"Property: Brick Offset, var="brickOffset", type="float", description="The horizontal offset of the brick rows.", default="0.5"%% float2 id; brickOut = brick(position.xy, brickSize, brickPct, brickOffset, id); brickId = id.xyxy;

Above the "%%" is information on how it interacts with the editor, what the output names are (and which vars in the code they correspond to), and what the inputs and properties are.
Inputs are inputs on the actual graph, from previous snippets. Properties, by comparison, are what show up on the property grid. I simplified the inputs/outputs by making them always be float4s, which made the shaders really easy to generate.
Then, there's a template file that is filled in with the generated data. In this case, the template uses some structure I had in place for the hand-written ones. The input and output nodes in the graph are based on data from this template, as well. In my case, the inputs are position and texture coordinates, and the outputs are color and height (for normal mapping).

So a simple graph like this:

...would, once generated, be HLSL that looks like this:
#include "headers/baseshadersheader.fxh"void FuncConstantVector(float4 constant, out float4 result){ result = constant;}void FuncScale(float4 a, float factor, out float4 result){ result = a*factor;}void FuncBrick(float4 position, float2 brickSize, float brickPct, float brickOffset, out float4 brickOut, out float4 brickId){ float2 id; brickOut = brick(position.xy, brickSize, brickPct, brickOffset, id); brickId = id.xyxy;}void FuncLerp(float4 a, float4 b, float4 t, out float4 result){ result = lerp(a, b, t.x);}void ProceduralTexture(float3 positionIn, float2 texCoordIn, out float4 colorOut, out float heightOut){ float4 position = float4(positionIn, 1); float4 texCoord = float4(texCoordIn, 0, 0); float4 generated_result_0 = float4(0,0,0,0); FuncConstantVector(float4(1, 1, 1, 1), generated_result_0); float4 generated_result_1 = float4(0,0,0,0); FuncConstantVector(float4(0.5, 0.1, 0.15, 0), generated_result_1); float4 generated_result_2 = float4(0,0,0,0); FuncScale(position, float(5), generated_result_2); float4 generated_brickOut_0 = float4(0,0,0,0); float4 generated_brickId_0 = float4(0,0,0,0); FuncBrick(generated_result_2, float2(3, 1), float(0.9), float(0.5), generated_brickOut_0, generated_brickId_0); float4 generated_result_3 = float4(0,0,0,0); FuncLerp(generated_result_0, generated_result_1, generated_brickOut_0, generated_result_3); float4 color = generated_result_3; float4 height = float4(0,0,0,0); colorOut = color.xyzw; heightOut = height.x;} #include "headers/baseshaders.fxh"

Essentially, each snippet becomes a function (and again, if you look at FuncBrick compared to the brick snippet from earlier, you can see that the input comes first (and is a float4), the properties come next (With types based on the snippet's declaration), followed finally by the outputs. Once each function is in place, the shader itself (in this case, the actual shader is inside of headers/baseshaders.fxh included at the end, but that simply calls the ProceduralTexture function just before that) calls each function in the graph, storing the results in unique values, and passes those into the appropriate functions later down the line.

#### More Content

In addition to all of that, I have also completed the first draft of level 2's enemy layout (including the level's new enemy type), and am working on the boss of the level:

Once I finish the boss of level 2, I'll start on level 3's gameplay (and boss). Once those are done and playable, I'll start adding the actual backgrounds into them (instead of them being a simple, static starfield).

#### Bug Tracking

Finally, I've started an actual bug tracking setup, based on the easy-to-use Flyspray, which I highly recommend for a quick, easy-to-setup web-based bug/feature tracking thing.
In the case of the database I have, I've linked to the roadmap, which is the list of the things that have to be done at certain stages. I am going to try, this year, to submit my game for the PAX 10 this year, so I have my list of what must be done to have a demo ready (the "PAX Demo" version), and the things that, additionally, I'd really like to have ready (the "PAX Demo Plus" version). Then, of course, there's "Feature Complete", which is currently mostly full of high-level work items (like "Finish all remaining levels" which is actually quite a huge thing) that need to be done before the game is in a fully-playable beta stage.

In short: I'm still cranking away at my game, just more slowly than I'd like.

## Networking Is Hard (Part 3)

As usual, mirrored from drilian.com

In my previous two posts, I started looking into what it would take to code the networking for my game, and came up with a first draft, before realizing that floating-point discrepancies between systems totally threw my lockstepping idea for a loop.

#### Lockstepping With Collisions

In order to solve the issue with different systems having different floating-point calculation results, I decided to somewhat revamp the network design, and really leverage the fact that I don't care so much about cheating - you could never get away with a networking scheme like this in a competetive game.
• First, the timing of the scrolling, player bullet fire rates, enemy fire rates, etc were all modified to be integer-based instead of float-based. There are no discrepencies in the way that integer calculations happen from machine to machine, so the timings of things like enemy spawning, level scrolling, etc. are now all perfectly in sync, frame by frame.
• Next, when the client detects a collision between entities, it sends a message to the server (which, you may recall, is running on the same system as the client - each machine gets one) notifying it of a collision. These messages are also synced across the network.
• Thus, whenever an enemy does on any one client, it dies on both servers on the same tick (that is, the "first" client relative to the game clock to detect a collision determines when the collision took place). This means that there are no longer any timing differences between deaths on each server (and if a collision happens to be missed by one client, but hits on the other, it will eventually reach the other client).
• Because players were already sending "I died!" flags as part of their network packets, these were already always perfectly in sync, so no change was needed there.

As an added bonus, since all collision detections are now handled by the client (and communicated to the server), the server never has to do any collision detection calculations on its own, which eases up on the CPU load somewhat (previously, the client and server were both doing collision calculations). All the server has to do now is apply collisions reported by either the local client or the remote client.

So now, the actual mechanism by which the game keeps in sync from system to system is set, but how does it handle the three main enemies of the network programmer?

#### Network Gaming's Most Wanted #1 - Latency

"Lag" is one of the most dreaded words in the network gaming world. It's always going to be present - nothing can communicate across the internet faster than the speed of light (and, because of transmission over copper, it's really more like a sluggish 2/3rds the speed of light!). Routers and switches also add their own delays to the mix. According to statistics gathered by Bungie from Halo 3's gameplay, most gamers (roughly 90%)end up with a round-trip latency of less than (or equal to) 250ms. That is, it takes a quarter of a second for data to go from System A to System B and back to A. That's a long time for a fast-action networked game! Thankfully, because messages sent from system to system in this game's network design are never dependent on messages from the other, nothing has to round trip, so the latencies can effectively be halved, making the system much better at handling lag (because, quite frankly, there's just less of it!)

As discussed previously, because the client can run ahead of the server and, thus, process local player input immediately, there's no latency in what the player presses and what actually happens on-screen. But what about how the remote player's actions look? With a ping under 100ms, there are next to zero visible discrepancies on the system. That is, low-ping games are virtually indistinguishable from locally-played games.

At around the 400ms ping mark, it does start to become obvious that things aren't quite right - due to the interpolation of remotely-shot bullets, they accelerate up to a certain part of the screen until they reach their known location then slow down to normal speed, which is fairly noticeable (I'm still trying to smooth this out a touch). When enemies get too close to the remote player as it fires, due to the delay, the bullets will collide with the enemy, but the enemy will live longer than it appears it should (because the local client does not reliably know that the remote bullet is actually still alive, it can't deal actual damage to the enemy, it has to wait for the server to confirm).

Above 1-2 seconds of latency, all bets are off - the local player will find the game still perfectly playable, but the movements of the remote player will be completely erratic, and remotely fired bullets won't act at all like they should. But, since 90% of gamers have much lower latencies, this is not really an issue. For the majority of gamers, the game will look and play pretty close to how it would if both players were in the same room.

#### Network Gaming's Most Wanted #2 - Packet Loss

Latency's lesser-known brother is packet loss, which is where data sent from one machine to another never makes it (due to routing hardware failure, power outage, NSA interception, alien abduction, etc). On a standard internet connection, you can generally assume that about 10% of the packets that you send will get lost along the way. Also, just because you send Packet A before Packet B doesn't mean they'll arrive in the right order - a machine might get a packet sent later before one that was sent earlier.

With the XNA runtime, there are four different methods that you can send packets with (obviously you can mimic these with any networking setup, but I'm using XNA so it's my frame of reference here):
• Unreliable - the other system will get these in potentially any order, or it may not even get them at all. The name says it all - you can't rely on these packets. This is probably not the best option to use.
• In-Order - These packets are for data for which you really only need the most recent data; you only care about the most recent score, for instance - not what the score was in a previous packet. Thus, these packets contain extra version information so that the XNA runtime can ensure that packets that arrive out of order don't reach you. As soon as a new packet comes in, it becomes available to the game. If a packet that's older than the most recent one comes in, it's discarded. You immediately get new ones at the cost of never getting older ones. For many games, this is a perfect scenario.
• Reliable - These packets will always arrive. When the XNA runtime receives one of these, it sends an acknowledgement to the other system that it received it. If the system that sent it doesn't receive such an acknowledgement, it'll resend (and resend and resend and...) until it finally arrives at the destination. Packets sent reliably are not vulnerable to packet loss; if you send it, as long as the connection remains valid you know it will reach the destination eventually. However, these packets may not arrive in the proper order (you may receive Packet C before Packets A or B).
• Reliable, In-Order - On the surface, this sounds like the best choice! These packets always arrive in the right order, and they always arrive! That is, you will always get Packets A, B, C, and D, in that order. There's a hidden downside, though: If the game recieves Packet C, but has not yet received Packets A or B, it has to hold onto that packet until both A and B arrive, which, if they need to be resent, can really ratchet up the latency. Any one packet that needs to be resent will hold up the whole line until it arrives. Clearly, this type of packet should only be used when absolutely necessary - for normal gameplay, it's better to use In-Order or Reliable on their own.

Eventually, I decided to send packets in the Reliable way, but not In-Order. But, to minimize the amount that the game has to wait for resent packets to arrive, each packet contains eight frames worth of input/collision data. That way, as long as one out of every string of 8 packets arrives, the server will have all of the relevant information to sync up to that point. And if, for some reason, 8 packets in a row are all lost in transmission, they'll be resent and make it eventually anyhow.

To handle this, the game essentially has a list of frames that it's received data for (8 of which come in with each packet).
• For each frame that a packet contains, if the frame already been simulated by the server (a frame from the past), that frame is ignored.
• Similarly, if the frame is already in the list, it's ignored.
• If it's not a past frame and it isn't already in the list, add it to the list (in order - the list is sorted from earliest to latest).
• After this is done, if the next frame that the server needs to simulate is in the list, remove it from the list and go! Otherwise, wait until it is.

The game doesn't care which order the frames are received in - as long as it has the next one in the list, it'll be able to continue on. Because of the redundancy, it rarely has to wait on a resend due to packet loss. In fact, using XNA's built-in packet loss simulation (thank you, XNA team!), the packet loss has to be increased to over 90% before the latency of the simulation starts to increase (hypothetically, the magic number is above 87.5% packet loss - greater than 7/8 packets lost).

The disadvantage of this system is that it does add to the bandwidth use, as each packet now contains an average of eight times as much data as it would normally, which brings us to...

#### Network Gaming's Most Wanted #3 - Bandwidth

Ah, bandwidth. There's no point in having a low-latency connection between two systems if the game requires too much bandwidth for the connection to keep up. Because the Xbox Live bandwidth requirement is 8KB/s (that's kilobytes), that became my goal as well.

This is where I overengineered my system a bit. I was estimating, as a worst case, an average of 10 collisions per frame. With packet header overhead, voice headset data, and everything, with 8 frames worth of data in each packet, I expected to be just BARELY below the 8KB/s limit.

When I finally got the system up and running, it turned out the game was using less than 4KB/s. The average amount of collisions per frame is closer to TWO than it is to 10 (unfunny side note: I had the right number, but the wrong numerical base. My answer was perfect in binary), even with a lot of stuff going on (though an individual frame may have many more, there are usually large spaces between collisions as waves of bullets smack into enemies). The most I've been able to get it up to with this system is about 5KB/s, which means the game still has a delightful 3KB/s of breathing room. I think I'll keep it that way!

#### Final Remarks

Hopefully this has been an informative foray into the design of a network protocol for an arcade-style shoot-em-up game. I'm no network professional - in fact, this is the first network design I've ever done, so I'm sure people who do this stuff for a living are laughing at my pathetic framework. If anyone has any suggestions as to how I might improve my network model, I'm all ears - while it works pretty well, I'm always open to ideas!

## Networking Is Hard (Part 2)

As usual, mirrored from drilian.com

In my previous post, I weighed the advantages and disadvantages of my game vs. a standard FPS with regard to networking. After doing so, I came up with an initial network design.

The biggest issue, personally, was dealing with the causality of the network game. Each player gets information about the other with a delay, so neither player is ever seeing exactly the same set of circumstances (perfectionism and network lag don't mix very well). I think Shawn Hargreaves describes it best in one of his networking presentations: he says to treat each player's machine as a parallel world. They're not exact, but the idea is to try to make them look as close as possible. It doesn't matter if things don't happen exactly the same, but you don't ever want the following conversation to occur:

Player 1: "Wow, can you believe I killed that giant enemy at the last second? That was amazing!"

Player 2: "...What giant enemy?"

Procyon's Initial Design

The design I started with was a hybrid of both the peer-to-peer lockstep and client-server models.

Basically, within each system there is a client/server pair. In addition, each server runs in lock-step with the other, so that they both always tick with the exact same player inputs for a given frame, keeping the two machines' servers perfectly in sync. Since the only data being sent across the network is player inputs, bandwidth use was ridiculously low. And since nothing ever has to round-trip from the client to the server and back, the system still gets the effectively-halved-ping that a pure lockstep setup gets. Also, because the client and the server are running on the same machine, communications between the two (mostly, the server notifying the client of events that happened based on the other player's input) becomes very trivial - the server can just call callbacks to the client, and no sort of RPC mechanism is necessary.

Ignoring the clients for a moment, the servers are a very traditional lockstep system. Each normal tick, the player input is polled and then sent across the network. Once the server receives the input from the remote system for the next frame, it ticks forward. As you can imagine, each machine's server is a bit lagged, because it has to wait for inputs across the network. That's where the client comes in.

Each system also runs a client, which runs ahead of the server. This client always ticks every (in Procyon's case) 60th of a second, processing the local player's inputs right away, and running prediction on the remote player based on the last set of inputs and the last known position that the server knows about. That way, the local player's ship always reacts right away to player inputs (on the client, which is represented on-screen).

Essentially:
1. The current local player's input is polled and sent to the server.
2. The input is also sent immediately across the network to the other machine.
3. The client ticks using this input - the position and actions of the remote player are predicted based on the last-known input and position from the server.
4. If the server has the remote input for the next frame that it has to do (which is always an eariler frame than the client), it also ticks (no prediction necessary on the server, as it has up-to-date information about both players for the given frame it's simulating). Again, there is a server on both machines, but they'll both end up with the exact same simulation.

#### (Mostly-) Deterministic Enemies

Here's where one of the advantages of the game comes into play. In my last post, I mentioned that enemies are deterministic in their movements. This is actually a key part of the networking. What it means is, there is no prediction required for simulating enemy behaviors ahead of the server. For instance, say that the client is currently 5 frames ahead of the server. While the client is ticking frame 450, the server is only on 445 (because it hasn't received network input for frame 446 from the other machine yet). Even though the server hasn't simulated enemy movements yet, their movements are very strictly defined, so the client doesn't have to guess - it knows exactly where an enemy is on frame 450. Consequently, the client running ahead of the server is not a problem with regards to enemy positions - when you see an enemy in a specific place on the screen, you know that's exactly where it will be on the server when the server finally simulates that same frame.

Unless the other player kills it first.

That's right - there is exactly one case in which an enemy's behavior is non-deterministic, and it's most-easily described as follows: your local client is currently simulating frame 450, the server has simulated frame 445. You see a large cannon ship start to charge up its beam weapon for a massive attack. However, the server gets remote player input from frame 446 (the next frame it has to simulate), and when simulating it, realizes that the other player dealt the killing shot to the ship. Suddenly, the client's view of that enemy is wrong - it should have died four frames prior.

In essence, the only way that a client's view of an enemy is ever wrong is if the other player has killed it in the past, and the server hasn't caught up. This is a very important property: the only time, ever, that an enemy is not where you see it as is when it's not anywhere at all (because it already died). However, this is where it starts to get tricky.

Most of the time, it's not a big issue. The client kills the enemy as soon as the server tells the client that the enemy should be dead, and it just dies a few frames too late. But when the enemy has fired bullets (or any other type of weapon), suddenly there's a problem. What has to happen in that case is that the client has to remove the bullets that shouldn't really have been spawned. With reasonable pings, the bullets will generally be close enough to the enemy's death explosion that they won't really be noticeable, they'll blink up in the middle of the explosion and disappear by the time it's done. With larger lag times, though, bullets for dying enemies might spontaneously disappear. Ah, lag. How networked games love you so.

The real problem, though, is when, using the frame numbers above, you crash your ship into an enemy on tick 450 that actually died on tick 446. What then? You crashed into something that shouldn't even have been there! After much internal debate, I decided there's a really simple way to arbitrate this: on your screen, you crashed, so you still get the consequences. Even though you hit something (either a should-have-been-dead enemy or a bullet spawned from such an enemy) that shouldn't have even been there, you still ran into something on your screen, and still pay the price. So, in addition to player inputs that get sent to the server (and across the network), the client also sends a flag that says "I died!"

As an added bonus, the server no longer needs to calculate collisions against either player - when a player dies, the client signals it (This plays into another advantage of my game - in a competetive multiplayer game, you would never, ever trust a client machine to tell you whether or not its player got hit by something).

#### Random Number Generation

One issue has to do with random number generation. Obviously, when connecting the two machines across the network, the machines have to agree on a random number seed so that random numbers generated on each are the same. In fact, not only do the machines on either side of the network have to have the same random number seeds, but the client and server also have to start with the same seeds, or they'll get out of sync (a machine getting out of sync with itself is always funny).

However, what happens when an enemy (Smallship 5) shoots a bullet in a random direction, and then dies "in the past" based on a server correction? On the server(s), this random number generation never happened. On the client, however, a bullet was shot, and a random number was created. With a single random number generator, that means all of the random numbers from that point on on the client are going to be incorrect. Thankfully, there's an easy way around that.

Give every enemy its own random number generator!

I found a very lightweight (and very high-quality) random number generator: Marsaglia's mutliply-with-carry generator (or, if you prefer, the infinitely more difficult-to-read Wikipedia version). This generator only requires two unsigned integers for each generator to do its magic, so there wasn't much overhead to hand one of these off to each enemy. So, the initial random number seed is decided upon before the level begins to load. As the level begins to load, and enemies are created, a new generator is created for each entity, using random seeds generated from the original generator. This way, each enemy gets its own generator, and if an enemy ever generates any extra random numbers before it dies, it doesn't affect any of the other enemies at all! Problem solved.

#### Remote Player Bullets

Again, pretend the client is ticking frame 450. When the server reaches 445, it notices that the remote player fired a bullet. So it notifies the client: "Hey, 5 frames ago, the other player fired a shot!."
• The client knows that it needs to spawn a new remote player bullet.
• It also knows exactly where that bullet will be at the current tick (since it was spawned at tick 445, but it's now tick 450, it can tick the bullet forward 5 frames).
• It does not, however, know that the bullet will actually survive until tick 450. It's possible that anywhere from tick 446 until 449, that it might hit something.

Even though it knows exactly where the bullet should be, it doesn't display it there. It actually starts the bullet in front of its current estimate of where the remote player is (the predicted position), and interpolates it into the correct spot. That way, remote player's bullets don't seem to appear way out in front of the remote player, they still start right where they seem like they should.

The client knows exactly where the bullet should be, but not that it's actually there (because it might have died on, say, tick 448). So while it does collision detection against enemies, if the remote bullet hits an enemy, it doesn't actually do any damage on the client - it just removes the bullet. If it turns out it dealt damage after all, the server will eventually send a correction to the client.

However, eventually, the server reaches frame 450 (when the client first learned about the bullet). If the bullet is still alive on the server, then we know that it never hit anything from frame 445 until then, so it was a live bullet when the client found out about it. Also, if it's still alive on the client (it didn't collide with anything while the server caught up to frame 450), that means the client knows that the bullet is still alive.

Now that it knows that it has a bullet that is guaranteed to still be alive, and is at the exact position that it's supposed to be, the bullet can flip from being treated as a remote bullet to being treated exactly like a locally-fired bullet. Basically, this bullet now will deal damage and act exactly like a bullet fired by this machine's own player! It's one less thing that will act different on the other machines, further strengthening the illusion that both players are seeing the same thing.

#### Floating Points Are Sharper Than Expected

However, there was one big issue with this entire design: floating-point errors. Due to minute differences in the way floating point numbers are calculated on different systems (due to different CPUs, code optimizations, quantum entanglement, etc), the collision code on each system acted slightly differently. Consequently, the two servers running in lockstep weren't actually in sync. This was causing all sorts of issues - slight timing differences on enemy deaths caused discrepencies in scores and energy amounts...and the real kicker is that it was possible that two objects would collide on one system and never collide at all on the other simulation (a grazing collision on one might not trigger as a collision on the other system). Added bonus when an enemy died on one server just after shooting a ton of bullets, but just before shooting them on the other server. Suddenly, one player's screen would be full of enemy bullets, and the other's would be clear as the summer sky.

This became a real problem, and it took a while to solve it (again, perfectionism and networking don't mix)...and next time, I'll present the solution.

## Networking Is Hard (Part 1)

As usual, this entry is mirrored from drilian.com.

I haven't had much development time in the last (nearly two) month(s), but I have had time to nearly finish up the networking code for the game, so I thought I would describe some of the work that went into the design of the system.

#### Resources

I tried to find some references on how most shmups (shoot-em-ups) such as my current game handle their networking, but I didn't really find anything. There are a lot of resources on how RTS games do it (such as this great article about Age of Empires' networking), and even more articles about how to write networking for an FPS game (the best of which, in my opinion, are the articles about the Source engine's networking and the Unreal engine's networking). There's also a great series of articles on the Gaffer on Games site. And no list of references would be complete without the Gamedev.net Multiplayer and Networking forum.

One of the most seemingly-relevant things to my game that I read was that, in current system emulators (for instance, SNES emulators) that have network play, they work in lock step with each other...that is, each system sends the other system(s) its inputs, and when all inputs from all systems have arrived, they can tick the simulation forward. I did an early experiment to see how well this would work with my game, and it turns out, not so well - the latency introduced with even a moderate ping (100ms) is rather prohibitive. The player would press up on the gamepad and have to wait for the ship to start moving...and it would only get worse with higher latencies! Obviously, this wouldn't work.

Since there were many articles about the FPS model of networking, I opted to use it as a starting point, as my game is also a fast-action game. First off, I decided to make a (partial) list of the advantages and disadvantages of my game type vs. the FPS model:

#### Advantages of Procyon vs. an FPS

• Only two players are supported, which greatly simplifies the bandwidth restrictions and the communication setup.
• There's no need to support joining/quitting in the middle of a game (again, simplifying game communications and state management)
• Since it's cooperative, I'm not really worried about players cheating, so I can allow clients to be authoritative about their actions moreso than the average FPS can.
• Players can't collide with each other, so there's no need to worry about such things (in fact, a player basically can't affect the other player at all - a player can only affect enemies)
• Enemies are deterministic in their movements - none of my enemy designs have movement variation based on the player's actions, so collision vs. enemies in a given frame is always accurate (unless the enemy was dead "in the past" due to the actions of the other player, which is discussed later).

• In an FPS, the view is more limited - it's rare that a player sees all of the action at once due to the limited field of view. However, in Procyon, everything that is happening in the game is completely visible on both screens, meaning the game generally has to be less lenient about discrepancies.
• Most FPS games are "instant hit." For instance, when you fire a pistol in a FPS game, there's no visible bullet that streaks across the screen, so it's easier to fudge the results on the server. In Procyon, however, all of the weapons are on-screen and visible, so fudging them in a minimally-noticeable way becomes very difficult.
• In many FPS games (though not all), there are WAY less entities active at a time than there are in a shoot-em-up.
• This falls into the realm of "not a lot of references for this type of game," but many of the "cheats" employed by FPS game programmers are really well-known, but very domain-specific; with a shmup, I'd have to make up my own network fakery to hide latency.

#### Network Strategies

There are two main network strategies used in games (at least, in those that I researched): lockstep and client-server.

Lockstep keeps all systems involved perfectly in sync, though at the cost of making it more susceptible to latency, because the game doesn't step until the other player's inputs have traversed the network. Generally, every so often (say, every 1/30th of a second), the current input is polled and then sent across the network. Then, once both/all inputs have reached the local machine, it ticks the game. This is the type of networking used by emulators (out of necessity - there's no reliable way to do any sort of prediction with an emulator anyway) and some RTS games (such as Age of Empires, linked above). RTS games can hide the latency a bit by causing all commands to execute with a bit of a delay. One interesting property of lockstep networking is that nothing ever has to round-trip across the network. That is, nothing that server B ever sends back relies on something that Server A sent previously. Because of this, the game's latency is effectively halved, as the meaningful measurement is one-way latency, not a full there-and-back ping. Lockstep games are frequently done peer-to-peer: each system sends its inputs to each other system...there's no central entity in charge of all communications.

Conversely, a client-server architecture (which is used by an overwhelming majority of FPS games) puts one system in charge and generally gets authority over the activities of all players. In this case, the client sends its information to the server (inputs, positions, etc), and the server sends back any corrections that need to be made (locations/motion of the other players, shots fired, things hit/killed, etc). The client can run ahead of the server based on a prediction model, where it gives its best guess as to where things are currently based on the last communication from the server. This makes game lag much less apparent (At least, in the player's own movements), as the player's inputs can be processed immediately on the local machine.

In the next installment, I'll talk through Procyon's initial network design, which is a bit of a hybrid between the client-server and lockstep architectures...stay tuned!

## Yet Another Super-Quick Update

(Mirrored from drilian.com)

Level 1 is effectively complete (excluding some global gameplay and balance issues.)

Here's a video:

Yay!

And some MP3s of the music from the first level:
To the Skies!
Boss Battle

## Another Procyon Update

Mirrored from drilian.com, as per usual.

This will probably be the norm for a while, unless I find myself with an excess of time some night; there'll probably just be some project updates for a few weeks.

Progress on finishing the first level is going good, all of the main enemies are implemented, the level 1 boss is designed (but not yet coded), and I'm on my way towards getting the level done by the end of the month.

Enemies

For the enemy designs, I looked to things like insects and aquatic creatures to get ideas for shapes and the like. Some screenshots!

(Click to enlarge)

And, a couple videos of the enemies in action:
">Enemy video 1
">Enemy video 2
Current Work
The next few items of work are:
• Finish the implementation of the boss of level 1
• Lay out the level
• Profit!

## Procyon Project Update

As usual, mirrored from drilian.com, complete with embedded youtube!

I haven't made any specific posts about my game in a while, so I thought I'd just make a quick status update.

#### Code Progress

Progress is coming along quite nicely. Most of the game systems are done (though I have some modifications to make to enable things like curved enemy beam weapons and side-by-side ship paths).
• Enemies have multiple ways of spawning, and can follow paths (or not).
• All four weapons are implemented
• The player can now die (and respawn). Lives are tracked (but the game currently doesn't game-over if the player dies, as that would be a pain for testing).
• The scoring system is in (with a first run at a score multiplier system).
• The on-screen display has been completely redesigned.
• Two players can play at once, which includes a special combination lightning attack that is even more devastating than the standard lightning attack.

Here is a video of a lot of those systems in action:

#### Art Work

For the last few evenings, I've been working on designs for the enemy ships (though I'll note that the background is still very much throwaway temp art). I plan to have about 5 or 6 types of ships in the first level of the game (not counting the boss) - and the goal is to have at least 2 new types in each successive level of the game.
Here are a few of the current designs:

(click to enlarge)

The goal is to make the ships look fairly organic - I'm completely avoiding hard edges (except for claw-like points), and trying to make them as smooth and rounded as possible. Thanks to Silo and its subdivision surfaces, this is actually pretty easy :) The gray/green/purple combination feels, to me, rather otherworldly, and it's less "standard" than the traditional black and red alien ship motif (i.e. Cylon ships). The color scheme will pose some interesting problems when it comes time to make levels inside of alien locations though - I'll need a way to make the alien ships stand out from the background, while simultaneously make the background feel like it belongs to the same creatures that are piloting those ships.

#### Goals

I'm aiming to have two full levels of the game done by the end of next month, which should give me adequate time to get everything that needs to be implemented for them implemented, and polish them to a nice shine. There's still quite a bit to do:

• As stated above, I want to allow curved beam weapons from the enemies (the player's curved beam weapon is currently a special case)
• Also stated above, I want to be able to have enemies spawn side-by-side along paths (so that I can easily make them fly in in formations instead of just one at a time)
• There's no sound/music code yet, though with XNA that's pretty straightforward.
• Controller rumble!
• I'm sure there's some code that I'll need to add to enable giant boss fights.
• Not to mention the additional art/sound/music work to make it all shine.

It's going to be an interesting month and a half. Off I go, I've got work to do!

## .NET Reflection and State Machines

Mirrored from drilian.com, as usual.

The .NET framework's reflection setup can be amazingly useful. At a basic level, it allows you to get just about any information you could want about any type, method, assembly, etc that you could want. In addition, you can programmatically access type constructors and methods, and invoke them directly, allowing you to do all sorts of neat stuff.

One useful application of this is in the creation of state machines. Imagine an entity in a game that flies around in a pattern for a bit, then stops to shoot some bullets, then returns to flying. Such an entity would have two states, "Flying" and "Shooting."

#### A First Attempt

You might decide to create a state machine that could handle this entity, but also be usable for all entities that need a state structure. What should a state machine like that need to be able to do?

One such set of possibilities is as follows:
• Be able to assign a state based on a string state name.
• Each state has three functions:
• Begin - This function is called whenever the state is transitioned into, allowing it to initialize anything that needs it.
• Tick - This is the meat of a state, it is called once per game tick while the state is active.
• End - This is called when the state is transitioned out of, allowing any clean up. Note that this is called before the next state's Begin function is called.
• However, a state shouldn't have to specify functions that it doesn't need. If a state has no need for the Begin function, it doesn't need to implement it.
• If the state changes during a Tick call, the new state will tick as soon as it happens. This allows many states to fall through in rapid succession. For instance, a player character in a 2D Mario-like puzzle might be in the middle of the "Jumping" state, while holding right on the controller. When the character hits the ground, it would transition into the "Standing" state, but from there, it would then transition into "Walking" because the player is holding right. That way, the character would land and immediately start walking, which is what one would expect when playing the game.

So what would a first draft version of such a machine need to look like? First, we'd need a place to store delegates to the three functions needed for a state:
public delegate void StateDelegate();class StateInfo{  public StateDelegate OnBegin { get; set; }  public StateDelegate OnTick { get; set; }  public StateDelegate OnEnd { get; set; }  public void Begin()  {    if (OnBegin != null)      OnBegin();  }  public void Tick()  {    if (OnTick != null)      OnTick();  }  public void End()  {    if (OnEnd != null)      OnEnd();  }  public StateInfo()  {    OnBegin = null;    OnTick = null;    OnEnd = null;  }}

Given that, we can now store them in a dictionary, indexed by by the state's name (a string). To initialize it, we would just call AddState with the state name and three delegates (which may be null, to specify that a given function is unnecessary), and it would add that state to the list. We also need to be able to Tick the state machine and change the state. The whole contraption would look something like the following:

public class StateMachine{  Dictionary states = new Dictionary();  bool ticking = false;  StateInfo currentStateInfo = null;  string currentStateName = null;  public void AddState(string stateName, StateDelegate begin, StateDelegate tick, StateDelegate end)  {    StateInfo info = new StateInfo();    StateInfo.OnBegin = begin;    StateInfo.OnTick = tick;    StateInfo.OnEnd = end;    states.Add(stateName, info);    // If no states have been set yet, this state becomes the initial state.    if(currentStateName == null)      State = stateName;  }  public string State  {    get { return currentStateName; }    set    {      // End the previous state.      if (currentStateName != null)        currentStateInfo.End();      // Set the new state      currentStateName = value;      currentStateInfo = states[currentStateName];      // Initialize it.      currentStateInfo.Begin();      // If we're in the middle of the Tick function, go ahead and tick the new      // state now, as well.      if (ticking)        currentStateInfo.Tick();    }  }  public void Tick()  {    // Set ticking so that we know that we're in the middle of ticking this machine.    // That way, if the state changes in the middle of this, we know to run the next    // state's Tick as well.    ticking = true;    currentStateInfo.Tick();    ticking = false;  }}

As you can see, it's fairly straightforward. An entity can call AddState (probably during its constructor) on the state machine to add a new state. It can assign a new state by using State = "newState", which will end the current state and begin the new state. Finally, the entity just needs to call Tick to run the current state.

In practice, this might look something like:
public class FlyShooter : EntityBase{  /* Entity vars Go here */  StateMachine machine = new StateMachine();  public FlyShooter()  {    /* Initialize stuff */    // Flying doesn't need an end    machine.AddState("Flying", FlyingBegin, FlyingTick, null);     // Shooting doesn't need Begin or End.    machine.AddState("Shooting", null, ShootingTick, null);  }  void FlyingBegin()  {    /* Begin flying state */  }  void FlyingTick()  {    if( IsTimeToShoot )    {      machine.State = "Shooting";      return;    }    /* Tick flying */  }  void ShootingTick()  {    if( DoneShooting )    {      machine.State = "Flying";      return;    }    /* Tick shooting */  }  public override void Tick()  {    machine.Tick();    /* Also tick anything that happens in all states */  }}

There's not much to it.

One thing to note is that the way that the state fall-through works (that is, changing state in the middle of another state's Tick function) will still cause the rest of the current function to execute (after it's already partially run another state), so it's best to either change states at the very end of the function or to return after setting the new state, as is done above).

Though this works pretty well as-is, say you want to add new states to an entity afterwards. What if the enemy now needs a special "Dying" state. Not only do you have to add new functions (DyingBegin, DyingTick, and DyingEnd), you also have to remember to call AddState at object startup. Wouldn't it be nice if you could just add the new functions and it would just work? As it turns out, using .NET's reflection functionality, you can do exactly that!

#### Mirror, Mirror On the Wall, What Are the States That I Can Call?

There are a number of ways to implement a state machine system using reflection. The one that was settled upon for Procyon was the design with the least redundancy (that is, "don't repeat yourself"). State names only appear once in a given entity's codebase (excepting when states are set), and it takes a small number of characters, right where the state code is defined, to mark a state. At the point that it's added to the code, it's already ready to use, there's no additional list of states to update.

Most reflection code works by looking for .NET Attributes. Attributes are class objects that can be added as metadata to a class, a method, a property, or any number of other pieces of your code. Also, you can create your own attributes - program-specific metadata that you can later find.

In the case of the state machine, a very simple attribute is needed.
using System.Reflection;[AttributeUsage(AttributeTargets.Method)]public class State : Attribute{}

It doesn't need any sort of custom data, so there's not much to it. The AttributeUsage at the top (which is, itself, an attribute) specifies that the "State" attribute being defined can only be applied to methods. Attempting to add it to anything else (say, a class) would generate a compiler error.

With this attribute, you'll be able to mark methods as states. In this case, each method will take a StateInfo as a parameter (meaning that the StateInfo class, which was described above, has to be a public class now). The method will then fill in the state information with delegates. The most elegant way to handle this is using anonymous delegates. Here's that enemy class again, using the new method:
public class FlyShooter : EntityBase{  /* Entity vars Go here */  StateMachine machine;  public FlyShooter()  {    /* Initialize stuff */    // Create the new state machine with "Flying" as the initial state.    machine = new StateMachine(this, "Flying");    // Note - no states are being explicitly initialized here.  }  [State]  void Flying(StateInfo info)  {    info.OnBegin = delegate()    {      /* Begin flying state */    };    info.OnTick = delegate()    {      if (IsTimeToShoot)      {        machine.State = "Shooting";        return;      }      /* Tick flying */    };    // Don't need to set info.End  }  [State]  void Shooting(StateInfo info)  {    info.OnTick = delegate()    {      if (DoneShooting)      {        machine.State = "Flying";        return;      }      /* Tick shooting */    };  }  public override void Tick()  {    machine.Tick();    /* Also tick anything that happens in all states */  }}

As you can see, the [State] attribute is there. Each of the (up to) three functions for a state are initialized inside of the state function, cleanly grouping the three functions together into one unit. The state function's name is the state's name, so it's only specified once. And, of course, there's no manual list manipulation; to add a new state, simply add the code for the new state and make sure that it's marked by a [State] attribute...the runtime will take care of the rest.

To get this whole thing to work, though, there'll obviously have to be some code to find methods using it. When creating your state machine, you'll now want to pass in a pointer to the object that the state machine belongs to. From that, it will be able to get the type of object for which you're creating the state machine. Automatically adding states that are marked with your state attribute goes something like this:
• Scan through each method in the type of the object you passed in (if you want to be able to grab private methods from parent classes of the current object's type, you'll need to scan through the classes in the heirarchy one by one)
• If it doesn't have the State attribute, continue on to the next method.
• If it does have the attribute, get the name of the method (this becomes the state's name).
• Once you have the name of the method, call the method and let it populate a StateInfo that you hand to it.
• Once that's done, add the method into the state dictionary, using the state method's name as the key (and thus, the state name).

The AddState function can go away - it's no longer necessary, as the state list is now update completely automatically. The StateMachine class' constructor needs to be updated a bit, to add all of this reflection magic into it. It will now look something like this:
/// /// Create a new state machine using the given object - it will automatically/// search for State attribute-marked methods and use those to generate its/// state table at creation time./// /// The object that the state machine belongs to/// The name of the initial state of the machine.public StateMachine(object param, string initialState){  Type paramType = param.GetType();  // Loop through parent types to make sure to pick up any private states.  for (Type currentType = paramType; currentType != typeof(object); currentType = currentType.BaseType)  {    // Now loop through each method in this type (and this type only, no parent methods).    foreach (var method in currentType.GetMethods(BindingFlags.Instance | BindingFlags.DeclaredOnly |                                                  BindingFlags.NonPublic | BindingFlags.Public))    {      // See if this method is a State.  If it's not, go on to the next method.      object[] attributes = method.GetCustomAttributes(typeof(State), true);      if (attributes.Length == 0)        continue;      // Get the method name, ensure it's not already in the list.      string methodName = method.Name;      if (states.ContainsKey(methodName))        continue;      // Call into this method to populate the state info...      StateInfo info = new StateInfo();      method.Invoke(param, new object[] { info });      // ...and add it into the state list.      states.Add(methodName, info);    }  }  if (initialState != null)    State = initialState;}

None of the rest of the state machine class' code needs to change; it all works exactly as it did before. The new constructor means that the list gets created at state machine creation time, but the rest of the internal data remains exactly the same.

#### Possible Improvements

It works pretty well as-is, but there are ways it could be improved. For instance:
• You could add a type of "Sleep" method to the state machine, to make it delay a certain number of ticks before continuing to tick the current state
• You could add a list of transitions to the StateInfo, so that all transitions to different states can be declared up-front, in an easy-to-read manner. That removes that extra state-setting clutter from your tick routine (and even means that a state may not need a Tick routine at all!)
• If you need multiple sets of states for objects (say the player needs a collection of states for movement and a separate collection of states for weapons fire), you could modify the State attribute to provide a collection name (like [State("Movement")]), and specify a state collection when creating your state machine.

This is a small list of possible changes. If you have other ideas, let me know!

## Lightning Bolts

As usual, mirrored from my homepage, drilian.com. It's cooler there, with embedded youtube videos and stuff!

You're flying your ship down a cavern, dodging and weaving through enemy fire. It's becoming rapidly apparent, however, that you're outmatched. So, desperate to survive, you flip The Switch. Yes, that switch. The one that you reserve for those...special occasions. Your ship charges up and releases bolt after deadly bolt of lightning into your opponents, devastating the entire enemy fleet.

At least, that's the plan.

But how do you, the game developer, RENDER such an effect?

Lightning Is Fractally Right

As it turns out, generating lightning between two endpoints can be a deceptively simple thing to generate. It can be generated as an L-System (with some randomization per generation). Some simple pseudo-code follows: (note that this code, and really everything in this article, is geared towards generating 2D bolts; in general, that's all you should need...in 3D, simply generate a bolt such that it's offset relative to the camera's view plane. Or you can do the offsets in the full three dimensions, it's your choice)
segmentList.Add(new Segment(startPoint, endPoint));offsetAmount = maximumOffset; // the maximum amount to offset a lightning vertex.for each generation (some number of generations)  for each segment that was in segmentList when this generation started    segmentList.Remove(segment); // This segment is no longer necessary.    midPoint = Average(startpoint, endPoint);    // Offset the midpoint by a random amount along the normal.    midPoint += Perpendicular(Normalize(endPoint-startPoint))*RandomFloat(-offsetAmount,offsetAmount);    // Create two new segments that span from the start point to the end point,    // but with the new (randomly-offset) midpoint.    segmentList.Add(new Segment(startPoint, midPoint));    segmentList.Add(new Segment(midPoint, endPoint));  end for  offsetAmount /= 2; // Each subsequent generation offsets at max half as much as the generation before.end for

Essentially, on each generation, subdivide each line segment into two, and offset the new point a little bit. Each generation has half of the offset that the previous had.

So, for 5 generations, you would get:

That's not bad. Already, it looks at least kinda like lightning. It has about the right shape. However, lightning frequently has branches: offshoots that go off in other directions.

To do this, occasionally when you split a bolt segment, instead of just adding two segments (one for each side of the split), you actually add three. The third segment just continues in roughly the first segment's direction (with some randomization thrown in)
direction = midPoint - startPoint;splitEnd = Rotate(direction, randomSmallAngle)*lengthScale + midPoint; // lengthScale is, for best results, < 1.  0.7 is a good value.segmentList.Add(new Segment(midPoint, splitEnd));

Then, in subsequent generations, this, too, will get divided. It's also a good idea to make these splits dimmer. Only the main lightning bolt should look fully-bright, as it's the only one that actually connects to the target.

Using the same divisions as above (and using every other division), it looks like this:

Now that looks a little more like lightning! Well..at least the shape of it. But what about the rest?

Initially, the system designed for Procyon used rounded beams. Each segment of the lightning bolt was rendered using three quads, each with a glow texture applied (to make it look like a rounded-off line). The rounded edges overlapped, creating joints. This looked pretty good:

..but as you can see, it tended to get quite bright. It only got brighter, too, as the bolt got smaller (and the overlaps got closer together). Trying to draw it dimmer presented additional problems: the overlaps became suddenly VERY noticeable, as little dots along the length of the bolt. Obviously, this just wouldn't do. If you have the luxury of rendering the lightning to an offscreen buffer, you can render the bolts using max blending (D3DBLENDOP_MAX) to the offscreen buffer, then just blend that onto the main scene to avoid this problem. If you don't have this luxury, you can create a vertex strip out of the lightning bolt by creating two vertices for each generated lighting point, and moving each of them along the 2D vertex normals (normals are perpendicular to the average of the directions two line segments that meet at the current vertex).

That is, you get something like this:

Animation

This is the fun part. How do you animate such a beast?

As with many things in computer graphics, it requires a lot of tweaking. What I found to be useful is as follows:

Each bolt is actually TWO bolts at a time. In this case every 1/3rd of a second, one of the bolts expires, but each bolt's cycle is 1/6th of a second off. That is, at 60 frames per second:
• Frame 0: Bolt1 generated at full brightness
• Frame 10: Bolt1 is now at half brightness, Bolt2 is generated at full brightness
• Frame 20: A new Bolt1 is generated at full, Bolt2 is now at half brightness
• Frame 30: A new Bolt2 is generated at full, Bolt1 is now at half brightness
• Frame 40: A new Bolt1 is generated at full, Bolt2 is now at half brightness
• Etc...

Basically, they alternate. Of course, just having static bolts fading out doesn't work very well, so every frame it can be useful to jitter each point just a tiny bit (it looks fairly cool to jitter the split endpoints even more than that, it makes the whole thing look more dynamic). This gives:

And, of course, you can move the endpoints around...say, if you happen to have your lightning targetting some moving enemies:

So that's it! Lightning isn't terribly difficult to render, and it can look super-cool when it's all complete.

## Collision Detection Performance (Volume 2)

(As is now the usual, this entry is mirrored from drilian.com)

Yikes, I'm getting backlogged on the stuff I want to write about!

Anyway, this will probably be way shorter than it deserves, but my memory on the subject is about 3 months old. Basically, this will be more or less a short chronicle of the dumb story of the mesh vs. mesh tests.

Appearances Can Be And Are Frequently Deceiving

When I started work on the collision detection observation, there was one surprising fact: the mesh vs. mesh code (used to determine whether the player was intersecting enemy ships) was working at full speed! I didn't seem to have to do any optimization at all on it to get it working.

I did, however, opt to go ahead and change the functions to not be recursive (as the current implementation was, of course, recursing into both meshes' sphere trees). When I finished that work, suddenly, the routine was much, much slower. Was, in this case, the recursion overhead better than what it took to handle the double tree recursion in a non-recursive way?

While I was working out the kinks in the non-recursive version, I had added some visual feedback to display unambiguously when the meshes were colliding. While experimenting, I reverted back to the old (recursive) method, and found out something surprising: the collision results weren't accurate! It was reporting collision in cases where the player was inside of the object's space, but nowhere near any edges!

Turns out, there was an erroneous fall-through case, and it was falling through to a "true" result, causing the entire collision test to short-circuit with an invalid collision result. Sure the original routine was fast, but it didn't work at ALL. Now that I'd fixed that and had an accurate version, it was super, super slow. Now it became an imperative to optimize.

Double Dribble

Through the course of optimizing the routine, I tried all sorts of crazy stuff. I eliminated the recursion, I tried AABB vs. Rotated AABB tests, I tried Sphere vs. AABB tests, etc. There was nothing that I could do. As I was stepping through the code, however, I started noticing something odd: there were duplicate triangles all over the place. I would see the same triangle tens of times in a given collision test.

As it turns out, the way I was dividing up the mesh in the tree creation routine was boneheaded. What I would do was as follows:
1. If the current set of triangles has less than a specified number of entries, stop splitting the mesh, we've reached a leaf node.
2. Figure out the bounding box of the current set of triangles
3. Find the axis on which it was best to split the set of triangles (where best was determined as the most balanced split, i.e. where the difference between triangle counts on the left and right of the split was lowest)
4. For each triangle
1. If the triangle has a vertex to the left of the split, add it to the left side.
2. If the triangle has a vertex to the right of the split, add it to the left side.
5. Once the mesh is divided into the two sets, recurse into each set to split it further as necessary

Now, a cursory analysis of this algorithm should tell you that step 4 is horribly broken. If a triangle straddles the split, it would get added into both sides, thus expanding the bounding boxes of each sub element to encompass all triangles that straddle the split, causing there to be tons of duplicates of the same triangles throughout the code.

The correct way to do it would be:
1. If the current set of triangles has less than a specified number of entries, stop splitting the mesh, we've reached a leaf node.
2. Figure out the bounding box of the current set of triangles
3. Find the axis on which it was best to split the set of triangles (where best was determined as the most balanced split, i.e. where the difference between triangle counts on the left and right of the split was lowest)
4. For each triangle
1. Calculate the centroid of the triangle (i.e. average the 3 vertices).
2. If the centroid is less than the split, add the triangle to the left, OTHERWISE add it to the right.
5. Once the mesh is divided into the two sets, recurse into each set to split it further as necessary

There, now a triangle only gets added to the side that it's MOSTLY on, and there are no duplicates. This change greatly boosted the speed of the collision, however there were cases where it would still cause a framerate dip. Curses and drat!

All the King's Horses...

Try as I might, I could not get the mesh vs. mesh intersection test to perform well. So what happened when you get into a situation like that? You do what every game developer does: you cheat!

What existed at the time was a wicked fast sphere vs. mesh collision test. So rather than treat the player ship as a mesh, I opted to treat it as a representative collection of spheres (There are 29 spheres total, and it's a pretty good approximation of the player ship - in fact, it's probably better than necessary).

Suddenly, player vs. enemy collision performance was great! Since this was the only place that I had needed a mesh vs. mesh, I could get rid of that slow beast.

~fin~

All of the collision code is complete, now, so next (give or take a post or two) I'll talk about something with a bit more of a visual element - Bolts of lightning!

## Collision Detection Performance (Volume 1)

(As is now the usual, this entry is mirrored from drilian.com)

I have been hard at work on my game (in my ridiculously limited spare time) for the last month and a half. One major hurdle that I've had to overcome was collision detection code. Specifically, my collision detection performed great on my PC, but when running it on the Xbox 360, everything would slow to a crawl (in certain situations).

The types of collision detection I have to deal with are varied, due to the weird way that I handle certain classes of obstacle (like walls):
• Player bullets vs. Enemy - Player bullets are, for simplicity, treated as spheres, so sphere/mesh testing works here.
• Enemy bullets vs. Player - Same as above.
• Player vs. Wall - Because the game's playing field is 2D, the walls in-game are treated as 2D polygons, so it boils down to a 2D mesh vs. polygon test.
• Player vs. Enemy - Mesh vs. Mesh here
• Beam vs. Enemy - The player has a bendy beam weapon. I divide the curved beam up into line segments, and do ray/mesh tests.

The worst performance offender was, surprisingly, the sphere vs. mesh test, which will be the subject of this article. Before optimizing, when I'd shoot a ton of bullets in a certain set of circumstances, the framerate would drop well into the single digits, because the bullet vs. mesh (sphere vs. mesh) collision couldn't keep up. Here are the things that I changed to get this test working much, much faster.

When using Value Types, Consider Passing As References

One thing that was slowing my code down was all of the value type copying that my code was doing. Take the following function:
public static bool SphereVsSphere(Vector3 centerA, float radiusA, Vector3 centerB, float radiusB){  float dist = radiusA+radiusB;  Vector3 diff = centerB-centerA;  return diff.LengthSquared() < dist*dist;}

Simple, yes? This function, however, falls prey to reference type copying. You see, "centerA" and "centerB" are both passed in by value, which means that a copy of the data is made. It's not an issue when done infrequently, but with the number of SphereVsSphere calls that were happening during a given frame, the copies really started to add up.

There's also a hidden set of copies: the line "Vector3 diff = centerB-centerA" also contains a copy, as it passes centerB and centerA into the Vector3 subtraction operator overload, and they get passed in by value. Also, a new Vector3 gets created inside of the operator then returned, which, I believe, also copies the data into diff.

To eliminate these issues, you should pass all of your non-basic value types (that is, anything that's not an int, bool, float, anything like that) by reference instead of by value. This eliminates all of the excess copies. It does come at a price, though: in my opinion, it does make the code considerably uglier.

Here's the updated routine:
public static bool SphereVsSphere(ref Vector3 centerA, float radiusA, ref Vector3 centerB, float radiusB){  Vector3 diff;  Vector3.Subtract(ref centerA, ref centerB, out diff);  float dist = radiusA+radiusB;  return diff.LengthSquared() < dist*dist;}

Instead of having a nice-looking overloaded subtraction, now there's a call to Vector3.Subtract. While it's not so bad in the case of a simple subtraction, when you have a more complicated equation, they pile up pretty quickly. However, given the speed boost just making this change can give you, it's totally worth it.

Use Hierarchical Collision Detection (But Use a Good Bounding Volume)

Heirarchical collision detection is a good thing.

For those of you that DON'T know, basically instead of testing your collider against every triangle in a mesh, you have a tree in which each node has a bounding volume, and the leaves contain the actual triangles. The idea is that, by doing a much simpler collider vs. bounding volume test, you can elminiate large amoungs of triangles before you ever have to test them.

In this case, I was using a sphere tree, where each node in the tree has a bounding sphere, and the leaves of the tree contain actual mesh triangles. I used spheres instead of AABBs (Axis-aligned bounding boxes) because transforming AABBs is expensive (and they become Oriented bounding boxes after the transform). Transforming a sphere is easy, however. None of my object transforms have scale data, so it's a simple matter of transforming the sphere's center.

However, the use of bounding spheres has a dark side. Unless all of your heirarchy levels is roughly sphere-shaped, a sphere is a terribly inefficient bounding volume. They're usually much larger than the geometry that they contain, so there are more recursions into lower levels of the tree (think of there as being more dead space around the geometry when using spheres than bounding boxes).

By also adding bounding boxes to the data, I could use them where I'm not having to transform them. For instance, because this is sphere vs. mesh, and the entire mesh is rigid, I can take the mesh's world 4x4 matrix, and transform the sphere by the INVERSE of it. This way, the sphere is in model space, and I can use the bounding volumes without having to do any transformations at lower levels.

But now I needed a sphere vs. AABB test. However, I didn't much care if it was exact or not, so instead I used a simple test where I expand the box by the radius of the sphere, then test whether the sphere is inside of the box or not. Near the corners (surely this is where the term "corner case" comes from), this can give false positives, but it will never say the sphere DOESN'T intersect the box when it should say it does. This is an acceptable trade-off.
public static bool SphereVsAABBApproximate(ref Vector3 sphereCenter, float sphereRadius, ref Vector3 boxCenter, ref Vector3 boxExtent){  Vector3 relativeSphereCenter;  Vector3.Subtract(ref sphereCenter, ref boxCenter, out relativeSphereCenter); // Get the sphere center relative to the box's center.  Vector3Helper.Abs(ref relativeSphereCenter); // Per-component absolute value.  return (relativeSphereCenter.X <= boxExtent.X+sphereRadius && relativeSphereCenter.Y <= boxExtent.Y+sphereRadius && relativeSphereCenter.Z <= boxExtent.Z+sphereRadius);}

Simple, but effective. Converting from using a sphere bounding volume to AABBs cut down the number of recursions (and triangle comparisons) being done dramatically, since the AABBs are a much tighter fit to the geometry.

Recursion Is Weird

One suggestion I got, also, was to eliminate recursion. The heirarchical nature of the algorithm meant that my test was recursive. Here was the test as originally written:
public static bool SphereVsAABBTree(ref Vector3 sphereCenter, float sphereRadius, CollisionTreeNode node){  if (!SphereVsAABBApproximate(ref sphereCenter, sphereRadius, ref node.BoxCenter, ref node.BoxExtent))    return false; // No collision with this node, return false  if (node.Left != null)  {    // This means that there are Left and Right children (either they're both null, or both set).    if(SphereVsAABBTree(ref sphereCenter, sphereRadius, node.Left))      return true;    return SphereVsAABBTree(ref sphereCenter, sphereRadius, node.Right); // A node with child nodes can't have triangles, so just return this result.  }  // This node has triangles, so test against them.  If any of them intersects the sphere, return success.  for(int i = 0; i < node.Indices.Length; i+=3)  {    if(SphereVsTriangle(ref sphereCenter, sphereRadius, ref node.Vertices[node.Indices[i+0]], ref node.Vertices[node.Indices[i+1]], ref node.Vertices[node.Indices[i+2]]))      return true;  }  return false;}

As you can see, it recurses into child nodes until it either gets a false test out of both of them, or it reaches triangles. But how do you eliminate recursion in a tree such as this? More specifically, how do you do it while using a constant (non-node-count dependent) amount of memory?

The trick is as follows: Assuming your nodes contain Parent pointers in addition to Left and Right pointers (where the Parent of the trunk of the tree is null), you can do it with no issue. You track the node that you're currently visiting ("cur"), and the node that you previously visited ("prev", initialized to null). When you reach a node, test as follows:
• if you came to it via its parent (that is, prev == cur.Parent), you've never visited it before.
• At this point, you should do your collision tests. It's a newly-visited node.
• prev = cur
• cur = cur.Left -- This is basically "recursing" into the Left node.
• If you arrived from its Left child, you visited it before and recursed into its left side
• Since this node has already been visited, do not do the collision tests. They've already been shown to be successful.
• prev = cur
• cur = cur.Right -- Since we recursed into Left last time we were here, recurse into Right this time.
• If you arrived via its Right child, you've visited both of its children, so you are done with this node.
• Again, this node has already been visited, so do not run the collision tests.
• prev = cur
• cur = cur.Parent -- We're done with this node, so go back to its parent.
• When doing the collision tests, if a Sphere vs. AABB bounding volume test ever fails, we don't have to "recurse" so go back to its parent.
• prev = cur
• cur = cur.Parent
• Finally, if you do a Sphere vs. Triangle collision test and it succeeds, we can immediately return, as we have a guaranteed collision, and no more triangles or nodes need to be tested.

Doing all of this makes the routine bigger, but no recursion is necessary, so there's no additional stack space generated per node visited (and no function call overhead, either). The finished code is as follows (note that I also added a quick sphere vs. sphere test right at the outset, because it's a very quick early out if the sphere is nowhere near the mesh):
public static bool SphereVsAABBTree(ref Vector3 sphereCenter, float sphereRadius, CollisionTreeNode node){  CollisionTreeNode prev = null, cur = node;  // At the top level, just do a sphere/sphere test for a super-quick out.  if (!SphereVsSphere(ref sphereCenter, sphereRadius, ref node.Sphere.Center, node.Sphere.Radius))    return false;  while(cur != null)  {    if(prev == cur.Parent) // Only do the tests if we JUST got here.    {      if (!SphereVsAABBApproximate(ref sphereCenter, sphereRadius, ref cur.BoxCenter, ref cur.BoxExtent))      {        // No intersection?  Go ahead and just back out of this node now.        prev = cur;        cur = cur.Parent;        continue; // By continuing, we bypass the rest of this code and re-visit the parent immediately.      }      for(int i = 0; i < cur.Indices.Length; i+=3)      {        if(SphereVsTriangle(ref sphereCenter, sphereRadius, ref cur.Vertices[cur.Indices[i+0]], ref cur.Vertices[cur.Indices[i+1]], ref cur.Vertices[cur.Indices[i+2]]))          return true;      }    }    // "Recurse"    if (cur.Left != null)    {      if (prev == cur.Parent) // If this is the first visit to the node, recurse left.      {        prev = cur;        cur = cur.Left;        continue;      }      if (prev == cur.Left) // If this is the second visit, recurse right.      {        prev = cur;        cur = cur.Right;        continue;      }    }    // If there are no child nodes or prev == cur.Right, return to the parent.    prev = cur;    cur = cur.Parent;  }  return false;}

Mission Complete

After making that set of changes, the sphere vs. mesh tests no longer bog down on the Xbox, even in a degenerate case such as when there are tens or hundreds of bullets well inside of the mesh's area.

Getting the sphere vs. mesh test working was a great accomplishment, but as much as I thought it was already working well, it turns out that mesh vs. mesh testing was a much bigger problem. However, that's another story for another day.

## Understanding Half-Pixel and Half-Texel Offsets

First, I'm joining the ranks of people whose dev journals are moving. In my case, I'm moving it to drilian.com. I'll mirror generally portions of them here (though this entry is mirrored in its entirety), so you can still track this journal on the Journals page to keep up to date. However, if you use an RSS feed, feel free to redirect it to the drilian.com rss feed.

I have a blurb about why I'm moving my journal at the new site, in case you care.

And now, onto the entry at hand (p.s. FINALLY A NEW ENTRY):

For those of you not using Direct3D 9 or XNA, you can safely ignore this post (OpenGL and Direct3D 10 are immune to this particular oddity). However, if you are, it's likely that you've had to deal with the dreaded half-texel offset. Today, after I don't know how many years of using Direct3D, I came to realize that I really didn't understand what the source of the issue was. Now that I've sort of gotten a handle on it, I figured I'd post it to my super new journal. Consider it a test run.

Coordinate Spaces

The first thing to note is the basic coordinate space. I'm going to be referring to texture space and clip space a lot, so I thought I'd just do a quick refresher here on what I mean (mostly to make sure you're thinking with the same terminology that I'm using).

• Clip space - the post-projection half-cube of space where X,Y in [-1..1] and Z in [0..1]

• X = -1 is the far left edge of the screen

• X = 1 is the far right edge

• Y = -1 is the bottom edge

• Y = 1 is the top

• Z = 0 is near (the near plane)

• Z = 1 is far (the far plane)

• Texture space - the area where u,v in [0..1] on a texture map making up a single end-to-end repeat of the texture.

• 0,0 represents the upper-left coordinate on the texture

• 1,1 is the lower-right.

Texture Sampling

Texture sampling is pretty straightforward.

In this diagram (and future diagrams), I'm using a 4x4 texture. The [0..1] range in both X and Y map exactly to the range of the texture. Note, here, that the center of the upper-left texel does not lie at 0,0; instead, 0,0 refers to the upper-left corner of that texel. This is an important thing: if you are using bilinear filtering and you want to sample color data from a single texel, you need to sample from the center of the texel instead of the corner. To do that, you'd sample from a half-texel in (that is, for this 4x4 texel, you'd want to sample from (0.125, 0.125), because a texel's width in texture space is .25 (each texel is 1/4 of texture space), and half of that is 0.125).

A side effect of this is that, if you have bilinear sampling and wrapping on, sampling along the left edge gives you exactly the same data as sampling along the right edge at the same y coordinate. That is, in both cases, you get a perfect blend of the texels on either end of the texture. This is true for the borders between texels in general: sampling where the "dot" is on the image above is the only way to get a pure sample of a texel with bilinear filtering. Anywhere else will be a blend between two. Edges (the lines) are where the two texels neighboring the edge are weighted evenly.

The upshot of this is that the official center of a texel is located where the center SHOULD be: half of a texel from the edge. However, as we'll see in a moment, the rule for sampling texels does not apply to pixels.

Pixel Coordinates Are Weird And Unintuitive

When referring to "pixel coordinates" vs. "texel coordinates," pixel coordinates are, for lack of a better way to explain it, the coordinates used when rendering to the screen (or a render target). The pixels are the destination coordinates.

This is where clip space comes in: When you render your geometry, it goes through the gauntlet of matrices, ending with the projection matrices, and ends up in clip space (with x and y both being in [-1..1] for the visible area of the screen).

This diagram is the way that clip space maps to pixels. note that the upper-left clip space coordinate (-1, +1) is actually RIGHT ON the pixel center. This is the root cause of the whole offset problem.

Basically, say you have a screen-sized texture. if you draw it to the screen using a full-screen quad (clip space from upper-left (-1,1) to lower-right (1, -1)), the upper-left pixel would sample the very upper-left corner of your quad, which would give you texture coordinates of (0,0). But remember: that's not the center of the texel! It's actually the upper-left corner of the texel. With bilinear filtering, instead of getting a pure sample of your texture, you have a perfect blend of all 4 surrounding texels.

In simpler terms: the half-pixel/half-texel offset exists because there's a discrepancy between how the centers of pixels and the centers of texels are computed.

To fix this, you could simply offset your texture coordinates by a half-texel before sampling (that is, when your shader gets its uv coordinate, you'd add a half texel ( (.125, .125) for our 4x4 example) before sampling, which would then give you a perfectly lined-up texture sample, and you'd get a great 1:1 mapping of pixels to texels.

However, there's another problem. What happens when you're using multisample anti-aliasing (MSAA)?

Weird Pixel Centers and MSAA

So, we now know that the center of the origin pixel is actually the upper-left coordinate of clip space. This causes problems when MSAA is turned on. With MSAA, the geometry is sampled multiple times per pixel, using a set of points that exist within the pixel's square (while a pixel is not really a square, but a point sample within a square, MSAA effectively treats a pixel as multiple samples in a square).

As an (important) aside: with MSAA on, textures are still only sampled once, from the pixel center. MSAA does not affect the UV coordinates you get for a given pixel, it merely affects geometry coverage.

Continuing with our example, here's a hypothetical, simple MSAA strategy wherein a pixel is sampled four times (4x MSAA) in an aligned grid:

This diagram illustrates the issue with the standard [-1..1] full screen quad, D3D's choice of pixel center, and MSAA. Even with a full-clip-space (and, thus, traditionally full-screen) quad, along the top and left edges, some samples no longer touch the quad, and thus the edges are not entirely filled in (a pure white quad drawn to a render target that had been cleared to black would have grey edges along the top and left).

You've likely seen this in a ton of games: when you turn on full-screen anti-aliasing, there's a weird line down the side of the screen during fadeouts and the like. This is the pixel offset problem rearing its head.

How do you fix this? Turns out, it's simple. When drawing a full-screen quad, instead of adding a half-texel to the uv coordinates, you should instead SUBTRACT a half-pixel from the position (not the uv coordinates). This shifts the quad so that it lies perfectly within the grid in the diagram, so that every MSAA sample hits the geometry. As an added bonus, it means that the UV samples that you get will already be in the texel centers; no more adding a half-texel required!

Of course, if you're drawing world geometry and using its screen coordinates to index into a texture map (like, for instance, if you're using light pre-pass rendering), you'll still want to add the half-texel instead. In fact, here is a great reference on how to handle this case.

Take Us Home, Article
The half-texel/half-pixel offset is a bizarre feature in Direct3D 9 (and, by extension, XNA). In order to properly handle it when using full-screen-sized textures:

• When rendering a full-screen (or otherwise screen-aligned quad), subtract a half-pixel's size from the output vertex position from your vertex shader. This will ensure both that the texel centers line up with the pixel centers (for proper texture sampling) AND that the quad will play nice along the left and top edges of the screen with MSAA.

• When rendering normal geometry, the geometry is already in the proper place (i.e. it already plays fine with MSAA). Consequently, you should add a half-texel to the uv coordinates for your full-screen sample. This will allow you to sample from the texel centers as desired.

While this article refers mostly to full-screen effects, this information is generally more useful when downsampling textures, as you need to know where your sample points on the source texture will actually hit when rendering (and you'll want bilinear filtering for, for instance, a 4-tap 4x4 average downsample).

Hopefully, if you were as confused by the half-texel offset as I was, this helps clear things up.

Directly Mapping Texels to Pixels (Direct3D 9)

Coordinate Systems (Direct3D 10)

## HUDson Hawk (The King of Terrible Puns Returns!)

Okay, I'm back! Sorry for the delay, my job got super-crazy there for a month or so. It hasn't really let up too much, but it's enough that I was able to get a little bit done. However, nothing really to show for it, I'm afraid.

But, I do have SOMETHING interesting: a look into the HUD design process. This work was done almost a month ago, but I haven't had time to even sit down and write this entry until now.

Necessary elements

There are a few elements that are necessary on the in-game HUD:

1. Player name - Especially important in two-player mode, having both players' names on-screen will help to differentiate which statistics belong to which player
2. Lives - Also very important is the number of lives that a player has.
3. Score - Points. Very important.
4. Weapon Charge - You'll acquire weapons charge throughout the course of the game, which you'll be able to spend to temporarily upgrade your weapons. This meter will show you how much charge you have. I chose to represent this with a blue bar.
5. Secret Charge - I'm not quite ready to divulge this little gem, but this meter only fills up when the blue (weapon charge) meter is completely full. I chose yellow for this one.

Mockups!
I quickly made a mockup of my initial idea for the hud.

Click to enlarge

The first suggested modification was to swap the two meters vertically. Because the yellow bar only fills up when the blue meter is full, it would be analogous to pouring in water (filling from the bottom up). I liked this concept, so I swapped the bars (I wanted to get just one readout set up, so I took out Player 2 for the next while):

Click to enlarge

At that point, I didn't really feel that the look was consistent. The text didn't match the bars, and I didn't like the look of the gradient on the text. So I reworked both so that they had bright centers fading to darker colors at the top and bottom extremeties, which really unified the look of the various elements

Click to enlarge

At this point, it was noted that the bar looked kind of stupid relative to the text, since it's so much larger. So the scale of the bars was modified to match the height of the text. This also allowed a little bit of the height to be taken out of the HUD.

Click to enlarge

At this point, I was happy with the layout, so I set out to figure out how to put the second player in. Initially I had two options:

Click to enlarge

The first one is sort of the "classic" two player layout. The second was optimized so that there would be a minimum of data sitting over where the enemies are coming from (potentially obscuring relevant enemy activity). However, everyone that I talked to (including myself, though I promise that dialogue was mostly internal) thought that option B was a pretty terrible-looking layout, so I scrapped it entirely...but I wasn't entirely happy with the first option, either.

So I started to play around some more.

I spread out one bar across the entirety of the top of the screen, and placed the second one at the very bottom of the screen. It's unobtrusive and quite neat:

Click to enlarge

...but it seemed a bit off-balance. Eventually, mittens had a great idea: why not mirror one of the bars (swap the elements left to right), so that it would be more or less radially balanced. So I basically did that, I moved the top player's life count off to the right edge of the screen, and swapped the player name and the score of the bottom player. And it actually looks pretty good!

Click to enlarge

...but some people still really liked the "classic look"

Have Your Cake and Eat It, Too

So I opted for both! The default will be the new theme (conveniently entitled "Default"). But the option will be there for the "Classic" HUD, for those that prefer it.

So here they are!

Click to enlarge

The current UI layouts for the game!

Hope that gave you some insight into the craziness of the process! Enjoy!

(PS: If you haven't already checked it out, you should go play Asplode! You can find it at mittens' development journal. It's fun!)

## Scrollathon 2008

This previous weekend, I was able to accomplish another major milestone in game development: The Scrolling Background (TM) (C) (R) (BBQ).

Click to enlarge

Requisite Scrolling Video (Xvid AVI, ~2MB)

The Skinny On Scrolling
The interesting thing about the scrolling method that I settled on is that it's not based on any sort of overall world coordinate system. World coordinates don't actually exist, the only true coordinate system is the screen coordinate system (with coordinates ranging from -16,-9 to +16,+9, for a delicious [and integer-tastic] 32:18 [2x 16:9] visible area).

So how does it work? Each level will be built out of tiles, in order. Each tile has the following data:
• A model to render
• Collision Data
• The Camera Path

The camera path for a tile is currently just an input position and an output position. That is, the position at which the camera ENTERS the tile, and the position at which the camera EXITS the tile.

Now, here's the trick: Say you have two of the same tile next to each other. Each has an input coordinate of (0,1) and an output coordinate of (4, 0). What the system does is it moves the second one so that its input coordinate is in the same spot as the first one's output coordinate. (that is, the second one's input coordinate becomes effectively (4,0) like the first's output coordinate and, relative to that, the second's output coordinate becomes (8, -1)).

However, actual world coordinates aren't strictly necessary, so whichever tile the camera is currently in is considered the "origin" tile. That is, it is used as the basis by which all other visible tiles get their on-screen positioning.

Thus, the setup is easy: figure out where on-screen (given the camera's position in the tile) the tile should display, then make all of the visible tiles to the left and right relative to that.

This is nice for a few reasons:

First off, if, for some reason, a level were RIDICULOUSLY long, I would never have to worry about accumulating floating point round-off error.

The big thing is this allows me to have what is essentially a staple of the shoot-em-up game (and is actually quite visible in the video posted above): an endless loop of background.

These loops are especially useful for when fighting bosses. Say you're zooming down a metallic corridor while scrapping with a boss that happens to be flying along with you. Rather than have to hope that the player finishes the fight before the camera hits the level's end, you can just rely on the fact that the corridor will keep on looping until something triggers the loop's end, signaling that the level should keep going (or end, assuming that there's no more to the level).

This triggering system is not yet implemented, and I hope to get it done this weekend (though I have a ton of other, smaller items on the to-do list, so it may have to wait for the NEXT weekend).

One design element that was tricky was signaling to the player that the ship is too close to a wall. The obvious metric is, of course, a shadow. However, standard shadows only cast in one direction, which would be great if all we cared about was distance to the floor. However, we really need "distance to any object." This looks like a job for the existing lighting system!

A new type of "light" was designed: essentially a black light, which has a center, a length, and a radius (thus, the actual light is more like a line light than a point light). Consequently, the fakey shadow from the ship will "cast" onto any surrounding objects.

Click to enlarge

And, once again, that's all we have time for on this week's episode of "What Did Drilian Do Last Weekend". Stay tuned next week, same Bat-Time, same Bat-Channel!

## One Pass, Two Pass, Red Pass, Blue Pass

Here it is: another late-week journal update that pretty much chronicles my weekend accomplishments, only later.

But First, Beams!

First up, here's a preview of the powered-up version of the final main weapon for the project:

Click to enlarge

The beam itself actually locks on to targets and continually damages them. Implementation-wise, it's a quadratic bezier. Initially, I tried to calculate the intersection of a quadratic-bezier-swept sphere (i.e. a thick bezier curve) and a bounding sphere exactly. That's all great, only it becomes a quartic equation (ax^4 + bx^3 + cx^2 + dx + e == 0), which is ridiculously difficult to compute programmatically (just check the javascript source on this page to see what I mean). So I opted for another solution:

I divided the curve up into a bunch of line segments, treated those segments as sphere-capped cylinders (capsules), and did much simpler intersection tests. PROBLEM SOLVED!

I also implemented Light Pre-Pass Rendering, which is sort of a "Low-Calorie Deferred Shading" that Wolfgang Engel devised recently. Considering my original plan for lighting was to only allow right around 3 lights on-screen at a time, this gives me a much greater range of functionality. It's a three-step process, as illustrated below.

Render Object Normals (Cut a hole in a box)

Step 1: render all of the objects' normals and depth to a texture. Due to limitations that are either related to XNA or the fact that I want to allow a multisampled buffer, I'm not sure which, I can't read from the depth buffer, so I have to render the object depth into the same texture that the normal map is rendering into.

Given two normal components, you can reconstruct the third (because the normal's length is one). Generally, Z is used on the assumption that the Z component of the normal is always pointing towards the screen. However, with bump mapping (and even vertex normals), this is not a valid assumption. So just having the X and Y normal components is not enough. I decided to steal a bit from the blue channel to store the SIGN of the Z component. This leaves me with 15 bits of Z data, which, given the very limited (near-2D) range of important objects in the scene, is more than plenty for the lighting (as tested by the ugly-yet-useful "Learning to Love Your Z-Buffer" page).

Consequently, the HLSL code to pack and unpack the normals and depth looks like this:

float4 PackDepthNormal(float Z, float3 normal){  float4 output;  // High depth (currently in the 0..127 range  Z = saturate(Z);  output.z = floor(Z*127);    // Low depth 0..1  output.w = frac(Z*127);    // Normal (xy)  output.xy = normal.xy*.5+.5;    // Encode sign of 0 in upper portion of high Z  if(normal.z < 0)     output.z += 128;  // Convert to 0..1  output.z /= 255;      return output;}void UnpackDepthNormal(float4 input, out float Z, out float3 normal){  // Read in the normal xy  normal.xy = input.xy*2-1;    // Compute the (unsigned) z normal  normal.z = 1.0 - sqrt(dot(normal.xy, normal.xy));  float hiDepth = input.z*255;    // Check the sign of the z normal component  if(hiDepth >= 128)  {    normal.z = -normal.z;    hiDepth -= 128;  }    Z = (hiDepth + input.w)/127.0;;}

And, it generates the following data:

Click to enlarge

That's the normal/depth texture (alpha not visualized) for the scene. The normals are in world-space (converting from the stored non-linear Z to world position using the texcoords is basically a multiply by the inverse of the viewProjection matrix, a very simple operation).

Render Pure Lighting (Put your junk in that box)

Next step, using that texture (the object normals and positions), you can render the lights as a screen-space pass very inexpensively (the cost of a light no longer has anything to do with the number of objects it's shining on, it's now simply a function of number of pixels it draws on). Bonus points: you can use a variation on instancing to render a bunch of lights of a similar type (i.e. a group of point lights) in a single pass, decreasing the cost-per-light even further.

The lighting data (pure diffuse lighting, in my case, though this operation can be modified in a number of ways to do more material lighting types and/or specular lighting if necessary, especially if you have a separate depth texture) is rendered into another texture, and it ends up looking as follows:

Click to enlarge

That's a render with three small point lights (red, green and blue in different areas of the screen) as well as a white directional light.

Render Objects With Materials (Make her open the box)

Finally, you render the objects again, only this time you render them with materials (diffuse texture, etc). However, instead of doing any lighting calculations at this time, you load them from the texture rendered in step 2.

This ends up looking like this (pretty much what you expect from a final render):

Click to enlarge

And that's how light pre-pass rendering works, in a nutshell. At least, it's how it works in my case, which is very simplistic, but it's all I need for the art style in my game. It's a lot easier on the resources than deferred shading, while still separating the lighting from the objects.

Delicious!

Hopefully, in my next update, I'll have an actual set of background objects (as that's the next item on the list, but it does require the dreadful tooth-pulling that is "artwork," so we'll see how quickly I can pull this together.

Until next time: Never give up, never surrender!

## Progress: Like Regress, Only Forward!

It's been tricky to make much progress these last couple weeks - having a (non-gaming) coding job and being able to come home and work gets tricky, so a large majority of my game coding time is weekend time. Also, couple some deadlines at work, and you've got a large case of "I don't want to code when I hit home."

However: I did make a good deal of progress these last few weeks.

If you look at the screenshot in my last entry, it should be plain exactly HOW MUCH. Suddenly, my little experiment looks considerably like a GAME.

Click to enlarge

Also! GAMEPLAY VIDEO!
High Quality Xvid, 32MB
Low Quality Xvid, 3.5MB

Particles Make Me Sneeze
The biggest hurdle for this section of the project was the general-purpose particle system. Even though I've done a bunch of crazy graphics-related stuff, a particle system has NEVER been on that list. But no longer!

For my particles, I wanted the following data:

• Position (3D)
• Rotation (around Z)
• Image Index (which image in the particle map to use)
• Scale (how big the particle is scaled, relative to the particle map's specified world size)

The particle map mentioned in that list is a simple list of texture coordinates into the particle texture (Which contains images for all of the particles), as well as the size of a given particle in world space.

The particles in this system are actually rendered using shader constants (2 float4 shader constants per particle), which gave me right around 100 particles per draw call. On my PC, I can push the system up to 24,000 particles before it starts to slow down. On the Xbox 360, it's closer to 6000. Both of those are well within my game's target maximum of 2000 particles, and I could probably get that number higher if I had to.

The State Machine Tells Way Fewer Lies Than the Political Machine
One thing I learned when working on Mop of Destiny was how to set up a totally sweet state machine in C++. I got to port those concepts over to C#, which made it even EASIER, given all of the reflection support. Note that I do the majority of the reflection calls at application startup, so the expensive calls are already done when it's time to do the actual game running.

Each state can have three functions associated with it: Begin, Tick, End.

Begin is called on the transition to that state from some other state.
Tick is called every time the state gets run (once per frame)
End is called on the transition from that state to some other state.

Also, each state can have a number of transitions associated. They take the form of: "BooleanFunction = TargetState"

Every frame, before calling tick, the state machine core will run each of the specified functions. When one of them evaluates to true, it switches states to the new TargetState, which will then be run (unless one of ITS transitions triggers). A state can also call the SetState function direction, but having the transitions in the function attribute makes it really easy to see where a state can transition to.

See You Later, Allocater!
One of the most important things that I have been doing with my code is ensuring that, during the run of a game level, no memory is allocated. At all. The reason is the .Net garbage collector (GC).

The GC, on Windows, is triggered every 2MB of allocations (among other scenarios, including low-memory and lost-focus cases). On the Xbox 360, the GC runs ever 1MB of allocations. Since the GC pauses all threads while it does its thing, it's better that it's never triggered during runtime...ESPECIALLY if the heap is complicated and garbage collection is going to take a while.

To handle this, I've created a few of my own data structures, including the OrderlessList. I've used OrderlessLists alot throughout my code. Simply stated, it's an array (allocated at the time of the object with some maximum number of elements) in which the order of the objects is unimportant (i.e. it can be reordered and it doesn't matter). Given the property of being able to reorder at any time, removal from the list is a simple matter of copying the last list over the top of the element being removed, then decreasing the reported size of the list.

For the bullets (both the player and the enemy bullets), there's an OrderlessList of live bullets, and an OrderlessList of dead bullets. Whenever a bullet is needed, an object is retrieved from the dead bullet list, given its properties, and added to the live bullet list. Whenever a bullet dies (goes off-screen or hits an enemy), it is deactivated and returned from the live bullet list to the dead bullet list. No allocations necessary.

That's right, it's the ol' "pool of objects so you don't have to allocate" trick. But hey, it works!

Rambling Is For Cowboys, Not Coders
Alright, enough talk! Tomorrow is another day at work, so it's likely I won't make any more progress until next weekend.

In the meantime, death is but a door, time is but a window; I'll be back.

## Mork Calling Drilian, Come in, Drilian

I haven't had nearly as much time to get stuff done at home as I'd like, as work has been a bit of a scramble recently. Working ridiculously hard at a code-related day job and then coming home and trying to code is...difficult. And recently, highly unsuccessful.

However, this last weekend I was able to get a few things done.

Silos Are Not Just For Grain and Missiles
First up, I decided to check out Nevercenter Silo, a 3D modeling program that I swear has to be the easiest-to-use modeling software I have ever SEEN. For some reason, this software just completely clicks with me.

Maybe it's that it allows me to start with something as simple as a box and push/pull/extrude/warp/etc it slowly into the shape that I want, or maybe it's that it has built-in support for symmetrical modeling (you basically model HALF of a model and the other half changes shape along with it). It's hard to say. However, it feels more like sitting down with clay and slowly morphing it into the shape that I want vs. the usually-cumbersome task of modeling a 3D mesh.

Consequently, not terribly long after I got it, I modeled what will likely be the first ship for my new game!

Click to enlarge

I used subdivision surfaces to do the modeling, so the source geometry (pictured to the left above) is actually rather simple. But when subdivided, it forms (what I think is) a pretty cool-looking spacecraft.

S.S. Procedural Is Leaving Port
I also managed to get my procedural content (textures, mostly) generation ported over to run on XNA (from C++, so it was a bit of work). Since I'm aiming to release this game as part of the XNA dealy on the 360 (with an option for XBLA if I'm really, really lucky), I've been taking great care in ensuring that it is going to run well on the Xbox 360. So far, so good. I have background asset generation working, etc etc.

But more importantly, I have my ship model loading into the game.

AND it's procedurally textured using my super-spiffo texture generation framework, so that's awesome, too.

Check it: (Yes, that is a wireframe bezier patch in the background and yes, that means I'm also going to have procedural geometry generation)

Click to enlarge

So, to sum: not a lot of coding done (the overwhelming majority of that has been done at work), but I do finally have some pretty new screenshots!

## One Year Later

It's done! Finally, after over a year since its completion, Mop of Destiny gets its own webpage!

I had a really difficult time trying to figure out what the webpage should even look like, so I just kept putting it off, until last night when I was almost asleep, I had that "eureka!" moment immediately before dozing off. Luckily, I remembered my idea in the morning!

Also DrilNES 1.10 is released. I fixed up a few very minor emulation issues, added support for all of the 6502's "undocumented" opcodes (i.e. the opcodes that just happen to work even though they're really not supposed to), and modified the display a bit.

Now you can even make it look like a crappy old TV, if you choose! For, uh..nostalgia's sake.

Enjoy!

## DrilNES - This Space Intentionally Left AWESOME

HERE COMES A NEW CHALLENGER!

Inspired by Scet's Tub of Awesome, I opted to continue work on my old emulator, DrilNES.
And here it is! DrilNES in all its glory! Note that it does not support PAL NES timing, it only runs NTSC games (so US and Japanese games only).

A Brief History of the World

I first attempted to tackle NES emulation back in 1999. I had the goal of getting three games to be playable: Castlevania 3, Startropics, and Crystalis. Turns out, these three games are some of the harder games to emulate, due to the tricky nature of the cartridge hardware that they run on. However well it ran at the time, it was woefully inaccurate, and this bothered me. In mid-2004, I apparently decided to try again.

Emulator Action

The great thing about having started this project over in 2004 is that I have the entire history of the emulator's development in SVN, so I can see exactly what I did, and in what order. Here's a quick list of the hilights:

• Wrote the CPU emulation code (runs opcodes, etc).
• Got the PPU (pixel processing unit) up and running
• Rewrote the CPU emulation code to be more accurate with regards to instruction length counting (all instructions now emulate every read and write of the instruction, even the unnecessary ones, and the CPU cycle is clocked on each memory access).
• Added input and mappers. Mappers are emulators of the hardware that came IN the various cartridges. Originally mappers were just for memory mapping, but the hardware eventually added additional graphical, sound, and interrupt capabilities as well.
• Added support for the color emphasis bits which...tweak the output color from the NES in various ways.
• Got various IRQs running (for interrupting the CPU at certain points in the audio playback or at certain scanlines)
• Added a custom rom open dialog, with a treeview and stuff
• Even more mappers
• Two-player support
• Rewrote the MMC3 mapper from scratch, because the MMC3 code was nasty and ugly and I hated it. And it insulted my mother. And your mother.
• IRQ fixes. At this point, judging from some...colorful SVN log comments, I was starting to hate IRQ work.
• Set down the code for a year.
• Complete PPU rewrite to be way more accurate. This was the point at which I was starting to hate Battletoads, which is probably the most touchy game when it comes to accurate timing.
• Complete PPU rewrite. Yes, I know that is also on the previous line. I rewrote it again. Seriously.
• Rewrote the PPU's sprite access logic. STOP REWRITING ALREADY SERIOUSLY WHAT THE HELL
• Rewrote the IRQ logic again. Ugh.
• Rewrote the CPU. This time the CPU is awesome and infallible and will never need to be rewritten or even fixed ever again. Also, I think I was slowly losing my sanity.
• Added a faster read of guaranteed in-ROM memory, without going through all of the IO handling code. I didn't notice for over a year and a half, but this totally broke the accurate CPU timing that I had going for me.
• Rewrote the CPU again. Apparently last time I was wrong. Boy this is awesome!
• Stopped working on DrilNES for a year and a half. I'm not sure but I think I'd had enough.
• Started last weekend! Multithreaded the emulator.
• Added a whole bunch of GUI features (including ripping out the now-hideous custom rom open dialog that I added back in early 2005)
• Added the VRC6 mapper (with its additional sound channels) so that Akumajou Densetsu (Castlevania 3 Japanese) works!

As you can see, development was a tortured path, filled with rewrite...after rewrite...after rewrite...

The upside is that now it's pretty accurate. It's not perfect, I'm sure, but it's pretty good and better than a large majority of emulators out there.

Vista-Specific Functionality

One thing that was fun to add was some Windows Vista-specific functionality. Mainly, when you load roms or save state, there are little popup windows that notify you. In Vista, they look like this:

instead of just a generic yellow-on-black window. It's nice to be able to see through them.

Also custom is the Rom load dialog (link to the screenshot, I was too lazy to make a thumbnail for this screenshot), though there is a custom one in XP as well. Basically, it lets you see relevant information about a ROM before you load it (and will also prevent you from loading unsupported ROMs).

But, in general, it's nice to be "done" with it (though there are tons of things that I would like to do with it still).

## Seriously, I Was Bored

No real dev updates. I'm still working on the behind-the scenes stuff. I got a bit bogged down with the asset management - turns out, handling on-the-fly asset streaming on the 360 using XNA is a tricky proposition, due to the garbage collection (which triggers every megabyte or so of allocation, grinding everything to a halt if your heap is too complex) and the XNA content management system (you can't dynamically unload any given individual component - it's all or nothing).

Because it's been rather frustrating, I didn't feel like coding tonight.

Instead, I did some image editing based on a dumb idea I had.

Click to enlarge

That's all.

## Slow Progress Is Still Progress

I've gotten less done recently that I would have liked, due to a lack of time to sit down at my computer. However, I did implement tiling perlin noise in-shader.

It uses the same basic technique as I used to set up the tiling noise (so you can read it in one of my earlier posts).

Basically, I can generate perlin noise that tiles at any given (integer) position.

Quite handy for generating tiling textures (because not everything I'm generating needs to be 3d)

Here are some examples. It's hard to tell that they tile without actually tiling them yourself, but they do. So there.

Click to enlarge

So I used it to put together a seamless version of my earlier brick texture:

Click to enlarge

Finally, I did the same basic trick with the Worley (cellular) noise.

The first screenshot is a large area repeating Worley, the second repeats at a very small level so it should be obvious how it tiles, even looking at the thumbnail:

Click to enlarge

Next up: Maybe I should actually do something useful with these things.

## Pumpkins: Great For Pie, Great For Faces

I haven't had time to do any more work on my project the last couple days, but I did have time to carve some faces on some pumpkins. My wife requested that the rounded one be a happy face, so I got to make some sort of surprised "OMGWTFBBQ" face on the second.

Click to enlarge

kekeke

## Jpeg Buoys Amidst a Sea of Text

So I put off working on this entry long enough that it's now two entries worth of data in one.

Too Many Instructions: Cutting Down On the Noise

So, the implementation of Improved Perlin noise from GPU Gems 2 boils down to 48 pixel shader instruction slots (9 texture, 39 arithmetic). That's one octave of noise. What I needed, desperately, was a faster implementation of noise, where the base quality doesn't matter (especially useful for things such as fBm and the like).

In the FIRST GPU Gems, in the chapter on Improved Perlin Noise, Ken Perlin makes a quick note about how to make a cheap approximation of perlin noise in the shader, using a volume texture. The technique is straight forward, but it took me some effort to understand exactly what was supposed to go into the volume texture.

In my case, I ended up using a 32x32x32 volume texture to simulate an 8x8x8 looping sample of perlin noise space. Essentially, when sampling this texture, divide the world position by 8, and use that as the (wrapped) texcoord into the volume.

Crazy 8s: Modifying Perlin Noise To Loop At A Specified Location

The first trick is that it has to be LOOPING Perlin noise. But how do you generate such a thing?

Turns out, in the reference implementation of Improved Noise, there are a bunch of instances where there are +1s. For instance:
A = p[X  ]+Y;AA = p[A]+Z;AB = p[A+1]+Z;B = p[X+1]+Y;BA = p+Z;BB = p[B+1]+Z;

(Later, AA, AB, BA, and BB are also accessed with +1s).

Figuring out how to make the noise wrap at a specific value (in my case, 8), was a matter of rethinking those as follows:
A = p[X  ]; // note: no +Y hereAA = p[A+Y]  (+Z); // +Z in parens because it actually gets added later, like the Y does hereAB = p[A+(Y+1)] (+Z);B = p[X+1]; // again, no +YBA = p[B+Y] (+Z);BB = p[B+(Y+1)] (+Z);

So, to make the noise wrap at a certain value, you need to take those (coordinate+1)s and change each into a ((coordinate+1)%repeatLocation).

The final version of the texture shader that generates noise that loops at a specific location is as follows:

// permutation tablestatic int permutation[] = { 151,160,137,91,90,15,131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180};// gradients for 3d noisestatic float3 g[] = {    1,1,0,    -1,1,0,    1,-1,0,    -1,-1,0,    1,0,1,    -1,0,1,    1,0,-1,    -1,0,-1,     0,1,1,    0,-1,1,    0,1,-1,    0,-1,-1,    1,1,0,    0,-1,1,    -1,1,0,    0,-1,-1,};int perm(int i){	return permutation[i % 256];}float3 texfade(float3 t){	return t * t * t * (t * (t * 6 - 15) + 10); // new curve//	return t * t * (3 - 2 * t); // old curve}float texgrad(int hash, float3 p){  return dot(g[hash%16], p);}float texgradperm(int x, float3 p){	return texgrad(perm(x), p);}float texShaderNoise(float3 p, int repeat, int base = 0){	int3 I = fmod(floor(p), repeat);	int3 J = (I+1) % repeat.xxx;	I += base;	J += base;    p -= floor(p);    float3 f = texfade(p);  	int A  = perm(I.x);	int AA = perm(A+I.y);	int AB = perm(A+J.y);	 	int B  =  perm(J.x);	int BA = perm(B+I.y);	int BB = perm(B+J.y);    	return lerp( lerp( lerp( texgradperm(AA+I.z, p + float3( 0,  0,  0) ),                                   texgradperm(BA+I.z, p + float3(-1,  0,  0) ), f.x),                           lerp( texgradperm(AB+I.z, p + float3( 0, -1,  0) ),                                 texgradperm(BB+I.z, p + float3(-1, -1,  0) ), f.x), f.y),                     lerp( lerp( texgradperm(AA+J.z, p + float3( 0,  0, -1) ),                                 texgradperm(BA+J.z, p + float3(-1,  0, -1) ), f.x),                           lerp( texgradperm(AB+J.z, p + float3( 0, -1, -1) ),                                 texgradperm(BB+J.z, p + float3(-1, -1, -1) ), f.x), f.y), f.z);  }

Whee!

Noise + Real Numbers + Imaginary Numbers == ???

So, the second trick: the texture actually needed to contain two values (R and G channels), to act as real and imaginary parts. Very simple, I added a base parameter (in the code above) so that I could offset into a different 8x8x8 cube of noise. I drop a different 8x8x8 noise into the G channel.

Finally! We have a texture with 8x8x8 noise. But 8-cubed noise sucks, because it's ridiculously repetative. That's where that weird imaginary part comes into play. You sample the 8-cube volume again, but at 9x scale (so it's lower frequency). You then use the (real component of) high-frequency as an angle (scaled by 2pi) to do a quaternion rotation on the low-frequency noise.

float noiseFast(float3 p){  p /= 8; // because the volume texture is 8x8x8 noise, divide the position by 8 to keep this noise in parity with the true Perlin noise generator.  float2 hi = tex3D(noise3dSampler, p).rg*2-1; // High frequency noise  half   lo = tex3D(noise3dSampler, p/9).r*2-1; // Low frequency noise    half  angle = lo*2.0*PI;  float result = hi.r * cos(angle) + hi.g * sin(angle); // Use the low frequency as a quaternion rotation of the high-frequency's real and imaginary parts.  return result; // done!}

And that's it! Compare the instruction counts of the real Perlin noise to this fast fake:

Old (high-quality):  approximately 48 instruction slots used (9 texture, 39 arithmetic)New (lower-quality): approximately 20 instruction slots used (2 texture, 18 arithmetic)

Essentially, wherever I don't need the full quality noise, I can halve my instruction count on noise generation. Score!

Here's a comparison: on the left, the weird confetticrete chair with the original noise, and on the right is the new faster noise:

Old (left) vs. New (right)
Click to enlarge

They look roughly the same, there are some artifacts on the new one (the diamond-shaped red blob on the upper-right of the new chair due to the trilinear filtering), but it's way faster.

Cellular Noise

Okay, I have some cool perlin noise stuff. But man cannot live on Perlin noise alone, so I decided to implement cellular noise, as well.

Turns out, there's something called Worley noise which does exactly what I was hoping to do. Implementation was pretty simple.

void voronoi(float3 position, out float f1, out float3 pos1, out float f2, out float3 pos2, float jitter=.9, bool manhattanDistance = false ){  float3 thiscell = floor(position)+.5;  f1 = f2 = 1000;  float i, j, k;    float3 c;  for(i = -1; i <= 1; i += 1)  {    for(j = -1; j <= 1; j += 1)    {      for(k = -1; k <= 1; k += 1)      {        float3 testcell = thiscell  + float3(i,j,k);        float3 randomUVW = testcell * float3(0.037, 0.119, .093);        float3 cellnoise = perm(perm2d(randomUVW.xy)+randomUVW.z);        float3 pos = testcell + jitter*(cellnoise-.5);        float3 offset = pos - position;        float dist;        if(manhattanDistance)          dist = abs(offset.x)+abs(offset.y) + abs(offset.z);         else          dist = dot(offset, offset);        if(dist < f1)        {          f2 = f1;          pos2 = pos1;          f1 = dist;          pos1 = pos;        }        else if(dist < f2)        {          f2 = dist;          pos2 = pos;        }      }    }  }  if(!manhattanDistance)  {    f1 = sqrt(f1);    f2 = sqrt(f2);  }}

The gist is that each unit cube cell has a randomly-placed point in it. for each point being evaluated by the shader, you find the distance to the nearest point (a value called "F1"), and the distance to the next-nearest ("F2"), etc (to as many as you care about - though anything past F4 starts to look similar and uninteresting). Using linear combinations of these distances gives interesting results:

Left: F1 Right: F2
Click to enlarge

Left: F2-F1 Right: (F1+F2)/2
Click to enlarge

Something cool to do, also, is to use Manhattan distance instead of standard Euclidian distance to calculate the distance. You end up with much more angular results. Here are the same 4 calculations, using manhattan distance:

Click to enlarge

Considering that a few levels of my current project will take place in a metallic fortress, this will especially come in handy.

So, what can you do with these?

I, predictably, have made a few test textures:

Click to enlarge

Also, it still looks pretty cool if you use fBm on it. For instance:

4 octaves of F1 Worley noise

But I hear you asking "duz it wrok n 3deez, Drilian?!?!?!" Oh, I assure you it does!

Click to enlarge

And now I hear you asking "Can u stop typing nau? I is tir0d of reedin." (or alternately, "I is tir0d uv looking @ imagez sparsely scattered thru the text taht I dun feel liek reedin.") To this, I say: Sure, but it worries me that you're asking your questions in some form of lolcat.

That's all I got.