Coding and Abstraction

posted in The Bag of Holding

Published December 05, 2005

I'm a sucker for advice. Which isn't to say that I'm particularly keen on receiving it, even on the rare occasions when I have the sense to solicit it. I'm much more keen on giving it, and usually on occasions when it isn't solicited. I'm sure this says something highly unflattering about my character, but I'm not asking for your advice on my character. You're here to get some advice from me. You just don't know it yet.

So bear with me for a few minutes while I invent some advice to give you. In the meantime, I'm going to stall.

One of the things that I like to tell people is to code at the highest possible level of abstraction. I tell them this even if they ask me what toppings they should get on their pizza. I also stole it from people who are much smarter than me (but then, isn't advice all about stealing things from smarter and wiser people?).

The basic idea behind this is that good programming and good design center around abstraction. Bits are an abstraction of electrical voltage levels. Bytes are an abstraction of bits. Integers are an abstraction of bytes. Beating the digital crap out of zombies is an abstraction of (among many other things) integers. I suspect there's something that's an abstraction of beating the digital crap out of zombies, and so on ad infinitum, but access to those planes seems to be regulated by Zen and/or LSD.
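
To make one rung of that ladder concrete, here is a small C++ sketch: four bytes are just bytes until some code agrees to read them as a 32-bit integer. The little-endian byte order is an assumption chosen for the example, and bytesToUint32 is a made-up name; nothing above says how the bytes are laid out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Read four raw bytes as one unsigned 32-bit integer.
// Assumption: the bytes are stored least-significant first (little-endian).
std::uint32_t bytesToUint32(const std::array<std::uint8_t, 4>& bytes) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Byte i supplies bits 8*i through 8*i + 7 of the result.
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

Called on the bytes 0x78 0x56 0x34 0x12, it returns 0x12345678. Code working at the integer level never has to think about which byte went where.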

In practical terms, coding at high levels of abstraction means using the right tools for the job. There's a whole design aspect to it that I could spend a lot of time discussing, but if I do that I'll forget what I was really talking about and we'll all leave with our minds full of garbage and feeling vaguely drugged. In terms of pure coding, though, the big players are tools and languages. I'll focus on languages, since the point I originally set out to make with all this drivelling had to do with languages.

Languages are beautiful tools of abstraction. They're a way to make the intent of a block of code clear to the programmer. CPUs only understand opcodes; few programmers, however, understand them (I sure as heck don't!). Assembler is a step up, because we've got nice English-looking letters and such instead of those icky hex numbers. But the intent of assembler code is hardly clear at a glance, unless the code is laced with enough comments to make War and Peace look like a History Channel factoid. The progression continues up through all the usual suspects: C, C++, Smalltalk, Java, BASIC, Python, et al. So-called "high-level" languages are high not in terms of being deeply acquainted with reefer, but in terms of being highly abstract. Python knows about strings intrinsically. Assembler doesn't.
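
The same contrast shows up on a smaller scale inside a single language. In this C++ sketch (C++ standing in for both ends, since neither assembler nor Python appears here), the first function treats a string the way lower-level code has to, as a run of bytes ending in a NUL terminator that must be scanned by hand; the second uses std::string, which carries its own length. Both function names are invented for the illustration.

```cpp
#include <cstddef>
#include <string>

// Low-level view: a string is just bytes ending in '\0', so we must find the end ourselves.
std::size_t lengthByHand(const char* text) {
    std::size_t length = 0;
    while (text[length] != '\0') {
        ++length;
    }
    return length;
}

// Higher-level view: the type already knows its length, so the intent is a single call.
std::size_t lengthWithStdString(const std::string& text) {
    return text.size();
}
```

Both return 6 for "zombie", but only the second one says what it means without making the reader trace a loop.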

High abstraction (usually) means easier implementation. There are some exceptions, but they are fairly easy to recognize when they occur. I'm making this up on the fly, so I'll pretend to have lots of examples and leave thinking them up as an exercise for the reader. Games have been using scripting for years as a way to exploit abstraction. Writing code in a scripting engine is much more abstract, and therefore more efficient to write, than doing it in raw C++, or assembler, or hooking up a battery to the pins on the CPU and tapping in the signals by hand.

Scripting for abstraction has three benefits. First, and most importantly, it makes intent clear. PoliceShip.StartKillEnemiesAndLand(); is a real-life script command from Egosoft. Isn't that a lot more obvious than three pages of threading code, calls into various AI, collision detection, and 3D rendering libraries, and a handful of housekeeping logic? You don't even have to know KC (the scripting language sampled) to know what it does.

The second benefit is that it promotes encapsulation. In KC, there's not a magic function StartKillEnemiesAndLand() that ties directly into the engine; there's actually a complete game logic system built in the language, and the low-level engine calls are quite a bit more basic and atomic than that. But we never have to worry about them, because they're wrapped in nice, simple, abstract calls. Entire dramatic battles can be laid out and set into blazing, exploding motion with just a few lines of abstract code.

The third benefit is that logic is localized to a single place. For instance, if there's a bug in the way one ship does StartKillEnemiesAndLand(), we can fix it once, and all ships will benefit from the fix. The logic for that operation is in one spot, not scattered implicitly across thousands of lines of engine code.
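
To make the encapsulation and fix-it-once points concrete, here is a C++ sketch. It is not KC code and not Egosoft's engine: it assumes three hypothetical atomic primitives (engineMoveTo, engineFireAt, engineLand) and wraps them in one intent-revealing command that only borrows its name from the example above.

```cpp
#include <string>
#include <vector>

// Hypothetical ship state. The log records which primitive ran, in order.
struct Ship {
    std::string name;
    int x = 0;
    int y = 0;
    bool landed = false;
    std::vector<std::string> log;
};

// Hypothetical low-level "engine" primitives: small, atomic, and intent-free.
void engineMoveTo(Ship& ship, int x, int y) {
    ship.x = x;
    ship.y = y;
    ship.log.push_back("move");
}
void engineFireAt(Ship& ship, const Ship& target) {
    ship.log.push_back("fire at " + target.name);
}
void engineLand(Ship& ship) {
    ship.landed = true;
    ship.log.push_back("land");
}

// The high-level command: one readable call that states intent and hides the sequence.
void startKillEnemiesAndLand(Ship& ship, const std::vector<Ship>& enemies, int padX, int padY) {
    for (const Ship& enemy : enemies) {
        engineMoveTo(ship, enemy.x, enemy.y);
        engineFireAt(ship, enemy);
    }
    engineMoveTo(ship, padX, padY);
    engineLand(ship);
}
```

If the landing step turns out to be wrong, the fix goes into startKillEnemiesAndLand once, and every ship that calls it picks up the change.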

Scripting is good, but it's usually restricted to a simple (and false) dichotomy: engine vs. scripts. One of the things that I've thought about after reading The Pragmatic Programmer is that this should be a continuum, not a set of discrete layers. Of course it's eventually going to resolve into discrete layers (i.e. several different languages), because we haven't invented continuum languages yet.

At Egosoft, we have one of these dichotomies. There's an engine structure, with all of the modules and libraries and such built in, and there's the game logic layer implemented in KC. The KC layer has its own modules, libraries, and structure. It knows quite a lot about the engine, but that knowledge is constrained to wrapper functions and layers. In fact, KC even implements another, highly abstract scripting engine. The script engine controls things like AI and various goings-on in the universe. However, it's too abstract; it doesn't provide access to things like the menu system or the 3D engine. It could, but adding that kind of access is neither easy to build nor easy to use.

This is going somewhere... I think. Bear with me while I stall a bit more and pretend to have a purpose. (I'm really just drooling on my keyboard and seeing how long you'll watch before you give up and go play Ninja Loves Pirate.)

Engines, and scripting logic, have implicit layers of abstraction of their own; this is where design comes into play. For those of us who embrace the holy truth of OOP, we've got things like class hierarchies that let us abstract and encapsulate. A typical design has a lot of "basic worker" classes that exist simply to do specific things, and "logic" code that makes use of the workers to actually do something useful, like make the heads on zombies explode. In a scripted design, a lot (but not all) of this logic will be in the form of scripts, perhaps with additional layers of abstraction on top of that.

However, most of these layers of abstraction are split between a very small number of languages. The most I've seen is four, on X3 (X2 also used a similar model): assembler, C++, KC, and the scripting engine. Each language hosts several implicit layers, and each of those layers needs only a specific subset of the language's capability. Specifically, layer of abstraction N needs only the ability to talk to layer N-1, and the ability to expose functionality to layer N+1 if needed.
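
The N / N-1 / N+1 rule can be sketched in C++ with constructor injection: each layer holds a reference to the layer directly below it and nothing else, and its public member functions are the vocabulary it offers to the layer above. The three layers and their methods are hypothetical.

```cpp
#include <string>
#include <vector>

// Layer 0: raw "do stuff" operations.
class RenderLayer {
public:
    void drawSprite(const std::string& sprite, int x, int y) {
        calls.push_back(sprite + "@" + std::to_string(x) + "," + std::to_string(y));
    }
    std::vector<std::string> calls;  // what was drawn, kept so it can be inspected
};

// Layer 1: talks only to layer 0 and exposes a narrower, more meaningful vocabulary.
class EffectsLayer {
public:
    explicit EffectsLayer(RenderLayer& below) : render_(below) {}
    void explosion(int x, int y) {
        render_.drawSprite("flash", x, y);
        render_.drawSprite("debris", x, y);
    }
private:
    RenderLayer& render_;  // the only layer this one may talk to
};

// Layer 2: talks only to layer 1; it has no handle on drawSprite at all.
class GameLogicLayer {
public:
    explicit GameLogicLayer(EffectsLayer& below) : effects_(below) {}
    void zombieHeadshot(int x, int y) { effects_.explosion(x, y); }
private:
    EffectsLayer& effects_;
};
```

Calling zombieHeadshot(3, 4) ends up as two draw calls in layer 0, but nothing in GameLogicLayer knows that sprites exist.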

I have a vague feeling that this can be exploited. For instance, instead of writing layers of abstraction in the same language, why not make a simple language framework and build each layer in a separate "dialect"? Stuff like template metaprogramming in C++ comes close to this, but is still constrained to a single dialect. What I'm thinking is more along the lines of having a "language template" where the basic control structures and syntax are specified, but the available entities are generated dynamically from the lower layer. Basically, you could have an engine layer in C++ (or whatever) that does all of your "do stuff" code, and then a scripting framework engine. We'll call the "do stuff" engine layer 0. Layer 1 can use some kind of info about layer 0 (an automatically generated map of the classes, maybe?) to build a scripting dialect that the script engine can interpret. Then, layer 1 can build up some abstractions and "do stuff" layers of its own, and expose a dialect that can be spoken up in layer 2. Repeat this as much as you need.
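
Here is a rough C++ sketch of the dialect idea, under heavy simplifying assumptions: a dialect is just a table of words, each word runs an action, and a script is a whitespace-separated list of words. C++ has no built-in reflection, so the "automatically generated map" of layer 0 is written out by hand; generating it is beyond this sketch. Dialect, runScript, and buildDialect are names invented here.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// A "dialect": the vocabulary one layer is allowed to speak.
// Each word maps to the action it performs.
using Dialect = std::map<std::string, std::function<void()>>;

// Minimal interpreter: run each whitespace-separated word of the script in order.
// A word that is not in the dialect is an error; that is what keeps a layer in its lane.
void runScript(const Dialect& dialect, const std::string& script) {
    std::istringstream words(script);
    std::string word;
    while (words >> word) {
        auto entry = dialect.find(word);
        if (entry == dialect.end()) {
            throw std::runtime_error("unknown word in this dialect: " + word);
        }
        entry->second();
    }
}

// Build the dialect for the next layer up. Each new word is defined by a script
// written in the lower layer's dialect, and only the new words are exposed upward.
Dialect buildDialect(const Dialect& below,
                     const std::map<std::string, std::string>& definitions) {
    Dialect above;
    for (const auto& definition : definitions) {
        const std::string& word = definition.first;
        const std::string& body = definition.second;

        // Check the body now, so a bad definition fails when the layer is built.
        std::istringstream words(body);
        std::string used;
        while (words >> used) {
            if (below.count(used) == 0) {
                throw std::runtime_error("'" + word + "' uses unknown word: " + used);
            }
        }

        // Copy the lower dialect into the action so the new word stays valid on its own.
        above[word] = [below, body]() { runScript(below, body); };
    }
    return above;
}
```

A layer 2 would call buildDialect on layer 1's dialect the same way and could use only the words layer 1 chose to define; a script at that level that tries to say a layer 0 word is rejected, both when a definition is built and when a script runs.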

The cost? It'd take a lot of up-front work to build such a system, and it would have to be done from scratch. The benefits? Many. Firstly, every layer speaks a dialect of the same language. One of the things that bugs the crap out of me with Egosoft's method is that each layer is a different language entirely; I don't know the highest level of the scripting system, but I know the lower three. That seems backwards to me. I should be able to work at the highest possible level of abstraction, always.

The second benefit is localization of knowledge. Having discrete layers promotes encapsulation, and demands a good design. To wit, it ensures that each layer does precisely what it should - no more, no less. If it tries to do more, it will fail, because each layer's dialect doesn't have the vocabulary to do it. If it tries to do less, the system won't run - it may not even compile.

The real bottom line, though, is that each level of abstraction is automatically the right one. Each level is built on the knowledge of the level below it, and the dialect of the scripting language at that level does precisely what it needs to do. Each layer is therefore the optimal layer to do the work of that layer. You don't have to worry about whether or not Language Foo is the Right Tool For The Job; you fabricate the right tool.

I have doubts. I'm not sure if this is really practical in a large-scale project. I have a very clear idea of how I'd do it (down to building the script engine itself and the layer-generation mechanisms), but I'm not really sure how I'd use it in a real-world system. I think it might look different in practical use than in theory; there might be some automatic generation that creates "scripting" that is actually compiled C++ for performance reasons, while non-performance-critical stuff can be done in bytecode-compiled languages or even interpreted languages. The cool thing is, if the scripting dialect generator is built right, it should be able to make a dialect that can target any of those endpoints. This means that the same scripting language, syntax, and philosophy can be "compiled" to C++, Foobletch, bytecode, or even straight interpreted. It could even change "compile targets" dynamically; does Layer N not run fast enough interpreted? Drop it down a layer and bytecode-compile it. One extra step of preprocessing before your build is done, sure, but if you have a good automated build system, that just means you can read one more post on GDNet per build than before. Bytecode not doing the job? Compile it straight into your engine by generating C++ code from the script on the fly.
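
A hedged C++ sketch of the "pick your compile target later" idea: the script is parsed once into a neutral form, and that same form is then either interpreted on the spot or turned into C++ source for the engine build. A bytecode backend would be a third function over the same parsed form. parseScript, interpret, and generateCpp are invented names, and treating every command as a zero-argument call is a simplifying assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse once: the script becomes a list of command names, independent of any backend.
std::vector<std::string> parseScript(const std::string& source) {
    std::vector<std::string> commands;
    std::istringstream words(source);
    std::string word;
    while (words >> word) {
        commands.push_back(word);
    }
    return commands;
}

// Backend A: straight interpretation. The caller supplies how each command is dispatched.
template <typename Dispatch>
void interpret(const std::vector<std::string>& commands, Dispatch dispatch) {
    for (const std::string& command : commands) {
        dispatch(command);
    }
}

// Backend B: emit C++ source that makes the same calls, to be compiled into the engine.
// Assumption: the engine provides a zero-argument function for every command name.
std::string generateCpp(const std::string& functionName,
                        const std::vector<std::string>& commands) {
    std::string out = "void " + functionName + "() {\n";
    for (const std::string& command : commands) {
        out += "    " + command + "();\n";
    }
    out += "}\n";
    return out;
}
```

For the script "aim fire land", interpret calls the dispatcher three times in order, while generateCpp("policeRoutine", ...) produces a four-line C++ function making the same three calls. Switching targets means calling a different backend on the same parsed script.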

I think I'll give it a shot with the Habanero engine. I've already sneakily built the basic layers so that they can be transported to other projects trivially. If this multiple-layer scheme pays off, it could usher in a whole new level of reusable code in my own work. That would be cool.

Now I know you've sat through this whole thing, eagerly awaiting the bit of advice that I promised you at the beginning. Well, I don't believe you. I think you just skipped to the end to get the juicy advice, and didn't mess with all that scary-looking nonsense up there. Well I'll show you: no advice! Hah!


Comments

evolutional

That post made a good read, especially for someone who's building a scripting environment into an engine.

December 06, 2005 11:26 AM

Thunder_Hawk

I don't think I've ever laughed as much reading something mildly informative as I did reading this post. You deserve something for that at least. [wink]

December 14, 2005 03:18 PM


Author: ApochPiQ
