# Design Tradeoffs For Customized Script Interpreters


A person can always use a common off-the-shelf scripting language, such as Lua, Python, Squirrel, or JavaScript, but sometimes there is reason to implement your own scripting language or interpreter, whether for practical reasons, as a learning experience, or just for fun. Writing a compiler or interpreter can be an interesting experience in its own right. (Granted, a person may ask whether implementing something like this is actually a good idea, but let's assume here that this much has already been decided.) But what sort of interpreter, for what sort of language? While there may be many possible choices in the design process, not all choices carry equal weight. There is also a wide range of relative complexity and performance, depending on what a person wants to do: anywhere from a simple interpreter that directly parses and executes lines of code line-by-line, up to producing highly optimized machine code with performance on par with that of traditional static compilers. A person may also realize that, for their uses, they never need to go much beyond a parser and directly executing the syntax tree, and this may well be a good option as well.

# General Design Tradeoffs

There may be a number of tradeoffs in the design of both a scripting language and the underlying interpreter. I will attempt to address a few of them here.

## Syntax

There are many directions a person can go with syntax. Often, this ends up boiling down to one of several major alternatives:
• Traditional C-family style syntax
• A specialized and often minimalistic syntax (for example, Lisp-style S-expressions)
• Pretty much everything else...

## Objects, Anyone?

Class/instance object systems are pretty much the standard solution to the matter of objects, but they are not the only solution. For example, a language may use prototype objects instead, or possibly an engine-specific 'entity' type as the primary, or sole, type of compound object.

Within class/instance, there is the tradeoff between single inheritance plus interfaces, and a full multiple inheritance model. Single inheritance has the advantage of being simpler to implement (since every child class simply appends its contents onto the parent class), and is generally good enough for common situations. Interfaces and default methods can also address many of the limitations of single inheritance. Multiple inheritance adds some possibilities, but also a lot of potential complexity, and a few ugly issues regarding language semantics. For example, what happens when a child class derives from two parents which both derive from a common parent? What happens if both parents provide an implementation of a given method?

Prototype objects are another possibility, where every object is essentially an unstructured bag of fields and methods which may be modified freely. Instead of class inheritance, we may instead use delegation, where each object may in turn link to other objects which implement other fields and behaviors. Another option is to still use class/instance, but make the ability to dynamically add fields or delegate to other objects available as optional features.
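As a rough illustration of the prototype/delegation idea, here is a minimal sketch in C. The struct and function names are hypothetical, and a real object system would also need to handle methods, hashing, and delegation cycles:

```c
/* Minimal sketch of prototype-style objects with delegation
 * (hypothetical names, not from any particular engine). */
#include <stdlib.h>
#include <string.h>

typedef struct Slot   Slot;
typedef struct Object Object;

struct Slot {
    char *name;        /* field name */
    void *value;       /* field value (boxed elsewhere) */
    Slot *next;
};

struct Object {
    Slot   *slots;     /* this object's own fields */
    Object *delegate;  /* object consulted when a field is not found here */
};

/* Look up a field, walking the delegation chain if needed. */
void *Obj_Get(Object *obj, const char *name)
{
    Slot *s;
    for (; obj; obj = obj->delegate)
        for (s = obj->slots; s; s = s->next)
            if (!strcmp(s->name, name))
                return s->value;
    return NULL;  /* not found anywhere in the chain */
}

/* Set (or add) a field directly on this object, shadowing any delegate's copy. */
void Obj_Set(Object *obj, const char *name, void *value)
{
    Slot *s;
    for (s = obj->slots; s; s = s->next)
        if (!strcmp(s->name, name)) { s->value = value; return; }
    s = malloc(sizeof(Slot));
    s->name  = strdup(name);
    s->value = value;
    s->next  = obj->slots;
    obj->slots = s;
}
```

Note the asymmetry: lookup walks the delegation chain, while assignment always writes to the object itself, shadowing any delegated copy.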

## Static vs Dynamic Types

Static types allow superior performance, but may place limits on the kinds of things which may be readily done, and by themselves may require going through contortions in order to build dynamic data structures (how do you build a structure which may hold several types of item if the language only ever allows a single type of item for each variable?). Dynamic types are easier initially, and can be optimized via type inference, but a VM using type inference is typically more complex than one using a simplistic static type system. By themselves, dynamic types may introduce the problem of needing to endlessly check types at runtime (reducing performance), and may hinder the ability of the compiler to detect or warn about "obvious" programming errors. While it may seem like dynamic typing eliminates the need for dealing with type issues, more often it simply shifts them from being a compile-time problem to being a run-time problem, and may serve to obscure holes or deficiencies in the type system, resulting in problematic edge-case behaviors. For a behavior to work, it either needs to exist statically (within the compiler itself) or dynamically (within the runtime).

A more reasonable tradeoff may be having both: statically typed values, plus a 'variant' type or similar which implements dynamically typed semantics. A drawback of doing this is that the implementor may need to deal with some of the issues of both static and dynamic type systems in order to make it work correctly. It may also require implementing alternate versions of many of the same operations (such as one operation for adding two integers as statically-typed values, and another for adding two integers as variant values).

There are also differences in how to best represent these type systems. For example, in a VM such as the JVM, each static type is represented via a "signature", which is an ASCII representation of the type (the .NET CLR also uses signatures, but they are binary). Each primitive type may be assigned a sequence of one or more characters, and complex types may be identified via a character sequence followed by a qualified name for the resource. For example (partly derived from the IA64 ABI's C++ name mangling scheme):
• 'a': signed char / byte (8-bit)
• 'b': bool (8-bit, packed 1-bit)
• 'c': char (8-bit)
• 'h': unsigned char / byte (8-bit)
• 's': short (16-bit)
• 't': unsigned short (16-bit)
• 'w': char (16 bit)
• 'i': int (32-bit)
• 'j': unsigned int (32-bit)
• 'x': long long / int64
• 'y': unsigned long long / uint64
• 'n': int128 (128-bit)
• 'o': unsigned int128 (128-bit)
• 'f': float (32-bit)
• 'd': double (64-bit)
• 'g': float128 (128-bit)
• 'v': void
• ...
Eventually we might run out of single lower-case letters, and could then start using letters like A/B/C/D/G or similar as prefixes to define additional types. Likewise, we could define complex types (a few worked examples follow after the list):
• Uqname; : Named extension type.
• Xqname; : Struct or value-class.
• Lqname; : Normal class.
• Ptype : Pointer to type.
• Rtype : Reference to type.
• Qtype : Unsized array of type.
• Asize;type : Sized array of type.
• ...
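To make this a bit more concrete, here are a few hypothetical declarations and the signature strings they might map to under a scheme like the one above (the class name "Foo" is made up, and the exact letters are whatever your VM defines):

```c
/* Illustrative declaration-to-signature mappings, assuming the
 * letter assignments listed above. */
int x;                 /* "i"      */
unsigned short u;      /* "t"      */
float *p;              /* "Pf"     */
int arr[16];           /* "A16;i"  */
struct Foo;            /* a value-class / struct type */
struct Foo *obj;       /* "PXFoo;" */
```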
A question would then be: what about function signatures? One nifty idea is putting the argument types in parentheses, with the return type as a suffix. For example, void foo(int x, float y); would have the signature (if)v, and void (*foo)(int x, float y) would be P(if)v. A related idea is doing something similar for modifier flags and attributes, which can be more flexible and extensible than a fixed-size flags field, and potentially more convenient than representing such things with explicit data structures. Granted, it may be necessary to unpack them into flags and data structures prior to use.

For dynamic or variant types, it may instead be more useful to include the type in the value itself. One common way to do this is via tags. For example, one idea would be to use 64-bit values, with the high-order 4 bits as a tag:
• 0, Pointer (Positive)
• 1, Integer (Positive, Fixnum)
• 2, Integer (Positive, Fixnum)
• 3, Double (Positive, Flonum)
• 4, Double (Positive, Flonum)
• 5, -
• 6, -
• 7, Subdivided Tag Space.
• 8, Flonum2 (Double >>> 4)
• 9, -
• 10, -
• 11, Double (Negative, Flonum)
• 12, Double (Negative, Flonum)
• 13, Integer (Negative, Fixnum)
• 14, Integer (Negative, Fixnum)
• 15, Pointer (Negative)
In this scheme, it can be noted that pointers are very easy to convert (all valid pointer values, say, on x86 or x86-64, are also valid references of the pointer type). Most of the common double range also maps exactly with no loss of precision (apart from very large/small values, which lose 4 bits), and the scheme allows integer values of up to 62 bits. It will still require another mechanism for identifying the type of on-heap objects, such as a type field in the object header.
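Here is a minimal sketch in C of how checking and converting such tagged 64-bit values might look, assuming the tag layout above. The encoding choices (the bias value, the helper names) are illustrative rather than the only way to do it:

```c
#include <stdint.h>
#include <assert.h>

typedef uint64_t Value;

#define TAG_OF(v)     ((unsigned)((v) >> 60))   /* high-order 4 bits */
#define TAG_PTR_POS   0
#define TAG_PTR_NEG   15

/* Pointers (tags 0 and 15) are already canonical x86-64 addresses
 * and can be used as-is. */
static int Val_IsPointer(Value v)
    { unsigned t = TAG_OF(v); return t == TAG_PTR_POS || t == TAG_PTR_NEG; }

/* Fixnums occupy tags 1-2 (positive) and 13-14 (negative), giving a
 * signed range of roughly +/- 2^61 (62 bits total). One possible
 * encoding: bias positive values up by 2^60 and negative values down
 * by 2^60, so they land in the fixnum tag regions. */
#define FIX_BIAS  (1ULL << 60)

static int Val_IsFixnum(Value v)
    { unsigned t = TAG_OF(v); return (t >= 1 && t <= 2) || (t >= 13 && t <= 14); }

static Value Val_FromFixnum(int64_t i)
{
    assert(i >= -(int64_t)(1ULL << 61) && i < (int64_t)(1ULL << 61));
    return (i >= 0) ? ((uint64_t)i + FIX_BIAS) : ((uint64_t)i - FIX_BIAS);
}

/* Caller is expected to have checked Val_IsFixnum() first. */
static int64_t Val_ToFixnum(Value v)
{
    unsigned t = TAG_OF(v);
    return (t <= 2) ? (int64_t)(v - FIX_BIAS) : (int64_t)(v + FIX_BIAS);
}
```

Doubles (flonums) would get similar treatment, offsetting or rotating the IEEE bit pattern so that common values land in the flonum tag ranges; that part is omitted here.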

## Lexical Scope and Closures

Sometimes we may want lexical scope and closures in a language. In this case, we need to refer not only to the current scope, but also to the captured enclosing scopes. One option is simply to make locals and arguments part of the lexical environment, using a similar representation regardless of whether or not anything is captured, possibly using a single index value to refer both to the current scope and to any parent scopes. However, this makes more of a mess of using the local environment for registers.

Another option is to use a 2D index (level and index), which is omitted for local variables and arguments:

    LXLOAD_I R0, 1, 5  //lexical load: level 1, index 5, into R0

This makes lexical scoping the special case, and may provide fast/simple access to locals. Another alternative is to instead make registers their own thing, and use explicit variable LOAD/STORE operations:

    LLOAD_I R0, A5  //local load: Arg5 into R0

However, this introduces additional operations and may potentially reduce performance.

Another question is whether captured variables should themselves be special, or whether closures will simply capture the entire parent binding frame (simple, but this may waste memory and prevent destruction of non-captured bindings). One option is to make captured locals special, using LXLOAD/LXSTORE for these bindings, in contrast to other non-captured bindings. Typically, we may also need an explicit operation to capture the current environment when creating a closure, and potentially to clean up the environment in the non-capture case. It may also make sense to handle more efficiently those lambdas which do not capture bindings, and maybe also those which can be determined not to exceed the lifetime of the parent scope.
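As a rough sketch of the 2D-index approach, the captured environment can be represented as a chain of binding frames, with the lexical-load operation walking up 'level' links to reach the right frame. The names and layout here are illustrative:

```c
/* Illustrative sketch: binding frames chained by 'parent', plus the
 * lexical-load operation walking up the chain. */
typedef union  ValueU { long long i; double d; void *p; } Value;
typedef struct Frame  Frame;

struct Frame {
    Frame *parent;   /* enclosing (captured) environment, or NULL */
    int    count;    /* number of bindings in this frame */
    Value *slots;    /* the bindings themselves */
};

/* LXLOAD: fetch binding 'index' from the frame 'level' steps up. */
static Value Env_LexLoad(Frame *env, int level, int index)
{
    while (level-- > 0)
        env = env->parent;
    return env->slots[index];
}

/* When creating a closure, the current frame is captured by simply
 * keeping a reference to it. A non-capturing lambda could skip this,
 * letting the frame be freed or reused when the function returns. */
typedef struct Closure {
    Frame *env;      /* captured defining environment */
    void  *code;     /* compiled body (bytecode, trace chain, ...) */
} Closure;
```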

## While() we're at it

Assuming we go beyond simply executing ASTs or similar, another question is how to best handle execution. A typical answer is that we compile to bytecode. But bytecode isn't the end of the story, since we will probably need to execute it somehow. A common and simple answer here is a "while()" loop around a big "switch()" statement, decoding each operation on every iteration of the loop:

```c
rs=0; ip=ctx->ip;
while(!rs)
{
    op=*ip++;
    switch(op)
    {
    ...
    case OP_ADD_I:
        c=*ip++; a=*ip++; b=*ip++;
        ctx->regs[c].i=ctx->regs[a].i+ctx->regs[b].i;
        break;
    ...
    }
}
```

While this works pretty well, there may be potentially faster options. For example: each instruction is a small struct containing a function pointer (to the instruction-specific logic), maybe some data, and the function pointer returns the next instruction. Then we can have an inner interpreter loop something like:

```c
void Ctx_Run(Context *ctx)
{
    cur=ctx->op;
    while(cur) { cur=cur->fcn(ctx, cur); }
}
```

With opcode functions something like:

```c
Opcode *Op_AddI(Context *ctx, Opcode *op)
{
    ctx->regs[op->c].i=ctx->regs[op->a].i+ctx->regs[op->b].i;
    return op->next;
}
```

With this, we may observe potentially faster raw execution speeds than when using a while loop and a switch. We can still use a while loop and a switch to build these instruction chains from the input bytecode, treating the chains primarily as an intermediate structure (a sketch of such a translation loop follows below).

Taken a little further, a person may make another observation: in the vast majority of cases, the operation simply, naively, and always returns the following instruction. A speedup can be gained by recognizing these cases, and grouping the instructions into "traces", which operate in a loop like the above (each trace then returns the next trace). The trace itself may then contain an unrolled version of the above loop:

```c
Trace *Tr_ExecuteBasic5(Context *ctx, Trace *tr)
{
    Opcode *op;
    op=tr->op;
    op=op->fcn(ctx, op);
    op=op->fcn(ctx, op);
    op=op->fcn(ctx, op);
    op=op->fcn(ctx, op);
    op=op->fcn(ctx, op);
    return tr->next;
}
```
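As a rough sketch of the translation step mentioned above, converting raw bytecode into a chain of function-pointer opcodes might look something like this. The struct layout and opcode numbering are made up for illustration, and Op_AddI is assumed to be defined as shown earlier:

```c
/* Illustrative translation from raw bytecode into a linked chain of
 * Opcode structs dispatched via function pointers. */
#include <stdlib.h>

typedef struct Context Context;
typedef struct Opcode  Opcode;
typedef Opcode *(*OpFcn)(Context *ctx, Opcode *op);

struct Opcode {
    OpFcn   fcn;        /* instruction-specific logic */
    int     a, b, c;    /* register operands */
    Opcode *next;       /* fallthrough instruction */
};

Opcode *Op_AddI(Context *ctx, Opcode *op);   /* as defined above */

/* Terminator: returning NULL stops the Ctx_Run() loop. */
Opcode *Op_End(Context *ctx, Opcode *op) { (void)ctx; (void)op; return NULL; }

/* Hypothetical bytecode numbering for this sketch. */
enum { OP_END = 0, OP_ADD_I = 1 };

Opcode *Translate(unsigned char *ip, unsigned char *end)
{
    Opcode *first = NULL, *last = NULL, *op;
    while (ip < end)
    {
        op = calloc(1, sizeof(Opcode));
        switch (*ip++)
        {
        case OP_ADD_I:
            op->fcn = Op_AddI;
            op->c = *ip++; op->a = *ip++; op->b = *ip++;
            break;
        case OP_END:
        default:
            op->fcn = Op_End;
            break;
        }
        if (last) last->next = op; else first = op;
        last = op;
    }
    return first;
}
```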

## Getting The JITters

The next step in the quest for high execution speeds may be writing a JIT. This can range from fairly simple to much more complex.

A simple strategy is to produce "call threaded code", which basically just means that the "traces" above are spit out as a chain of call instructions wrapped in a function call/return sequence. This allows getting a little closer to the execution speeds of native code without significantly increasing complexity.

The next step up from this is to directly handle certain instruction sequences: instead of emitting a call to 'AddI', directly emit the machine-code sequence for adding the integers and storing off the result. Lots of other common operations can then be given similar treatment, while perhaps still relying on the interpreter for more complex operations. Beyond this, a person might start getting into things like register allocation and peephole optimization, but something like this is getting a bit more advanced (there be dragons here...).

There are various ways to handle emitting the machine-code sequences, ranging from direct options, like directly emitting byte sequences into a buffer, to slightly more glossing over it (say, using an ASCII "command notation" mixing hex values with shorthand notation for generating other sequences of bytes), to something like using an assembler.
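As a rough illustration of what "call threaded code" can look like, here is a sketch for x86-64 under the System V ABI (e.g. Linux). For each opcode in a straight-line trace it emits a load of the context and opcode pointers followed by a call to the handler. The function names and struct layout are made up, and a real JIT would also need to worry about W^X policies, buffer bounds, error handling, and instruction-cache flushing on some architectures:

```c
/* Call-threaded code sketch for x86-64 (System V ABI): for each opcode,
 * emit "mov rdi,ctx; mov rsi,op; mov rax,fcn; call rax". The returned
 * next-opcode is ignored, since the trace is assumed to be straight-line.
 * The Opcode structs must remain live, as their addresses are embedded. */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef struct Context Context;
typedef struct Opcode  Opcode;
typedef Opcode *(*OpFcn)(Context *ctx, Opcode *op);
struct Opcode { OpFcn fcn; int a, b, c; Opcode *next; };  /* repeated for self-containment */

typedef void (*TraceFn)(Context *ctx);

static uint8_t *emit(uint8_t *p, const void *src, int n)
    { memcpy(p, src, n); return p + n; }

static uint8_t *emit_mov_imm64(uint8_t *p, uint8_t opbyte, uint64_t imm)
{
    /* REX.W + (B8+reg), imm64: mov r64, imm64 */
    *p++ = 0x48; *p++ = opbyte;
    return emit(p, &imm, 8);
}

TraceFn JitTrace(Opcode *first)
{
    uint8_t *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    uint8_t *p = buf;
    Opcode  *op;

    p = emit(p, "\x53",         1);   /* push rbx (also keeps calls 16-byte aligned) */
    p = emit(p, "\x48\x89\xFB", 3);   /* mov rbx, rdi   (rbx = ctx) */

    for (op = first; op; op = op->next)
    {
        p = emit(p, "\x48\x89\xDF", 3);                             /* mov rdi, rbx */
        p = emit_mov_imm64(p, 0xBE, (uint64_t)(uintptr_t)op);       /* mov rsi, op  */
        p = emit_mov_imm64(p, 0xB8, (uint64_t)(uintptr_t)op->fcn);  /* mov rax, fcn */
        p = emit(p, "\xFF\xD0", 2);                                 /* call rax     */
    }

    p = emit(p, "\x5B", 1);           /* pop rbx */
    p = emit(p, "\xC3", 1);           /* ret     */

    return (TraceFn)(void *)buf;      /* data-to-function cast: common practice in JITs */
}
```

The emitted function keeps the context pointer in a callee-saved register (rbx), so each handler call only needs to reload rdi and rsi before the call.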

## Interesting Points

I have done a lot of this before, in some cases with good experiences, and in other cases with much pain, spending time on things only to have them turn out to be useless, or fall apart, or turn out badly with little ability to make them better. In some ways, this is just life sometimes. For example, my first real code generator became very ambitious, with me trying to write essentially a full-featured code generator like one used in a "real" compiler. This didn't turn out well; ultimately I was unable to fully debug the thing, and it has since been dropped from the project. However, with a much simpler JIT, I can get speeds not drastically slower than native C, and sometimes this is good enough. Do we really need to try to compete performance-wise with optimized C compiler output? Not necessarily.

# Conclusion

Whether or not a person actually goes and does something like this is up to them; it can be a fun or interesting experience, or it can potentially end up eating years of a person's life. Not every project or language needs a complex interpreter with a JIT and so on; sometimes something simpler, like directly parsing and executing commands, or simply parsing and executing a syntax tree, may well turn out to be sufficient for the task at hand.

# Article Update Log

2013-04-04: Writing article
