
# Om script - template special :)


Following on from my last entry, where I "discovered" that you could use templates to provide a common interface to two unrelated classes, I've now implemented this technique in Om, my scripting compiler and virtual machine, and thought I'd write it up.

Om has various `Om::Value` types, including `Om::String` and `Om::List`. Strings are strings of characters, and lists are lists of `Om::Value`s that can themselves contain strings, lists and so on. Both are reference-counted entities under the hood.

Their behaviour and use are very different, but the operations you can perform on them overlap a great deal. For example, both have a built-in `length` property that returns either the length of the string or the number of values in the list as an `Om::Int`. You can apply `operator[]` with an `Om::Int` argument which, for the list, returns the value at that index but, for the string, returns an `Om::String` containing the single character at that index (char types are not supported by this language).

In the previous incarnation of this project, I wrote all of this twice, once for the list and once for the string. It wasn't that bad, but it always felt like there should be a way to combine the overlap into one set of code.

So in this latest incarnation, using template specialization, that is what I have been able to do. First, here is a rough outline of the `Entity` classes that are used to store the data under the hood. Note that for the purposes of this topic, the fact that both inherit from `Entity` is irrelevant, and all of this would work equally well with completely unrelated classes.

```cpp
namespace Om
{

class StringEntity : public Entity
{
public:
    pod_string text; // pod_string is an internally used string class
};

class ListEntity : public Entity
{
public:
    pod_vector<TypedValue> elements; // pod_vector is an internal vector class, TypedValue is explained below
};

}
```

`TypedValue` is a POD class used to store values internally, and its reference count has to be incremented and decremented manually. From a user perspective, the `Om::Value` class is used publicly, and it looks after this automatically. But internally we often need to, for example, remove a `TypedValue` from the stack, use it in a sort of "expiring" state, then decrement it when we are done.
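To illustrate the split between a raw POD handle and a wrapper that looks after the count for you, here is a minimal sketch. None of these names come from Om: `Heap`, `RawRef` and `ScopedRef` are hypothetical stand-ins, and the "heap" is just a vector of counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an entity store: one reference count per slot.
struct Heap {
    std::vector<int> counts;

    std::size_t allocate() { counts.push_back(0); return counts.size() - 1; }
    void inc(std::size_t id) { ++counts[id]; }
    void dec(std::size_t id) { assert(counts[id] > 0); --counts[id]; }
};

// POD handle: copying it never touches the count, so the caller
// has to pair every inc with a dec by hand.
struct RawRef {
    std::size_t id;
};

// Wrapper for public use: increments on construction and copy,
// decrements on destruction.
class ScopedRef {
public:
    ScopedRef(Heap &h, RawRef r) : heap(h), ref(r) { heap.inc(ref.id); }
    ScopedRef(const ScopedRef &o) : heap(o.heap), ref(o.ref) { heap.inc(ref.id); }
    ScopedRef &operator=(const ScopedRef &) = delete;
    ~ScopedRef() { heap.dec(ref.id); }

    RawRef raw() const { return ref; }

private:
    Heap &heap;
    RawRef ref;
};

// Records the count while a wrapper is alive and again after it has gone.
inline std::vector<int> demoCounts() {
    Heap heap;
    RawRef r{ heap.allocate() };
    std::vector<int> seen;
    {
        ScopedRef owner(heap, r);
        seen.push_back(heap.counts[r.id]);
    }
    seen.push_back(heap.counts[r.id]);
    return seen;
}
```

The raw handle is cheap to shuffle around inside the machine, at the cost of having to remember the matching `dec` yourself.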

`TypedValue` is defined partially like this:

```cpp
class TypedValue
{
public:
    TypedValue() : t(Om::Type::Null) { *reinterpret_cast<int*>(d) = 0; }
    explicit TypedValue(Om::Type type, int v) : t(type) { *reinterpret_cast<int*>(d) = v; }
    explicit TypedValue(Om::Type type, uint v) : t(type) { *reinterpret_cast<uint*>(d) = v; }
    explicit TypedValue(Om::Type type, float v) : t(type) { *reinterpret_cast<float*>(d) = v; }
    explicit TypedValue(Om::Type type, bool v) : t(type) { *reinterpret_cast<int*>(d) = v ? 1 : 0; }
    TypedValue(Om::Type type, const char *data) : t(type) { *reinterpret_cast<int*>(d) = data ? *reinterpret_cast<const int*>(data) : 0; }

    Om::Type realType() const { return t; }
    Om::Type userType() const { return t; }

    const char *data() const { return d; }

    int toInt() const { return *(reinterpret_cast<const int*>(d)); }
    uint toUint() const { return *(reinterpret_cast<const uint*>(d)); }
    float toFloat() const { return *(reinterpret_cast<const float*>(d)); }
    bool toBool() const { return *(reinterpret_cast<const int*>(d)) ? true : false; }
};
```

So you need to use it carefully and ensure that you have checked the type before you call the `toWhatever()` methods and so on, unlike the `Om::Value` that the user sees, which is checked.
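The "check the type first" discipline can be sketched on its own. This is only an illustration under my own assumptions: `Kind`, `Tagged`, `makeInt`, `makeFloat` and `readInt` are hypothetical names, the value is fixed at four bytes, and I have swapped in `std::memcpy` for the `reinterpret_cast` style above because it sidesteps aliasing questions in a standalone example.

```cpp
#include <cassert>
#include <cstring>

enum class Kind { Null, Int, Float };  // hypothetical, far smaller than Om::Type

// Hypothetical tagged POD value: a type tag plus four raw bytes.
struct Tagged {
    Kind kind;
    char bytes[4];
};

inline Tagged makeInt(int v) {
    static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
    Tagged t{ Kind::Int, {} };
    std::memcpy(t.bytes, &v, sizeof v);
    return t;
}

inline Tagged makeFloat(float v) {
    static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
    Tagged t{ Kind::Float, {} };
    std::memcpy(t.bytes, &v, sizeof v);
    return t;
}

// Checked read: the bytes are only interpreted once the tag has been confirmed.
// On a mismatch, out is left untouched and false is returned.
inline bool readInt(const Tagged &t, int &out) {
    if (t.kind != Kind::Int) return false;
    std::memcpy(&out, t.bytes, sizeof out);
    return true;
}
```

A checked wrapper would do this test on every access, whereas the raw internal value leaves it to the caller.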

Looking at the `length` property first: when the compiler builds a dot-syntax node, it emits instructions to push the target onto the stack, followed by the text ID of the right-hand side (stored in a reference-counted text cache), and then either a `GetMb` or `PutMb` instruction depending on whether we are reading or writing.

So if we have:
```
var o = "hello";
out o.length;
```

We end up with something like:

Abstract Syntax Tree:
```
block
    var [o]
        string [hello]
    out
        dot [length]
            symbol [o]
    out newline
```

Generated Virtual Machine Code:

```
 0: MkEnt string 4
 6: GetLc 2
11: GetMb 2
16: Out
17: OutNl
18: PopN 2
23: Ret
```

Here, `length` has an id of 2 in the text cache, so when `GetMb` is called, the left side of the dot is on top of the stack, and the `length` text id is encoded into the `GetMb` instruction.
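As a rough picture of what handling such an instruction involves, here is a small sketch of a member read on a toy stack. Everything in it is a hypothetical simplification rather than Om's machine: `Value` only knows strings and ints, `Instr` carries a single operand, and `kLengthId` is hard-coded to the id quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical value type: either a string or an int.
struct Value {
    bool isString;
    std::string text;
    int number;
};

// Hypothetical instruction: an opcode plus the text id encoded as its operand.
enum class Op { GetMb };
struct Instr {
    Op op;
    unsigned textId;
};

const unsigned kLengthId = 2;  // mirrors the id that length has in the listing above

// Pops the target, resolves the member by id and pushes the result.
// Returns false (with the target already popped) for an unsupported member or target.
inline bool execGetMb(const Instr &in, std::vector<Value> &stack) {
    assert(in.op == Op::GetMb && !stack.empty());
    Value target = stack.back();
    stack.pop_back();
    if (in.textId == kLengthId && target.isString) {
        stack.push_back(Value{ false, std::string(), static_cast<int>(target.text.size()) });
        return true;
    }
    return false;
}
```

The point is simply that the instruction itself says which member is wanted, while the stack says what it is wanted from.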

To unify the code so that a template function can work on either a `StringEntity` or a `ListEntity`, we first need a common interface for the two types. I chose the generic name `Sequence` for this and wrote the following header:

```cpp
template<class T> class Sequence { };

template<> class Sequence<StringEntity>
{
public:
    Sequence(StringEntity &e) : e(e) { }

    int length() const { return static_cast<int>(e.text.length()); }

    TypedValue get(State &state, int index) const
    {
        TypedValue v(Om::Type::String, state.allocate<StringEntity>());
        StringEntity &c = state.entity<StringEntity>(v.toUint());

        c.text.append(e.text[index]);
        return v;
    }

    void set(State &state, int index, const TypedValue &v, Om::Value &result)
    {
        pod_string c = state.entity<StringEntity>(v.toUint()).text;
        if(c.length() > 1)
        {
            result = Om::ValueProxy::makeError(state, stringFormat("cannot assign multiple characters via subscript - ", c), 0);
            return;
        }

        e.text[index] = c[0];
    }

private:
    StringEntity &e;
};

template<> class Sequence<ListEntity>
{
public:
    Sequence(ListEntity &e) : e(e) { }

    int length() const { return static_cast<int>(e.elements.size()); }

    TypedValue get(State &state, int index) { return e.elements[index]; }

    void set(State &state, int index, const TypedValue &v, Om::Value &result)
    {
        if(!dec(state, e.elements[index], result)) return;

        e.elements[index] = v;
        inc(state, v);
    }

private:
    ListEntity &e;
};
```

The `length`, `get` and `set` methods are what we need to implement the current operations, and they are implemented very differently, as you can see. For example, the list needs to increment the reference count of its value when adding it to the list, whereas the string has to construct a new `StringEntity` to return the character at the given index.
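Stripped of the Om-specific types, the same pattern can be tried out with standard containers. This is a minimal sketch, assuming `std::string` and `std::vector<std::string>` as the two unrelated classes; `Seq` and `lastOf` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Primary template left empty: only the specialisations are usable.
template<class T> class Seq { };

template<> class Seq<std::string> {
public:
    explicit Seq(std::string &s) : s(s) { }
    int length() const { return static_cast<int>(s.size()); }
    // A string yields a new one-character string for the given index.
    std::string get(int i) const { return std::string(1, s[static_cast<std::size_t>(i)]); }
private:
    std::string &s;
};

template<> class Seq<std::vector<std::string>> {
public:
    explicit Seq(std::vector<std::string> &v) : v(v) { }
    int length() const { return static_cast<int>(v.size()); }
    // A list simply hands back the stored element.
    std::string get(int i) const { return v[static_cast<std::size_t>(i)]; }
private:
    std::vector<std::string> &v;
};

// Written once against the Seq interface, instantiated per container type.
template<class T> std::string lastOf(T &container) {
    Seq<T> seq(container);
    assert(seq.length() > 0);
    return seq.get(seq.length() - 1);
}
```

Calling `lastOf` on a type with no `Seq` specialisation fails at compile time, because the empty primary template has no `length` or `get`.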

But having these written, we can now express operations in a function templated on the entity type.

In the `Machine` class, executing a `GetMb` or `PutMb` instruction is handled by first looking for the built-in methods.

```cpp
InternalMethod findInternalMethod(Om::Type type, uint id)
{
    TRACE;

    InternalMethod m;
    switch(type)
    {
        case Om::Type::String: m = sequence_method<StringEntity>(id); break;
        case Om::Type::List: m = sequence_method<ListEntity>(id); break;

        default: break;
    }

    if(m.valid()) return m;

    // snip, handle other things

    return InternalMethod();
}
```

So here, if we have a string or a list, the first thing we do is call `sequence_method`, templated on the correct type, and the compiler generates a version of `sequence_method` for each type for us. We only have to write it once:

```cpp
template<class T> void sequence_length(State &state, const TypedValue &v, Stack &vs, Om::Value &result)
{
    TRACE;
    vs.push_back(TypedValue(Om::Type::Int, static_cast<int>(Sequence<T>(state.entity<T>(v.toUint())).length())));
}

template<class T> InternalMethod sequence_method(uint id)
{
    TRACE;

    switch(id)
    {
        case DefinedStrings::Length: return Properties::sequence_length<T>;

        default: break;
    }

    return InternalMethod();
}
```

Note how the `Sequence` interface is then used to extract the `length` from the entity.
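The key trick here is that one function template produces differently typed implementations that all share a single, uniform callable type. A minimal sketch of that idea, assuming a plain function pointer taking `const void *` in place of `InternalMethod` (the names `Method`, `lengthOf`, `findMethod` and the `Length` id are all hypothetical):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical uniform method type: every instantiation fits this signature.
using Method = int (*)(const void *);

enum MemberId : unsigned { Length = 2 };  // id value is illustrative only

// One implementation per T, generated by the compiler on demand.
template<class T> int lengthOf(const void *p) {
    return static_cast<int>(static_cast<const T *>(p)->size());
}

// Lookup written once; returns nullptr when the id is not a known member.
template<class T> Method findMethod(unsigned id) {
    switch (id) {
        case Length: return &lengthOf<T>;
        default: break;
    }
    return nullptr;
}
```

The caller picks the template argument from a runtime type tag, as the `switch` in `findInternalMethod` does, and from then on only deals with the uniform `Method` type.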

Similarly, with `operator[]` on a string or a list, the compiler will push the target onto the stack, then the expression inside the square brackets, then call `GetSc` or `PutSc`. We translate this into a call to the template method `rangeOp`.

```cpp
bool Machine::sc(AccessType type, Om::Value &result)
{
    TRACE;

    TypedValue v = vs.pop_back();
    TypedValue o = vs.pop_back();

    TypedValueGuard guard({ v, o });

    // snip, handle other uses

    switch(o.realType())
    {
        case Om::Type::String: rangeOp<StringEntity>(state, type, vs, o, v, result); break;
        case Om::Type::List: rangeOp<ListEntity>(state, type, vs, o, v, result); break;

        default: result = Om::ValueProxy::makeError(state, stringFormat("subscript applied to invalid type - ", Om::typeToString(o.userType())), mapToLine());
    }

    return guard.release(state, result);
}
```

The `rangeOp` method looks like this:

```cpp
template<class T> void rangeOp(State &state, Machine::AccessType type, Stack &vs, const TypedValue &o, const TypedValue &v, Om::Value &result)
{
    Sequence<T> seq(state.entity<T>(o.toUint()));
    process(state, type, seq, v, vs, result);
}
```

We construct a local `Sequence` so we can pass it by non-const reference into the guts of the system, then we call a template method to implement the reading or writing using `operator[]`, all expressed via the `Sequence` interface instead of `StringEntity` or `ListEntity`.

```cpp
template<class T> void process(State &state, Machine::AccessType type, Sequence<T> &s, const TypedValue &v, Stack &vs, Om::Value &result)
{
    if(v.realType() != Om::Type::Int)
    {
        result = Om::ValueProxy::makeError(state, stringFormat("subscript expression of invalid type - ", Om::typeToString(v.userType())), 0);
        return;
    }

    int index = v.toInt(); // signed read, so the negative check below is meaningful
    if(index < 0 || index >= s.length())
    {
        result = Om::ValueProxy::makeError(state, stringFormat("subscript expression out of range - ", index), 0);
        return;
    }

    if(type == Machine::AccessType::Read)
    {
        vs.push_back(s.get(state, index));
        inc(state, vs.back());
    }
    else if(type == Machine::AccessType::Write)
    {
        TypedValue c = vs.pop_back();

        s.set(state, index, c, result);
        dec(state, c, result);

        vs.push_back(v);
    }
}
```

Again, the two versions the C++ compiler will generate here are quite different, but we have been able to express the operation once, in the same way for both.
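The shape of that shared routine, validating the index once and then reading or writing through the interface, can be sketched with standard containers. This is an illustration under my own assumptions, not Om's code: `Access`, `Seq` and this cut-down `process` are hypothetical, elements are plain `int`s, and errors come back as a message string instead of an `Om::Value`.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Access { Read, Write };

template<class T> struct Seq;  // hypothetical interface, specialised below

template<> struct Seq<std::string> {
    std::string &s;
    int length() const { return static_cast<int>(s.size()); }
    int get(int i) const { return s[i]; }
    void set(int i, int v) { s[i] = static_cast<char>(v); }
};

template<> struct Seq<std::vector<int>> {
    std::vector<int> &v;
    int length() const { return static_cast<int>(v.size()); }
    int get(int i) const { return v[i]; }
    void set(int i, int x) { v[i] = x; }
};

// Validates the index once, then reads into value or writes value
// through the interface. Returns an empty string on success,
// or an error message if the index is out of range.
template<class T>
std::string process(Access type, Seq<T> &seq, int index, int &value) {
    if (index < 0 || index >= seq.length())
        return "subscript expression out of range - " + std::to_string(index);
    if (type == Access::Read) value = seq.get(index);
    else seq.set(index, value);
    return std::string();
}
```

The bounds check and the read/write branching live in one place, while everything container-specific stays inside the two `Seq` specialisations.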

As I add more built-in methods and properties to strings and lists, I can now easily decide whether each new addition should be just for a string, just for a list or for both, and implement it in the relevant place, just once, for it to work as I wish. I can also extend the `Sequence` interface if I need to support any other operations, for example removing items, clearing items out etc.

In the unlikely event that some other kind of sequence container is added to the language, again I can just specialise `Sequence` for that container and it will all just snap into place.

In summary, I'm sure this is well studied and known, but I had never before thought about using template specialisation to create a common interface for unrelated classes and, for this particular situation where code size is pretty much not a consideration but efficiency and maintainability are key, it has turned out to be a success.

There is a certain amount of boiler-plate needed at the start but once this is done, this approach seems to extend very nicely and prevents a great deal of duplication of code that, previously, had felt icky and wrong to me.

So there you go. Slightly longer entry this time :) In the unlikely event that anyone has made it this far and is still reading, I'd like to wish you an excellent Christmas if you are into that sort of thing, and all the best for the new year. Thanks for stopping by.
