Journal of Aardvajk


Framework Fun

Posted 19 January 2017 · 342 views

About time I posted an update here. Been a while.

 

I have kind of lost a bit of heart in Om, my scripting language, since I realised that circular references lead to memory leaks unless I implement some kind of garbage collection. I've not given up on it - I will indeed try at some point to add GC to Om, but I decided I needed to start a new game project to cheer myself up.

 

Whenever I start a game, I end up copying loads of files from the previous project into the new one. This is a habit I got into maybe ten years ago, when I discovered that trying to maintain my own libraries across several projects at once was a nightmare. I'd modify the library to do something in the current project and from that point on, only the current project would compile ever again :)

 

But, I'm 42 years old, I should be mature enough to be able to maintain my own personal library. So I've been working on creating a game framework, strictly for personal use. I've called it Gx, which stands for, erm, Glorious Xylophones or something.

 

Setting up a library is a good exercise actually. It's making me really think about and improve upon the shared code that all my game projects end up using. The last couple of days, I've been working on my resource mapping classes which I primarily use for managing graphics resources (but there was no reason to not write the map stuff in a more generic way).

 

I have a Gx::GraphicsResource base class that all of the graphics resources use as an interface. It contains methods like release(), reset(), isDeviceBound() and so on which allow us to treat all the resources in a uniform manner during a device reset.
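Roughly speaking, the interface is along these lines (just a sketch - the exact signatures are my guess from how the methods get used in the device reset loop further down):

namespace Gx
{

class GraphicsDevice;

class GraphicsResource
{
public:
    virtual ~GraphicsResource() = default;

    virtual void release() = 0;                     // free device-owned data before a device reset
    virtual bool reset(GraphicsDevice &device) = 0; // recreate device-owned data after the reset
    virtual bool isDeviceBound() const = 0;         // true if this resource needs releasing/resetting
};

}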

 

Graphics resources in my games end up needing to be stored in such a way that I can iterate over them before and after a device reset. There are also always two types of graphics resource - what I'll tentatively call "global" resources and what we can call "local" resources for the sake of this discussion.

 

For example, I commonly have vertex declarations defined once on startup for the various vertex types I'm using and these are shared across the code. But an Entity may well need to own a vertex buffer of its own, and the buffer's lifetime should match that of the Entity.

 

So the resource map supports two types of storage - one by a std::string url for the global resources and another by using handles which allows us to RAII-away the resources when the owner of the handle goes bye-bye.

 

An improvement I've made while porting all this into Gx is to make the handles more strongly typed, using templates, so that you cannot accidentally use a handle from one map type in another. Previously the handles were just a loose wrapper around an index into the map but now I've also added some type information.

 

We start with Gx::ResourceId. In my previous projects this was just a typedef for unsigned int which meant it had no information about what it was pointing at. Now, we typedef Gx::Index from DWORD (from the Win API) and make Gx::ResourceId a template class.

 

This lives in GxResourceHandle.h, where we also define the generic Gx::ResourceHandle and Gx::TypedResourceHandle. With the former you cast to the correct type when you access it; the latter binds the type to the handle when it is declared.

#include <GxCore/GxCoreTypes.h>
#include <GxCore/GxSignal.h>

namespace Gx
{

template<class Base> class ResourceMap;

template<class Base> class ResourceId
{
public:
    ResourceId() : id(invalidIndex) { }
    bool valid() const { return id != invalidIndex; }

private:
    friend class ResourceMap<Base>;

    explicit ResourceId(Index id) : id(id) { }

    Index id;
};

template<class Base> class ResourceHandle
{
public:
    ResourceHandle() : id(invalidIndex), map(nullptr) { }
    ~ResourceHandle(){ destroyed(id); }

    template<class T> T &value();
    template<class T> const T &value() const;

    Gx::Signal<Index> destroyed;

private:
    friend class ResourceMap<Base>;

    Index id;
    ResourceMap<Base> *map;
};

template<class Base, class T> class TypedResourceHandle
{
public:
    TypedResourceHandle() : id(invalidIndex), map(nullptr) { }
    ~TypedResourceHandle(){ destroyed(id); }

    T &value();
    const T &value() const;

    Signal<Index> destroyed;

private:
    friend class ResourceMap<Base>;

    Index id;
    ResourceMap<Base> *map;
};

}
Gx::Signal is part of my signals/slots implementation, which I wrote an article about for the site a while back. It uses variadic templates to provide a reasonably efficient system for communication between objects that are ignorant of each other's type.
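For anyone who hasn't read that, here is a much-simplified sketch of the idea (not the actual Gx::Signal, which also cooperates with Gx::Receiver so connections are cleaned up automatically):

#include <functional>
#include <vector>

template<class... Args> class Signal
{
public:
    // Connect any callable taking (Args...).
    void connect(std::function<void(Args...)> slot){ slots.push_back(std::move(slot)); }

    // Emit the signal, invoking each connected slot in turn.
    void operator()(Args... args) const { for(const auto &s: slots) s(args...); }

private:
    std::vector<std::function<void(Args...)>> slots;
};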

 

The value() methods are actually implemented in GxResourceMap.h, as they need to access methods of the map, but dividing the files up this way means I can just use forward declarations in other headers for classes that own a handle and only include the actual map in the .cpp files.
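So a class owning a handle can look something like this (Entity here is just a hypothetical example of the pattern):

// Entity.h - only needs the lightweight handle header plus a forward
// declaration of the resource base class.
#include <GxCore/GxResourceHandle.h>

namespace Gx { class GraphicsResource; }

class Entity
{
public:
    void createResources();

private:
    Gx::ResourceHandle<Gx::GraphicsResource> vertexBuffer;
};

// Entity.cpp - only here do we pull in the full map (and the value() bodies).
#include <GxCore/GxResourceMap.h>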

 

We also have Gx::SharedResourceHandle<Base, T> for consistency. This is a very simple, copyable class that you can use to extract out one of the global resources by url. This just allows us to access a global resource using the same handle interface as the others, and binds its type to the handle when declared to avoid accidentally casting the wrong type.

 

template<class Base, class T> class SharedResourceHandle
{
public:
    SharedResourceHandle() : id(invalidIndex), map(nullptr) { }

    T &value();
    const T &value() const;

private:
    friend class ResourceMap<Base>;

    Index id;
    ResourceMap<Base> *map;
};
GxResourceMap.h is a larger file, so I'll just present the interface to the map, then look at some usage code.

 

namespace Gx
{

template<class Base> class ResourceMap
{
public:
    class iterator
    {
    public:
        bool operator==(const iterator &o) const { return v == o.v && i == o.i; }
        bool operator!=(const iterator &o) const { return v != o.v || i != o.i; }

        iterator &operator++(){ i = next(i); return *this; }
        iterator operator++(int){ Index n = i; i = next(i); return iterator(v, n); }

        Base &operator*(){ return *((*v)[i]); }
        Base *operator->(){ return (*v)[i]; }

    private:
        friend class ResourceMap<Base>;

        iterator(PodVector<Base*> *v, Index i) : v(v), i(i) { }

        Index next(Index i){ if(i == v->size()) return i; ++i; while(i < v->size() && !((*v)[i])) ++i; return i; }

        PodVector<Base*> *v;
        Index i;
    };

    class const_iterator
    {
    public:
        bool operator==(const const_iterator &o) const { return v == o.v && i == o.i; }
        bool operator!=(const const_iterator &o) const { return v != o.v || i != o.i; }

        const_iterator &operator++(){ i = next(i); return *this; }
        const_iterator operator++(int){ Index n = i; i = next(i); return const_iterator(v, n); }

        const Base &operator*() const { return *((*v)[i]); }
        const Base *operator->() const { return (*v)[i]; }

    private:
        friend class ResourceMap<Base>;

        const_iterator(const PodVector<Base*> *v, Index i) : v(v), i(i) { }

        Index next(Index i){ if(i == v->size()) return i; ++i; while(i < v->size() && !((*v)[i])) ++i; return i; }

        const PodVector<Base*> *v;
        Index i;
    };

    ResourceMap(){ }
    ~ResourceMap();

    template<class T> T &add(const std::string &url, T *p);
    template<class T> T &add(ResourceHandle<Base> &handle, T *p);
    template<class T> T &add(TypedResourceHandle<Base, T> &handle, T *p);

    template<class T> T &get(const ResourceId<Base> &id){ return *static_cast<T*>(resources[id.id]); }
    template<class T> T &get(const std::string &url){ return get<T>(id(url)); }

    ResourceId<Base> id(const std::string &url) const;

    template<class T> SharedResourceHandle<Base, T> shared(const ResourceId<Base> &id);
    template<class T> SharedResourceHandle<Base, T> shared(const std::string &url);

    Base *operator[](Index id){ return resources[id]; }

    iterator begin(){ return iterator(&resources, 0); }
    iterator end(){ return iterator(&resources, resources.size()); }

    const_iterator begin() const { return const_iterator(&resources, 0); }
    const_iterator end() const { return const_iterator(&resources, resources.size()); }

private:
    Index internalAdd(Base *resource);
    void handleDestroyed(Index id);

    Receiver receiver;

    PodVector<Base*> resources;
    PodVector<Index> free;
    std::map<std::string, Index> mapping;
};
Nothing particularly earth-shattering in the implementations. But now that we have the map methods defined, we can implement the value() methods of Gx::ResourceHandle and Gx::TypedResourceHandle:

template<class Base> template<class T> T &ResourceHandle<Base>::value(){ return *(static_cast<T*>((*map)[id])); }
template<class Base> template<class T> const T &ResourceHandle<Base>::value() const { return *(static_cast<const T*>((*map)[id])); }
template<class Base, class T> T &TypedResourceHandle<Base, T>::value(){ return *(static_cast<T*>((*map)[id])); }
template<class Base, class T> const T &TypedResourceHandle<Base, T>::value() const { return *(static_cast<const T*>((*map)[id])); }
template<class Base, class T> T &SharedResourceHandle<Base, T>::value(){ return *(static_cast<T*>((*map)[id])); }
template<class Base, class T> const T &SharedResourceHandle<Base, T>::value() const{ return *(static_cast<const T*>((*map)[id])); }
In a debug build, we could use dynamic_cast checks here to ensure we have the correct type, but this has never actually been an issue for me when using this type of map, so I'm just using static_cast for now.
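For example, a debug-only check might look something like this (illustrative only - it assumes <cassert> is included and relies on Base being polymorphic, which Gx::GraphicsResource is):

template<class Base> template<class T> T &ResourceHandle<Base>::value()
{
#ifndef NDEBUG
    assert(dynamic_cast<T*>((*map)[id]) != nullptr); // wrong type, or stale handle pointing at a removed slot
#endif
    return *(static_cast<T*>((*map)[id]));
}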

 

The other point of interest regarding the handles is in the add() methods.

template<class Base> template<class T> T &ResourceMap<Base>::add(const std::string &url, T *p)
{
    Index id = internalAdd(p);
    mapping[url] = id;

    return *p;
}

template<class Base> template<class T> T &ResourceMap<Base>::add(ResourceHandle<Base> &handle, T *p)
{
    handle.id = internalAdd(p);
    handle.map = this;

    receiver.connect(handle.destroyed, this, &ResourceMap<Base>::handleDestroyed);

    return *p;
}

template<class Base> template<class T> T &ResourceMap<Base>::add(TypedResourceHandle<Base, T> &handle, T *p)
{
    handle.id = internalAdd(p);
    handle.map = this;

    receiver.connect(handle.destroyed, this, &ResourceMap<Base>::handleDestroyed);

    return *p;
}
The simplest is the std::string-based url add, which simply uses a std::map<std::string, Gx::Index> to keep the association.

 

The handles are non-copyable, but we pass them by reference into the add methods and we have declared the ResourceMap<Base> as a friend of the handle classes, so it can populate their internals.

 

The Gx::ResourceMap also has a Gx::Receiver, to which we can connect Gx::Signals, so we connect each handle's destroyed(Index) signal up so that when a handle goes out of scope, the map is informed and can remove the resource it points to.

template<class Base> void ResourceMap<Base>::handleDestroyed(Index id)
{
    delete resources[id];
    resources[id] = 0;

    free.push_back(id);
}
We also have a shared() method to get back a global, url-based resource so we can use it with a consistent handle interface:

template<class Base> template<class T> SharedResourceHandle<Base, T> ResourceMap<Base>::shared(const ResourceId<Base> &id)
{
    SharedResourceHandle<Base, T> handle;
    handle.id = id.id;
    handle.map = this;

    return handle;
}

template<class Base> template<class T> SharedResourceHandle<Base, T> ResourceMap<Base>::shared(const std::string &url)
{
    return shared<T>(id(url));
}
Simple as that. So now we can look at some usage. Gx::Application contains a protected Gx::Graphics object, which is a composition of a Gx::GraphicsDevice and a Gx::ResourceMap<Gx::GraphicsResource>. So let's have examples of url, generic resource and typed resource handles.

class Application : public Gx::Application
{
public:
    virtual bool createResources();
    virtual void render(float blend);

    Gx::ResourceHandle<Gx::GraphicsResource> vertexBuffer;
    Gx::TypedResourceHandle<Gx::GraphicsResource, Gx::VertexShader> vertexShader;
    Gx::SharedResourceHandle<Gx::GraphicsResource, Gx::VertexDeclaration> decHandle;
};

bool Application::createResources()
{
    graphics.resources.add("colorvertexdec", new Gx::VertexDeclaration(/* ... */)).reset(graphics.device);
    graphics.resources.add(vertexBuffer, new Gx::VertexBuffer(/* ... */)).reset(graphics.device);
    graphics.resources.add(vertexShader, new Gx::VertexShader(/* ... */)).reset(graphics.device);

    decHandle = graphics.resources.shared<Gx::VertexDeclaration>("colorvertexdec");
    
    return true;
}

void Application::render(float blend)
{
    graphics.device.setVertexDeclaration(graphics.resources.get<Gx::VertexDeclaration>("colorvertexdec"));

    // or

    auto id = graphics.resources.id("colorvertexdec"); // auto -> Gx::ResourceId<Gx::GraphicsResource>
    graphics.device.setVertexDeclaration(graphics.resources.get<Gx::VertexDeclaration>(id));
    
    // or
    
    graphics.device.setVertexDeclaration(decHandle.value());
    
    Gx::VertexBuffer &buffer = vertexBuffer.value<Gx::VertexBuffer>(); // generic handle, cast on access

    buffer.begin(D3DLOCK_DISCARD);
    /* ... */
    buffer.end();

    Gx::VertexShader &shader = vertexShader.value(); // typed handle, type encoded in declaration

    Gx::Matrix wvp = /* ... */;

    graphics.device.setVertexShader(shader);
    graphics.device.vertexShader().setMatrix("worldviewproj", wvp);
    
    graphics.device.renderTriangleList(buffer);
}
The vertex declaration stays in the map until it is manually removed or the map is destroyed, whereas the buffer and shader have their lifetimes controlled by the owning Application object.

 

Internally then in Gx::Application, we handle device reset and the game loop like this:

int Gx::Application::exec()
{
    ShowWindow(hw, SW_SHOW);

    MSG msg;
    PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE);

    Timer timer;

    const float delta = 1.0f / 60.0f;
    float accumulator = delta;

    while(msg.message != WM_QUIT)
    {
        while(PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }

        if(!graphics.device.isLost())
        {
            if(graphics.device.isReadyToReset())
            {
                for(auto &r: graphics.resources)
                {
                    if(r.isDeviceBound())
                    {
                        r.release();
                    }
                }

                if(!graphics.device.reset())
                {
                    return errorWindow("Unable to reset graphics device");
                }

                for(auto &r: graphics.resources)
                {
                    if(r.isDeviceBound())
                    {
                        if(!r.reset(graphics.device))
                        {
                            return errorWindow("Unable to reset graphics resource");
                        }
                    }
                }

                if(!graphicsDeviceReset())
                {
                    return errorWindow("Unable to reset graphics resources");
                }
            }

            float t = timer.elapsed(Gx::Timer::Option::Restart);

            accumulator += t;

            while(accumulator >= delta)
            {
                update(delta);
                accumulator -= delta;
            }

            graphics.device.begin();
            render(accumulator / delta);
            graphics.device.end();
        }
    }

    return 0;
}
So any resources currently in the map that are device-bound are released and reset at the appropriate time. Otherwise, it's a standard Gaffer On Games fix-your-timestep game loop, calling a couple of virtual methods for the class deriving from Gx::Application to fill in.

 

I'm not actually sure if the generic Gx::ResourceHandle is needed since as far as I can remember, I have always known the type that a handle should point at when I declare the handle, but that's the thing about assembling a library - you tend to start thinking in slightly different ways and YAGNI doesn't apply quite as strongly.

 

Gx is currently composed of:

GxCore
    GxCoreTypes.h
    GxDataStream.h
    GxFlags.h
    GxPodVector.h
    GxPtrVector.h
    GxResourceHandle.h
    GxResourceMap.h
    GxScopedPtr.h
    GxSignal.h
    GxSize.h
    GxStringFormat.h
    GxTimer.cpp
    GxTimer.h
    GxWindows.cpp
    GxWindows.h

GxGraphics
    GxCubeMap.h
    GxDepthStencilSurface.h
    GxDisplaySettings.h
    GxFont.h
    GxGraphics.h
    GxGraphicsBuffer.h
    GxGraphicsDevice.h
    GxGraphicsResource.h
    GxRenderContext.h
    GxShader.h
    GxTexture.h
    GxVertexDeclaration.h
    GxVertexElement.h

GxMaths
    GxMathTypes.h
    GxMatrix.h
    GxQuaternion.h
    GxRay.h
    GxVec2.h
    GxVec3.h

GxApplication
    GxApplication.h
Plan is to add GxPhysics, a wrapper around Bullet that I have already written but will rewrite as part of the port, and GxAnimation, my own implementation of a skeletal animation system closely tied to my 3D model editor (Charm).

 

It is difficult sometimes deciding what should be in the Gx library and what is specific to the game. Plan is, once I have the basics of GxPhysics working in my test driver application, to start the game project and set up the includes to go directly into the source tree of the Gx project, so I can easily change Gx while I am working on the game.

 

Plan is to write a 3D platformer and KEEP IT SIMPLE this time. I always end up trying to write Tomb Raider when I should be trying to write more of a Super Mario 3D style platform game. So I'm going to have a simpler central character - no edge grabbing for now, nice air-controls for cartoon jumping and just try to stay on the straight and narrow and not get over-ambitious.

 

That is all for now. Thanks for stopping by.




Om Module system

Posted 09 January 2017 · 630 views

Thought I'd try blogging about this before I started implementing anything to try to get it straight in my head. I've decided to change how the Om module system works a little, based on playing around with writing a driver to allow me to execute Om scripts from within Windows explorer.

 

At the moment, state of play is that you can do this:

 

import print;

print("hello world");
The import statement tells the compiler that the symbol print is to be looked up in the modules list that the engine maintains in its shared state. It doesn't have to be populated until the first time that print is used - it just informs the compiler to emit the OpCode::Type::GetMd instruction when it finds the symbol.

 

Currently, the host application has to call Om::Engine::addModule(name, value) to put the relevant module value into place.

 

What I want to do differently is for the import statement itself to take care of this. I also want two different types of module to be available.

 

I was playing around writing DLLs to support extending Om with native code. It turns out to be pretty easy (using the same toolchain at least) to write a DLL that can expose native code as an Om module. For example:

 

#include <iostream>

#include <OmEngine.h>

Om::Value print(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    for(auto &p: params) std::cout << p.toString().c_str();
    return Om::Value();
}

extern "C" __declspec(dllexport) void omExtend(Om::Engine *engine, Om::Value *result)
{
    engine->addModule("print", engine->makeFunction(print));
    *result = Om::Value();
}
Having built the Om library as a DLL as well, I just get QtCreator to build the above as a DLL, linking to the stub libOm.a. I can then, in a driver program, call LoadLibrary on the DLL, find the omExtend method and call it, passing in a pointer to the engine and a value, and I then have the native print function available as a module in the engine.
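The driver side is roughly this (illustrative, with error handling kept to a minimum - note the DLL is deliberately never freed, since the engine keeps using the function pointers it registered):

#include <windows.h>

#include <OmEngine.h>

bool loadExtension(Om::Engine &engine, const char *path)
{
    HMODULE module = LoadLibraryA(path);
    if(!module) return false;

    typedef void (*ExtendFunc)(Om::Engine*, Om::Value*);

    ExtendFunc extend = reinterpret_cast<ExtendFunc>(GetProcAddress(module, "omExtend"));
    if(!extend) return false;

    Om::Value result;
    extend(&engine, &result);

    return result.type() != Om::Type::Error;
}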

 

Equally it is possible to write an Om script that returns a value, evaluate this with the engine and add that to the modules list to have it available to other scripts.
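For example, on the host side, something like this (print.om here is just a hypothetical script file):

Om::Value module = engine.evaluate("print.om", Om::Engine::EvaluateType::File);

if(module.type() != Om::Type::Error)
{
    engine.addModule("print", module);
}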

 

So what I'm thinking is that I want there to be two forms of the import statement.

 

import print;
import "path/to/print.om";
import "path/to/print.om.dll";
In all cases, the symbol "print" should be given to the compiler as a module symbol to be checked for with OpCode::Type::GetMd when it is encountered in the source.

 

Assuming the file these are in is located at C:/Projects/script.txt, and the Om::Engine has a search paths variable set up as something like "C:/Om/extend;C:/MyOmModules", the first statement should look for the following files, in this order:

 

C:/Projects/print.om
C:/Projects/print.om.dll
C:/Om/extend/print.om
C:/Om/extend/print.om.dll
C:/MyOmModules/print.om
C:/MyOmModules/print.om.dll
The second import statement should look in

 

C:/Projects/path/to/print.om
C:/Om/extend/print.om
C:/MyOmModules/print.om
and the third in

 

C:/Projects/path/to/print.om.dll
C:/Om/extend/print.om.dll
C:/MyOmModules/print.om.dll
When the first one is found, it is evaluated in the relevant way and added to the module list.

 

But when should this happen? I think perhaps the first time the module is actually used, as this allows us to create some circular relationships that would otherwise be impossible. Obviously this means there could be some unexpectedly long operations occurring in unexpected places, but I don't think it will be a problem.

 

So in both cases, the compiler is going to have to store the argument to import in the text cache and embed its id in with the OpCode::Type::GetMd instruction, along with the id for just the symbol. So the pseudocode for GetMd will be something like:

 

bool Machine::md(uint path, uint symbol, Om::Result &result)
{
    auto m = state.modules.find(symbol);
    if(m == state.modules.end())
    {
        m = loadModule(state, path);
        if(m == state.modules.end())
        {
            result = Om::ValueProxy::makeError(state, stringFormat("module not found - ", state.tc.text(path)), mapToLine());
            return false;
        }
    }

    vs.push_back(m->value);
    inc(state, vs.back());

    return true;
}
Where loadModule looks something like:

 

pod_map<uint, TypedValue>::iterator loadModule(State &state, uint pathId)
{
    pod_string path = state.tc.text(pathId);
    auto s = doSearch(path);

    Om::Value v;
    if(s.type == om)
    {
        v = state.engine.evaluate(s.path, Om::Engine::EvaluateType::File);

        if(v.type() == Om::Type::Error) return state.modules.end();
    }
    else if(s.type == dll)
    {
        HMODULE h = LoadLibrary(s.path);
        FARPROC p = GetProcAddress(h, "omExtend"); // check for errors, return state.modules.end()

        typedef void(*Func)(Om::Engine*,Om::Value*);
        reinterpret_cast<Func>(p)(&state.engine, &v);
    }

    inc(state, v);
    return state.modules.insert(v);
}
or whatever.

 

This should then mean that, like with C and C++, you can reference both a standard installation directory (maybe stored in an OM environment variable or set by options to the driver program) and a local directory structure for the current project.
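Something like this in the driver would do for building the search path list (illustrative only, and it assumes the paths in the OM variable are separated by ';' as in the example above):

#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> searchPaths()
{
    std::vector<std::string> paths;

    if(const char *env = std::getenv("OM"))
    {
        std::stringstream ss(env);
        std::string path;

        // Split the variable on ';' and keep any non-empty entries.
        while(std::getline(ss, path, ';'))
        {
            if(!path.empty()) paths.push_back(path);
        }
    }

    return paths;
}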

 

Can't quite decide if this is the right approach or not so need to ponder a little more. Playing a bit of Skyrim at the moment so that is helping me unwind my mind.




Om - postponing the worry

Posted 07 January 2017 · 1,398 views

I've decided to just postpone worrying about cycles and memory leaks in Om for the time being. I may have to address this in the future, but for now I've discovered that there are actually so many things the user can do that cause leaks, infinite loops, stack explosions etc that I cannot efficiently detect or prevent, that the memory leak issue with a cycle is only one small issue.

 

So I'll put this at the back of my mind and continue on with what is otherwise a pretty solid little language.

 

This morning I've been playing around with building Om as a static library and linking it into a standalone test application. The static library is 900 KB in size, which is nice and small, and the test application, which just loads a script and prints some output to std::cout at the moment, feels quite nice and tidy. Figuring out the PRO file settings to build the lib was easy enough. If I'm going to start developing Om in this model I'll need a better workflow, but for now I'll probably keep developing Om in its test bed context, which is a command line application, then update the file in my lib directory when I hit milestones.

 

Obviously I'm planning to use Om in my games as a scripting backend, but I've also thought I'd like to be able to use it on a day-to-day basis for sort of batch and general purpose usages, so I think I'm going to put together something that allows me to write an Om script in a text file, then execute it from Windows Explorer with a right-click or something, so I'll probably get working on some kind of context program for this. I come across tasks in my day-to-day work that I could easily solve by running a quick Om script, so it would be nice to have this working on my computer.

 

As well as the context to execute it, this is going to involve slowly developing an Om standard library for things like file streams and so on. Best if this is just done as I go along and I need things. Perhaps this will form another static library for now that I'll link into my desktop context program. Om was designed to be extensible in this way, so that the library can be developed independently from the core language itself. We shall see how well this works out in practice.

 

I think it unlikely that cycles will be accidentally created in normal use and part of me just wants to say "Don't do this" rather than start to try to develop some kind of complex garbage collection system. I don't see how any system is going to work in all cases anyway and keeping this simple and robust is my chief concern.

 

I'll be updating this post later. Few bits to do this morning and time is a'wasting.




Mission Control - Om has a problem.

Posted 05 January 2017 · 1,580 views

I realised today why JavaScript doesn't have deterministic destruction. Thought I was so smart basing Om around it.

Cycles.

If one object A references an object B and B references A, Om will never release the entities when they go out of scope. Memory leaks.

This is a well enough studied problem that I can safely assume there is no ready-made solution. So where does this leave Om?

It is hardly worth turning it into a garbage collected language. That defeats the whole point of the language in the first place.

Equally, it is too easy to accidentally create cycles, especially with long chains of objects or lists, to just say "Don't do this in Om or you leak memory". I would lose all heart in the project then.

So we are in a quandary. Maybe have to toast an interesting project goodbye and move on.

What really hurts is whenever I go online to research this, everyone is describing the approach I have been using for the last few years as naive :)

To be fair, most of the recent work on Om has been copy-pasting code from version 3 into version 4, so in a way it is nice to have a new problem to think about. Toying with the idea of a hybrid system that does strict reference counting unless it (somehow) detects cycles then falls back on asynchronous garbage collection. I see other languages have accepted this. A real shame though. But I'm unlikely to be the first living human to find a better solution, frankly. I'm good enough at what I do to be sure of that much.

Any suggestions from the floor?

 

[EDIT] It is even worse than I thought, with lists.

 

var o = [ ];
var n = [ o ];
o.push(n);
When I try to output the state at the end of the program now, we get an infinitely recursive loop going on until Om runs out of stack space.

 

This is pretty much impossible to detect. Easy enough in a silly example like this, but a general solution just doesn't exist I don't think.

 

I'm pretty much starting to lean towards just making it a rule that circular references can cause memory leaks or infinite loops and leaving it at that at the moment. Leaves a bad taste in my mouth but I'm not sure what else I can do.




The Om Programming Language :)

Posted 04 January 2017 · 773 views


Rather a long journal entry today. I hope that someone sticks with it as I'm really getting quite excited about how my scripting language is starting to develop now.

I started trying to write a general overview of Om last night as I've been posting about my scripting language here for a while and never really provided such a thing, but I was quickly overwhelmed. It's really hard to write a concise overview, partly because the language has developed far more features than I realised until I took a step back, but also because it is generally hard to know where to start and what order to discuss things in, given the interrelated nature of language features.

Om is designed to be a lightweight scripting language that is easy to integrate into existing C++ projects, is implemented entirely in standard C++ itself and provides a reasonable level of efficiency of code execution in terms of speed and resource usage. It is not trying to compete with Google V8 and similar. I wouldn't be that foolish. But while the syntax of Om is quite similar to Javascript, there are a number of differences that have motivated the development of the language in the first place.

Firstly, there is no garbage collection in Om and it offers completely deterministic destruction of objects to enable RAII style coding - something I personally find hard to live without. Complex objects are carefully tracked by reference count and are released at the exact point their reference count reaches zero and, in the case of an Om::Type::Object, an Om::Type::Function can be set up to be called at this point.

Secondly, the syntax for using objects is slightly simpler than Javascript in that there is no need for a new keyword. The closest we come to constructors in Om are free functions that return an instance of an Om::Type::Object.

Finally, extending the language by the provision of native-side code is designed to be extremely simple. There is one Om::Function typedef, defined as:

Om::Value someFunction(Om::Engine &engine, const Om::Value &object, const Om::ValueList &parameters);

This single type of function can be used to provide rich modules of shared native code to be accessed from the script as well as allowing the script to pass pretty much anything back to the host application.

I'm going to focus on Om::Type::Object in this post and gloss over the other details which I hope will be fairly obvious from the example code. The only thing to bear in mind is that Om is entirely dynamically typed, with variables inferring their type from what is assigned to them, carrying their type around along with their value.

Also bear in mind that Om::Type::Functions are entirely first-class entities in Om and can be assigned to variables or passed as parameters as simply as one would pass an Om::Type::Int or any other type.

So, to pick something at random, here's a simple example, entirely in script for now, of how one might implement a Person class.

Om::Type::Objects are declared with the syntax { }, and are essentially a string-value mapping with some special properties discussed later on. Unlike Javascript, Om::Type::List, declared with [ ] syntax, is a completely separate type from Om::Type::Object and functions as a resizable, heterogeneous array of values.


import print;

var makePerson = func(name, age)
{
    return
    {
        name = name;
        age = age;
    };
};

var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

for(var p: people)
{
    print(p.name, " is ", p.age, " years old");
}
Okay, so let's now think in a more OO way and make the description a method on a person instead.



import print;

var makePerson = func(name, age)
{
    return
    {
        name = name;
        age = age;

        describe = func
        {
            print(this.name, " is ", this.age, " years old.");
        };
    };
};

var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

for(var p: people)
{
    p.describe();
}
Om supports prototype-based inheritance in a very simple fashion.



import print;

var proto =
{
    hair = "brown";
};

var makePerson = func(name, age)
{
    return
    {
        prototype = proto;

        name = name;
        age = age;

        describe = func
        {
            print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
        };
    };
};

var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

people[1].hair = "blonde";

for(var p: people)
{
    p.describe();
}
Note that all instances of the person object now share the "brown" value when we are reading, but when we write the "blonde" value to Eddie before the output loop, Eddie then has his own "hair" property which overrides the one in the prototype. This is a very simple system to implement but extremely flexible.


Note the this.value syntax has to be explicit in Om. The reason is that the function has no idea it is a member function as it is being compiled. Indeed it is quite possible to call the same function once as a method on an object and then again as a free function.


import print;

var proto =
{
    hair = "brown";
};

var makePerson = func(name, age)
{
    return
    {
        prototype = proto;

        name = name;
        age = age;

        describe = func
        {
            print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
        };
    };
};

var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

people[1].hair = "blonde";

var speak = func(word)
{
    if(this.type == "object")
    {
        print(this.name, " says ", word);
    }
    else
    {
        print(word, " is generally spoken :)");
    }
};

speak("hello");
people[0].prototype.speak = speak;

for(var p: people)
{
    p.describe();
    p.speak("hello");
}
This outputs:



Om: hello is generally spoken :)
Om: Paul is 42 years old and has brown coloured hair.
Om: Paul says hello
Om: Eddie is 23 years old and has blonde coloured hair.
Om: Eddie says hello
Om: Jill is 78 years old and has brown coloured hair.
Om: Jill says hello
Lastly for now, if we assign an Om::Type::Function taking no parameters to an object's destructor property, this will be called when the object is destroyed.



import print;

var proto =
{
    hair = "brown";

    destructor = func { print("goodbye from the prototype"); };
};

var makePerson = func(name, age)
{
    return
    {
        prototype = proto;

        name = name;
        age = age;

        describe = func
        {
            print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
        };

        destructor = func
        {
            print("goodbye from ", this.name);
        };
    };
};

var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

people[1].hair = "blonde";

var speak = func(word)
{
    if(this.type == "object")
    {
        print(this.name, " says ", word);
    }
    else
    {
        print(word, " is generally spoken :)");
    }
};

speak("hello");
people[0].prototype.speak = speak;

for(var p: people)
{
    p.describe();
    p.speak("hello");
}

people[2] = null;

print("end of program");
The above program will output the following:



Om: hello is generally spoken :)
Om: Paul is 42 years old and has brown coloured hair.
Om: Paul says hello
Om: Eddie is 23 years old and has blonde coloured hair.
Om: Eddie says hello
Om: Jill is 78 years old and has brown coloured hair.
Om: Jill says hello
Om: goodbye from Jill
Om: end of program
Om: goodbye from Paul
Om: goodbye from Eddie
Om: goodbye from the prototype
Note how assigning null to people[2] destroys Jill at that point, since that causes Jill's reference count to drop to zero.


Om::Type::Object has a built-in members property that returns an Om::Type::List of the names of its members. Om::Type::Object supports lookup both by the dot operator and via dynamic text with the subscript operator, so you can use these together to implement a form of reflection.


import print;

var o =
{
    name = "Paul";
    age = 41;
    car = "Rover";
};

for(var m: o.members)
{
    print(m, " = ", o[m]);
}
Using the subscript operator is far less efficient than the dot operator, so it should only be employed when the name of the property is not known in advance. Using the dot operator in the VM equates to doing a binary search for an unsigned integer in a sorted array, whereas using the subscript operator requires actual text comparisons at runtime.
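To illustrate the idea (this is not the actual Om internals, just a sketch of the principle):

#include <algorithm>
#include <cstdint>
#include <vector>

// Property names are interned to integer ids at compile time and the object
// keeps its properties sorted by id.
struct Property
{
    std::uint32_t id; // interned name id
    int value;        // stand-in for the stored Om value
};

// Dot-operator lookup: a binary search over the sorted ids.
int *findById(std::vector<Property> &props, std::uint32_t id)
{
    auto it = std::lower_bound(props.begin(), props.end(), id,
        [](const Property &p, std::uint32_t key){ return p.id < key; });

    return (it != props.end() && it->id == id) ? &it->value : nullptr;
}

// The subscript operator, by contrast, only has the property name as text at
// runtime, so it has to fall back on string comparisons to find the entry.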


Final note on Om::Type::Object is that, like Om::Type::List and Om::Type::String, default copy is by reference.


import print;

var o = { name = "Paul"; };
var c = o;

o.name = "Eddie";

print(c.name);
This will output "Eddie", not "Paul". However, all types support the clone() method so we can explicitly perform a deep copy here instead.



import print;

var o = { name = "Paul"; };
var c = o.clone();

o.name = "Eddie";

print(c.name);
This will output "Paul" as expected.


clone() is supported by every type, although it does nothing in the case of the value types. Even constants can use the dot operator in Om and the following is all perfectly legal and well-defined:


import print;

print(10.type); // prints "int"
var n = 10.clone(); // equivalent to var n = 10 :)

var s = "hello".length; /// s = 5

print({ name = "Paul"; age = 42; }.members.length); // prints 2
print({ name = "Eddie"; age = 23; }.members.length.type); // prints "int"
Now that we have a bit of an overview of the language itself, let's take a look at how the C++ API is used to integrate Om scripting into an existing C++ application.


The two key classes exposed by the API are Om::Engine and Om::Value.


#incude "om/OmEngine.h"

int main()
{
    Om::Engine engine;
    Om::Value v = engine.evaluate("return (1 + 2) * 3;", Om::Engine::EvaluateType::String);
    
    if(v.type() == Om::Type::Error)
    {
        std::cerr << "Error: " << v.toError().text << "\n";
        return -1;
    }

    std::cout << "v is " << v.toInt() << "\n"; // will print "v is 9"
}
When reference types like Om::Type::String or Om::Type::Object are stored in Om::Values, the Om::Value takes care of keeping track of reference counts and so on, seamlessly from the user's point of view.


Om::Value can directly construct value types, but the constructors are marked explicit to avoid accidental conversions.


void f()
{
    Om::Value i(123); // Om::Type::Int
    Om::Value f(12.34f); // Om::Type::Float
    Om::Value b(true); // Om::Type::Bool
}
Reference types have to be generated from the Om::Engine.



void f(Om::Engine &engine)
{
    Om::Value s = engine.makeString("hello");
    
    Om::Value o = engine.makeObject();
    o.setProperty("name", engine.makeString("Paul"));
    o.setProperty("age", Om::Value(42));
}
If we construct an Om::Value with an Om::Type::Function, it is compiled and stored, but not executed until we choose to later on.



int main()
{
    Om::Engine engine;
    Om::Value f = engine.evaluate("return func(a, b){ return a + b; };", Om::Engine::EvaluateType::String);
    
    if(f.type() != Om::Type::Error)
    {
        Om::Value r = engine.execute(f, Om::Value(), { Om::Value(2), Om::Value(3) });
        std::cout << "result " << r.toInt() << "\n"; // prints "result 5"
    }
}
In more detail, the execute method is Om::Value Om::Engine::execute(const Om::Value &function, const Om::Value &object, const Om::ValueList &parameters), allowing you to pass in an optional this-object and a parameter list to the function.


Om provides a simple but flexible mechanism for writing and reusing modular code. There is no preprocessing or file inclusion in Om. The compiler is only ever looking at exactly one source file (or string) at a time.

Om::Engine provides the addModule(const Om::String &id, const Om::Value &value) method. Any type of Om::Value can be added to the modules list and then imported into another script.

For example, all the previous examples begin with an import print; statement. As Om is entirely unaware of the context in which it is running, I have set up a simple native-side function to print values to std::cout in the test bed. On the C++ side, this looks like this:


void out(std::ostream &os, const Om::Value &value)
{
    if(value.type() == Om::Type::List)
    {
        os << "[";
        for(int i = 0; i < value.count(); ++i)
        {
            os << " ";
            out(os, value.property(i));
        }

        os << " ] ";
    }
    else if(value.type() == Om::Type::Data)
    {
        os << value.toData();
    }
    else
    {
        os << value.toString(); // toString() provides a text representation of most types
    }
}

Om::Value printFunc(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    std::cout << "Om: ";

    for(auto p: params)
    {
        out(std::cout, p);
    }

    std::cout << "\n";
    return Om::Value();
}

int main()
{
    Om::Engine engine;
    Om::Engine::OutputFlags flags(Om::Engine::OutputFlag::HideDefinedStrings);

    engine.addModule("print", engine.makeFunction(printFunc));
    
    engine.evaluate("sample.txt", Om::Engine::EvaluateType::File);
}
The import keyword is scope aware and only introduces the symbol into the import's scope.



var n = 20;

if(n > 10)
{
    import print;
    print(n);
}

print("end"); // compile error - print symbol not found
The actual module lookup is performed at runtime, so it is quite possible to compile a function that references modules that have not yet been added to the engine, as long as they are added before the function is executed. As a result it is possible to create two-way relationships between modules without circular dependency issues.
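As a contrived sketch of what this allows (guessing nothing beyond the syntax already shown), two modules can quite happily call into each other:

// a.txt
import b;

return
{
    ping = func(n)
    {
        if(n == 0) { return "a stopped"; }
        return b.pong(n - 1);
    };
};

// b.txt
import a;

return
{
    pong = func(n)
    {
        if(n == 0) { return "b stopped"; }
        return a.ping(n - 1);
    };
};

As long as both have been added with addModule() before ping or pong is actually executed, the mutual imports compile and run without complaint.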


In the print example, the module is simply an Om::Type::Function.

Let's look at a slightly more complex example using the script to define a module instead - a modular reworking of the Person examples above. Firstly we define the Person module in a normal script file:


import print;

return
{
    base =
    {
        hair = "brown";
    };

    make = func(name, age)
    {
        return
        {
            prototype = this.base;

            name = name;
            age = age;

            describe = func
            {
                print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
            };

            destructor = func
            {
                print("goodbye from ", this.name);
            };
        };
    };
};
Note we are returning an Om::Type::Object here, which gives us a place to store our prototype instance as well as the make function. The make function is the Om equivalent of a constructor here.


In the C++ setup, we can simply do:


int main()
{
    Om::Engine engine;
    Om::Engine::OutputFlags flags(Om::Engine::OutputFlag::HideDefinedStrings);

    engine.addModule("print", engine.makeFunction(printFunc));
    engine.addModule("person", engine.evaluate("person.txt", Om::Engine::EvaluateType::File));

    engine.evaluate("sample.txt", Om::Engine::EvaluateType::File);
}
Note in the real world, one would evaluate person.txt into an Om::Value so one could check for compiler errors. The evaluate method will return an Om::Type::Error rather than the object if errors are thrown up by the compiler.


We can now use this module in sample.txt as follows:


import person;

var people = [ person.make("Paul", 42), person.make("Eddie", 23), person.make("Jill", 78) ];

people[1].hair = "blonde";

for(var p: people)
{
    p.describe();
}
Note that there is no need to import print; into sample.txt now as it is not used directly.


It is also possible to extend Om with types implemented in native code. For example, because Om::Type::Strings are immutable, it is not optimal to concatenate lots of strings together in the script as it produces a great many temporary values. Much as in other languages, what we really need is a stringBuilder that can do this kind of concatenation more efficiently. I'll now describe how to create such a facility in native C++ to make available to the scripts.

A special Om::Type provided for use in the C++ API is Om::Type::Data. This allows the user to store a void* pointer in an Om::Value. We can then access this data from the object instance in the usual way and use it to implement custom object types that interface with C++ code.

Our string builder is going to be based on std::ostringstream, so first of all we can define a representation in C++.


class Rep
{
public:
    Rep(){ }
    Rep(const std::string &s){ os << s; }

    std::ostringstream os;
};
Next, we need to provide a function that the script can call to create an instance of the string builder. In this function, we assign the properties of the string builder, using other native functions. I didn't want to provide a void* constructor for Om::Value as that could potentially lead to some dangerous conversions, even with an explicit constructor, so there is a static fromData() method instead to make this even more explicit.



Om::Value makeObject(Om::Engine &engine, const std::string &init)
{
    Om::Value o = engine.makeObject();

    o.setProperty("data", Om::Value::fromData(new Rep(init)));

    return o;
}
Now we can add the methods the script needs to be able to call on the string builder, specifically add() and value(). Om::Value provides a convenience template function, toUserType<T>(), to make it slightly more concise to cast back the pointer.



Om::Value add(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    if(params.count() != 1 || params[0].type() != Om::Type::String) return engine.makeError("incorrect parameters");

    Rep *rep = object.property("data").toUserType<Rep>();
    rep->os << params[0].toString().c_str();

    return Om::Value();
}

Om::Value value(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    Rep *rep = object.property("data").toUserType<Rep>();

    return engine.makeString(rep->os.str().c_str());
}

Om::Value makeObject(Om::Engine &engine, const std::string &init)
{
    Om::Value o = engine.makeObject();

    o.setProperty("data", Om::Value::fromData(new Rep(init)));
    o.setProperty("add", engine.makeFunction(add));
    o.setProperty("value", engine.makeFunction(value));

    return o;
}
For the final step, we need to define a destructor for the object so we can clean up the allocated memory. Note we are really just using existing features of the language here rather than having to implement any special functionality.



Om::Value destroy(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    delete object.property("data").toUserType<Rep>();

    return Om::Value();
}

Om::Value makeObject(Om::Engine &engine, const std::string &init)
{
    Om::Value o = engine.makeObject();

    o.setProperty("data", Om::Value::fromData(new Rep(init)));
    o.setProperty("add", engine.makeFunction(add));
    o.setProperty("value", engine.makeFunction(value));

    o.setProperty("destructor", engine.makeFunction(destroy));

    return o;
}
Caution needs to be taken here though. Much like C++'s rule of three (or five), if you are providing a custom destructor and using the Om::Type::Data type, you almost certainly also need to overload clone() or terrible things will happen. The built-in clone() will do a by-value copy of the "data" property, meaning you end up with a double-delete if you clone the object in the script.


Since Om::Type::Objects can override any of the built-in methods with their own properties, we can simply add:


Om::Value clone(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    Rep *rep = object.property("data").toUserType<Rep>();

    return makeObject(engine, rep->os.str());
}

Om::Value makeObject(Om::Engine &engine, const std::string &init)
{
    Om::Value o = engine.makeObject();

    o.setProperty("data", Om::Value::fromData(new Rep(init)));
    o.setProperty("add", engine.makeFunction(add));
    o.setProperty("value", engine.makeFunction(value));

    o.setProperty("destructor", engine.makeFunction(destroy));
    o.setProperty("clone", engine.makeFunction(clone)); // will override the built-in clone() for this object specifically

    return o;
}
Now we are safe to clone() the object inside the script.


We can define all of the above in a cpp file and provide a simple interface function in the header.


Om::Value omStringBuilder(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    return makeObject(engine, "");
}
Then business as usual setting up the module in main().



int main()
{
    Om::Engine engine;

    engine.addModule("print", engine.makeFunction(printFunc));
    engine.addModule("stringBuilder", engine.makeFunction(omStringBuilder));

    engine.evaluate("sample.txt", Om::Engine::EvaluateType::File);
}
Now off we can go into the script and use our custom type:



import print;
import stringBuilder;

var s = stringBuilder();

s.add("one, ");
s.add("two ");
s.add("and three.");

print(s.value()); // prints "one, two and three."
For types that should not be cloned, for example a wrapper around a file stream or similar, one can instead provide a clone implementation in C++ like this:



Om::Value clone(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
{
    return engine.makeError("unable to clone object");
}
Then, in the script, if an attempt is made to clone the object, a runtime error will be generated and the script will exit. The destructors will still be called though so the memory clean up will still take place.


A couple of other snippets to mention. Script functions can be defined to take a variable number of parameters using the following syntax:


var f = func(a, b, c...)
{
};
The ellipses must be attached to the right-most parameter and when calling the function, you must provide values for the normal parameters. Any additional parameters (more than two in this example) are then accessible using the 'c' symbol, which will be of Om::Type::List, containing the additional parameters.


This is better explained with example code:


import print;

var f = func(a, b, c...)
{
    print(a);
    print(b);
    print(c);
};

f(1, 2); // prints 1, 2, [ ] (an empty list)
f(1, 2, 3, 4); // prints 1, 2, [ 3, 4 ]

var x = func(p...)
{
    for(var i: p) print(i);
};

x(); // prints nothing
x(1, 2, 3, 4); // prints 1 2 3 4
The C++ style ternary operator is supported in Om, as well as short-circuit evaluation of and and or. These are of particular use in a dynamically typed language as they can be used to concisely avoid evaluating expressions that would generate a runtime error.



var f = func(a)
{
    print(a.type == "object" ? a.name : "no name");

    if(a.type == "object" and a.name == "Paul") doStuff();
};
In both cases the dot operator would throw a runtime error if the variable was not an object, so avoiding evaluating these is useful.


I think that is enough information for now. If anyone has made it through, thank you for your perseverance and I'm keen to answer any queries anyone may have.

 

[EDIT] I've just finished implementing Om's version of the switch statement, and also more general break statements for early-exiting out of loops, both of which work much as you would expect. Unlike C and C++, Om's switch is never turned into a jump table so while it isn't quite the efficient beast we know and love, it also doesn't have the C++ switch limitations - the switch expression and even the case expressions can be of absolutely any expression type. Fallthrough works the same as in C++ though.
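Roughly the kind of thing that now works (a sketch only - the case expressions here happen to be plain strings, but any expression is allowed):

import print;

var n = "two";

switch(n)
{
case "one":
    print("got one");
    break;

case "two":
case "three":
    print("got two or three"); // "two" falls through to here, just as in C++
    break;

default:
    print("got something else");
}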

 

Next Time: If anyone is interested, I'll maybe start to lift the lid on how all of this is actually implemented. Here, as a sneak peek, is the current complete OpCode set for the virtual machine, upon which everything above is based. A surprisingly small set of codes, I think.

 

namespace OpCode
{

enum class Type
{
    Call,
    Ret,

    Push,
    Pop,
    PopN,
    Peek,

    Jmp,
    JmpT,
    JmpF,

    GetLc,
    PutLc,
    GetMb,
    PutMb,
    GetSc,
    PutSc,
    GetNl,
    PutNl,
    GetMd,

    Math,
    Cmp,
    Una,

    Bool,

    MkEnt,
    AddCh,

    FeChk,
    FeGet,

    Inc,
    Dec,

    Invalid
};

const char *toString(Type type);
const char *parameters(Type type);

enum class Math
{
    Add,
    Sub,
    Mul,
    Div,

    Invalid
};

const char *toString(Math type);

enum class Cmp
{
    Eq,
    Neq,
    Lt,
    LtEq,
    Gt,
    GtEq,

    Invalid
};

const char *toString(Cmp type);

enum class Una
{
    Neg,
    Not,

    Invalid
};

const char *toString(Una type);

template<class T> const char *text(uint id){ return toString(static_cast<T>(id)); }

}







