over-engineered script interpreter?...

11 replies to this topic

#1 BGB   Crossbones+   -  Reputation: 1554

Posted 27 March 2013 - 09:20 PM

well, in my case, my scripting VM represents a pretty big part of the 3D engine (both in terms of architecture, and total kloc within the codebase).

 

arguably it is a little over-engineered for a game scripting language though, where many other projects use much more minimalist scripting languages, ...

 

 

like, for example, the opcode listing:

http://cr88192.dyndns.org:8080/wiki/index.php/BGBScriptVM_ByteCode

 

and, also, a language spec:

http://cr88192.dyndns.org:8080/SilvVMSpec/2012-12-05_BGBScript16.html

 

 

opcodes are currently numbered up to around 859, which is considerably more than either the JVM or .NET VM, albeit many of these opcodes are a bit more specialized than those in the JVM or .NET VM. sort of like in the JVM, explicit types are often used, but in many cases via the use of type-prefixes (so, the opcode itself often omits the type, more like those in .NET, although the type is still given explicitly in the bytecode).

 

some of this may be because, for the most part, the language has been run in interpreters, where instruction dispatch has a cost, so a larger number of more specialized opcodes is often preferable performance-wise to a smaller number of more general ones (though, typically the bytecode is not executed directly, but is first converted into a form of "threaded code"; when the JIT is used, it basically just spits out a mix of some directly-handled instructions, but mostly a lot of call-threaded code).

 

note: http://en.wikipedia.org/wiki/Threaded_code
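
as a rough illustration of what "call-threaded code" means here (a minimal sketch in C, not the actual VM code; `ThOp`, `ThCtx`, `th_run` and the handler names are all made up for this example): the bytecode is decoded once into an array of records, each holding a handler function pointer plus its already-decoded operand, and the run loop then just calls one handler after another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the real VM's layout. */
typedef struct ThCtx ThCtx;
typedef struct ThOp ThOp;

/* Each handler does its work and returns the next op to run (NULL = stop). */
typedef const ThOp *(*ThHandler)(ThCtx *ctx, const ThOp *op);

struct ThOp {
    ThHandler fn;   /* handler resolved once, at conversion time */
    int       imm;  /* pre-decoded immediate operand */
};

struct ThCtx {
    int stack[64];
    int sp;         /* number of values currently on the stack */
};

static const ThOp *th_push(ThCtx *ctx, const ThOp *op)
{
    assert(ctx->sp < 64);
    ctx->stack[ctx->sp++] = op->imm;
    return op + 1;
}

static const ThOp *th_add(ThCtx *ctx, const ThOp *op)
{
    assert(ctx->sp >= 2);
    ctx->sp--;
    ctx->stack[ctx->sp - 1] += ctx->stack[ctx->sp];
    return op + 1;
}

static const ThOp *th_halt(ThCtx *ctx, const ThOp *op)
{
    (void)ctx;
    (void)op;
    return NULL;
}

/* The whole run loop: no switch and no operand decoding per instruction. */
int th_run(const ThOp *op)
{
    ThCtx ctx = { {0}, 0 };
    while (op)
        op = op->fn(&ctx, op);
    assert(ctx.sp >= 1);
    return ctx.stack[ctx.sp - 1];
}

/* Example trace for "2 + 3", as a converter might emit it. */
int th_demo(void)
{
    static const ThOp trace[] = {
        { th_push, 2 }, { th_push, 3 }, { th_add, 0 }, { th_halt, 0 }
    };
    return th_run(trace);
}
```

here `th_demo()` evaluates 2 + 3 and returns 5. the per-instruction cost in `th_run` is a single indirect call, with the decoding paid once up front when the trace is built.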

 

 

also, the language can be used over a range of styles, for example (using mostly dynamic types):

public class monster_enemyhead extends monster_generic2
{
    public function monster_enemyhead(ent, sent)
    {
        printf("monster_enemyhead: ctor\n");
        super(ent, sent);
    }

    public function think_idle(self)
    {
        if(btRandom()<0.1)
        {
            btSound(self, BT_CHAN_VOICE, self->snd_idle,
                1.0, BT_ATTN_NORM);
        }
    }

    public function think_fire(self)
    {
        var org, dir;
//        org=self->origin;
//        dir=btYawVector(btCurrentYaw(self));
//        dir=BT_TargetDirection(self, self->enemy);
        org=BT_AimOrigin(self);
        dir=BT_AimDirection(self, self->enemy, 600);
//        BT_FireRocket(self, org, dir, 10, 600, 25);
//        BT_FireBlaster(self, org, dir, 10, 600, 25);
        BT_FireRocket(self, org, dir, 60, 600, 160);
    }

    public function init(self)
    {
        printf("monster_enemyhead: init A self=%p\n", self);

        self->solidtype=BT_SOLID_SLIDEBOX;
        self->movetype=BT_MOVE_STEP;
    
        btSetModel(self, "models/monsters/enemyhead/enemyhead.model");
        self->snd_sight="sound/soldier/solsght1";
        self->snd_idle="sound/soldier/solidle1";
    
        self->origin=self->origin + #[0, 0, 256];
    
        self->mins=#[-64, -64, -32];
        self->maxs=#[64, 64, 64];
        self->health=900;

        btFlymonsterStart(self);
    }
}
 

 

and, also in a more traditional statically-typed way (using a more conventional syntax):

package bsvm.util
{
    public class Random extends Object
    {
    private static final long multiplier = 4294967291L;
    private long seed;

    public Random()
        { this(bgbrng_genvalue()); }
    public Random(long seed)
        { setSeed(seed); }

    protected int next(int bits)
    {
        seed=seed*multiplier+1;
        return ((seed>>>(64-bits))&((1L<<bits)-1) as int);
    }

    public double nextDouble()
        { return((next(24) as double)/16777216.0); }
    public float nextFloat()
        { return((next(24) as float)/16777216.0); }

    public double nextGaussian()
        { return(nextDouble()*nextDouble()); }

    public int nextInt()
        { return(next(32)); }
    public long nextLong()
        { return(((next(32) as long)<<32)+next(32)); }

    public void setSeed(long seed2)
        { seed=seed2*multiplier+1; }
    }
}

 

 

and, another fragment using declared types and a different declaration syntax:

function selsort(a:int[], n:int)
{
    var i:int, j:int, k:int;
    
    for(i=0; i<n; i++)
        for(j=i+1; j<n; j++)
            if(a[j]<a[i])
            {
                k=a[i];
                a[i]=a[j];
                a[j]=k;
            }
}
 

 

so, does all this seem a little over-engineered?...


Edited by cr88192, 27 March 2013 - 09:31 PM.


#2 swiftcoder   Senior Moderators   -  Reputation: 10360

Posted 28 March 2013 - 05:27 AM

You've created a scripting language as verbose and hard to read as Java...

 

I think that deserves some kind of award.


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#3 BGB   Crossbones+   -  Reputation: 1554

Posted 28 March 2013 - 10:18 AM

swiftcoder said:
You've created a scripting language as verbose and hard to read as Java...

 

I think that deserves some kind of award.

 

some parts of the language design may have been influenced by Java...

there are places where it differs from Java though, both in syntax and in semantics...

 

some parts were also influenced by ActionScript3 and C#, and it also borrows some things from C, ...

 

 

compared with Java: it currently lacks inner-classes (does have nested classes though), anonymous classes, and generics.

but, adds lots of other features: closures, variant types, package-level and top-level functions, typedefs, structs, ex-nihilo objects, ...

...

 

also parsing does not depend on declaration context (unlike C or similar).


Edited by cr88192, 28 March 2013 - 10:46 AM.


#4 swiftcoder   Senior Moderators   -  Reputation: 10360

Posted 28 March 2013 - 11:52 AM

My comment wasn't directed towards language features, it was about syntax.

 

Why, why, why would anyone think that the whole public static void main disease was a sensible idea to imitate?



#5 BGB   Crossbones+   -  Reputation: 1554

Posted 28 March 2013 - 02:45 PM

swiftcoder said:
My comment wasn't directed towards language features, it was about syntax.

Why, why, why would anyone think that the whole public static void main disease was a sensible idea to imitate?


lack of many clearly better alternatives within mainstream OO languages?...

also, some design choices were made on the basis of expected familiarity, as people often don't really like "weirdness" in their languages (they generally like things which resemble what they are likely already familiar with).

to some extent, the syntax was designed by evaluating languages as if they "voted" for various syntax choices, and some number of common languages were considered in this process. like, some of the types of languages considered: Java, C#, C, C++, JS, AS3, PHP, ... some of this may have also been guided by language rankings (such as TIOBE), ...

in some cases, choices were arbitrated by personal preference and ease-of-implementation, and some deviations were made to eliminate syntactic ambiguities (such as moving '*' for pointer-types into prefix position, dropping traditional C-style cast syntax in favor of "expr as type" and "expr as! type", ...).


actually, in some cases, the keywords mess gets worse, like when dealing with security or similar...

luckily, there is another language feature for this case:

public static native(C.myapp.my_exp) _setugid _ugid(server) _mode(0x750) ifndef(MYAPP_NOEXPORT) ...
{
void myapp_func1(float x, float y) { ... }
void myapp_func2(string[] argv) { ... }
void myapp_func3(**cchar argv) { ... }
}

where the modifiers can apply to all declarations within the block (may have other effects on executable statements), allowing not having to repeat all of them for every declaration.

Edited by cr88192, 28 March 2013 - 02:50 PM.


#6 Bacterius   Crossbones+   -  Reputation: 9262

Posted 29 March 2013 - 01:08 AM

I would've just used Python or something.. I agree it's interesting even though the syntax looks extremely familiar, but creating a huge interpreted scripting language backed up by a VM just looks over-engineered for the comparatively trivial tasks you're going to be using it for. It doesn't look like a scripting language any more. Why should someone writing some AI routines care about whether he is using floats or doubles? This is low-level stuff, far below the concerns of a scripting language. And so on...

 

In all honesty I think you've taken the concept of a scripting language way too far. Just my opinion, though, and I know less than you think I do.


The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#7 Trienco   Crossbones+   -  Reputation: 2222

Posted 29 March 2013 - 01:19 AM

For a really bad "language" I'd offer a custom PDL. It's what you get when the assignment says "implement algorithm x" and instead you a) decide that hard coding the problem domain isn't good enough and b) you start translating common planning problems to your custom "language", which then c) gets silly "features" like types, inheritance, polymorphism, constraints and conditional constraints and an "any"-type (because half the time strict typing is getting in the way). To top things off, you decide your syntax should resemble the symbols used for the theoretical part behind it.

 

 
type disc
type stack
 
atom disc D1 D2 D3 D4
atom stack S1 S2 S3
 
predicate On(disc:a,any:b)
predicate Clear(any:a)
 
op move(disc:x,any:y,any:z)
precond move On(x,y) ^ Clear(x) ^ Clear(z)
del move Clear(z) ^ On(x,y)
add move On(x,z) ^ Clear(y)
 
constraint move ~eq(x,y) ~eq(x,z) ~eq(y,z)
 
condconstraint move all eq(x,D1)=> ~in(z|D1,D2,D3,D4) ~in(y|D1,D2,D3,D4)
condconstraint move all eq(x,D2)=> ~in(z|D2,D3,D4) ~in(y|D2,D3,D4)
condconstraint move all eq(x,D3)=> ~in(z|D3,D4) ~in(y|D3,D4)
condconstraint move all eq(x,D4)=> ~in(z|D4) ~in(y|D4)
 
init On(D4,D3) ^ On(D3,D2) ^ On(D2,D1) ^ On(D1,S1) ^ Clear(D4) ^ Clear(S2) ^ Clear(S3)
goal On(D4,D3) ^ On(D3,D2) ^ On(D2,D1) ^ On(D1,S3)

f@dz http://festini.device-zero.de

#8 BGB   Crossbones+   -  Reputation: 1554

Posted 29 March 2013 - 03:09 AM

Bacterius, on 29 Mar 2013 - 02:13, said:
I would've just used Python or something.. I agree it's interesting even though the syntax looks extremely familiar, but creating a huge interpreted scripting language backed up by a VM just looks over-engineered for the comparatively trivial tasks you're going to be using it for. It doesn't look like a scripting language any more. Why should someone writing some AI routines care about whether he is using floats or doubles? This is low-level stuff, far below the concerns of a scripting language. And so on...

In all honesty I think you've taken the concept of a scripting language way too far. Just my opinion, though, and I know less than you think I do.


well, that it is over-engineered is sort of the point of posting about it I guess...

in its original form, it wasn't really intended to be such a beast (it started out much closer to a JavaScript knock-off), but sort of just ended up going this way over the course of a period of years...

my first scripting language was Scheme-based, but it later imploded: at the time, I ran into problems from the language being so different from C, leading both to mental-conversion issues and to difficulties moving code fragments between script code and C.


but, concerns ranging from performance to ease of copy/pasting code to/from other languages to interacting with C to having "commonly expected" features to ... kind of eventually led to this result... (there was little real up-front design either).

like, while not used much for speed critical code, being significantly slower than native can still be a drawback, pushing some in the direction of statically-declared types, ...
(say, while 3x or 5x slower is ok, 100x or 200x slower is pushing it...). (yes, granted, 3x to 5x won't really happen with a plain interpreter, but type-handling related costs may still make up a big part of the pie, giving static types an advantage performance-wise.)
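
to make the type-handling cost a bit more concrete, here is a minimal sketch in C (the `DynVal` layout and every function name are assumptions for illustration, not how the VM actually represents values): a dynamically-typed add has to inspect both tags and maybe convert on every call, while the statically-typed version is just the add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value; the real VM's representation is not shown here. */
typedef enum { DV_INT, DV_FLOAT } DvTag;

typedef struct {
    DvTag tag;
    union { int32_t i; double f; } u;
} DynVal;

DynVal dv_int(int32_t i)  { DynVal v; v.tag = DV_INT;   v.u.i = i; return v; }
DynVal dv_float(double f) { DynVal v; v.tag = DV_FLOAT; v.u.f = f; return v; }

/* Dynamic add: checks both tags on every call, converting to double if needed.
 * (Integer overflow is ignored to keep the sketch short.) */
DynVal dyn_add(DynVal a, DynVal b)
{
    assert(a.tag == DV_INT || a.tag == DV_FLOAT);
    assert(b.tag == DV_INT || b.tag == DV_FLOAT);

    if (a.tag == DV_INT && b.tag == DV_INT)
        return dv_int(a.u.i + b.u.i);

    double x = (a.tag == DV_INT) ? (double)a.u.i : a.u.f;
    double y = (b.tag == DV_INT) ? (double)b.u.i : b.u.f;
    return dv_float(x + y);
}

/* Statically typed add: both operands are known to be ints, so no checks. */
int32_t int_add(int32_t a, int32_t b)
{
    return a + b;
}
```

with declared types, a compiler can pick something like `int_add` ahead of time; without them, every arithmetic op pays for the branches in `dyn_add`, on top of whatever the dispatch itself costs.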


I guess a mystery here would be how a scripting language could be much simpler while preserving "reasonably good" performance and a similar "look and feel" to mainstream languages.

it probably doesn't really help that the sorts of languages that mostly end up coming to mind are QuakeC and early Java...

and, if one starts off with something Java-like and tries to patch up what parts are most awkward/painful, it isn't entirely clear how it would be that much different, ...
another possible starting point could be GLSL.
...


FWIW: at least WRT implementation size, Python isn't exactly small or lightweight either; CPython is actually a considerably bigger project code-wise.
likewise, it is still tiny vs "real" compilers, like GCC or LLVM/Clang... which still manage to make my VM look tiny.


but, granted, it is "pretty damn big" if compared with something like QuakeC or Doom3 Script or Squirrel...

it is more along similar lines to something like SpiderMonkey...

Edited by cr88192, 29 March 2013 - 03:44 AM.


#9 Krohm   Crossbones+   -  Reputation: 3238

Posted 29 March 2013 - 04:15 AM

Sure over engineered. I fully agree. My bests:

  • Strings as a built-in language type with special behaviour.
  • Native C interface. Before having the need to do so.
  • Considerations on malicious code. 
  • "values which are defined to be false"
  • automatic numeric conversions
  • the declarations and statement documentation is contrived.
  • goto
  • exceptions.
  • shorts? for a scripting language? 8-bit chars? native quaternions? and no matrices?
  • volatile
  • I'm very surprised your VM has PUSH/POP instructions.
  • special call functions for... tail recursion I guess? special PUSH immediate instructions. Conditional jumps. Ops 128-143, PUSH_SV_C, TOSTRING. Everything from JMP_L_FN to ... the end I guess.

What can I say. It clearly looks like you've spent a lot of effort on implementing this. Not as much on considering the implication of your choices!



#10 BGB   Crossbones+   -  Reputation: 1554

Posted 29 March 2013 - 01:25 PM

Krohm said:
Sure over engineered. I fully agree. My bests:

  • Strings as a built-in language type with special behaviour.
  • Native C interface. Before having the need to do so.
  • Considerations on malicious code. 
  • "values which are defined to be false"
  • automatic numeric conversions
  • the declarations and statement documentation is contrived.
  • goto
  • exceptions.
  • shorts? for a scripting language? 8-bit chars? native quaternions? and no matrices?
  • volatile
  • I'm very surprised your VM has PUSH/POP instructions.
  • special call functions for... tail recursion I guess? special PUSH immediate instructions. Conditional jumps. Ops 128-143, PUSH_SV_C, TOSTRING. Everything from JMP_L_FN to ... the end I guess.

What can I say. It clearly looks like you've spent a lot of effort on implementing this. Not as much on considering the implication of your choices!

 

yep.

 

strings are builtin because they are shared with C, and are implemented on the C side as a typed memory blob (string proper) or by pointing into a string table (symbols). some strings may also be UTF-16, but this is less common (most are UTF-8).

 

the native C interface was the main thing prompting me to revive it at one point.

originally, I was like "meh... writing all this glue boilerplate sure does suck..." and ended up mostly just writing everything as C.

then went off and wrote a C compiler, which could directly handle calls to/from native C, and was like "this C compiler sure is slow/awful/buggy/...".

later on, the C compiler was "re-purposed" mostly as a tool for generating glue between native C and my scripting language, problem solved (mostly...).

 

numeric conversions are helpful, and were added in the move to declared types mostly because C had them.

 

'goto' was a "why not?" feature. can't really say it is used. computed goto was mostly because GCC had it.

 

exceptions / classes / ...

at one point, after the great C compiler disaster, I started trying to implement a Java compiler and JVM knock-off.

then, I was left to realize just how awful/broken the Java language and JVM architecture were in some areas, and building my own wouldn't fix it, and was just like "you know what?!" and just back-ported much of the code written to support this machinery (mostly the object system and a few other parts) back to my scripting language.

 

shorts: because C/Java/... had them.

native vectors and quaternions: in most of the script code, this is mostly what is being used.

no native matrices: because they weren't used much in script code. hadn't got around to implementing built-in matrices for script code.

 

PUSH/POP: because it is a stack-machine (PUSH is basically LDC and POP is basically DROP).

 

tail-recursion: backend isn't smart enough to figure this one out on its own.

'PUSH_SV_C': because char and int aren't exactly the same (different tagged references).

'TOSTRING': because it uses a special handler in C parts of the VM, and at the time, adding an opcode was the fastest/cheapest solution.

 

a later solution was the addition of the "UNARYINTRIN_S" and "BINARYINTRIN_S", which are used for a lot of other later-added intrinsics.

 

 

a lot of those later instructions were added mostly to shave off clock-cycles on then-common combinations of instructions.

 

like:

JMP_G_LZFN var, addr

 

then you can have your load, compare with constant (0), and jump, all done in a single instruction, and be specialized for fixnum.

this allows more work to be done in C land with fewer iterations via the opcode dispatch loop.
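
a minimal sketch of the effect in C (this is not the VM's real dispatch loop: the opcode names other than the general idea of the fused jump, the `Insn` layout, and the reading of "LZ" as "less than zero" are all assumptions, and the fixnum specialization is left out): the same branch is written once as three general ops and once as a single fused op, and the loop counts how many times it goes through dispatch.

```c
#include <assert.h>

/* Hypothetical opcodes; only OP_JMP_G_LZ mirrors the fused-jump idea. */
enum {
    OP_LOAD_G,   /* push globals[arg] */
    OP_PUSH_I,   /* push the constant arg */
    OP_JMP_LT,   /* pop b, pop a; jump to arg if a < b */
    OP_JMP_G_LZ, /* fused: jump to arg2 if globals[arg] < 0 */
    OP_RET       /* stop and return arg */
};

typedef struct { int op, arg, arg2; } Insn;

/* Runs code and returns the RET value; *dispatches counts loop iterations. */
int run(const Insn *code, const int *globals, int *dispatches)
{
    int stack[16], sp = 0, pc = 0;
    *dispatches = 0;
    for (;;) {
        const Insn *in = &code[pc++];
        (*dispatches)++;
        switch (in->op) {
        case OP_LOAD_G:
            assert(sp < 16);
            stack[sp++] = globals[in->arg];
            break;
        case OP_PUSH_I:
            assert(sp < 16);
            stack[sp++] = in->arg;
            break;
        case OP_JMP_LT: {
            assert(sp >= 2);
            int b = stack[--sp];
            int a = stack[--sp];
            if (a < b) pc = in->arg;
            break;
        }
        case OP_JMP_G_LZ:
            /* load, compare with 0, and jump, all in one dispatch */
            if (globals[in->arg] < 0) pc = in->arg2;
            break;
        case OP_RET:
            return in->arg;
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}

/* "if (g[0] < 0) return 1; else return 0;" using general ops... */
const Insn prog_plain[] = {
    { OP_LOAD_G, 0, 0 }, { OP_PUSH_I, 0, 0 }, { OP_JMP_LT, 4, 0 },
    { OP_RET, 0, 0 }, { OP_RET, 1, 0 }
};

/* ...and using the fused op. */
const Insn prog_fused[] = {
    { OP_JMP_G_LZ, 0, 2 }, { OP_RET, 0, 0 }, { OP_RET, 1, 0 }
};

/* Small helpers so the result and the dispatch count are easy to compare. */
int demo_result(const Insn *code, int g0)
{
    int n;
    return run(code, &g0, &n);
}

int demo_dispatches(const Insn *code, int g0)
{
    int n;
    run(code, &g0, &n);
    return n;
}
```

both programs give the same answer, but `prog_plain` goes through the dispatch loop 4 times (including the final `OP_RET`) where `prog_fused` needs 2.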

 

but, the FN and FL instructions have largely fallen into disuse (since the move mostly to static types, where XI and XF instructions take up their role).

also, the move to threaded code made a lot of these super-opcodes much less relevant, as prefixes can be used to similar effect (internally, there is still special handling code for each special case though).



#11 Krohm   Crossbones+   -  Reputation: 3238

Posted 03 April 2013 - 04:43 AM

No need to justify anything man. Actually, the stuff you write doesn't make much sense to me as I don't know your system.



#12 BGB   Crossbones+   -  Reputation: 1554

Posted 03 April 2013 - 11:34 AM

Krohm said:
No need to justify anything man. Actually, the stuff you write doesn't make much sense to me as I don't know your system.


fair enough.

it is a fairly big codebase of mostly C code (previously, the project as a whole was 1.25 Mloc, but I dropped a bunch of generally unused code and now it is 671 kloc).
a lot of this C code also uses dynamic types (as a library feature). the C code and script language more-or-less share the same type-system.

after the shave down, infrastructure+VM was 257 kloc, with 51 kloc as the VM proper. 414 kloc was the 3D engine front-end, which is now dominated by the renderer (~ 160 kloc) and other client-side functionality.

the script code then wraps over some parts of the engine, but generally very thinly (much of the script code is using functions and data-structures declared in C).

Edited by cr88192, 03 April 2013 - 11:35 AM.





