JIT compiled scripting language features

Started by
3 comments, last by RAZORUNREAL 18 years, 3 months ago
Hi, I'm working on a JIT compiled scripting-ish language thing, and it's got to the stage where I need to think about what sort of language it's actually going to be. I'm not sure if it's technically a scripting language, but it's meant to be embedded so it might count... Anyway, I wanted to get some feedback on a few language features. Please bear with me, I haven't had a formal education in this sort of thing yet, so I don't know the terminology and some of my ideas might be naive. I'll endeavor to explain exactly how I plan on doing things, partly because I'm not sure how to describe the features any other way, and partly because I want feedback on the syntax etc. too.

Firstly, let's say it's a bit like C or Java. Just saying that so you know it's not like some of the more (and even some of the less) obscure languages out there.

1. Any number of named returns

I'm thinking of something like this:

(int return1, int return2)Foo(int param1, int param2)
Rather than using eax for return values, which only works for things of 32 bits or less anyway, I thought I could pass them by reference. And I might as well pass the parameters by const reference while I'm about it, and throw pass by value out the window. So, that would make the above equivalent to the C++ snippet:
void Foo(int& return1, int& return2, const int& param1, const int& param2)
2. All variables must be initialised

I can't think of a time where saying a variable = 0 when you create it isn't possible, and uninitialised variables can cause bugs. And honestly, I think leaving a variable uninitialised is a pretty pointless micro optimisation. Especially when my language is already overkill, and is only going to get more so because I can compile it to the system, using things like SSE if available. But I'm getting off topic. The real point is it makes feature 3 easier.

3. Type inference/template replacement

Basically, a var type that can be anything. This could cause a lot of pain, and for all I know it's impossible, but it would just be so handy. It goes like this:

(var result)Square(var num)
{
   result = num * num;
}
Basically, the compiler generates a new function each time it encounters a call using a different type. Pretty much like a template function, only not inlined. Pretty much like this template function in fact:

template<class T, class U>
void Square(T& result, const U& num)
{
   result = num * num;
}
Of course you could end up generating huge amounts of code without even noticing, but that's really the point in it anyway... One thing that you can't normally do is:

var someVar = templateFunc(someOtherVar);
Forcing variable initialisation means you can't give a var different types based on some run time condition, which would make my head explode. If you can think of any flaws with all this I'd be very grateful.

4. Custom operators

The language should be flexible enough that the program it's embedded into can create new operators etc. But what I'm not sure about is whether you should be able to do so in code... Probably not, given the purpose of the language. Anyway, it would look like this:

operator (vector result)X(vector lhs, vector rhs) : + // Same precedence as addition.
{
   // You all know how to perform a cross product right?
}

vector C = A X B;
For ease of parsing I'd require a space between operators and operands. I was thinking precedence would be set in relation to other operators, rather than a flat value. Like the above operator would have the same precedence as addition. It might also be useful to say an operator has precedence 1 lower than some existing operator. And perhaps if you don't provide any sort of comparison it has the highest precedence.

That must be about enough for one day. If this goes well I might be back later with some more unusual ideas, but this post is getting long. So please, any and all feedback. Even if it's just to say that none of this is new (probably true) or it's badly thought out (wouldn't be surprised) or impossible to implement (why please) or X language is better in every way.
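The relative-precedence idea for custom operators could be sketched roughly like this in C++. Everything here is an assumption made for illustration: the `PrecedenceTable` class, its method names, and the choice to space the built-in levels by 10 so that "one lower than" has somewhere to go.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical precedence registry; a higher level binds tighter.
class PrecedenceTable
{
public:
    PrecedenceTable()
    {
        // Built-in levels, spaced by 10 so relative definitions have room.
        levels_["+"] = 10;
        levels_["-"] = 10;
        levels_["*"] = 20;
        levels_["/"] = 20;
    }

    // "operator ... X ... : +" means the same precedence as an existing operator.
    void DefineSameAs(const std::string& op, const std::string& existing)
    {
        const int level = Level(existing);
        levels_[op] = level;
    }

    // Precedence one step lower than an existing operator.
    void DefineLowerThan(const std::string& op, const std::string& existing)
    {
        const int level = Level(existing) - 1;
        levels_[op] = level;
    }

    // No comparison given: bind tighter than everything defined so far.
    void DefineHighest(const std::string& op)
    {
        int highest = 0;
        for (const auto& entry : levels_)
            if (entry.second > highest) highest = entry.second;
        levels_[op] = highest + 10;
    }

    // Throws std::out_of_range if the operator was never defined.
    int Level(const std::string& op) const { return levels_.at(op); }

    // True if operator a binds tighter than operator b.
    bool BindsTighter(const std::string& a, const std::string& b) const
    {
        return Level(a) > Level(b);
    }

private:
    std::map<std::string, int> levels_;
};
```

A parser would consult `Level` when deciding whether to keep consuming operators to the right. Note that repeated "one lower" definitions can eventually collide with the next built-in level, so a real implementation would probably want an ordered list of levels rather than plain integers.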
___________________________________________________
David Olsen
If I've helped you, please vote for PigeonGrape!
Quote:Original post by RAZORUNREAL

1. Any number of named returns
I'm thinking of something like this:
(int return1, int return2)Foo(int param1, int param2)

Rather than using eax for return values, which only works for things of 32 bits or less anyway, I thought I could pass them by reference. And I might as well pass the parameters by const reference while I'm about it, and throw pass by value out the window. So, that would make the above equivalent to the C++ snippet:
void Foo(int& return1, int& return2, const int& param1, const int& param2)



You could add a built in notion of tuples, along with pattern matching syntax. Check out ML, it handles this rather well.
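As a rough illustration in C++ terms (the language the original post compares against), a tuple plus destructuring gives several named results without any out-parameters. This is only a sketch assuming C++17; the body of `Foo` and the helper `SumOfParts` are made up for the example, and ML's pattern matching is considerably more general than structured bindings.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical Foo: returns two values as one tuple instead of
// writing them through reference parameters.
std::tuple<int, int> Foo(int param1, int param2)
{
    assert(param2 != 0);  // the division below needs a non-zero divisor
    return std::make_tuple(param1 / param2, param1 % param2);
}

// Structured bindings play the role of pattern matching on the result,
// giving each returned value its own name at the call site.
int SumOfParts(int a, int b)
{
    auto [quotient, remainder] = Foo(a, b);
    return quotient + remainder;
}
```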

Quote:
2. All variables must be initialised
I can't think of a time where saying a variable = 0 when you create it isn't possible, and uninitialised variables can cause bugs. And honestly, I think leaving a variable uninitialised is a pretty pointless micro optimisation. Especially when my language is already overkill, and is only going to get more so because I can compile it to the system, using things like sse if available. But I'm getting off topic. The real point is it makes feature 3 easier.


Right, a variable is NEVER 'uninitialized', because that sort of behavior is absolute nonsense. I would also recommend making your language single assignment ... that is, variables cannot be rebound to values. If you say 'var x = 3', then in that lexical scope, you cannot rebind x. This sort of behavior does not preclude mutation (although the mechanism for expressing mutation in this sort of system is a bit untraditional), but it does make the language clearer to both humans (in my opinion) and machines. Things are certainly much easier for the compiler and garbage collector.

Quote:
3. Type inference/template replacement
Basically, a var type that can be anything. This could cause a lot of pain, and for all I know it's impossible, but it would just be so handy. It goes like this:
(var result)Square(var num)
{
   result = num * num;
}

Basically, the compiler generates a new function each time it encounters a call using a different type. Pretty much like a template function, only not inlined. Pretty much like this template function in fact:
template<class T, class U>
void Square(T& result, const U& num)
{
   result = num * num;
}

Of course you could end up generating huge amounts of code without even noticing, but that's really the point in it anyway... One thing that you can't normally do is:
var someVar = templateFunc(someOtherVar);

Forcing variable initialisation means you can't give a var different types based on some run time condition, which would make my head explode. If you can think of any flaws with all this I'd be very grateful.

I recommend you investigate some existing type systems here; I can personally recommend ML (again!). Yes, type inference is very possible and reduces the pedantry of the language, which in this case is nice, compared to 'VarTypeName varName = new TypeConstructor()'. Just my personal whim though.
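For comparison, C++14 and later can already express the `var someVar = templateFunc(someOtherVar);` line from the original post with `auto`. The sketch below is an assumption about how the proposed `var` might behave, not a description of the poster's compiler; `Demo` is a made-up helper.

```cpp
#include <cassert>
#include <type_traits>

// The return type is deduced from the body, and a separate function is
// generated for each distinct argument type.
template <class T>
auto Square(const T& num)
{
    return num * num;
}

// Each variable's type is inferred from its initialiser, which is why
// mandatory initialisation keeps the inference simple.
inline double Demo()
{
    auto i = Square(3);    // deduced as int
    auto d = Square(1.5);  // deduced as double
    static_assert(std::is_same<decltype(i), int>::value, "int in, int out");
    static_assert(std::is_same<decltype(d), double>::value, "double in, double out");
    return i + d;          // 9 + 2.25
}
```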

Quote:
4. Custom operators
The language should be flexible enough that the program it's embedded into can create new operators etc. But what I'm not sure about is if you should be able to in code... Probably not, given the purpose of the language. Anyway, it would look like this:
operator (vector result)X(vector lhs, vector rhs) : + // Same precedence as addition.
{
   // You all know how to perform a cross product right?
}

vector C = A X B;

For ease of parsing I'd require a space between operators and operands. I was thinking precedence would be set in relation to other operators, rather than a flat value. Like the above operator would have the same precedence as addition. It might also be useful to say an operator has precedence 1 lower than some existing operator. And perhaps if you don't provide any sort of comparison it has the highest precedence.


Ehh, not my cup of tea. I can't imagine why this would actually be beneficial. But hey, it's your language, go with whatever syntax you think will be best for you.

And on a side note, don't get too caught up in implementation details. Nobody should design a language thinking 'man, I'm really going to break out of the mold, I am not going to return values in the eax register!'. Here's my recommended route to success (or most likely, mediocrity):

a) Pick a SIMPLE syntax and some language features, and implement a simple interpreter (in a language with a garbage collector, for starts). Refine the semantics of the language first ... figure out what defining features exist, what sort of infrastructure you desire, etc, and play around with it a bit.

b) Add your convoluted syntax on top of this core language, pile on any unnecessary features, etc. Basically, do what you can to add gizmos without concern for the well-being of the developers. Extend your interpreter to support this.

c) Start thinking about the ideal machine to run programs written in your language. Keep the instruction set simple for the time being. Write an interpreter for this machine, and come up with some magical code transformations to compile your language to this machine. It may be beneficial to define some intermediate format and have multiple passes ... whatever works for you. Oh, and by now you are going to have to think about other runtime services ... such as your library and your garbage collector. And use a bytecode format to store programs in the artificial machine language; it makes things pretty dense and trivial to parse.

d) If you made your machine low level enough, you may be able to establish an isomorphism with whatever ISA you want to JIT compile to. Even if you don't have that, compiling from one machine language to another isn't a huge deal (assuming you created a stack machine with a small number of registers).

e) Optimize now, at all levels. You will most likely find the biggest benefits are in optimizing the high-level compiler, this is where a few intelligent optimizations can yield orders of magnitude improvements.

f) Take a vacation.
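Step c) above can be sketched as a very small bytecode interpreter. The opcode names, the one-byte operand encoding and the `Run` function are all invented for this example; a real machine for the language would need calls, locals and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy stack machine.
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// Runs the bytecode and returns the value on top of the stack at HALT.
// PUSH is followed by a single operand byte; ADD and MUL pop two values
// and push the result.
inline int Run(const std::vector<std::uint8_t>& code)
{
    std::vector<int> stack;
    std::size_t pc = 0;
    while (pc < code.size())
    {
        switch (code[pc++])
        {
        case PUSH:
            assert(pc < code.size());
            stack.push_back(code[pc++]);
            break;
        case ADD:
        {
            assert(stack.size() >= 2);
            const int rhs = stack.back();
            stack.pop_back();
            stack.back() += rhs;
            break;
        }
        case MUL:
        {
            assert(stack.size() >= 2);
            const int rhs = stack.back();
            stack.pop_back();
            stack.back() *= rhs;
            break;
        }
        case HALT:
            assert(!stack.empty());
            return stack.back();
        default:
            assert(false && "unknown opcode");
            return 0;
        }
    }
    assert(false && "program ended without HALT");
    return 0;
}
```

Because each instruction is one byte plus an optional operand, the program is dense and the decoder is a single `switch`, which is the "trivial to parse" property mentioned in step c).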
Quote:Original post by The Reindeer Effect
[...]I would also recommend making your language single assignment ... that is, variables cannot be rebound to values. If you say 'var x = 3', then in that lexical scope, you cannot rebind x. This sort of behavior does not preclude mutation (although the mechanism for expressing mutation in this sort of system is a bit untraditional), but it does make the language clearer to both humans (in my opinion) and machines. Things are certainly much easier for the compiler and garbage collector.[...]
I have to strongly disagree. Most people have a hard enough time coming up with 1 good name, you certainly don't want to force inexperienced programmers (generally the target of 'scripting languages' meant to make a program extensible) to come up with more than 1 name for something.

With this kind of requirement, you'll see tons of scripts that have variables suffixed with a number (DeltaAngle, DeltaAngle2, DeltaAngle3). This kind of 'coding practice' will lead to many, many errors that are difficult to debug because people will forget the suffix or use the wrong one (especially if they add an extra step in their calculation - the state of things will revert to the days of required line numbering in BASIC where programmers would increment the number by 10 or 100 to have room for adjustments, and remembering that the proper sequence is [DeltaAngle100, DeltaAngle113, DeltaAngle125, DeltaAngle200] after a few algorithm modifications isn't easy).

You can easily make the compiler internally rename a variable each time it is assigned to in order to get all the same benefits in the scripting system, and there is no reason to make the human do the work. Even the processor itself does a form of this optimization with register renaming.
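The internal renaming described above is essentially what compilers do when they build static single assignment form. Here is a minimal sketch for straight-line code only; the `Assign` struct and `RenameVersions` function are hypothetical, and branches or loops would need extra merging logic that is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One straight-line statement: target = f(operands...).
struct Assign
{
    std::string target;
    std::vector<std::string> operands;
};

// Gives every assignment a fresh internal name (x becomes x1, x2, ...)
// and rewrites later reads to use the newest version.
// Assumptions: no branches or loops, and source names do not already end
// in digits (otherwise the generated names could clash).
inline std::vector<Assign> RenameVersions(const std::vector<Assign>& code)
{
    std::map<std::string, int> version;  // latest version number per source name
    std::vector<Assign> out;
    for (const Assign& stmt : code)
    {
        Assign renamed;
        for (const std::string& name : stmt.operands)
        {
            const auto it = version.find(name);
            // Names never assigned in this block (inputs) keep their spelling.
            renamed.operands.push_back(
                it == version.end() ? name : name + std::to_string(it->second));
        }
        // Operands were renamed first, so "x = x + y" reads the old x.
        renamed.target = stmt.target + std::to_string(++version[stmt.target]);
        out.push_back(renamed);
    }
    return out;
}
```

The script author keeps writing `angle = angle + delta`, while the compiler sees `angle2 = angle1 + delta` and gets the single-assignment property for free.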
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:Original post by Extrarius
Quote:Original post by The Reindeer Effect
[...]I would also recommend making your language single assignment ... that is, variables cannot be rebound to values. If you say 'var x = 3', then in that lexical scope, you cannot rebind x. This sort of behavior does not preclude mutation (although the mechanism for expressing mutation in this sort of system is a bit untraditional), but it does make the language clearer to both humans (in my opinion) and machines. Things are certainly much easier for the compiler and garbage collector.[...]
I have to strongly disagree. Most people have a hard enough time coming up with 1 good name, you certainly don't want to force inexperienced programmers (generally the target of 'scripting languages' meant to make a program extensible) to come up with more than 1 name for something.

With this kind of requirement, you'll see tons of scripts that have variables suffixed with a number (DeltaAngle, DeltaAngle2, DeltaAngle3). This kind of 'coding practice' will lead to many, many errors that are difficult to debug because people will forget the suffix or use the wrong one (especially if they add an extra step in their calculation - the state of things will revert to the days of required line numbering in BASIC where programmers would increment the number by 10 or 100 to have room for adjustments, and remembering that the proper sequence is [DeltaAngle100, DeltaAngle113, DeltaAngle125, DeltaAngle200] after a few algorithm modifications isn't easy).

You can easily make the compiler internally rename a variable each time it is assigned to in order to get all the same benefits in the scripting system, and there is no reason to make the human do the work. Even the processor itself does a form of this optimization with register renaming.

The way I read it, it sounded like he was proposing going with a purely functional design. Of course, I could be completely wrong.
Quote:Original post by The Reindeer Effect
You could add a built in notion of tuples, along with pattern matching syntax. Check out ML, it handles this rather well.


Ok, I'll check it out.

Quote:Original post by The Reindeer Effect
Right, a variable is NEVER 'uninitialized', because that sort of behavior is absolute nonsense.


Do you know how sarcastic that sounds?

Quote:Original post by The Reindeer Effect
I would also recommend making your language single assignment ... that is, variables cannot be rebound to values. If you say 'var x = 3', then in that lexical scope, you cannot rebind x. This sort of behavior does not preclude mutation (although the mechanism for expressing mutation in this sort of system is a bit untraditional), but it does make the language clearer to both humans (in my opinion) and machines. Things are certainly much easier for the compiler and garbage collector.


Hmm... That would wreak havoc with the register allocator. I'd prefer not.

Quote:Original post by The Reindeer Effect
I recommend you investigate some existing type systems here; I can personally recommend ML (again!). Yes, type inference is very possible and reduces the pedantry of the language, which in this case is nice, compared to 'VarTypeName varName = new TypeConstructor()'. Just my personal whim though.


You like ML, don't you? But yeah, best to read up on it before jumping in the deep end.

And I've decided to throw out the custom operator thing. I like the idea of making things look like built in types, but given the purpose of this language it would be a waste of time. It's embedded, if you want another type you can add one.

Quote:Original post by The Reindeer Effect
Nobody should design a language thinking 'man, I'm really going to break out of the mold, I am not going to return values in the eax register!'.


Um. That was not the thinking behind that idea. Look, I have to interface with C++, because that's the language it's intended to be embedded in. Returning things in eax is the obvious thing to do, because then the languages can call each other's functions, plain and simple. But C++ doesn't return everything in eax. If it's 64-bit, it's returned in a pair of registers. If it's floating point it's returned in st0. I don't want to go there. So in actual fact it's a technical consideration.
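One way to picture that uniform shape from the C++ side is a small adapter that wraps an ordinary host function so the result always travels through a reference. This is only a sketch: `HalfOf`, `Widen` and `ByReference` are invented names, and it handles unary functions only.

```cpp
#include <cassert>

// Hypothetical host functions using ordinary C++ return conventions.
inline double HalfOf(const double& x) { return x * 0.5; }
inline long long Widen(const int& x) { return static_cast<long long>(x) * 1000000000LL; }

// Adapter: exposes a unary host function in the "result by reference,
// parameter by const reference" shape, so generated code never needs to
// know which register the C++ compiler used for the return value.
template <class R, class A, R (*Fn)(const A&)>
void ByReference(R& result, const A& param)
{
    result = Fn(param);
}
```

The JIT would then only ever emit calls to instantiations such as `ByReference<double, double, &HalfOf>`, which all take their result and parameter as references, whatever the wrapped function returns.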

Quote:Original post by The Reindeer Effect
a) Pick a SIMPLE syntax and some language features, and implement a simple interpreter (in a language with a garbage collector, for starts). Refine the semantics of the language first ... figure out what defining features exist, what sort of infrastructure you desire, etc, and play around with it a bit.

b) Add your convoluted syntax on top of this core language, pile on any unnecessary features, etc. Basically, do what you can to add gizmos without concern for the well-being of the developers. Extend your interpreter to support this.

c) Start thinking about the ideal machine to run programs written in your language. Keep the instruction set simple for the time being. Write an interpreter for this machine, and come up with some magical code transformations to compile your language to this machine. It may be beneficial to define some intermediate format and have multiple passes ... whatever works for you. Oh, and by now you are going to have to think about other runtime services ... such as your library and your garbage collector. And use a bytecode format to store programs in the artificial machine language; it makes things pretty dense and trivial to parse.

d) If you made your machine low level enough, you may be able to establish an isomorphism with whatever ISA you want to JIT compile to. Even if you don't have that, compiling from one machine language to another isn't a huge deal (assuming you created a stack machine with a small number of registers).

e) Optimize now, at all levels. You will most likely find the biggest benefits are in optimizing the high-level compiler, this is where a few intelligent optimizations can yield orders of magnitude improvements.

f) Take a vacation.


Hmm. Perhaps I should have explained where I'm coming from. I don't know a whole lot about language design. But I'm not just talking features without even having started. I have a (currently stand-alone) thick-as-two-short-planks compiler that spits out some bytecode. This is then fed into the backend, which translates it to assembly pretty well. Softwire does the last bit. It works quite well. The backend does anyway, I'm not too fond of the compiler. Anyway, I thought now, just perhaps, I could talk language features. It being the only part I need help with and all. You have to at some point, right? And I thought after I'd got it going but before it became a useful language would be a good time...

Still, you're probably right on some counts.

This topic is closed to new replies.
