What do you think of my scripting language?

I'm developing an interpreted programming/scripting language for doing AI and other things. My goal is to have a simple language where I (or other people) can develop tools (like level editors) or do scripting for games. I'm working on the documentation for this language, which I call "Anaphase". Here's the link: www.gameboxsoftware.com/files/Anaphase.zip Do you think the language is too primitive? I want to minimize development time by using it, so I want it to be simple and have things like garbage collection. It's like a combination of C++ and C# but stripped down a lot. And I know what you're thinking: Steph, this is WAY complex. Dude, it'll take you forever to make this! Well, I've already made an IDE for it, and I have a few ideas for the compiler (the source code is converted to bytecode, which is faster to interpret). The way I see it, if Python is really successful, maybe I could do something similar and simpler (but this is mostly for me). I guess it could be a great learning/teaching tool too. Oh, and here's the IDE I've been working on. Please note that it doesn't compile yet...obviously. Do you think the direction I'm heading with this is good? www.gameboxsoftware.com/files/Anaphase IDE.zip

Share this post


Link to post
Share on other sites
The repeat keyword is nifty. Other than that, it looks like C# on sleep pills. What use cases were you aiming at when you designed it?

Share this post


Link to post
Share on other sites
Quote:
Original post by ouraqt
And I know what you're thinking: Steph, this is WAY complex. Dude, it'll take you forever to make this!


I think it's quite all right. I've written a compiler for a language of similar complexity, and it took me about 2 months. If you have a bit of knowledge of how C++ code is compiled, then you'll have no huge problems with all this.

As for your design document, I understand that it is pre-preliminary. You've got to cover a lot of issues before writing the compiler, or even the parser, all to spot potential logic errors within the language and change its syntax accordingly.
(Are functions virtual? Are members held by reference or by value? Are arguments passed by ref or by value? How are variables declared within function scope? What model of GC are you using? What's the connection between destructors and GC? Etc.)


Also, I didn't like the design decision to access object members via accessor functions only. It seems to me like an overuse of OO and an irritating feature, especially since this is supposed to be for scripts, not for larger projects that could benefit from this tactic, with respect to the rationale you've given.

Good luck!
~def

Share this post


Link to post
Share on other sites
Ah, it is preliminary but it's mostly a programming guide, not a technical reference to how the language works underneath the shell.

Also, I just realized that there isn't really any way to implement strings with this. At first I figured I could use character arrays (like in C) but then I found out that there isn't a way to pass an array to a function. Grr...I don't want to implement pointers.

Also, I was thinking about taking out the repeat statement. The same effect can easily be accomplished by using a for loop...what do you think? I don't want redundancies in the language, but I do want convenience.

And to answer more questions...
Functions are not virtual. Everything is by value, not by reference. (references and pointers can get yucky, I'm trying to keep it simple) Variables can be declared anywhere, but will only last until the scope ends (as documented). And by GC, you mean garbage collection, right? Any memory allocated with the new keyword will automatically be freed when the scope that contains the declaration of the variable (not the allocation) ends.
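
For comparison, the lifetime rule described above (memory from new is released when the scope holding the variable's declaration ends) is roughly what C++ gets from std::unique_ptr. This is a minimal sketch in C++, not Anaphase; the Tracked type and the g_liveTracked counter are made up purely to observe when destruction happens:

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter of Tracked objects that are currently alive.
int g_liveTracked = 0;

// Hypothetical type used only to observe lifetimes; not part of Anaphase.
struct Tracked {
    Tracked()  { ++g_liveTracked; }
    ~Tracked() { --g_liveTracked; }
};

// Allocates inside an inner scope and reports how many objects are still
// alive once that scope has ended.
int liveAfterScope() {
    {
        std::unique_ptr<Tracked> p;   // declaration: this scope owns the lifetime
        p.reset(new Tracked());       // the allocation itself can happen later
        assert(g_liveTracked == 1);
    }   // p goes out of scope here, so the Tracked object is destroyed
    return g_liveTracked;
}
```

Calling liveAfterScope() should report zero live objects, since the object dies with the scope of its declaration and nothing needs to be collected afterwards.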

What do you think of the IDE? I guess I should change the font to Courier New..10 pt. Maybe add syntax highlighting if I ever figure out how...

Share this post


Link to post
Share on other sites
I think the repeat keyword is a good idea.


repeat(10)
{
// stuff here
}



mov ecx, 10
loopbegin:
test ecx, ecx
jz loopend
; stuff here
dec ecx
jmp loopbegin
loopend:


something like that as opposed to:


for( i = 0; i < 10; i++ )
{
// stuff here
}



mov dword ptr [i], 0
loopbegin:
mov eax, dword ptr [i]
cmp eax, 10
jge loopend
; stuff here
mov eax, dword ptr [i]
add eax, 1
mov dword ptr [i], eax
jmp loopbegin
loopend:


it would definitely increase performance if it's used often.
i'm not sure how accurate my asm was-i haven't used it in a while, but you get the general idea.

[Edited by - F-Kop on July 8, 2006 2:33:27 PM]

Share this post


Link to post
Share on other sites
Increased performance? Not likely. Any half-decent compiler would emit assembler that basically is identical to your first version; that second monstrosity (while possibly technically correct) is a lot more complicated than what a modern compiler would generate.

Share this post


Link to post
Share on other sites
I think he was referring to coding efficiency, but I may be mistaken.

And, yes, I have to agree that repeat() is a nice feature. I'd like a way to access which number it's on, though.

Share this post


Link to post
Share on other sites
Quote:
Original post by ApochPiQ
Increased performance? Not likely. Any half-decent compiler would emit assembler that basically is identical to your first version; that second monstrosity (while possibly technically correct) is a lot more complicated than what a modern compiler would generate.



for( i = 0; i < 10; i++ )
0041A359 mov dword ptr [i],0
0041A360 jmp main+1Bh (41A36Bh)
0041A362 mov eax,dword ptr [i]
0041A365 add eax,1
0041A368 mov dword ptr [i],eax
0041A36B cmp dword ptr [i],0Ah
0041A36F jae main+23h (41A373h)
{
//
}
0041A371 jmp main+12h (41A362h)


this was compiled and disassembled with VC8. i wasn't exact, but i was pretty close. if an operation has to be done more than once without a counter variable, a repeat keyword would generate fewer assembly instructions, and it would look nicer.

Share this post


Link to post
Share on other sites
Using what compiler settings? I had trouble getting VC8 to not unroll or simplify any of my test loops radically, when using full optimizations.


[edit] I fell back to VC6 because I already had a project set up for something similar. Here's my input code:

int x;
int i;
for(i = 0; i < 10; ++i)
{
// Foo
cin >> x;
// Blah
}

cout << x << endl;



Stripping out the cin stuff, the generated assembly boils down to:

mov esi, 10 ; 0000000aH
npad 8
$L8546:
dec esi
jne SHORT $L8546



Well gee, what do you know about that - and that's VC6. I know for a fact VC8's optimizer is even smarter [smile]


[edit 2] Actually... my IA32ASM is rusty, but I believe that npad is excessive as well (I *think* it's related to cin also). Which means the final result, less the "stuff", is all of three instructions. That's actually shorter than your by-hand version by two instructions - and will have significantly less chance of borking branch prediction in the CPU.

[edit 3] It's a compiler directive from VC; just pads with NOPs to ensure the jump address is aligned nicely. So it can indeed be ignored.

Share this post


Link to post
Share on other sites
Funny..this is what I got with VC6:


6: for( i = 0; i < 10; i++ )
00401028 C7 45 FC 00 00 00 00 mov dword ptr [ebp-4],0
0040102F EB 09 jmp main+2Ah (0040103a)
00401031 8B 45 FC mov eax,dword ptr [ebp-4]
00401034 83 C0 01 add eax,1
00401037 89 45 FC mov dword ptr [ebp-4],eax
0040103A 83 7D FC 0A cmp dword ptr [ebp-4],0Ah
0040103E 7D 02 jge main+32h (00401042)
7: {
8: // stuff
9: }
00401040 EB EF jmp main+21h (00401031)

Share this post


Link to post
Share on other sites
All of this ASM is making my brain hurt. I'm not actually developing a compiler/assembler that generates machine code, just one that produces bytecode - which is then interpreted. The bytecode is fairly high-level.

"And, yes, I have to agree that repeat() is a nice feature. I'd like a way to access which number it's on, though."

That's exactly what for loops are for. I have nothing against the concept, but what kind of things would you use a repeat() loop for?

Anyway, I'm still deciding how I'm going to implement strings. Should I use character arrays or a std::string-like class? I'm probably just going to do C-style character arrays and I can write a class to handle all the gritty work (like std::string but implemented in Anaphase).

EDIT: Actually, I'm curious about something. How low-level should I make my bytecode? Should I try to convert all the Anaphase source code into assembly? Or should I try a simpler approach like just removing all the whitespace/comments? Which would be faster (ASM probably)? Also, I guess the bytecode would be simple and language-independent (like MSIL for .NET) if it was just assembly. But it would be a lot of work...

Share this post


Link to post
Share on other sites
Quote:
Original post by ouraqt
And to answer more questions...
Functions are not virtual. Everything is by value, not by reference. (references and pointers can get yucky, I'm trying to keep it simple) Variables can be declared anywhere, but will only last until the scope ends (as documented). And by GC, you mean garbage collection, right? Any memory allocated with the new keyword will automatically be freed when the scope that contains the declaration of the variable (not the allocation) ends.


In other words, there's no GC needed at all.
Honestly, without references (or pointers, whatever), this will be too simple. If it's just for learning, and then you're planning on moving on to implement more complicated features, then that's ok. But if you get down to it, implementing the language, as it is, is getting really simple. You should, IMO, consider at least passing function arguments by reference.

String can be implemented as an intrinsic type. That's how it's done in most cases (in most scripting languages) anyway, so worry not about that.

Oh, and about the repeat keyword: it's almost purely syntactic, so it can be added anytime in the future, once you get the basic compiler working. But I think it's a neat feature. [smile]
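
To illustrate the "purely syntactic" point: a repeat block can be rewritten into a counted loop with a hidden counter, which also gives a natural way to expose the current iteration number. Here is a rough C++ sketch; the repeat helper and iterationsSeen are hypothetical names, not anything from the Anaphase docs:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: runs body(i) for i = 0 .. count-1, which is roughly
// what a compiler could generate for "repeat(count) { ... }".
template <typename Body>
void repeat(int count, Body body) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        body(i);   // the hidden counter is passed in case the body wants it
    }
}

// Usage: collects the iteration numbers seen by the body.
std::vector<int> iterationsSeen(int count) {
    std::vector<int> seen;
    repeat(count, [&seen](int i) { seen.push_back(i); });
    return seen;
}
```

Because the rewrite is this mechanical, the keyword can be bolted on after the for loop already works.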

-----

Quote:
Original post by __many_people__
[about repeat, and for, and assembly output]


People, people, don't turn it into optimization wars - again...

Share this post


Link to post
Share on other sites
It sounds like repeat is a good idea.

"String can be implemented as an intrinsic type. That's how it's done in most cases (in most scripting languages) anyway, so worry not about that."

I was considering that, but then I realized that the memory for strings would need to be dynamically allocated...which is fine, I guess, but all the other intrinsic variables are created on the heap. It just doesn't seem consistent to me. ...Would I still need to create a 'char' type, then? I suppose chars could just be strings with a length of 1.

"In other words, there's no GC needed at all.
Honestly, without references (or pointers, whatever), this will be too simple. If it's just for learning, and then you're planning on moving on to implement more complicated features, then that's ok. But if you get down to it, implementing the language, as it is, is getting really simple. You should, IMO, consider at least passing function arguments by reference."

Well GC isn't needed, but it is handy! :) When exactly is GC really needed? I guess it's when the programmer is too lazy to free up his memory (ie. me) or when the language doesn't allow you to free it manually. ...Anyway, why should function arguments be passed by reference?

Also note that this language will be used for rapid application development, not intricately optimised programs. Kind of a C# thing...

PS: How do I quote someone, like in one of those cool quote boxes? I don't see a button anywhere.

Share this post


Link to post
Share on other sites
Quote:
Original post by ouraqt
I was considering that, but then I realized that the memory for strings would need to be dynamically allocated...which is fine, I guess, but all the other intrinsic variables are created on the heap. It just doesn't seem consistent to me. ...Would I still need to create a 'char' type, then? I suppose chars could just be strings with a length of 1.


Using the heap in a language whose variables are all limited to the scope they were created in is, simply put, not needed at all. The stack is completely sufficient, hence the irrelevance of GC.

You could implement string as (std::string*), seriously! Managing the internals would be implemented in your virtual machine. String variables could be kept as std::string* type. I doubt you'd need more than string addition and comparison (no, the [] operator is not a must).
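
A rough sketch of that idea in C++, assuming the VM hands scripts an opaque handle to a host-side std::string. Note the swap: this uses std::shared_ptr rather than a raw std::string* so that copies of a handle can share one buffer safely. StringHandle, makeString, concat and equals are illustrative names only:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical VM-side string value: the script only ever holds a handle,
// while the character data lives in a std::string managed by the host.
using StringHandle = std::shared_ptr<const std::string>;

StringHandle makeString(const std::string& text) {
    return std::make_shared<const std::string>(text);
}

// "a + b" in the script: builds a new string, leaves the operands untouched.
StringHandle concat(const StringHandle& a, const StringHandle& b) {
    assert(a && b);
    return makeString(*a + *b);
}

// "a == b" in the script: compares contents, not handle identity.
bool equals(const StringHandle& a, const StringHandle& b) {
    assert(a && b);
    return *a == *b;
}
```

The script language itself never sees the buffer or its size, so dynamic allocation stays an implementation detail of the VM.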

Quote:
Original post by ouraqt
When exactly is GC really needed? I guess it's when the programmer is too lazy to free up his memory (ie. me) or when the language doesn't allow you to free it manually.


When the programmer (or the compiler) is not capable of directly controlling the scope of some variable (who's seeing it, how many times it is referenced in the program, what objects keep a reference to a specific variable/object, and when the last reference vanishes so that it is safe to release (delete) the variable/object). These issues arise all the time when some objects are referenced by other objects created on the heap (their lifetime is independent of direct program flow or how the code is structured). But that doesn't happen if, in the language, objects cannot hold references to other objects, nor can they be created on the heap (both of those requirements have to be present for GC to be useful).
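
A small C++ illustration of the situation described above, using std::shared_ptr reference counting as a stand-in for a real garbage collector (a simplification: plain reference counting does not handle cycles). Target, Npc and g_liveTargets are hypothetical:

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter of Target objects that are currently alive.
int g_liveTargets = 0;

struct Target {
    Target()  { ++g_liveTargets; }
    ~Target() { --g_liveTargets; }
};

// An object that holds a reference to another heap object.
struct Npc {
    std::shared_ptr<Target> target;
};

// The Target is created in an inner scope, but a longer-lived object keeps
// a reference to it, so it cannot be freed when that inner scope ends.
int liveTargetsAfterInnerScope() {
    Npc survivor;
    {
        Npc temporary;
        temporary.target = std::make_shared<Target>();
        survivor.target = temporary.target;   // second reference to the same Target
    }   // temporary is gone, but the Target must stay alive for survivor
    return g_liveTargets;
}
```

Scope alone cannot decide when the Target dies here; something has to track the last remaining reference, which is the job a GC does.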

Quote:
Original post by ouraqt
...Anyway, why should function arguments be passed by reference?


To save space.
To speed up the execution.
To allow the script writer to split the code chunks without changing the script logic (eg. function that can modify an object, and object is passed by reference).
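
A minimal C++ illustration of the third point (the Player type and the function names are invented for the example):

```cpp
#include <cassert>

// Hypothetical game object.
struct Player {
    int health;
};

// By value: the function works on a copy, so the caller's object is unchanged.
void damageByValue(Player p, int amount) {
    assert(amount >= 0);
    p.health -= amount;
}

// By reference: no copy is made and the caller's object is modified.
void damageByReference(Player& p, int amount) {
    assert(amount >= 0);
    p.health -= amount;
}
```

With by-value only, a helper like this has to return the modified copy and the caller has to assign it back, which changes how the script is structured.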

Quote:
Original post by ouraqt
Also note that this language will be used for rapid application development, not intricately optimised programs.


By intrinsic I didn't mean optimised, but that the operations will be handled directly by the virtual machine, so they can be implemented directly in the underlying language of the VM implementation (C++?), and thus they do not pose any additional requirements on the scripting language design or its implementation. You said you'd have some additional issues with dynamic char arrays; that is not something the script user should be concerned about. This should be built into the language itself.


Quote:
Original post by ouraqt
PS: How do I quote someone, like in one of those cool quote boxes? I don't see a button anywhere.

Use a button in the upper-right corner of each post. Or just use [_quote_]quoted text[_/quote_] tags (without the "_"'s).

Share this post


Link to post
Share on other sites
I think you may be right. Despite the fact that strings are dynamic and could potentially require different amounts of memory during their lifetime, they should be built into the language as intrinsic types. But should I still implement the 'char' type?
Quote:
To allow the script writer to split the code chunks without changing the script logic (eg. function that can modify an object, and object is passed by reference).
That may save a little speed, but it's less object oriented than simply using the functions to modify variables indirectly. Although I guess I should allow it because I want it to be a multi-paradigm language.

Share this post


Link to post
Share on other sites
Quote:
Original post by F-Kop
Funny..this is what I got with VC6:
*snip*



My bad - I've got the VC7 compiler hooked up to my VC6 install (long story). So that was VC7's output.

OK, I'm done hijacking, I promise [smile]

Share this post


Link to post
Share on other sites
Is it better for a scripting language to be compiled into a list of assembly-like instructions (bytecode)? My original design was NOT like this, but now I realize it might be a lot faster (and harder to implement).

Share this post


Link to post
Share on other sites
There's no "best" that works for everyone. What is best for you depends entirely on your goals.

What are you designing this to be used for? How fast does it need to be? How much programming language/compiler technology do you want to learn?


On one end of the scale are purely interpreted languages, where every time the program runs, the interpreter reads the original source code and figures out how to run it. There's nothing at all wrong with this. It's pretty simple to implement, although it has a bit of storage overhead and is usually one of the slower ways to implement a language. However, if you don't need huge amounts of speed (as you would for, e.g., writing an entire game) then it's fine. Purely interpreted languages like QuickBASIC were the bread and butter of many a game programmer back in the early/mid 90's.

The opposite extreme is compiling directly to machine code. More realistically, you'd usually compile to a dialect of assembly, and use an existing assembler program to get machine code. Another popular approach is to compile to C, and use a C compiler to generate the final machine code. (C++ was actually started that way.)


A reasonable middle-of-the-road is to compile to bytecode and write a virtual machine. This is a fantastic exercise as it really helps understand what's going on under the hood of any of your code, and it'll be a good programming experience. However, it's not a small job - be prepared to spend a bit of time and study a lot of things you probably don't work with on a regular basis.

How abstract your bytecode gets is entirely up to you. Personally, I say look for a medium that balances easy implementation of the compiler with easy implementation of the VM; easy compiler implementation is probably the side to favor, because writing a VM is quite simple by comparison.
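
To give a feel for the "bytecode plus VM" middle road, here is a minimal sketch of a stack-based VM in C++. The opcode set (Push, Add, Mul, Halt) and the Instr layout are assumptions chosen for brevity, not a recommendation for any particular bytecode design:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a tiny stack machine.
enum class Op { Push, Add, Mul, Halt };

struct Instr {
    Op  op;
    int operand;   // only meaningful for Push
};

// Executes the program and returns the value left on top of the stack.
int run(const std::vector<Instr>& program) {
    std::vector<int> stack;
    std::size_t pc = 0;                      // index of the next instruction
    while (pc < program.size()) {
        const Instr& in = program[pc++];
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.operand);
            break;
        case Op::Add:
        case Op::Mul: {
            assert(stack.size() >= 2);       // both operands must be present
            int rhs = stack.back(); stack.pop_back();
            int lhs = stack.back(); stack.pop_back();
            stack.push_back(in.op == Op::Add ? lhs + rhs : lhs * rhs);
            break;
        }
        case Op::Halt:
            pc = program.size();             // stop executing
            break;
        }
    }
    assert(!stack.empty());
    return stack.back();
}
```

For example, a compiler front end might turn (2 + 3) * 4 into Push 2, Push 3, Add, Push 4, Mul, Halt, which run then evaluates on its stack. The VM side stays this small even as the compiler grows, which fits the point that the compiler is the harder half.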

Share this post


Link to post
Share on other sites
I'm sticking with the compile to bytecode method, and surprisingly, it's going very well. I'm not looking for a tremendous amount of speed, however, I don't want the interpreter/virtual machine to be spending most of its time trying to parse a file.

The bytecode is very language-specific, however. That's not much of a problem though, because I'm only planning to make one scripting/programming language anyway.

Also, if I didn't say it already, I want this to be a general purpose scripting language because I believe in reusable code. Right now I favor rapid application development (it's more fun that way) and so I need a fairly simple language that is still somewhat flexible (but I'm too lazy to go out and learn a scripting language like Python/Lua!?! I must be crazy. I would rather go out and implement my own compiler/interpreter for a language than spend a few hours learning one. w-o-w. Oh well...it's a great learning experience.).

I'm uploading a new version of the design document right now. Tell me if you think this language is suitable for scripting NPC behavior in games as well as developing full-gui applications (rapidly, not really for efficiency). (ex. level editors)

Oh, and this language will have a bunch of intrinsic functions when I complete it, sort of like a standard library (but intrinsic to the language itself). If you've ever used Game Maker before, it will be like GML.

Share this post


Link to post
Share on other sites