Scripting Language - Is This Efficient?

Started by
7 comments, last by GameDev.net 18 years, 6 months ago
I have made a very basic scripting language, and, so far, I am really happy with it. The only thing I have to wander about is, is it programmed efficiently? What I did was loop through the script, and if it detects a character, it checks ahead to see if the character begins a part of a command. If so, it executes the command, checking parameters and such. If it sees "END" then it stops looping. Otherwise, it will have an overflow. I know it isn't much now, but I'm going to add error handling and such, maybe even a syntax checker. I want this to be a good learning experience, and I want to learn to write code efficiently and effectively. So, is it a good way?
The best thing to do is just choose whatever you think you'd prefer, and go for it. -Promit
Advertisement
while that method does have advantages (possibility of editing code on the fly) it is far from efficient or fast.

you'd be better off translating/compiling the script into "bytecode" then executing the bytecode using a switch or something. this doesn't mean you have to create a seperate compiling program, you could convert it to bytecode when the script is loaded (like python) so its transparent to the script writer. this would still allow possibilites of editing code on the fly, but would require a little more work to get it to do so.

the approach i recommended is used by things such as java and python. it is the fastest way short of translating it to a native language.
Boris The Sneaky Russian
You mean, taking the code and translating it into smaller pieces so it isn't so huge? Such as, taking "END" and translating it to "E"?

EDIT: Ok, looked it up, and it doesn't seem so easy, especially with functions that have parameters, such as my "PRINT()" command.

[Edited by - orcfan32 on September 24, 2005 9:41:13 AM]
The best thing to do is just choose whatever you think you'd prefer, and go for it. -Promit
Quote:Original post by orcfan32
You mean, taking the code and translating it into smaller pieces so it isn't so huge? Such as, taking "END" and translating it to "E"?

EDIT: Ok, looked it up, and it doesn't seem so easy, especially with functions that have parameters, such as my "PRINT(strTextToPrint)" command.

If you look up Assembly, that gets translated into (multi)byte code, also known as machine code, which the CPU can then execute. You want to do the same, translating your code into something that is as small as possible while still retaining all the information, so it executes faster. Like assigning the byte value 0x01 to the END command. That's 2 bytes you've saved right there. And the more you do of this, the more compact your code is. Oh, BTW, if you aren't doing it already, and your scripts aren't thousands of lines long, you should cache the entire script in memory, and read it from there, as it's a huge amount faster then reading from the disk one character at a time, that being about the slowest you could possible do things.
Free speech for the living, dead men tell no tales,Your laughing finger will never point again...Omerta!Sing for me now!
So I should use a pointer to store the script in memory AND "dissassemble" it into smaller bytes and execute it from there. Can you explain why memory allows you to read faster?
The best thing to do is just choose whatever you think you'd prefer, and go for it. -Promit
Quote:Original post by orcfan32
So I should use a pointer to store the script in memory AND "dissassemble" it into smaller bytes and execute it from there. Can you explain why memory allows you to read faster?
Yes. Memory is far faster than reading from disk. If you read your script from disk each time, then the hard drive has to seek around the disk to read the file, which is slow. Actually, the slowest part of reading a file is opening it. That can easily take several 10's of milliseconds, which is a hell of a lot. You could read the whole thing from memory in a fraction of a millisecond.

I played around with making a scripting language once, and I translated it into bytecode. First, I parsed each word into tokens (End is a keyword, "Blah" is a variable with the type of string and the value of "Blah", etc etc). Then I checked that the syntax was correct, and then I assembled it into bytecode. For instance, the Print function had the value 0x04 I think, and if you did Print "foo: "+5 then it'd get compiled into push foo, push 5, add, print which would get translated into something like "\x01foo\0\x02\x00\x00\x00\x05\x0a\x04". Which will be a lot faster to parse.

Note: In the above example, 0x01 is the code for "push string", followed by the null terminated string, 0x02 is the code for "push integer", followed by the number (4 bytes), 0x0a is the code for "add the top two values on the stack and push the result", and 0x04 is the code for "print whatever's at the top of the stack"
Are you finding the right command just by comparing to each one? This can be very slow (especially if you have lots of possible commands) and perhaps you can make a big optimization if you use a proper data structure such as a hash-table or some sort of sorted structure like a avl tree.

as for the memory optimization,
you probably know this allready but just in case:

The CPU is very very fast, the bottleneck is how fast data can enter/leave the CPU. For example fetching a memory can be 100 CPU cycles, in which the CPU does nothing but wait for data. Reading from the harddisc is many times slower (i forget the number, maybe 200 times slower than memory).
There are of course optimizations happening in the background such as harddisc cache and memory cache and cpu cache. Because of such background optimizations it is much faster to read a bunch of bytes from a file instead of one byte after another. It is also faster to read a consecutive bytes from memory.

If you read all the file at once into memory you get the best performance the harddisc can give you. If you shrink the memory size of the code you can fit more of it into the cpu cache (where reading a byte is just 1 cycle instead of 100). It is also faster to compare because instead of comparing 4-8 characters you only need to compare one, but since the CPU is ultra fast this isnt a big issue. The most important difference is that more code fits in the cache.


Iftah.

Quick question: when I dissassemble my script, do I have to loop through it to find the functions and stuff? If so, then it kind of defeats the purpose, but not all the way.
The best thing to do is just choose whatever you think you'd prefer, and go for it. -Promit
What I do for an embeddable scripting language I wrote (CellScript) is first parse the entire source file into a token stream. A single token his represented by a class that contains the type of token and its source file representation. This is called a Lexer.

My parser then "parses" the token stream to verify language syntax and generates a syntax tree if everything goes the way it's supposed to. From the syntax tree I then generate the byte code which can be executed in memory or saved to disk to be loaded at a later date. The VM is stack based and uses a giant switch to execute the individual byte codes. In C++ it might be more efficient to contain an array representing a hash table of function pointers to functions that execute an individual command. In C# (which CellScript is currently written in) I think the switch is a bit faster than an array based delegate solution but i havn't done any testing to verify it.

Instead of generated the syntax tree into byte code you could also just execute the syntax tree itself. There's many different ways to approach language creation, if you want to take a look at how I did CellScript you can download it from the public CVS at http://www.sf.net/projects/cellframework

-Corillian

This topic is closed to new replies.

Advertisement