Help with custom language variables

Started by
12 comments, last by digital_phantom 16 years, 8 months ago
For some time now I have been working on my own scripting language, it's work in progress, and I have been using it in a lot of my applications and games I have been working on. Well I just noticed just a few ago that my game's FPS dropped from 500~fps down to 30~ and decreasing when I added a few variable command lines (like adding, setting, and getting variable values) to the script. The only intensive thing going on in these lines is that every-time a variable command is called, the program searches for the variable names in the "global variable list" and returns (or sets) the values necessary. I wouldn't think running 'For' loops each loop would make a difference, but something is. So my question is in theory, what can I do to cancel out this major speed drop without breaking script compatibility? I had a theory but I don't know if it would be wise, which is when the script gets compiled, that the compiler replaces all matching variable names with increment indexes. Ex: Var1 = 0, Var2 = 1, MyName = 3, YourName = 4. So anyone out there that can give me some insight or help on this matter?
Advertisement
Is the "global variable list" literally a list? If so, as you add variables the access time will keep increasing since it takes longer to find the correct element in the list. It should be more of a "global variable map" that indexes the variables by name. The map could be implemented as a balanced tree or a hash table.

Naive string comparisons can also be expensive when the strings share a common previx (ie MyVar1, MyVar2, since it will first check that the first 5 letters are the same before reaching the 6th and realizing that they are different). If this is expected, it may be better to store a hash value with the strings and use that to more quickly discover when strings are unequal, and even for storing them in a map when lexicographic order isn't important.

Mapping variable names to integers wouldn't be a bad idea for increasing speed. If all variables must be declared before program execution then this could definitely speed things up. Languages like C++ do something like this for global variables, which are assigned to fixed memory addresses at compile or link time.

In more dynamic languages, like Python, where variables can be created and destroyed at run-time, it's still necessary to maintain the mapping from names to addresses (or indexes) while the program is running. Depending on the semantics of the language, frequent lookups might be necessary; for example, in Python bindings tend to be done very late, to the point that global variables accessed from inside of functions always look up the global variable by name (if a global function is used frequently, storing it and calling it in a local variable is an optimization). This can be further optimized by maintaining a mapping from names to unique identifiers associated with that name. This unique identifier is then used to look up the actual value bound to the name. In most circumstances, names can be replaced with their unique identifiers. This identifier would be an integer instead of a string, to allow for very fast comparisons.

So basically depending on if names are bound to variables before or after load time, replacing names with integers will require one or two levels of indirection.

[Edited by - Vorpy on July 30, 2007 12:27:04 AM]
Vorpy,

Pretty much the "global variable list" is an always adjusting array list, each item in the array is a structure containing the variable's:
* Name
* Value
* Scope (Global, Local)
* Is Linked (Which Variable Name Is Linked To)
And, * Local ID (If local scope is used, this is the 'Unique Function Name' that this variable belongs to, only used when searching for a variable through.)

So you brought up a good point, like with Python, because my language is both Static, and Dynamic.

There are variable management commands, and there are special "outside the environment" variable management commands as well that can be changed by the user at any moment as long as the parent application allows it.

I don't know if all of this will spice it up a bit or not.
Yeah, this global variable list could use a redesign.

There's no reason to store local variables in the structure you are using to look up global variables. Usually local variables are stored on a stack, and they are known before the function is called so there is no need for dynamic binding. I guess a language could allow the local variables to be dynamically bound, but it would be a fairly useless feature. The function could just create a table or map or whatever if it wanted to dynamically bind things within its own scope. The static binding of local variables allows for increased efficiency since everything just has a position on the stack. This also allows for recursion.

Alternatively, each function can be allocated a chunk of memory for it to store local variables. This introduces slightly more overhead since each function needs to have its own pointer to its data, while with a stack one pointer can be shared by all functions and there aren't memory allocation issues. Not having a stack allows for some really wondrous control structures and makes coroutines easier to implement.

Instead of using an array to store global variables, use a map to look up the structures by name. Possibly optimize it by mapping names to unique integers that index into an array (that possibly contains indexes into another array containing the actual values, if using late binding like Python). This stuff can get very confusing. What is the "is linked" field used for?
Vorpy,

The reason why I have all global and local variables stored under the same variables listing is because of my 'Is Linked' flag.

Just like in C/C++, in my language you can link a variable to another variable... For an example, let's say you have two variables, named varA and varB... And varA contains the number 18, but varB does not contain anything yet... So if you link varB to varA, when you go change varB, it would change varA too (really, varA is the only one being set here since varB is just linked to it)

The linking feature is also how I make function's arguments "by reference" instead of just "by value". And that means that when a function is parsed, it's arguments are parsed as variables and later added to the variable list when that function is called.

So I was thinking more about the variables list, if I am to add everything by index, right now I have the variable list always re-allocated every-time a new variable needs added... but this runs into a little problem.

So when a variable is getting set/get using the "compiler index" values, it is most likely going to have a larger value then the allocated size of the table... like so we're going to set the variable #80 into the variable table of like size 30.... well that isn't going to work quite well... so...
I was thinking about that, I see two choices... either (A) When setting a variable to a smaller table, automatically allocate the table to that variable's larger index ... or ... (B) Have a fixed memory allocation for the variables.

By my programming habits, I don't like anything that is fixed, I want everything to be dynamic, more adjustable... So what do you think on this issue? Should it be a fixed or dynamic list ?
Just expand the array when necessary.

Instead of linking variables, create a reference type. In most modern languages, value types don't even exist and everything is a reference (and garbage collection is used instead of explicit deallocation). If everything is a reference then it is easy to make two variables refer to the same thing.
Well referring to what your talking about, wouldn't that mean I would have two lists now (or more), a memory one for variable values, and another one having the names, memory location, etc...

And I don't want to sound stupid or anything but what is a garbage collection?
Garbage collection is a memory management system that automatically deallocates memory when it is no longer accessible to the program. As long as the program holds a reference to an object, it is not garbage collected. Once all the references are gone, the object is eligible to be collected. Most high level languages use garbage collection. C++ does not and requires the programmer to explicitly delete objects to free memory.

There are a lot of different garbage collection strategies. Reference counting is very simple but it is possible to create circular references so that objects are inaccessible but still reference each other, so more complicated techniques are used. A mark-sweep algorithm follows the references held by the program to find all the accessible objects, and it marks them as being ineligible for collection. Then it sweeps over all the objects, looking for the ones that weren't marked so it can deallocate them. There are some complex ways that the garbage collector and memory allocation can interact.
Copying and generational garbage collection is arguably the best bet. But it's probably not something you want to care about for the moment. And if you use a modern language to implement your virtual machine in, then you can get garbage collection for free.

You could implement it naively with two layers of indirection:

  • variables are stored in a hash table where each value points to
  • a reference that points to
  • a value.

Then you can make two variables A and B point to the same reference. That way, when you change the reference A points to, B "changes" too (since the reference it points to changes).

Vorpy: In functional languages, which is where most of the "new" features for main stream languages (the imperative languages like Java or C# that I assume are the ones you call modern) are taken from, have ONLY value types (or at least, references is a special case ;-)
Thanks so far Vorpy and Ahnfelt,

I am going to look more into garbage collection on google.

What I am doing right now is I am writing another small application for my scripting language which is what I will call for now a "Linker".... It has two steps, (1) first function is opening up a compiled script, finding all variables, and storing their names into a table, the (2) second function is reading that stored table and replacing all the variable names with the indexes.

I am going to implement the whole memory collection for variable values, for they can be linked together, but how will this work for variables that are arrays? That each array item will link to a different area of memory, or will the variable to link a part of memory which that part of memory will have a flag saying if it's an array or not ?

Next, on talking about variables... I think that after my linker is created and partially, or fully tested, I think the virtual machine runtime environment, needs to also have access to the linker data, and it will at first, pre-create the whole variable list based on the linker data... and then add any additional variables for any dynamic runtime variables... What do you guys think on this idea? Or what idea would be best suited, or none at all and something else instead?

I am trying to look for methods that avoids any loops at all during the virtual machine script loop.

This topic is closed to new replies.

Advertisement