Sign in to follow this  
choffstein

Array Iterating in Scripting Language design

Recommended Posts

I recently redid my virtual machine a bit, and realized that I have one major flaw in my code design. Let me show you:
method main{
   object a[15]
   object i
   mov i 0
   FILL_UP:
      mov a[i] i
      inc i
      je i 15 DONE
      jmp FILL_UP
   
   DONE:
      exit 1
}
So basically, that just fills an array, where each index is filled by its index value. My issue, however, is how would this translate as opcodes? When I use an array, I simply use a value called "iOffsetValue". So all objects in an array have the same "stack" value, but are simply offsets. For example, in the mov opcode, it might look something like this 10 2 8 0 1 8 1 0 That would "mov a[1] i" The first number is the function opcde, the second is the number of parameteres. The third is the type of the first parameter...in our case an object. The object index is 0, and the offset is 1...making it a[1]. Next, we get the second object at index 0...i[0]...and therefore i. However, how would I do this if the offset is a variable? Should I add another value to my opcodes that asks what TYPE the offset is (int or object)? This is really one of the only feasable options I can think of... I am really stuck on this one. Thanks for any ideas -visage

Share this post


Link to post
Share on other sites
Your move instruction looks backwards compared to what I'm used to, which is source first, destination second.

Anyway, all you need is an indirect indexed addressing move instruction. Or in plain English, an instruction that takes the address of the object which supplies the offset, instead of supplying the offset directly as a parameter. So, 3 parameters; the variable to be indexed, the variable holding the index, and the value/variable you're going to read from.

Share this post


Link to post
Share on other sites
Kylotan,
I appreciate the response...however, I really have no idea what you are talking about, or where you want me to implement it =D

I tried implementing my method, but soon experienced a possible problem, demonstrated below

a[a[a[a[i]]]]

I would have no way of dealing with this sort of ...issue. All I would be able to deal with in my method is "a[i]." I tried to create some sort of recursive algorithm, but soon realized that I may simply just have a problem in my design. Let me explain:

As I said in my original post, my arrays are simply stack offsets. When an object is created in my compiler, they are added to an object table, and their stack index is set as 0 (as in, its unknown at the time.) So when I have an opcode that deals with an array, I simply use the op value for an object index and a value for an offset.

Now, if we look at the above example, I may have a huge problem. But lets start small.

a[a[i]]

My compiler would look at this and break it into:
a a[i]
Where a is the object name, and a[i] is the index
It would then recurse, seeing that a[i] itself has an index, and split them up:
a i

So I would end up with:
a a i

It would look at i and say "The offset type is a variable...so set the offset type as a variable and the offset as the object index of i."

So now we have:
a a + index of i at object table

But what do I do here?
What I want the VM to see is:
a + value at index of (a + value at i)?

My opcodes dont really allow for this type of thing to happen. Any ideas on how to solve this one? I can break all the data apart perfect with recursive functions. I tried:

a[a[i]]

What happens is it returns the index of i (say its 1). So then we have a and a[1]. Which is, obviously, NOT what I want. Because I dont have the value for i, I need to insert the object table index. However, it just ends up looking like an index value instead of a the object table index.

Any ideas on how to deal with this beast of an issue? Thanks!

-visage

[Edited by - visage on December 4, 2004 6:15:48 PM]

Share this post


Link to post
Share on other sites
I have been tossing around the idea of bitshifting for this problem. Here is how it would work:

a[a[a[a[1]]]] has 4 arrays being used...3 of which are relative and 1 of which will be absolute at runtime.
a[a[a[a[i]]]] has 4 arrays being used, all of which are relative.

So basically we break down the array like this:


a[a[a[a[i]]]]
|
a--a[a[a[i]]]
|
a--a[a[i]]
|
a--a[i]
|
a--i


Starting from the bottom, we find the objectTable index for i. Lets say its 1. That is the offset value, and MEM_REF is the offset type. If we used the first example, where it was absolute, the value would be 1 and the type would be INT_LITERAL.

numArrays starts at 0.

So we go like this: OffsetValue += (MEM_REF << 1 + Value) << numArrays++;

So next we are at:

a[a[a[a[i]]]]
|
a--a[a[a[i]]]
|
a--a[a[i]]
|
a--a[i]


We know, however, that a[i] is a RELATIVE type, because it has a size. So we say TYPE_RELATIVE, and add the index of the base array (lets say a has the object table index of ... 3)
OffsetValue += (TYPE_RELATIVE+index) << numArrays++;

It is understood that relative types are automatically MEM_REFs.
By the end, it would look something like this:

OffsetValue = (RELATIVE+Index)(RELATIVE+Index)(RELATIVE+Index)(MEMREF+Index)

So then I just go through, right to left. I solve for the first absolute, then go through the rest of the relatives knowing what I have already solved for.

Does this sound like a decent plan? Anyone see any potential problems?

Thanks
-visage

Share this post


Link to post
Share on other sites
To be honest, I think the underlying problem here is that you're writing an 'assembly-like' language but you aren't sticking to normal assembly-like constraints. Normal assembly has a one-to-one equivalence between the mnemonic instructions like MOVE or BEQ, but yours cannot do that, because you allow for complex concepts such as nested indices. Real assembly doesn't tend to allow that to happen, so a high level language takes one statement and breaks it down into many assembly instructions. Given that it always has to do this, they have a much more complex syntax parsing system that deals with these issues elegantly, whereas you are finding yourself having to hack in support for these special cases.

eg. A high level programming language has a grammar that is recursively defined and this statement is just fine:

LET i = a[b[c[d[e[f[g[h[i[j[k[l[m[n[o]]]]]]]]]]]]]]]

Yet you cannot possibly provide a single opcode that can accommodate that possibility. So you have to make a choice:

1) Decide that your programming language is essentially going to be high level, provide support for parsing the above but acknowledge that you will spit out several opcodes for it; or

2) Tell the programmer that this simply isn't possible and that they have to generate the indices themselves. You can allow a[10] which generates one opcode, a[i] which generates a second opcode, but disallow a[b[i]].

Conceptually, if you allow the first syntax and expand it out for them, or if your programmers do it by hand, the above is going to unroll into something like this:

move n[o] -> TMP
move m[TMP] -> TMP
move l[TMP] -> TMP
...
move b[TMP] -> TMP
move a[TMP] -> i

Obviously this is in reverse order of declaration, so if doing it programmatically you parse out the instruction using a stack, placing each unresolved array index operation onto the stack and when you hit either a constant or object in the subscript you can output either a constant index opcode or an indirect indexed opcode, and pop it off the stack.

But it's imperative that you have an indirect indexed instruction, because regardless of how you implement array indexing, you need an opcode that will allow the program to wait until runtime to evaluate how much offsetting you need. In other words, you need an opcode that can remember where to get the index from, without knowing what that index is going to be at compilation time.

In fact, in real assembly, I suspect that this sort of indexing is all you'll see and you will probably never see any direct indexed addressing (eg. a[10]) because that will just evaluate to a constant. (Some processors don't support indexed addressing and it's done instead with pointer addition/multiplication instead but the principle is the same.)

Is that any clearer?

Share this post


Link to post
Share on other sites
Oh wow. You are totally the man. I cant believe I didn't think of that! Thank you so much!

I was essentially trying to make my assembly language as "complex" as possible to make the eventual language easier to translate...but this method:

move n[o] -> TMP
move m[TMP] -> TMP
move l[TMP] -> TMP
...
move b[TMP] -> TMP
move a[TMP] -> i


Is perfect for this low level code. Thanks again man!

-visage

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this