Aardvajk

Compiler/Virtual Machine - advice


So my compiler/VM project is progressing well. The language is fully recursive like C/C++, with function calls inside expressions, function overloading, declare-anywhere syntax and so on.

At the moment I build polymorphic trees from expressions. These are used to syntax check and then either statically calculate the expression's value or spit out byte code for the VM. Control constructs like if-else and while, however, are emitted directly as they are recursive-descent parsed, as are declarations and any global constructs.

I've recently been reading, rather vaguely, about compiling control constructs into trees as well before converting them into byte code for the VM, and was wondering:

a) What is the advantage of doing this?
b) Where do you stop? Should entire functions be compiled into trees before they are converted to the actual byte code? Or even entire programs? Would it be advantageous even to turn global declarations into some kind of intermediate tree structure before turning them into static data sections and symbol table entries?

Sorry this is such a general question, but I'm only after general responses. The language and VM are for text adventures (nice and simple), so sort of relevant to gamedev. When it is up and running I'm hoping to use it as a basis for a graphics-based language.

Ta
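
For anyone following along, the expression setup described above might look roughly like this. It is a minimal Python sketch, not the poster's actual code: the node classes (`Num`, `Var`, `BinOp`), the `fold`/`emit` method names and the stack opcodes (`PUSH`, `LOAD`, `ADD`, `MUL`) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int

    def fold(self):
        # A literal always has a statically known value.
        return self.value

    def emit(self, out):
        out.append(("PUSH", self.value))


@dataclass
class Var:
    name: str

    def fold(self):
        # A variable's value is only known at run time.
        return None

    def emit(self, out):
        out.append(("LOAD", self.name))


@dataclass
class BinOp:
    op: str          # "+" or "*" in this sketch
    left: object
    right: object

    def fold(self):
        l, r = self.left.fold(), self.right.fold()
        if l is None or r is None:
            return None
        return l + r if self.op == "+" else l * r

    def emit(self, out):
        value = self.fold()
        if value is not None:
            # Whole subtree is constant: emit one PUSH instead of the operation.
            out.append(("PUSH", value))
            return
        self.left.emit(out)
        self.right.emit(out)
        out.append(("ADD",) if self.op == "+" else ("MUL",))


def compile_expr(node):
    """Return the list of instructions for one expression tree."""
    out = []
    node.emit(out)
    return out
```

Calling `compile_expr(BinOp("+", Var("x"), BinOp("*", Num(3), Num(4))))` folds the constant `3 * 4` subtree and still emits code for the part that depends on `x`.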

The biggest advantage of parsing a language into a tree-like intermediate language is that you can then take advantage of a large body of literature on optimization (e.g. tree-SSA optimizers). If you don't intend to optimize the generated byte code, or if all you want is basic peephole optimization or simple things like invariant hoisting or common subexpression elimination, then an intermediate language (or tree structure) is unnecessary.
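
To make the "control constructs as trees" idea a bit more concrete, here is a small Python sketch of one pass that becomes easy once `if` statements are tree nodes: dropping branches whose condition is a compile-time constant. The node types (`Const`, `Name`, `Print`, `If`) and the `simplify` function are hypothetical names for this example only, and this is far simpler than the SSA-style optimizers mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Const:
    value: int


@dataclass
class Name:
    ident: str


@dataclass
class Print:
    expr: object


@dataclass
class If:
    cond: object
    then_body: list
    else_body: list


def simplify(stmts):
    """Return a new statement list with constant-condition ifs removed."""
    result = []
    for stmt in stmts:
        if isinstance(stmt, If):
            # Simplify nested statements first, then look at this node.
            then_body = simplify(stmt.then_body)
            else_body = simplify(stmt.else_body)
            if isinstance(stmt.cond, Const):
                # Condition known at compile time: keep only the live branch.
                result.extend(then_body if stmt.cond.value else else_body)
            else:
                result.append(If(stmt.cond, then_body, else_body))
        else:
            result.append(stmt)
    return result
```

Because the pass runs before any byte code exists, there are no jump offsets to repair afterwards; the code generator simply never sees the dead branch.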

Cheers. That leads me onto another question about peephole optimisation actually.

Once I have generated a load of byte code, I can think of a few peephole optimisations I could apply but if I then remove or change instructions as a result, obviously all my links, including local relative jumps in compiled control structures, go off.

If I want to modify the compiled byte code, do I need to also maintain a record of every single reference to a code address, then every time I change the byte code, work out which references are affected and update them accordingly?

Seems like a bit of a nightmare and hopefully I am staring an easier solution in the face but buggered if I can see it.

[Edit] I was wondering about turning the byte code BACK into some kind of post-intermediate list or tree like structure before optimising it but then my brain exploded.
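
One way to picture the bookkeeping being asked about is sketched below in Python. It assumes a decoded instruction list in which every instruction occupies exactly one slot and a jump operand is an offset relative to the jump's own index; real byte code with variable-length instructions would need byte offsets instead. The opcode names and both function names are invented for the example.

```python
JUMPS = {"JMP", "JZ"}  # opcodes whose operand is a relative offset


def find_push_pop(code):
    """Return indices of PUSH/POP pairs that cancel out and can be deleted."""
    targets = {i + ins[1] for i, ins in enumerate(code) if ins[0] in JUMPS}
    dead = set()
    i = 0
    while i + 1 < len(code):
        # Skip the pair if something jumps straight to the POP: in that case
        # the POP discards a value that some other instruction pushed.
        if code[i][0] == "PUSH" and code[i + 1][0] == "POP" and (i + 1) not in targets:
            dead.update((i, i + 1))
            i += 2
        else:
            i += 1
    return dead


def remove_instructions(code, dead):
    """Delete the instructions at the indices in `dead` and repair jumps."""
    # Map every old index to its new index. A deleted instruction maps to
    # the next surviving one, so jumps aimed at it still land sensibly.
    new_index = []
    kept = 0
    for i in range(len(code)):
        new_index.append(kept)
        if i not in dead:
            kept += 1
    new_index.append(kept)  # "one past the end" is also a valid target

    out = []
    for i, ins in enumerate(code):
        if i in dead:
            continue
        if ins[0] in JUMPS:
            target = i + ins[1]
            ins = (ins[0], new_index[target] - new_index[i])
        out.append(ins)
    return out
```

So the record of references does not have to be maintained by hand: it can be recomputed from the instruction list each time, as long as jump operands can be told apart from ordinary data.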

Quote:
Original post by EasilyConfused
If I want to modify the compiled byte code, do I need to also maintain a record of every single reference to a code address, then every time I change the byte code, work out which references are affected and update them accordingly?


This depends entirely on your addressing mode.

The Java VM is designed to be flexible and allows run-time code manipulation. If you find that updating the addresses is too much of a problem, then you need to rethink the model.

You could of course use labels instead of addresses, resolved through proxies that you cache, but that would mean an extra lookup on each call.

Do you allow absolute addressing, relative addressing, or just symbol lookup (method references)? Do you have labels? In the first two cases you will need to update and recalculate everything. Java took this into consideration during design (other dynamic VM languages did too, of course). While Java byte code does have jumps, they are method-local, so recalculating them is easy. Everything else is referenced through methods and fields.
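
A variation on the label idea is to keep labels only inside the compiler and resolve them to relative offsets in a final pass, so the VM never pays for a lookup. The Python sketch below assumes one slot per instruction, offsets relative to the jump's own index, and a `("LABEL", name)` pseudo-instruction that produces no output; those conventions and the `assemble` name are assumptions for this example, not something from the posts above.

```python
JUMPS = ("JMP", "JZ")


def assemble(code):
    """Turn symbolic code with LABEL pseudo-instructions into relative jumps."""
    # Pass 1: record the output position each label refers to.
    positions = {}
    pc = 0
    for ins in code:
        if ins[0] == "LABEL":
            positions[ins[1]] = pc
        else:
            pc += 1

    # Pass 2: drop the labels and replace names with relative offsets.
    out = []
    for ins in code:
        if ins[0] == "LABEL":
            continue
        if ins[0] in JUMPS:
            # len(out) is the index this jump will occupy in the output.
            ins = (ins[0], positions[ins[1]] - len(out))
        out.append(ins)
    return out


# A while loop in symbolic form: while (n) { n--; }
loop = [
    ("LABEL", "top"),
    ("LOAD", "n"),
    ("JZ", "end"),
    ("DEC", "n"),
    ("JMP", "top"),
    ("LABEL", "end"),
    ("HALT",),
]
```

An optimiser can insert or delete instructions in the symbolic list freely, since nothing in it depends on positions until `assemble` runs again.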

Well, I'm using absolute addressing for function calls and global data, and relative addressing for local jumps, and obviously stack data is relative to the VM stack pointer.

The problem with post-compilation optimising is that, even though the local jumps are indeed method-local, changing the code would mess up all the absolute references in the program subsequent to the changes, plus it seems like a bit of a headache to work out which other relative jumps have been affected.

Cheers for the input, Antheus. I reckon I'm going to give up on post-compiler optimising and just do as much as I can in the compiler as I go along.

[Edit] No, no, no - an intermediate "object code" format that has a header with a vector of addresses; then, instead of embedding addresses in the code, references are indexes into the vector. The optimiser can then bugger about with the code as much as it wants and just update the vector. A linker program then resolves every reference to its final value and converts local jumps into relative jumps afterwards. Waffling a bit now.

[Edited by - EasilyConfused on April 21, 2006 4:57:56 AM]
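
The address-vector idea from the edit above could be sketched in Python along these lines. Everything here is an assumption made for illustration: the `ObjectCode` layout, the rule that `CALL` links to an absolute index while `JMP`/`JZ` link to an offset relative to their own index, and one slot per instruction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectCode:
    addresses: List[int]  # slot number -> instruction index
    code: List[tuple]     # CALL/JMP/JZ operands are slot numbers


# How each referencing opcode is resolved by the linker (hypothetical set).
REF_OPS = {"CALL": "absolute", "JMP": "relative", "JZ": "relative"}


def delete_instruction(obj, index):
    """Remove one instruction; only the address vector needs updating."""
    del obj.code[index]
    # Anything that pointed past the deleted slot moves up by one. An address
    # equal to `index` now refers to the instruction that slid into its place.
    obj.addresses = [a - 1 if a > index else a for a in obj.addresses]


def link(obj):
    """Replace slot numbers with final absolute or relative addresses."""
    out = []
    for i, ins in enumerate(obj.code):
        kind = REF_OPS.get(ins[0])
        if kind == "absolute":
            ins = (ins[0], obj.addresses[ins[1]])
        elif kind == "relative":
            ins = (ins[0], obj.addresses[ins[1]] - i)
        out.append(ins)
    return out
```

For example, with `ObjectCode(addresses=[3, 5], code=[("CALL", 0), ("NOP",), ("JMP", 1), ("PUSH", 7), ("RET",), ("HALT",)])`, deleting the `NOP` touches no instruction operands at all; `link` then produces the shifted addresses in one pass.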
