Jump to content
  • Advertisement
Sign in to follow this  
3Dgonewild

Implementing Virtual machine + Compiler

This topic is 3563 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi everyone...how's it going?(im not expecting answers :P ). Well , anyway , ... First of all let me tell you that im working on a scripting language. I have nearly finished the virtual machine , and now im trying to create a parser(for the compiler) , and here comes the wave of questions : 1) What's the best approach of creating a parser for a compiler? Right now the parser work's like this:
load code $given_source_file$

while reading stream
 skip white space

 std::string token  = getDynamicToken(" \n\r");//stream to token buffer and return what we want

 do string comparison with token
  on "if" structure call parseIfStructure
  on "class" call parseClassStructure
  on "int" call parseIntIdentifier
 inc stream.head 
  

2) What about expressions ? what's the most efficient way to handle them?. For example , how should i handle an expression like this one:
(((b+128)*a) + (u * q %m) );

-> bytecode =
//@compiler module

set get_idf(b)
push
pop //vm->b is now $b
set const 128 //vm->a is now 128
add // vm->a is now $b + 128
push get_idf(a)
mul
push
set $u
pop
set $q
mul
push
set
mod

3) (bytecode related) Do i have to worry about endianess or no? 4) Any tips on how to organize such a project? That's all ..., Thanks for your time...

Share this post


Link to post
Share on other sites
Advertisement
Quote:
Original post by 3Dgonewild
1)
What's the best approach of creating a parser for a compiler?

Use an existing parser generator like Yacc/Bison, ANTLR or Lemon.
Quote:
2)
What about expressions ? what's the most efficient way to handle them?.

I have no idea what you're asking.
Quote:
3)
(bytecode related)
Do i have to worry about endianess or no?

Depends on whether or not your byte code contains data that are multiple bytes or not.

Share this post


Link to post
Share on other sites
What SiCrane said and...

Quote:
Original post by 3Dgonewild
2)
What about expressions ? what's the most efficient way to handle them?.


In standard compilers, they would fall out naturally from your parser/syntax. C#'s grammar is a pretty good example of how the different components are ordered so that your resultant tree is already in the proper order of operations.

Share this post


Link to post
Share on other sites
Quote:
Original post by 3Dgonewild
1) What's the best approach of creating a parser for a compiler?


If you are looking for fun and experience, there's a minimalistic example at wikipedia. A more elaborate example can be found on http://llvm.org, in the Kaleidoscope Tutorial.

If you want something for production, something that's scalable and a grammar definition that is easy to read, see SiCranes post. Also have a look at boost.spirit, which let's you write a kind of EBNF inside your C++ code (header files only).

In general, have a look at the freely available book Compiler Construction by Niklaus Wirth. If you can afford some money, get a copy of the Dragon Book.

Also, wikipedia has in the meanwhile tons of (imho) good articles on compiler construction.

In general, it will be easier if your virtual machine be stack based at first. But don't write assembly directly, instead use an abstract syntax tree, and go the frontend/backend approach (frontend==language definition and parsing, backend==code emmission) which enables you to add more languages and targes (native binary, or maybe just your next awesome version of your VM).

And, b t w.

Quote:
2) What about expressions ? what's the most efficient way to handle them?.

There are different ways, see wikipedia and the kaleidoscope tutorial. You could do Operator Precedence Parsing. Or use a hack that early FORTRANS used.

But if you use something EBNF based, precedence will come naturally from your language definition.


Quote:
4) Any tips on how to organize such a project?


Frontend / (Middleend /)? Backend.

Share this post


Link to post
Share on other sites

Thanks for the replies!...

Quote:

I have no idea what you're asking.


What i was asking for , is a way to parse complex expressions and generate the proper bytecode.
For example , a simple expression like this one:


uptr x = 128 * 2;



...can be handled easily , ... but what happens when
we have complex expressions , like this one:


uptr x = (128 * 2 - (88 + 12) / 2) * 2;



Obviously the parser(in combination with compiler) must be smart in order to generate the proper
bytecode , since it requires loads of stack related operations(when needed).

Anyway , i think i found what i was asking for(thanks to phresnel), ...i have
to implement an operator precedence function.


Quote:

If you are looking for fun and experience...


Just for the experience of course :).

Quote:

In general, have a look at the freely available book Compiler Construction by Niklaus Wirth.


Great! , looks really interesting!...


Quote:

Frontend / (Middleend /)? Backend.


Frontend i guess :P...

Share this post


Link to post
Share on other sites
Quote:
Frontend i guess :P...


Ah by that I meant the architecture of the framework. The frontend would contain the parser, which e.g. generates an abstract syntax tree. Then, optimizations could be done on the AST. After all, the backend is going to take the AST and generate target machine code.

Share this post


Link to post
Share on other sites
Something like flex+bison will allow you to define very specific code emision rules for EVERY syntax rule.
The standard approach is to make the bison grammer output a parser that contains the C++ to generate an Abstract Syntax Tree. The tree structure defines the expression in operator precedence order (order is defined in your syntax file you give to bison).



128 * 2 - ((88 + 12) / 2) * 2;
becomes:
(-)
============
| |
(*) (*)
====== ======
| | | |
128 2 div 2
======
| |
+ 2
=====
| |
88 12


You then recurse over the tree to emit the bytecode. with functions that would look like:

EmitSymbols ( treeNode *pNode )
{
switch ( pNode->type )
{
case SYM_MINUS: EmitSymbol_Minus(pNode->child); break;
case STM_MULT: EmitSymbol_Mult(pNode->child); break;
// etc....
}
}

EmitSymbol_Minus( treeNode *pNode )
{
EmitSymbols ( pNode->left );
EmitByteCode_Minus();
EmitSymbols ( pNode->right );
}

Share this post


Link to post
Share on other sites

Thanks , that makes sense!...
I think that i understand the concept of generating the tree ,
im just not sure how i should generate the final bytecode...hmm i'll try
to find a way(i guess...)...

Here's some "code" that i plan to use for the stream->tokenizer...



struct TTokenType
{
TTokenType(){data=NULL; type = enums::TTypes::unknown_sym; prc_level = -9;}
TTokenType(const TTokenType t,const ptr prc_lvl){type = t; prc_level = prc_lvl;data=NULL;}
TTokenType(const TTokenType t,const ptr prc_lvl,const uiptr* &value){type = t; prc_level = prc_lvl;data=strdup(value);}
enums::TTypes type;
ptr prc_level;
uiptr* data;

};

struct TNodeElement
{
TNodeElement(){next=previous=child=NULL;}
TTokenType token;
TNodeElement* next;
TNodeElement* previous;
TNodeElement* child;
};


TNodeElement* head , *tail



............



void tokenizeStream(const uiptr* &stream,const uptr stream_length,TNodeElement* tree);
{
tokenize each character and update the tree...:

while in bounds
skip white space

if is alpha
{
token = getToken
check to see if identifier exists
if yes
{
v = get value(token)
TTokenType tt(enums::value,0,v));
register token tt
}

otherwise rollback token

}else{
switch **character
case '(' :
{
TTokenType tt(enums::paren_open,-1));
register token tt
}

}




void imaginaryHandleByteCodeFunction();...



Share this post


Link to post
Share on other sites
Just as a note:

Most CS programs require a compiler theory class, and it generally includes this type of thing.

There are many textbooks that could explain these topics better than simplified online tutorials. Good inexpensive textbooks will cover everything from the basics of making an assembler for the VM all the way through optimizing the final generated VM code.

A little bit of Google can help you know if the book is any good, and used or old editions are generally very cheap.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!