Jump to content
• Advertisement

# Implementing Virtual machine + Compiler

This topic is 3471 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

## Recommended Posts

Hi everyone...how's it going?(im not expecting answers :P ). Well , anyway , ... First of all let me tell you that im working on a scripting language. I have nearly finished the virtual machine , and now im trying to create a parser(for the compiler) , and here comes the wave of questions : 1) What's the best approach of creating a parser for a compiler? Right now the parser work's like this:
load code $given_source_file$

while reading stream
skip white space

std::string token  = getDynamicToken(" \n\r");//stream to token buffer and return what we want

do string comparison with token
on "if" structure call parseIfStructure
on "class" call parseClassStructure
on "int" call parseIntIdentifier
inc stream.head


2) What about expressions ? what's the most efficient way to handle them?. For example , how should i handle an expression like this one:
(((b+128)*a) + (u * q %m) );


-> bytecode =
//@compiler module

set get_idf(b)
push
pop //vm->b is now $b set const 128 //vm->a is now 128 add // vm->a is now$b + 128
push get_idf(a)
mul
push
set $u pop set$q
mul
push
set
mod


3) (bytecode related) Do i have to worry about endianess or no? 4) Any tips on how to organize such a project? That's all ..., Thanks for your time...

#### Share this post

##### Share on other sites
Advertisement
Quote:
 Original post by 3Dgonewild1) What's the best approach of creating a parser for a compiler?

Use an existing parser generator like Yacc/Bison, ANTLR or Lemon.
Quote:
 2)What about expressions ? what's the most efficient way to handle them?.

I have no idea what you're asking.
Quote:
 3)(bytecode related)Do i have to worry about endianess or no?

Depends on whether or not your byte code contains data that are multiple bytes or not.

#### Share this post

##### Share on other sites
What SiCrane said and...

Quote:
 Original post by 3Dgonewild2)What about expressions ? what's the most efficient way to handle them?.

In standard compilers, they would fall out naturally from your parser/syntax. C#'s grammar is a pretty good example of how the different components are ordered so that your resultant tree is already in the proper order of operations.

#### Share this post

##### Share on other sites
Quote:
 Original post by 3Dgonewild1) What's the best approach of creating a parser for a compiler?

If you are looking for fun and experience, there's a minimalistic example at wikipedia. A more elaborate example can be found on http://llvm.org, in the Kaleidoscope Tutorial.

If you want something for production, something that's scalable and a grammar definition that is easy to read, see SiCranes post. Also have a look at boost.spirit, which let's you write a kind of EBNF inside your C++ code (header files only).

In general, have a look at the freely available book Compiler Construction by Niklaus Wirth. If you can afford some money, get a copy of the Dragon Book.

Also, wikipedia has in the meanwhile tons of (imho) good articles on compiler construction.

In general, it will be easier if your virtual machine be stack based at first. But don't write assembly directly, instead use an abstract syntax tree, and go the frontend/backend approach (frontend==language definition and parsing, backend==code emmission) which enables you to add more languages and targes (native binary, or maybe just your next awesome version of your VM).

And, b t w.

Quote:
 2) What about expressions ? what's the most efficient way to handle them?.

There are different ways, see wikipedia and the kaleidoscope tutorial. You could do Operator Precedence Parsing. Or use a hack that early FORTRANS used.

But if you use something EBNF based, precedence will come naturally from your language definition.

Quote:
 4) Any tips on how to organize such a project?

Frontend / (Middleend /)? Backend.

#### Share this post

##### Share on other sites

Thanks for the replies!...

Quote:
 I have no idea what you're asking.

What i was asking for , is a way to parse complex expressions and generate the proper bytecode.
For example , a simple expression like this one:

uptr x = 128 * 2;

...can be handled easily , ... but what happens when
we have complex expressions , like this one:

uptr x = (128 * 2 - (88 + 12) / 2) * 2;

Obviously the parser(in combination with compiler) must be smart in order to generate the proper
bytecode , since it requires loads of stack related operations(when needed).

Anyway , i think i found what i was asking for(thanks to phresnel), ...i have
to implement an operator precedence function.

Quote:
 If you are looking for fun and experience...

Just for the experience of course :).

Quote:
 In general, have a look at the freely available book Compiler Construction by Niklaus Wirth.

Great! , looks really interesting!...

Quote:
 Frontend / (Middleend /)? Backend.

Frontend i guess :P...

#### Share this post

##### Share on other sites
Quote:
 Frontend i guess :P...

Ah by that I meant the architecture of the framework. The frontend would contain the parser, which e.g. generates an abstract syntax tree. Then, optimizations could be done on the AST. After all, the backend is going to take the AST and generate target machine code.

#### Share this post

##### Share on other sites
Something like flex+bison will allow you to define very specific code emision rules for EVERY syntax rule.
The standard approach is to make the bison grammer output a parser that contains the C++ to generate an Abstract Syntax Tree. The tree structure defines the expression in operator precedence order (order is defined in your syntax file you give to bison).

128 * 2 - ((88 + 12) / 2) * 2;becomes:       (-)    ============    |          |   (*)        (*) ======     ======      |    |     |    |128   2    div   2         ======         |    |         +    2       =====       |   |      88   12

You then recurse over the tree to emit the bytecode. with functions that would look like:
EmitSymbols ( treeNode *pNode ){  switch ( pNode->type )  {    case SYM_MINUS: EmitSymbol_Minus(pNode->child); break;    case STM_MULT: EmitSymbol_Mult(pNode->child); break;    // etc....  }}EmitSymbol_Minus( treeNode *pNode ){  EmitSymbols ( pNode->left );  EmitByteCode_Minus();  EmitSymbols ( pNode->right );}

#### Share this post

##### Share on other sites

Thanks , that makes sense!...
I think that i understand the concept of generating the tree ,
im just not sure how i should generate the final bytecode...hmm i'll try
to find a way(i guess...)...

Here's some "code" that i plan to use for the stream->tokenizer...

struct TTokenType{	TTokenType(){data=NULL; type = enums::TTypes::unknown_sym; prc_level = -9;}	TTokenType(const TTokenType t,const ptr prc_lvl){type = t; prc_level = prc_lvl;data=NULL;}	TTokenType(const TTokenType t,const ptr prc_lvl,const uiptr* &value){type = t; prc_level = prc_lvl;data=strdup(value);}	enums::TTypes type;	ptr           prc_level;	uiptr*        data;	};struct TNodeElement{	TNodeElement(){next=previous=child=NULL;}	TTokenType    token;	TNodeElement* next;	TNodeElement* previous;	TNodeElement* child;};TNodeElement* head , *tail

............

void tokenizeStream(const uiptr* &stream,const uptr stream_length,TNodeElement* tree);{	tokenize each character and update the tree...:	while in bounds	skip white space		if is alpha	{		token = getToken		check to see if identifier exists		if yes		{			v = get value(token)			TTokenType  tt(enums::value,0,v));			register token tt		}					otherwise rollback token			}else{	switch **character	case '(' :	{		TTokenType  tt(enums::paren_open,-1));		register token tt	}	}

void imaginaryHandleByteCodeFunction();...

#### Share this post

##### Share on other sites
Just as a note:

Most CS programs require a compiler theory class, and it generally includes this type of thing.

There are many textbooks that could explain these topics better than simplified online tutorials. Good inexpensive textbooks will cover everything from the basics of making an assembler for the VM all the way through optimizing the final generated VM code.

A little bit of Google can help you know if the book is any good, and used or old editions are generally very cheap.

#### Share this post

##### Share on other sites

• Advertisement
• Advertisement

• ### Popular Contributors

1. 1
Rutin
26
2. 2
3. 3
4. 4
JoeJ
18
5. 5
• Advertisement

• 14
• 21
• 11
• 11
• 9
• ### Forum Statistics

• Total Topics
631763
• Total Posts
3002190
×

## Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!