assainator

compiler parsing problem


Hey all. With my scripting engine nearing completion, I thought it would be time to do some research into compiler theory. I am reading the book 'Game Scripting Mastery', from which I got a lot of information about building the virtual machine. It also shows how to build a compiler, so I started reading those chapters and I'm happily coding away =). The only tiny problem is that I'm using VB.NET and the book uses C++, so I'm taking a slightly different approach. My approach:

This is the code to be processed:
""""""
void test ( int i )
{
	return i * 2 ;
}
""""""

Step 1: First I split the code into separate lines. From those lines I extract all lexemes (identifiers, integers, floats, strings, etc.). That way I get a large list that looks like this:
'void'
'test'
'('
'int'
'i'
')'
'{'
'return'
'i'
'*'
'2'
';'
'}'
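
As a rough illustration of step 1, here is a minimal sketch that produces the list above in a single pass. The module name `LexemeSketch`, the function `SplitLexemes`, and the `Punctuation` set are hypothetical and not part of the posted source. The sketch also assumes that all whitespace (spaces, tabs, carriage returns, line feeds) only separates lexemes, so no separate line-splitting pass is needed.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module LexemeSketch
    ' Assumption: each of these characters is a lexeme on its own.
    Private Const Punctuation As String = "(){}[],;*+-/="

    Public Function SplitLexemes(ByVal code As String) As List(Of String)
        Dim lexemes As New List(Of String)
        Dim current As New StringBuilder()

        For Each ch As Char In code
            If Char.IsWhiteSpace(ch) OrElse Punctuation.IndexOf(ch) >= 0 Then
                ' A separator ends whatever lexeme was being built.
                If current.Length > 0 Then
                    lexemes.Add(current.ToString())
                    current.Length = 0
                End If
                ' Punctuation is kept as its own lexeme; whitespace is dropped.
                If Not Char.IsWhiteSpace(ch) Then
                    lexemes.Add(ch.ToString())
                End If
            Else
                current.Append(ch)
            End If
        Next

        ' Flush the last lexeme if the input did not end with a separator.
        If current.Length > 0 Then
            lexemes.Add(current.ToString())
        End If

        Return lexemes
    End Function

    Sub Main()
        Dim source As String = "void test ( int i )" & vbCrLf & "{" & vbCrLf &
                               vbTab & "return i * 2 ;" & vbCrLf & "}"
        ' Prints: void test ( int i ) { return i * 2 ; }
        Console.WriteLine(String.Join(" ", SplitLexemes(source).ToArray()))
    End Sub
End Module
```

Because `-` is treated as punctuation here, a negative literal such as `-2` would come out as two lexemes; whether that is wanted depends on how the later steps handle unary minus.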

Step 2: Then I try to find out which basic type (integer, float, string) each token is and make a list of those too:
string
string
string
string
string
string
string
string
string
string
Integer
string
string
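
One possible way to sketch step 2 is to let the framework's parsers decide instead of walking the characters by hand. The enum `BasicType` and the function `Classify` are hypothetical names (the `TokenType` enum used in the source further down is not shown in the post), and the sketch assumes literals are plain unsigned digits with an optional single `.`.

```vbnet
Imports System
Imports System.Globalization

Module BasicTypeSketch
    ' Hypothetical stand-in for the three basic kinds listed in step 2.
    Public Enum BasicType
        StringLexeme
        IntegerLexeme
        FloatLexeme
    End Enum

    Public Function Classify(ByVal lexeme As String) As BasicType
        Dim asInteger As Integer
        Dim asDouble As Double

        ' NumberStyles.None accepts digits only: no sign, no spaces, no separators.
        If Integer.TryParse(lexeme, NumberStyles.None, CultureInfo.InvariantCulture, asInteger) Then
            Return BasicType.IntegerLexeme
        End If

        ' Anything with a decimal point that parses as a number counts as a float.
        If lexeme.Contains(".") AndAlso
           Double.TryParse(lexeme, NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture, asDouble) Then
            Return BasicType.FloatLexeme
        End If

        Return BasicType.StringLexeme
    End Function

    Sub Main()
        For Each lexeme As String In New String() {"void", "(", "2", "1.5", "1.2.3"}
            Console.WriteLine(lexeme & " -> " & Classify(lexeme).ToString())
        Next
    End Sub
End Module
```

An integer literal too large for `Integer` falls through to `StringLexeme` in this sketch, so a real compiler would want to report that case separately.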

Step 3: Then I analyse the strings, check whether they are keywords or types, and modify the list accordingly:
keyword
string
string
type
string
string
string
keyword
string
string
Integer
string
string
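
Step 3 can be sketched as two set lookups. The names `KeywordSketch` and `Refine` are hypothetical, and the keyword and type sets are assumptions based only on the example above (`void` and `return` as keywords, `int` as a type, plus `float` from step 2); the real language would list its own.

```vbnet
Imports System
Imports System.Collections.Generic

Module KeywordSketch
    ' Assumed word lists, taken from the example in the post.
    Private ReadOnly Keywords As New HashSet(Of String)(New String() {"void", "return"})
    Private ReadOnly TypeNames As New HashSet(Of String)(New String() {"int", "float"})

    ' Takes a lexeme and its kind from step 2 and returns the refined kind.
    Public Function Refine(ByVal lexeme As String, ByVal kind As String) As String
        ' Only lexemes still marked as plain strings are candidates.
        If kind <> "string" Then Return kind
        If Keywords.Contains(lexeme) Then Return "keyword"
        If TypeNames.Contains(lexeme) Then Return "type"
        Return "string"
    End Function

    Sub Main()
        Console.WriteLine(Refine("void", "string"))   ' keyword
        Console.WriteLine(Refine("int", "string"))    ' type
        Console.WriteLine(Refine("test", "string"))   ' string
        Console.WriteLine(Refine("2", "Integer"))     ' Integer
    End Sub
End Module
```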

Step 4: Then I try to find out which strings are identifiers, using the tokens in front of them:
keyword
identifier
string
type
identifier
string
string
keyword
identifier
string
Integer
string
string
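
Step 4 as described looks at the neighbouring tokens. The sketch below swaps in a simpler technique and checks only the shape of the lexeme itself: a letter or underscore first, then letters, digits, or underscores. `LooksLikeIdentifier` is a hypothetical helper, and it assumes keywords and types have already been filtered out in step 3.

```vbnet
Imports System

Module IdentifierSketch
    Public Function LooksLikeIdentifier(ByVal lexeme As String) As Boolean
        If String.IsNullOrEmpty(lexeme) Then Return False

        ' An identifier cannot start with a digit.
        If Not (Char.IsLetter(lexeme(0)) OrElse lexeme(0) = "_"c) Then Return False

        ' Every character must be a letter, a digit, or an underscore.
        For Each ch As Char In lexeme
            If Not (Char.IsLetterOrDigit(ch) OrElse ch = "_"c) Then Return False
        Next

        Return True
    End Function

    Sub Main()
        For Each lexeme As String In New String() {"test", "i", "_tmp1", "2x", "("}
            Console.WriteLine(lexeme & " -> " & LooksLikeIdentifier(lexeme).ToString())
        Next
    End Sub
End Module
```

`Char.IsLetter` accepts any Unicode letter, which is broader than the A-Z, a-z, and `_` set built in the source further down.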

Step 5: Then I convert strings to operators:
keyword
identifier
string
type
identifier
string
string
keyword
identifier
operator
Integer
string
string

Step 6: Then I try to process the remaining strings:
keyword
identifier
Parameter_start
type
identifier
Parameter_end
function_body_start
keyword
identifier
operator
Integer
end_of_instruction
function_body_end
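
Steps 5 and 6 both map fixed symbols to a token kind, so a single lookup table is one way to sketch them. `SymbolSketch` and `ClassifySymbol` are hypothetical names, and the table holds only the symbols that appear in the example. It also assumes `(` always opens a parameter list, which will not hold once expressions such as `(a + b) * 2` are allowed.

```vbnet
Imports System
Imports System.Collections.Generic

Module SymbolSketch
    Private ReadOnly SymbolKinds As Dictionary(Of String, String) = BuildSymbolKinds()

    Private Function BuildSymbolKinds() As Dictionary(Of String, String)
        Dim map As New Dictionary(Of String, String)
        map.Add("*", "operator")
        map.Add("(", "Parameter_start")
        map.Add(")", "Parameter_end")
        map.Add("{", "function_body_start")
        map.Add("}", "function_body_end")
        map.Add(";", "end_of_instruction")
        Return map
    End Function

    ' Returns the kind for a known symbol, or "string" if it is not in the table.
    Public Function ClassifySymbol(ByVal lexeme As String) As String
        Dim kind As String = Nothing
        If SymbolKinds.TryGetValue(lexeme, kind) Then Return kind
        Return "string"
    End Function

    Sub Main()
        For Each lexeme As String In New String() {"(", ")", "{", "*", ";", "}", "?"}
            Console.WriteLine(lexeme & " -> " & ClassifySymbol(lexeme))
        Next
    End Sub
End Module
```

Anything still marked "string" after this lookup is what step 7 would have to inspect.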

Step 7: If there are any strings left that don't make sense (a string that is not between quotes), an error is raised.

At the moment, though, I'm stuck at step 2. It detects everything as it should in the first line, but not in the second. It finds these strings: 'void' 'test' '(' 'int i' ')' '{', but when I try to analyse the first string of the second line ('{'), I run into a problem. I check each character of the string through stringName(x), where x is the current index. stringName(0) should be '{', but I get nothing. So I simply printed it to a textbox, and I get nothing between the quotes! If I insert a breakpoint and watch the variable's value, it does say that stringName contains '{'. Can anyone see what's going wrong? Source if needed:
Public Class Compiler
    Public ICodeResult As String = ""
    Public AsmCodeResult As String = ""
    Public ErrorString As String = ""
    Public ProccedTokens As String = ""

    Private CurrentLexeme As String = ""
    Private CurrentLexomeStartIndex As Integer

    Dim Tokenised() As TokenPair = {}

    Dim tokens() As String = {}
    Dim cToken As String = ""

    Dim NumberArray() As Char = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
    Dim CharacterArray() As Char = {}


    Public Sub CplrError(ByVal err As String)
        ErrorString &= err & Environment.NewLine
    End Sub

    Public Sub New()
        For x = 65 To 90
            Array.Resize(CharacterArray, CharacterArray.Length + 1)
            CharacterArray(CharacterArray.Length - 1) = Chr(x)
        Next

        For x = 97 To 122
            Array.Resize(CharacterArray, CharacterArray.Length + 1)
            CharacterArray(CharacterArray.Length - 1) = Chr(x)
        Next

        Array.Resize(CharacterArray, CharacterArray.Length + 1)
        CharacterArray(CharacterArray.Length - 1) = "_"
    End Sub

    Private Function IsIn(ByVal character As Char, ByVal ParamArray charArray As Char()) As Boolean
        For Each c In charArray
            If c = character Then
                Return True
            End If
        Next
        Return False
    End Function

    Private Function IsAlphaBethic(ByVal chr As Char) As Boolean
        Return IsIn(chr, CharacterArray)
    End Function

    'This function converts the long string into lexemes
    Private Sub LexialAnalasis(ByVal code As String)
        Dim lines() As String = code.Split(Environment.NewLine)

        For l = 0 To lines.Length - 1
            Dim line As String = lines(l)

            For c = 0 To line.Length - 1
                Dim chr As Char = line(c)

                Dim ChrArray() As Char = {" ", "{", "}", "(", ")", "]", "[", ","}
                If IsIn(chr, ChrArray) Then
                    ICodeResult &= "Found delimiter in line " & l & " on position" & c & Environment.NewLine

                    ''We need the delimiter for later use
                    'Array.Resize(tokens, tokens.Length + 1)
                    'tokens(tokens.Length - 1) = chr

                    If Not String.IsNullOrEmpty(cToken) Then
                        Array.Resize(tokens, tokens.Length + 1)
                        tokens(tokens.Length - 1) = cToken
                        AsmCodeResult &= "Found token '" & cToken & "'" & Environment.NewLine
                        cToken = ""
                    End If
                Else
                    cToken &= chr
                End If

                'If this is the last character, the token should be put into the array as well
                If c = line.Length - 1 And Not cToken = "" Then
                    If Not cToken = Nothing Then
                        Array.Resize(tokens, tokens.Length + 1)
                        tokens(tokens.Length - 1) = cToken
                        AsmCodeResult &= "Found token '" & cToken & "'" & Environment.NewLine
                        cToken = ""
                    End If
                End If

            Next

        Next

    End Sub


    Public Sub Tokeniser()
        Dim tmpTok As New TokenPair(TokenType.Init, "")

        Dim CurrentGuess As TokenType = TokenType.Init

        Dim hasFloatPoint = False

        'Process each token
        For Each tmpToken In tokens
            'reset flags
            hasFloatPoint = False
            CurrentGuess = TokenType.Init

            'Skip empty tokens, in case any slipped through LexialAnalasis
            If Not tmpToken = "" Then

            'process each character
            For x = 0 To tmpToken.Length - 1

                Select Case CurrentGuess
                    Case TokenType.Init

                        If tmpToken(x) = Environment.NewLine Then
                            x += 1
                        End If

                        If IsNumeric(tmpToken(x)) Then
                            CurrentGuess = TokenType.Int

                        ElseIf tmpToken(x) = "." Then
                            CurrentGuess = TokenType.Float

                        ElseIf IsAlphaBethic(tmpToken(x)) Then
                            CurrentGuess = TokenType.Identifier

                        Else
                            CplrError("Invalid Entry #10001 '" & tmpToken & "' on position '" & (x).ToString & "' = '" & tmpToken(x) & "'")
                            MsgBox(tmpToken)
                            Exit For

                        End If

                    Case TokenType.Int
                        If IsNumeric(tmpToken(x)) Then
                            CurrentGuess = TokenType.Int
                            If x = tmpToken.Length - 1 Then
                                ProccedTokens &= "Found Integer '" & tmpToken.ToString() & "'" & Environment.NewLine
                            End If

                        ElseIf tmpToken(x) = "." Then
                            CurrentGuess = TokenType.Float

                        Else
                            'This means that the rest are characters, and identifiers cannot start with a digit
                            CplrError("A Identifier cannot start with a digit")
                            Exit For
                        End If

                    Case TokenType.Float
                        If IsNumeric(tmpToken(x)) Then
                            'if is already a floating point
                            CurrentGuess = TokenType.Float

                            If x = tmpToken.Length - 1 Then
                                ProccedTokens &= "Found float '" & tmpToken.ToString() & "'" & Environment.NewLine
                            End If

                        ElseIf tmpToken(x) = "." Then
                            CplrError("A floating point integer cannot have 2 radix points")
                            Exit For

                        Else
                            CplrError("Invalid entry #10002 '" & tmpToken(x) & "'")
                        End If

                    Case TokenType.Identifier
                        If x = tmpToken.Length - 1 Then
                            ProccedTokens &= "Found Identifier '" & tmpToken & "'" & Environment.NewLine
                        End If
                End Select

                Next
            End If

        Next

    End Sub

    Public Function CheckDelimiter(ByVal chr As String) As Boolean
        Dim delimiterArray() As Char = {" ", "{", "}", "(", ")", "]", "[", ","}

        If IsIn(chr, delimiterArray) Then
            Return True
        End If

        Return False
    End Function


    Public Sub C_to_Icode(ByVal code As String)
        LexialAnalasis(code)
        Tokeniser()
    End Sub
End Class




Thanks in advance.
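
One thing that may be worth checking, as an assumption since the input's line endings are not shown: if the text uses Windows line breaks (CR+LF) and the split ends up happening on the carriage return alone, every line after the first starts with an invisible line feed. On .NET Framework with Option Strict Off, passing a String to `Split(ParamArray Char())` can behave this way, because the String is narrowed to its first character. The sketch below reproduces the symptom explicitly with `ControlChars.Cr` and shows a split on whole line breaks; `NewLineSketch` and `SplitLines` are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NewLineSketch
    ' Splits on CR+LF, lone LF, and lone CR, so no line keeps a stray line-break character.
    ' CR+LF is listed first so that it is matched as a single separator.
    Public Function SplitLines(ByVal code As String) As String()
        Return code.Split(New String() {vbCrLf, vbLf, vbCr}, StringSplitOptions.None)
    End Function

    Sub Main()
        Dim code As String = "void test ( int i )" & vbCrLf & "{"

        ' Splitting on the carriage return alone leaves the line feed in front of "{".
        Dim byCr() As String = code.Split(ControlChars.Cr)
        Console.WriteLine(AscW(byCr(1)(0)))   ' prints 10, the code of the line feed

        ' Splitting on whole line breaks gives a clean second line.
        Dim lines() As String = SplitLines(code)
        Console.WriteLine(lines(1))           ' prints {
    End Sub
End Module
```

Printing the character code with `AscW`, rather than the character itself, makes an invisible first character easy to spot in a textbox.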

I recommend picking up a book titled "Programming Languages: Principles and Paradigms". It covers grammars and many key ideas you need to know to make an actual language. Probably more well rounded than the book you currently have.
