How/when do compilers do syntax checking?

4 comments, last by Telastyn 12 years, 3 months ago
Hi, I've finished a small scripting language that resembles C. I named it clsl (short for C-Like Scripting Language). I'm still missing one critical piece, though: the syntax checker!

So I'm asking you guys, since I really don't know where else to look. How do compilers perform syntax checking, and when do they do it? Do they:
- check the syntax of all source files first, and only proceed to compile if there are no typos or other errors,
or do they:
- compile the source, catch errors along the way, and then decide whether to continue or stop the compilation?
Right now, the second one is what I'm using. The compiler compiles all the source and aborts as soon as it encounters any error (it also writes the error to a log file). The thing is, I think there are far too many error cases to catch, so this approach is kind of ugly, and of course there are MANY times when my script compiler just crashes without reporting anything (a non-reproducible bug). Do you have any suggestions on how to write a good syntax checker? Thanks, btw!
the hardest part is the beginning...
The second one. In compiler theory and textbooks, all the phases are separate and distinct. In real life, good compilers can do several of them at once. They report each error as soon as it is encountered, so the compiler never dies without a message. It is also good practice to continue after an error and report as many issues as possible, since an error message may stem from a problem at another location.
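A minimal sketch of the "keep going after an error" idea, using panic-mode recovery: when a statement fails to parse, the error is recorded and the parser skips ahead to the next `;` before resuming. The toy grammar (`name = number;`), the names `tokenize`, `Parser` and `ParseError`, and the choice of `;` as the synchronization point are all assumptions for illustration, not how clsl or any particular compiler works.

```python
import re

class ParseError(Exception):
    """Raised for a single syntax error inside one statement."""

def tokenize(src):
    """Return (line_number, text) pairs: numbers, identifiers, or any other
    single non-space character."""
    tokens = []
    for lineno, line in enumerate(src.splitlines(), start=1):
        for m in re.finditer(r"\d+|[A-Za-z_]\w*|\S", line):
            tokens.append((lineno, m.group()))
    return tokens

class Parser:
    """Hypothetical toy grammar:  program := (IDENT '=' NUMBER ';')*"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else "<eof>"

    def line(self):
        if not self.tokens:
            return 1
        return self.tokens[min(self.pos, len(self.tokens) - 1)][0]

    def expect(self, predicate, what):
        tok = self.peek()
        if not predicate(tok):
            raise ParseError(f"line {self.line()}: expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def synchronize(self):
        # Panic-mode recovery: discard tokens up to and including the next ';'.
        while self.pos < len(self.tokens) and self.peek() != ";":
            self.pos += 1
        self.pos += 1

    def parse_program(self):
        stmts = []
        while self.pos < len(self.tokens):
            try:
                name = self.expect(str.isidentifier, "identifier")
                self.expect(lambda t: t == "=", "'='")
                value = self.expect(str.isdigit, "number")
                self.expect(lambda t: t == ";", "';'")
                stmts.append((name, int(value)))
            except ParseError as err:
                self.errors.append(str(err))  # record the error and keep going
                self.synchronize()
        return stmts, self.errors

src = "x = 1;\ny = ;\nz 3;\nw = 4;\n"
stmts, errors = Parser(tokenize(src)).parse_program()
print(stmts)  # [('x', 1), ('w', 4)]
for e in errors:
    print(e)
# line 2: expected number, got ';'
# line 3: expected '=', got '3'
```

Both errors come out of a single run instead of the compiler aborting at line 2. Skipping to `;` is the simplest recovery strategy; real parsers often also synchronize on `}` or keywords to limit cascades of follow-on errors.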
Have a read of this and follow some of the more obvious links therein. Traditionally, by the time the AST is built, the syntax checking will have been done.
When you parse your files into statements, you should already be doing some form of syntax checking; otherwise you might be processing invalid input.
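To illustrate why a hand-written parser already is a syntax checker, here is a small recursive-descent parser for a hypothetical expression grammar. If `parse` returns, the input was syntactically valid and you have an AST (nested tuples here); if not, it raises `ParseError`. The grammar and all names are assumptions made for the example.

```python
import re

class ParseError(Exception):
    pass

def parse(src):
    """Parse src into a nested-tuple AST, raising ParseError on bad syntax.
    Hypothetical grammar:
        expr := term (('+' | '-') term)*
        term := NUMBER | '(' expr ')'
    """
    tokens = re.findall(r"\d+|\S", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "<eof>"

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def term():
        tok = advance()
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise ParseError("expected ')'")
            return node
        raise ParseError(f"unexpected {tok!r}")

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative binary node
        return node

    tree = expr()
    if pos != len(tokens):  # leftover tokens are also a syntax error
        raise ParseError(f"unexpected trailing {peek()!r}")
    return tree

for src in ["1 + (2 - 3)", "1 + (2 -"]:
    try:
        print(parse(src))
    except ParseError as err:
        print("syntax error:", err)
# ('+', ('num', 1), ('-', ('num', 2), ('num', 3)))
# syntax error: unexpected '<eof>'
```

Parser generators such as lex/yacc or ANTLR produce code that plays the same role from a grammar file.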
There are a lot of libraries that generate a lexer/parser for you, and these basically do some form of grammar checking. Have a look at lex & yacc or ANTLR. For my latest script language I used ANTLR, and it's a very good library!

Crafter 2D: the open source 2D game framework

GitHub: https://github.com/crafter2d/crafter2d
Twitter: crafter_2d

+1 for the responses. You've shed some light on this. I'll be back with more questions later...

EDIT: It seems I'm on the right track. I just need to catch more error cases :)
There are three steps in a traditional compiler where this sort of thing is caught. Most compilers report all the errors found in one step together rather than stopping at the first one.

- Lexing step: The first thing the compiler does is read in the characters and split them into tokens (logical bits such as keywords, identifiers, and numbers). If your language forbids something odd, like certain Unicode characters, an error is reported here and the later steps don't run.
- Parsing step: The second step is to take those tokens and recognize the language's syntax. If you require a class name between 'class' and the opening brace, that sort of thing will be detected here.
- Semantic analysis: As you turn the syntax into abstract structures, you can run into issues like two methods with the same name, or constructor-like syntax whose name doesn't match the class it's in. These errors tend to be more ad hoc; they are the 'rules' of the language rather than its syntax.
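The semantic checks from the last step might look something like this sketch, which runs over an already-parsed class. The `ClassDecl` and `Method` types are hypothetical; a method with no return type is assumed to be constructor syntax, and the language is assumed to have no overloading, so any repeated method name counts as an error. Like the other steps, it collects every problem instead of stopping at the first.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Method:
    name: str
    return_type: Optional[str]  # None = written like a constructor (assumption)
    line: int

@dataclass
class ClassDecl:
    name: str
    methods: list = field(default_factory=list)

def check_class(cls):
    """Semantic checks on an already-parsed class; returns all errors found."""
    errors = []
    seen = {}  # method name -> line of its first definition
    for m in cls.methods:
        if m.name in seen:
            errors.append(
                f"line {m.line}: method '{m.name}' already defined on line {seen[m.name]}")
        else:
            seen[m.name] = m.line
        if m.return_type is None and m.name != cls.name:
            errors.append(
                f"line {m.line}: '{m.name}' has no return type but is not named '{cls.name}'")
    return errors

player = ClassDecl("Player", [
    Method("Player", None, 2),
    Method("update", "void", 3),
    Method("update", "void", 7),
    Method("Plyer", None, 9),
])
for e in check_class(player):
    print(e)
# line 7: method 'update' already defined on line 3
# line 9: 'Plyer' has no return type but is not named 'Player'
```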

I use an error collection that is returned from each step. If it contains any errors, I don't proceed to the next step.
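A rough sketch of that "error collection per step" pipeline, assuming a toy language of `name = value;` statements. Each phase (`lex`, `parse` and `check`, all hypothetical names) returns its result plus a list of every error it found, and the driver stops before the next phase if that list is non-empty.

```python
import re

# Whitespace is skipped, identifiers/numbers/'='/';' are tokens, anything else is bad.
TOKEN = re.compile(r"(?P<skip>\s+)|(?P<tok>[A-Za-z_]\w*|\d+|[=;])|(?P<bad>.)")

def lex(src):
    tokens, errors = [], []
    for m in TOKEN.finditer(src):
        if m.lastgroup == "tok":
            tokens.append(m.group())
        elif m.lastgroup == "bad":
            errors.append(f"lex: unexpected character {m.group()!r} at offset {m.start()}")
    return tokens, errors

def parse(tokens):
    stmts, errors, current = [], [], []
    for tok in tokens:
        if tok != ";":
            current.append(tok)
            continue
        if (len(current) == 3 and current[0].isidentifier() and current[1] == "="
                and (current[2].isidentifier() or current[2].isdigit())):
            stmts.append((current[0], current[2]))
        else:
            errors.append(f"parse: malformed statement {' '.join(current)!r}")
        current = []
    if current:
        errors.append(f"parse: missing ';' after {' '.join(current)!r}")
    return stmts, errors

def check(stmts):
    defined, errors = set(), []
    for name, value in stmts:
        if value.isidentifier() and value not in defined:
            errors.append(f"semantic: '{value}' used before assignment")
        defined.add(name)
    return stmts, errors

def compile_source(src):
    data = src
    for phase in (lex, parse, check):
        data, errors = phase(data)
        if errors:  # report everything this phase found, then stop
            return None, errors
    return data, []

print(compile_source("a = 1 # b = $;")[1])
# ["lex: unexpected character '#' at offset 6", "lex: unexpected character '$' at offset 12"]
print(compile_source("a = ; b = c;")[1])
# ["parse: malformed statement 'a ='"]
```

Both bad characters are reported together, while the second input stops after parsing, so the undefined `c` is never reported. That is the trade-off of halting between steps: later errors only show up once the earlier ones are fixed.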

This topic is closed to new replies.
