GameMasterXL

How does C++ read its syntax in?



I am just wondering: does C++ read the whole source file into one line inside of a stack, or does it read each individual line? Like this:

#include <iostream>
using namespace std;

int main()
{
    cout << "Hello World!!" << endl;
    return 0;
}

Would it read this in like this:

#include <iostream>using namespace std;int main(){cout << "Hello World!!" << endl;return 0;}

Or just like this:

#include <iostream> // link code here... after linkage now
using namespace std; // validate this line
// skip white space
int main() // validate this line
{ // read in start block character
    cout << "Hello World!!" << endl; // validate this line
    return 0; // read in this line
} // read in end block character

Any C++ parser worth its salt ignores whitespace except where it is needed to separate tokens. So the end result is much like your first option, except that the preprocessor first interprets the # directives (which, unlike ordinary code, do end at the line break) and acts accordingly, in this case by inserting the contents of iostream. Validation doesn't occur until after the code has been converted into internal symbols.

Typically, it's an even more complex version of your second example. [smile]

One thing you need to know, though, is that preprocessor directives are handled before the compiler even sees the code. The compiler wouldn't see #include <iostream>. Instead, the data stream it receives directly includes the contents of the iostream header. Likewise, all macros have already been expanded.

Another thing that is important to know is that line breaks are whitespace like spaces and tabs. What matters to the compiler are statements, which are broken by semicolons for individual statements, or bounded by braces for compound statements.

If you want more details you need to get started on compiler theory. [smile]
Check out flex and bison. Even reading the docs should give you some insight.

I'm not an expert but I'll try to explain it the best I can =)

It's a bit more complicated than that. The entire source code is first broken down into a list of tokens. So the list might go something like this:


int
main
(
)
{
cout
<<
"Hello World!!"
<<
endl
;
return
0
;
}


The #include <iostream> is a preprocessor directive and should be handled before tokenizing, I believe.

The syntax is then checked by a parser that runs through the tokens and makes sure they form valid syntax according to the grammar. If the syntax is valid it builds a parse tree, which is then converted into assembly or, for languages like Java and C#, into byte code. Assembly is in turn translated into machine language/binary.

There are different types of parsing techniques, but Recursive Descent is one of the more popular.

The above is a *very* rough explanation of how it works, to get you started on your journey. You'll want to look into these topics: compiler theory, context-free grammars, recursive descent parsers, syntax/parse trees.

It may help if you look into or have some background in computational theory, which is taught at a lot of schools. The end of that class (for me at least) led right into compiler theory.

Good luck!

@Fruny: The second example implies that the code is parsed line by line, ne? As you then proceeded to state, it's parsed statement by statement (to generalize, of course); that, to me, is more concretely symbolized by the first example...

Quote:
Original post by TDragon
that, to me, is more concretely symbolized by the first example...


I think it's closer to the second example, since it implies that there are breaks in the parsing process. It's just that the breaks are on statements, not on new lines.

Well, I was just interested since I am currently building my own recursive-descent parser [smile] and am having trouble finding a solution for validating if, for, and else statements. The way the parser is now, it is supposed to read one line of code from the file, validate it, and then store the data or output the results. But if it is an if statement I would need to get another line of code from the file again and again until I reach my end statement, and I can't figure out how to do this. Since my compiler calls functions within itself, it would just start fresh and read a new line, and it wouldn't know whether it was looking for an end statement or not. Can anyone give me any ideas on how to achieve this?

So the program is read line by line then?

My book said that C++ doesn't know what newlines are, so that made me wonder whether it reads all the statements onto one single line inside a buffer for validation.

Quote:
Original post by GameMasterXL
Well, I was just interested since I am currently building my own recursive-descent parser [smile] and am having trouble finding a solution for validating if, for, and else statements. The way the parser is now, it is supposed to read one line of code from the file, validate it, and then store the data or output the results.

But that's not how a true recursive-descent parser works. A recursive descent parser will read a token at a time, not a line at a time.

Usually how it works is that it tries to build the syntax tree as it reads in individual lexemes. The parser requests lexemes from the lexer independent of the amount of whitespace in between the lines. For an if/else construct (ignoring comments), what probably happens is that it sees the if, asks for the next lexeme and if the lexeme is not a ( then it errors out. Then it jumps into a parse expression mode, and parses the expression inside the ()s, grabs the next ) and then asks for another lexeme. If the lexeme is a { then it jumps into a block parsing mode, if the lexeme is anything else, it tries to parse it as a statement.
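
The if/else handling described above can be sketched as a tiny recursive descent parser. Everything here is an assumption made for illustration: the Parser class, its member names, and a toy grammar in which an expression is a single identifier or number. It works on a pre-built token list and returns the parse tree as a bracketed string instead of real tree nodes.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy parser. Grammar assumed for this sketch:
//   statement := "if" "(" expr ")" statement [ "else" statement ]
//              | "{" statement* "}"
//              | expr ";"
//   expr      := a single identifier or number token
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // Parses one statement and returns it as a bracketed string.
    std::string parse_statement() {
        if (peek() == "if") {
            next();
            expect("(");
            std::string cond = parse_expr();
            expect(")");
            std::string result = "(if " + cond + " " + parse_statement();
            if (peek() == "else") {
                next();
                result += " " + parse_statement();
            }
            return result + ")";
        }
        if (peek() == "{") {
            next();
            std::string result = "(block";
            while (peek() != "}") {
                if (at_end()) throw std::runtime_error("missing }");
                result += " " + parse_statement();  // the recursion handles nesting
            }
            next();  // consume the closing brace
            return result + ")";
        }
        std::string expr = parse_expr();
        expect(";");
        return expr;
    }

    bool at_end() const { return pos_ >= toks_.size(); }

private:
    std::string peek() const { return at_end() ? "" : toks_[pos_]; }
    std::string next() {
        if (at_end()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }
    void expect(const std::string& want) {
        std::string got = next();
        if (got != want) throw std::runtime_error("expected " + want + ", got " + got);
    }
    std::string parse_expr() {
        std::string tok = next();
        if (tok.empty() || !std::isalnum(static_cast<unsigned char>(tok[0])))
            throw std::runtime_error("expected expression, got " + tok);
        return tok;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```

Because parse_statement() calls itself for the body of an if and for each statement inside a block, the call stack remembers which closing brace or else is still pending. That is what replaces the "start fresh on every line" problem: there is no per-line restart, only nested calls that each consume the tokens they need.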
