Gammastrahler

Deleting comments from C source files

Gammastrahler    150
Hi, I'm currently writing a small app that reads C and C++ source files and removes all comments from them. I'm using an FSM approach, and it works almost fine. The problem: my FSM goes into the state STATE_SLASH when the first "/" is encountered, so that character is swallowed and never written to my output file, because I must assume it could mark the beginning of a comment. But if it is an arithmetic operator, it gets deleted too! How can I distinguish between those cases? Could someone give me a hint?

Thanks,
Gammastrahler

[edited by - Gammastrahler on September 30, 2002 11:47:24 AM]

Gammastrahler    150
Well, real soon now, I want to code a scripting engine.

But I want to start as simply as possible, so I need to remove the comments first.

I have read some documents about how a lexer works. The classic algorithm is an FSM.

OK, I could simply check for /* or //, but that does not work when comments close: for example, you can also have some spaces between the * and the / or even a newline, so you can't just test for the next char.

So an FSM is more suitable.

[edited by - Gammastrahler on September 30, 2002 12:15:26 PM]

andrew_j_w    122
In your state_slash, look at the next character. If it is a '/' or a '*', then you've got a comment; otherwise just output a '/' followed by the character you've just read.

HTH
Andrew

Guest Anonymous Poster   
Read characters in the SCAN state until you get a slash ('/'). On reading the slash, enter state SLASH as before. In state SLASH, read a second character (after storing the first in a temp char). If this character is a '*', enter state C_COMMENT; if it is another '/', enter state CPP_COMMENT; if it is neither, write the original character followed by the new one to your output file and return to the SCAN state.

In state CPP_COMMENT, read and discard all chars until the end of the line (CR LF).

In state C_COMMENT, read and discard all chars, including CR and LF, until you find a '*' followed immediately by a '/', using the same principle as above.
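
The states described above could be sketched in C roughly as follows. This is only an illustration, not anyone's actual code: the function name strip_comments, the extra C_COMMENT_STAR state (used to remember a '*' seen inside a C comment) and the choice to work on in-memory strings instead of files are all assumptions made for the example. It also writes a single space where a C comment was, so that a/**/b does not fuse into ab, and it relies on the '*' and '/' of a comment delimiter being directly adjacent, as standard C requires.

```c
#include <stddef.h>

/* Hypothetical state names, modelled on the description above. */
enum state { SCAN, SLASH, C_COMMENT, C_COMMENT_STAR, CPP_COMMENT };

/*
 * Copies src to dst with C and C++ comments removed.
 * dst must have room for strlen(src) + 1 bytes.
 * String and character literals are NOT handled in this version.
 */
void strip_comments(const char *src, char *dst)
{
    enum state st = SCAN;
    size_t o = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        switch (st) {
        case SCAN:
            if (c == '/') st = SLASH;       /* hold the slash back for now */
            else dst[o++] = c;
            break;
        case SLASH:
            if (c == '*') st = C_COMMENT;
            else if (c == '/') st = CPP_COMMENT;
            else {                          /* it was a plain '/' operator */
                dst[o++] = '/';
                dst[o++] = c;
                st = SCAN;
            }
            break;
        case C_COMMENT:
            if (c == '*') st = C_COMMENT_STAR;
            break;
        case C_COMMENT_STAR:
            if (c == '/') {                 /* comment closed */
                dst[o++] = ' ';             /* assumption: leave one space */
                st = SCAN;
            } else if (c != '*') {
                st = C_COMMENT;             /* the '*' was not a closer */
            }
            break;
        case CPP_COMMENT:
            if (c == '\n') {                /* keep the line break itself */
                dst[o++] = c;
                st = SCAN;
            }
            break;
        }
    }
    if (st == SLASH) dst[o++] = '/';        /* input ended right after a '/' */
    dst[o] = '\0';
}
```

The key point for the original question is the SLASH case: the held-back '/' is written out as soon as the following character shows it was not the start of a comment, so a division operator survives. A backslash-newline continuing a // comment onto the next line is not handled in this sketch.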

This does not completely solve the problem. Consider the following line:

fprintf(fh, "// A comment written to a file\n\r");

This would become

fprintf(fh, "

Oops

Another state, IN_STRING, could be used to prevent this: enter IN_STRING whenever you see a '"' and stay there until you get another one.

Even this will not work if you have a string containing the \" escape sequence, so you need to check for that too!
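
A minimal sketch of that extension might look like the following. The names strip_comments_keep_literals, IN_LITERAL and IN_LITERAL_ESC are made up for this example, and treating character literals such as '"' the same way as strings is an addition that the post above does not mention. A backslash inside a literal switches to an escape state so that the character after it, for instance the quote in \", can never end the literal.

```c
#include <stddef.h>

/* Hypothetical state names; IN_LITERAL covers both "..." and '...'. */
enum state {
    SCAN, SLASH, C_COMMENT, C_COMMENT_STAR, CPP_COMMENT,
    IN_LITERAL, IN_LITERAL_ESC
};

/*
 * Copies src to dst with comments removed, leaving the contents of
 * string and character literals untouched.
 * dst must have room for strlen(src) + 1 bytes.
 */
void strip_comments_keep_literals(const char *src, char *dst)
{
    enum state st = SCAN;
    char quote = 0;                 /* which quote opened the literal */
    size_t o = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        switch (st) {
        case SCAN:
            if (c == '/') { st = SLASH; break; }
            if (c == '"' || c == '\'') { quote = c; st = IN_LITERAL; }
            dst[o++] = c;
            break;
        case SLASH:
            if (c == '*') st = C_COMMENT;
            else if (c == '/') st = CPP_COMMENT;
            else {
                dst[o++] = '/';     /* plain '/' operator */
                st = SCAN;
                i--;                /* look at c again in the SCAN state */
            }
            break;
        case C_COMMENT:
            if (c == '*') st = C_COMMENT_STAR;
            break;
        case C_COMMENT_STAR:
            if (c == '/') { dst[o++] = ' '; st = SCAN; }
            else if (c != '*') st = C_COMMENT;
            break;
        case CPP_COMMENT:
            if (c == '\n') { dst[o++] = c; st = SCAN; }
            break;
        case IN_LITERAL:
            dst[o++] = c;
            if (c == '\\') st = IN_LITERAL_ESC;   /* next char is escaped */
            else if (c == quote) st = SCAN;       /* literal closed */
            break;
        case IN_LITERAL_ESC:
            dst[o++] = c;           /* copy the escaped char, e.g. the " in \" */
            st = IN_LITERAL;
            break;
        }
    }
    if (st == SLASH) dst[o++] = '/';
    dst[o] = '\0';
}
```

With this version, the fprintf line above keeps its string intact, and only a // that appears outside the quotes starts a comment. Newer C++ features such as raw string literals are outside the scope of this sketch.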

Have fun!

Robbo    122

Yes, I think your FSM shouldn't be flipped into comment mode until you've recognised the entire token (i.e. // or /*). You should be changing the FSM state based on lexemes, or fundamental constructs, rather than individual characters. So (1) split the line into lexemes and then (2) execute your FSM rules based on those lexemes.
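
One way to read this suggestion is sketched below, with the caveat that every name in it (next_lexeme, strip_by_lexeme, the enum values) is invented for the example. The lexer looks two characters ahead and hands //, /* and */ to the FSM as single lexemes, so the FSM never has to hold back a lone '/'. One assumption goes beyond the post: the lexer is told the current mode, because splitting without context would misread input such as a*/*c*/b, where the first */ is not a comment closer.

```c
#include <stddef.h>

enum mode { CODE, LINE_COMMENT, BLOCK_COMMENT };
enum lexeme {
    LEX_CHAR, LEX_NEWLINE, LEX_LINE_OPEN, LEX_BLOCK_OPEN,
    LEX_BLOCK_CLOSE, LEX_END
};

/*
 * Reads one lexeme at *p and advances *p past it. Two-character
 * lexemes are only recognised in the mode where they mean something.
 * For single-character lexemes the character is stored in *ch.
 */
static enum lexeme next_lexeme(const char **p, enum mode m, char *ch)
{
    const char *s = *p;
    if (s[0] == '\0') return LEX_END;
    if (m == CODE && s[0] == '/' && s[1] == '/') { *p = s + 2; return LEX_LINE_OPEN; }
    if (m == CODE && s[0] == '/' && s[1] == '*') { *p = s + 2; return LEX_BLOCK_OPEN; }
    if (m == BLOCK_COMMENT && s[0] == '*' && s[1] == '/') { *p = s + 2; return LEX_BLOCK_CLOSE; }
    *p = s + 1;
    *ch = s[0];
    return s[0] == '\n' ? LEX_NEWLINE : LEX_CHAR;
}

/*
 * Copies src to dst with comments removed, switching modes on whole
 * lexemes. dst must have room for strlen(src) + 1 bytes.
 * String literals are NOT handled in this version.
 */
void strip_by_lexeme(const char *src, char *dst)
{
    enum mode m = CODE;
    size_t o = 0;
    char ch = 0;

    for (;;) {
        enum lexeme lx = next_lexeme(&src, m, &ch);
        if (lx == LEX_END) break;
        switch (m) {
        case CODE:
            if (lx == LEX_LINE_OPEN) m = LINE_COMMENT;
            else if (lx == LEX_BLOCK_OPEN) m = BLOCK_COMMENT;
            else dst[o++] = ch;             /* ordinary character or newline */
            break;
        case LINE_COMMENT:
            if (lx == LEX_NEWLINE) { dst[o++] = '\n'; m = CODE; }
            break;
        case BLOCK_COMMENT:
            if (lx == LEX_BLOCK_CLOSE) { dst[o++] = ' '; m = CODE; }
            break;
        }
    }
    dst[o] = '\0';
}
```

Compared with a character-level FSM, the mode switch here has only three states, because the lookahead that the SLASH state provided has moved into the lexer.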

If you want to get clever, you can use a binary expression tree and the shunting-yard algorithm to handle operator precedence etc.
