Deleting comments from C source files

7 comments, last by Gammastrahler 21 years, 6 months ago
Hi, I'm currently writing a small app that reads in C and C++ source files and then removes all comments from them. I'm using an FSM approach, and it works almost fine. But since my FSM goes into the state STATE_SLASH when the first "/" is encountered, those slashes are eaten and not added to my output file, since I must assume each one could mark the beginning of a comment. If the slash is an arithmetic operator, it is deleted too! How can I distinguish between those cases? Could someone give me a hint? Thanks, Gammastrahler [edited by - Gammastrahler on September 30, 2002 11:47:24 AM]
Ummmm, look for "//" or "/*" instead of just "/"?

botman
Is there a reason for writing this app? It sounds like a horrible idea! :-)

Well, real soon now, I want to code a scripting engine.

But I want to start as simply as possible, so I need to remove the comments first.

I have read some documents about how a lexer works. The classic algorithm is an FSM.

OK, I could well check for /* or //, but that does not work when comments close; for example, you can also have some spaces between the * and the / or even a newline, so you can't just test for the next char.

So it is more suitable for an FSM.

[edited by - Gammastrahler on September 30, 2002 12:15:26 PM]
In your state_slash, look at the next non-blank character. If it is a '/' or a '*' then you've got a comment; otherwise just output a '/' followed by the character you've just read.

HTH
Andrew
Read characters in the SCAN state until you get a slash '/' character. On reading the slash, enter state SLASH as before. In state SLASH, read a second character (after storing the first in a temp char). If this character is a '*', enter state C_COMMENT; if it is another '/', enter state CPP_COMMENT; if it is neither, write the original character followed by the new one to your output file and return to the SCAN state.

In state CPP_COMMENT, read and bin all chars until the CR LF chars.

In state C_COMMENT, read and bin all chars, including CR and LF, until you find a '*' followed immediately by a '/', using the same principle as above.
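
The states described above could be sketched in C roughly as follows. This is only an illustration under a few assumptions: it works on an in-memory string rather than a file, the function name strip_comments and the extra C_COMMENT_STAR state are made up for this sketch, and the newline that ends a // comment is kept in the output rather than binned. It tests the character immediately after the slash, since C only treats "/*" and "//" as comment openers when the two characters are adjacent.

```c
#include <stddef.h>

/* State names follow the post; C_COMMENT_STAR is a helper state added
   for this sketch (it means "inside a C comment, just saw a '*'"). */
enum state { SCAN, SLASH, C_COMMENT, C_COMMENT_STAR, CPP_COMMENT };

/* Hypothetical helper: copies src to dst without comments.
   dst must have room for at least strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    enum state st = SCAN;
    size_t o = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        switch (st) {
        case SCAN:
            if (c == '/') st = SLASH;       /* hold the slash back for now */
            else dst[o++] = c;
            break;
        case SLASH:
            if (c == '*') st = C_COMMENT;
            else if (c == '/') st = CPP_COMMENT;
            else {                          /* it was an ordinary slash */
                dst[o++] = '/';
                dst[o++] = c;
                st = SCAN;
            }
            break;
        case C_COMMENT:
            if (c == '*') st = C_COMMENT_STAR;
            break;
        case C_COMMENT_STAR:
            if (c == '/') st = SCAN;        /* found the closing star-slash */
            else if (c != '*') st = C_COMMENT;
            break;
        case CPP_COMMENT:
            if (c == '\n') {                /* assumption: keep the line break */
                dst[o++] = c;
                st = SCAN;
            }
            break;
        }
    }
    if (st == SLASH) dst[o++] = '/';        /* input ended on a lone slash */
    dst[o] = '\0';
}
```

Holding the slash back in SLASH and emitting it once the next character turns out to be neither '*' nor '/' is what keeps division operators in the output.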

This does not completely solve the problem. Consider the following line:

fprintf(fh, "// A comment written to a file\n\r");

This would become

fprintf(fh, "

Oops

Another state, IN_STRING, could be used to prevent this (by entering IN_STRING whenever you see a '"' and staying there until you get another one).

Even this will not work if you have a string containing the \" escape sequence, so you need to check for that too!
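
A possible way to add the IN_STRING state and the escape check is sketched below. The names (strip_comments_keep_strings, the ST_ prefixes, ST_ESCAPE) are invented for this sketch, and it goes one step beyond the post by treating character literals such as '"' the same way as strings, on the assumption that they would otherwise be mistaken for the start of a string.

```c
#include <stddef.h>

/* Hypothetical state names; ST_ESCAPE means "in a literal, just saw a backslash". */
enum lex_state { ST_SCAN, ST_SLASH, ST_C_COMMENT, ST_STAR, ST_CPP_COMMENT,
                 ST_IN_STRING, ST_ESCAPE };

/* Copies src to dst without comments, leaving string and character
   literals untouched. dst needs at least strlen(src) + 1 bytes. */
void strip_comments_keep_strings(const char *src, char *dst)
{
    enum lex_state st = ST_SCAN;
    char quote = '"';                  /* which quote opened the current literal */
    size_t o = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        switch (st) {
        case ST_SCAN:
            if (c == '/') { st = ST_SLASH; break; }
            if (c == '"' || c == '\'') { quote = c; st = ST_IN_STRING; }
            dst[o++] = c;
            break;
        case ST_SLASH:
            if (c == '*') { st = ST_C_COMMENT; break; }
            if (c == '/') { st = ST_CPP_COMMENT; break; }
            dst[o++] = '/';            /* ordinary slash: emit it ...            */
            st = ST_SCAN;
            i--;                       /* ... and let SCAN look at c again       */
            break;
        case ST_C_COMMENT:
            if (c == '*') st = ST_STAR;
            break;
        case ST_STAR:
            if (c == '/') st = ST_SCAN;
            else if (c != '*') st = ST_C_COMMENT;
            break;
        case ST_CPP_COMMENT:
            if (c == '\n') { dst[o++] = c; st = ST_SCAN; }
            break;
        case ST_IN_STRING:
            dst[o++] = c;
            if (c == '\\') st = ST_ESCAPE;        /* next char cannot end the literal */
            else if (c == quote) st = ST_SCAN;
            break;
        case ST_ESCAPE:
            dst[o++] = c;                         /* copy the escaped char verbatim */
            st = ST_IN_STRING;
            break;
        }
    }
    if (st == ST_SLASH) dst[o++] = '/';
    dst[o] = '\0';
}
```

With this version the fprintf line above should come through unchanged, because the "//" is only ever seen while the machine is in ST_IN_STRING.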

Have fun!


Yes, I think your FSM shouldn't be flipped into comment mode until you've recognised the entire token (i.e. // or /*). You should be changing the FSM state based on lexemes or fundamental constructs rather than individual characters. So (1) split the line into lexemes and then (2) execute your FSM rules based on those lexemes.

If you want to get clever, you can use a binary expression tree and "shunting" to get operator precedence etc.

Thanks for your replies

I will try out your suggestions (I have also included C_COMMENT and CPP_COMMENT).

Greets
Gammastrahler
Also, take a gander at Lex/Yacc. I know of at least one major engine (Torque) whose scripting interpreter was written using it.
daerid@gmail.com

