Deleting comments from C source files
Hi, I'm currently writing a small app that reads in
C and C++ source files and then removes all comments from them.
I'm using an FSM approach, and it works almost fine. But since
my FSM goes into the state STATE_SLASH when the first "/" is encountered, that character is swallowed and not written to my output file, because I must assume it could mark the beginning of a comment.
But if it is an arithmetic operator, it gets deleted too! How can I distinguish between those cases?
Could someone give me a hint?
Thanks
Gammastrahler
[edited by - Gammastrahler on September 30, 2002 11:47:24 AM]
Is there a reason for writing this app? It sounds like a horrible idea! :-)
Helpful links:
How To Ask Questions The Smart Way | Google can help with your question | Search MSDN for help with standard C or Windows functions
Well, real soon now I want to code a scripting engine,
but I want to start as simply as possible, so I need to remove the comments first.
I have read some documents about how a lexer works. The classic algorithm is an FSM.
OK, I could simply check for /* or //, but that does not work when comments close: for example, you can also have some spaces between the * and the / or even a newline, so you can't just test for the next char.
So an FSM is more suitable.
[edited by - Gammastrahler on September 30, 2002 12:15:26 PM]
In your STATE_SLASH, look at the next character: if it is a '/' or a '*' then you've got a comment, otherwise just output a '/' followed by the character you've just read.
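A minimal sketch of that slash handling, assuming the check is made on the character immediately after the held-back '/'. The function name after_slash and the state names other than STATE_SLASH are made up for illustration, not taken from the original app.

```c
#include <stddef.h>

/* Hypothetical state names; only STATE_SLASH comes from the thread. */
enum state { STATE_SCAN, STATE_SLASH, STATE_COMMENT };

/* Called in STATE_SLASH with the character `c` that follows a held-back '/'.
   Stores the characters that must be written out in `out` (0 or 2 of them),
   puts their count in `*n_out`, and returns the next state. */
enum state after_slash(char c, char out[2], size_t *n_out)
{
    if (c == '/' || c == '*') {
        *n_out = 0;              /* comment opener: the slash is dropped */
        return STATE_COMMENT;
    }
    out[0] = '/';                /* plain operator: emit the held slash */
    out[1] = c;                  /* and the character just read */
    *n_out = 2;
    return STATE_SCAN;
}
```

So for "a/2" the slash is held for one step and then written out together with the '2', and nothing gets lost.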
HTH
Andrew
Read characters in the SCAN state until you get a slash '/'. On reading the slash, store it in a temp char and enter state SLASH as before. In state SLASH, read a second character. If this character is a '*', enter state C_COMMENT; if it is another '/', enter state CPP_COMMENT; if it is neither, write the stored slash followed by the new character to your output file and return to the SCAN state.
In state CPP_COMMENT, read and bin all chars up to the CR LF chars at the end of the line.
In state C_COMMENT, read and bin all chars, including CR and LF, until you find a '*' followed immediately by a '/', using the same principle as above.
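The states described above could be sketched roughly like this. It works on an in-memory string instead of a file to keep it short. The function name strip_comments and the extra C_COMMENT_STAR state (for "seen a '*' inside a C comment") are my own assumptions, as is the choice to keep the line ending after a // comment.

```c
#include <stddef.h>

enum state { SCAN, SLASH, C_COMMENT, C_COMMENT_STAR, CPP_COMMENT };

/* Copies `src` to `dst` without comments and returns the output length.
   `dst` must have room for strlen(src) + 1 bytes. */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = SCAN;
    size_t n = 0;

    for (; *src != '\0'; ++src) {
        char c = *src;
        switch (st) {
        case SCAN:
            if (c == '/') st = SLASH;            /* hold the slash back */
            else dst[n++] = c;
            break;
        case SLASH:
            if (c == '*') st = C_COMMENT;
            else if (c == '/') st = CPP_COMMENT;
            else {                               /* it was only an operator */
                dst[n++] = '/';
                dst[n++] = c;
                st = SCAN;
            }
            break;
        case C_COMMENT:
            if (c == '*') st = C_COMMENT_STAR;
            break;
        case C_COMMENT_STAR:
            if (c == '/') st = SCAN;             /* comment closed */
            else if (c != '*') st = C_COMMENT;   /* a run of '*' keeps waiting */
            break;
        case CPP_COMMENT:
            if (c == '\r' || c == '\n') {        /* keep the line ending */
                dst[n++] = c;
                st = SCAN;
            }
            break;
        }
    }
    if (st == SLASH) dst[n++] = '/';             /* input ended on a lone slash */
    dst[n] = '\0';
    return n;
}
```

Note that a C comment is binned without leaving anything behind, so "a/**/b" comes out as "ab". A real preprocessor replaces each comment with a single space; emitting one on leaving C_COMMENT_STAR would mimic that.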
This does not completely solve the problem. Consider the following line:
fprintf(fh, "// A comment written to a file\n\r");
This would become
fprintf(fh, "
Oops
Another state, IN_STRING, could be used to prevent this: enter IN_STRING whenever you see a '"' and stay there until you get another one.
Even this will not work if you have a string containing the \" escape sequence, so you need to check for that too!
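A small sketch of the IN_STRING idea with the \" check, under a few assumptions: the function find_comment_start and the IN_STRING_ESCAPE state are hypothetical names, it only looks for where a comment starts rather than removing it, and character literals such as '"' are not handled.

```c
#include <stddef.h>

enum str_state { OUTSIDE, IN_STRING, IN_STRING_ESCAPE };

/* Returns a pointer to the first comment opener (two slashes, or a slash
   followed by a star) in `line` that is not inside a double-quoted string
   literal, or NULL if there is none. */
const char *find_comment_start(const char *line)
{
    enum str_state st = OUTSIDE;

    for (const char *p = line; *p != '\0'; ++p) {
        switch (st) {
        case OUTSIDE:
            if (*p == '"')
                st = IN_STRING;
            else if (*p == '/' && (p[1] == '/' || p[1] == '*'))
                return p;                 /* p[1] is at worst the '\0' */
            break;
        case IN_STRING:
            if (*p == '\\')
                st = IN_STRING_ESCAPE;    /* the next char is escaped */
            else if (*p == '"')
                st = OUTSIDE;             /* unescaped quote ends the string */
            break;
        case IN_STRING_ESCAPE:
            st = IN_STRING;               /* skip exactly one escaped char */
            break;
        }
    }
    return NULL;
}
```

With this, the // inside the fprintf string above is skipped, and a \" inside the string does not end it early.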
Have fun!
Yes, I think your FSM shouldn't be flipped into comment mode until you've recognised the entire token (i.e. // or /*). You should be changing the FSM state based on lexemes or fundamental constructs rather than individual characters. So (1) split the line into lexemes and then (2) execute your FSM rules based on those lexemes.
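One possible reading of step (1), as a rough sketch: a tiny lexer that returns the comment delimiters as whole two-character lexemes and everything else as single characters. The enum values and the function next_lexeme are invented for this example; a real lexer would also recognise identifiers, numbers, strings and so on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lexeme kinds, just enough for comment stripping. */
enum lexeme { LEX_LINE_COMMENT, LEX_BLOCK_OPEN, LEX_BLOCK_CLOSE,
              LEX_CHAR, LEX_END };

/* Classifies the lexeme starting at `s` and stores its length in `*len`. */
enum lexeme next_lexeme(const char *s, size_t *len)
{
    if (s[0] == '\0') { *len = 0; return LEX_END; }

    *len = 2;                                    /* try the two-char lexemes */
    if (strncmp(s, "//", 2) == 0) return LEX_LINE_COMMENT;
    if (strncmp(s, "/*", 2) == 0) return LEX_BLOCK_OPEN;
    if (strncmp(s, "*/", 2) == 0) return LEX_BLOCK_CLOSE;

    *len = 1;                                    /* anything else: one char */
    return LEX_CHAR;
}
```

The FSM for step (2) then switches on the lexeme kind and advances by *len, so a lone '/' arrives as an ordinary LEX_CHAR and never needs a separate slash state.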
If you want to get clever, you can use a binary expression tree and the "shunting-yard" algorithm to handle operator precedence etc.