[C] Skipping comments in a text file

Started by
16 comments, last by yewbie 15 years, 8 months ago
I need to read values from a text file while skipping comments. A comment starts with a '#' and continues until the end of the line, but the catch is that it can start anywhere, not just at the beginning of a line. For example, this is a valid file:
# comment
 5
6   #comment
  #comment
    7
I want to write a function that skips all comments up to a new value, but I can't think of a clean way to do it. Here's what I came up with (no error checking):

void skipLine(FILE *file) {
    int ch;  /* int, not char, so EOF can be told apart from a real character */

    while ((ch = fgetc(file)) != '\n' && ch != EOF)
        ;
}


void skipComments(FILE *file) {
    int ch;

    while (1) {
        ch = fgetc(file);
        if (ch == '#')
            skipLine(file);
        else {
            if (ch != EOF)  /* pushing EOF back would be a no-op anyway */
                ungetc(ch, file);
            return;
        }
    }
}

What bothers me the most is the use of ungetc() - it just doesn't feel very elegant. Is there another way to do this? Note that this has to be done in straight C. Thanks in advance.
Mu. Don't make a "skipComments" function, make a "readLine" function which strips out any comments it happens to come across. Or "readInt" or whatever.
Or, for non-whitespace sensitive parsing, use recursion:

int skip(FILE *fp)
{
    int c = fgetc(fp);
    while (isspace(c))
        c = fgetc(fp);
    if (c == '#') {
        while (c != EOF && c != '\n')
            c = fgetc(fp);
        return skip(fp); // teh recursive bitz
    }
    return c;
}


Not tested, but the general idea. Deals with several comments in a row. Rest of the parser then uses skip like:

type next(FILE *fp)
{
    int c = skip(fp);
    if (c == EOF)
        return EndOfFile;
    // etc
}


Approach I've been using with my scripting languages for years and my computer hasn't exploded yet.
Quote:Original post by EasilyConfused
...


I think your function reads an extra character, which is a problem because the file can contain numbers with several digits. This is why I used ungetc().
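One way to address the extra character without ungetc() is to keep a single character of lookahead in the reader itself, so the character that ends one token is still available when the next token starts. The following is only a sketch of that idea: the Reader struct and the names reader_init, advance, skip_blank and next_uint are made up for illustration, it handles only non-negative decimal numbers, and it does not check for overflow.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical reader that always holds one character of lookahead. */
typedef struct {
    FILE *fp;
    int look;   /* next unread character, or EOF */
} Reader;

void advance(Reader *r)
{
    r->look = fgetc(r->fp);
}

void reader_init(Reader *r, FILE *fp)
{
    r->fp = fp;
    advance(r);   /* prime the lookahead */
}

/* Skip whitespace and '#' comments; leaves r->look on the first
   significant character (or EOF). */
void skip_blank(Reader *r)
{
    for (;;) {
        while (isspace(r->look))
            advance(r);
        if (r->look != '#')
            return;
        while (r->look != EOF && r->look != '\n')
            advance(r);
    }
}

/* Read one non-negative decimal integer.
   Returns 1 on success, 0 at end of file or on a non-digit. */
int next_uint(Reader *r, unsigned *out)
{
    unsigned value = 0;

    skip_blank(r);
    if (!isdigit(r->look))
        return 0;
    while (isdigit(r->look)) {
        value = value * 10 + (unsigned)(r->look - '0');
        advance(r);
    }
    *out = value;   /* the character after the number stays in r->look */
    return 1;
}
```

Calling reader_init() once and then next_uint() repeatedly yields each value in turn. Because the character after a number stays in the lookahead, input such as "6#comment" is still handled: the '#' is seen by the next call to skip_blank().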

Quote:Original post by Sneftel
Don't make a "skipComments" function, make a "readLine" function which strips out any comments it happens to come across.


But wouldn't such a function also need to use skipComments()?
Quote:Original post by Gage64
But wouldn't such a function also need to use skipComments()?

No. There's no such thing as skipping. Skipping is simply what happens when you fail to take an action as a result of reading something. If I read three numbers, then add the second and third ones and print the result, have I not skipped the first number?
Quote:Original post by Sneftel
Quote:Original post by Gage64
But wouldn't such a function also need to use skipComments()?

No. There's no such thing as skipping. Skipping is simply what happens when you fail to take an action as a result of reading something. If I read three numbers, then add the second and third ones and print the result, have I not skipped the first number?


Maybe I'm missing the point, but if you want to read three numbers from a text file, you have to separate the characters that belong to the numbers from the characters that don't. This is what I refer to as skipping and is the functionality that I'm trying to implement.

So, I don't understand what you're trying to say here.
/* Pretend that this is actually safe; I'm insufficiently bored to write
   the C code necessary for the memory allocation to be correct. */
char *read_line(FILE *file)
{
    char *line = malloc(LINE_SIZE);
    fgets(line, LINE_SIZE, file);
    char *comment_start = strstr(line, "#");
    if (comment_start)
        *comment_start = 0;
    return line;
}
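As a rough illustration of how a comment-stripping line reader could feed a number parser, here is a sketch that avoids the allocation question by using a fixed buffer on the stack. The name read_values and the 256-byte buffer are assumptions made for the example, and it assumes no line is longer than the buffer; a longer line would be split, and the tail of a long comment could then be misread as data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads up to max integers from fp into values[], ignoring everything
   from '#' to the end of each line. Returns how many were stored.
   Assumes every line fits in the local buffer. */
size_t read_values(FILE *fp, long *values, size_t max)
{
    char line[256];
    size_t count = 0;

    while (count < max && fgets(line, sizeof line, fp) != NULL) {
        char *hash = strchr(line, '#');
        char *p = line;

        if (hash != NULL)
            *hash = '\0';          /* cut the comment off */

        while (count < max) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)
                break;             /* no further number on this line */
            values[count++] = v;
            p = end;
        }
    }
    return count;
}
```

strtol() skips leading whitespace itself, so the inner loop picks up several values on one line as well as a single indented value.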
Quote:Original post by Gage64
Quote:Original post by Sneftel
Quote:Original post by Gage64
But wouldn't such a function also need to use skipComments()?

No. There's no such thing as skipping. Skipping is simply what happens when you fail to take an action as a result of reading something. If I read three numbers, then add the second and third ones and print the result, have I not skipped the first number?


Maybe I'm missing the point, but if you want to read three numbers from a text file, you have to separate the characters that belong to the numbers from the characters that don't. This is what I refer to as skipping and is the functionality that I'm trying to implement.

So, I don't understand what you're trying to say here.

No, no. Listen carefully.


Suppose I have a file which consists of the following text, in its entirety: "a3d4ba6gh". I am responsible for adding up all the numbers (which are assumed to each be one digit), while ignoring all letters.

Here's the first way I can do it: I write a function SkipLetters, which tries to discard any letters from the stream, without killing any digits. I write a function GetChar, which gets the next char and returns it. I call these two functions alternately, each time assuming the result of GetChar is a digit (because I've discarded any letters). The problem is implementing SkipLetters, assuming I don't want to use ungetc.

Here's the second way I can do it: I write a function GetDigitOnly, which, in a loop, tries to read a digit from the stream. Each time it fails to get a digit (because it has instead gotten a letter) it just tries again. When it successfully reads a digit, it returns it. I simply call this function until I run out of file. And I didn't need to use ungetc.

Note that this is exactly what EC's code is doing, except with recursion converted to iteration.
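A minimal sketch of the second approach, for the single-digit example above. The names get_digit_only and sum_digits are just illustrative, and -1 is an arbitrary choice of end-of-file marker.

```c
#include <ctype.h>
#include <stdio.h>

/* Keep reading until a digit turns up; anything else is simply ignored.
   Returns the digit's value (0-9), or -1 at end of file. */
int get_digit_only(FILE *fp)
{
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (isdigit(c))
            return c - '0';
    }
    return -1;
}

/* Add up every digit in the stream. */
int sum_digits(FILE *fp)
{
    int sum = 0;
    int d;

    while ((d = get_digit_only(fp)) != -1)
        sum += d;
    return sum;
}
```

For the text "a3d4ba6gh" this adds 3, 4 and 6. Nothing is ever pushed back, because the letters are discarded by the same loop that looks for the digit.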
I have something similar: my comments start with // and can begin at the start of a line or anywhere in it. The code is in C++, but it can be converted to C easily. In fact, it used C some time ago, but I changed it to use streams.
Here is the function:
bool STUtil::skipComments(std::istream& stream)
{
    bool bComment = false;
    int iCharRead = 0;
    char c = '0';

    do {
        stream >> c; // fgetc or something
        iCharRead++;

        if (c == '/') {
            stream >> c; // fgetc or something
            iCharRead++;
            if (c == '/') {
                // yay its a comment
                return false;
            }
        }
        else if (c == TAB || c == SPACE || c == LINEFEED)
            bComment = true;
        else if (stream.eof())
            return false;
        else
            bComment = false;
    } while (bComment);

    stream.seekg(-iCharRead, std::ios_base::cur); // fseek?
    return true;
}
Quote:Original post by SiCrane
...


I think that's exactly what I wanted (but I'm too tired to be sure). Thank you.

Quote:Original post by Sneftel
Here's the second way I can do it: I write a function GetDigitOnly, which, in a loop, tries to read a digit from the stream. Each time it fails to get a digit (because it has instead gotten a letter) it just tries again. When it successfully reads a digit, it returns it. I simply call this function until I run out of file. And I didn't need to use ungetc.


What if the numbers can be more than one digit and I want to use fscanf() to read them and not do the parsing myself?
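One possible answer, as a sketch only: let fscanf() try to read the number first, and treat a failed match as the sign that a comment may be in the way. When "%d" fails on a '#', that character is left unread in the stream, so the caller can consume it and the rest of the line and then try again. The function name read_int and its return codes are assumptions made for the example.

```c
#include <stdio.h>

/* Try to read one int, skipping any '#' comments in front of it.
   Returns 1 and stores the value on success, 0 at end of file,
   -1 if something that is neither a number nor a comment is found. */
int read_int(FILE *fp, int *out)
{
    for (;;) {
        int n = fscanf(fp, "%d", out);   /* "%d" skips leading whitespace */
        int c;

        if (n == 1)
            return 1;
        if (n == EOF)
            return 0;

        /* Matching failure: the offending character is still unread. */
        c = fgetc(fp);
        if (c != '#')
            return -1;
        while (c != EOF && c != '\n')
            c = fgetc(fp);               /* discard the rest of the comment */
    }
}
```

The same "try, and on failure discard and retry" pattern as GetDigitOnly applies here, only with fscanf() doing the number parsing.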

This topic is closed to new replies.
