Gage64

[C] Skipping comments in a text file


I need to read values from a text file while skipping comments. A comment starts with a '#' and continues until the end of the line, but the catch is that it can start anywhere, not just at the beginning of a line. For example, this is a valid file:
# comment
 5
6   #comment
  #comment
    7
I want to write a function that skips all comments up to a new value, but I can't think of a clean way to do it. Here's what I came up with (no error checking):
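void skipLine(FILE *file) {
    int ch;  /* int rather than char, so EOF can be told apart from data */

    /* stop at end of file as well, otherwise this loops forever */
    while ((ch = fgetc(file)) != '\n' && ch != EOF)
        ;
}


void skipComments(FILE *file) {
    char ch;

    while (1) {
        /* the leading space in the format skips white-space,
           so a '#' that is indented is still recognised */
        if (fscanf(file, " %c", &ch) != 1)
            return;  /* end of file or read error */
        if (ch == '#')
            skipLine(file);
        else {
            ungetc((unsigned char)ch, file);
            return;
        }
    }
}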

What bothers me the most is the use of ungetc() - it just doesn't feel very elegant. Is there another way to do this? Note that this has to be done in straight C. Thanks in advance.

Mu. Don't make a "skipComments" function, make a "readLine" function which strips out any comments it happens to come across. Or "readInt" or whatever.
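A minimal sketch of what such a "readInt" could look like, assuming non-negative decimal numbers and '#' comments that run to the end of the line. The name read_int and its return codes are made up for illustration. It never calls ungetc(): the character that ends a number is either white-space (dropped) or the start of a comment (dropped along with the rest of the line).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the next non-negative decimal integer from fp,
   discarding white-space and '#' comments on the way.
   Returns 1 on success, 0 at end of file, -1 on unexpected input. */
int read_int(FILE *fp, int *out)
{
    int c, value = 0;

    assert(fp != NULL && out != NULL);

    /* Discard white-space and whole comments until something else shows up. */
    for (;;) {
        c = fgetc(fp);
        if (c == '#') {
            while (c != '\n' && c != EOF)
                c = fgetc(fp);
        }
        if (c == EOF)
            return 0;
        if (!isspace(c))
            break;
    }

    if (!isdigit(c))
        return -1;

    /* Accumulate the digits (no overflow check in this sketch). */
    while (isdigit(c)) {
        value = value * 10 + (c - '0');
        c = fgetc(fp);
    }

    /* The character that ended the number has already been consumed.
       White-space and EOF are simply dropped; a '#' means the rest of
       the line is a comment and is dropped as well. */
    if (c == '#') {
        while (c != '\n' && c != EOF)
            c = fgetc(fp);
    } else if (c != EOF && !isspace(c)) {
        return -1;
    }

    *out = value;
    return 1;
}
```

Calling read_int() in a loop until it returns 0 would yield 5, 6 and 7 for the example file in the first post.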

Or, for non-whitespace sensitive parsing, use recursion:


int skip(FILE *fp)
{
    int c = fgetc(fp);
    while (isspace(c)) c = fgetc(fp);

    if (c == '#')
    {
        while (c != EOF && c != '\n') c = fgetc(fp);
        return skip(fp); // teh recursive bitz
    }

    return c;
}


Not tested, but the general idea. Deals with several comments in a row. Rest of the parser then uses skip like:


type next(FILE *fp)
{
    int c = skip(fp);

    if (c == EOF) return EndOfFile; // etc
}


Approach I've been using with my scripting languages for years and my computer hasn't exploded yet.

Quote:
Original post by EasilyConfused
...


I think your function reads an extra character, which is a problem because the file can contain numbers with several digits. This is why I used ungetc().

Quote:
Original post by Sneftel
Don't make a "skipComments" function, make a "readLine" function which strips out any comments it happens to come across.


But wouldn't such a function also need to use skipComments()?

Quote:
Original post by Gage64
But wouldn't such a function also need to use skipComments()?

No. There's no such thing as skipping. Skipping is simply what happens when you fail to take an action as a result of reading something. If I read three numbers, then add the second and third ones and print the result, have I not skipped the first number?

Quote:
Original post by Sneftel
Quote:
Original post by Gage64
But wouldn't such a function also need to use skipComments()?

No. There's no such thing as skipping. Skipping is simply what happens when you fail to take an action as a result of reading something. If I read three numbers, then add the second and third ones and print the result, have I not skipped the first number?


Maybe I'm missing the point, but if you want to read three numbers from a text file, you have to separate the characters that belong to the numbers from the characters that don't. This is what I refer to as skipping and is the functionality that I'm trying to implement.

So, I don't understand what you're trying to say here.


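#include <stdlib.h>
#include <string.h>

/* pretend that this is actually safe; I'm insufficiently bored to write
the C code necessary for the memory allocation to be correct */
char * read_line(FILE * file) {
    char * line = malloc(LINE_SIZE);
    if (!line) return NULL;
    /* NULL also signals end of file or a read error */
    if (!fgets(line, LINE_SIZE, file)) { free(line); return NULL; }
    char * comment_start = strchr(line, '#');
    if (comment_start) *comment_start = 0; /* cut the line off at the comment */
    return line;
}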

Quote:
Original post by Gage64
Quote:
Original post by Sneftel
Quote:
Original post by Gage64
But wouldn't such a function also need to use skipComments()?

No. There's no such thing as skipping. Skipping is simply what happens when you fail to take an action as a result of reading something. If I read three numbers, then add the second and third ones and print the result, have I not skipped the first number?


Maybe I'm missing the point, but if you want to read three numbers from a text file, you have to separate the characters that belong to the numbers from the characters that don't. This is what I refer to as skipping and is the functionality that I'm trying to implement.

So, I don't understand what you're trying to say here.

No, no. Listen carefully.


Suppose I have a file which consists of the following text, in its entirety: "a3d4ba6gh". I am responsible for adding up all the numbers (which are assumed to each be one digit), while ignoring all letters.

Here's the first way I can do it: I write a function SkipLetters, which tries to discard any letters from the stream, without killing any digits. I write a function GetChar, which gets the next char and returns it. I call these two functions alternately, each time assuming the result of GetChar is a digit (because I've discarded any letters). The problem is implementing SkipLetters, assuming I don't want to use ungetc.

Here's the second way I can do it: I write a function GetDigitOnly, which, in a loop, tries to read a digit from the stream. Each time it fails to get a digit (because it has instead gotten a letter) it just tries again. When it successfully reads a digit, it returns it. I simply call this function until I run out of file. And I didn't need to use ungetc.

Note that this is exactly what EC's code is doing, except with recursion converted to iteration.
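For what it's worth, the second approach can be sketched in a few lines of C. The names get_digit_only and sum_digits are hypothetical, and the -1 end-of-file marker is just one possible convention.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical GetDigitOnly: keeps reading until it gets a digit.
   Returns the digit's value (0..9), or -1 when the file runs out. */
int get_digit_only(FILE *fp)
{
    int c;

    assert(fp != NULL);

    while ((c = fgetc(fp)) != EOF) {
        if (isdigit(c))
            return c - '0';
        /* anything else is "skipped" simply by not acting on it */
    }
    return -1;
}

/* Adds up every digit in the stream; letters never need to be put back. */
int sum_digits(FILE *fp)
{
    int sum = 0, d;

    while ((d = get_digit_only(fp)) != -1)
        sum += d;
    return sum;
}
```

For the text "a3d4ba6gh" this adds 3, 4 and 6.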

I have something similar. My comments start with // and they can be at the beginning of the line or anywhere on the line. The code is in C++ but it can be converted to C easily. In fact it was C some time ago, but I changed it to use streams.
Here is the function:

bool STUtil::skipComments(std::istream& stream)
{
    bool bComment = false;

    int iCharRead = 0;
    char c = '0';

    do {
        stream >> c; // fgetc or something
        iCharRead++;

        if (c == '/') {
            stream >> c; // fgetc or something
            iCharRead++;
            if (c == '/') {
                // yay its a comment
                return false;
            }
        }
        else if (c == TAB || c == SPACE || c == LINEFEED)
            bComment = true;
        else if (stream.eof())
            return false;
        else
            bComment = false;
    } while (bComment);

    stream.seekg(-iCharRead, std::ios_base::cur); // fseek?
    return true;
}


Quote:
Original post by SiCrane
...


I think that's exactly what I wanted (but I'm too tired to be sure). Thank you.

Quote:
Original post by Sneftel
Here's the second way I can do it: I write a function GetDigitOnly, which, in a loop, tries to read a digit from the stream. Each time it fails to get a digit (because it has instead gotten a letter) it just tries again. When it successfully reads a digit, it returns it. I simply call this function until I run out of file. And I didn't need to use ungetc.


What if the numbers can be more than one digit and I want to use fscanf() to read them and not do the parsing myself?

Quote:
Original post by Gage64
What if the numbers can be more than one digit and I want to use fscanf() to read them and not do the parsing myself?
*shrug* fscanf doesn't support that. That has no bearing on your particular problem, though, so I'm not sure why you ask.

Quote:
Original post by Sneftel
Quote:
Original post by Gage64
What if the numbers can be more than one digit and I want to use fscanf() to read them and not do the parsing myself?
*shrug* fscanf doesn't support that. That has no bearing on your particular problem, though, so I'm not sure why you ask.


In your example you're right, but for my example it will work (I assume there's some space between a number and the # following it).


skipComments(file);
fscanf(file, "%d", &num1);
skipComments(file);
fscanf(file, "%d", &num2);
skipComments(file);
fscanf(file, "%d", &num3);



Of course, like you said, this will be cleaner using a readInt() function, but that function will also need to use skipComments().

Read the line. Strip any trailing comment. If the result is empty, move to the next line. Otherwise, use sscanf to read the number. Seriously, I don't mean to be snide, but how are you not getting this?
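Those steps can be sketched roughly as follows. This version uses strtol() rather than sscanf() so that several numbers on one line are handled by walking the end pointer along the line. The name read_numbers and the LINE_SIZE value are assumptions, and a line longer than LINE_SIZE would be split across fgets() calls, which this sketch does not guard against.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_SIZE 256  /* assumed upper bound on the length of a line */

/* Hypothetical helper: collects up to max numbers from fp into out,
   ignoring everything from '#' to the end of each line.
   Returns how many numbers were stored. */
size_t read_numbers(FILE *fp, long *out, size_t max)
{
    char line[LINE_SIZE];
    size_t count = 0;

    assert(fp != NULL && out != NULL);

    while (count < max && fgets(line, sizeof line, fp)) {
        /* Strip any trailing comment. */
        char *p = strchr(line, '#');
        if (p)
            *p = '\0';

        /* Pull every number out of what is left of the line. */
        p = line;
        while (count < max) {
            char *end;
            long value = strtol(p, &end, 10);
            if (end == p)
                break;  /* nothing numeric left; move to the next line */
            out[count++] = value;
            p = end;
        }
    }
    return count;
}
```

An empty or comment-only line simply produces no conversions, so the outer loop moves on to the next line without any special case.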

Quote:
Original post by Sneftel
how are you not getting this?


My thoughts exactly...

Quote:
Read the line. Strip any trailing comment. If the result is empty, move to the next line. Otherwise, use sscanf to read the number.


I thought as much, but I don't know if it's any better than the snippet I posted above, given that more than one number can appear on a line.

Anyway, thank you and everyone else for their help.

Can there be anything in between numbers on a line?

Can there be anything before numbers on a line?

Is a particular number of numbers expected on any given line? Is there a limit?

Does the program care how many numbers are on each line? Or is it just trying to grab all the numbers in the file and put them in a single sequence?

Quote:
Original post by Zahlman
Can there be anything in between numbers on a line?

Can there be anything before numbers on a line?


Only white-space.

Quote:
Is a particular number of numbers expected on any given line? Is there a limit?

Does the program care how many numbers are on each line? Or is it just trying to grab all the numbers in the file and put them in a single sequence?


There are only 4 "tokens". The first is a magic number consisting of two characters ("P3" or "P6"), then 3 more numbers - width, height and depth. Each token is not necessarily on a separate line.

A comment can appear anywhere before the depth (I'm guessing it can't appear before the magic number, but this isn't specified so I'm not assuming it).

Also, I assume that there's some space after a token and before a trailing '#'. This isn't specified either, but if it's not true then (I think) I can't use fscanf() anyway, and wanting to use it is basically the reason I started this thread. So let's assume it's true just for the sake of discussion.
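Here is a rough sketch of reading those four tokens with fscanf(), under the assumptions above. The names skip_ws_and_comments and read_ppm_header are made up for illustration. Since "%d" stops at the first character that cannot be part of a number and leaves it unread, a '#' directly after a number should not actually break this.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Consumes white-space and '#' comments, leaving the stream at the
   first character of the next token (or at end of file). */
static void skip_ws_and_comments(FILE *fp)
{
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '#') {
            while (c != '\n' && c != EOF)
                c = fgetc(fp);
        } else if (!isspace(c)) {
            ungetc(c, fp);  /* first character of a token: put it back */
            return;
        }
    }
}

/* Hypothetical helper: returns 1 if all four tokens were read, 0 otherwise.
   magic must have room for two characters plus the terminator. */
int read_ppm_header(FILE *fp, char magic[3], int *width, int *height, int *depth)
{
    assert(fp != NULL);

    skip_ws_and_comments(fp);
    if (fscanf(fp, "%2s", magic) != 1)
        return 0;
    if (magic[0] != 'P' || (magic[1] != '3' && magic[1] != '6'))
        return 0;

    skip_ws_and_comments(fp);
    if (fscanf(fp, "%d", width) != 1)
        return 0;
    skip_ws_and_comments(fp);
    if (fscanf(fp, "%d", height) != 1)
        return 0;
    skip_ws_and_comments(fp);
    if (fscanf(fp, "%d", depth) != 1)
        return 0;

    /* Nothing is skipped after the depth, since comments are only
       expected before it. */
    return 1;
}
```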


#include <stdio.h>
#include <ctype.h>

int skip(FILE *fp)
{
    int c = fgetc(fp);
    while (isspace(c)) c = fgetc(fp);

    if (c == '#')
    {
        while (c != EOF && c != '\n') c = fgetc(fp);
        return skip(fp); // honestly, this will work :)
    }

    return c;
}

enum { eof, number, error };

int next(FILE *fp, char *token)
{
    int c = skip(fp);

    if (c == EOF) return eof;

    if (isdigit(c)) // or isalnum() or whatever
    {
        while (isdigit(c)) { *token++ = c; c = fgetc(fp); }
        ungetc(c, fp);

        *token = '\0';
        return number;
    }

    return error;
}

int main()
{
    char token[256];
    FILE *fp = fopen("test.txt", "r");

    if (fp)
    {
        int n = next(fp, token);
        while (n != eof)
        {
            if (n == error) { printf("syntax error\n"); fclose(fp); return -1; }

            printf("[%s]\n", token);
            n = next(fp, token);
        }

        fclose(fp);
    }
}




Obviously you want to add checking for buffer overflow when building the token.
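One way that check might look, as a sketch only: the caller passes the buffer size and an over-long token is reported as an error. The name next_bounded and the TOK_* constants are hypothetical, and skip() is written as a loop here, which should behave the same as the recursive version.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

enum { TOK_EOF, TOK_NUMBER, TOK_ERROR };

/* Iterative equivalent of skip(): returns the first character that is
   neither white-space nor part of a '#' comment, or EOF. */
static int skip(FILE *fp)
{
    int c;

    for (;;) {
        c = fgetc(fp);
        while (isspace(c))
            c = fgetc(fp);
        if (c != '#')
            return c;
        while (c != EOF && c != '\n')
            c = fgetc(fp);
    }
}

/* Like next(), but never writes more than size bytes into token. */
int next_bounded(FILE *fp, char *token, size_t size)
{
    size_t len = 0;
    int c;

    assert(fp != NULL && token != NULL && size > 0);

    c = skip(fp);
    if (c == EOF)
        return TOK_EOF;
    if (!isdigit(c))
        return TOK_ERROR;

    while (isdigit(c)) {
        if (len + 1 >= size)
            return TOK_ERROR;  /* no room left for this digit plus '\0' */
        token[len++] = (char)c;
        c = fgetc(fp);
    }
    if (c != EOF)
        ungetc(c, fp);  /* the character after the number belongs to what follows */

    token[len] = '\0';
    return TOK_NUMBER;
}
```

In main() the call would then become next_bounded(fp, token, sizeof token).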

I used something like this. If the line does not start with //, just handle it in an else branch.


fptr1 = fopen("data/accounts.txt", "r");
if (fptr1 == NULL) // if we failed to open the file for some reason
{
    cout << "**Failed To Load Accounts.txt File**\n";
    ErrorToFile("**Failed To Load Accounts.txt File**"); // send the error to our file
    return;
}
else
{
    char line[MAX_STRING];
    while (fgets(line, MAX_STRING, fptr1))
    {
        if (line[0] == '/' && line[1] == '/') // we are skipping this line
        { // skip comment line
            continue;
        }
    }
}
