
Simple parsing of a text file

This topic is 5041 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Nothing special so far. Here is the test program, followed by the Identifier enum and the parse function:

#include <fstream>
#include <iostream>
using namespace std;

int main()
{
	char s[1024];
	Parser p;
	Parser::Identifier id;

	ifstream file("test.txt");

	while( file.getline(s, 1024) ){

		while( id=p.parse() ){
			// -1: end of line, -2: unrecognised character
			if( id == -1 || id == -2 )
				break;

			if( id == Parser::NUMBER )
				cout << "NUMBER: ";
			else if( id == Parser::STRING )
				cout << "STRING: ";
			else if( id == Parser::QUOTED_STRING )
				cout << "QUOTED_STRING: ";
			else if( id == Parser::OPERATOR )
				cout << "OPERATOR: ";

			// print parser.text
			cout << p.text << endl;
			//cout << (int)p.text[0] << endl;
		}
	}

	cout << endl << endl;

	return 0;
}

enum Identifier{
	NUMBER = (Instruction::NUM_OF_INSTRUCTIONS + 1),	///< Found a number
	STRING,			///< Found a string (letters and numbers)
	QUOTED_STRING,	///< Found a string surrounded by quotes: "hello"
	OPERATOR,		///< Found an operator.

	NUM_OF_IDENTIFIERS	///< Implicit number of identifiers
};

// Note: buffer is a char[1024], text and start are char*
// data members.
Identifier parse()
{
	int i, j;
	Identifier id;
	// Skip all whitespaces
	for(i=0; start[i]==' ' || start[i]=='\t' ; i++ );

	if( start[i] == '\n' || start[i] == '\0' )
		return (Identifier) -1;

	// if first character is a letter
	if( isalpha(start[i]) != 0 ){
		// Copy characters until one is neither a letter nor a digit (strings may contain digits)
		for(j=0; isalpha(start[i])!=0 || isdigit(start[i])!=0 ; j++, i++)
			buffer[j] = start[i];

		id = STRING;
	}
	// If first character is a number
	else if( isdigit(start[i]) != 0 ){
		// Copy characters until one is not a digit
		for(j=0; isdigit(start[i]) != 0; j++, i++)
			buffer[j] = start[i];

		id = NUMBER;
	}
	// if first character is a "
	else if( start[i] == '"' ){
		// to pass first "
		i++;
		// Copy all characters until another quote
		for( j=0; start[i] != '"' ; j++, i++)
			buffer[j] = start[i];

		id = QUOTED_STRING;
	}
	// if none of them
	else
		return (Identifier) -2;

	// end buffer string
	buffer[j] = '\0';

	// Free the text
	if( text != NULL ){
		free(text);
		text = NULL;
	}

	// Allocate space and copy 'buffer' string to 'text' for outside access
	text = strdup(buffer);

	// advance start pointer to start of next part to parse
	start += (i+1);		// + 1 else the last letter looked at last time will be the first this time
	// return the token type
	return id;
}

The idea is to supply a line to the parser and then call parse() repeatedly until it returns -1. Each call works on the next token of the line, until it reaches a newline or null character. Here is the test file:





And finally, here is the output:

STRING: hello
STRING: ²²²²
NUMBER: 67464
STRING: ²²²²
STRING: hr56355dfsd
STRING: ²²²²
STRING: dfgs56
STRING: ²²²²
QUOTED_STRING: dghd4564m,lh89
STRING: ²²²²

Can anyone think of a reason why I'm getting these superscript 2s? I assume it's something to do with trying to print the 'text' pointer when it points to nothing, but I'm not sure why it's happening.
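One likely cause, worth checking against the real Parser class (which isn't shown): after the copy loop, i already indexes the first character that was NOT copied. When the token is the last thing on the line, that character is the terminating '\0', so start += (i+1) moves start one byte past the end of the string and the next parse() call reads whatever memory follows. If the line sits in a heap block and this is an MSVC debug build, those bytes would be the debug heap's 0xFD guard bytes, which the Windows console (code page 437) shows as '²'; that would fit the four '²' after every token. The +1 is only right for QUOTED_STRING, where i is left on the closing quote. The sketch below replays the STRING/NUMBER scan to show the offsets; scanEnd, nextStartPosted and nextStartFixed are hypothetical helpers, and the scan is simplified to letters and digits.

```cpp
#include <cctype>
#include <cstddef>

// Simplified copy of the posted loops (STRING/NUMBER cases only): skip
// blanks, then "copy" letters and digits. Returns i, the index of the first
// character that was NOT copied, which is the value the posted code adds 1 to.
std::size_t scanEnd(const char* start) {
    std::size_t i = 0;
    while (start[i] == ' ' || start[i] == '\t') ++i;
    // Cast to unsigned char: passing a negative char to isalnum is undefined.
    while (std::isalnum(static_cast<unsigned char>(start[i]))) ++i;
    return i;
}

// Offset at which the next parse() call would begin.
std::size_t nextStartPosted(const char* start) { return scanEnd(start) + 1; } // start += (i+1)
std::size_t nextStartFixed(const char* start)  { return scanEnd(start); }     // start += i
```

With "hello" as the whole line, the posted rule lands at offset 6, one past the '\0' at offset 5. With a line such as "x=5" it would also silently skip the '=' that directly follows the token.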

I can't tell you too much without seeing the class declaration, but I think you are on the right track for the debugging.

However, better to stop this problem where it starts - get rid of all this weird manipulation of the "text" member, which apparently is a char * (since you're using strdup). In C++ there is generally no good reason for this low-level hackage. Make the member be a std::string, and use its .assign() method to change the contents (or the assignment operator, if you extract a std::string from the input buffer).
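Applied to the whole function, that suggestion might look like the sketch below. It assumes the parser keeps a const char* cursor into a line owned by the caller; LineParser, setLine(), text() and the END_OF_LINE/PARSE_ERROR values are hypothetical names, not the poster's class. The token text is a std::string filled with assign(), and the cursor is only ever moved to the first unconsumed character, so it cannot step past the terminating '\0'. Operators are left out, as in the posted parse().

```cpp
#include <cctype>
#include <string>

// Illustrative sketch, not the poster's Parser class.
class LineParser {
public:
    enum Identifier { PARSE_ERROR = -2, END_OF_LINE = -1, NUMBER = 1, STRING, QUOTED_STRING };

    // The caller owns the line and must keep it alive while parsing.
    void setLine(const char* line) { cur_ = line; }

    // assign() replaces the old contents: no strdup()/free() bookkeeping.
    const std::string& text() const { return text_; }

    Identifier parse() {
        while (*cur_ == ' ' || *cur_ == '\t') ++cur_;           // skip blanks
        if (*cur_ == '\0' || *cur_ == '\n') return END_OF_LINE;  // cursor stays put

        const char* begin = cur_;
        Identifier id;
        if (std::isalpha(static_cast<unsigned char>(*cur_))) {
            // a letter, then any letters or digits
            while (std::isalnum(static_cast<unsigned char>(*cur_))) ++cur_;
            id = STRING;
        } else if (std::isdigit(static_cast<unsigned char>(*cur_))) {
            while (std::isdigit(static_cast<unsigned char>(*cur_))) ++cur_;
            id = NUMBER;
        } else if (*cur_ == '"') {
            begin = ++cur_;                                      // skip opening quote
            while (*cur_ != '"' && *cur_ != '\0') ++cur_;
            if (*cur_ != '"') return PARSE_ERROR;                // no closing quote
            text_.assign(begin, cur_);
            ++cur_;                                              // consume closing quote
            return QUOTED_STRING;
        } else {
            return PARSE_ERROR;
        }
        // cur_ already points at the first character NOT in the token,
        // so it is not advanced any further (no "+1").
        text_.assign(begin, cur_);
        return id;
    }

private:
    const char* cur_ = "";
    std::string text_;
};
```

The driver loop keeps the same shape: call setLine(s) after each getline(), then call parse() until it returns END_OF_LINE or PARSE_ERROR.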

Better yet, why not let cin do (most of) the parsing for you?


Define a base class "Token", and subclasses Number, Operator, etc.
(Each of these contains a single data member of the appropriate
type, which holds the information for that token e.g. the
numeric value of a Number). The base class instances are empty.

Try to cin into int variable
	If successful: return Number(the int variable)
	If it fails: call cin.clear() before reading anything else
cin into char variable
	If it's a valid operator: return Operator(the char variable)
	else if it's a double quote:
		Use the 3-arg form of getline() to read up to the next double quote: std::getline(std::cin, temp, '"'); (temp is a std::string)
		return QuotedString(temp)
	(otherwise, we have a normal string...)
		Make a new string with the char variable
		cin a single word (by cin into a std::string) and append it to that string
		return String(the string)
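A sketch of that outline in C++, under a few assumptions: a tagged Token struct stands in for the Token class hierarchy, the operator set is an arbitrary example, words are read as runs of letters and digits (so x=5 splits into three tokens instead of one whitespace-delimited word), and digits are detected with peek() instead of attempting an int extraction first, because a failed int read leaves the stream in a failed state and may already have consumed a leading '+' or '-'. readToken() and tokenizeLine() are hypothetical names. Taking a std::istream& lets the same code read from std::cin or from a single line through std::istringstream.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token type: a tagged struct instead of a Token base class
// with Number/Operator/QuotedString/String subclasses.
struct Token {
    enum Kind { Number, Operator, QuotedString, String, End };
    Kind kind = End;
    int number = 0;    // set when kind == Number
    std::string text;  // operator, quoted contents, or word
};

// Reads one token from `in`, following the outline above.
Token readToken(std::istream& in) {
    Token t;
    in >> std::ws;                                     // skip whitespace
    const int c = in.peek();
    if (c == std::char_traits<char>::eof()) return t;  // kind == End

    if (std::isdigit(c)) {                             // a number
        in >> t.number;
        t.kind = Token::Number;
        return t;
    }
    char ch = 0;
    in.get(ch);                                        // read a single char
    const std::string operators = "+-*/=<>(),;";       // example operator set (assumption)
    if (operators.find(ch) != std::string::npos) {
        t.kind = Token::Operator;
        t.text = ch;
    } else if (ch == '"') {                            // read up to the next double quote
        std::getline(in, t.text, '"');
        t.kind = Token::QuotedString;
    } else {                                           // a word: ch plus following letters/digits
        t.text = ch;
        while (std::isalnum(in.peek())) t.text += static_cast<char>(in.get());
        t.kind = Token::String;
    }
    return t;
}

// Tokenizes one line, e.g. a line read with std::getline.
std::vector<Token> tokenizeLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<Token> tokens;
    for (Token t = readToken(in); t.kind != Token::End; t = readToken(in))
        tokens.push_back(t);
    return tokens;
}
```

Every non-End branch consumes at least one character, and a stream that has failed reports End on the next call, so the loop in tokenizeLine() always terminates.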

I don't know, but perhaps Flex & Bison could be of use to you?

EDIT: Hehe, in case you don't know what Flex & Bison are:
They're parsing tools. Flex generates a scanner that quickly splits a text file into tokens, and combined with Bison, in which you write a grammar describing how those tokens combine, you can build very powerful parsers.
Parsing C-style code has never been easier!

My guess is that the line

ifstream file("test.txt");

is opening your text file in binary mode. On some systems (Windows, for example), a newline written in text mode is stored as the byte 0x0D (carriage return) followed by 0x0A (line feed); 0x0A is the actual newline character. In text mode this translation is transparent.

So your parse function stops one character before the newline, at the 0x0D byte, and your STRING case accepts that 0x0D as a one-character string before the 0x0A (newline) is reached.

To fix this, you could either open the file in text mode, or change these lines in your parse() function from:

if( start[i] == '\n' || start[i] == '\0' )
	return (Identifier) -1;

to:

if( start[i] == '\n' || start[i] == '\0' || start[i] == '\r' )
	return (Identifier) -1;

(getline() has already discarded the '\n', so a stray '\r' is the last character of the line.)

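Rather than special-casing '\r' inside parse(), another option is to strip it once per line, right after reading. A minimal sketch; stripCR() and getlineNoCR() are hypothetical helpers, and this only matters when CR LF files are read without newline translation (binary mode, or a Windows-made file read on a POSIX system):

```cpp
#include <istream>
#include <string>

// Removes one trailing carriage return, if present. getline() strips only
// the '\n', so a CR LF line read without translation still ends in '\r'.
std::string stripCR(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return line;
}

// Reads one line and strips a trailing '\r'; returns false at end of input.
bool getlineNoCR(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    line = stripCR(line);
    return true;
}
```

This keeps the tokenizer itself free of line-ending special cases.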

Some theories...


my guess is that the line

ifstream file("test.txt");

is opening your text file in binary.

This should not be the case. ifstream opens files in text mode by default; to open a file in binary mode you have to pass the ios::binary flag, like this:
ifstream file("test.txt", ios::binary);
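A small sketch of what ios::binary changes, assuming the working directory is writable; writeCRLF() and firstLine() are hypothetical helpers. It writes hello followed by CR LF as raw bytes and reads the line back. In binary mode getline() returns "hello\r" on every platform; in the default text mode, Windows translates CR LF to '\n' and returns "hello", while POSIX systems do no translation and keep the '\r'.

```cpp
#include <fstream>
#include <string>

// Writes "hello\r\n" byte for byte (binary mode: no newline translation).
void writeCRLF(const char* path) {
    std::ofstream out(path, std::ios::binary);
    out << "hello\r\n";
}

// Reads the first line of `path`. ifstream always adds std::ios::in, so pass
// std::ios::in for the default text mode or std::ios::binary for raw bytes.
std::string firstLine(const char* path, std::ios::openmode mode) {
    std::ifstream file(path, mode);
    std::string line;
    std::getline(file, line);
    return line;
}
```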

About the OP's code, you write:

else if( start[i] == '"' ){
// to pass first "

I'm not sure, but shouldn't you put a backslash (escape sequence) before the " sign?
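For reference, no backslash is needed there. In a character literal only the single quote (and the backslash itself) must be escaped, so '"' and '\"' are the same char; in a string literal it is the double quote that needs escaping. A minimal compile-time check (isQuote is a hypothetical helper):

```cpp
// A double quote needs no escape inside a character literal:
// '"' and '\"' are the same character. In a character literal only the
// single quote (and the backslash) must be escaped: '\''.
static_assert('"' == '\"', "a double quote needs no escape in a char literal");

// Inside a string literal the roles swap: the double quote must be
// escaped, while a single quote can be written as-is.
const char quoted[] = "\"hi\"";    // 4 characters: "hi" including both quotes
const char apostrophe[] = "it's";  // no escape needed for the single quote

// The posted test, unchanged: start[i] == '"' is valid C++.
bool isQuote(char c) { return c == '"'; }
```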

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.
