Text File Parsing

2 comments, last by Zahlman 18 years, 7 months ago
Hello, I am writing a parser for my text-based file format and am getting tied in knots: it's getting so damn complex. Especially when I want to deal with whitespace, I need loops everywhere. This is what the format looks like:

B0{S:8:6:(0.0.0):
[(0.0.0),(256.0.0),(0.0.256),(256.0.256),(0.256.0),(256.256.0),(0.256.256),(256.256.256)]:
{24:[0,1,2,3,0,2,1,3,4,5,6,7,4,6,5,7,0,4,1,5,2,6,3,7]}:
[{4:[0,1,3,2]:textureName:(u.v)},
{4:[2,4,5,7]:textureName:(u.v)},
{4:[7,6,0,1]:textureName:(u.v)},
{4:[1,5,4,2]:textureName:(u.v)},
{4:[2,3,7,6]:textureName:(u.v)},
{4:[6,0,2,6]:textureName:(u.v)}]}
Right now I am using a combination of fscanf() and fgetc(), and it's becoming very complex. Is there a better way to go about all this? I am reading straight out of the file into my data structures, rather than reading the file into an array in memory and then scanning through that. I don't think that would make things any easier. Any ideas? Thanks.
use flex and bison for text parsing ...
The standard approach is to first write an underlying 'lexer' which takes care of identifying the basic language components (integers, identifiers, single characters) and handles whitespace/comments automatically.

So basically you'd have an interface kind of like this:
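struct token {
    // END_OF_INPUT rather than EOF, because EOF is already a macro in <stdio.h>
    enum { INTEGER, IDENTIFIER, CHARACTER, END_OF_INPUT } type;
    const char *idValue;  // set when type == IDENTIFIER
    int intValue;         // set when type == INTEGER
    char charValue;       // set when type == CHARACTER
};

token parseToken();  // returns the next token, skipping whitespace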
Often you'll need lookahead too, i.e. the ability to push the last parsed token back onto the stream.
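One way to get that push-back is a thin wrapper around whatever function produces tokens. The sketch below is only an illustration: `TokenStream`, `next`, `pushBack` and `peek` are hypothetical names, and it is templated on the token type so that it stands alone. It assumes the token type is default-constructible and copyable, and it holds at most one pushed-back token.

```cpp
#include <functional>
#include <utility>

// Hypothetical wrapper that adds one token of lookahead to any token source.
template <typename Tok>
class TokenStream {
public:
    explicit TokenStream(std::function<Tok()> source)
        : source_(std::move(source)) {}

    // Returns the pushed-back token if there is one, otherwise asks the source.
    Tok next() {
        if (hasPushedBack_) {
            hasPushedBack_ = false;
            return pushedBack_;
        }
        return source_();
    }

    // Puts a token back so that the following next() returns it again.
    // Only one token can be held; a second pushBack overwrites the first.
    void pushBack(const Tok& tok) {
        pushedBack_ = tok;
        hasPushedBack_ = true;
    }

    // Looks at the upcoming token without consuming it.
    Tok peek() {
        Tok tok = next();
        pushBack(tok);
        return tok;
    }

private:
    std::function<Tok()> source_;
    Tok pushedBack_{};
    bool hasPushedBack_ = false;
};
```

A parser would typically call `peek()` to decide which rule applies (for example, whether a `,` or a `]` follows a list item) and then `next()` to consume the token. One slot is enough for grammars that never need to look more than one token ahead.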
This is pretty much what lex/flex does for you anyway. And for relatively simple languages like this it should be fairly easy to write it manually.
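To give a feel for how small the manual version can be, here is a rough recursive-descent sketch for just one piece of the format: the bracketed list of `(x.y.z)` vectors. It folds the lexing into a small cursor class rather than building token structs, which is a simplification of the approach above. `Vec3`, `Cursor`, `parseVec3` and `parseVec3List` are hypothetical names, and the grammar in the comments is guessed from the sample data (three unsigned integers separated by dots), not taken from any spec.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Vec3 { int x, y, z; };

// Hypothetical cursor over the input text. Whitespace is skipped in peek(),
// so none of the parsing functions below need their own whitespace loops.
class Cursor {
public:
    explicit Cursor(const std::string& text) : text_(text) {}

    // Skips whitespace and returns the next character without consuming it
    // ('\0' at end of input).
    char peek() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }

    // Consumes one expected punctuation character, or throws.
    void expect(char c) {
        if (peek() != c) {
            throw std::runtime_error(std::string("expected '") + c + "'");
        }
        ++pos_;
    }

    // Consumes an unsigned decimal integer, or throws.
    int integer() {
        if (!std::isdigit(static_cast<unsigned char>(peek()))) {
            throw std::runtime_error("expected integer");
        }
        int value = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            value = value * 10 + (text_[pos_++] - '0');
        }
        return value;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
};

// vector := '(' int '.' int '.' int ')'
Vec3 parseVec3(Cursor& in) {
    Vec3 v;
    in.expect('('); v.x = in.integer();
    in.expect('.'); v.y = in.integer();
    in.expect('.'); v.z = in.integer();
    in.expect(')');
    return v;
}

// vectorList := '[' [ vector (',' vector)* ] ']'
std::vector<Vec3> parseVec3List(Cursor& in) {
    std::vector<Vec3> out;
    in.expect('[');
    if (in.peek() != ']') {
        out.push_back(parseVec3(in));
        while (in.peek() == ',') {
            in.expect(',');
            out.push_back(parseVec3(in));
        }
    }
    in.expect(']');
    return out;
}
```

Each grammar rule becomes one short function, and the rest of the format (the index lists and the per-face records) would presumably follow the same pattern. Reading the whole file into a `std::string` first keeps the cursor trivial, at the cost of holding the file in memory.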
Look up Boost::spirit; others here have had good results with it.
