Weet

Parsing Text

Hi everyone, I need to write, or find, a library that allows me to parse words into individual (pre-defined) pieces so that I can manipulate/exchange/remove them individually. What would be the best way to go about this in C++?

Hi

Well, first of all, std::fstream is very good for this sort of thing. If you look at the std::getline function, note in particular the overload that takes a delimiter parameter.

This can be used to tokenise the incoming data.
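A minimal sketch of that idea, assuming the data is split on a single delimiter character. The helper name `split` is made up for illustration, and a std::istringstream stands in for the file stream so the example runs without a file on disk; a std::ifstream can be read the same way.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` into tokens on `delim`, using the
// std::getline overload that takes a delimiter character.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream in(text);  // stand-in for a std::ifstream
    std::string token;
    // Each call reads up to (and discards) the next delimiter.
    while (std::getline(in, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Note that two delimiters in a row produce an empty token, so the caller may want to skip empty strings.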

Another option is to copy the entire file to a string, if it isn't too big, then use the string methods to manipulate it.
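Roughly, that could look like the sketch below. Both `slurp` and `replace_all` are hypothetical helper names, not standard library functions; they only use std::string::find and std::string::replace plus a stream buffer copy.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read everything left in a stream into one string.
// Works for a std::ifstream as well as any other std::istream.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: replace every occurrence of `from` with `to`.
// Passing an empty `to` removes the occurrences instead.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return text;  // nothing sensible to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}
```

Skipping past the inserted text after each replacement keeps the loop from rescanning its own output, which matters when `to` contains `from`.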

Hope that helps.

ace

What you are looking for is a lexer.

I suggest that you use one of the many free or open source lexer generators that exist around, such as lex or flex. They let you specify the words or patterns you want to recognize, and transform these words into tokens as they encounter them.

Most implementations run in time linear in the length of the input.

Do you know of any good, complete, tutorials for manipulating strings in C++?

I've dealt with this to a degree in other, simpler languages, but never in C++ before.

Slightly off-topic:

Why do people suggest full on lexers for simple string parsing? IMO, they're decidedly not for beginners, as they're one of the most arcane things commonly available. Even a good regex library would be better it seems...

Am I just a bonehead? Just something that's always bothered me.
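For comparison, here is a rough sketch of the regex route, assuming a compiler with std::regex (C++11 or later). The function name `words` and the pattern `[A-Za-z]+` are just illustrative choices; a real "pre-defined pieces" list would need its own pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every run of ASCII letters in `text`.
std::vector<std::string> words(const std::string& text) {
    static const std::regex word_re("[A-Za-z]+");
    std::vector<std::string> out;
    // std::sregex_iterator walks over all non-overlapping matches in order;
    // a default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```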

Good question... I've been looking through some information on lexers and it's making my head hurt :D.

I guess what I could use is a very complete tutorial on string manipulation. Anyone have a recommended tut?

Quote:
Original post by Telastyn
Slightly off-topic:

Why do people suggest full on lexers for simple string parsing? IMO, they're decidedly not for beginners, as they're one of the most arcane things commonly available. Even a good regex library would be better it seems...

Am I just a bonehead? Just something that's always bothered me.


I beg to disagree. The way a lexer works, as seen from the outside, is extremely simple: you decide what the lexer should do when it encounters a given pattern by writing some source code, as you would when using a regexp library. For instance:

foo Foo( );

Now, you generate a lexer and compile it, and make it run on some text: whenever it encounters the word "foo", it will execute the code Foo( );
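To make the "pattern on the left, action on the right" idea concrete for someone who only knows C++, here is a hand-written analogy. It is not what flex generates and not flex syntax: it only matches whole whitespace-separated words, whereas a generated lexer matches patterns anywhere in the input. The name `run_actions` is made up for the example.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: for every whitespace-separated word in `text`,
// run the action registered for that word, if any.
// Returns how many actions were executed.
int run_actions(const std::string& text,
                const std::map<std::string, std::function<void()>>& actions) {
    std::istringstream in(text);
    std::string word;
    int fired = 0;
    while (in >> word) {
        auto it = actions.find(word);
        if (it != actions.end()) {
            it->second();  // the equivalent of the code after the pattern
            ++fired;
        }
    }
    return fired;
}
```

Registering `actions["foo"] = Foo;` plays the role of the rule above. Words without a registered action are silently skipped here, while flex by default echoes unmatched text to its output.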

Also, lexer generators need no filler code other than "%%" (filler code being anything that is neither a regexp definition nor an action to run when a regexp matches), whereas regexp libraries require you to build regexp objects from strings (or worse) and to keep track of where each word ends.

Take a look at these examples to see how hard flex code is to understand (and how much harder it is than regexps).

There's a difference between reading something and writing it. I've been coding in something off and on for years, and that's still mostly gibberish to me. I can only imagine how much beginners will grasp of it.

Perhaps it's similar to my experience with regexes, which were similarly gibberish until I found a resource that actually explained things well amongst the many that did not help at all...

Quote:
Original post by Telastyn
There's a difference between reading something and writing it. I've been coding in something off and on for years, and that's still mostly gibberish to me. I can only imagine how much beginners will grasp of it.

Perhaps it's similar to my experience with regexes, which were similarly gibberish until I found a resource that actually explained things well amongst the many that did not help at all...


I'd have to agree with Telastyn. Lexers are arcane nightmares for beginners. Granted, once you understand lex/yacc or flex/bison, you can whip up a simple parser in minutes rather than hours, which is great. But it might take someone days or weeks to piece together how they work and figure out all the limitations.

I eventually got tired of dealing with the sheer arcaneness those systems run on, ditched flex/bison, ANTLR, and a couple of others, and wrote my own, which is _much_ more straightforward for someone coming from a C++ (not as much C) background.

My advice: go look at the docs for the STL string class (std::string). This is one of my favorite quick reference pages:

http://www.msoe.edu/eecs/ce/courseinfo/stl/string.htm

I hope that helps out some.
