Parsing Text
Hi everyone,
I need to write, or find, a library that lets me parse words into individual (pre-defined) pieces so that I can manipulate, exchange, or remove them individually. What would be the best way to go about this in C++?
Hi
Well, first of all, std::fstream is very good for this sort of thing. If you look here you can see the implementation of the getline function, and in particular the overload that takes a delimiter parameter.
This can be used to tokenise the incoming data.
Another option is to copy the entire file into a string, if it isn't too big, then use the string methods to manipulate it.
Hope that helps.
ace
What you are looking for is a lexer.
I suggest that you use one of the many free or open-source lexer generators available, such as lex or flex. They let you specify the words or patterns you want to recognize, and they turn those words into tokens as they encounter them.
Most implementations run in linear time.
Do you know of any good, complete tutorials for manipulating strings in C++?
I've dealt to a degree with this in other simpler languages but never in C++ before.
Slightly off-topic:
Why do people suggest full-on lexers for simple string parsing? IMO, they're decidedly not for beginners, as they're one of the most arcane things commonly available. Even a good regex library would be better, it seems...
Am I just a bonehead? Just something that's always bothered me.
Good question... I've been looking through some information on lexers and it's making my head hurt :D.
I guess what I could use is a very complete tutorial on string manipulation. Anyone have a recommended tut?
Quote:Original post by Telastyn
Slightly off-topic:
Why do people suggest full-on lexers for simple string parsing? IMO, they're decidedly not for beginners, as they're one of the most arcane things commonly available. Even a good regex library would be better, it seems...
Am I just a bonehead? Just something that's always bothered me.
I beg to disagree. The way a lexer works, as seen from the outside, is extremely simple: you decide what the lexer should do when it encounters a given pattern by writing some source code, as you would when using a regexp library. For instance:
foo Foo( );
Now, you generate a lexer, compile it, and run it on some text: whenever it encounters the word "foo", it will execute the code Foo();
Also, lexer generators need no filler code other than "%%" (filler code being anything that is not a regexp definition or an action to run when a regexp is matched), while regexp libs require you to build regexp objects from strings (or worse) and to keep track of where each word ends.
Take a look at these examples to find out how hard flex code is to understand (and how much harder it is than regexps).
There's a difference between reading something and writing it. I've been coding in something off and on for years, and that's still mostly gibberish to me. I can only imagine how much beginners will grasp of it.
Perhaps it's similar to my experience with regexes, which were similarly gibberish until I found a resource that actually explained things well amongst the many that did not help at all...
Quote:Original post by Telastyn
There's a difference between reading something and writing it. I've been coding in something off and on for years, and that's still mostly gibberish to me. I can only imagine how much beginners will grasp of it.
Perhaps it's similar to my experience with regexes, which were similarly gibberish until I found a resource that actually explained things well amongst the many that did not help at all...
I'd have to agree with Telastyn. Lexers are arcane nightmares for beginners. Granted, once you understand lex/yacc or flex/bison, you can whip up a simple parser in minutes rather than hours, which is great. But it might take someone days or weeks to piece together how they work and to figure out all their limitations.
I eventually got tired of dealing with the pure arcanity those systems run on, ditched flex/bison, ANTLR, and a couple of others, and wrote my own that is _much_ more straightforward for someone coming from a C++ (not so much C) background.
My advice: go look at the docs for the STL string class (std::string). This is one of my favorite quick-reference pages:
http://www.msoe.edu/eecs/ce/courseinfo/stl/string.htm
I hope that helps out some.