Using stringstream for parsing

I have a stream (a file stream, let's say) that's sending me text data. The data is formatted as follows: "1,2,3;text\n", where 1, 2 and 3 are variable-length integers and text is variable-length text. The remaining delimiters always appear. Question: shouldn't I be able to use the STL's iostreams to parse this? I figured some combination of stringstream and string would allow something like this:
string line = "1,2,3;text\n";
/* magic stringstream stuff */
/* int data[3] now equals { 1, 2, 3 } */
/* string text now equals "text" */
 
One would think operator>> would come in handy, but I can't figure out how to make it tokenize on anything but whitespace. There has to be something I'm missing; all that iostream stuff and I still don't have sscanf functionality?!

Take a look at istream::getline.

The >> operator assumes the delim is whitespace.

I would suggest using binary data as it is easier to read/parse.

I assume you want text I/O for easy editing. I have a class that can do both binary and text I/O through << and >> with the same piece of code.

It's in the code section at www.flipcode.com. That version has a small problem and needs a minor fix. If you are interested, I can post my updated version.

quote:
Original post by Void
Take a look at istream::getline.


I need to use the commas, semicolons and newline as delimiters, not just the newline.

I tried istream::get, where you're allowed to specify your delimiter, but that doesn't empty out the input buffer. In other words, if I tried to use get 3 times with comma, comma, semicolon, I would get "1" "1" "1,2,3". No good.
quote:

The >> operator assumes the delim is whitespace.


Which is the problem.
quote:

I would suggest using binary data as it is easier to read/parse.


That would be nice, but this is in fact a stream of data coming from a device driver for an embedded application. I'm not free to change the stream since I'm not generating it.
quote:

I assume you want to do text io for easy editing.


I want to be able to parse a character stream without having to use stdio.

istream::get does not remove the delim from the stream. You need to remove the delim manually yourself.

istream::getline will remove the delim from the stream.

I tested the code below on a text file that has this data:

1,2,3;4,5,6;

  
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    ifstream is("test.txt");

    if (!is)
    {
        cout << "Error opening file" << endl;
        return 1;
    }

    char buf[20];
    is.getline(buf, 20, ';');   // reads up to ';' and discards the ';'
    cout << buf << endl;        // prints 1,2,3

    is.getline(buf, 20, ';');
    cout << buf << endl;        // prints 4,5,6

    return 0;
}


The other way to use >> is with a custom locale/facet. I have little experience with facets, so I can't help you much there. Recent CUJ articles teach about locales and facets; you can check them out at www.cuj.com

Just a technicality... What you're having trouble with is lexing, not parsing. Yeah, yeah, it's a technicality.

But why not use Flex? It's a great tool for problems like this. It makes them basically trivial. Sorry for the lack of a link, but it shouldn't be hard to find.

quote:
Original post by Stoffel
I need to use the commas, semicolons and newline as delimiters, not just the newline.

istream::getline has an optional 3rd parameter where you can specify the delimiter.

quote:
I tried istream::get, where you're allowed to specify your delimiter, but that doesn't empty out the input buffer. In other words, if I tried to use get 3 times with comma, comma, semicolon, I would get "1" "1" "1,2,3". No good.

If it doesn't empty out the input buffer, then your copy of the library is broken. In your example, using get() with a comma as a delimiter should extract the "1" and leave the comma. (The difference between that and getline() is just that getline() extracts the delimiter too.) Have you applied the patches on the Dinkumware site?

Here's a snippet of ugly code that works for me (whether it does what you're asking for or not, I have no clue). If it doesn't work for you, then perhaps you need to apply a fix.
  
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    string line = "1,2,3;text\n";
    stringstream strstrm(line);

    char num1[100];
    char num2[100];
    char num3[100];
    strstrm.getline(num1, 100, ',');  // "1", the ',' is consumed
    strstrm.getline(num2, 100, ',');  // "2"
    strstrm.getline(num3, 100, ';');  // "3"
    char txt[100];
    strstrm.getline(txt, 100);        // default '\n' delimiter: "text"

    cout << num1 << "/" << num2 << "/" << num3 << "/" << txt << endl;
}

(Not editing the above post, cos editing posts with source tags often gives me grief).
PS. I appreciate that iostreams seems to have no good way to read directly into types other than strings. This is the best hack I could think of (same functionality as above code):

  
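#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
    stringstream strstrm("1,2,3;text\n");
    int n[3];
    string txt;
    char placeholder; // swallows each one-char delimiter (',' or ';')
    strstrm >> n[0] >> placeholder >> n[1] >> placeholder >> n[2] >> placeholder >> txt;
    cout << n[0] << "/" << n[1] << "/" << n[2] << "/" << txt << endl;
}
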
SWEET! That last one almost did it. I had to do something different to handle text, but here it is:

  
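#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    string hello = "1,2,3;hi there";
    stringstream strm(hello);
    int a, b, c;
    char delim; // throw-away: receives ',', ',' and ';'
    strm >> a >> delim >> b >> delim >> c >> delim;
    stringstream text;
    strm.get(*text.rdbuf()); // copies the rest, up to '\n' or EOF, into text
    cout << a << "/" << b << "/" << c << "/" << text.str() << endl;
}
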
a, b, and c have the correct values. delim is a throw-away. text.str () returns "hi there".

Something I had to find by digging through the source code: MSVC 6.0 (SP4) STL lists the following overloads for istream:
  
basic_istream& get(basic_streambuf<E, T> *sb);
basic_istream& get(basic_streambuf<E, T> *sb, E delim);

This is handy, because ios::rdbuf () returns a basic_streambuf*. However, this overload WOULD NOT WORK. Digging through the source code, I found that istream::get actually accepts a REFERENCE, not a POINTER. I don't know if this is a documentation error or a coding error. In order to make it work, I had to dereference text.rdbuf ().

Regardless, I am now a happy clam because this works. Thanks for your help, all.
