string tokenizing and parsing...

Started by
6 comments, last by Zahlman 16 years, 8 months ago
Hello. I'm currently working on a simple text file format that I can use for 3D meshes. It's very simple and only temporary; it will soon be thrown away. However, I ran into an interesting problem while doing this. The format is very basic: there is one vertex per line, and each vertex has an xpos, ypos, and zpos component as well as a color component, separated by tabs. So the file would look something like this (excluding header info):

 0.0   2.0   10.0   0xff00ff00
-1.0  -2.0   10.0   0xff00ff00
 1.0  -2.0   10.0   0xff00ff00
My plan was to first tokenize the file using \n as the delimiter, and then split each of those tokens into sub-tokens using the tab character as the delimiter. I don't think this will work, because strtok keeps hidden state: when I grab a vertex using \n as the delimiter, strtok remembers which char* I am tokenizing and where the last token ended, so that if I call strtok again with NULL as the string to tokenize, it automatically gets the next token for me. However, if, after grabbing that vertex, I use strtok to split the token further into its components (x, y, z, color), then I have to pass the char* of the vertex token into strtok. That's fine, except that strtok's static char*s now keep track of the vertex token itself and not the whole file string. So, when it comes time to grab the next vertex, simply passing NULL to strtok won't work. I have no problem changing the file format to make it easier to tokenize, but I am curious to see if anyone knows of an elegant solution to this problem.
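As a rough sketch of one way around the single hidden state described above: split the buffer into lines by hand with std::strchr, and use strtok only for the fields inside each line, so strtok never has to remember two positions at once. The function name splitLinesAndFields is made up for illustration, the buffer must be writable, and a trailing \r from Windows line endings is not handled here. POSIX also offers strtok_r, which takes the saved position as an explicit argument, but it is not part of standard C or C++.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns one vector of field strings per non-empty line.
// The buffer is modified in place, just as strtok itself would modify it.
std::vector<std::vector<std::string>> splitLinesAndFields(char* buffer) {
    std::vector<std::vector<std::string>> rows;
    char* line = buffer;
    while (line != nullptr && *line != '\0') {
        // Find the end of the current line ourselves, so the outer position
        // lives in our own variable instead of in strtok's static state.
        char* next = std::strchr(line, '\n');
        if (next != nullptr) {
            *next = '\0';  // terminate this line
            ++next;        // start of the following line
        }

        // strtok is now only ever asked to remember a position in this line.
        std::vector<std::string> fields;
        for (char* tok = std::strtok(line, "\t"); tok != nullptr;
             tok = std::strtok(nullptr, "\t")) {
            fields.push_back(tok);
        }
        if (!fields.empty()) {
            rows.push_back(fields);
        }
        line = next;
    }
    return rows;
}
```

Each inner vector then holds the x, y, z and color strings of one vertex, still as text, ready for whatever numeric conversion is preferred.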
sscanf()
Excellent, thank you.
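For readers wondering what the sscanf() suggestion might look like in practice, here is a minimal sketch, assuming each line has already been read into a character buffer. The ParsedVertex struct and the parseVertexLine name are illustrative only and do not come from the thread. A space in the format string matches any run of whitespace (tabs included), and %x accepts an optional 0x prefix.

```cpp
#include <cstdio>

// Illustrative container for one parsed line; not a type from the thread.
struct ParsedVertex {
    float x, y, z;
    unsigned int color;
};

// Returns true only when all four fields were converted successfully.
bool parseVertexLine(const char* line, ParsedVertex& out) {
    // sscanf returns the number of fields it managed to convert.
    int fields = std::sscanf(line, "%f %f %f %x",
                             &out.x, &out.y, &out.z, &out.color);
    return fields == 4;
}
```

Checking the return value against the expected field count is what lets the caller tell a short or malformed line from a good one.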
With a file format that simple, with each field separated by some form of whitespace and each vertex on a new line, why not just use the standard built-in iostream functionality? Personally, I would advise against using sscanf if you are using C++ (thank you, Hollower) and would instead go with an easier, safer solution.

Just open the file as an ifstream and read in the data token by token.

As an example
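#include <fstream>
#include <ios>

int main()
{
   std::ifstream file("Vertices.txt");
   if (file)
   {
      // read header data to figure out number of vertices for example
      int numVertices = 0;
      file >> numVertices;

      // read in tokens here
      for (int i = 0; i < numVertices; ++i)
      {
         // positions are floating point; the color is a hex value such as
         // 0xff00ff00, so it is read as an unsigned int in hex mode
         float x, y, z;
         unsigned int color;
         file >> x >> y >> z >> std::hex >> color;

         // create a new vertex here with the parameters however you want
         // (Vertex is assumed to be defined elsewhere)
         Vertex v(x, y, z, color);
      }
      file.close();
   }
   return 0;
}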



Something like the above example is what I would probably do. This code was written without being tested, so I'm not sure it compiles, but the basic idea should be there; it may need some minor tweaking for your case.

[Edited by - vtchill on August 19, 2007 1:27:49 PM]
That seems like a good way to go as well. Does the >> operator stop on all whitespace characters, i.e. tabs and \n's and \r's?
Yep, it stops on all whitespace. Another reason to use it is that you don't need to deal with character arrays.
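To illustrate the whitespace behaviour just described, here is a small sketch (the splitOnWhitespace name is made up for this example). In the default locale, operator>> skips any leading whitespace, including spaces, tabs, \n and \r, and then reads up to the next whitespace character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative helper: collects every whitespace-separated token in text.
std::vector<std::string> splitOnWhitespace(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    // Each extraction skips leading whitespace, then reads one token.
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because newlines are treated like any other whitespace, this style of reading cannot tell where one line ends and the next begins; that matters only if a short or malformed line needs to be detected.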
This would be one of those situations where knowing which language you are using is important. The use of strtok and char pointers in the first post indicates C, and fpsgamer's reference to sscanf is appropriate for a C solution. vtchill's example is entirely C++, however.
Reading things a line at a time is a fine idea (it makes it easier to recover from errors in the file), and with modern C++ tools such as std::vector you don't need to store a count of vertices at the beginning (although it might help in terms of efficiency). There's also no need to .close() file stream objects explicitly in normal cases. We can take your "re-parse a line" approach in modern C++ by using std::getline() to read lines of the file, and creating std::stringstream objects from them.

#include <fstream>
#include <sstream>
#include <vector>
#include <string>

// Vertex and colourToInt() are assumed to be defined elsewhere.
std::vector<Vertex> load(const std::string& filename) {
  // We will use two-phase construction to set up the ifstream in a more
  // convenient way: it will automatically throw an exception if anything
  // goes wrong, e.g. file not found or corrupted somewhere.
  std::ifstream file;
  file.exceptions(std::ios::badbit | std::ios::failbit);
  file.open(filename.c_str());
  // std::getline() sets failbit when it reaches the end of the file, so from
  // here on throw only for badbit and let the loop end normally.
  file.exceptions(std::ios::badbit);

  std::vector<Vertex> result;
  std::string line;
  while (std::getline(file, line)) {
    // The sample data holds floating-point positions and a hex colour string.
    float x, y, z;
    std::string colour;
    if (!(std::stringstream(line) >> x >> y >> z >> colour)) {
      // line has bad data. You might throw an exception, or just ignore the line
    } else {
      result.push_back(Vertex(x, y, z, colourToInt(colour)));
    }
  }
  return result;
}


But yeah, don't use strtok(). Its own documentation, on many systems, says never to use it. :)
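The colourToInt() call in the example above is never defined in the thread. Purely as a sketch of what such a helper could look like, assuming the colour text is a hexadecimal number such as 0xff00ff00 and a C++11 compiler is available, std::stoul can do the conversion. The return type and error handling here are guesses, not the original poster's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts text such as "0xff00ff00" to its numeric value.
// Throws std::invalid_argument if the text is not entirely a hex number.
unsigned long colourToInt(const std::string& text) {
    std::size_t used = 0;
    // With base 16, std::stoul accepts an optional "0x" prefix.
    unsigned long value = std::stoul(text, &used, 16);
    if (used != text.size()) {
        throw std::invalid_argument("trailing characters in colour: " + text);
    }
    return value;
}
```

Rejecting trailing characters keeps a typo such as 0xff00ffzz from being silently read as a shorter number.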

