File parsing with STL

7 comments, last by Amnesiac5 16 years, 11 months ago
I've been working on parsing obj files. My first attempt mostly used the STL; I subsequently discovered Nate Robbins's glm, which is not STL-based and is at least an order of magnitude faster. I'm still pretty green, so is it more likely that I've written the code poorly (I've tried to be structured in my approach), or (as I suspect) is the STL a poor choice for speed for this kind of thing (file handling and parsing)?

More importantly, this is not a task I wanted to undertake in the first place. Granted, it's proven to be a learning experience and an introduction to the STL and file handling, but on balance I think I would have preferred to use my time more constructively. I can't believe that it is so frequently necessary to re-invent the wheel, yet again, for the umpteenth time. So, why aren't there readily available libraries to handle loading meshes and textures? Or am I being dense and missing something obvious?

Amnesiac5
Constipation is the thief of time. Diarrhoea waits for no man.
Perhaps you were trying to process the file while reading it, which is slow because the program keeps having to wait for the disk to retrieve the next little piece of data. It's usually much faster to binary-read the complete file at once and process the data in memory. If the file is too large, you could read it in blocks of sufficiently large size. Other than that, I don't suspect the STL would make such a drastic difference.
Most people experience "slow" SC++L performance because they're either doing something slow that is unrelated to the SC++L and not realizing it (as the above poster suggested), or they're using the SC++L itself incorrectly (often by not understanding the performance guarantees of the various containers and thus making poor algorithmic choices).
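To make the "read it all, then parse in memory" suggestion concrete, here is a minimal sketch. The function names read_whole_file and count_lines are made up for illustration, and counting lines is only a stand-in for whatever parsing the loader actually does.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Read an entire file into a string with one bulk transfer.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();
    std::ostringstream buffer;
    buffer << in.rdbuf();   // copies the whole stream buffer in one go
    return buffer.str();
}

// Stand-in for real parsing: count the lines of text already in memory.
std::size_t count_lines(const std::string& text)
{
    std::size_t lines = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == '\n')
            ++lines;
    // A final line without a trailing newline still counts.
    if (!text.empty() && text[text.size() - 1] != '\n')
        ++lines;
    return lines;
}
```

Once the text is in a std::string, every later pass works on memory rather than on the disk. Whether that explains the slowdown in the original code can only be confirmed by profiling.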

Perhaps if you showed us some code, we could provide a more accurate analysis.
Quote:Original post by Amnesiac5
Is STL a poor choice for speed for this kind of thing (file handling and parsing)?


I would think the STL is a poor choice for file handling and parsing, since it offers no facilities to support those things.

Support for file handling and parsing is available in the C++ standard library -- and has been for well over a decade now, considering the draft standard was published and remained effectively unchanged for years before the standard was approved in 1997. File handling using the C++ standard library is as fast as, and often faster than, that using the C standard library.

As to the availability of libraries for parsing and loading files: there's a plethora. Most are written for C programmers, but that's okay as long as they work. Complaining about having to reinvent them because of a dearth is nonsense. (Ooh, two sesquipedalian words in one paragraph, I've excelled myself.) Choose one, use it, and move on.

Stephen M. Webb
Professional Free Software Developer

Quote:I can't believe that it is so frequently necessary to re-invent the wheel, yet again, for the umpteenth time. So, why aren't there readily available libraries to handle loading meshes and textures? Or am I being dense and missing something obvious?


NIH (not invented here) syndrome, which leads to little code being available publicly and without restrictions, suitable for your particular platform and needs.

The side effect is that such code is usually not portable, of poor quality (either directly or indirectly), or simply has some weird restrictions (GPL).

Large companies probably re-use some of such low-level code. But everyone else is doing it "to learn" or "needs faster implementation" or "doesn't like X" and rolls their own.

Note about code quality: Directly poor code is code that is buggy, doesn't respect language conventions (mixing C and C++), or has other blatant problems. Indirectly poor code is "perfect", passes all lint tests, is extremely OO, with UML, unit tests, etc., but completely fails to solve the problem. For example, a mesh loader that uses structures completely unrelated to DirectX and hides all the details from the user, making such conversions tedious without modifying the code.

Developing good, re-usable code is hard. Especially in C/C++, with the myriad APIs that exist and the compatibility issues that need to be overcome.

Other than that: (cookie cutter response) STL isn't slow. The way you used it was (perhaps STL was the wrong choice altogether).

And: Writing file loaders for many formats is such a trivial task, that nobody bothers to make a re-usable library.
When you say slow, do you mean slow in debug or release?

For some of the newer implementations of the STL, the debugging iterators and container checking can add massive overheads in debug builds. This is particularly noticeable with the STL that ships with VC2005.

Apart from that, if you can, profile your app - what exactly is slow?

My typical performance checklist runs something like:

* Am I copying things too often? Am I passing by const ref everywhere I should?
* Am I using iterators everywhere I should (or am I using checked calls like .at())?
* Am I using containers with the correct performance guarantees? (For example, am I removing things from the front of vectors?)
* Am I using containers with known performance issues? (std::deque performs better than std::list in lots of cases.)
* Am I frequently sorting or searching containers? (Should I consider using a std::(multi)set / std::(multi)map?)
* If I'm using associative containers, do the TR1 hash containers help?
* If I have to roll my own special type of container based on the standard library (trees or graphs), have I chosen an efficient representation for the dataset I am expecting (adjacency lists or matrices rather than just edge lists)?
* Am I storing values by (smart) pointer where appropriate in my containers to minimise copy overheads?
* Am I generating too many temporaries? (Can I use expression templates to help?)
* Do I have realistic expectations of the platform-agnostic streams library? Can I get better performance out of platform-specific code?
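As an illustration of the "correct performance guarantees" item in the checklist above, here is a small sketch that consumes a queue from the front with two different containers. The function names drain_vector and drain_deque are hypothetical. Both take their argument by value on purpose, because they consume it.

```cpp
#include <deque>
#include <vector>

// Consume the items from the front using std::vector.
// Each erase(begin()) shifts every remaining element down by one,
// so a single removal is O(n) and draining the container is O(n^2).
long drain_vector(std::vector<int> items)
{
    long sum = 0;
    while (!items.empty()) {
        sum += items.front();
        items.erase(items.begin());
    }
    return sum;
}

// The same job with std::deque, whose pop_front() is constant time,
// so draining the container is O(n) overall.
long drain_deque(std::deque<int> items)
{
    long sum = 0;
    while (!items.empty()) {
        sum += items.front();
        items.pop_front();
    }
    return sum;
}
```

Both functions return the same sum; only the cost differs, and the gap grows with the number of elements. Mismatches like this are easy to blame on "the STL" when the real cause is the container choice.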
Sorry it's taken me a while to reply. I have been reading everyone's feedback, but it took some time to dig out the original projects I was working on and check a few things.

Thanks to all who responded; you've provided me with some valuable insights and food for thought.

Some specific points...
Quote:Original post by Bregma
I would think the STL is a poor choice for file handling and parsing, since it offers no facilities to support those things.
I thought that ifstream (for file handling) and string (for parsing) were part of the STL, and reading elsewhere, I'm not the only one. I stand corrected (live and learn).

I am using vector (for storing the results) and algorithm (for parsing).
Quote:Original post by Bregma
As to the availability of libraries for parsing and loading files: there's a plethora. Most are written for C programmers, but that's okay as long as they work. Complaining about having to reinvent them because of a dearth is nonsense.
Why?
Quote:Original post by Bregma
Choose one, use it, and move on.
I've looked but found precious few (for different file formats) hence my own attempts. I briefly looked at flex, but that seemed to be overkill. Beyond that, as I thought I'd intimated, I've been unable to find any obj loaders apart from Nate Robbins's GLM (and Rob the Bloke's obj loader). It's just one more thing amongst a myriad of drains on my time (and not just whilst programming, I have a real life too, you know). Perhaps you could point me in the right direction.

On reflection, I think I was being overly ambitious. After reading the obj file spec, I became aware of a few shortcomings in GLM and then became fixated on the idea that any obj loader should support all features, have rigorous error trapping (is exception handling STL, C++ standard library or something in its own right?), etc. I think this was motivated mainly by the wish to use models found on the web. Future efforts will be directed at a much simpler, greatly reduced set of features, which I'll extend if needed.
Quote:Original post by Antheus
(cookie cutter response) STL isn't slow. The way you used it was (perhaps STL was wrong choice altogether).
Which is my suspicion, as I hope I intimated.
Quote:Original post by Antheus
Writing file loaders for many formats is such a trivial task, that nobody bothers to make a re-usable library.
Gee, I wish I was as clever as you [wink]. I'm afraid I find this anything but trivial... but that is probably because my ambition outstripped my nascent ability.

Again, thanks to all.

A5
Constipation is the thief of time. Diarrhoea waits for no man.
Quote:
I thought that ifstream (for file handling) and string (for parsing) were part of the STL

ifstream and its ilk are part of the standard C++ library (SC++L), not the STL. The STL is an older library, parts of which were subsumed into the SC++L -- it's a minor terminology issue (they are commonly used interchangeably, but technically they're not).

To get to your original question of
Quote:
why aren't there readily available libraries to handle loading meshes and textures? Or am I being dense and missing something obvious?

the answer is twofold.

First, as another poster said, it's pretty trivial to actually load "stuff" from files that are formatted a certain way, assuming you know the format of the file. Does the format stipulate the next data element will be a byte? Then read a byte and store that in a variable named appropriately (according to what the spec says that byte is used for), and so on. That's just brute-force stuff that's pretty easy to do.
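A rough sketch of that brute-force, field-by-field reading follows. The layout is invented purely for illustration (a one-byte version followed by a four-byte little-endian vertex count) and does not correspond to any real mesh format. MeshHeader and read_header are hypothetical names.

```cpp
#include <istream>
#include <sstream>  // only needed to feed the reader from memory, as in the note below
#include <string>

// Hypothetical header: what an imaginary spec says the first five bytes mean.
struct MeshHeader {
    unsigned char version;      // byte 0
    unsigned long vertex_count; // bytes 1-4, little-endian
};

// Read the header from any input stream.
// Returns false if fewer than five bytes are available.
bool read_header(std::istream& in, MeshHeader& header)
{
    char bytes[5];
    if (!in.read(bytes, 5))
        return false;

    header.version = static_cast<unsigned char>(bytes[0]);

    // Assemble the count byte by byte so the result does not depend
    // on the byte order of the machine running the code.
    header.vertex_count = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned long b = static_cast<unsigned char>(bytes[1 + i]);
        header.vertex_count |= b << (8 * i);
    }
    return true;
}
```

Because the function takes a std::istream&, it works equally with a std::ifstream opened in binary mode or with a std::istringstream wrapped around data already in memory.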

But the second problem is that loading the data from the file isn't the important part. Putting that data into your own internal format is. Very often, file formats are designed for storage, not for actual use. For example, the .obj format is not arranged in a way that makes it directly suitable for rendering without converting the data to some other format. Actual, useful loading subsystems will load data from a particular format into whatever format and structures the rest of your code uses to render things efficiently (for mesh formats, at least).

See the problem here? One cannot make a reusable, generic library that loads mesh data from a standard format to your internal format -- that would require knowledge of your specific format, and would thus remove generality from the library. You could write a library that loaded on-disk representations of .obj files into in-memory representations of .obj files, sure, but that wouldn't be useful: as I touched on, file formats are designed for storage, not rendering, so a pure in-memory .obj (or .md2 or .md3, whatever) is likely inefficient for practical, real-world use(*). And an in-memory representation of the .obj file requires you to do just as much parsing on your end to convert to your data structures, so there is really nothing gained by doing it.

(*) This doesn't stop people from trying to write, for example "MD2Model" classes that hold the data in a format similar to how the MD2 model was stored on disk and render from there. It's functional, but ugly and not particularly efficient in general.
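To show what "load into your own internal format" can look like, here is a minimal sketch that reads a small subset of .obj text straight into flat arrays of the kind a renderer might consume. The Mesh struct and load_obj_subset are hypothetical. It is deliberately limited: it handles only "v" lines and triangular "f" lines, and silently skips everything else, including negative (relative) indices and faces with more than three corners.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical internal format: flat arrays ready to hand to a renderer.
struct Mesh {
    std::vector<float> positions;      // x, y, z for each vertex
    std::vector<unsigned int> indices; // three zero-based indices per triangle
};

// Convert .obj text (already in memory) into the internal format.
Mesh load_obj_subset(const std::string& text)
{
    Mesh mesh;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string tag;
        if (!(fields >> tag))
            continue; // blank line

        if (tag == "v") {
            float x, y, z;
            if (fields >> x >> y >> z) {
                mesh.positions.push_back(x);
                mesh.positions.push_back(y);
                mesh.positions.push_back(z);
            }
        } else if (tag == "f") {
            std::vector<unsigned int> face;
            std::string corner;
            while (fields >> corner) {
                // A corner looks like "7", "7/1" or "7/1/3"; atol stops at
                // the first '/', leaving the one-based position index.
                long index = std::atol(corner.c_str());
                if (index > 0)
                    face.push_back(static_cast<unsigned int>(index - 1));
            }
            if (face.size() == 3)
                mesh.indices.insert(mesh.indices.end(), face.begin(), face.end());
        }
        // Comments, normals, texture coordinates, groups etc. are ignored here.
    }
    return mesh;
}
```

The parsing itself is short. The design decision is the Mesh struct, which belongs to the application, not the file format, and that is why a generic library cannot make the choice for you.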

Quote:Original post by jpetrie
(*) This doesn't stop people from trying to write, for example "MD2Model" classes that hold the data in a format similar to how the MD2 model was stored on disk and render from there. It's functional, but ugly and not particularly efficient in general.
Fair dos.
Makes sense.
Point taken. [oh]
Sums it all up really.[dead]

A5
Constipation is the thief of time. Diarrhoea waits for no man.

