
Split by spaces


I was recently parsing huge text files that had no real formatting. The only apparent structure was that the fields in each row are separated by spaces, and every row has a fixed number of fields. The number of spaces between fields is not the same throughout the file. To add a bit more challenge, somewhere in each row there is a date field in the form "Wednesday February 10, 2000". A line looked something like this:

"   John   10    20    Wednesday February 10, 2000            Dexter  19.9    "

I read the file one line at a time and started extracting the fields. I knew not to split on a single space, because that would break the date field apart. Then I had an idea: why not split on two spaces? That would keep the date together as one field and automatically get rid of all the extra spaces in between. The code worked like a charm. Sometimes a field came back with an extra space attached, but a simple call to trim removed it easily.

I wrote tests that tried various numbers of spaces, and the code could even detect invalid lines and drop them so as not to corrupt the data. This was done with a simple check on the number of fields parsed, which should be 6 for every line.
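The post doesn't say which language the parser was written in, so here is a minimal Python sketch of the approach as described: split on two spaces, throw away the empty pieces that runs of extra spaces leave behind, trim each piece, and reject any row that doesn't yield exactly 6 fields. The names `parse_line` and `EXPECTED_FIELDS` are hypothetical, and checking for empty pieces *before* trimming is an assumption about how the original code worked.

```python
EXPECTED_FIELDS = 6  # fixed field count per row, as described in the post


def parse_line(line):
    """Split a row on two-space separators, drop empty pieces, then trim.

    Returns the list of fields, or None if the row does not have exactly
    EXPECTED_FIELDS fields, so the caller can drop it instead of
    corrupting the data. (Hypothetical helper, not the author's code.)
    """
    # Runs of 3+ spaces leave empty strings between separators; discard them.
    # Assumption: the emptiness check happens BEFORE trimming.
    fields = [piece.strip() for piece in line.split("  ") if piece]
    return fields if len(fields) == EXPECTED_FIELDS else None


sample = "   John   10    20    Wednesday February 10, 2000            Dexter  19.9    "
print(parse_line(sample))
# ['John', '10', '20', 'Wednesday February 10, 2000', 'Dexter', '19.9']
```

With this sample, the three spaces after "John" leave " 10" with a leading space, which the trim cleans up, matching the "extra space tagged to a field" behavior.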

Great. So I started parsing the real text files and... got an empty result back; not a single line was parsed. Huh, what did I do wrong? I went back to the test data and messed with it in many different ways, including putting more spaces at the beginning and end of each line. The tests all passed, and the code parsed the fields correctly. So what went wrong?

An hour and a dozen printf statements later, I found out that every line from the real text files came back with 7 fields instead of the expected 6, and the last field was an empty string. "How did that get in there?" I copy-pasted some lines from the real files into my tests to make sure I wasn't going crazy, and sure enough, those lines failed the tests even though they looked identical. What made them different? I turned on "show whitespace" in the editor and saw no tab characters, so I started wondering whether some alternative Unicode space character was messing with me.

Then I started counting the trailing spaces... and the counts came back odd.
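Here is one plausible explanation, sketched in Python under the same assumption as before (empty pieces are discarded before trimming). Splitting on two spaces consumes trailing spaces in pairs. An even count leaves only empty strings, which get filtered out, but an odd count of three or more leaves a lone " " after the last separator. That piece isn't empty, so it survives the filter, and trimming then turns it into an empty seventh field. Leading spaces never trigger this, because a leftover single space there simply ends up attached to the first field and gets trimmed away. Stripping the whole line before splitting, or checking for emptiness after trimming, avoids the problem. The function names below are hypothetical.

```python
def split_fields_buggy(line):
    # Emptiness is checked before trimming, so a lone " " survives the filter
    # and is then trimmed into an empty field.
    return [piece.strip() for piece in line.split("  ") if piece]


def split_fields_fixed(line):
    # Strip the whole line first so no trailing remainder can form a field,
    # and test emptiness after trimming as a second safeguard.
    return [p for p in (piece.strip() for piece in line.strip().split("  ")) if p]


even = "John  10  20  Wednesday February 10, 2000  Dexter  19.9    "   # 4 trailing spaces
odd = "John  10  20  Wednesday February 10, 2000  Dexter  19.9     "   # 5 trailing spaces

print(repr("19.9     ".split("  ")))   # ['19.9', '', ' ']
print(len(split_fields_buggy(even)))    # 6
print(len(split_fields_buggy(odd)))     # 7 (the last field is '')
print(len(split_fields_fixed(odd)))     # 6
```

This also explains why the tests kept passing: adding spaces at the ends of the test lines only breaks the parse when the trailing count happens to be odd and at least three.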


Edited by alnite
