Sign in to follow this  
Domarius

Just want to iterate through a string :(

Recommended Posts

Domarius    192
I've been looking at tutorials and String documentation, and I just get weird errors. I won't bother posting them because I'm guessing this must be simple enough and I'm going about it all wrong. I read a line from a file (Yay I can do that) Now all I want to do is go through one character at a time, copying each character to another string... basically I'm parsing it - yes it's tokenising basically but I'm reading a CSV, so I need to ignore commas between quotes. The logic of my code is fine - I just don't know the proper syntax for iterating through a string and copying one character to another. I keep getting errors... What do I use? String stream? For or while loop? Built in C++ string functions or direct character access[?] Here's what I've got so far. Currently it crashes.
		std::ifstream fin(file);
		if(!fin)
		{
			return false;
		}
		//TODO: Need to put a check for some validation - just one row that has GameLauncherDatabase
		String line = "";
		getline(fin, line, '\n');
		if(line != "GameLauncherDatabase")
		{
			fin.close();
			return false;
		}

		bool withinQuotes = false;
		String word = "";
		while(!fin.eof())
		{
			withinQuotes = false;
			getline(fin, line, '\n');
			m_hge->System_Log("%s\n",line);
			m_hge->System_Log("line.size() = %i\n",line.length());
			for(size_t i = 0; i < line.size(); i++)
			{
				if(line.substr(i,1) == "\"")
				{
					withinQuotes = !withinQuotes;
					m_hge->System_Log("withinQuotes = %i\n",withinQuotes);
				}
				else if(line.substr(i,1) == "," && !withinQuotes)
				{
					m_hge->System_Log("%s\n", word);
					word = "";
					m_hge->System_Log("Comma\n");
				}
				else
				{
					word.append(line.substr(i,1));
					//word.append("a");
					m_hge->System_Log("word = %s\n", word);
				}
			}
			m_hge->System_Log("%s\n", word);
			//*wordIter = word;
		}
		fin.close();

Share this post


Link to post
Share on other sites
jpetrie    13161
Among other ways...

std::string a = /* ... */;
std::string b;
for(int i = 0; i < a.size(); ++i)
b += a[i];

But you should really post the code you're trying to compile and the errors you get, otherwise its going to be hard to help you.

Share this post


Link to post
Share on other sites
ToohrVyk    1596
My intuition would be:

std::string extractNext(std::string & from)
{
std::string::size_type end = from.find_first_of("\",");
while (end != std::string::npos && from[end] == '\"')
{
if (end = from.find_first_of("\"",end+1) == std::string::npos)
; // throw an exception, because of unterminated quote
end = from.find_first_of("\",",end+1);
}
std::string next = from.substr(0,end);
from = from.substr(end);
return next;
}

std::string text;
while (text.size() > 0) vect.push_back(extractNext(text));

Share this post


Link to post
Share on other sites
Domarius    192
Thanks guys - geez that was quick, I only just posted and was about to try and delete it to go and whittle down my problem properly by writing just a simple console app that does a mock up string parsing, without the other shit in there. I think I posted to early - 2am and getting frustrated I guess.

But those posts are probably enough info for me to go on. In any case I'm going to make a console app to test it out.

Thanks for the prompt replies. :)

Share this post


Link to post
Share on other sites
songho    268
I would use standard C++ iterator, something like;

std::string str = "This is a string.";
std::string::const_iterator cursor = str.begin();

// move to next char and print it out
cursor++;
std::string letter = *cursor;
std::cout << letter << std::endl; // it should be "h"

And, I made a string tokenizer using C++ string class and iterator, it behaves like strtok() in standard C library. Check it out;
String Tokenizer

Share this post


Link to post
Share on other sites
Domarius    192
Quote:
Original post by jpetrie
Among other ways...

std::string a = /* ... */;
std::string b;
for(int i = 0; i < a.size(); ++i)
b += a[i];



Ah yes about that - I tried that earler, and VS 2003 gives me a compiler warning about the i < a.size(), saying it's a conversion between signed and unsigned. I've been taught to get rid of warnings - what do I do there?

Share this post


Link to post
Share on other sites
ToohrVyk    1596
Quote:
Original post by Domarius
Ah yes about that - I tried that earler, and VS 2003 gives me a compiler warning about the i < a.size(), saying it's a conversion between signed and unsigned. I've been taught to get rid of warnings - what do I do there?


Use std::string::size_type instead of int.

Share this post


Link to post
Share on other sites
Zahlman    1682
Aside from what was said about actually finding a good way to do the iteration, I wanted to illustrate some general matters of technique.


std::ifstream fin(file);
if(!fin) {
return false;
}
// Unless you made a typedef, "string" should be lowercase. :)
std::string line = "";
// Take advantage of default parameters. They have the values they do for a
// reason, generally speaking.
getline(fin, line);
if (line != "GameLauncherDatabase") {
// No need to close the file stream explicitly. Only do this when
// a file *must* be released *before* the end of the variable's
// lifetime, or when the stream object will be "reused" (generally a
// bad idea anyway).
return false;
}

bool withinQuotes = false;
string word = "";
// Don't use .eof() for loop conditions. Use the file read for the condition.
// This way is idiomatic and avoids bugs - .eof() is only true after a read
// has already failed. In practice, this typically means you process the
// last line twice.
while (getline(fin, line)) {
withinQuotes = false;
// I took out your logging stuff. Please don't emulate C stdio for
// your logging functions. Emulate C++ streams instead: much cleaner.
// If you really need the power of stdio's formatting, use boost::format
// and gain typesafety (and the ability to work with user-defined
// classes) at the same time.
// In particular, "m_hge->System_Log("%s\n", line);" was probably itself
// a likely cause of crashes. std::string instances are not char*s.
for (size_t i = 0; i < line.size(); i++) {
// Cache things that you'll use more than once, so that you
// avoid repeating yourself and give a meaningful name to
// important entities.
std::string current = line.substr(i, 1);
if (current == "\"") {
withinQuotes = !withinQuotes;
} else if (current == "," && !withinQuotes) {
word = "";
} else {
// Don't use .append() with strings, normally. Use
// the operator overload instead; it reads better.
word += current;
}
}
}



(You might also want to consider more carefully exactly how the CSV spec works. It's surprisingly complex and I'm pretty sure you've missed a lot of corner cases.)

Share this post


Link to post
Share on other sites
Domarius    192
You guys have been really helpful considering the mess I posted - it was 2am and I was fed up, and I admit I could have done at least a little bit more towards reaching the answer.

But this is all awesome info and I am pretty sure I'll be able to get this done when I get back from work.

And to Zalman who went the extra step to help me with practice (the things I have not replied to you can be sure I've taken on board by default :) );

// Unless you made a typedef, "string" should be lowercase. :)

Yeah I did make a typedef :) ... our instructor (back when I used to study) did this so I have just been following suit, everything he does, till I know better.

// No need to close the file stream explicitly. Only do this when
// a file *must* be released *before* the end of the variable's
// lifetime, or when the stream object will be "reused" (generally a
// bad idea anyway).

Oh... I am just in the habit of cleaning up everything... if I delete all variables then I want to close all files too... is it bad practice?

// I took out your logging stuff. Please don't emulate C stdio for
// your logging functions. Emulate C++ streams instead: much cleaner.
// If you really need the power of stdio's formatting, use boost::format
// and gain typesafety (and the ability to work with user-defined
// classes) at the same time.
// In particular, "m_hge->System_Log("%s\n", line);" was probably itself
// a likely cause of crashes. std::string instances are not char*s.

Ah I am using the game engine HGE at hge.relishgames.com
I already posted there about how to get the logging working (it's a singleton)

// Cache things that you'll use more than once, so that you
// avoid repeating yourself and give a meaningful name to
// important entities.

Ah yes, thanks for the reminder, I know what you mean. Should be storing line.substr(i,1) as "currentChar" or something.

// Don't use .append() with strings, normally. Use
// the operator overload instead; it reads better.
word += current;

Heheh yeah I know - but you know what? I was tired and desperate and trying random things. Not good.


As for the CSV standard - yeah I realise it might break on some details, and if I really wanted to read a CSV file properly, I already downloaded an awesome pre-made class that someone made for free that apparently is very good.
But I have a very specific format of CSV for this app I'm making (my app cannot make use of any other format - must have string, string, int, int for each row), and while it would be more reliable to use that CSV thing I downloaded, I wouldn't be learning much, except how to use someone elses class. I wanted to take advantage of the situation and make something very simple as a matter of excersise. I even don't care if it crashes if it is forced to read in an innapropriate CSV - the user is never required to edit the CSV manually so they shouldn't be going there, and if anything goes wrong, the app is able to re-create the database automatically (it's based on a file structure that should be present).

Okay I'm using the CSV as sort of a database, and the entire thing is loaded into data structures in an array (an object for each item in the database, with public variables for each value, eg .name, .description, etc.), so the CSV is only read once at the start of the app, and the data is manipulated in memory after that. I'm sure my way will "work", but I'd be interested to hear your advice on the more "ideal" way to achieve such a thing.

Oh, and I'll remember your quote there about having char in my code ;)

Share this post


Link to post
Share on other sites
Zahlman    1682
Quote:
Original post by Domarius
Yeah I did make a typedef :) ... our instructor (back when I used to study) did this so I have just been following suit, everything he does, till I know better.


Consider questioning things openly. Acceptance is not learning. If something is justified, then your instructor should be able to state the justification.

The typedef here might be to allow for changing the underlying type later - I can indeed imagine that happening in at least one way (to std::wstring, for Unicode "support" - and I use the term loosely). It might be to avoid typing std:: everywhere, but there's a better way to do that (a using-declaration: 'using std::string;'). Or it might just be because your instructor thinks the uppercase name looks better. Whatever the case, it's worth finding out what the reasoning actually was.

Quote:

Oh... I am just in the habit of cleaning up everything... if I delete all variables then I want to close all files too... is it bad practice?


In C++, objects are supposed to be designed to clean up their own messes. When you dynamically allocate something, you need to 'delete' it (or 'delete[]' for an array allocation - the two are NOT interchangable), yes; but the way to do things is *not* to try to figure out everywhere the object needs to be cleaned up and insert the call, but instead arrange for it to happen *automatically*, using destructors. That's what they're in the language for.

In the case of iostreams, the destructor simply calls .close() (or has the same effect). The net result is that you don't have to worry about putting in the call at every point where the stream could go out of existence. Because of exceptions, there are actually a LOT more of these than you think. It's inconsistent to handle the "normal" ones explicitly and forget about exception situations; so it makes more sense to not handle anything explicitly at all - why do extra work that accomplishes nothing?

Done properly, we accomplish a very powerful idiom called RAII - Resource Acquisition Is Initialization - a misnomer, because the idiom is usually seen as much more about the cleanup than it is about the setup (which is "obvious"). Example. (As you can tell from my sig, I'm quite fond of this particular example ;) ) What's going on is that, because the string's destructor deallocates the memory, we don't have to and shouldn't think about that task. It's not the responsibility of our code.

Quote:

But I have a very specific format of CSV for this app I'm making (my app cannot make use of any other format - must have string, string, int, int for each row), and while it would be more reliable to use that CSV thing I downloaded, I wouldn't be learning much, except how to use someone elses class.


If it's your file format, don't feel compelled to emulate CSV in any way, shape or form. Use things that are easy to parse naturally in C++. Reading up until a specific character is easy (just specify that third argument for std::getline :) ), and reading ints that are separated by whitespace is easy. If possible, use a character to separate the strings that can't appear within them - you can choose what you like, after all.

If there is no such character, then you will need some sort of escaping system. You might want to consider how the C++ language itself does things in its syntax - that way is really robust (you can be sure that every possible string is representable), but it's hard to parse from within C++ (maddeningly enough). A simpler way is to treat a doubled-up separator as an escape code for the separator: after reading up to a separator, check if the next "item" is empty, and if it is, append the separator character and keep going. However, this way will not allow you to represent empty strings. But in general, try to avoid scanning the thing character by character; it's usually too much work. (Although it *is* pretty much required for the C++-syntax-style approach.)

Share this post


Link to post
Share on other sites
Domarius    192
Thanks Zahlman, this has been very informative.

Quote:
Original post by Zahlman
Consider questioning things openly. Acceptance is not learning. If something is justified, then your instructor should be able to state the justification.
Oh by that I did not mean I was blindly following what I was told - all I meant that I was just doing it for now - till I get in contact with him again or find out for myself, hence the "till I know better". :) I'll email him right now.

Quote:
In C++, objects are supposed to be designed to clean up their own messes. When you dynamically allocate something, you need to 'delete' it (or 'delete[]' for an array allocation - the two are NOT interchangable), yes; but the way to do things is *not* to try to figure out everywhere the object needs to be cleaned up and insert the call, but instead arrange for it to happen *automatically*, using destructors. That's what they're in the language for.

I understand now. I guess some "legacy" habits are coming into play there. I see now that we consider the open file, part of the object, and hence, will be taken care of, just like all other objects, when we call "delete" on it.

Quote:
If it's your file format, don't feel compelled to emulate CSV in any way, shape or form. Use things that are easy to parse naturally in C++. Reading up until a specific character is easy (just specify that third argument for std::getline :) ), and reading ints that are separated by whitespace is easy. If possible, use a character to separate the strings that can't appear within them - you can choose what you like, after all.

If there is no such character, then you will need some sort of escaping system. You might want to consider how the C++ language itself does things in its syntax - that way is really robust (you can be sure that every possible string is representable), but it's hard to parse from within C++ (maddeningly enough). A simpler way is to treat a doubled-up separator as an escape code for the separator: after reading up to a separator, check if the next "item" is empty, and if it is, append the separator character and keep going. However, this way will not allow you to represent empty strings. But in general, try to avoid scanning the thing character by character; it's usually too much work. (Although it *is* pretty much required for the C++-syntax-style approach.)


The reason I chose CSV is because it's a format I am familiar with, and I like to use supported formats if I can - I will have the option to play with the values using Excel, for debugging purposes. It's just the principal of choosing a widely used format, really - leaves options :) No it's not extensively thought out - I just didn't see it being much trouble to use a CSV file so I thought I may as well.

But that part of your post gave me something to think about. I can see how all these string functions have been created to save us scanning character by character ourselves, it keeps code simpler.
But this is such a small app with a very finite use, and the logic seems to be working - it seemed to be finding the right amount of words and using the quotes to ignore commas - I just couldn't store the characters in the string. But I will in a moment, I expect.

[Edited by - Domarius on May 5, 2007 2:05:50 AM]

Share this post


Link to post
Share on other sites
Domarius    192
Thanks everyone for their help - I went with the for loop method rather than the pointer arithmetic method, everything is fine now, and I found one of the crashes to be from sending the log function a string instead of a char array, ie. I was missing the bit in bold;

std::string text= "hello";
hge->System_Log("%s", text.c_str());

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this