is this an efficient function?

Started by
4 comments, last by Stashi 22 years, 4 months ago
I'm working on "tokenizing" a sentence (I guess that's the term), but I'm using my own function:
  
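#include <string>
#include <vector>
using namespace std;

vector <string>	strArray;	// must already be sized to hold every token
string	strInput,
	strTmp;
int	vector_size=0;		// set elsewhere; presumably the length of strInput

void tokenize()
{
	int mov = 0, sx = 0;	// mov: scan position, sx: start of the current token
	int endx = 0, arx = 0;	// arx: next slot in strArray (endx is unused)
	while(mov < vector_size) {
		mov++;
		// a space, a newline or the terminating '\0' ends the current token
		if (strInput[mov] == ' ' || strInput[mov] == '\n' || strInput[mov] == '\0') {
			strTmp.erase();
			// copy the token one character at a time
			for (int x=sx; x<mov; x++) 
			{
				strTmp = strTmp + strInput[x];
			}
			strArray[arx] = strTmp;
			arx++;
			sx=mov+1;	// the next token starts just past the separator
		}		
	}
}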
  
This code works great right now: if I enter, for instance, "my name is bob", it stores "my", "name", "is", "bob" in 4 different elements of the array. But is this function really what tokenizing is? Is there a more efficient way of doing it?
Actually, you're doing way too much work that's already been done for you. Use the STL containers, and use their functions as well. The code I gave you in your other thread works fine; why not use it?

You don't need the call to erase(), and there's a substr() function defined for string. There are also find_first_of(), find_last_of(), find_first_not_of() and find_last_not_of(), which find the first or last occurrence of a character that is in, or not in, a given set.
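As a minimal sketch of how those member functions fit together (the helper name first_token is made up for illustration and is not from the earlier thread), this pulls the first token out of a string without any character-by-character copying:

```cpp
#include <string>

// Hypothetical helper, not from the thread: returns the first token of
// `input`, where a token is a run of characters that are not in `seps`.
std::string first_token(const std::string& input, const std::string& seps)
{
    // Skip any leading separators.
    std::string::size_type start = input.find_first_not_of(seps);
    if (start == std::string::npos)
        return "";  // empty string, or nothing but separators

    // Find the separator that ends the token; npos means it runs to the end.
    std::string::size_type end = input.find_first_of(seps, start);

    // substr() clamps the count to the end of the string, so npos is safe.
    return input.substr(start,
                        end == std::string::npos ? end : end - start);
}
```

With these assumptions, first_token("  my name", " ") yields "my", and a string made only of separators yields an empty string.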

Sorry, but your tokenization code for the STL (C++) method doesn't work at all for me.

In fact, I don't see how a few of the things in there would work (maybe I'm just dumb, heh), but for instance...

while(true) {

What does that mean? While _what_ is true??
While the expression 'true' is true. So basically, it will run forever, until it hits the break statement.
OK, I made some changes to your code and now it works perfectly.

Thanks a lot!

Oh, one last thing: is there a std::string function that will give me a count of how many times a certain character appears in a string?

This is my function as of now:

void tokenize()
{
  string seps;
  seps = " ,+";
  int nTokens = 0;
  int pos0 = 0, pos1 = 0;
  while(pos1 < vlen)
  {
    pos1 = strInput.find_first_of(seps.c_str(), pos0);
    strArray[nTokens] = strInput.substr(pos0, (pos1 - pos0));
    // get ready for next extraction
    pos0 = pos1 + 1;
    if(pos1 == string::npos)
      break;
    nTokens++;
  }
}


As you can see, the tokens are plugged into an array, but the function kind of assumes that the array already knows how many words are in the string. Should I resize the array on each pass of the loop? Or should I write another function elsewhere to determine how many words there are, then resize the array before calling the tokenize function?


Edited by - Stashi on November 27, 2001 12:31:26 AM
strArray.push_back(strInput.substr(pos0, (pos1 - pos0))); 

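Adds the new string to the end of the array and resizes the vector if necessary. As an aside, whenever std::vector<> reaches capacity and needs to grow, it grows by a constant factor (doubling in many implementations; the exact factor is up to the library), so repeated push_back() calls stay cheap on average.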
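To see why calling push_back() on every pass is not a problem, here is a small sketch (count_reallocations is a made-up helper, not part of anyone's code in this thread) that counts how often the vector actually has to grow its storage while tokens are appended one at a time:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends n strings with push_back() and counts how
// many times the vector's capacity changed, i.e. how often it reallocated.
std::size_t count_reallocations(std::size_t n)
{
    std::vector<std::string> tokens;  // starts empty, no size chosen up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = tokens.capacity();

    for (std::size_t i = 0; i < n; ++i) {
        tokens.push_back("token");
        if (tokens.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = tokens.capacity();
        }
    }
    return reallocations;
}
```

The exact count depends on the library's growth factor, but it should be far smaller than n. If the number of tokens is known in advance, calling reserve() once avoids even those few reallocations.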

Doing while(true) is perfectly legal and removes the redundant check pos1 < vlen. The point is that you need to check whether you've hit the end of the string as soon as you get the next occurrence of a separator, so why check twice?
// This code increments i until it is equal to 10:
int i = 0;
while(true)
{
  if(i == 10)
    break;
  ++i;
}


Also, you might want to make your tokenize function take the inputs as parameters so you can use it on different strings. And get rid of vlen, because it ties your function to a specific vector.
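One way that suggestion could look is sketched below. The signature and the default separator set are assumptions for illustration, not the code from the other thread; unlike the version posted above, this one also skips runs of consecutive separators instead of storing empty tokens:

```cpp
#include <string>
#include <vector>

// A sketch only: the input and the separator set are parameters, and the
// result is returned, so nothing depends on globals such as strInput or vlen.
std::vector<std::string> tokenize(const std::string& input,
                                  const std::string& seps = " ,+")
{
    std::vector<std::string> tokens;

    // pos0 marks the start of the next token (the first non-separator).
    std::string::size_type pos0 = input.find_first_not_of(seps);
    while (pos0 != std::string::npos) {
        // pos1 marks the separator that ends the token, or npos at the end.
        std::string::size_type pos1 = input.find_first_of(seps, pos0);
        tokens.push_back(input.substr(
            pos0, pos1 == std::string::npos ? pos1 : pos1 - pos0));

        // Searching from npos simply returns npos, which ends the loop.
        pos0 = input.find_first_not_of(seps, pos1);
    }
    return tokens;
}
```

Called as tokenize("my name is bob"), this returns the four words as separate elements, and the same function can be reused with a different string or a different separator set.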

If you want, I can send you some STL-based code that shows you how I used my tokenize function to write a concordance. The cool thing was the way I wrote the function allowed you to do this:
string tok;
vector<string> tokens;
int nTokens = 0;  // must start at zero before it is incremented
while((tok = tokenize(str)) != "")
{
  tokens.push_back(tok);
  ++nTokens;
}

Let me know.
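On the earlier question about counting how many times a character appears in a string: std::string has no member function for that, but std::count from <algorithm> works on any iterator range, including a string's. A minimal sketch (count_char is a made-up wrapper name):

```cpp
#include <algorithm>
#include <string>

// Hypothetical wrapper: counts how many times c occurs in s.
std::string::difference_type count_char(const std::string& s, char c)
{
    return std::count(s.begin(), s.end(), c);
}
```

Note that counting spaces only gives the number of words minus one when every pair of words is separated by exactly one space, so it is not a reliable way to size an array of tokens in general.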
