string questions: how to count the number of instances of a string in another?

Started by
17 comments, last by JohnBolton 18 years, 1 month ago
if i have string s1("hello from the otherside, hello, hello there!"); string s2("hello"); how can i find out the number of times s2 occurs in s1. I know if i use an iterator pos and do the pos = search(s1.begin(), s1.end(), s2.begin(), s2.end()) pos will return me the starting index of hello. but how can i continue to loop to find the other occurances? i guess i will have to think about this by putting the updated start index in the s1.begin field.
Advertisement
Quote:Original post by baker
if i have

string s1("hello from the otherside, hello, hello there!");

string s2("hello");

how can i find out the number of times s2 occurs in s1.
I know if i use an iterator pos and do the

pos = search(s1.begin(), s1.end(), s2.begin(), s2.end())

pos will return me the starting index of hello. but how can i continue to loop to find the other occurances? i guess i will have to think about this by putting the updated start index in the s1.begin field.
I don't think std::string has any functions that will do this for you per se, but you're probably on the right track with starting the next find where the previous left off, and iterating until std::string::npos is returned. If you want the tokens themselves you could turn the string into a stringstream and extract them; then it would of course be simple to count how many of any particular token there was. If you're interested in other libraries as well, the boost string algo library offers find_all() and find_iterator, which would probably do what you want.
A good way to do this would be to start by searching for the final character in s2 ('o'), beginning the search at s1[s2.length() - 1], and then continuing the search at s1[(s2.length() - 1) + 1] until s1[s1.length() - 1] is encountered.

If at any time a match to 'o' is found, then the previous chars in s1 and s2 are compared, repeatedly until a whole match is found. If a mismatch occurs anywhere during this process, the search continues back where it left off.

s1: "a big hello world"s2: "hello"comparing...s1   s2   match? cur_index--------------------------'g'  'o'  false  s1[cur_index = s2.length() - 1]' '  'o'  false  s1[cur_index++]'h'  'o'  false  "'e'  'o'  false  "'l'  'o'  false  "'l'  'o'  false  "'o'  'o'  true .... searching backwards... s1[cur_index--]'l'  'l'  true   s1[cur_index--]'l'  'l'  true   "'e'  'e'  true   "'h'  'h'  true .... match found at: cur_index


By stepping through this sample output, it also becomes apparent that it may be worth searching for both 'h' and 'o' at every cur_index. The last 4 char comparisons in the above sample wouldn't have been necessary if a forward-looking comparison using 'h' had been been done between s1[2] and s2[2]. My bad.
Just keep it simple.

int countstring(const std::string &st1, const std::string &st2){	int iCount = 0;	size_t iCurrentPos = 0;	while((iCurrentPos = st1.find(st2.c_str(), iCurrentPos)) != std::string::npos)	{		++iCurrentPos; // should probably ++ this by the st2.length, but this works.		++iCount;	}	return iCount;}int main(const char **_args, int _argn){	std::string s1("hello from the otherside, hello, hello there!");	std::string s2("hello");	std::cout << "Found: " << countstring(s1, s2) << std::endl;}
What about something simple like this:

int Foo(std::string const & pBaseString, std::string const & pTargetString){	int iResult = int();	if(pBaseString.empty() || pTargetString.empty())		return iResult;	std::string::size_type stIndex = pBaseString.find(pTargetString); 	while(stIndex != std::string::npos) 	{ 		iResult ++; 		stIndex = pBaseString.find(pTargetString, stIndex + pTargetString.size()); 	}		return iResult;}


If you wanted to find a specific word within, say, a sentence, I guess you could also check for spaces, but incase you're looking for any occurace in a string, this will work...

For example,
if pBaseString is ("This is a simple test containing some keyword") and pTargetString is ("is"), then you'd get 2 occurances, as "is" is contained in "This" also...

Edit: Looks like I'm a little late...
hot damn you guys are good.
hey drevil question: i am trying to understand the function in my head. is this sort of right?

the while condition is going to go from 0 position of string 1 to the last position of string 1 which has the special value of npos. each time the while statement loops, it will return the first space number + 1 if the string is found. so if st1 = "hello hello there from mars"

iCurrentpos = 0 because 'hello' is found. next loop, i currentpos will start at 'e' and start search from there? 'e' being the next character after the first character of the first found string of hello? is this correct?




while((iCurrentPos = st1.find(st2.c_str(), iCurrentPos)) != std::string::npos)
{
++iCurrentPos; // should probably ++ this by the st2.length, but this works.
++iCount;
}

while((iCurrentPos = st1.find(st2.c_str(), iCurrentPos)) != std::string::npos)
{
++iCurrentPos; // should probably ++ this by the st2.length, but this works.
++iCount;
}

iCurrentPos is set to the position where the first instance of str2 is found.
str2 is found
++icount, ++icurrentpos to move the position from where you are searching
(if no instances are found then string::npos is called (no position? lol ))

C++.toEnglish(); //lol
"hello hello from mars"
icurrentpos = 0,
find("hello", 0)..
hello found from start position 0, at position 0
icount + 1
current position + 1
loop again..
find("hello", 1)..
new search string = "ello hello from mars"

understand =]
ow but this is easy.
what if you have to search for the word "hellhellhello" in the sentence "is this hellhellhellhellhello or is it"

then you can't do it in the normal way because:
______________________'i'  |  'h'  |  false's'  |  'h'  |  false' '  |  'h'  |  false't'  |  'h'  |  false'h'  |  'h'  |  false'i'  |  'h'  |  false's'  |  'h'  |  false' '  |  'h'  |  false'h'  |  'h'  |  true'e'  |  'e'  |  true'l'  |  'l'  |  true'l'  |  'l'  |  true'h'  |  'h'  |  true'e'  |  'e'  |  true'l'  |  'l'  |  true  |  you should have started to match from here'l'  |  'l'  |  true  |  ______________________'h'  |  'h'  |  true  |   'h'  |  'h'  |  true'e'  |  'e'  |  true  |   'e'  |  'e'  |  true'l'  |  'l'  |  true  |   'l'  |  'l'  |  true'l'  |  'l'  |  true  |   'l'  |  'l'  |  true'h'  |  'o'  |  false |   'h'  |  'h'  |  true'e'  |  'h'  |  false |   'e'  |  'e'  |  true'l'  |  'h'  |  false |   'l'  |  'l'  |  true'l'  |  'h'  |  false |   'l'  |  'l'  |  true'h'  |  'h'  |  true  |   'h'  |  'h'  |  true'e'  |  'e'  |  true  |   'e'  |  'e'  |  true'l'  |  'l'  |  true  |   'l'  |  'l'  |  true'l'  |  'l'  |  true  |   'l'  |  'l'  |  true'o'  |  'h'  |  false |   'o'  |  'o'  |  true' '  |  'h'  |  false |'o'  |  'h'  |  false |   match found !!!'r'  |  'h'  |  false' '  |  'h'  |  false'i'  |  'h'  |  false's'  |  'h'  |  false' '  |  'h'  |  false'i'  |  'h'  |  false't'  |  'h'  |  falsenot found!!!



[Edited by - delta user on March 6, 2006 1:53:20 PM]
Someone who uses a, euhm..., delta!?
Read here:
http://en.wikipedia.org/wiki/String_searching_algorithm

and here:
http://www.dogma.net/markn/articles/suffixt/suffixt.htm

This topic is closed to new replies.

Advertisement