Sign in to follow this  

string questions: how to count the number of instances of a string in another?

This topic is 4299 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

if i have string s1("hello from the otherside, hello, hello there!"); string s2("hello"); how can i find out the number of times s2 occurs in s1. I know if i use an iterator pos and do the pos = search(s1.begin(), s1.end(), s2.begin(), s2.end()) pos will return me the starting index of hello. but how can i continue to loop to find the other occurances? i guess i will have to think about this by putting the updated start index in the s1.begin field.

Share this post


Link to post
Share on other sites
Quote:
Original post by baker
if i have

string s1("hello from the otherside, hello, hello there!");

string s2("hello");

how can i find out the number of times s2 occurs in s1.
I know if i use an iterator pos and do the

pos = search(s1.begin(), s1.end(), s2.begin(), s2.end())

pos will return me the starting index of hello. but how can i continue to loop to find the other occurances? i guess i will have to think about this by putting the updated start index in the s1.begin field.
I don't think std::string has any functions that will do this for you per se, but you're probably on the right track with starting the next find where the previous left off, and iterating until std::string::npos is returned. If you want the tokens themselves you could turn the string into a stringstream and extract them; then it would of course be simple to count how many of any particular token there was. If you're interested in other libraries as well, the boost string algo library offers find_all() and find_iterator, which would probably do what you want.

Share this post


Link to post
Share on other sites
A good way to do this would be to start by searching for the final character in s2 ('o'), beginning the search at s1[s2.length() - 1], and then continuing the search at s1[(s2.length() - 1) + 1] until s1[s1.length() - 1] is encountered.

If at any time a match to 'o' is found, then the previous chars in s1 and s2 are compared, repeatedly until a whole match is found. If a mismatch occurs anywhere during this process, the search continues back where it left off.


s1: "a big hello world"
s2: "hello"

comparing...
s1 s2 match? cur_index
--------------------------
'g' 'o' false s1[cur_index = s2.length() - 1]
' ' 'o' false s1[cur_index++]
'h' 'o' false "
'e' 'o' false "
'l' 'o' false "
'l' 'o' false "
'o' 'o' true .... searching backwards... s1[cur_index--]

'l' 'l' true s1[cur_index--]
'l' 'l' true "
'e' 'e' true "
'h' 'h' true .... match found at: cur_index







By stepping through this sample output, it also becomes apparent that it may be worth searching for both 'h' and 'o' at every cur_index. The last 4 char comparisons in the above sample wouldn't have been necessary if a forward-looking comparison using 'h' had been been done between s1[2] and s2[2]. My bad.

Share this post


Link to post
Share on other sites
Just keep it simple.


int countstring(const std::string &st1, const std::string &st2)
{
int iCount = 0;
size_t iCurrentPos = 0;
while((iCurrentPos = st1.find(st2.c_str(), iCurrentPos)) != std::string::npos)
{
++iCurrentPos; // should probably ++ this by the st2.length, but this works.
++iCount;
}
return iCount;
}

int main(const char **_args, int _argn)
{
std::string s1("hello from the otherside, hello, hello there!");
std::string s2("hello");
std::cout << "Found: " << countstring(s1, s2) << std::endl;
}


Share this post


Link to post
Share on other sites
What about something simple like this:


int Foo(std::string const & pBaseString, std::string const & pTargetString)
{
int iResult = int();

if(pBaseString.empty() || pTargetString.empty())
return iResult;

std::string::size_type stIndex = pBaseString.find(pTargetString);

while(stIndex != std::string::npos)
{
iResult ++;

stIndex = pBaseString.find(pTargetString, stIndex + pTargetString.size());
}

return iResult;
}






If you wanted to find a specific word within, say, a sentence, I guess you could also check for spaces, but incase you're looking for any occurace in a string, this will work...

For example,
if pBaseString is ("This is a simple test containing some keyword") and pTargetString is ("is"), then you'd get 2 occurances, as "is" is contained in "This" also...

Edit: Looks like I'm a little late...

Share this post


Link to post
Share on other sites
hey drevil question: i am trying to understand the function in my head. is this sort of right?

the while condition is going to go from 0 position of string 1 to the last position of string 1 which has the special value of npos. each time the while statement loops, it will return the first space number + 1 if the string is found. so if st1 = "hello hello there from mars"

iCurrentpos = 0 because 'hello' is found. next loop, i currentpos will start at 'e' and start search from there? 'e' being the next character after the first character of the first found string of hello? is this correct?




while((iCurrentPos = st1.find(st2.c_str(), iCurrentPos)) != std::string::npos)
{
++iCurrentPos; // should probably ++ this by the st2.length, but this works.
++iCount;
}

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
while((iCurrentPos = st1.find(st2.c_str(), iCurrentPos)) != std::string::npos)
{
++iCurrentPos; // should probably ++ this by the st2.length, but this works.
++iCount;
}

iCurrentPos is set to the position where the first instance of str2 is found.
str2 is found
++icount, ++icurrentpos to move the position from where you are searching
(if no instances are found then string::npos is called (no position? lol ))

C++.toEnglish(); //lol
"hello hello from mars"
icurrentpos = 0,
find("hello", 0)..
hello found from start position 0, at position 0
icount + 1
current position + 1
loop again..
find("hello", 1)..
new search string = "ello hello from mars"

understand =]

Share this post


Link to post
Share on other sites
ow but this is easy.
what if you have to search for the word "hellhellhello" in the sentence "is this hellhellhellhellhello or is it"

then you can't do it in the normal way because:

______________________
'i' | 'h' | false
's' | 'h' | false
' ' | 'h' | false
't' | 'h' | false
'h' | 'h' | false
'i' | 'h' | false
's' | 'h' | false
' ' | 'h' | false
'h' | 'h' | true
'e' | 'e' | true
'l' | 'l' | true
'l' | 'l' | true
'h' | 'h' | true
'e' | 'e' | true
'l' | 'l' | true | you should have started to match from here
'l' | 'l' | true | ______________________
'h' | 'h' | true | 'h' | 'h' | true
'e' | 'e' | true | 'e' | 'e' | true
'l' | 'l' | true | 'l' | 'l' | true
'l' | 'l' | true | 'l' | 'l' | true
'h' | 'o' | false | 'h' | 'h' | true
'e' | 'h' | false | 'e' | 'e' | true
'l' | 'h' | false | 'l' | 'l' | true
'l' | 'h' | false | 'l' | 'l' | true
'h' | 'h' | true | 'h' | 'h' | true
'e' | 'e' | true | 'e' | 'e' | true
'l' | 'l' | true | 'l' | 'l' | true
'l' | 'l' | true | 'l' | 'l' | true
'o' | 'h' | false | 'o' | 'o' | true
' ' | 'h' | false |
'o' | 'h' | false | match found !!!
'r' | 'h' | false
' ' | 'h' | false
'i' | 'h' | false
's' | 'h' | false
' ' | 'h' | false
'i' | 'h' | false
't' | 'h' | false

not found!!!




[Edited by - delta user on March 6, 2006 1:53:20 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by baker
hey drevil question: i am trying to understand the function in my head. is this sort of right?


Yea, you got it.

Out of laziness I only incremented the current position by 1. Optimally you'd probably want to increment by the string length as raz0r did. If you didn't increment at all the loop would go on forever because it would keep finding the string in the same spot. Also don't forget the error checking for empty string, as raz0r also showed.

delta user: What are you talking about? Our solutions work fine on that, returning 1, unless I misunderstood your post.

Share this post


Link to post
Share on other sites
Quote:
Original post by DrEvil
delta user: What are you talking about? Our solutions work fine on that, returning 1, unless I misunderstood your post.


I think so too?

"is this hellhellhellhellhello or is it"
..................hellhellhello

Share this post


Link to post
Share on other sites
i m sorry. my mistake.
i thought you would jump to far in the string after a mismatch.

i was reading the code of drEvil.
he has commented his code with "should probably ++ this by the st2.length"

and it wouldnt find my line of text that way because it skips a part of the code

just think about it.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
basic_string::find
size_type find(E c, size_type pos = 0) const;
size_type find(const E *s, size_type pos = 0) const;
size_type find(const E *s, size_type pos, size_type n) const;
size_type find(const basic_string& str, size_type pos = 0) const;
Each member function finds the first (lowest beginning position) subsequence in the controlled sequence, beginning on or after position pos, that matches the operand sequence specified by the remaining operands. If it succeeds, it returns the position where the matching subsequence begins. Otherwise, the function returns npos.



What is so hard about RTFM????

Share this post


Link to post
Share on other sites
im slow in the head. sorry. thanks everyone for the help.



Quote:
Original post by Anonymous Poster
basic_string::find
size_type find(E c, size_type pos = 0) const;
size_type find(const E *s, size_type pos = 0) const;
size_type find(const E *s, size_type pos, size_type n) const;
size_type find(const basic_string& str, size_type pos = 0) const;
Each member function finds the first (lowest beginning position) subsequence in the controlled sequence, beginning on or after position pos, that matches the operand sequence specified by the remaining operands. If it succeeds, it returns the position where the matching subsequence begins. Otherwise, the function returns npos.



What is so hard about RTFM????


Share this post


Link to post
Share on other sites
Quote:
Original post by delta user
i m sorry. my mistake.
i thought you would jump to far in the string after a mismatch.

i was reading the code of drEvil.
he has commented his code with "should probably ++ this by the st2.length"

and it wouldnt find my line of text that way because it skips a part of the code

just think about it.


What?
Evil's method will still return the right answer. Since he increments the search position by one, the current word will be invalid in the next pass.

"Hello this is a test" -> found "Hello".
"ello this is a test" -> no match for "Hello".

I'm probably misunderstanding you, but whatever...

[Edited by - raz0r on March 6, 2006 2:18:33 PM]

Share this post


Link to post
Share on other sites
delta user, I think the problem you are talking about would have to be a problem internal to the string.find() function. DrEvil's code has nothing in it about checking individual characters. It searchs for a whole string, it does not check against characters. I also think that you are misrepresenting the way string.find() works. While I don't know the details of the implementation of the function, I can assure you that it doesn't search in the way your diagram illustrates, it would have to backtrack. And the reason is because it doesn't work, like you said.

I tested his function on your strings, it does work, even with incrementing by the whole string.

His function does this:

starting from index 0:
012345678
"hellhellhello"
"hellhellhellhellhello or is it"

find returns index 8
increments index
increments count

starts from index 9 this time:
"hellhellhello"
"ellhellhello or is it"

find returns no match, end while

return count of 1

Share this post


Link to post
Share on other sites
this works just fine for me


#include <stdio.h>
#include <unistd.h>
#include <string.h>

unsigned int strcount(char *h, char *n)
{
unsigned int c, l = strlen(n);

for (c = 0; h = strstr(h, n); ++c) h += l;

return c;
}

int main(int argc, char **argv)
{
char *test = "in the bay we get hyphy stupid dumb and hyphy\n";

printf("%s", test);
printf("e: %d\n", strcount(test, "e"));
printf("hyphy: %d\n", strcount(test, "hyphy"));
printf("z: %d\n", strcount(test, "z"));
}




it outputs
Quote:

[r3d@localhost functions]$ ./strcount
in the bay we get hyphy stupid dumb and hyphy
e: 3
hyphy: 2
z: 0
[r3d@localhost functions]$

Share this post


Link to post
Share on other sites

This topic is 4299 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this