Archived

This topic is now archived and is closed to further replies.

nino

Finding one string inside another

Recommended Posts

char string1[] = "Programmming"; char string2[] = "gram"; how would i find where string2 is inside string1? i know i could use the find() function but i need to write my own function to do it and i''m stumped!! can somebody please help? thanks NiNo

Share this post


Link to post
Share on other sites
Is this a do my homework for me post? Otherwise I see no reason not to use string::find () or strstr ()

Hint: Loop though each character in string1 and look for the first character in string2. If you find the first character, check the next character, etc., etc.

Share this post


Link to post
Share on other sites
no its not a "do my homewotrk for me question" its a "i''ve been trying to figure this out for hours and i''m stuck and its due tomorrow" question

i know i should loop through like you said but i just cant figure out how to do it right for some reason

NiNo

Share this post


Link to post
Share on other sites
Here is the source of Microsoft''s strstr function:

  
char * __cdecl strstr (
const char * str1,
const char * str2
)
{
char *cp = (char *) str1;
char *s1, *s2;

if ( !*str2 )
return((char *)str1);

while (*cp)
{
s1 = cp;
s2 = (char *) str2;

while ( *s1 && *s2 && !(*s1-*s2) )
s1++, s2++;

if (!*s2)
return(cp);

cp++;
}

return(NULL);

}

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:
Original post by nino
no its not a "do my homewotrk for me question" its a "i''ve been trying to figure this out for hours and i''m stuck and its due tomorrow" question

Hmmm... It''s not homework, but it''s DUE tomorrow?
Oops...

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
There are several ways of doing this. The first approach is the one detailed by JY, and is a brute force approach. A more efficient approach would be to use the Knuth-Morris-Prath or Boyer-Moore algormithms. Both of these depends on first examining the pattern (the string you are looking for) for patterns. For natural language texts (e.g. english) BM is usually significantly more efficient than both other methods.

Here is some source-code for BM (searching ASCII-strings)
  
int BM_Search (char P[], int plen, char T[], int tlen) {
// Returns the index of the first occurence of P in T

// Returns -1 if P is not found in T

int i, j;
int skip[255];
for (i = 0; i < 255; i++) {
for (j = 0; j < plen; j++)
skip[P[j]] = plen - j - 1;
}

i = plen-1;
j = plen-1;

while (j >= 0 && i < tlen) {
if (T[i] == P[j]) {
i--;
j--;
} else {
if (plen - j > skip[T[i]])
i = i + plen - j;
else
i = i + skip[T[i]];
j = plen - 1;
}
}
if (j == -1)
return (i+1);
else
return -1;
}

Without going in to too much detail on how this algorithm works it basically looks at the character in the text on which a mismatch occurs, and moves the pattern forward until the last occurence of that character in the pattern is lined up with it.
Oh, and it also checks from the end of the pattern backwards, rather than from the start of the pattern forwards.

Maybe a small example would help a little.
Assuming the text-string you are searcing in is
I REALLY LOVE DEBUGGING MY CODE
and the string you are searching for is
MY CODE

  
First, the algorithm creates a skip-array that will look like this
skip[''E''] = 0
skip[''D''] = 1
skip[''O''] = 2
skip[''C''] = 3
skip['' ''] = 4
skip[''Y''] = 5
skip[''M''] = 6
all other skips have the value 7

Starting from the start we get
I REALLY LOVE DEBUGGING MY CODE
MY CODE

Starting at the end of the pattern we get a mismatch on L (in the text).
Since skip[''L''] is 7 we move the pattern 7 positions to the right, thus getting
I REALLY LOVE DEBUGGING MY CODE
MY CODE
At this point we get a mismatch on '' '' (in the text).
Since skip['' ''] is 4 we move the pattern 4 positions, getting
I REALLY LOVE DEBUGGING MY CODE
MY CODE
Note that this skip moved the last occurence of '' '' in the pattern to match the '' '' we just found in the text. This is really the meat and bones of the algorithm, as whenever we find a character in the text we know we have in the pattern, we move the pattern so that that character matches in the pattern, and if we find a character in the text that isn''t in the pattern we move the pattern just past that point in the text.

Anyway, that position above gives us a mismatch on ''U'' which doesn''t exist in our pattern, so again we move 7 positions getting
I REALLY LOVE DEBUGGING MY CODE
MY CODE

skip[''M''] is 6, so we move 6 spaces and get
I REALLY LOVE DEBUGGING MY CODE
MY CODE
And we''re done.
This algorithm is clearly a lot faster than a simple brute-force approach as the pattern is literally flying along often moving 7 spaces at the time without comparisons, while the brute-force approach would never move more than 1 before comparing again.



Anyway, hope that was helpful.

-Neophyte


- Death awaits you all with nasty, big, pointy teeth. -

Share this post


Link to post
Share on other sites
Bugger, forgot to log in...
Anyway, the AP above was me, and boy! was that formatting ever screwed up. Sorry about that.

-Neophyte

- Death awaits you all with nasty, big, pointy teeth. -

Share this post


Link to post
Share on other sites
I think I would with the tried and tested method. It''s perfectly efficient enough.

Share this post


Link to post
Share on other sites
quote:
Original post by Ysaneya
Looks like bullshit to me. The skip local array is never initialized, and the for i loop has no use.

Y.




Doesn't matter what it looks like to you. It works.


But you're correct in saying that first i-loop is totally unnecessary. It doesn't do anything.


Edited by - Red Ant on November 15, 2001 6:02:49 AM

Share this post


Link to post
Share on other sites
You''re correct, that was a typo (or several) on my part. The correct loop should look like this:
  
// Originally

// for (i = 0; i < 255; i++) {

// for (j = 0; j < plen; j++)

// skip[P[j]] = plen - j - 1

// }

for (i = 0; i < 255; i++)
skip[i] = plen;
for (i = 0; i < plen; i++)
skip[P[i]] = plen - i - 1;


-Neophyte

- Death awaits you all with nasty, big, pointy teeth. -

Share this post


Link to post
Share on other sites