# Algorithm: Find target string from array w/longest substring of query

This topic is 4450 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

So, this sounds like a TopCoder question, but it's real. I have a string entered by the user, and I have an array of strings coming from a database. I want to find the string in the database that contains the longest substring of the user string (ties result in an arbitrary one of the winning results). For instance:
vector<string> vDatabase;
vDatabase.push_back("hello");
vDatabase.push_back("house");
vDatabase.push_back("dilts");
vDatabase.push_back("yellow");

string result = LongestMatchingSubstring("llowz", vDatabase);
assert(result == "yellow");

See the idea? A brute-force algorithm would go something like this:
string LongestMatchingString(string search, vector<string> db)
{
vector<string>::iterator result = db.begin();

for(int len = 1; len <= search.size(); len++)
{
for(vector<string>::iterator iDb = db.begin(); iDb != db.end(); iDb++)
{
for(int iBegin = 0; iBegin < search.size(); iBegin++)
{
string tofind = search.substr(iBegin, len);
for(vector<string>::iterator iFindSubstr = db.begin(); iFindSubstr != db.end(); iFindSubstr++)
{
if(iFindSubstr->find(tofind, 0) != string::npos)
{
result = iFindSubstr;
goto NextLen;
}
}
}
}
NextLen:
}
return(*result);
}

That is a lot of loops. And I invoked the dread goto construct to optimize it! I just whipped that out off the top of my head, so it may or may not work. The point is, does someone want to show off their programming chutzpah and make an algorithm that's more elegant and faster?

##### Share on other sites
Quote:
 Original post by SiCrane Longest common substring problem.

Close, but not quite. The Longest Common Substring problem tries to find the longest substring of a list of strings. My problem is to find the one string in the list that has the longest substring in common with an arbitrary string outside the list.

##### Share on other sites
Quote:
Original post by BeanDog
Quote:
 Original post by SiCrane Longest common substring problem.

Close, but not quite. The Longest Common Substring problem tries to find the longest substring of a list of strings. My problem is to find the one string in the list that has the longest substring in common with an arbitrary string outside the list.

Actually SiCrane is correct. The idea would be to check the longest common substring for each string in your list against the arbitrary string. You would then pick the string with the best result.

An kind of efficient algorithm O( string1.length * string2.length * number of strings) is the dynamic programming formulation of the global sequence alignment problem. Using a mismatch score of 0 and a match score of 1 you have the equivalent of longest common substring. Your best matching string would be the one that gives the highest alignment score.

-Mike

##### Share on other sites
A link to an example of the dynamic programming formulation for the global sequence alignment problem:

http://www.sbc.su.se/~pjk/molbioinfo2001/dynprog/dynamic.html

-Mike

1. 1
2. 2
3. 3
4. 4
Rutin
11
5. 5

• 12
• 19
• 10
• 14
• 10
• ### Forum Statistics

• Total Topics
632666
• Total Posts
3007718
• ### Who's Online (See full list)

There are no registered users currently online

×