Sign in to follow this  
BeanDog

Algorithm: Find target string from array w/longest substring of query

Recommended Posts

So, this sounds like a TopCoder question, but it's real. I have a string entered by the user, and I have an array of strings coming from a database. I want to find the string in the database that contains the longest substring of the user string (ties result in an arbitrary one of the winning results). For instance:
vector<string> vDatabase;
vDatabase.push_back("hello");
vDatabase.push_back("house");
vDatabase.push_back("dilts");
vDatabase.push_back("yellow");

string result = LongestMatchingSubstring("llowz", vDatabase);
assert(result == "yellow");
See the idea? A brute-force algorithm would go something like this:
string LongestMatchingString(string search, vector<string> db)
{
  vector<string>::iterator result = db.begin();

  for(int len = 1; len <= search.size(); len++)
  {
    for(vector<string>::iterator iDb = db.begin(); iDb != db.end(); iDb++)
    {
      for(int iBegin = 0; iBegin < search.size(); iBegin++)
      {
        string tofind = search.substr(iBegin, len);
        for(vector<string>::iterator iFindSubstr = db.begin(); iFindSubstr != db.end(); iFindSubstr++)
        {
          if(iFindSubstr->find(tofind, 0) != string::npos)
          {
            result = iFindSubstr;
            goto NextLen;
          }
        }
      }
    }
NextLen:
  }
  return(*result);
}
That is a lot of loops. And I invoked the dread goto construct to optimize it! I just whipped that out off the top of my head, so it may or may not work. The point is, does someone want to show off their programming chutzpah and make an algorithm that's more elegant and faster?

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Longest common substring problem.

Close, but not quite. The Longest Common Substring problem tries to find the longest substring of a list of strings. My problem is to find the one string in the list that has the longest substring in common with an arbitrary string outside the list.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by BeanDog
Quote:
Original post by SiCrane
Longest common substring problem.

Close, but not quite. The Longest Common Substring problem tries to find the longest substring of a list of strings. My problem is to find the one string in the list that has the longest substring in common with an arbitrary string outside the list.


Actually SiCrane is correct. The idea would be to check the longest common substring for each string in your list against the arbitrary string. You would then pick the string with the best result.

An kind of efficient algorithm O( string1.length * string2.length * number of strings) is the dynamic programming formulation of the global sequence alignment problem. Using a mismatch score of 0 and a match score of 1 you have the equivalent of longest common substring. Your best matching string would be the one that gives the highest alignment score.

-Mike

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
A link to an example of the dynamic programming formulation for the global sequence alignment problem:

http://www.sbc.su.se/~pjk/molbioinfo2001/dynprog/dynamic.html

-Mike

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this