lucym

[web] The "did you mean" feature


lucym    122
Hello, I want to add a 'did you mean' feature (like Google's) to my shareware site. A lot of users make typos, get 0 results, and leave the site very quickly. Do you have any idea how I could implement this? Does it require a dictionary or something? I found some scripts, but none of them gave relevant results, just irrelevant suggestions. Any help would be appreciated. Thanks, Lucy

TheGilb    372
You would probably need a program that reads the meta tag keywords from your web pages and stores them all in a database (which you can now think of as your dictionary). You would then run a closest-string match against that dictionary (google: "closest string match") and return the best results.
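As a rough illustration of the "closest string match" step, here is a minimal sketch using Python's standard `difflib` module. The `KEYWORDS` list and the `closest_keywords` helper are made up for the example; in practice the list would come from whatever keyword database you build from your pages.

```python
import difflib

# Hypothetical dictionary, standing in for keywords collected from the site's pages.
KEYWORDS = ["email backup", "registry cleaner", "file manager", "disk defragmenter"]

def closest_keywords(query, keywords=KEYWORDS, limit=3, cutoff=0.6):
    """Return up to `limit` keywords most similar to `query`, best match first.

    `cutoff` is a similarity ratio between 0 and 1; anything below it is dropped,
    so an unrelated query yields an empty list instead of a poor suggestion.
    """
    return difflib.get_close_matches(query.lower(), keywords, n=limit, cutoff=cutoff)
```

With this list, `closest_keywords("rgistry cleaner")` puts "registry cleaner" first, and a query with nothing similar in the dictionary returns an empty list. The cutoff is worth tuning: too low and you get the irrelevant suggestions Lucy complained about.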

Or you could embed Google search within your site, which pretty much will do all of this for you.

Hope that helps, and good luck :-)

lucym    122
I cannot embed Google search on the page; it would ruin the concept of the site. But I heard that Google and Yahoo offer APIs for related searches and suggestions, so I will try that.

Cygnus_X    359
There is a function called levenshtein which calculates the Levenshtein distance between two strings. In other words, if a user types 'aple' but meant 'apple', the Levenshtein distance between the two words is 1, because a single letter substitution, insertion, or deletion (here, inserting one 'p') is enough to convert 'aple' into 'apple'.

The theory behind this is that you create a database of search terms (i.e., a dictionary or otherwise) and check user input against it. When a user types in a word that does not appear in the database, you look for the entry with the smallest Levenshtein distance to their input. Once you find it, you ask: Did you mean (insert result here).
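To make that concrete, here is a small Python sketch of the idea. The `levenshtein` function below is a plain dynamic-programming implementation written for this example (not a library call), and `did_you_mean` with its `max_distance` threshold is an assumption about how you might want to reject far-off matches.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions
    needed to turn string `a` into string `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a matching letter
            ))
        previous = current
    return previous[-1]

def did_you_mean(word, dictionary, max_distance=2):
    """Return the closest dictionary term, or None if `word` is already known
    or nothing is within `max_distance` edits."""
    if word in dictionary:
        return None
    best = min(dictionary, key=lambda term: levenshtein(word, term), default=None)
    if best is not None and levenshtein(word, best) <= max_distance:
        return best
    return None
```

For example, `did_you_mean("aple", ["apple", "orange", "banana"])` returns "apple". This scans the whole dictionary on every miss, which is fine for a few thousand terms; a larger dictionary would need some kind of pre-filtering.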

jamminjulia    122
You'll need a dictionary of the keywords that might be typed. I'd scan your site's content to build it. Then calculate the Soundex or Metaphone code of each word in your dictionary (see: http://en.wikipedia.org/wiki/Phonetic_algorithm). Compute the same code for the search terms; you may have to check the words in the query one at a time (I don't think most phonetic algorithms work on phrases). Then look up the matching Soundex/Metaphone code in your dictionary.

From there, you'll need to figure out which 'corrected' result to return if there are multiple results. But I think this is the method I would use...
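The phonetic lookup could be sketched like this in Python. The `soundex` function follows the usual American Soundex rules as I understand them; `build_index` and `phonetic_candidates` are hypothetical helper names, and the word list is only an example.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, h, w and y have no code.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    last = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code  # a vowel resets `last`, so a repeated code counts again
    return (result + "000")[:4]

def build_index(words):
    """Map each Soundex code to the dictionary words that share it."""
    index = defaultdict(list)
    for word in words:
        index[soundex(word)].append(word)
    return index

def phonetic_candidates(query, index):
    """Check the query one word at a time; return {word: [matching words]}."""
    return {word: list(index.get(soundex(word), [])) for word in query.split()}
```

With an index built from `["registry", "cleaner", "backup", "email"]`, the query "rgistry cleaner" maps "rgistry" to `["registry"]`. Note that Soundex always keeps the first letter, so a typo in the first character (for example "wmail") will not be matched this way.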

Good luck!

Trillian    410
I don't know if it's going to be useful to you, but I saw in a documentary on Google ("Google : Behind the screen") that they implemented their "did you mean" feature mostly by tracking particular user patterns. For example: someone searches for something, doesn't click on any link, then slightly modifies the search query and clicks on the first link of the new results. Their system interprets this as the user having mistyped something and then corrected it. By collecting statistics on such behaviour, they've built a database of wrong terms and their correct versions.
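On a small site you could try the same trick with your own search logs. The sketch below is only a guess at how that might look, not a description of what Google does: the session format (a list of `(query, clicked)` pairs per visitor), the similarity threshold, and both function names are assumptions made for the example, and "clicked" is simplified to a yes/no flag rather than "clicked the first link".

```python
from collections import Counter
import difflib

def mine_corrections(sessions, min_similarity=0.8):
    """Count (wrong, fixed) query pairs across sessions.

    Each session is a list of (query, clicked) tuples in the order they happened.
    A pair is counted when a query with no click is immediately followed by a
    slightly different query that did get a click.
    """
    counts = Counter()
    for events in sessions:
        for (first, first_clicked), (second, second_clicked) in zip(events, events[1:]):
            if first_clicked or not second_clicked or first == second:
                continue
            similarity = difflib.SequenceMatcher(None, first, second).ratio()
            if similarity >= min_similarity:
                counts[(first, second)] += 1
    return counts

def best_correction(counts, query, min_count=2):
    """Most frequently observed fix for `query`, or None if seen too rarely."""
    candidates = {fixed: n for (wrong, fixed), n in counts.items()
                  if wrong == query and n >= min_count}
    return max(candidates, key=candidates.get) if candidates else None
```

The `min_count` threshold is there so that a single odd session does not become a suggestion; with real traffic you would probably want it higher than 2.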

lucym    122
OK, the winner is Yahoo Spelling Suggestion (which uses the Yahoo API). This lets your site suggest exactly what Yahoo Search suggests :cool: Awesome, I would say!

Here is a short summary of how to use it, if you guys want to use it too: it requires a Yahoo API key, which you can get for free. It is limited to 5000 queries per day, so I suggest caching the suggestions (which I implemented, as my site has more than 5000 searches per day). The page returned by Yahoo needs to be parsed to see whether a suggestion was given. This is a page that gives a suggestion; this is one that has no suggestion.
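The caching part might look something like the sketch below. It deliberately does not show the Yahoo request or the parsing: `fetch` is a placeholder for whatever function you write that calls the API and returns the suggestion (or `None` when there is none). The class name and the in-memory dictionary are assumptions for the example; a real site would more likely keep the cache in its database.

```python
import datetime

class SuggestionCache:
    """Remember suggestions so each distinct query costs at most one API call."""

    def __init__(self, fetch, daily_limit=5000):
        self.fetch = fetch              # hypothetical callable: query -> suggestion or None
        self.daily_limit = daily_limit  # the 5000-per-day quota mentioned above
        self.cache = {}                 # normalized query -> suggestion (None = no suggestion)
        self.day = None
        self.used = 0

    def suggest(self, query, today=None):
        key = " ".join(query.lower().split())  # normalize case and spacing
        if key in self.cache:
            return self.cache[key]             # cached answers cost no quota
        today = today or datetime.date.today()
        if today != self.day:                  # new day: reset the call counter
            self.day, self.used = today, 0
        if self.used >= self.daily_limit:
            return None                        # quota used up: show no suggestion
        self.used += 1
        self.cache[key] = self.fetch(key)
        return self.cache[key]
```

Caching "no suggestion" answers as well matters here, since correctly spelled queries are probably the majority and would otherwise eat the quota every time.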

You can see it live on my shareware site (http://www.coredownload.com/): try some typos, like "enail backup", "rgistry cleaner" or whatever you want. I also implemented a function that puts the words that differ between the search query and Yahoo's suggestion in bold italic. ;)
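Lucy's highlighting function isn't shown in the thread, so the following is only one possible way to do it: a word-by-word comparison that wraps every suggested word not present in the original query in `<b><i>` tags. The function name is made up for the example.

```python
import html

def highlight_changes(query, suggestion):
    """Return `suggestion` as HTML, with words absent from `query` in bold italic."""
    original_words = set(query.lower().split())
    parts = []
    for word in suggestion.split():
        safe = html.escape(word)  # never echo user-influenced text unescaped
        if word.lower() in original_words:
            parts.append(safe)
        else:
            parts.append(f"<b><i>{safe}</i></b>")
    return " ".join(parts)
```

For instance, `highlight_changes("enail backup", "email backup")` gives `<b><i>email</i></b> backup`.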

If you need assistance on implementing this on your site, let me know.

Thanks, Lucy

ascorbic    307
We had to do something like that for an AI assignment last term. We used the information from the website below. It gives 80-90% accuracy and is actually quite easy to implement.

Link

Hope this helps. At least it's an interesting read...

lucym    122
Very interesting, it's true. But I think the Yahoo/Google APIs give better results, especially for search terms with more than one word. Also, the fact that their suggestions are based on search patterns makes me think they have better suggestions.

Thanks, Lucy
