Prozak

[web] Implementing a WebSite Search Feature

This topic is 4316 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I'm having trouble working out the details of search technology for a site.

Some simple features:
* Find articles with this word: blue
* Find articles with these words: blue gene
* Find articles containing this exact string: "muscle group"
* Find articles containing this word in this exact case: 'nVidia'
* Find articles containing these words in their exact case: 'aTI' 'iDD2007'
* Find articles containing these words with exceptions: blue -green -'Blue'
* Tags: Articles and various site areas can be tagged by the users themselves, with keywords that make those site resources easier to find.
* Automatic Conversion (AC): For example, if I searched for "rocky 4" I might not find the article labeled "Rocky IV". AC is designed to convert numerals from Arabic to Roman format and back. AC also tries to convert certain letter combinations. For example, a Portuguese village named "Cabeça de Cão" (Dog's Head) is first converted to "Cabeca de Cao", because foreign keyboards have problems with the "ç" and "ã" characters.
* Search word correction: I have no good idea how to do this. In principle, if you searched for a word like "chicken" but misspelled it as "chiken", the search system should correct you to "chicken".

My problem in designing an algorithm to perform the above is the sheer volume of data. The base layer for this work is PHP and a MySQL database.

So we are supposed to read large amounts of text from the DB and look for strings in it. This can get slow pretty quickly. So I guess the way to go here is pre-processing, but how? Should the system, at submit time, remove common words like "the a this what" etc., and place the resulting mish-mash of words into a separate table field?

At submit time the author is supposed to supply tags and keywords to help identify the article. When the article is live, users can also attach their own tags to it.

I also plan on having an Ajax feature built into the search, so that someone searching with a certain set of keywords can flag the result that took him to what he was looking for. That would let me reinforce the association between those keywords and that particular article. Any ideas on this?

Quote:
Original post by Prozak
I'm having problems thinking out the details of search technology on a site.

Some simple features:
* Find articles with this word: blue
* Find articles with these words: blue gene
* Find articles containing this exact string: "muscle group"
* Find articles containing this word in this exact case: 'nVidia'
* Find articles containing these words in their exact case: 'aTI' 'iDD2007'
* Find articles containing these words with exceptions: blue -green -'Blue'

All of those are possible with simple SELECT statements. Depending on your database structure, all you have to do is search across multiple fields and UNION the results from different tables.
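Before any SELECT can be built, the query string has to be split into its parts. Here is a minimal sketch of that step in Python (the thread itself uses PHP). It assumes the syntax from the quoted list: double quotes for an exact phrase, single quotes for an exact-case word, and a leading minus for an exclusion. The `Term` type and the `parse_query` name are made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical token type; the thread does not define one.
@dataclass(frozen=True)
class Term:
    text: str
    exact_phrase: bool = False    # came from "double quotes"
    case_sensitive: bool = False  # came from 'single quotes'
    excluded: bool = False        # prefixed with '-'

# One token: an optional '-', then a "phrase", a 'Cased' word, or a bare word.
_TOKEN = re.compile(r"""(-?)(?:"([^"]+)"|'([^']+)'|(\S+))""")

def parse_query(query: str) -> list[Term]:
    """Split a search string into Term objects, in the order they appear."""
    terms = []
    for minus, phrase, cased, bare in _TOKEN.findall(query):
        terms.append(Term(
            text=phrase or cased or bare,
            exact_phrase=bool(phrase),
            case_sensitive=bool(cased),
            excluded=(minus == "-"),
        ))
    return terms

for term in parse_query("blue -green -'Blue'"):
    print(term)
```

Each `Term` can then be turned into one condition of the WHERE clause, with the excluded ones negated.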

Quote:

* Tags: Articles and various site areas can be tagged by the users themselves, with keywords that facilitate the search of those site resources.

Create a "Tag" table, then a tag-resource relationship table that holds a foreign key to your tag table, the name of the table the tagged resource lives in, and a foreign key into that table. You can then select all tagged data by selecting the rows of your relationship table that carry the tag foreign key of your choice.
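A rough sketch of that layout follows. It uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; the thread targets MySQL, where the column types would differ. All table and column names (`tag`, `tag_resource`, `resource_table`, `resource_id`) are hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in for MySQL here; every name below is hypothetical.
SCHEMA = """
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tag_resource (
    tag_id         INTEGER NOT NULL REFERENCES tag(id),
    resource_table TEXT    NOT NULL,   -- e.g. 'article'
    resource_id    INTEGER NOT NULL,   -- key into that table
    PRIMARY KEY (tag_id, resource_table, resource_id)
);
"""

def resources_tagged(conn, tag_name):
    """Return (resource_table, resource_id) pairs carrying the given tag."""
    rows = conn.execute(
        "SELECT tr.resource_table, tr.resource_id "
        "FROM tag_resource AS tr JOIN tag AS t ON t.id = tr.tag_id "
        "WHERE t.name = ? "
        "ORDER BY tr.resource_table, tr.resource_id",
        (tag_name,),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tag (id, name) VALUES (1, 'gpu'), (2, 'cpu')")
conn.execute(
    "INSERT INTO tag_resource VALUES "
    "(1, 'article', 10), (1, 'forum_post', 7), (2, 'article', 11)"
)
print(resources_tagged(conn, "gpu"))
```

Storing the table name next to the id is what lets one relationship table cover articles and other site areas alike, at the cost of the database not being able to enforce that second foreign key.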

Quote:

* Automatic Conversion (AC): For example if I searched for "rocky 4" I might not find the article labeled "Rocky IV". AC is designed to translate numerals into roman or back to Arabic format. AC also tries to convert certain letter combinations. For example, a Portuguese village named "Cabeça de Cão" (Dog's Head) is first converted to "Cabeca de Cao", because foreign keyboards have problems with the "ç" and "ã" characters.

With the first one, I think your only choice is to parse the query in PHP and search for both "rocky 4" and "rocky iv".
The second one might be doable by converting both the search string and the data being searched to a very limited character set, using the SQL function CONVERT().
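Both conversions can also be done in application code before the query is sent. The sketch below is in Python rather than PHP and only illustrates the idea: it folds accents with Unicode decomposition (a different technique from the SQL CONVERT() suggestion above) and produces a second variant of the query with Arabic numbers rewritten as Roman numerals. The reverse direction is not covered, and the function names are made up.

```python
import re
import unicodedata

_ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
          (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
          (5, "v"), (4, "iv"), (1, "i")]

def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    out = []
    for value, symbol in _ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def fold_accents(text: str) -> str:
    """Decompose accented letters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def _romanize(match: re.Match) -> str:
    n = int(match.group())
    # Leave 0 and large numbers alone; Roman numerals are only sensible for 1..3999.
    return to_roman(n) if 0 < n < 4000 else match.group()

def query_variants(query: str) -> set[str]:
    """Return the folded, lowercased query plus its Roman-numeral variant."""
    base = fold_accents(query).lower()
    return {base, re.sub(r"\b\d+\b", _romanize, base)}

print(fold_accents("Cabeça de Cão"))
print(query_variants("Rocky 4"))
```

The stored titles would need the same folding applied (for example into a separate search column) so that both sides of the comparison are normalized the same way.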

Quote:

* Search word correction: I have no good idea how to do this. In principle, if you searched for a certain word like "chiken" and you misspelled it, the search system should correct you to "chicken".

The easiest method is probably to keep a table of commonly misspelled words, look up each search word in it, and search for both the misspelling and its correction using OR.
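As a small Python sketch of that lookup: a dictionary stands in for the misspellings table, and each search word is expanded into the list of alternatives to OR together. The fuzzy fallback using the standard library's `difflib` is an addition that goes beyond what was suggested above, and the names and the 0.8 cutoff are arbitrary choices for illustration.

```python
import difflib

# Hypothetical stand-in for the "commonly misspelled words" table.
MISSPELLINGS = {"chiken": "chicken", "recieve": "receive"}

def expand_term(word, vocabulary=()):
    """Return the word plus a likely correction, for an OR search over both."""
    word = word.lower()
    candidates = [word]
    if word in MISSPELLINGS:
        candidates.append(MISSPELLINGS[word])
    elif vocabulary and word not in vocabulary:
        # Assumption beyond the thread: fuzzy match against words known to the site.
        candidates.extend(
            difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        )
    return candidates

print(expand_term("chiken"))
print(expand_term("chickn", ["chicken", "blue"]))
```

The fuzzy comparison is too slow to run against a large vocabulary on every request, so the table lookup should stay the first resort.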

Quote:

My problem in designing an algorithm to perform the above is the sheer volume of data. The base layer for this work is PHP and a MySQL database.

So we are supposed to read large amounts of text from the DB, and look for strings in it. This can get slow pretty quick.

Reading it all out and searching through it in PHP is a no-no. Look into the LIKE and REGEXP operators of MySQL.
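To illustrate letting the database do the filtering, here is a sketch in Python with sqlite3 standing in for MySQL. The `article` table and its columns are hypothetical. User input is passed as bound parameters, and the LIKE wildcards `%` and `_` are escaped so they match literally.

```python
import sqlite3

def like_pattern(term: str) -> str:
    """Escape LIKE wildcards in user input, then wrap it for a substring match."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

def search_titles(conn, include, exclude=()):
    """Let the database filter rows; only matching titles come back."""
    # SQLite needs an explicit ESCAPE clause; MySQL already defaults to backslash.
    clauses = ["body LIKE ? ESCAPE '\\'"] * len(include)
    clauses += ["body NOT LIKE ? ESCAPE '\\'"] * len(exclude)
    if not clauses:
        clauses = ["1 = 1"]
    sql = ("SELECT title FROM article WHERE "
           + " AND ".join(clauses) + " ORDER BY title")
    params = [like_pattern(t) for t in list(include) + list(exclude)]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (title TEXT, body TEXT)")
conn.executemany("INSERT INTO article VALUES (?, ?)", [
    ("Blue Gene", "IBM built the Blue Gene supercomputer."),
    ("Green Tea", "Blue and green teas compared."),
    ("Cotton", "A shirt that is 100% cotton."),
])
print(search_titles(conn, ["blue"], ["green"]))
```

A pattern that starts with `%` cannot use an ordinary index, so this approach still scans every row. It only avoids shipping the text to the application.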

If you know you're always going to be running on MySQL, look up FULLTEXT indexes and the MATCH() ... AGAINST() syntax in the MySQL documentation.

Actually, here you go.

If that facility can't be bent to serve your purpose, then you're either looking at LIKE and REGEXP matches, if your data sets are small enough, or you'll end up having to implement a full indexing system yourself. That is a task complex enough to fill entire books, so avoid it if you can.
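To give a feel for what "indexing it yourself" means at its very simplest, here is a toy inverted index in Python. It is only a sketch of the core idea (strip stop words at submit time, map each remaining word to the articles containing it) and leaves out everything that makes real systems hard, such as ranking, phrase matching, persistence, and updates. The stop-word list and class name are made up.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "this", "what", "of", "and"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]

class InvertedIndex:
    def __init__(self):
        # word -> ids of the articles that contain it
        self._postings = defaultdict(set)

    def add(self, article_id, text):
        """Index an article; this is the work done once at submit time."""
        for word in tokenize(text):
            self._postings[word].add(article_id)

    def search(self, query):
        """Return ids of articles containing every non-stop word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result

index = InvertedIndex()
index.add(1, "The Blue Gene supercomputer")
index.add(2, "What a blue sky")
print(index.search("the blue gene"))
```

A search then touches only the sets for the query words instead of the full text of every article.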

John B

Thank you both for your replies. Rating++.

Yes, it seems like an easy, innocent enough question, but once you look at the size of the data set you'll be searching, the task quickly grows in difficulty.
