ahmedkl

[java] Search program


I have been assigned a project to write a program that searches through documents; basically, it will be like a mini search engine. The first step is to parse an HTML page (removing all the tags and so on) and save the text to a .txt file [DONE]. Now, how do I search through those files? For example, if two .txt files contain some of the same information, how do I make the program display that data from both files? Which data structure should I use, and what can be done to make the search fast?

Hmm, ever heard of a KWIC (keyword-in-context) index?

Actually, you probably don't even need to bother with that. There are only a few hundred thousand words in the English vocabulary, and your index will probably only see a small subset of them, so a sorted list of that size is easy to binary search.
So, create an alphabetical index of the unique words that appear in the files, and associate each word with the file and the line+column number of each of its occurrences. You'll need some sort of secondary storage for this.
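
A minimal sketch of that kind of index, in case it helps. Every name here (`SortedWordIndex`, `Posting`, `addLine`, `freeze`, `lookup`) is made up for illustration, and it assumes a "word" is just a run of letters or digits, compared case-insensitively. It keeps everything in memory rather than in secondary storage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch: an alphabetical word index searched with binary search.
class SortedWordIndex {
    // One occurrence of a word: the file it is in, plus 1-based line and column.
    static final class Posting {
        final String file;
        final int line;
        final int column;

        Posting(String file, int line, int column) {
            this.file = file;
            this.line = line;
            this.column = column;
        }
    }

    // Used only while building; a TreeMap keeps the words in alphabetical order.
    private final TreeMap<String, List<Posting>> building = new TreeMap<>();
    // Filled by freeze(): sorted unique words and their occurrences, in parallel.
    private String[] words = new String[0];
    private List<List<Posting>> postings = new ArrayList<>();

    // Record every word found in one line of one file.
    void addLine(String file, int lineNumber, String text) {
        int i = 0;
        while (i < text.length()) {
            // Skip anything that is not part of a word.
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String word = text.substring(start, i).toLowerCase(Locale.ROOT);
                building.computeIfAbsent(word, k -> new ArrayList<>())
                        .add(new Posting(file, lineNumber, start + 1));
            }
        }
    }

    // Turn the collected words into a sorted array that can be binary searched.
    void freeze() {
        words = building.keySet().toArray(new String[0]);
        postings = new ArrayList<>(building.values()); // same order as the keys
    }

    // Binary search the sorted word array; returns an empty list if not found.
    List<Posting> lookup(String word) {
        int pos = Arrays.binarySearch(words, word.toLowerCase(Locale.ROOT));
        return pos >= 0 ? postings.get(pos) : new ArrayList<Posting>();
    }
}
```

You would call `addLine` for each line of each parsed .txt file, call `freeze()` once, and then `lookup("someword")` gives every file, line and column where the word occurs.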

You could probably also use a hash table for this index instead. Really, anything that can be searched with better time complexity than a linear scan of the files themselves will do.
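
As a rough illustration of the hash table variant, here is a sketch that maps each word to the set of files containing it. `HashWordIndex`, `addFile` and `filesContaining` are hypothetical names, and it assumes words are runs of letters or digits, lowercased. It drops the line+column positions to stay short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a hash table from each word to the files that contain it.
class HashWordIndex {
    private final Map<String, Set<String>> filesByWord = new HashMap<>();

    // Split the file contents into words and remember which file each came from.
    void addFile(String fileName, String contents) {
        String[] tokens = contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) { // split can yield an empty leading token
                filesByWord.computeIfAbsent(token, k -> new TreeSet<>()).add(fileName);
            }
        }
    }

    // Expected constant-time lookup; returns an empty set for unknown words.
    Set<String> filesContaining(String word) {
        return filesByWord.getOrDefault(word.toLowerCase(Locale.ROOT),
                Collections.<String>emptySet());
    }
}
```

If a word appears in two of the .txt files, `filesContaining` returns both file names, which covers the "same information in two files" case from the question.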

You'll then need to decide on the semantics of your search system (for example, whether a multi-word query must match every word or just any one of them), and combine those rules with your index to build the system's logic.
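
One possible set of semantics, purely as an example: treat a query as whitespace-separated words, and offer "match all" (intersection) and "match any" (union) over a word-to-files map. The class and method names below are invented for this sketch, and it assumes the index keys are already lowercased.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: two simple query semantics over a word -> files index.
class QuerySemantics {
    // Files that contain every word of the query (AND).
    static Set<String> matchAll(Map<String, Set<String>> index, String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) continue; // happens for an empty query
            Set<String> files = index.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(files); // copy, so the index is not modified
            } else {
                result.retainAll(files); // keep only files seen for every term so far
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    // Files that contain at least one word of the query (OR).
    static Set<String> matchAny(Map<String, Set<String>> index, String query) {
        Set<String> result = new TreeSet<>();
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            result.addAll(index.getOrDefault(term, Collections.<String>emptySet()));
        }
        return result;
    }
}
```

Phrase queries or ranking would need the per-occurrence positions as well; this sketch only answers "which files match".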

