[java] Search program

I have been assigned a project to write a program that searches through documents; basically, it will be a mini search engine. The first step is to parse an HTML page (removing all tags and other markup) and save it as a .txt file [DONE]. Now, how do I search through those files? For example, if two .txt files contain some of the same information, how do I make the program display that data from both files? Which data structure should I use, and what can be done to make searching fast?

Hmm, ever heard of a KWIC (Key Word In Context) index?

Actually, you probably don't even need to bother with that. English has only a few hundred thousand words, apparently, and your index will probably see just a small subset of them, so a sorted word list is easy to binary search.
So, create an alphabetical index of the unique words that appear in the files, and associate each word with the file and line+column number of each occurrence. You'll need some sort of secondary storage for this.
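
As a rough illustration of the alphabetical index described above, here is a minimal Java sketch. All names (`SortedWordIndex`, `Occurrence`, `addLine`, `freeze`, `lookup`) are made up for this example, words are assumed to be runs of ASCII letters and digits, and the index is kept in memory rather than in secondary storage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a sorted word list searched with binary search.
class SortedWordIndex {
    // One occurrence of a word; line and column are 1-based.
    record Occurrence(String file, int line, int column) {}

    // Assumption: a "word" is a run of ASCII letters or digits.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Used while indexing; TreeMap keeps the words in alphabetical order.
    private final TreeMap<String, List<Occurrence>> building = new TreeMap<>();
    private String[] words = new String[0];                  // sorted unique words
    private List<List<Occurrence>> postings = new ArrayList<>(); // postings.get(i) belongs to words[i]

    // Records every word of one line of text, lower-cased.
    void addLine(String file, int lineNo, String text) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            building.computeIfAbsent(word, k -> new ArrayList<>())
                    .add(new Occurrence(file, lineNo, m.start() + 1));
        }
    }

    // Copies the sorted keys and their occurrence lists into parallel structures.
    void freeze() {
        words = building.keySet().toArray(new String[0]);
        postings = new ArrayList<>(building.values());
    }

    // Binary search over the sorted word array; call freeze() first.
    List<Occurrence> lookup(String word) {
        int i = Arrays.binarySearch(words, word.toLowerCase(Locale.ROOT));
        return i >= 0 ? postings.get(i) : List.of();
    }

    public static void main(String[] args) {
        SortedWordIndex index = new SortedWordIndex();
        index.addLine("a.txt", 1, "Java search engine");
        index.addLine("b.txt", 3, "a search program");
        index.freeze();
        System.out.println(index.lookup("search"));
    }
}
```

A lookup costs O(log n) string comparisons for n unique words, regardless of how large the files themselves are.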

You could also probably use a hash table for this index instead; really, anything that can be searched with better time complexity than a linear scan of the files themselves will do.
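
The hash table variant might look something like the sketch below, which maps each word to the set of files containing it, so a word that appears in two .txt files is reported for both. The class and method names are hypothetical, the tokenization (lower-case, split on anything that is not an ASCII letter or digit) is an assumption, and `Files.readString` needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a hash-based index from word to the files that contain it.
class HashWordIndex {
    private final Map<String, Set<String>> filesByWord = new HashMap<>();

    // Indexes already-loaded text under the given file name.
    void indexText(String fileName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                filesByWord.computeIfAbsent(token, k -> new TreeSet<>()).add(fileName);
            }
        }
    }

    // Reads a plain-text file from disk and indexes its contents.
    void indexFile(Path file) throws IOException {
        indexText(file.getFileName().toString(), Files.readString(file));
    }

    // Average O(1) lookup; returns an empty set for unknown words.
    Set<String> filesContaining(String word) {
        return filesByWord.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        HashWordIndex index = new HashWordIndex();
        index.indexText("a.txt", "A mini search engine in Java.");
        index.indexText("b.txt", "Every engine needs an index.");
        System.out.println(index.filesContaining("engine"));
    }
}
```

Compared with the sorted index, this gives up alphabetical order (so no prefix or range queries) in exchange for constant-time lookups on average.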

You'll then need to define the semantics of your search system (for example, how a multi-word query is matched), and combine those semantics with your index to build the search logic.
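
To make the idea of search semantics concrete, here is one possible choice sketched in Java: a query is a list of whitespace-separated words, and a file matches only if it contains all of them (AND semantics). This is just one option, not something the thread prescribes. The `AndQuery` class is hypothetical, and the index is assumed to be a map from lower-case word to the set of file names containing it.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: AND semantics on top of a word -> files index.
class AndQuery {
    // Returns the files that contain every term of the query.
    static Set<String> search(Map<String, Set<String>> index, String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query yields a single empty term
            }
            Set<String> hits = index.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits); // first term: start from its files
            } else {
                result.retainAll(hits);       // later terms: intersect
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = Map.of(
                "java", Set.of("a.txt", "b.txt"),
                "search", Set.of("b.txt", "c.txt"));
        System.out.println(search(index, "Java search"));
    }
}
```

Swapping `retainAll` for `addAll` would give OR semantics instead; phrase queries would additionally need the per-occurrence positions from the index.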
